
Improving LLM Accuracy: How SLED Leverages Every Model Layer for Factual Results

Google Research Introduces Self Logits Evolution Decoding (SLED)

Large language models (LLMs) have transformed how we interact with AI, but ensuring their outputs are consistently accurate remains a challenge. Hallucinations (confident but incorrect responses) often undermine the reliability of LLMs in practical deployments.

Addressing this, Google Research has introduced Self Logits Evolution Decoding (SLED), a new technique that enhances factual accuracy by tapping into the model’s entire internal knowledge, all without extra data or training.

Factuality: The Core Challenge

LLMs can produce plausible-sounding but factually incorrect statements. This issue arises from training data gaps, vague prompts, or a tendency to overfit. While some solutions use retrieval-augmented generation to reference external facts, these approaches can add complexity and don’t always prevent errors.

The SLED Approach: Decoding with Depth

SLED takes a fundamentally different route by improving accuracy during the model’s decoding phase. Traditional decoding relies solely on the final layer’s predictions, which can miss context captured at earlier stages. SLED instead aggregates outputs from all layers, producing a more nuanced and reliable prediction for each token.
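
For contrast, here is a minimal sketch of standard greedy decoding, which consults only the final layer’s logits. The checkpoint name is an assumption, used purely for illustration:

    # Standard decoding: only the final layer's logits choose the next token.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "google/gemma-2-2b"   # assumed checkpoint, for illustration only
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    inputs = tok("The capital of Australia is", return_tensors="pt")
    with torch.no_grad():
        final_logits = model(**inputs).logits[:, -1, :]   # last layer, last position
    print(tok.decode(final_logits.argmax(dim=-1)))        # greedy next token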

  • Layer Utilization: SLED extracts prediction scores (logits) from every intermediate model layer, not just the final one.

  • Weighted Averaging: Each layer’s logits are transformed into probability distributions, then combined using a weighted scheme that emphasizes the most informative layers (see the sketch after this list).

  • Factual Alignment: This process integrates the model’s broader understanding, reducing the chance of common but incorrect answers.

  • No Extra Data Needed: SLED requires no external databases or retraining, making it easy to implement with existing models.
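
A minimal sketch of this layer-aggregation idea follows. It is not Google’s published SLED algorithm, only an illustration of the general mechanism: the checkpoint name, the norm-plus-head projection path, the confidence-based layer weights, and the blending factor alpha are all assumptions made for the example.

    # Illustrative sketch only: not the published SLED algorithm, just a simplified
    # demonstration of aggregating next-token predictions from every layer instead
    # of trusting the final layer alone.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "google/gemma-2-2b"   # assumed checkpoint, for illustration only
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, output_hidden_states=True
    ).eval()

    inputs = tok("A $40 jacket is 25% off. The sale price is $", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)

    lm_head = model.get_output_embeddings()   # vocabulary projection head
    final_norm = model.model.norm             # final norm (Gemma/Llama-style path, assumed)

    layer_probs, weights = [], []
    for h in out.hidden_states[1:]:           # skip the raw embedding layer
        # "Logit lens"-style early exit: project each layer's last-position state
        # through the final norm and output head to get a vocabulary distribution.
        logits = lm_head(final_norm(h[:, -1, :]))
        probs = torch.softmax(logits.float(), dim=-1)
        layer_probs.append(probs)
        weights.append(probs.max())           # assumed heuristic: favor confident layers

    weights = torch.stack(weights)
    weights = weights / weights.sum()

    # Weighted average of per-layer distributions, blended with the final layer.
    layer_avg = sum(w * p for w, p in zip(weights, layer_probs))
    final_probs = torch.softmax(out.logits[:, -1, :].float(), dim=-1)

    alpha = 0.2                               # assumed blending strength
    blended = (1 - alpha) * final_probs + alpha * layer_avg

    print("final layer only:", tok.decode(final_probs.argmax(dim=-1)))
    print("layer-aggregated:", tok.decode(blended.argmax(dim=-1)))

In the actual method, the layer weighting and the way early-layer estimates steer the final distribution follow the paper’s formulation rather than the simple confidence heuristic used here.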

Real-World Impact: Solving for Subtlety

In practice, SLED proves its worth on nuanced problems. In math word problems, for example, where standard decoding might overlook details such as a discount or an intermediate step, SLED picks up on these elements by drawing on early-layer signals and arrives at more correct answers. Its effectiveness extends to both multiple-choice and open-ended questions, consistently steering output toward factual responses even when a popular misconception competes with the correct answer.

Performance and Versatility

Google’s researchers applied SLED to LLMs like Gemma, GPT-OSS, and Mistral, spanning tasks from arithmetic to open-ended fact-checking using datasets such as FACTOR and TruthfulQA.

SLED consistently boosted factual accuracy, achieving improvements of up to 16% over other advanced decoding strategies. Importantly, this gain in reliability came with minimal overhead: only about a 4% increase in inference time compared with the previous best method.

  • Versatility: SLED works across diverse model architectures and sizes.
  • Compatibility: It can layer atop other accuracy-boosting techniques for cumulative benefits.

Looking Ahead: Broader Applications

SLED’s design allows for seamless adoption across open-source LLMs, eliminating the need for complex external integrations or retraining. Google’s team envisions extending SLED to domains like visual question answering, code completion, and long-form writing. Future work may also combine SLED with supervised fine-tuning for even greater task-specific accuracy.

Conclusion

By leveraging the full spectrum of internal model reasoning, SLED delivers a practical, flexible, and highly effective solution to LLM hallucinations. Its robust gains in factuality and ease of integration make it a strong candidate for improving next-generation AI systems.

Source: Google Research Blog

Joshua Berkowitz September 21, 2025