
SmolLM3: Small Language Models with Multilingual Reasoning and Transparency

SmolLM3, engineered by Hugging Face, delivers advanced reasoning, multilingual fluency, and long-context processing from a remarkably compact 3B-parameter model.

Why SmolLM3 Stands Out

Demand for efficient models has surged, but small models have rarely combined strong capability with genuine openness. SmolLM3 closes that gap with a blend of performance, transparency, and versatility.

It not only surpasses 3B contemporaries such as Llama-3.2-3B and Qwen2.5-3B but often rivals larger 4B models, delivering robust reasoning and multilingual support in a fully open-source package.

Key Features and Innovations

  • Compact Powerhouse: 3 billion parameters, trained on 11 trillion tokens, delivering competitive results across industry benchmarks.

  • Dual-Mode Reasoning: Effortlessly switch between detailed reasoning (/think) and direct answers (/no_think).

  • Multilingual Expertise: Natively supports English, French, Spanish, German, Italian, and Portuguese.

  • Long-Context Mastery: Processes up to 128k tokens, thanks to advanced NoPE and YaRN techniques.

  • Transparent Blueprint: Every detail, from architecture and data to training recipes, is openly shared.

Engineering Blueprint: Architecture and Training

SmolLM3’s architecture builds on Llama, with several efficiency upgrades:

  • Grouped Query Attention (GQA): Lowers inference memory without compromising quality.

  • No Positional Embedding (NoPE): Selective rotary position embedding removal enhances long-context performance.

  • Intra-Document Masking: When multiple documents are packed into one training sequence, attention is masked so tokens cannot attend across document boundaries, which improves training stability (see the sketch after this list).

  • Stable Training: Adopts training-stability practices from OLMo 2, such as removing weight decay on embeddings.
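
To make intra-document masking concrete, here is a minimal sketch of how such a mask can be built: causal attention is allowed only between tokens that share a document id within the packed sequence. This is illustrative PyTorch, not SmolLM3's actual training code.

```python
import torch

def intra_document_causal_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """Return a (seq_len, seq_len) boolean mask allowing causal attention
    only between tokens that belong to the same packed document."""
    seq_len = doc_ids.shape[0]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Three documents of lengths 3, 2, and 3 packed into one training sequence.
doc_ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])
mask = intra_document_causal_mask(doc_ids)
# mask[3, 2] is False: the first token of document 1 cannot attend back into document 0.
```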

Training ran for 24 days on 384 H100 GPUs, using the open-source tools nanotron (training), datatrove (data processing), and lighteval (evaluation). A three-stage data mixture that progressively increased the share of high-quality code and math data built robust general and domain-specific abilities.

Mid-Training: Enhancing Context and Reasoning

SmolLM3’s mid-training phases further elevate its capabilities:

  • Long Context Adaptation: Gradually extends the context window from 4k to 64k tokens during training, then extrapolates to 128k at inference time using YaRN (see the configuration sketch after this list).

  • Reasoning Adaptation: Introduces 35B tokens from open reasoning datasets, structured via ChatML for optimal diversity and organization.
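
In Hugging Face Transformers, YaRN extrapolation is typically requested through the model's rope_scaling configuration. The sketch below shows one way to ask for a roughly 128k window; the 2.0 factor over the 64k training window and the exact values are assumptions for illustration, and assume the model's config exposes the standard rope_scaling field.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "HuggingFaceTB/SmolLM3-3B"

# Request YaRN scaling: extrapolate the 64k training window by 2x to ~128k tokens.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 65536,
}
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```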

Post-Training: Dual Modes and Alignment

With an open, reproducible recipe for dual-mode models, SmolLM3 supports both explicit reasoning and concise answers. Users can toggle modes, utilize tool calling, and customize system prompts. Supervised finetuning balances datasets for each mode, while synthetic data fills gaps in reasoning traces.
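
The two modes can be illustrated with a pair of hypothetical ChatML-style training records, one carrying an explicit reasoning trace and one giving a direct answer. The field layout and the <think>...</think> trace convention here are assumptions for illustration, not the exact SmolLM3 training schema.

```python
# Hypothetical SFT records for the two chat modes (illustrative only).
reasoning_example = {
    "messages": [
        {"role": "system", "content": "/think"},
        {"role": "user", "content": "What is 17 * 24?"},
        {"role": "assistant",
         "content": "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>\nThe answer is 408."},
    ]
}

direct_example = {
    "messages": [
        {"role": "system", "content": "/no_think"},
        {"role": "user", "content": "What is 17 * 24?"},
        {"role": "assistant", "content": "The answer is 408."},
    ]
}
```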

Alignment uses Anchored Preference Optimization (APO), an off-policy method, on a blend of public and synthetic preference data. Because alignment can erode long-context skill, model merging combines the APO-aligned and mid-training checkpoints to restore it.
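
A minimal sketch of such a merge is simple linear interpolation of the two checkpoints' weights. The 0.9/0.1 split and the plain state-dict averaging below are illustrative assumptions; the team's actual pipeline may rely on dedicated merging tooling and different weights.

```python
import torch

def linear_merge(state_a: dict, state_b: dict, weight_a: float = 0.9) -> dict:
    """Linearly interpolate two checkpoints with identical parameter names."""
    return {name: weight_a * state_a[name] + (1.0 - weight_a) * state_b[name]
            for name in state_a}

# Usage sketch (hypothetical variable names):
# merged = linear_merge(apo_model.state_dict(), mid_training_model.state_dict())
# model.load_state_dict(merged)
```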

Real-World Performance

Across benchmarks like HellaSwag, ARC, MMLU, GSM8K, and HumanEval+, SmolLM3 not only outperforms its 3B peers but also matches or beats some 4B models. Its strengths span knowledge, reasoning, math, coding, and multilingual tasks. Notably, reasoning mode shines on complex benchmarks, letting users trade speed for depth.

How to Deploy SmolLM3

Ready to use with Hugging Face Transformers (v4.53.0+), SmolLM3 offers GPU and CPU deployment options. Dual-mode functionality is accessed via /think or /no_think in prompts, as shown in the sketch below. It supports agentic tool calling, with recipes for both XML and Python tool integration, and comes with comprehensive documentation.
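
A minimal inference sketch with Transformers is shown below; the generation settings are illustrative defaults rather than official recommendations, and the system-prompt flag can be swapped between /no_think and /think to toggle modes.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "/no_think"},  # use "/think" for extended reasoning
    {"role": "user", "content": "Summarize the benefits of grouped query attention."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```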

The Takeaway

SmolLM3 sets a new bar for small models—merging transparency, efficiency, and advanced reasoning in a multilingual, accessible framework. Hugging Face’s open approach empowers the community to adapt, extend, or innovate on SmolLM3’s foundation. This release marks a major leap toward democratizing high-performance language modeling.

Source: Hugging Face Blog - SmolLM3: smol, multilingual, long-context reasoner


Joshua Berkowitz September 13, 2025