SmolLM3, engineered by Hugging Face, lets you harness advanced language reasoning, multilingual fluency, and massive context processing, all from within a remarkably compact 3B-parameter model.
Why SmolLM3 Stands Out
The surge in demand for efficient models hasn’t always matched the need for capability or openness. SmolLM3 closes the divide with a unique blend of performance, transparency, and versatility.
It not only surpasses its 3B contemporaries like Llama-3.2-3B and Qwen2.5-3B but often rivals larger 4B models, giving users robust reasoning and multilingual support, all in a fully open source package.
Key Features and Innovations
- Compact Powerhouse: 3 billion parameters trained on 11 trillion tokens, showing competitive results across industry benchmarks.
- Dual-Mode Reasoning: Effortlessly switch between detailed reasoning (/think) and direct answers (/no_think).
- Multilingual Expertise: Natively supports English, French, Spanish, German, Italian, and Portuguese.
- Long-Context Mastery: Processes up to 128k tokens, thanks to advanced NoPE and YaRN techniques.
- Transparent Blueprint: Every detail, from architecture and data to training recipes, is openly shared.
Engineering Blueprint: Architecture and Training
SmolLM3’s architecture builds on Llama, with several efficiency upgrades:
- Grouped Query Attention (GQA): Lowers inference memory without compromising quality (see the sketch after this list).
- No Positional Embedding (NoPE): Selective rotary position embedding removal enhances long-context performance.
- Intra-Document Masking: Prevents attention across document boundaries when separate documents are packed into the same training sequence, boosting stability (sketched below).
- Stable Training: Incorporates best practices from OLMo 2 for reliable embedding norms.
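To make the GQA idea concrete, here is a minimal PyTorch sketch; the head counts and dimensions are illustrative only, not SmolLM3's actual configuration. Each key/value head is shared by a group of query heads, so the KV cache only has to store `n_kv_heads` heads rather than one per query head.

```python
import torch

# Illustrative sizes only; SmolLM3's real head counts come from its released config.
batch, seq_len, head_dim = 2, 16, 64
n_q_heads, n_kv_heads = 16, 4          # several query heads share each K/V head
group_size = n_q_heads // n_kv_heads   # 4 query heads per K/V head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand the smaller K/V tensors so every group of query heads reuses its shared K/V head.
k = k.repeat_interleave(group_size, dim=1)   # -> (batch, n_q_heads, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1) @ v
print(attn.shape)  # torch.Size([2, 16, 16, 64])
```

The memory saving comes from caching only the 4 key/value heads during generation while still running 16 query heads.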
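Intra-document masking can also be illustrated with a small sketch: given token-level document ids for a packed training sequence (hypothetical values below), the mask lets a token attend only to earlier tokens from the same document.

```python
import torch

# Hypothetical packed sequence: three documents concatenated into one training sequence,
# with a document id per token.
doc_ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2])
seq_len = doc_ids.shape[0]

# Causal mask restricted to a single document: position i may attend to position j
# only if j <= i and both tokens belong to the same document.
same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)              # (seq, seq)
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # (seq, seq)
attention_mask = same_doc & causal
print(attention_mask.int())
```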
Training ran for 24 days on 384 H100 GPUs, using open-source tools like nanotron, datatrove, and lighteval. A three-stage data blend, increasing the share of high-quality code and math, built robust general and domain-specific abilities.
Mid-Training: Enhancing Context and Reasoning
SmolLM3’s mid-training phases further elevate its capabilities:
- Long Context Adaptation: Gradually extends context from 4k to 64k tokens, then extrapolates to 128k during inference using YaRN (a loading sketch follows this list).
- Reasoning Adaptation: Introduces 35B tokens from open reasoning datasets, structured via ChatML for optimal diversity and organization.
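Below is a rough sketch of how YaRN extrapolation could be enabled at load time through Transformers' rope_scaling configuration. The field values (a factor of 2.0 over a 64k training context) are assumptions for illustration; the released model card documents the recommended settings.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "HuggingFaceTB/SmolLM3-3B"

# Assumed YaRN settings for illustration: extrapolate 2x beyond the 64k training context.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 65536,
}

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```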
Post-Training: Dual Modes and Alignment
With an open, reproducible recipe for dual-mode models, SmolLM3 supports both explicit reasoning and concise answers. Users can toggle modes, utilize tool calling, and customize system prompts. Supervised finetuning balances datasets for each mode, while synthetic data fills gaps in reasoning traces.
Alignment leverages Anchored Preference Optimization (APO), an off-policy method, blending public and synthetic preference data. To recover long-context skill lost during alignment, model merging combines the APO-aligned and mid-trained checkpoints.
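A minimal sketch of linear checkpoint merging is shown below; the mixing weight and file paths are placeholders, not the published recipe.

```python
import torch

def merge_checkpoints(state_dict_a, state_dict_b, alpha=0.9):
    """Linearly interpolate two checkpoints with identical parameter names and shapes.

    alpha is a placeholder mixing weight, not the published value.
    """
    return {
        name: alpha * state_dict_a[name] + (1.0 - alpha) * state_dict_b[name]
        for name in state_dict_a
    }

# Usage sketch (paths are hypothetical):
# merged = merge_checkpoints(torch.load("apo_aligned.pt"), torch.load("mid_trained.pt"))
# torch.save(merged, "smollm3_merged.pt")
```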
Real-World Performance
Across benchmarks like HellaSwag, ARC, MMLU, GSM8K, and HumanEval+, SmolLM3 not only outperforms all 3B peers but also matches or beats some 4B models. Its excellence spans knowledge, reasoning, math, coding, and multilingual tasks. Notably, reasoning mode shines in complex benchmarks, allowing users to balance speed with depth.
How to Deploy SmolLM3
Ready to use with Hugging Face Transformers (v4.53.0+), SmolLM3 offers GPU and CPU deployment options. Dual-mode functionality is easily accessed via /think or /no_think in prompts. It supports agentic tool calling, with recipes for both XML and Python integration, and comes with comprehensive documentation.
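A minimal usage sketch with Transformers is shown below, assuming the HuggingFaceTB/SmolLM3-3B checkpoint on the Hub; placing /no_think in the system prompt requests a direct answer, while /think switches to the extended reasoning trace.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # released checkpoint on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# "/no_think" in the system prompt asks for a direct answer; use "/think" for the
# extended reasoning trace instead.
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Summarize why grouped query attention saves memory."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```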
The Takeaway
SmolLM3 sets a new bar for small models—merging transparency, efficiency, and advanced reasoning in a multilingual, accessible framework. Hugging Face’s open approach empowers the community to adapt, extend, or innovate on SmolLM3’s foundation. This release marks a major leap toward democratizing high-performance language modeling.
Source: Hugging Face Blog - SmolLM3: smol, multilingual, long-context reasoner