
IBM’s Bamba 9B v2: Redefining Fast, Open Source AI


IBM AI Platform and Bamba 9B v2: A Leap Forward in Fast, Open Source AI

Breaking New Ground in Open Source AI

If you’re looking for a powerful, efficient, open-source language model, IBM and its collaborators have just raised the bar. The new Bamba 9B v2 model, jointly developed by IBM, Princeton, CMU, and UIUC, brings impressive speed and performance to the AI landscape, outpacing industry benchmarks while remaining transparent and reproducible.

Key Highlights: What Makes Bamba 9B v2 Stand Out

  • Performance: Bamba 9B v2 surpasses Meta’s Llama 3.1 8B on both the Hugging Face OpenLLM leaderboard v1 and v2 average scores, despite using only a fifth of the training data.
  • Speed: Thanks to its Mamba2-based architecture, inference is 2–2.5x faster than comparable transformer models, with even more improvements anticipated through ongoing vLLM integration.
  • Open Data and Reproducibility: The team is committed to fully open datasets and reproducible results, fostering further research and development.

Benchmark Results: How Does Bamba 9B v2 Compare?

  • On the HF OpenLLM v1 benchmarks, Bamba 9B v2 delivers standout results in tasks like Hellaswag (83.85), OpenbookQA (51.0), and Piqa (83.62), outperforming Llama 3.1 8B and closely rivalling larger models from other leading labs.
  • In OpenLLM v2 benchmarks, while Bamba 9B v2 trails in some advanced categories, it shows competitive performance and substantial promise, especially given its efficient training process.

Innovative Training Recipe: Efficient and Effective

The Bamba 9B v2 model was designed with efficiency in mind, leveraging only 192 A100 GPUs and a thoughtful blend of training data and techniques:
  • Started with the 2T-token base checkpoint (Bamba 9B v1).
  • Integrated Olmo Mix data for an additional 0.5T tokens, using a constant learning rate.
  • Further training on 0.5T tokens from synthetic and curated datasets, employing both constant and cosine learning rates.
  • Final model “annealing” leveraged high-quality data for 100B tokens, followed by weighted model merging using MergeKit.
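The staged schedule described above, constant learning-rate phases followed by a cosine decay, can be sketched as a small function. The base rate, minimum rate, and phase boundary below are illustrative placeholders, not the actual training hyperparameters:

```python
import math

def staged_lr(step, total_steps, base_lr=3e-4, min_lr=3e-5, constant_frac=0.8):
    """Illustrative LR schedule: hold a constant rate for the first phase,
    then decay to min_lr with a half-cosine (all values hypothetical)."""
    constant_steps = int(total_steps * constant_frac)
    if step < constant_steps:
        return base_lr
    # Cosine decay over the remaining steps.
    progress = (step - constant_steps) / max(1, total_steps - constant_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Holding the rate constant lets new data mixes be appended without committing to a decay horizon; the cosine phase then anneals the model toward convergence.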
Notable Data Sources:
  • Dolma, OlmoMix, DolminoMix (AllenAI)
  • FineWeb and SmolLM corpus (Hugging Face)
  • Nemotron-CC (NVIDIA)
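The weighted model merging step above amounts to a weighted average of checkpoint parameters, as in MergeKit’s linear merge. A minimal dependency-free sketch, with plain floats standing in for tensors and made-up weights:

```python
def linear_merge(state_dicts, weights):
    """Weighted average of model state dicts (MergeKit-style linear merge).
    Assumes all dicts share the same keys and shapes; floats stand in for
    tensors so the sketch has no framework dependency."""
    total = sum(weights)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts)) / total
    return merged

# Hypothetical example: blend two annealed checkpoints 70/30.
merged = linear_merge([{"w": 1.0}, {"w": 3.0}], [0.7, 0.3])
```

In practice MergeKit operates on safetensors checkpoints driven by a YAML config, but the arithmetic is the same per-parameter average.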

vLLM Integration: Supercharging Inference

IBM is actively collaborating with the vLLM community to enhance support for the Mamba2 architecture. Key improvements in the pipeline:
  • Advanced KV-cache management tailored for hybrid models like Mamba2.
  • Development of chunked prefill kernels for significant workload boosts.
  • Optimized decode kernels for AMD GPUs and reduced token generation latency.
With these improvements, the team expects 4–5x faster inference than transformer-based models in certain scenarios.
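Chunked prefill splits a long prompt into fixed-size slices rather than processing it in one pass, so prefill work can be interleaved with ongoing decode steps. A toy, framework-free sketch of the scheduling idea; the chunk size is arbitrary and the "state" update is a stand-in for the recurrent state a Mamba2-style layer would carry between chunks:

```python
def chunked_prefill(prompt_tokens, chunk_size=4):
    """Consume a prompt in fixed-size chunks, threading state through each
    chunk. 'state' is just a running token count here; a real kernel would
    update a state-space model's hidden state on GPU."""
    state = 0
    chunks = []
    for start in range(0, len(prompt_tokens), chunk_size):
        chunk = prompt_tokens[start:start + chunk_size]
        state += len(chunk)  # stand-in for the per-chunk state update
        chunks.append(chunk)
    return chunks, state
```

Because each chunk is bounded, the scheduler can slot decode iterations between chunks instead of stalling them behind one long prefill.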

Instruction Tuning and Future Directions

The team experimented with Tulu v3 data and Open Instruct recipes, noting significant performance gains: an OpenLLM leaderboard v1 average of 64.69 and a v2 average of 24.68. They are now working to release an instruction-following model built on unrestricted datasets.

Community Contributions and Open Collaboration

Bamba 9B v2 is the result of extensive collaboration and open-source ethos. The IBM team thanks contributors from AllenAI, Hugging Face, NVIDIA, and academic partners, as well as the broader community for enabling model training, dataset curation, and evaluation.
Call to action: the team invites the community to:
  • Test model scaling and generalization

Joshua Berkowitz May 9, 2025