
Sakana's AB-MCTS Unlocks AI Collective Intelligence at Inference Time

Can AI Models Collaborate Like Human Experts?


Sakana AI introduces AB-MCTS (Adaptive Branching Monte Carlo Tree Search), a cutting-edge algorithm that enables multiple frontier AI models to collaborate during inference. Rather than relying solely on bigger or better-trained models, this approach unlocks powerful synergies as models join forces, much like expert teams tackling a tough challenge.

Evolutionary Merging Inspires Collaborative Inference

Previous research focused on evolutionary model merging, where existing models are fused to create more capable ones. Sakana AI went a step further and asked: rather than merging model weights, can we also "mix to use" models, combining them only at inference time?

AB-MCTS is their answer, drawing inspiration from nature’s collective intelligence. By orchestrating cooperation among large language models (LLMs) such as o4-mini, Gemini-2.5-Pro, and DeepSeek-R1-0528, AB-MCTS achieved significant performance gains on the challenging ARC-AGI-2 benchmark, surpassing the capabilities of individual models.

Understanding Inference-Time Scaling

Inference-time scaling enhances AI performance not by retraining or growing models, but by allocating more resources, time, or strategic diversity when answering complex questions. 

Similar to humans brainstorming, trying varied approaches, or collaborating, AI can also extend its reasoning, iterate solutions, and experiment to tackle difficult problems. AB-MCTS stands out as an advanced technique that balances deep, sequential answer refinement with broad, parallel exploration, all while enabling multiple models to work together.
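A minimal sketch of the two baseline scaling strategies described above, using a toy `generate`/`score` pair in place of a real LLM call and answer verifier (both functions, and the scoring rule, are invented for illustration):

```python
import random

def solve_width(generate, score, budget):
    """Breadth: draw `budget` independent candidates, keep the best."""
    candidates = [generate(None) for _ in range(budget)]
    return max(candidates, key=score)

def solve_depth(generate, score, budget):
    """Depth: refine one candidate `budget` times, keeping improvements."""
    best = generate(None)
    for _ in range(budget - 1):
        refined = generate(best)  # condition the next attempt on the current best
        if score(refined) > score(best):
            best = refined
    return best

# Toy stand-ins for an LLM call and an answer scorer (assumptions).
random.seed(0)
def generate(prev):
    base = 0.0 if prev is None else score(prev)
    return base + random.uniform(-0.2, 0.5)  # refinement nudges the score
def score(answer):
    return answer

print(solve_width(generate, score, 8))
print(solve_depth(generate, score, 8))
```

Pure breadth explores widely but never builds on its best answer; pure depth does the opposite. Balancing the two is exactly the tension AB-MCTS is designed to manage.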

How AB-MCTS Balances Search Depth and Breadth

Traditional methods forced a choice: refine one answer repeatedly (depth) or generate many independent answers in parallel (breadth, or "width"). AB-MCTS unifies both strategies.

It adaptively determines when to intensify focus on a promising solution and when to branch out to explore new options. This is achieved by extending the Monte Carlo Tree Search algorithm, famously used in Google DeepMind's AlphaGo, with adaptive branching and probabilistic selection (Thompson sampling). The result is a more human-like, trial-and-error process that extracts superior answers within a fixed inference budget.
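As a rough sketch of the adaptive-branching idea (not Sakana's implementation), the choice between deepening and widening can be framed as Thompson sampling over two moves, each with a Beta posterior on its chance of improving the current best answer. The 70%/20% improvement rates below are invented for the simulation:

```python
import random

class AdaptiveBrancher:
    """Thompson sampling over two search moves: 'deepen' vs 'widen'.

    Each move keeps a Beta(successes + 1, failures + 1) posterior on the
    probability that playing it improves the current best score.
    """
    def __init__(self):
        self.stats = {"deepen": [1, 1], "widen": [1, 1]}  # [alpha, beta]

    def choose(self):
        # Sample each posterior; play the move with the larger draw.
        draws = {m: random.betavariate(a, b) for m, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, move, improved):
        if improved:
            self.stats[move][0] += 1
        else:
            self.stats[move][1] += 1

random.seed(42)
brancher = AdaptiveBrancher()
# Pretend 'deepen' improves the best answer 70% of the time, 'widen' 20%
# (invented rates; a real system would observe actual score improvements).
improve_prob = {"deepen": 0.7, "widen": 0.2}
for _ in range(200):
    move = brancher.choose()
    brancher.update(move, random.random() < improve_prob[move])

deepen_trials = sum(brancher.stats["deepen"]) - 2
print(f"'deepen' chosen {deepen_trials} of 200 steps")
```

Because the posteriors sharpen as evidence accumulates, the search naturally shifts its budget toward whichever move has been paying off, without a hand-tuned depth/breadth schedule.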

Results of AB-MCTS and Multi-LLM AB-MCTS on ARC-AGI-2, showing the Pass@250 success rate. The result of using AB-MCTS with o4-mini surpassed Repeated Sampling with o4-mini (light gray bar). Furthermore, Multi-LLM AB-MCTS, which combines Gemini-2.5-Pro and DeepSeek-R1-0528, showed an improved score at Pass@250. Image Credit: Sakana

Multi-LLM AB-MCTS: Dynamic Model Selection in Action

Each AI model has unique strengths: some excel at code, others at creative logic. Multi-LLM AB-MCTS introduces a third dimension: dynamically choosing which model to use for each inference step, guided by real-time performance feedback.

Separate probability models for each LLM, updated via Thompson Sampling, help the system allocate resources intelligently, discovering and leveraging the best model for each subtask as the search unfolds.
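The per-model selection can be sketched as a standard Thompson-sampling bandit: keep one Beta posterior per LLM, sample all of them at each step, and route the call to the highest draw. The model names below match the article, but the success rates and scoring are random stand-ins, not measured values:

```python
import random

def pick_llm(posteriors):
    """Sample each model's Beta posterior; route the call to the best draw."""
    draws = {name: random.betavariate(a, b) for name, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

random.seed(7)
# One Beta(alpha, beta) posterior per model, starting from a uniform prior.
posteriors = {"o4-mini": [1, 1], "gemini-2.5-pro": [1, 1], "deepseek-r1": [1, 1]}
# Invented per-task success rates, purely for this simulation.
true_rate = {"o4-mini": 0.4, "gemini-2.5-pro": 0.8, "deepseek-r1": 0.2}

for _ in range(300):
    model = pick_llm(posteriors)
    success = random.random() < true_rate[model]  # stand-in for answer scoring
    posteriors[model][0 if success else 1] += 1

calls = {m: a + b - 2 for m, (a, b) in posteriors.items()}
print(calls)
```

In this toy run the bandit concentrates most of its calls on whichever model performs best for the task at hand, which is the behavior the article describes: discovering and exploiting the strongest model per subtask as the search unfolds.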

Results of AB-MCTS and Multi-LLM AB-MCTS on ARC-AGI-2, showing Pass@k as a function of the number of LLM calls. Image Credit: Sakana

Performance on ARC-AGI-2: The Power of Collaboration

On the human-level reasoning benchmark ARC-AGI-2, the benefits of collaborative inference were clear. Where repeated sampling with a single LLM (o4-mini) solved 23% of tasks, AB-MCTS raised that to 27.5%.

When o4-mini, Gemini-2.5-Pro, and DeepSeek-R1-0528 collaborated via Multi-LLM AB-MCTS, success rates surpassed 30%. Notably, problems unsolved by any one model became tractable through dynamic teamwork, with the system shifting its reliance between models based on in-the-moment results.

TreeQuest: Tools for the Next Generation of AI Inference

To make these advancements accessible, Sakana AI launched TreeQuest: a flexible framework for implementing AB-MCTS and its multi-model extension. TreeQuest offers a robust API, checkpointing, and customizable scoring and model selection, empowering researchers and developers to harness inference-time scaling in their own tasks. 

While the results are promising, future work aims to narrow the gap between experimental metrics and real-world standards, refining reward models and candidate selection for even smarter AI collaboration.

Takeaway: Towards Dynamic, Smarter AI Collaboration

AB-MCTS and Multi-LLM AB-MCTS mark a major leap toward AI systems that collaborate dynamically, reason through trial and error, and scale inference for complex tasks. This work signals a shift beyond bigger models, tapping into collective intelligence at inference time. Sakana AI’s vision: AI systems as dynamic teams, ready to tackle challenges that once seemed out of reach.

Source: Sakana AI - Inference-Time Scaling and Collective Intelligence for Frontier AI

Joshua Berkowitz August 4, 2025