Automating AI Alignment: How Anthropic’s Bloom Reimagines Behavioral Evaluation

Evaluating the behavior of advanced AI models is a growing challenge as systems become more capable and complex. Manual assessment methods can’t keep up with rapid model evolution, risking outdated benchmarks and mounting workloads for alignment researchers. Anthropic’s new open-source tool, Bloom, addresses these obstacles by automating the design, execution, and scoring of behavioral evaluations, making the process faster, more scalable, and more reliable.
The Urgency of Fast, Adaptable Evaluations
Behavioral evaluations play a critical role in determining whether AI models behave as intended and align with human values. Traditional methods often fall short: they’re slow, labor-intensive, and quick to become obsolete, especially as models learn from or leak evaluation data.
To solve this, Anthropic developed Bloom as a rapid, flexible alternative that builds on lessons from their earlier tool, Petri. Bloom empowers researchers to target specific behaviors and generate robust, repeatable metrics across various models with minimal manual effort.
Inside Bloom: A Four-Stage Automated Pipeline
What sets Bloom apart is its comprehensive four-stage automation framework:
- Understanding: The agent analyzes the behavior description and example interactions to clarify what’s being measured and why.
- Ideation: Bloom creates diverse, targeted scenarios, including roles, prompts, and environments, tailored to bring out the behavior of interest.
- Rollout: Multiple scenarios run in parallel, simulating user and tool interactions to elicit responses from the AI model.
- Judgment: An AI judge evaluates each outcome, assigning both detailed scores and summary metrics for the behavior in question.
This process lets Bloom generate a wide range of test cases, overcoming the static limitations of traditional test sets. Users can fine-tune every stage, choosing models, varying scenarios, and adjusting evaluation rules, while ensuring reproducibility through standardized configuration files.
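To make the flow concrete, here is a minimal Python sketch of the four stages, assuming a generic ask(model, prompt) chat helper; all function and model names are illustrative, not Bloom’s actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def ask(model: str, prompt: str) -> str:
    """Stub for a chat-completion call; swap in a real API client."""
    raise NotImplementedError

def understand(behavior: str, examples: list[str]) -> str:
    # Stage 1: restate what is being measured and why.
    return ask("agent-model", f"Clarify this target behavior: {behavior}\nExamples: {examples}")

def ideate(spec: str, n: int = 20) -> list[str]:
    # Stage 2: generate n diverse scenarios (roles, prompts, environments).
    return [ask("agent-model", f"Write scenario {i} that could elicit: {spec}") for i in range(n)]

def rollout(scenario: str) -> str:
    # Stage 3: simulate the user/tool turns against the target model.
    return ask("target-model", scenario)

def judge(transcript: str) -> float:
    # Stage 4: score the transcript for the target behavior (0-10 scale assumed).
    return float(ask("judge-model", f"Score 0-10 for the target behavior:\n{transcript}"))

def evaluate(behavior: str, examples: list[str]) -> list[float]:
    spec = understand(behavior, examples)
    scenarios = ideate(spec)
    with ThreadPoolExecutor(max_workers=8) as pool:  # rollouts run in parallel
        transcripts = list(pool.map(rollout, scenarios))
    return [judge(t) for t in transcripts]
```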
Proven Performance: Benchmarking and Validation
Anthropic put Bloom to the test with four critical behavioral benchmarks: delusional sycophancy, instructed sabotage, self-preservation, and self-preferential bias. Evaluations spanned 16 frontier models and were completed in days, demonstrating Bloom’s speed and adaptability. Its elicitation rate metric (measuring how often target behaviors are triggered) clearly differentiated between well-aligned and intentionally misaligned models, underscoring Bloom’s diagnostic power.
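Conceptually, the elicitation rate is just the share of judged rollouts whose score clears a threshold. A minimal sketch, where the 0-10 scale and the threshold are assumptions rather than Bloom’s published definition:

```python
def elicitation_rate(judge_scores: list[float], threshold: float = 7.0) -> float:
    """Fraction of rollouts in which the judge scored the target behavior
    at or above the threshold (threshold is illustrative, not Bloom's)."""
    if not judge_scores:
        return 0.0
    hits = sum(score >= threshold for score in judge_scores)
    return hits / len(judge_scores)

# A well-aligned model might score low on most rollouts...
print(elicitation_rate([1, 2, 0, 8, 1]))  # 0.2
# ...while an intentionally misaligned "model organism" trends toward 1.0.
print(elicitation_rate([9, 8, 7, 10, 8]))  # 1.0
```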
Validation studies reinforced Bloom’s reliability:
- Discriminative Accuracy: Bloom successfully distinguished production models from “model organisms” with engineered behaviors in the vast majority of cases.
- Human Agreement: Compared against hand-labeled human judgments, Bloom’s AI-driven scores (using Claude Opus 4.1 as judge) correlated strongly, especially at the top and bottom of the behavioral spectrum, where alignment calls matter most (see the sketch after this list).
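One standard way to quantify such judge-human agreement is rank correlation between the two sets of scores over the same transcripts. A sketch with made-up data, using SciPy’s spearmanr:

```python
from scipy.stats import spearmanr

# Hypothetical paired scores for the same ten transcripts (made-up data).
human_labels = [0, 1, 1, 3, 4, 6, 7, 8, 9, 10]
judge_scores = [1, 0, 2, 2, 5, 5, 8, 7, 9, 10]

rho, p_value = spearmanr(human_labels, judge_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# Strong agreement at the extremes of the scale is what matters most
# when deciding whether a model is well-aligned or clearly misaligned.
```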
Case Study: Tackling Self-Preferential Bias
Bloom’s flexibility shines in nuanced research. Anthropic replicated an evaluation of self-preferential bias, where AI models favor themselves in decision-making. Bloom not only reproduced the model rankings from the earlier evaluation but also revealed that deeper reasoning in newer Claude models reduced the bias. Researchers could also filter out unrealistic outputs, further improving result quality and consistency.
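That filtering step can be pictured as a second pass over scored rollouts before metrics are computed. A hedged sketch in which the realism field and cutoff are assumptions, not Bloom’s actual schema:

```python
def filter_realistic(rollouts: list[dict], min_realism: float = 0.5) -> list[dict]:
    """Drop rollouts a judge flagged as unrealistic before computing metrics.
    Assumes each rollout dict carries 'realism' and 'score' keys (hypothetical)."""
    return [r for r in rollouts if r["realism"] >= min_realism]

rollouts = [
    {"score": 8.0, "realism": 0.9},  # plausible scenario: kept
    {"score": 9.0, "realism": 0.2},  # implausible scenario: excluded
]
kept = filter_realistic(rollouts)
print(len(kept), "of", len(rollouts), "rollouts kept")  # 1 of 2
```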
A Versatile Tool for Alignment Research
Bloom’s configuration options make it suitable for a range of research tasks: probing for vulnerabilities, testing for hardcoded behaviors, gauging evaluation awareness, and more. As AI systems grow more powerful, scalable and open tools like Bloom become essential for ongoing model assessment and safe deployment.
Key Takeaway: Accelerating Responsible AI
By automating and standardizing behavioral evaluations, Bloom fills a crucial gap in the alignment research toolkit. Its open-source release and modular design allow researchers to keep pace with advancing AI, ensuring robust, transparent, and repeatable safety checks. For benchmarks, technical details, and source code, explore Anthropic’s full technical report and the official GitHub repository.
