News | Joshua Berkowitz

9 Articles

AI evaluation

How Bloom Is Transforming Automated Behavioral Evaluations for Frontier AI Models

Evaluating cutting-edge AI models poses a significant challenge for developers and safety researchers. Manual behavioral assessments are time-consuming and struggle to keep up with rapid model advance...

agentic frameworks AI evaluation AI safety Anthropic automation behavioral testing model alignment open-source

Dec 30, 2025

OfficeQA: The Next Frontier in AI Enterprise Reasoning Evaluation

The evolution of AI agents has brought us closer to automating complex business tasks, yet measuring their true capabilities remains a challenge. Databricks' OfficeQA is a newly released, open-source b...

AI benchmarking AI evaluation Databricks data retrieval document intelligence enterprise AI grounded reasoning OfficeQA

Dec 11, 2025

Decoding AI: How Algorithmic Complexity Sheds Light on Machine Intelligence

The secrets to understanding today’s most advanced AI may just lie hidden in classic computer science theory. As artificial intelligence models become increasingly sophisticated, scientists are turnin...

AI AI evaluation algorithmic complexity circuit complexity computational theory machine learning reasoning models trustworthy AI

Dec 6, 2025

How Align Evals Is Updating LLM Evaluator Alignment

Ensuring large language model (LLM) applications truly meet user needs is challenging. Automated evaluation tools often miss the mark, producing scores that don't always align with real human judgment...

AI evaluation alignment developer tools evaluation LangChain LLM product update prompt engineering

Dec 6, 2025

From Pilot to Production: Building Custom AI Judges with Databricks

Transitioning generative AI (GenAI) projects from pilot to production is a common stumbling block. Many organizations struggle to measure and meet quality requirements, which are critical for ensuring...

AI evaluation AI governance Databricks GenAI Judge Builder LLM judges subject matter experts

Nov 10, 2025

Databricks Slashes Costs for Domain-Specific AI Agent Evaluation

As generative AI agents become more sophisticated, maintaining high-quality evaluation is critical, but costs can spiral quickly with traditional approaches. Databricks is changing the game by introduc...

AI agents AI evaluation Databricks enterprise AI MLflow open source token pricing

Oct 22, 2025

Can AI Models Scheme and How Can We Stop Them?

Recent advancements in artificial intelligence have introduced a subtle but urgent risk: models that may appear to follow human values while secretly pursuing their own objectives. This deceptive beha...

AI alignment AI evaluation AI transparency deception machine learning ethics model safety scheming situational awareness

Sep 19, 2025

SciArena: Transforming How We Evaluate AI Models in Scientific Research

Researchers face a growing challenge: staying current with the ever-expanding body of scientific literature. Foundation models offer promise in helping synthesize and analyze this vast information, bu...

AI evaluation benchmarking crowdsourcing data quality foundation models leaderboard research tools scientific literature

Aug 5, 2025

JSON Schema Support Is Transforming GitHub Models for AI Developers

Building with AI often means wrestling with unpredictable outputs. Now, GitHub Models introduces JSON schema support, giving developers a way to define and enforce output formats right in the prompt ...

AI evaluation AI tooling code automation developer tools GitHub Models JSON schema prompt engineering

Jun 4, 2025
