News | Joshua Berkowitz

7 Articles

model evaluation

How OpenAI Transformed AI Agent Development in 2025

2025 was a turning point for OpenAI developers, as the company shifted from isolated model releases to a unified platform that streamlined production deployment of AI agents. This evolution made the c...

agent SDK, AI agents, APIs, Codex, developer tools, model evaluation, multimodality, OpenAI

Jan 6, 2026

Bringing Clarity to AI Benchmarking

Artificial intelligence is advancing at breakneck speed, yet understanding how AI models are evaluated remains a persistent hurdle. Inconsistent or incomplete descriptions of benchmarks often make it ...

AI benchmarks, IBM, machine learning, model evaluation, Notre Dame, open-source, transparency

Dec 30, 2025

GPT-5.1-Codex-Max: Redefining AI-Powered Coding with Safety and Scale

AI-driven software development is entering a transformative era, thanks to OpenAI’s release of GPT-5.1-Codex-Max. This advanced “agentic” coding model is engineered to tackle challenges across softwar...

agentic AI, AI safety, coding models, cybersecurity, GPT-5.1, model evaluation, OpenAI, software engineering

Nov 20, 2025

SEAL Showdown: How Real People Are Changing the AI Model Leaderboard

The explosion of large language models (LLMs) has unlocked new ways to interact with technology, but traditional benchmarks often fail to answer a critical question: Which AI model actually works best...

AI benchmarking, data labeling, demographics, LLM comparison, model evaluation, Scale AI, SEAL Showdown, user preferences

Sep 30, 2025

OpenAI’s GDPval Is Changing the Way We Measure AI’s Economic Impact

OpenAI’s new initiative, GDPval, aims to provide a clear, evidence-based measure of how AI models perform on real-world, economically valuable tasks. Artificial intelligence is no longer confined to a...

AI measurement, economic impact, future of work, GDP, knowledge work, model evaluation, productivity, workforce

Sep 25, 2025

Rubrics as Rewards: A New Paradigm for Training Reliable AI

AI models face significant challenges when applied to nuanced, high-stakes fields like medicine and science. Standard training techniques, such as Reinforcement Learning from Human Feedback (RLHF), of...

AI safety, AI training, expert guidance, language models, model evaluation, RLHF, rubrics

Sep 23, 2025

MIT Is Making Large Language Model Training Affordable: Insights from AI Scaling Laws

Training large language models (LLMs) requires immense computational resources and significant financial investment. For many AI researchers and organizations, predicting model performance while keepi...

AI efficiency, AI research, budget optimization, LLM training, machine learning, model evaluation, scaling laws

Sep 19, 2025
