Rubrics As Rewards: Reinforcement Learning Beyond Verifiable Domains When AI Doctors Need Better Report Cards A future where AI is designed to help improve diagnostic medicine and even find rare diseases may be very close thanks to research from Scale AI. But what does ... AI training healthcare AI interpretability machine learning reinforcement learning rubrics
How Direct Reasoning Optimization Teaches LLMs to Grade Their Own Thinking Large language models have learned to reason well in math and coding thanks to reinforcement learning with verifiable rewards, where an answer can be checked automatically. Open-ended tasks like rewri... chain-of-thought FinQA GRPO ParaRev R3 reinforcement learning RLVR
OpenEnv: Fueling the Future of Agentic AI with Open, Standardized Environments AI agents are getting smarter, but their ability to interact with the world safely and effectively hinges on more than just powerful models. They require environments purpose-built for safety, flexib... agentic systems AI agents environments Hugging Face Meta open source reinforcement learning standardization
AI Is Accelerating the Fusion Energy Revolution A future where energy is virtually limitless and pollution-free has been the promise of atomic energy systems for nearly eight decades. While advancements in fusion energy hold this promise, it remains on... AI DeepMind fusion energy machine learning plasma simulation reinforcement learning sustainable energy tokamak
CoreWeave Unleashes Serverless Reinforcement Learning for All With the introduction of Serverless RL, CoreWeave is making high-performance RL accessible to everyone from startups to large enterprises. By removing the need for infrastructure management and loweri... AI agents AI innovation cloud infrastructure CoreWeave OpenPipe reinforcement learning serverless computing Weights & Biases
Agent Lightning: Decoupled RL Training for Any AI Agent Agent Lightning is a Microsoft Research project that turns existing agents into trainable systems with minimal code changes. Instead of rewriting your agent to fit a trainer loop, you attach a lightwe... AI agents AutoGen DPO LangGraph OpenAI Agents reinforcement learning RLHF VERL vLLM
Code World Model: A 32B Agentic Coding LLM Grounded In Execution Traces This article analyzes a Meta FAIR technical report introducing the Code World Model (CWM), a 32-billion-parameter decoder-only transformer trained to model program execution and agentic software engin... agents code generation execution traces LLM reinforcement learning software engineering
DeepSeek-R1 Is Redefining AI Reasoning Through Reinforcement Learning Reasoning underpins complex tasks like solving math problems, writing code, and making logical deductions. While recent LLMs have made headlines with their reasoning skills, these advances typically d... AI DeepSeek-R1 language models machine learning reasoning reinforcement learning safety STEM
AI Is Powering Gravitational Wave Detection and Cosmic Discovery Thanks to breakthrough advances in artificial intelligence, we are starting to be able to “hear” the universe’s faintest secrets. Google DeepMind’s Deep Loop Shaping method is now helping astronomers ... AI astrophysics DeepMind gravitational waves LIGO noise reduction reinforcement learning scientific discovery
Gemini 2.5 Deep Think: AI Achieves Gold-Level Performance at the ICPC World Finals Artificial intelligence continues to break new ground, and Gemini 2.5 Deep Think’s gold-level performance at the 2025 ICPC World Finals is a testament to how far machine problem-solving has come. This... AI breakthroughs artificial intelligence collaborative AI competitive programming Gemini ICPC problem solving reinforcement learning
Rethinking AI Collaboration: How CollabLLM Trains LLMs for Real Conversations While large language models (LLMs) have recently achieved remarkable feats in solving complex tasks, they often stumble in genuine, multi-turn conversations. Their typical training on isolated prompts... AI training collaboration human-AI interaction LLMs multi-turn dialogue reinforcement learning user-centric AI
PASS Puts Probabilities on Agentic Workflows for Safer, Adaptive Chest X-ray AI Chest X-rays are fast, cheap, and ubiquitous, but reading them well demands careful multi-structure reasoning. The paper PASS introduces a multimodal agentic system that treats chest X-ray (CXR) analy... agentic systems CXR medical AI multimodal radiology reinforcement learning