Agent Lightning: Decoupled RL Training for Any AI Agent
Agent Lightning is a Microsoft Research project that turns existing agents into trainable systems with minimal code changes. Instead of rewriting your agent to fit a trainer loop, you attach a lightwe...
AI agents, AutoGen, DPO, LangGraph, OpenAI Agents, reinforcement learning, RLHF, VERL, vLLM
Code World Model: A 32B Agentic Coding LLM Grounded In Execution Traces
This article analyzes a Meta FAIR technical report introducing the Code World Model (CWM), a 32-billion-parameter decoder-only transformer trained to model program execution and agentic software engin...
agents, code generation, execution traces, LLM, reinforcement learning, software engineering
DeepSeek-R1 Is Redefining AI Reasoning Through Reinforcement Learning
Reasoning underpins complex tasks like solving math problems, writing code, and making logical deductions. While recent LLMs have made headlines with their reasoning skills, these advances typically d...
AI, DeepSeek-R1, language models, machine learning, reasoning, reinforcement learning, safety, STEM
AI Is Powering Gravitational Wave Detection and Cosmic Discovery
Thanks to breakthrough advances in artificial intelligence, we are starting to be able to “hear” the universe’s faintest secrets. Google DeepMind’s Deep Loop Shaping method is now helping astronomers ...
AI, astrophysics, DeepMind, gravitational waves, LIGO, noise reduction, reinforcement learning, scientific discovery
Gemini 2.5 Deep Think: AI Achieves Gold-Level Performance at the ICPC World Finals
Artificial intelligence continues to break new ground, and Gemini 2.5 Deep Think’s gold-level performance at the 2025 ICPC World Finals is a testament to how far machine problem-solving has come. This...
AI breakthroughs, artificial intelligence, collaborative AI, competitive programming, Gemini, ICPC, problem solving, reinforcement learning
Rethinking AI Collaboration: How CollabLLM Trains LLMs for Real Conversations
While large language models (LLMs) have achieved remarkable feats in solving complex tasks recently, they often stumble in genuine, multi-turn conversations. Their typical training on isolated prompts...
AI training, collaboration, human-AI interaction, LLMs, multi-turn dialogue, reinforcement learning, user-centric AI
PASS Puts Probabilities on Agentic Workflows for Safer, Adaptive Chest X-ray AI
Chest X-rays are fast, cheap, and ubiquitous, but reading them well demands careful multi-structure reasoning. The paper PASS introduces a multimodal agentic system that treats chest X-ray (CXR) analy...
agentic systems, CXR, medical AI, multimodal, radiology, reinforcement learning
Jules: Google’s AI Code Reviewer Setting a New Standard for Quality
Google is bringing you an AI collaborator that not only crafts code but also rigorously critiques its own output before you even see it. Google Developers have unveiled Jules, featuring a groundbreak...
AI coding, automated testing, code review, Google Developers, Jules, machine learning, reinforcement learning, software quality
SmallThinker: Bringing Powerful Language Models to Local Devices
Researchers from Shanghai Jiao Tong University’s Institute of Parallel and Distributed Systems, the School of Artificial Intelligence, and Zenergize AI introduced SmallThinker: a family of large lang...
AI Models, AI training, reinforcement learning
Gemini 2.5 Deep Think: The Next Leap in AI Problem Solving
Artificial intelligence is evolving from simply providing answers to actively reasoning through complex problems. Google's latest Gemini 2.5 Deep Think update exemplifies this shift, offering Google A...
AI, AI safety, coding, Deep Think, Gemini, problem solving, reinforcement learning, research tools
Z.AI GLM-4.5: Redefining Unified AI Reasoning and Coding
Innovation in artificial intelligence continues at an unprecedented pace, and GLM-4.5 is at the forefront of this evolution. Designed to unify reasoning, coding, and agentic functionalities, GLM-4.5 b...
agentic AI, AI benchmarks, coding, language models, model architecture, reasoning, reinforcement learning
TextArena Uses Competitive Gameplay to Advance AI
As language models quickly catch up with and surpass traditional benchmarks, the need for more effective measurement tools becomes urgent. TextArena steps in as an innovative, open-source platf...
agentic AI, AI benchmarking, LLM evaluation, open source, reinforcement learning, soft skills, text-based games, TrueSkill