How DiscoRL Is Changing the Rules: AI That Discovers Its Own Learning Algorithms What if artificial intelligence could not only learn from experience but also invent the very rules that govern its learning, outpacing even the best human-crafted algorithms? Google DeepMind has take... AI research algorithm discovery automation DeepMind generalization meta-learning neural networks reinforcement learning
Reinforcement Fine-Tuning: Amazon Bedrock's Breakthrough for Smarter AI Models Adapting AI models for business is often a trade-off between generic tools and high-cost, complex customization. Amazon Bedrock is revolutionizing this landscape by introducing reinforcement fine-tuni... AI customization Amazon Bedrock AWS machine learning model deployment model fine-tuning reinforcement learning
Rubrics As Rewards: Reinforcement Learning Beyond Verifiable Domains When AI Doctors Need Better Report Cards A future where AI is designed to help improve diagnostic medicine and even find rare diseases may be very close thanks to research from Scale AI. But what does ... AI training healthcare AI interpretability machine learning reinforcement learning rubrics
How Direct Reasoning Optimization Teaches LLMs to Grade Their Own Thinking Large language models have learned to reason well in math and coding thanks to reinforcement learning with verifiable rewards, where an answer can be checked automatically. Open-ended tasks like rewri... chain-of-thought FinQA GRPO ParaRev R3 reinforcement learning RLVR
Agent Lightning: Decoupled RL Training for Any AI Agent Agent Lightning is a Microsoft Research project that turns existing agents into trainable systems with minimal code changes. Instead of rewriting your agent to fit a trainer loop, you attach a lightwe... AI agents AutoGen DPO LangGraph OpenAI Agents reinforcement learning RLHF VERL vLLM
Code World Model: A 32B Agentic Coding LLM Grounded In Execution Traces This article analyzes a Meta FAIR technical report introducing the Code World Model (CWM), a 32-billion-parameter decoder-only transformer trained to model program execution and agentic software engin... agents code generation execution traces LLM reinforcement learning software engineering
Gemini 2.5 Deep Think: AI Achieves Gold-Level Performance at the ICPC World Finals Artificial intelligence continues to break new ground, and Gemini 2.5 Deep Think’s gold-level performance at the 2025 ICPC World Finals is a testament to how far machine problem-solving has come. This... AI breakthroughs artificial intelligence collaborative AI competitive programming Gemini ICPC problem solving reinforcement learning
PASS Puts Probabilities on Agentic Workflows for Safer, Adaptive Chest X-ray AI Chest X-rays are fast, cheap, and ubiquitous, but reading them well demands careful multi-structure reasoning. The paper PASS introduces a multimodal agentic system that treats chest X-ray (CXR) analy... agentic systems CXR medical AI multimodal radiology reinforcement learning
SmallThinker: Bringing Powerful Language Models to Local Devices Researchers from Shanghai Jiao Tong University’s Institute of Parallel and Distributed Systems, the School of Artificial Intelligence, and Zenergize AI introduced SmallThinker: a family of large lang... AI Models AI training reinforcement learning
Gemini 2.5 Deep Think: The Next Leap in AI Problem Solving Artificial intelligence is evolving from simply providing answers to actively reasoning through complex problems. Google's latest Gemini 2.5 Deep Think update exemplifies this shift, offering Google A... AI AI safety coding Deep Think Gemini problem solving reinforcement learning research tools
Z.AI GLM-4.5: Redefining Unified AI Reasoning and Coding Innovation in artificial intelligence continues at an unprecedented pace, and GLM-4.5 is at the forefront of this evolution. Designed to unify reasoning, coding, and agentic functionalities, GLM-4.5 b... agentic AI AI benchmarks coding language models model architecture reasoning reinforcement learning
DeepSWE-Preview Sets a New Standard for Open-Source Coding Agents with Reinforcement Learning Imagine a coding agent that not only keeps pace with its open-source contemporaries but actually outshines them, all powered by reinforcement learning (RL). DeepSWE-Preview, a collaboration be... coding agents emergent behavior LLM open source reinforcement learning rLLM software engineering test-time scaling