Apriel-1.6-15B-Thinker: Redefining Multimodal AI Efficiency ServiceNow's Apriel-1.6-15B-Thinker is setting a new standard for efficient and accessible AI. This breakthrough model emphasizes how smart data strategies and targeted training can enable smaller mod... AI benchmarks efficient models enterprise AI multimodal AI reinforcement learning ServiceNow AI supervised finetuning token efficiency
How DiscoRL Is Changing the Rules: AI That Discovers Its Own Learning Algorithms What if artificial intelligence could not only learn from experience but also invent the very rules that govern its learning, outpacing even the best human-crafted algorithms? Google DeepMind has take... AI research algorithm discovery automation DeepMind generalization meta-learning neural networks reinforcement learning
Reinforcement Fine-Tuning: Amazon Bedrock's Breakthrough for Smarter AI Models Adapting AI models for business is often a trade-off between generic tools and high-cost, complex customization. Amazon Bedrock is revolutionizing this landscape by introducing reinforcement fine-tuni... AI customization Amazon Bedrock AWS machine learning model deployment model fine-tuning reinforcement learning
CoreWeave's Acquisition of OpenPipe: A New Era for AI Cloud Innovation The pace of artificial intelligence breakthroughs is picking up, and CoreWeave’s recent acquisition of OpenPipe demonstrates a bold commitment to advancing the field. By incorporating OpenPipe’s reinf... agent training AI cloud cloud infrastructure CoreWeave enterprise AI machine learning OpenPipe reinforcement learning
Expert Human Feedback Is Changing AI-Driven Drug Discovery AI has shown immense potential in many fields, but drug discovery has long stood apart due to its complexity. Insilico Medicine is bridging this gap with its Reinforcement Learning with Expert Human F... biotechnology Chemistry42 drug discovery expert feedback generative AI machine learning pharmaceutical innovation reinforcement learning
Rubrics As Rewards: Reinforcement Learning Beyond Verifiable Domains When AI Doctors Need Better Report Cards A future where AI is designed to help improve diagnostic medicine and even find rare diseases may be very close thanks to research from ScaleAI. But what does ... AI training healthcare AI interpretability machine learning reinforcement learning rubrics
How Direct Reasoning Optimization Teaches LLMs to Grade Their Own Thinking Large language models have learned to reason well in math and coding thanks to reinforcement learning with verifiable rewards, where an answer can be checked automatically. Open-ended tasks like rewri... chain-of-thought FinQA GRPO ParaRev R3 reinforcement learning RLVR
OpenEnv: Fueling the Future of Agentic AI with Open, Standardized Environments AI agents are getting smarter, but their ability to interact with the world safely and effectively hinges on more than just powerful models. They require environments purpose-built for safety, flexib... agentic systems AI agents environments Hugging Face Meta open source reinforcement learning standardization
AI Is Accelerating the Fusion Energy Revolution A future where energy is virtually limitless and pollution-free has been the promise of atomic energy systems for nearly eight decades. While advancements in fusion energy hold this promise, it remains on... AI DeepMind fusion energy machine learning plasma simulation reinforcement learning sustainable energy tokamak
CoreWeave Unleashes Serverless Reinforcement Learning for All With the introduction of Serverless RL, CoreWeave is making high-performance RL accessible to everyone from startups to large enterprises. By removing the need for infrastructure management and loweri... AI agents AI innovation cloud infrastructure CoreWeave OpenPipe reinforcement learning serverless computing Weights & Biases
Agent Lightning: Decoupled RL Training for Any AI Agent Agent Lightning is a Microsoft Research project that turns existing agents into trainable systems with minimal code changes. Instead of rewriting your agent to fit a trainer loop, you attach a lightwe... AI agents AutoGen DPO LangGraph OpenAI Agents reinforcement learning RLHF VERL vLLM
Code World Model: A 32B Agentic Coding LLM Grounded In Execution Traces This article analyzes a Meta FAIR technical report introducing the Code World Model (CWM), a 32-billion-parameter decoder-only transformer trained to model program execution and agentic software engin... agents code generation execution traces LLM reinforcement learning software engineering