Blog Posts | Joshua Berkowitz

12 Articles

2025, reinforcement learning

How DiscoRL Is Changing the Rules: AI That Discovers Its Own Learning Algorithms

What if artificial intelligence could not only learn from experience but also invent the very rules that govern its learning, outpacing even the best human-crafted algorithms? Google DeepMind has take...

AI research, algorithm discovery, automation, DeepMind, generalization, meta-learning, neural networks, reinforcement learning

Dec 9, 2025

0 4444

News

Reinforcement Fine-Tuning: Amazon Bedrock's Breakthrough for Smarter AI Models

Adapting AI models for business is often a trade-off between generic tools and high-cost, complex customization. Amazon Bedrock is revolutionizing this landscape by introducing reinforcement fine-tuni...

AI customization, Amazon Bedrock, AWS, machine learning, model deployment, model fine-tuning, reinforcement learning

Dec 6, 2025

0 2937

News

Rubrics As Rewards: Reinforcement Learning Beyond Verifiable Domains

When AI Doctors Need Better Report Cards: A future where AI is designed to help improve diagnostic medicine and even find rare diseases may be very close thanks to research from Scale AI. But what does ...

AI training, healthcare AI, interpretability, machine learning, reinforcement learning, rubrics

Nov 2, 2025

0 13079

Papers

How Direct Reasoning Optimization Teaches LLMs to Grade Their Own Thinking

Large language models have learned to reason well in math and coding thanks to reinforcement learning with verifiable rewards, where an answer can be checked automatically. Open-ended tasks like rewri...

chain-of-thought, FinQA, GRPO, ParaRev, R3, reinforcement learning, RLVR

Nov 1, 2025

0 6325

Papers

Agent Lightning: Decoupled RL Training for Any AI Agent

Agent Lightning is a Microsoft Research project that turns existing agents into trainable systems with minimal code changes. Instead of rewriting your agent to fit a trainer loop, you attach a lightwe...

AI agents, AutoGen, DPO, LangGraph, OpenAI Agents, reinforcement learning, RLHF, VERL, vLLM

Oct 8, 2025

0 56925

GitHub Repos

Code World Model: A 32B Agentic Coding LLM Grounded In Execution Traces

This article analyzes a Meta FAIR technical report introducing the Code World Model (CWM), a 32-billion-parameter decoder-only transformer trained to model program execution and agentic software engin...

agents, code generation, execution traces, LLM, reinforcement learning, software engineering

Oct 7, 2025

0 16445

Papers

Gemini 2.5 Deep Think: AI Achieves Gold-Level Performance at the ICPC World Finals

Artificial intelligence continues to break new ground, and Gemini 2.5 Deep Think’s gold-level performance at the 2025 ICPC World Finals is a testament to how far machine problem-solving has come. This...

AI breakthroughs, artificial intelligence, collaborative AI, competitive programming, Gemini, ICPC, problem solving, reinforcement learning

Sep 17, 2025

0 22473

Gemini

PASS Puts Probabilities on Agentic Workflows for Safer, Adaptive Chest X-ray AI

Chest X-rays are fast, cheap, and ubiquitous, but reading them well demands careful multi-structure reasoning. The paper PASS introduces a multimodal agentic system that treats chest X-ray (CXR) analy...

agentic systems, CXR, medical AI, multimodal, radiology, reinforcement learning

Aug 19, 2025

0 4114

Papers

SmallThinker: Bringing Powerful Language Models to Local Devices

Researchers from Shanghai Jiao Tong University’s Institute of Parallel and Distributed Systems, the School of Artificial Intelligence, and Zenergize AI introduced SmallThinker: a family of large lang...

AI Models, AI training, reinforcement learning

Aug 16, 2025

0 12892

Papers

Gemini 2.5 Deep Think: The Next Leap in AI Problem Solving

Artificial intelligence is evolving from simply providing answers to actively reasoning through complex problems. Google's latest Gemini 2.5 Deep Think update exemplifies this shift, offering Google A...

AI, AI safety, coding, Deep Think, Gemini, problem solving, reinforcement learning, research tools

Aug 1, 2025

0 15950

Gemini

Z.AI GLM-4.5: Redefining Unified AI Reasoning and Coding

Innovation in artificial intelligence continues at an unprecedented pace, and GLM-4.5 is at the forefront of this evolution. Designed to unify reasoning, coding, and agentic functionalities, GLM-4.5 b...

agentic AI, AI benchmarks, coding, language models, model architecture, reasoning, reinforcement learning

Jul 30, 2025

0 11836

News

DeepSWE-Preview Sets a New Standard for Open-Source Coding Agents with Reinforcement Learning

Imagine a coding agent that not only keeps pace with its open-source contemporaries but actually outshines them, all powered by reinforcement learning (RL). DeepSWE-Preview, a collaboration be...

coding agents, emergent behavior, LLM, open source, reinforcement learning, rLLM, software engineering, test-time scaling

Jul 21, 2025

0 14729

News

