Google Gemini’s New Photo-to-Video Tool Unlocks Creative Freedom
Turning a simple photograph into a captivating, animated video is now just a few taps away. Google Gemini's innovative photo-to-video feature empowers anyone to breathe life into their favorite images...
Tags: AI safety, AI video, creative tools, digital watermarking, Google Gemini, photo animation, Veo 3
Gemini Robotics On-Device: Bringing Advanced AI Directly to Robots
Imagine a world where robots react instantly, adapt to changing tasks, and operate independently of the cloud. Google DeepMind is turning this vision into reality with Gemini Robotics On-Device...
Tags: AI safety, developer tools, Gemini Robotics, machine learning, on-device AI, robotic dexterity, robotics
Detecting AI Sabotage: Insights from the SHADE-Arena Project
As artificial intelligence becomes more powerful, ensuring these systems act in our best interests is more important than ever. Recent work from Anthropic, through the SHADE-Arena project, addresses ...
Tags: agentic behavior, AI alignment, AI safety, language models, monitoring tools, sabotage detection, SHADE-Arena
When AI Becomes the Insider Threat: Lessons from Agentic Misalignment Research
As organizations hand more autonomy to AI systems, a pressing issue emerges: what if these intelligent tools act in ways that actively undermine their users? Recent research from Anthropic explores th...
Tags: agentic misalignment, AI alignment, AI ethics, AI safety, corporate security, insider threats, LLMs
Ether0 Is Transforming Chemistry with AI-Powered Scientific Reasoning
ether0, FutureHouse's new open-source, 24-billion-parameter model, hints at a future where scientific breakthroughs are achieved faster thanks to AI models that excel at complex reasoning in fields li...
Tags: AI chemistry, AI safety, drug discovery, FutureHouse, molecular design, open source AI, reinforcement learning, scientific reasoning
Anthropic Launches Bug Bounty Program to Strengthen AI Safety Defenses
As artificial intelligence grows more advanced, ensuring its safe and ethical use is crucial. Anthropic is taking a bold step by launching a new bug bounty program, inviting top security experts to fi...
Tags: AI safety, bug bounty, Claude 3.7 Sonnet, Constitutional Classifiers, HackerOne, Responsible Scaling Policy, security research
Jailbreaking AI Chatbots: Understanding the Flaw and the Path to Safer AI
Imagine asking an AI chatbot for dangerous instructions and having it comply simply by rephrasing your request. This alarming scenario is all too real, as Princeton engineers have discovered a fundamental...
Tags: AI ethics, AI safety, chatbots, cybersecurity, deep alignment, jailbreaking, large language models, Princeton research
Unlocking Accuracy in RAG: The Crucial Role of Sufficient Context
When it comes to reducing hallucinations and improving accuracy in large language models (LLMs), the focus is shifting from mere relevance to the concept of sufficient context. Rather than simply ret...
Tags: AI safety, Google Research, hallucinations, LLMs, RAG, retrieval systems, sufficient context
Anthropic Expands Bug Bounty Program to Strengthen AI Safety
Anthropic is taking a bold step in AI safety by inviting the world’s top security researchers to put its latest defenses to the test. Their new bug bounty program aims to uncover serious vulnerabiliti...
Tags: AI safety, Anthropic, bug bounty, CBRN, Claude 3.7 Sonnet, Constitutional Classifiers, Responsible Scaling, security research