Unlocking Speed, Smarts, and Savings: Claude Haiku 4.5 Raises the Bar for Small AI Models
AI is entering a new era where speed and affordability no longer come at the expense of intelligence. Anthropic’s Claude Haiku 4.5 exemplifies this shift, making high-level AI capabilities accessible ...
Tags: AI models, AI safety, Claude Haiku, coding performance, cost efficiency, developer tools, real-time AI

How a Handful of Malicious Documents Can Backdoor Massive AI Models
It might seem that poisoning a huge AI model would require corrupting a substantial portion of its training data. However, groundbreaking research reveals this isn’t the case. Experts from Anthropic, ...
Tags: adversarial machine learning, AI safety, AI security, backdoor attacks, data poisoning, large language models, model robustness, research

Rubrics as Rewards: A New Paradigm for Training Reliable AI
AI models face significant challenges when applied to nuanced, high-stakes fields like medicine and science. Standard training techniques, such as Reinforcement Learning from Human Feedback (RLHF), of...
Tags: AI safety, AI training, expert guidance, language models, model evaluation, RLHF, rubrics

New OpenAI Realtime API Now Available for Voice Agents
OpenAI's latest release, gpt-realtime, together with a revamped Realtime API, is redefining what's possible for voice agents. By directly processing and generating audio in a single model, this techn...
Tags: AI safety, developer tools, function calling, GPT-4o, OpenAI Realtime API, speech-to-speech, voice agents

Claude for Chrome: Anthropic’s Bold Step Toward Secure, Browser-Based AI
Anthropic is piloting Claude for Chrome, promising to streamline daily tasks while keeping safety at the forefront. By enabling Claude to interact with web pages, users could see major productivity boo...
Tags: AI safety, beta testing, browser security, Chrome extension, Claude AI, prompt injection, user permissions

Gemini Robotics On-Device: Bringing Advanced AI Directly to Robots
Imagine a world where robots react instantly, adapt to changing tasks, and operate independently of the cloud. Google DeepMind is turning this vision into reality with Gemini Robotics On-Devic...
Tags: AI safety, developer tools, Gemini Robotics, machine learning, on-device AI, robotic dexterity, robotics

Detecting AI Sabotage: Insights from the SHADE-Arena Project
As artificial intelligence becomes more powerful, ensuring these systems act in our best interests is more important than ever. Recent work from Anthropic, through the SHADE-Arena project, addresses ...
Tags: agentic behavior, AI alignment, AI safety, language models, monitoring tools, sabotage detection, SHADE-Arena

When AI Becomes the Insider Threat: Lessons from Agentic Misalignment Research
As organizations hand more autonomy to AI systems, a pressing issue emerges: what if these intelligent tools act in ways that actively undermine their users? Recent research from Anthropic explores th...
Tags: agentic misalignment, AI alignment, AI ethics, AI safety, corporate security, insider threats, LLMs

Ether0 Is Transforming Chemistry with AI-Powered Scientific Reasoning
ether0, FutureHouse's new open-source, 24-billion-parameter model, hints at a future where scientific breakthroughs are achieved faster thanks to AI models that excel at complex reasoning in fields li...
Tags: AI chemistry, AI safety, drug discovery, FutureHouse, molecular design, open source AI, reinforcement learning, scientific reasoning

Anthropic Launches Bug Bounty Program to Strengthen AI Safety Defenses
As artificial intelligence grows more advanced, ensuring its safe and ethical use is crucial. Anthropic is taking a bold step by launching a new bug bounty program, inviting top security experts to fi...
Tags: AI safety, bug bounty, Claude 3.7 Sonnet, Constitutional Classifiers, HackerOne, Responsible Scaling Policy, security research

Jailbreaking AI Chatbots: Understanding the Flaw and the Path to Safer AI
Imagine asking an AI chatbot for dangerous instructions and having it comply simply by rephrasing your request. This alarming scenario is all too real, as Princeton engineers have discovered a fundame...
Tags: AI ethics, AI safety, chatbots, cybersecurity, deep alignment, jailbreaking, large language models, Princeton research

Unlocking Accuracy in RAG: The Crucial Role of Sufficient Context
When it comes to reducing hallucinations and improving accuracy in large language models (LLMs), the focus is shifting from mere relevance to the concept of sufficient context. Rather than simply ret...
Tags: AI safety, Google Research, hallucinations, LLMs, RAG, retrieval systems, sufficient context