Unleashing On-Device Agentic Power: How Fara-7B Transforms Human-Computer Interaction Microsoft Research’s Fara-7B is a small, open-weight agentic model that interacts with your device in a human-like way. It looks to fulfil the promise of having a digital assistant that doesn’t just un... agentic AI AI safety benchmarking on-device AI open source small language models synthetic data web automation
Edison Analysis: Transforming Scientific Research with Automated Intelligence Researchers have long faced arduous, time-intensive data analysis processes that hinder discovery. Edison Analysis changes the game by offering a sophisticated analysis tool that streamlines and enhanc... AI tools automation benchmarking bioinformatics data science Jupyter notebooks scientific analysis
IBM Granite 4.0 Nano: Compact AI Models Delivering Outsized Performance IBM’s Granite 4.0 Nano models are bringing high-performance AI to the edge. They represent a significant leap in compact, high-performance language models built specifically for edge and on-device com... benchmarking edge AI Granite 4.0 hybrid architecture IBM language models Nano models responsible AI
Toucan Dataset: Transforming AI Agents Into Digital Doers Toucan, a groundbreaking open-source dataset from IBM and the University of Washington, is crafted to propel tool-calling capabilities in large language models (LLMs) to new heights. For AI to move bey... AI agents API integration benchmarking large language models machine learning open source tool-calling Toucan dataset
Perplexity is Redefining Search APIs for the Age of AI Today’s AI-driven products demand search infrastructure that is fast, scalable, and deeply context-aware. As intelligent agents and real-time knowledge access become central to new applications, tradi... AI search API architecture benchmarking context engineering information retrieval machine learning Perplexity Search API
AfriMed-QA is Setting the Standard for Health AI in Africa Artificial intelligence has the potential to revolutionize healthcare, but can large language models (LLMs) truly meet the needs of diverse communities? AfriMed-QA is leading the way by evaluating LLM... Africa benchmarking clinical evaluation healthcare AI LLMs medical questions multilingual datasets open source
Claude Sonnet 4.5: Redefining AI Coding and Developer Productivity Anthropic’s Claude Sonnet 4.5 emerges as a transformative force in the world of AI-driven software development. This release introduces significant advancements for businesses and developers, establis... AI agents AI coding alignment benchmarking Claude 4.5 developer tools productivity safety
OpenAI's GPT-5-Codex: The Next Evolution in AI-Powered Coding OpenAI has taken a bold step forward in the AI coding space by introducing GPT-5-Codex. This new release redefines what developers can expect from AI-powered coding assistants, offering new levels of ... AI coding benchmarking code review Codex GPT-5 OpenAI software development
AssetOpsBench Sets New Standards for AI in Industrial Asset Management Industrial asset management is undergoing a transformation as artificial intelligence agents are poised to take on complex tasks, from predictive maintenance to troubleshooting intricate machinery. At... AI agents asset management benchmarking failure analysis industrial automation LLM evaluation multi-agent systems open source
Unlocking the Power of Generalizable Tabular Models with Synthetic Priors Tabular data drives vital decisions across sectors like healthcare, finance, and retail, but most machine learning solutions for these datasets are narrowly optimized and lack broad applicability. Tod... AutoGluon benchmarking foundation models in-context learning machine learning synthetic data tabular data
PlanetScale Launches Fastest Postgres Hosting The world of cloud databases is evolving quickly, and PlanetScale’s announcement of private preview support for PostgreSQL is a major development. With a focus on speed, reliability, and a commitment ... benchmarking cloud databases performance PlanetScale PostgreSQL scalability Vitess
SciArena: Transforming How We Evaluate AI Models in Scientific Research Researchers face a growing challenge: staying current with the ever-expanding body of scientific literature. Foundation models offer promise in helping synthesize and analyze this vast information, bu... AI evaluation benchmarking crowdsourcing data quality foundation models leaderboard research tools scientific literature