AssetOpsBench: Industrial Agents Meet a Real-World Benchmark
Industrial assets do not fail neatly; they fail in ways that force engineers to pull signals from sensors, recall failure modes, and translate insights into work orders. AssetOpsBench is IBM's open-so...
AssetOpsBench benchmark IBM Research industrial AI multi-agent systems

MCP-Universe: Real-World Benchmarking For Agents That Use MCP
The Model Context Protocol (MCP) has quickly become a common interface for connecting large language models to external tools and data. By design, it looks like a USB-C port for AI applications: a sta...
benchmark LLM agents MCP Salesforce AI Research tool use

Devstral: Redefining Open-Source Coding Agents for Autonomous Software Engineering
Open-source enthusiasts and professional developers alike have long awaited a model that could deliver true autonomy in software engineering. Enter Devstral, the latest innovation from Mistral AI and...
AI models benchmark coding agent Devstral enterprise LLM open-source software engineering