Agentic Web Interfaces: Building Websites That Welcome AI Agents
The paper "Build the web for agents, not agents for the web" proposes a reorientation of how we design the modern web: rather than forcing AI systems to operate through human-centered pages, we should...
AWI MCP Playwright Safety web agents WebArena

MCP-Universe: Real-World Benchmarking For Agents That Use MCP
The Model Context Protocol (MCP) has quickly become a common interface for connecting large language models to external tools and data. By design, it looks like a USB-C port for AI applications: a sta...
benchmark LLM agents MCP Salesforce AI Research tool use

Introducing LiveMCPBench: Evaluating Models on Large Tool Set Usage
A new arXiv preprint, "LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?", from the Chinese Academy of Sciences and UCAS, introduces a benchmark to test AI agents in realistic tool-rich environme...
AI benchmarking AI tools Artificial Intelligence MCP MCP Server