SWE-Bench Pro Sets A Higher Bar For AI Coding Agents As AI coding agents approach human-level performance on existing benchmarks, the research community faces a critical challenge: how do we continue measuring progress when current evaluation suites are... AI benchmarks coding agents software engineering
SciVer Puts Multimodal Claim Verification To The Test Scientific claim verification and reproducibility have emerged as critical challenges in the era of information abundance and multimodal AI systems. Unlike traditional fact-checking that relies prim... AI benchmark claim verification multimodal scientific reasoning
simdjson: JSON Parsing at the Speed of Silicon JSON parsing is everywhere, but it's rarely fast enough at massive scales. Web servers process millions of API requests daily. Analytics pipelines transform terabytes of log data. Trading systems pars... c++ data-engineering json performance simd simdjson
Notion MCP Server: A Practical Bridge Between AI Agents and Your Workspace Notion's open-source Notion MCP Server turns the company's popular productivity platform into a first-class tool provider for AI agents. It implements the Model Context Protocol (MCP), a common langua... AI Agents Developer Tools LLM MCP Notion OpenAPI TypeScript
Defeating Nondeterminism In LLM Inference Reproducible outputs at temperature 0 should be straightforward in principle: the sampler always picks the highest-probability token. Yet production LLM endpoints still produce different completions f... attention batch-invariance determinism gpu-kernels llm-inference
ONNX Runtime: Inference Runtime for Portability, Performance, and Scale Deploying machine learning models efficiently is as important as training them. ONNX Runtime, an open-source accelerator from Microsoft, promises fast, portable inference across operating systems and... deployment inference ONNX runtime TensorFlow Serving Triton
Turning Agents Into Sharable Software: Inside Docker’s cagent A small utility that feels like a platform: Docker's cagent turns AI agents into sharable, runnable software. Teams often build a clever assistant or a small cluster of cooperating agents. Then real-w... AI agents Docker Go MCP Open Source Reproducibility
AutoGen: Microsoft's Agent Framework, Reimagined for Multi-Agent Workflows AutoGen is Microsoft's open-source framework for building AI agents that can collaborate, call tools, and even form teams - without making developers choose between speed and control. Housed in the mi... actor model agent collaboration agent orchestration AI framework AutoGen autogen-core AutoGen Studio MCP integration Microsoft multi-agent systems RoutedAgent
PosterGen: Academic Poster Creation with Multi-Agent AI Creating compelling conference posters is a challenge for any researcher. You have to decide which content to include and how to present it accurately and compellingly. After months of rigorous research, wr... academic tools artificial intelligence design automation machine learning multi-agent systems poster design research tools scientific communication
Docker's Cagent Makes Building and Sharing AI Agents Effortless Docker’s open-source project, Cagent, lets users define AI agent behaviors, tools, and personas in a single YAML file. By removing the pain of dependency management and code complexity, Cagent shifts... AI agents developer tools Docker MCP toolkit no-code open source workflow automation YAML
Paper2Agent: Transforming Research Papers into Interactive AI Agents Research papers traditionally require readers and reviewers to interpret code, methods, and results independently. Paper2Agent aims to transform published research into interactive AI agents allow... AgentScope AI agents AutoGen Azure AI Agents Claude Code Code2MCP computational biology LangGraph MCP NotebookLM OpenAI Assistants OpenDevin open source reproducibility Stanford tutorial extraction
How to Choose the Right Google AI Developer Tool for Your Workflow AI development is advancing fast, and Google’s portfolio of developer tools is evolving to keep pace. With so many options now available, selecting the right tool for your workflow can feel overwhelmi... AI development AI tools Code Assist developer workflow Firebase Studio Gemini Google Cloud Jules