SWE-Bench Pro Sets A Higher Bar For AI Coding Agents
As AI coding agents approach human-level performance on existing benchmarks, the research community faces a critical challenge: how do we continue measuring progress when current evaluation suites are...
Tags: AI benchmarks, coding agents, software engineering
Context Engineering: How to Build High‑Signal AI Agents
Context is the new battleground for AI agents. While the focus had been on prompts and models, the real difference between demos and production ...
Tags: AI agents, coding agents, compaction, context engineering, MCP, RAG, subagents, tool design
DeepSWE-Preview Sets a New Standard for Open-Source Coding Agents with Reinforcement Learning
Imagine a coding agent that not only keeps pace with its open-source contemporaries but actually outshines them, all powered by reinforcement learning (RL). DeepSWE-Preview, a collaboration be...
Tags: coding agents, emergent behavior, LLM, open source, reinforcement learning, rLLM, software engineering, test-time scaling