SWE-Bench Pro Sets A Higher Bar For AI Coding Agents As AI coding agents approach human-level performance on existing benchmarks, the research community faces a critical challenge: how do we continue measuring progress when current evaluation suites are... AI benchmarks coding agents software engineering