How Anthropic Builds Smarter Long-Running AI Agents Inspired by Human Engineering

AI agents face a unique challenge when tackling complex, multi-session projects: each session starts with a blank slate, lacking the memory of what came before. This "forgetful" nature leads to duplicated work, missed steps, and unfinished projects. Even with advanced tools to manage context, agents can attempt too much at once or mark a project as complete before it's truly done. Anthropic’s engineering team recognized these limitations and looked to proven human engineering strategies for inspiration.
A Two-Agent System for Reliable Progress
To overcome memory limitations, Anthropic developed a two-part agent system. The Initializer Agent launches the project, setting up scripts, progress logs, and version control in a way that mirrors how human teams start new software projects. It creates structured documentation and a clear state for the project. Next, the Coding Agent takes over, working on one feature at a time and leaving detailed records for smooth handoffs. This division ensures that each session builds on the last in a clean, organized manner.
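The division of labor above can be sketched in miniature. This is an illustrative Python sketch, not Anthropic's implementation: the real agents are LLM sessions, but here each role is a plain function (all names are hypothetical) so the control flow is visible.

```python
# Hypothetical orchestration of the two-agent harness: an initializer sets up
# project state once, then repeated coding sessions each advance one feature.

def initializer(project):
    # Set up state the way a human team would start a repo:
    # a progress log and a structured feature list.
    project["log"] = ["project initialized"]
    project["features"] = [
        {"name": "signup", "status": "failing"},
        {"name": "login", "status": "failing"},
    ]
    return project

def coding_session(project):
    # Work on exactly one failing feature, then leave a handoff record.
    todo = [f for f in project["features"] if f["status"] == "failing"]
    if not todo:
        return False  # nothing left: the project is genuinely done
    feature = todo[0]
    feature["status"] = "passing"  # implement + verify (elided here)
    project["log"].append(f"completed {feature['name']}")  # handoff note
    return True

project = initializer({})
while coding_session(project):
    pass
print(project["log"])
```

Because every session reads and writes the same shared state, any session can pick up exactly where the previous one stopped.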
Structured Environments for Clarity
The process starts with the Initializer Agent expanding the user's prompt into a detailed feature list in JSON, with every feature initially marked "failing." This list acts as the single source of truth and prevents agents from skipping steps. Coding agents may mark a feature complete only after implementing and testing it. Strict rules ensure that tests and documentation remain intact, maintaining project integrity.
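A minimal sketch of such a feature list, with hypothetical feature descriptions and file name, might look like this:

```python
import json

# Hypothetical feature list as an initializer agent might emit it.
# Every feature starts as "failing"; a coding agent may flip a status
# to "passing" only after implementing and verifying that feature.
features = [
    {"id": 1, "description": "User can create an account", "status": "failing"},
    {"id": 2, "description": "User can log in and log out", "status": "failing"},
    {"id": 3, "description": "Dashboard lists recent activity", "status": "failing"},
]

with open("features.json", "w") as f:
    json.dump({"features": features}, f, indent=2)

# The file is the single source of truth: the next item to work on is
# simply the first entry still marked "failing".
next_feature = next(f for f in features if f["status"] == "failing")
print(next_feature["id"])  # → 1
```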
Incremental Progress: One Step at a Time
Agents are prompted to focus on a single feature per session. They commit their changes with descriptive messages and update progress logs, making it easy to track what’s been done and revert to stable states if necessary. This approach minimizes onboarding time for new sessions and reduces the risk of regression—much like best practices in human software engineering.
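The per-session ritual of committing one feature's work with a descriptive message and updating a progress log can be sketched as below. The file names, commit message, and log format are illustrative, not Anthropic's actual conventions; git is driven through `subprocess` for clarity.

```python
import os
import subprocess
import tempfile

# Illustrative sketch: one session's worth of work lands as one descriptive
# commit plus a progress note, so the next session can onboard from history.
repo = tempfile.mkdtemp()
subprocess.run(["git", "init", "-q"], cwd=repo, check=True)
subprocess.run(["git", "-C", repo, "config", "user.email", "agent@example.com"], check=True)
subprocess.run(["git", "-C", repo, "config", "user.name", "Coding Agent"], check=True)

# One feature's changes, plus a human-readable note for the next session.
with open(os.path.join(repo, "app.py"), "w") as src:
    src.write("# login feature\n")
with open(os.path.join(repo, "progress.txt"), "a") as log:
    log.write("session 1: implemented login, end-to-end tests passing\n")

subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
subprocess.run(["git", "-C", repo, "commit", "-q", "-m",
                "feat: implement login; verified end-to-end"], check=True)

# The log gives the next session a readable history and stable revert points.
out = subprocess.run(["git", "-C", repo, "log", "--oneline"],
                     capture_output=True, text=True).stdout
print(out.strip())
```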
Testing Beyond the Code
Initial experiments showed that code-only testing wasn't enough. Agents sometimes declared features complete without verifying them in a real-world environment. By instructing agents to use browser automation tools for end-to-end testing, Anthropic greatly improved reliability. Agents now "think like a user," catching bugs that otherwise would have slipped through. While some challenges remain, like handling browser-native modals, this shift has significantly boosted the accuracy of progress tracking.
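Anthropic's agents drive a real browser for this; the underlying idea, verify against the running app rather than trusting the code, can be shown with a much smaller stand-in. This sketch uses Python's standard-library HTTP server in place of a dev server and a plain HTTP request in place of browser automation; the route and markup are invented for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for end-to-end verification: instead of assuming the code "looks
# done", exercise the running app the way a user (or a browser-automation
# tool) would, and only then mark the feature passing.

class App(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"<h1>Login</h1>")

    def log_message(self, *args):  # keep request logging quiet
        pass

server = HTTPServer(("127.0.0.1", 0), App)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/login"

body = urllib.request.urlopen(url).read().decode()
feature_status = "passing" if "Login" in body else "failing"
server.shutdown()
print(feature_status)  # → passing
```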
Efficient Onboarding for Each Session
Every new coding session begins with the agent reviewing the project state. Key onboarding steps include:
- Checking the current directory
- Reviewing git logs and progress files
- Reading the feature list to select the next incomplete item
- Starting the development server to confirm core features work
This process prevents repeated work and helps fresh agents quickly spot issues before moving forward, saving time and computational resources.
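The onboarding steps above can be condensed into a small routine. This is a simplified sketch with invented file names ("progress.txt", "features.json"); a real harness would also inspect `git log` and start the dev server for a smoke test, which is elided here.

```python
import json
import os
import tempfile

# Sketch of a fresh session's onboarding: read the project state on disk
# before touching any code.

def onboard(repo_dir):
    os.chdir(repo_dir)                                   # check current directory
    notes = open("progress.txt").read()                  # review the progress file
    features = json.load(open("features.json"))["features"]
    # Select the next incomplete item; a real harness would also review
    # `git log` and start the dev server to confirm core features still work.
    return next((f for f in features if f["status"] == "failing"), None), notes

# Build a throwaway project state to onboard against.
repo = tempfile.mkdtemp()
with open(os.path.join(repo, "progress.txt"), "w") as f:
    f.write("session 3: search implemented and verified\n")
with open(os.path.join(repo, "features.json"), "w") as f:
    json.dump({"features": [
        {"name": "search", "status": "passing"},
        {"name": "filters", "status": "failing"},
    ]}, f)

next_feature, notes = onboard(repo)
print(next_feature["name"])  # → filters
```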
Solving Common Failure Modes
- Premature Completion: A comprehensive feature list prevents agents from finishing early.
- Messy Handoffs: Structured notes and regular commits make transitions seamless.
- Incomplete Testing: Explicit end-to-end testing requirements ensure reliability.
- Setup Confusion: Scripts and onboarding steps provide clarity at every session.
What’s Next for Agentic Workflows?
Anthropic’s harness has proven effective for web app development, but there’s potential for even more specialization. Future systems might include dedicated agents for QA, testing, or code cleanup. These strategies could also benefit other domains, from scientific research to finance, wherever long-term, coordinated agent work is needed.
Conclusion
Through structured setup, incremental progress, rigorous testing, and clear communication, Anthropic’s approach to long-running agents brings AI one step closer to the disciplined reliability of human engineering teams. As AI workflows evolve, these lessons promise more robust and scalable solutions across industries.
Source: Anthropic, "Effective harnesses for long-running agents"
