How Anthropic Builds Smarter Long-Running AI Agents Inspired by Human Engineering

Decoding Why AI Agents Struggle with Long-Term Tasks


AI agents face a unique challenge when tackling complex, multi-session projects: each session starts with a blank slate, lacking the memory of what came before. This "forgetful" nature leads to duplicated work, missed steps, and unfinished projects. Even with advanced tools to manage context, agents can attempt too much at once or mark a project as complete before it's truly done. Anthropic's engineering team recognized these limitations and looked to proven human engineering strategies for inspiration.

A Two-Agent System for Reliable Progress

To overcome memory limitations, Anthropic developed a two-part agent system. The Initializer Agent launches the project, setting up scripts, progress logs, and version control in a way that mirrors how human teams start new software projects. It creates structured documentation and a clear state for the project. Next, the Coding Agent takes over, working on one feature at a time and leaving detailed records for smooth handoffs. This division ensures that each session builds on the last in a clean, organized manner.
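The division of labor above can be sketched in a few lines. This is a minimal illustration, not Anthropic's actual harness: the function names, state dictionary, and in-memory stand-ins for version control are all assumptions made for clarity.

```python
# Hypothetical sketch of the two-agent harness: an initializer sets up
# project state once, then each fresh coding session picks exactly one
# incomplete feature and records its work for the next session.

def initialize_project(prompt: str) -> dict:
    """One-time setup: create the tracked project state."""
    return {
        "features": [],      # filled in by the initializer from the prompt
        "progress_log": [],  # human-readable handoff notes
        "commits": [],       # stands in for version-control history
    }

def run_coding_session(state: dict) -> dict:
    """A single session: work on exactly one incomplete feature."""
    pending = [f for f in state["features"] if f["status"] == "failing"]
    if not pending:
        return state  # nothing left to do
    feature = pending[0]
    # ... implement and test the feature here ...
    feature["status"] = "passing"
    state["progress_log"].append(f"Completed: {feature['name']}")
    state["commits"].append(f"feat: {feature['name']}")
    return state
```

Because each session reads and writes the same shared state, a brand-new agent can resume exactly where the previous one stopped.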

Structured Environments for Clarity

The process starts with the Initializer Agent expanding the user prompt into a detailed feature list in JSON, with every feature initially marked "failing." This list acts as the single source of truth and prevents agents from skipping steps. Coding Agents may mark a feature complete only after implementing and testing it. Strict rules keep tests and documentation intact, preserving project integrity.
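A feature list of this kind might look as follows. The exact schema is not given in the source, which specifies only that every feature starts as "failing" and that the list is the single source of truth, so the field names here are illustrative.

```python
import json

# Illustrative shape of the JSON feature list the initializer emits.
feature_list = {
    "features": [
        {"id": 1, "name": "user signup form", "status": "failing"},
        {"id": 2, "name": "password reset email", "status": "failing"},
        {"id": 3, "name": "profile settings page", "status": "failing"},
    ]
}

# The initializer would persist this (e.g. as features.json); a coding
# agent may flip a status to "passing" only after implementing *and*
# testing that feature.
serialized = json.dumps(feature_list, indent=2)
```

Keeping the statuses in one machine-readable file is what lets a fresh session answer "what is left to do?" without any memory of prior sessions.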

Incremental Progress: One Step at a Time

Agents are prompted to focus on a single feature per session. They commit their changes with descriptive messages and update progress logs, making it easy to track what's been done and revert to stable states if necessary. This approach minimizes onboarding time for new sessions and reduces the risk of regression—much like best practices in human software engineering.
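The session-end bookkeeping can be captured in a small helper. This is a sketch of the idea only: a real harness would invoke `git commit` and append to a progress file, and the message formats below are assumptions, not Anthropic's conventions.

```python
from datetime import date

def session_record(feature_name: str, summary: str) -> tuple[str, str]:
    """Format one descriptive commit message and one progress-log
    entry for the single feature completed this session."""
    commit_msg = f"feat: {feature_name} - {summary}"
    log_entry = (
        f"{date.today().isoformat()}: finished '{feature_name}'. {summary}"
    )
    return commit_msg, log_entry
```

One record per feature keeps the history granular enough to revert to the last stable state when a session goes wrong.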

Testing Beyond the Code

Initial experiments showed that code-only testing wasn't enough. Agents sometimes declared features complete without verifying them in a real-world environment. By instructing agents to use browser automation tools for end-to-end testing, Anthropic greatly improved reliability. Agents now "think like a user," catching bugs that otherwise would have slipped through. While some challenges remain, like handling browser-native modals, this shift has significantly boosted the accuracy of progress tracking.
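The completion gate implied here can be expressed directly: a feature may transition to "passing" only if an end-to-end check succeeds, not merely a unit test. In this sketch the check is an injected callback standing in for a real browser-automation script (e.g. a Playwright or Selenium run); the gating logic, not the browser driver, is the point.

```python
from typing import Callable

def try_mark_complete(feature: dict, e2e_check: Callable[[], bool]) -> bool:
    """Flip a feature's status to "passing" only after an
    end-to-end verification succeeds; otherwise leave it failing."""
    if e2e_check():
        feature["status"] = "passing"
        return True
    return False
```

Separating "the code compiles and unit tests pass" from "a user-level walkthrough succeeds" is what catches the bugs that code-only testing missed.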

Efficient Onboarding for Each Session

Every new coding session begins with the agent reviewing the project state. Key onboarding steps include:

  • Checking the current directory
  • Reviewing git logs and progress files
  • Reading the feature list to select the next incomplete item
  • Starting the development server to confirm core features work
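The onboarding steps above can be sketched as a single routine. File names, log format, and return shape are assumptions made for illustration; only the overall flow (review recent history, then pick the next failing feature from the list) comes from the source.

```python
import json
from typing import Optional

def onboard(feature_file_text: str, git_log: list[str]) -> Optional[dict]:
    """Review project state and select the next incomplete feature."""
    recent = git_log[-5:]  # skim the most recent commits for context
    data = json.loads(feature_file_text)
    for feature in data["features"]:  # feature list is the source of truth
        if feature["status"] == "failing":
            return {"next": feature, "recent_commits": recent}
    return None  # every feature is complete
```

Running a routine like this at the start of every session is what keeps a memoryless agent from redoing finished work.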

This process prevents repeated work and helps fresh agents quickly spot issues before moving forward, saving time and computational resources.

Solving Common Failure Modes

  • Premature Completion: A comprehensive feature list prevents agents from finishing early.
  • Messy Handoffs: Structured notes and regular commits make transitions seamless.
  • Incomplete Testing: Explicit end-to-end testing requirements ensure reliability.
  • Setup Confusion: Scripts and onboarding steps provide clarity at every session.

What's Next for Agentic Workflows?

Anthropic's harness has proven effective for web app development, but there's potential for even more specialization. Future systems might include dedicated agents for QA, testing, or code cleanup. These strategies could also benefit other domains, from scientific research to finance, wherever long-term, coordinated agent work is needed.

Let's Turn AI Theory Into Your Business Reality

Thanks for reading! Anthropic's engineering strategies show just how sophisticated AI agents can become when designed with the right architecture. But reading about cutting-edge AI is one thing—implementing solutions that transform your daily operations is another entirely. With over two decades of experience helping startups and tech giants alike, I specialize in building intelligent automation systems that actually work in the real world, not just in research papers.

Are you ready to stop wrestling with repetitive tasks and start scaling your impact? Whether you need custom AI agents that remember context across sessions, automated workflows that eliminate manual bottlenecks, or intelligent systems that test themselves for reliability, I can help. Let's discuss how my software development and automation expertise can build solutions tailored exactly to your needs—no vendor lock-in, just results.

If you're curious about how my experience can help you, I'd love to schedule a free consultation.

Conclusion

Through structured setup, incremental progress, rigorous testing, and clear communication, Anthropic's approach to long-running agents brings AI one step closer to the disciplined reliability of human engineering teams. As AI workflows evolve, these lessons promise more robust and scalable solutions across industries.

Source: Anthropic, "Effective harnesses for long-running agents"


Joshua Berkowitz November 27, 2025