Effective harnesses for long-running agents

Anthropic Engineering explores the challenge of enabling AI agents to work continuously on complex tasks that span multiple context windows. The core problem is that each new session starts with no memory of the previous one. Without additional structure, agents tend either to try to "one-shot" an entire application in a single session or to declare the project complete prematurely. Both failure modes leave broken or unfinished code and make long-running tasks unreliable.

To solve this, they implemented a two-part solution inspired by human software engineering practices: an Initializer Agent that sets up a robust environment at the start, and a Coding Agent that makes incremental, feature-by-feature progress in each later session. Because every session must end in a clean state and structured artifacts such as a feature list and a progress log are kept up to date, each new agent instance can pick up where the previous one left off, so progress on large projects stays consistent across sessions.
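The session loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Anthropic's harness: run_harness, the agent callables, and the all_features_pass check are hypothetical names, and the existence of claude-progress.txt is assumed to signal that initialization has already happened.

```python
from pathlib import Path
from typing import Callable

# Hypothetical interface: each "agent" is a callable that works on the project
# directory. In a real harness it would start a fresh model session.
AgentFn = Callable[[Path], None]


def run_harness(
    project: Path,
    initializer: AgentFn,
    coding_agent: AgentFn,
    all_features_pass: Callable[[Path], bool],
    max_sessions: int = 50,
) -> int:
    """Run the initializer once, then fresh coding sessions until every feature passes.

    Returns the number of coding sessions started by this call.
    """
    # Assumption: the progress log's existence marks a finished initialization.
    if not (project / "claude-progress.txt").exists():
        initializer(project)

    sessions = 0
    # The cap guards against a loop that never converges.
    while sessions < max_sessions and not all_features_pass(project):
        # Each session begins with an empty context; all knowledge of earlier
        # work has to come from files on disk (progress log, feature list, git).
        coding_agent(project)
        sessions += 1
    return sessions
```

Passing the agents in as plain callables keeps the loop independent of any particular model API; in practice each call would launch a new context window with its own prompt.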

Key Concepts

Initializer Agent: A specialized agent that runs first to set up the environment, creating essential files such as init.sh and claude-progress.txt and making an initial git commit to lay the foundation.
Coding Agent: The agent used in every subsequent session, which works iteratively, tackling one feature at a time from a prioritized list so that it never tries to do too much at once.
Feature List: A comprehensive, status-tracked JSON list of requirements (e.g., "User can login") that prevents the agent from guessing what to do next or stopping early.
Clean State: The requirement that every session ends with the codebase in a working, committable state (passing tests, no major bugs), allowing the next agent to start immediately.
End-to-End Testing: Using browser automation tools (like Puppeteer) to verify that features actually work from a user's perspective, rather than relying only on unit tests or static code analysis.
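A minimal Python sketch of the scaffolding an Initializer Agent might leave behind follows. Only init.sh, claude-progress.txt, and the initial git commit come from the description above; the feature_list.json file name, its description and passes fields, the placeholder init.sh body, and the initialize_project helper are illustrative assumptions. Making the commit requires git on the PATH.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical placeholder script; a real initializer would write the
# project's actual start/setup commands here.
INIT_SH = """#!/usr/bin/env bash
# Placeholder: start the development server so each session can test the app.
set -euo pipefail
echo "start dev server here"
"""


def initialize_project(root: Path, features: list[str], commit: bool = True) -> None:
    """Create the files later coding sessions rely on (hypothetical helper)."""
    root.mkdir(parents=True, exist_ok=True)

    init_sh = root / "init.sh"
    init_sh.write_text(INIT_SH)
    init_sh.chmod(0o755)  # make the script executable

    (root / "claude-progress.txt").write_text(
        "Session 0 (initializer): environment created, no features implemented yet.\n"
    )

    # Assumed file name and schema. Every feature starts out failing, so no
    # session can treat unverified work as done.
    feature_list = [{"description": f, "passes": False} for f in features]
    (root / "feature_list.json").write_text(json.dumps(feature_list, indent=2))

    if commit:
        subprocess.run(["git", "init"], cwd=root, check=True, capture_output=True)
        subprocess.run(["git", "add", "-A"], cwd=root, check=True)
        # Inline identity/signing config so the commit works without global git setup.
        subprocess.run(
            ["git", "-c", "user.name=initializer", "-c", "user.email=initializer@example.com",
             "-c", "commit.gpgsign=false", "commit", "-m", "Initial scaffolding"],
            cwd=root, check=True, capture_output=True,
        )
```

Committing immediately gives every later session a known-good baseline to diff against or revert to.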
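The Feature List and the one-feature-per-session rule can be illustrated with two small helpers. This sketch assumes a hypothetical feature_list.json layout (a priority-ordered JSON array of objects with description and passes fields); next_feature and mark_passing are illustrative names, not part of any published tool.

```python
import json
from pathlib import Path
from typing import Optional


def next_feature(feature_file: Path) -> Optional[dict]:
    """Return the first feature that does not pass yet, or None when all pass.

    Assumes the list is ordered by priority, so the first failing entry is
    the one a coding session should work on next.
    """
    features = json.loads(feature_file.read_text())
    return next((f for f in features if not f["passes"]), None)


def mark_passing(feature_file: Path, description: str) -> None:
    """Flip one feature's status; call only after it has been verified end to end."""
    features = json.loads(feature_file.read_text())
    for f in features:
        if f["description"] == description:
            f["passes"] = True  # only the status field is ever changed
            break
    else:
        raise KeyError(f"unknown feature: {description!r}")
    feature_file.write_text(json.dumps(features, indent=2))
```

Restricting updates to the passes flag, rather than letting a session rewrite entries, is one way to keep an agent from quietly dropping a requirement it finds hard; a None result from next_feature is an objective signal that the project is finished.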
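One possible end-of-session gate for the Clean State rule is sketched below. It assumes the project has a single test command and that progress notes are appended to claude-progress.txt; end_session and the commit identity are hypothetical. The session counts as clean only if the tests pass and git status reports no uncommitted changes afterwards; if the tests fail, nothing is written or committed.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def end_session(repo: Path, test_cmd: list[str], summary: str) -> bool:
    """Hypothetical gate: commit only working code, then confirm the tree is clean."""
    tests = subprocess.run(test_cmd, cwd=repo, capture_output=True, text=True)
    if tests.returncode != 0:
        # Not clean: the session should fix or revert its changes before stopping.
        return False

    # Append a timestamped note for the next session (assumed log format).
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with (repo / "claude-progress.txt").open("a") as log:
        log.write(f"[{stamp}] {summary}\n")

    # The new log line guarantees there is something to commit.
    subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
    subprocess.run(
        ["git", "-c", "user.name=coding-agent", "-c", "user.email=agent@example.com",
         "-c", "commit.gpgsign=false", "commit", "-m", summary],
        cwd=repo, check=True, capture_output=True,
    )
    # Empty porcelain output means no uncommitted changes remain.
    status = subprocess.run(["git", "status", "--porcelain"], cwd=repo,
                            capture_output=True, text=True, check=True)
    return status.stdout.strip() == ""
```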