
Context Engineering: How to Build High‑Signal AI Agents

Context is the New Battleground for AI Agents


Context is the new battleground for AI agents. While the focus has been on prompts and models, the real difference between demos and production systems lies in context engineering: systematically assembling the right information, tools, and constraints before the model thinks.

Context Engineering Makes AI Agents Useful

The leap from impressive demos to reliable products doesn't come from new model weights; it comes from the discipline of assembling the right inputs before the model even starts thinking. That discipline is what we refer to as context engineering, a term coined by Andrej Karpathy only a couple of months ago.

As Philipp Schmid explains, context engineering is "the discipline of designing and building dynamic systems that provides the right information and tools, in the right format, at the right time, to give a LLM everything it needs to accomplish a task." 

It's not a static prompt; it's a dynamic system that composes instructions, memory, fresh knowledge, tools, and output constraints tailored to the immediate goal.

The difference between a "cheap demo" and a "magical" agent isn't about the code you write; it's about the quality of context you provide. As Schmid notes, "Agent failures aren't only model failures; they are context failures."

What "context" Really Includes

Context is everything the model sees before it generates a response. Schmid's framework breaks this down into six essential layers that work together to create agent intelligence.

At the foundation sits the instructions and system prompt: the rules, examples, and behavioral guidelines that define how the agent should operate. Think of this as the agent's core personality and operating manual. 

Above that, the user prompt provides the immediate task or question, while state and history maintain short-term memory of the current conversation and recent interactions that inform context-aware responses.

The deeper layers involve long-term memory: persistent knowledge including user preferences, past project summaries, and learned facts that accumulate over time. 

Retrieved information through RAG brings in external, up-to-date knowledge from documents, databases, or APIs when needed. 

Finally, available tools define the function calls and capabilities the agent can invoke, from file operations to API calls to complex calculations.

Additionally, modern agents need structured output definitions that specify the expected format of responses through JSON schemas, contracts, and templates. The gulf between a "cheap demo" and a "magical" agent lies in how deliberately you assemble this package for each task, making it concise, relevant, and immediately actionable rather than overwhelming the model with irrelevant noise.
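To make the layering concrete, here is a minimal sketch of how a pre-LLM system might compose these six layers into a single prompt. The `ContextPackage` container and `build_context` helper are illustrative assumptions, not part of any framework mentioned in this article:

```python
from dataclasses import dataclass, field

@dataclass
class ContextPackage:
    """Illustrative container for the six context layers described above."""
    system_prompt: str                 # rules, examples, behavioral guidelines
    user_prompt: str                   # the immediate task or question
    history: list[str] = field(default_factory=list)           # short-term conversation state
    long_term_memory: list[str] = field(default_factory=list)  # preferences, learned facts
    retrieved: list[str] = field(default_factory=list)         # fresh knowledge (RAG, APIs)
    tools: list[dict] = field(default_factory=list)            # function-call schemas, passed to the API separately
    output_schema: str = ""            # JSON schema / contract for the response

def build_context(pkg: ContextPackage) -> str:
    """Compose the layers into one prompt, keeping only sections the task needs."""
    sections = [
        ("SYSTEM", pkg.system_prompt),
        ("MEMORY", "\n".join(pkg.long_term_memory)),
        ("RETRIEVED", "\n".join(pkg.retrieved)),
        ("HISTORY", "\n".join(pkg.history[-5:])),  # trim to recent turns
        ("OUTPUT FORMAT", pkg.output_schema),
        ("TASK", pkg.user_prompt),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)
```

Because the package is program output rather than a hand-written prompt, each layer can be tested, versioned, and trimmed independently.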

The Case Against RAG for Coding Agents

While retrieval-augmented generation (RAG) works well for support bots, document Q&A, and general knowledge tasks, Nik Pash argues it's often counterproductive for autonomous coding agents. 

As Nik puts it: "If you're optimizing for quality, if you're trying to build something that codes like a senior engineer, RAG is a black hole that will drain your resources, time, and degrade reasoning."

The problem is fundamental: complex coding tasks require multi-hop reasoning. 

As Cursor's Aman Sanger notes, "The hardest questions and queries in a codebase require several hops. Vanilla retrieval only works for one hop." 

Real coding work involves scanning folder structures, following import chains, reading multiple related files, running tests, and iterating, not consuming isolated code snippets.

Overfeeding chunked snippets creates what Pash calls "context pollution," diluting the agent's judgment with a "swamp of chunked snippets" that degrades reasoning. With today's larger context windows and stronger models, the bottleneck isn't size; it's context quality.

How Modern Agents Explore Information Landscapes

The fundamental shift, as Jason Liu explains, is that "agents don't just consume information, they explore information spaces." This requires a different approach to tool design and system architecture that prioritizes navigation over passive consumption.

Tool response design shapes cognition. Instead of returning raw data dumps, effective tools provide what Liu calls "peripheral vision": metadata, facets, IDs, types, paths, and brief summaries that help agents understand the landscape of available information and decide what to explore next. This architectural choice fundamentally changes how agents think about problems, moving from linear processing to strategic exploration.
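As a hedged illustration, a search tool built this way might return a compact map of the result space rather than raw text. Every field name and value below is an assumption made for the sketch, not a standard:

```python
def search_docs(query: str) -> dict:
    """Hypothetical tool response: metadata-first, so the agent can decide
    what to explore next instead of wading through a raw dump."""
    return {
        "query": query,
        "total_hits": 42,                                  # made-up numbers for the sketch
        "facets": {"type": {"guide": 30, "api_ref": 12}},  # landscape overview
        "results": [
            {
                "id": "doc-381",
                "type": "guide",
                "path": "guides/auth/oauth.md",
                "summary": "OAuth2 setup for service accounts.",  # brief summary, not full text
            },
        ],
        "next_actions": ["read(id)", "refine(facet)"],  # cues for navigation
    }
```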

Architecture choices create dramatic differences in context quality. Liu's research reveals a stark contrast: slash commands that dump logs and diagnostics into the main reasoning thread create an overwhelming 91% noise with only 9% signal. 

In contrast, subagents that work off-thread and return compact, high-signal results maintain remarkable clarity with 76% signal and just 24% noise. This isn't a marginal improvement; it is the difference between an agent that drowns in information and one that navigates it effectively.

Techniques like compaction (rolling up results and state into clean summaries) preserve progress while keeping the working set lean and focused. As Liu notes, "If in-context learning is gradient descent, then compaction is momentum." It allows agents to build understanding incrementally without losing the trajectory of their reasoning or getting bogged down in historical details.
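In practice, a compaction step can be as simple as replacing older turns with a model-written summary. This sketch assumes a generic `llm` callable that maps a prompt string to a completion string; the message format is illustrative:

```python
def compact(history: list[dict], llm, keep_recent: int = 5) -> list[dict]:
    """Roll older turns into a single summary message, keeping the working set lean.
    `llm` is any callable mapping a prompt string to a completion string."""
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = llm(
        "Summarize the key decisions, open questions, and current state from:\n"
        + "\n".join(f"{m['role']}: {m['content']}" for m in older)
    )
    # The summary preserves trajectory; the recent turns preserve detail.
    return [{"role": "system", "content": f"Summary of prior work: {summary}"}] + recent
```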

Practical patterns for better context engineering

Successful context engineering follows several key principles that distinguish production systems from academic experiments.

  1. System design starts with treating context as program output. Rather than hand-crafting static prompts, build a pre-LLM system that dynamically composes instructions, memory, retrieved facts, and tool schemas based on the specific task at hand. This programmatic approach allows for consistency, testability, and systematic improvement over time. As Liu emphasizes, teams must choose their form factor deliberately, deciding between chatbot, workflow, or research artifact based on economic realities rather than following the latest hype. The key is owning your context window by summarizing long-term memory, capturing key decisions, and specifying precise structured outputs that guide model behavior.

  2. For coding agents specifically, the conventional wisdom about RAG needs to be challenged. Rather than feeding agents chunked code snippets, give them tools for directory listing, file reading, AST (Abstract Syntax Tree) parsing, test execution, and diff analysis, essentially mirroring how senior engineers actually navigate codebases (see the sketch after this list). As Pash observes, successful coding agents like Cline "read code like a human: walking folder trees, analyzing imports, parsing ASTs, reasoning about where to go next." This agentic exploration approach proves far more effective than passive consumption of retrieved snippets. Liu's research reinforces this with a practical recommendation: get one passing test with perfect tool access before building any orchestration infrastructure.

  3. Tool and response engineering requires thoughtful design. Effective tools return metadata, IDs, types, paths, facets, and brief summaries rather than overwhelming agents with raw data dumps. The goal is providing enough context for intelligent navigation without cognitive overload. Using subagents for noisy work keeps the main reasoning thread clean by having specialists fetch, analyze, and compact results off-thread before presenting them to the primary agent. Strategic compaction becomes essential, regularly rolling up context into concise state summaries that preserve trajectory without losing essential information about the agent's reasoning path.
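Here is a minimal sketch of what exploration-style tools for a coding agent could look like, using only Python's standard library. The three functions are illustrative, not Cline's actual tool set:

```python
import ast
import os

def list_tree(root: str, max_depth: int = 2) -> list[str]:
    """Walk the folder structure so the agent can see the shape of the codebase."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        if depth >= max_depth:
            dirnames[:] = []  # stop descending past max_depth
            continue
        paths.extend(os.path.join(dirpath, f) for f in filenames)
    return paths

def read_file(path: str) -> str:
    """Plain file read: the agent follows leads hop by hop, like an engineer."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def list_imports(path: str) -> list[str]:
    """AST parsing: surface a file's import chain so the agent knows where to hop next."""
    names = []
    for node in ast.walk(ast.parse(read_file(path))):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.append(node.module)
    return names
```

With tools like these, a multi-hop question ("where is this config value actually consumed?") becomes a sequence of deliberate hops rather than a single lossy retrieval.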

Measuring Context Quality

Context engineering isn't just about design; it is also about measurement and continuous improvement. Treating context like a product surface requires specific KPIs that reveal the health of your agent's cognitive environment.

Signal-to-noise ratio in the main reasoning thread serves as the primary health metric, while time-to-first-useful-tool-call measures how quickly agents can orient themselves and begin productive work, a clear indicator of context clarity. 
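There is no standard formula for this ratio; one naive way to operationalize it is to label each main-thread message as signal or noise and track the character-level fraction, as in this sketch where the labeling predicate is entirely up to you:

```python
def signal_to_noise(thread: list[dict], is_signal) -> float:
    """Fraction of main-thread characters judged useful by a caller-supplied predicate."""
    signal = sum(len(m["content"]) for m in thread if is_signal(m))
    total = sum(len(m["content"]) for m in thread)
    return signal / total if total else 0.0

# Example: treat raw tool-log dumps as noise, assistant reasoning as signal.
ratio = signal_to_noise(
    [
        {"role": "tool", "content": "ERROR connect timeout\n" * 200},   # log dump
        {"role": "assistant", "content": "Root cause: missing env var."},
    ],
    is_signal=lambda m: m["role"] == "assistant",
)
```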

Compaction frequency and information loss help optimize summarization strategies, ensuring you preserve essential reasoning while eliminating cognitive clutter. Success rate by task form factor guides architectural choices by revealing which patterns work best for different types of problems.

Tracking regressions tied to tool response changes helps maintain system reliability as you evolve your context engineering approaches.

Finally, for RAG scenarios specifically, measuring retrieval contribution to final answers (rather than just recall metrics) helps teams prefer structured summaries over raw chunks. 

This outcome-focused measurement drives better design decisions about when and how to use retrieval versus other context sources.

Operational Practices Matter as Much as Technical Metrics 

Keep an evidence log and compact it regularly to maintain historical context without overwhelming current reasoning. Version your tool schemas and output contracts to ensure consistency as systems evolve. 

Add lightweight governance for new tool integrations to prevent context pollution from well-meaning but poorly designed additions. Use explicit structured outputs as guardrails wherever possible to maintain predictable agent behavior. 
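One common way to implement such guardrails is an explicit output model that rejects anything off-contract. This sketch assumes Pydantic v2; the `TriageResult` contract is a made-up example:

```python
from pydantic import BaseModel, ValidationError

class TriageResult(BaseModel):
    """Illustrative output contract: the agent must return exactly these fields."""
    severity: str    # e.g. "low" | "medium" | "high"
    component: str
    summary: str
    next_step: str

def parse_agent_output(raw_json: str) -> TriageResult | None:
    """Reject malformed output up front instead of letting it drift downstream."""
    try:
        return TriageResult.model_validate_json(raw_json)
    except ValidationError:
        return None  # caller can re-prompt, attaching the schema as a reminder
```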

Remember Schmid's insight: "Most agent failures are not model failures anymore, they are context failures." This perspective shifts debugging focus from model capabilities to context quality, often revealing more actionable improvement opportunities.

Core principles for implementation

Great agents aren't born from better prompts; they're engineered through context. The discipline involves three core principles that separate successful implementations from academic exercises.

First, design for exploration, not consumption. Build tools and architectures that support agentic navigation of information spaces rather than passive ingestion of pre-selected content. This means giving agents the ability to make decisions about what information to seek next based on their current understanding and goals. The exploration mindset fundamentally changes how agents approach problems, moving from linear processing to strategic investigation.

Second, optimize for signal, not size. Use techniques like subagents, compaction, and structured responses to maintain context quality as systems scale. The goal isn't to cram more information into context—it's to ensure that every piece of information contributes meaningfully to agent reasoning. This optimization often means saying no to additional data sources that seem helpful but actually introduce noise.

Third, measure and iterate relentlessly. Treat context as a product surface with specific KPIs and operational practices that drive continuous improvement. Without measurement, context engineering becomes an art rather than an engineering discipline. The most successful teams track both technical metrics and business outcomes, creating feedback loops that improve agent reliability over time.

Use RAG selectively and with structure for knowledge tasks; lean into agentic exploration for code. Invest in tool design, compaction systems, subagent architectures, and outcome-oriented form factors. The goal isn't to build cleverer demos—it's to create dependable systems that turn AI potential into business value.

The future of context engineering

Context engineering represents a fundamental shift in how we think about AI systems. As Schmid concludes, "Building powerful and reliable AI Agents is becoming less about finding a magic prompt or model updates. It is about the engineering of context."

The convergence of several trends makes this especially relevant now. Larger context windows mean the constraint has shifted from quantity to quality: agents can now process vast amounts of information, but only if it's well-structured and relevant.

Better reasoning models can handle complex, multi-step tasks when given proper context, but they're also more sensitive to context pollution that can derail their reasoning. 

Standardization efforts like the Model Context Protocol (MCP) and Agent Client Protocol (ACP) are making context engineering more systematic, providing shared vocabularies and patterns that enable better tooling and interoperability. 

Perhaps most importantly, economic pressure is pushing teams toward agents that actually deliver business value rather than cool demos, creating market demand for the reliability that only good context engineering can provide.

As Liu notes, "The future of RAG isn't about better embeddings or larger context windows—it's about teaching agents to navigate information spaces systematically." This navigation metaphor captures the essential shift from passive consumption to active exploration that defines modern agent architecture.

Suggested Further Reading

For deeper exploration, see Liu's full series on faceted search, subagent architectures, compaction techniques, and rapid prototyping methods.

"Context Engineering Series: Building Better Agentic RAG Systems" - Advanced patterns and measurement approaches

"Why I No Longer Recommend RAG for Autonomous Coding Agents" - Critical analysis of RAG limitations

"The New Skill in AI is Not Prompting, It's Context Engineering" - Foundational framework and definitions


Joshua Berkowitz, September 5, 2025