claude-code/02 · Agents


Why agents matter (context cleanliness and focus), built-in agents, briefing agents well, foreground vs background, parallel execution, and custom agent specs.

📝 Wholesale AI Champions ⏱ 9 min read 📚 Building agents & skills

The single most underused capability in Claude Code is the ability to spawn an agent for work you need done but do not need to watch happen. Agents do the work in their own context and report back only the answer. Your main session never sees the searches, the file reads, or the intermediate reasoning that produced the result.

This matters for two reasons, and the second is the one most people underestimate.

The obvious reason is window size. Your main session has a finite context window, and every file Claude reads, every command it runs, and every search it performs takes a slice of that window. Agents keep that slice from ever landing in your session.

The less-obvious reason is signal-to-noise. Everything in your context is something Claude has to consider on every subsequent turn. The more junk that piles up, the more there is to scan through, the higher the chance that Claude latches onto something irrelevant, misreads the task, or gets distracted by a tangent in a file it read 15 minutes ago. As context fills with noise, focus degrades and answer quality drops, even when you are nowhere near the window limit. A clean session does not just last longer; it produces sharper results from the start.

Beyond context quality, there is a second major benefit: agents can run in the background. Your main session stays fully responsive while the agent works. You can keep typing, keep iterating, keep making progress, and the agent’s report shows up when it is ready. You can also kick off multiple background agents in parallel and let them all churn while you carry on. The slow operations (deep codebase searches, multi-file analysis, test runs) stop being a thing you wait for and become a thing you launch and forget.

These benefits compound. A focused session produces better results. A background-driven workflow keeps you moving while heavy work happens elsewhere. The other benefits (specialization, model selection) are real but secondary.


Why Agents Matter

Imagine you ask Claude “how does the cert completion event flow from inspection-nexus-api through inspection-workflow into the auto-advance handlers?” To answer that, Claude needs to read maybe 15 files, grep for a half-dozen patterns, follow a couple of inheritance chains, and reason through what it found. The answer at the end might be three paragraphs.

If you ask that question in your main session, you keep the answer, and you also keep all 15 file reads, all the grep output, and all the intermediate “let me check this” reasoning. The cost is not just window space. On every later turn in the session, Claude is sifting through that same pile of files looking for what is relevant to your next question. It will sometimes anchor on a pattern from one of those files when it is not what you are asking about. It will sometimes quote line numbers from a file you read an hour ago as if they are still relevant. It will sometimes give a longer, less focused answer because it has more it feels obligated to consider.

If you spawn an agent for the same question, the agent does all 15 reads, all the greps, all the analysis in its own context, and returns three paragraphs. Your main session sees only the three paragraphs. The next question you ask gets Claude’s full attention on what you actually said, not on the residual noise from the last hour of work.

The principle: when you need an answer but not the process that produced it, send the work to an agent. You get back the answer, and you protect the focus of the rest of the session.


What an Agent Actually Is

An agent is a sub-Claude instance spawned via the Agent tool. It runs in a fresh context with no memory of your conversation, executes the prompt you give it, and returns a single message back to your main session. Once it is done, it goes away. The next agent you spawn starts fresh.

Three properties matter:

  1. Isolated context: The agent does not see your conversation history. Anything it needs to know has to be in the prompt.
  2. Single return: The agent reports once, at the end. For one-shot agents, you do not interact with it mid-task. (Long-running teammate agents in guide 07 are the exception: you can tab into their sessions directly.)
  3. Tool access: The agent has its own tool set (often the same as yours, sometimes restricted). It can read files, run commands, call MCP tools, and in some configurations spawn its own sub-agents.

The mental model: you are giving a smart colleague a specific task with all the context they need, walking away, and reading their report when they come back.
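Concretely, a one-shot spawn reduces to a single tool call whose prompt must carry everything the agent needs. A hedged sketch of what that call might look like, using the dead-letter-queue example from earlier (the field names are illustrative of the shape, not the exact tool schema):

```yaml
# Illustrative shape of a one-shot agent spawn.
# Field names are assumptions, not a schema reference.
subagent_type: general-purpose      # which agent runs the task
description: Summarize DLQ handling # short label for the spawn
prompt: |
  Research how inspection-workflow handles dead-letter queues for
  Service Bus messages. Reply in under 200 words with file references
  for each finding.
```

Note that the prompt is fully self-contained: the agent never sees the conversation that produced it, so nothing can be left implicit.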


The Clearest Signal

The strongest indicator that work belongs in an agent: you would describe the goal as “find out X” or “summarize Y” rather than “do X step by step.” If you do not need to participate in the discovery, the discovery is a perfect fit for an agent.

Common shapes that fit this pattern:

  • Codebase questions: “How is feature toggle X evaluated across the workflow service?” “Where does the AVWeb portal call the inspection-services API?” “What test patterns are used for MediatR handlers in this repo?”
  • Cross-file analysis: “Compare how Repository A and Repository B handle retries. Are they doing the same thing?” “Which of these three classes is the right pattern to copy for a new HTTP client?”
  • Quick research: “Read the docs at this URL and tell me the three things I need to know to use the new SDK.” “What do the Microsoft.Azure.Functions.Worker 2.x release notes say about gRPC channel handling?”
  • Status-check work: “Look at PR #1646 and tell me whether the test changes match the spec in the implementation plan.” “Run the test suite and tell me what failed and why.”
  • Triage: “Skim the last 50 commits on the develop branch and group them by area. I want a sense of what has been changing.”

In every one of these, you want a paragraph or a list, not a transcript of the search.


Built-In Agents

Claude Code ships with a set of agents available immediately. Three are worth knowing about for everyday use.

general-purpose

The default. Runs in a fresh context with full tool access and executes whatever prompt you give it. Use this when no specialized agent fits and you want broad capability without thinking about which one to pick.

Spawn a general-purpose agent: "Research how the inspection-workflow service
handles dead-letter queues. Cover the trigger pattern, the retry policy, and
where the messages end up if all retries fail. Reply with a short summary
and references to the key files."

Explore

Specialized for fast codebase exploration. Use when the task is “find files matching X,” “search for Y across the repo,” or “answer this question about how the codebase works.” The agent is tuned to be fast and not over-explore.

Spawn an Explore agent (medium thoroughness): "Find all Azure Function entry
points in this repo that subscribe to Service Bus topics. List the function
name, the topic, and the subscription filter for each."

The thoroughness level matters: quick for basic searches, medium for moderate exploration, very thorough for comprehensive analysis. Pick the lowest level that will produce a useful answer.

Plan

Designs implementation strategies without writing code. Use when you want a step-by-step plan for a task before you start, or when you want a second opinion on an approach.

Spawn a Plan agent: "I need to add a new HTTP client to inspection-workflow
that calls the inspection-services API to complete a service. There is no
existing client of this type. Design the implementation, including DI
registration, retry/timeout policy, and where the client interface should
live. Return the plan as numbered steps."

Briefing an Agent Well

The agent has zero context from your conversation. Every prompt has to stand on its own. Three rules cover most cases.

1. State the goal up front. The agent’s first read of the prompt should make the objective unambiguous. Avoid “I am working on X and was wondering if you could look into Y…” Lead with what you want.

2. Give the agent the context it needs, not what is convenient. If your question depends on a specific file, name it. If it depends on a recent change, summarize the change. The agent cannot ask you a follow-up question, so anticipate what it needs to know.

3. Specify the response format. “Reply with a short summary.” “Return a bulleted list of files and what each does.” “Cap the response at 200 words.” The agent will optimize toward whatever shape you ask for. If you do not specify, you get whatever shape it picks, which is often longer than you wanted.

A weak prompt: “Look into how dead letter queues work.”

A strong prompt: “Research how inspection-workflow handles dead-letter queues for Service Bus messages. Specifically: what trigger pattern is used, what retry policy is in place, and where DLQ’d messages end up. Reply in under 200 words with file references for each finding.”

The strong version produces a useful answer on the first try. The weak version produces a wandering response that you have to follow up on, which costs you another agent spawn.


When NOT to Use an Agent

Agents are not free. The setup cost is the prompt you have to write, and the iteration cost is real if the agent comes back with the wrong shape of answer. Some work is faster to do in your main session.

  • Conversational refinement: When you need to iterate (“show me that file,” “now look at the related test,” “what about line 84?”), an agent is the wrong tool. The whole point of an agent is the one-shot return. Refinement is what your main session is for.
  • Trivial lookups: If the answer is one file or one command away, just do it in the main session. Spawning an agent for “what does this file say” is overhead with no benefit.
  • Tasks that need your judgment mid-stream: If you would interrupt the agent at step 3 to redirect it, you should be doing the work yourself.
  • Things you actually want to learn: If the goal is for you to understand the system, watching the searches happen is the value. Send the work to an agent only when the answer is the value, not the journey.

Custom Agents

Claude Code also lets you define your own agents. They live in .claude/agents/{name}.md (project-level) or ~/.claude/agents/{name}.md (global). Each agent is a markdown file with frontmatter (name, description, model, tools) and a prompt body that defines its behavior.
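As a sketch, a project-level spec might look like this. The agent name, tool list, and body below are hypothetical, written against the frontmatter fields named above; see AGENT-GUIDELINES.md for the actual conventions:

```markdown
---
# .claude/agents/retry-policy-reviewer.md (hypothetical example)
name: retry-policy-reviewer
description: Reviews retry/timeout policy in HTTP clients against team conventions
model: sonnet
tools: Read, Grep, Glob
---

You review retry and timeout configuration in HTTP clients.

When given a client name, find its registration and policy setup, compare
it to the dominant pattern in the repo, and reply with a short pass/fail
summary plus file references. Do not propose code changes.
```

The frontmatter sets identity, model, and tool access; the body is the standing brief the agent runs under on every invocation.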

The Sith planning repo uses this heavily, with specialized agents for code review, repo discovery, ADO sync, framework analysis, and others. See github.com/CVNA-Wholesale/Wholesale-Sith-Happens/tree/master/.claude/agents for working examples and AGENT-GUIDELINES.md for the conventions we follow.

The two highest-impact agents in the catalog are worth a closer look because they make the context-boundary value visible at scale.

feature-decomposition


Takes raw feature-level input (a product brief, a PRD, a meeting transcript, a wall of comments) and progressively breaks it down into a story-level outline and full story specs ready for implementation. It is a multi-phase orchestrator: it builds a knowledge base from source materials, identifies open questions, defines the shape of the work, decomposes it into stories, and then delegates the per-story discovery work to planning-discovery agents.

The spec is roughly 350 lines. It carries a methodology guide, story templates, the repo catalog, fetch scripts, and verification rules using the DB and Roslyn MCPs. Running this work inline would mean loading all of that into your main session, then coordinating multiple discovery passes, then synthesizing the results, then writing the stories yourself. The output you actually want is a folder of finished story specs.

planning-discovery


Takes a single business requirement or ADO work item and produces a complete implementation-ready story with self-contained tasks. It does the deep work: figures out which repositories need changes, queries the live database schema to verify column names and types, uses Roslyn to confirm method signatures and interface contracts, finds existing patterns in the codebase to copy, and writes a story spec precise enough that an implementation agent can execute it without asking clarifying questions.

The spec is roughly 420 lines. A single invocation might read 30 files, run 10 schema queries, walk 5 type hierarchies, and synthesize all of that into a story document. The output you want is the story document.

Why these are the right examples

These two agents make the case for the context boundary more clearly than any abstract argument. The amount of detail they consume to produce their output is enormous. The amount of detail you actually need to see is small: a finished story spec, a finished feature breakdown.

If this work happened in your main session, every later turn would be wading through hundreds of file reads, dozens of schema query results, and several rounds of intermediate analysis. By the time you sat down to review the story, your session would already be cluttered with everything that went into producing it, and Claude’s focus on whatever you do next would be diluted by all of it.

By delegating to these agents, you keep the inputs (a brief, a work item number) and the outputs (the spec) in your session, and nothing in between. The 30 file reads and 10 schema queries happen in someone else’s context.

When to build your own

Custom agents are worth building when:

  • You find yourself writing the same long prompt for the same kind of agent task more than twice.
  • The task has team-specific context (conventions, file layouts, terminology) that a generic agent would not know.
  • You want a specific model (opus, sonnet, haiku) for a specific kind of work.
  • The work routinely involves heavy discovery whose details you do not need to see.

A starter pattern: write the agent as if you were briefing a new team member who has full repo access but zero project context. The agent’s prompt is your standing brief; the per-invocation prompt is the specific assignment.


Foreground vs Background

Agents run in the foreground by default: your session waits for them to finish before continuing. That is the right choice when the agent’s output drives your next decision and there is nothing useful you could do in the meantime.

Background agents (run_in_background: true) should be your default everywhere else. The session stays interactive while the agent runs. You keep typing, keep asking questions, keep editing files, keep making progress. Claude Code notifies you when the agent completes, so you do not have to poll or check on it. The cost of running an agent in the background versus foreground is essentially zero, and the productivity gain is large because the slow part of the work is no longer blocking you.

Multiple background agents can run in parallel. This is the bigger unlock. If you have three independent questions, you spawn three agents at once and let them all churn. If you need to analyze five repos for the same pattern, you launch five Explore agents in a single message and they run concurrently. By the time you have finished writing the next paragraph or implementing the next change, several reports are waiting for you.

The decision rule:

  • Foreground: the agent’s answer is the next thing you need, and you have nothing else productive to do while you wait. Common for short, targeted lookups.
  • Background: the agent’s answer is useful but not blocking, or you have other work you can make progress on in parallel. This covers the vast majority of agent use.
  • Multiple background agents at once: any time you have two or more independent tasks. Spawn them in a single message so they start concurrently rather than sequentially.

A practical pattern: when you spot three or four exploratory questions stacking up, send them all out as background agents in one batch and keep working on whatever is in front of you. By the time you need their answers, they are sitting in your message queue.
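The batch itself is nothing more than one message that names each independent task. A hypothetical example (the repos and questions are illustrative):

```
Spawn three background Explore agents, one per task:
1. inspection-nexus-api: how are domain events published after the
   database transaction commits?
2. inspection-workflow: what retry policy do the Service Bus triggers use?
3. inspection-services: which existing HTTP client is the right pattern
   to copy for a new client?
Each should reply with a short summary and file references.
```

Because all three tasks arrive in one message, the agents start concurrently rather than one after another.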


Putting It Together

A common pattern once you build the habit: you are mid-implementation and you hit a question. “How does the existing OdometerUpdate handler publish its event after the transaction commits?” You could read the code yourself, but you do not actually need to know the details. You need one piece of information: the pattern.

You spawn a background general-purpose agent with a tight prompt: “Find OdometerUpdateCommandHandler.cs in inspection-nexus-api. Read it and explain how the event publishing works, specifically the placement relative to the database transaction. Reply in under 100 words.” You keep working on your current change. Thirty seconds later, the answer arrives in your message queue. Your main session never opened the file and never paused.

The bigger pattern is parallelism. Say you are about to start a new feature that touches three repos and you want to know what existing patterns to follow in each. Spawn three background Explore agents in a single message, each pointed at one repo, each asking for the relevant pattern. Continue planning your approach while all three run. By the time you are ready to start writing code, all three reports are waiting for you, and your main session has none of the search noise from any of them, no irrelevant files clouding Claude’s view of what you are doing, just three clean answers ready to act on.

That habit, sending exploratory and analytical work to background agents instead of doing it inline, is the difference between Claude staying sharp and focused for a multi-hour session and Claude getting progressively distracted by accumulated noise. It is also the difference between waiting on Claude and Claude waiting on you.

The next guide in this series covers skills: turning the prompts you write more than twice into named slash commands you can invoke in one keystroke.