
Reflection and feedback loops

Manual reflection — telling Claude to update skills, agents, and CLAUDE.md directly — plus the structured capture-analyze-apply loop, the process-feedback agent, and how to start the loop on your team.


The previous guides cover the tools you use to get work done. This one covers the system that makes those tools better over time. Every implementation session produces signal about what worked and what did not. That signal either evaporates when the session ends or it gets captured, analyzed, and turned into concrete changes to the system that produced it. The difference between a workflow that stays static and one that improves is whether there is a mechanism that closes that loop.


Start Small: Manual Reflection

Before you build any infrastructure, the simplest form of reflection is just telling Claude what to change. Skills, agents, and CLAUDE.md are all markdown files — Claude can edit them directly.

Refining a skill:

"The docs-writer skill keeps adding overly long introductions.
Update it to keep opening paragraphs to 1-2 sentences."

Claude edits .claude/commands/docs-writer.md and the behavior changes immediately.

Refining CLAUDE.md:

"We keep getting test files without assertions. Add a rule to
CLAUDE.md: every test must have at least one meaningful assertion."

Claude adds the rule to your project’s CLAUDE.md. Every future session loads it automatically.
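
A hedged sketch of what the added rule might look like — CLAUDE.md has no fixed schema, so the section name and wording below are illustrative, not a required format:

```markdown
<!-- illustrative addition to CLAUDE.md -->
## Testing
- Every test must contain at least one meaningful assertion. A test that only
  executes code without asserting on its behavior does not count as coverage.
```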

Refining an agent:

"The code-reviewer agent flags too many minor style issues. Update
its system prompt to only report Critical and Important severity."

Claude edits .claude/agents/code-reviewer.md and the agent’s behavior shifts next time it’s spawned.
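
Agent specs are markdown files: a short frontmatter block followed by the system prompt. A sketch of what the updated spec might look like after that edit (the description and prompt wording are illustrative, not our exact file):

```markdown
---
name: code-reviewer
description: Reviews changes for correctness, security, and maintainability issues
---

You are a code reviewer. Report only findings of Critical or Important
severity. Do not report minor style issues; leave those to the linters.
```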

This is one of the most underused features of Claude Code. Your configuration files are not static — they are living documents that should evolve as you learn what works. Treat every friction point as an opportunity to update a skill, agent, or CLAUDE.md rule. The structured loop below is what you scale to once manual edits start to feel inefficient.


The Core Idea

AI-assisted development is iterative in two senses. The obvious one is within a session: you implement, test, refine, repeat. The less obvious one is across sessions: patterns emerge, friction recurs, conventions get discovered, rules get violated, and the system either learns from this or it does not.

The feedback loop we built has three stages:

  1. Capture: every implementation session produces a structured artifact documenting what happened, what worked, what did not, and what the developer would change.
  2. Analyze: a dedicated agent reads all unprocessed artifacts, identifies recurring patterns, searches the existing rules and agents for coverage gaps, and proposes specific changes.
  3. Apply: approved changes are applied directly to the rule files, agent specs, templates, and guides that govern future sessions.

The output of stage 3 becomes the input to stage 1 of the next session. The planning system, the agents, and the skills all improve from the evidence of their own use.


Stage 1: Capture (RPIV Sessions)

The RPIV (Research, Plan, Implement, Validate) session guide produces a structured artifact for every implementation session. The artifact is committed to .process-feedback/{workItemId}-rpiv-session.md in the implementation repo.

Guide 08 covers the validation aspects of RPIV in detail. For the feedback loop, the important sections are the Session Log and the Reflection.

The Session Log captures observations as they happen: decisions made and why, friction encountered and what caused it, deviations from the plan and whether they were justified, moments where the agent needed correction, moments where the spec was wrong. This is raw signal, recorded in real time rather than reconstructed after the fact.

The Reflection distills the session into three categories:

  • What worked well: specific things that went smoothly, which parts of the workflow or specifications contributed to success, and why. Not “the story was good” but “the story’s acceptance criteria were specific enough to validate without asking questions.”
  • What didn’t work well: specific friction points, blockers, and inefficiencies. Not “testing could be improved” but “spent 20 minutes figuring out the test pattern because no examples existed in the repo.”
  • Suggestions for improvement: concrete, actionable changes to the workflow, tooling, specifications, codebase, or process based on what happened.

The artifact also includes a Verification Integrity section (covered in guide 08) that captures any verification corruption patterns observed during the session. These observations feed directly into the team’s improvement metrics.
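
A skeleton of the artifact might look like the following. This is a sketch of the structure described above, not the exact template our guide produces; the section names are the part that matters, and the entries shown are illustrative:

```markdown
<!-- illustrative skeleton; adapt section contents to your own session guide -->
# RPIV Session — {workItemId}

## Session Log
- 10:42 — Deviated from plan: added a null check the spec did not mention (justified: prod data contains nulls)
- 11:15 — Friction: no existing test pattern for this component; spent ~20 min finding one

## Reflection
### What worked well
- Acceptance criteria were specific enough to validate without follow-up questions
### What didn't work well
- No test examples in the repo for this component
### Suggestions for improvement
- Add a reference test pattern to the repo or to the task template

## Verification Integrity
- No verification corruption patterns observed this session
```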

The key property of the RPIV artifact: it is committed to source control in the implementation repo. It does not live in someone’s notes, a Slack thread, or a conversation that gets closed. It persists in a known location where the analysis stage can find it.

The /sith:rpiv-session skill (guide 01 introduced the concept of skills) automates the setup: it detects the work item, creates the artifact, captures the baseline, and guides the developer through the session structure. Running it at the start of every session is the single action that feeds the rest of the loop.


Stage 2: Analyze (Process Feedback)

The analysis stage is where individual session observations become systemic improvements. The mechanism is a dedicated agent that reads all unprocessed feedback artifacts and cross-references them against the current state of the planning system.

The artifacts land in the planning repo’s .process-feedback/ folder. Implementation repos commit their RPIV artifacts locally, and the planning repo accumulates feedback from across the team. The separation matters: implementation repos track what happened in that specific codebase; the planning repo tracks what it means for the system as a whole.

Running /sith:process-feedback launches the analysis. It:

  1. Reads every .md file in .process-feedback/ (excluding processed/ and runs/)
  2. Reads the existing rules, agent specs, templates, and guides that govern the planning and implementation process
  3. Identifies patterns that recur across sessions: friction that appeared in 2+ artifacts, conventions that were violated because they were undocumented, gaps in templates that caused unnecessary rework
  4. For each pattern, searches the existing system to determine whether coverage already exists. If a rule file already addresses the issue, the pattern is noted but no change is proposed. If no coverage exists, a specific edit is proposed: which file to modify, what to add, and why.
  5. Writes a run report to .process-feedback/runs/run-{date}.md

The run report is the deliverable. Each recommendation includes:

  • The pattern observed and which sessions exhibited it
  • Whether existing rules cover it (and if so, where)
  • A proposed edit with the exact file, section, and content
  • The rationale linking the evidence to the proposed change
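
A single recommendation in a run report might read roughly like this — a hedged sketch in which the field names and placeholder session IDs are illustrative, not our exact report format:

```markdown
<!-- illustrative run report entry -->
### Recommendation: add reference test patterns to task specs
- Pattern: developers spent significant time locating test patterns because none were referenced in the task spec
- Sessions: {workItemId-A}-rpiv-session.md, {workItemId-B}-rpiv-session.md
- Existing coverage: none found in planning/.agent-rules/ or the task template
- Proposed edit: add a "Reference test patterns" field to the task template
- Rationale: the friction recurred across sessions and is preventable at spec time
- Decision: pending review (approve / reject with reason / defer with reason)
```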

The analysis agent does not apply changes. It proposes them. The developer reviews each recommendation and approves, rejects, modifies, or defers. Rejected and deferred items include the developer’s stated reason so that future analysis runs do not blindly re-propose the same change.

This is what makes the loop self-correcting rather than self-reinforcing. The analysis agent sees the patterns; the human decides which ones are worth acting on. Bad proposals get rejected with context. Good proposals get applied. The system improves only in directions the team validates.


Stage 3: Apply (System Changes)

Approved changes are applied by a second agent (process-feedback-implementation) that takes the run report and the list of approvals and makes the edits. The changes land in the files that govern how agents, skills, and planning processes behave:

  • Rule files (planning/.agent-rules/): discovery checklists, quality standards, template requirements, technical patterns
  • Agent specs (.claude/agents/): behavioral rules, methodology references, verification requirements
  • Skills (.claude/skills/): workflow steps, argument handling, output conventions
  • Templates (planning/.agent-rules/core/, validation/templates/): story and task structure, validation plan format
  • Guides (planning/guides/): session guides, deployment process, known-issue catalogs

After applying changes, the implementation agent archives fully-addressed feedback files to .process-feedback/processed/ and creates .todo/ items for deferred investigations. The processed folder serves as a historical record; the todo folder tracks work that needs future attention.

The result: the next agent that runs a discovery pass, writes a story, or implements a task operates under the updated rules. The friction that was documented in an RPIV artifact two weeks ago is structurally prevented from recurring.


What This Looks Like in Practice

A concrete cycle from our team’s recent work:

  1. Session: a developer implements a story in inspection-workflow. During implementation, the agent removes LaunchDarkly toggle infrastructure (NuGet package, DI registration, health check) as part of removing the last toggle usage. PR review catches this and requires rework. The developer notes in the RPIV reflection: “the team convention is to retain toggle infrastructure even when the last usage is removed.”

  2. Analysis: the next process-feedback run reads this artifact alongside three others. It identifies “undocumented team convention” as a recurring pattern (this is the third time a convention existed only in developers’ heads and was violated by an agent). It searches planning/.agent-rules/technical/team-conventions.md and finds no entry for toggle infrastructure retention. It proposes adding a convention entry with the rule and the evidence.

  3. Apply: the developer approves. The implementation agent adds the convention to team-conventions.md. The planning-discovery agent now loads this file during story creation and includes the convention in task specs for any story that touches toggle infrastructure.

  4. Next session: a different developer implements a toggle removal story. The task spec includes “Preserved Behaviors: retain LaunchDarkly infrastructure (NuGet package, DI registration, health check) even when removing the last toggle usage.” The agent follows the spec. PR review passes without rework.

The convention went from “in someone’s head” to “violated, caught in PR review, noted in RPIV” to “documented in the system, enforced in future specs.” That progression happened through the feedback loop, not through a planning meeting.
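
For illustration, the convention entry the implementation agent adds might look something like this — a sketch that follows whatever structure your conventions file already uses, with the work item reference left as a placeholder:

```markdown
<!-- illustrative entry in planning/.agent-rules/technical/team-conventions.md -->
## Feature toggle infrastructure
When removing the last usage of a feature toggle, retain the LaunchDarkly
infrastructure: the NuGet package, the DI registration, and the health check.
Evidence: PR rework on {workItemId}; see the archived RPIV artifact in
.process-feedback/processed/.
```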


The Run History

Each analysis run produces a report in .process-feedback/runs/. The reports accumulate as a record of what the system has learned:

  • Run 2026-03-27 (first run): 18 items analyzed, 5 approved. Agent tool unavailability fallback, data migration script standards, behavior reversal test enumeration, mixed .NET Framework/Core cross-compilation risks, debug artifact checks.
  • Run 2026-04-02: 5 items, 4 approved. Resource file conventions, dotnet.exe build failure detection, story update validation, package source documentation.
  • Run 2026-04-04: 21 items, 12 approved. Largest single run. Consolidated run format, Scrutor compatibility detection, SqlClient Extensions.Azure requirement, rg fallback to grep, Roslyn structural verification enforcement.
  • Run 2026-04-08: 21 items analyzed. Non-migration discovery patterns, portal mirrored business logic detection, ADO auth failure handling.

Each run builds on the prior ones. The analysis agent reads prior run reports to avoid re-proposing rejected items and to track whether deferred items have accumulated enough new evidence to warrant reconsideration.


Where the Artifacts Live

| Artifact | Location | Purpose |
| --- | --- | --- |
| RPIV session artifacts | {implementation-repo}/.process-feedback/{workItemId}-rpiv-session.md | Per-session evidence: what happened, what worked, what didn’t |
| Discovery session artifacts | {planning-repo}/.process-feedback/{workItemId}-discovery-session.md | Per-discovery evidence: what the planning agent found and struggled with |
| Run reports | {planning-repo}/.process-feedback/runs/run-{date}.md | Per-analysis output: patterns found, proposals made, decisions recorded |
| Processed artifacts | {planning-repo}/.process-feedback/processed/ | Archive of fully-addressed feedback files |
| Deferred work | {planning-repo}/.todo/ | Items flagged for future investigation |

Starting the Loop on Your Team

The infrastructure is minimal. The three things you need:

  1. A .process-feedback/ folder in your implementation repos, gitignored if you prefer or committed if you want the artifacts in source history. This is where RPIV session artifacts land.

  2. A session guide that your developers run at the start of implementation sessions. Ours lives at planning/guides/rpiv-session-guide.md and is invoked via /sith:rpiv-session. The guide does not need to be complex. It needs to produce a structured artifact with a Session Log, a Reflection, and a Verification Integrity section. The structure is what makes the analysis stage possible.

  3. A periodic analysis pass over the accumulated artifacts. This can be as simple as reading the artifacts yourself and identifying patterns, or as automated as our /sith:process-feedback skill that launches an agent to do the cross-referencing. The analysis is valuable at any level of automation; starting manually and adding tooling as the volume grows is a reasonable path.
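
If you are starting manually, the analysis pass can be a single prompt over the accumulated artifacts, something along these lines:

"Read every .md file in .process-feedback/, excluding processed/ and runs/.
List the friction points that appear in two or more sessions, and for each
one tell me whether CLAUDE.md, our agent specs, or our conventions docs
already cover it. Propose a specific edit where they do not."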

The feedback loop does not require the full planning repo infrastructure to be valuable. A team that captures RPIV artifacts and reads them together once a sprint will find patterns. A team that feeds those patterns back into their CLAUDE.md, their agent specs, or their conventions documentation will stop repeating the same mistakes. The tooling accelerates this, but the habit is the foundation.


Reflection at Different Cadences

Where you reflect — and what you change — depends on the scale of the signal.

| Level | What to reflect on | What to update |
| --- | --- | --- |
| Session | “That skill kept doing X wrong” | Edit the skill or CLAUDE.md directly |
| Feature | “Planning missed Y, discovery didn’t catch Z” | Write a reflection file for future planning skills to load |
| Quarter | “We keep hitting the same class of problems” | Update team conventions, add new verification gates, refine agent prompts |

Start with session-level reflection — it’s free and immediate. Graduate to structured reflection files when you’re running multi-step workflows that span sessions. Bring in the full capture-analyze-apply loop when feedback volume across the team makes manual review impractical.