Proposal — targeted for a future release

EPAV as an Autonomous Loop

Loop engineering is the discipline of designing the loop an AI agent runs inside — act, observe, decide, repeat — until a verifier says done. EPAV is already an engineered loop with a human gate and a hard verifier. This guide shows how to close it, so the apply ↔ validate cycle retries itself within an approved plan instead of waiting for a human to crank every phase.

Why a Loop

Today, every phase transition in EPAV is a human action. If /validate fails, a human reads the failures and manually restarts. That works — but the agent is capable of consuming its own validation failures and fixing them, and the whole point of a verifier is that it can be trusted to say "not done yet" without a human re-reading everything.

Two principles from loop engineering drive this design:

EPAV's existing one task, no scope creep rule is what makes autonomy safe here: retries can only iterate within the approved plan — they can never expand it. The approval gate is the contract; the loop just fulfills it.

The Idea in One Picture

Today — open cycle, human cranks each turn

/evaluate → /plan → [human approves] → /apply → /validate → done
                                                    │
                                                  fail → human reads, human re-runs

Target — human approves the plan, inner loop closes itself

/evaluate → /plan → [human approves plan] ──► INNER LOOP (autonomous)
                                              ┌──────────────────────┐
                                              │ /apply               │
                                              │    ↓                 │
                                              │ /validate            │
                                              │    ├─ PASS → done ───┼──► report to human
                                              │    └─ FAIL           │
                                              │        ↓             │
                                              │ retry budget left?   │
                                              │    ├─ yes → /apply   │
                                              │    │   (failures as  │
                                              │    │    new context) │
                                              │    └─ no → escalate ─┼──► human decides
                                              └──────────────────────┘

What EPAV Already Does Well as a Loop

Three things are missing: the loop doesn't close itself, the verifier could be deeper, and nothing measures the loop. The changes below fix exactly that.

What Changes, File by File

All changes live in three skill files under tools/epav/skills/. No Python changes — the loop is prompt-orchestrated, so rollback is just reverting three files.

| File | Change | Effort |
| --- | --- | --- |
| epav.md | Retry loop between APPLY and VALIDATE, retry budget, escalation | ~20 lines |
| validate.md | Harden the verifier; emit a machine-readable failure block | ~15 lines |
| apply.md | Accept retry context (previous iteration's failures) | ~10 lines |

1. epav.md — close the loop

Replace Steps 3–4 of the orchestrator with an iterating inner loop after the existing approval gate:

### Step 3+4 — APPLY ↔ VALIDATE inner loop (autonomous after approval)

Set iteration = 1, MAX_ITERATIONS = 3.

1. Run /apply. On iteration > 1, pass the previous VALIDATION FAILURES
   block as the apply context — fix exactly those failures, nothing else.
2. Run /validate in full (build, tests, type check, review, criteria).
3. Zero BLOCKERs and zero FIX NOWs → exit loop, output VALIDATE COMPLETE.
4. Failures remain and iteration < MAX_ITERATIONS:
   increment, announce "Iteration N of MAX: re-entering APPLY with M
   failures", go to 1. Do NOT ask the user between iterations.
5. Failures remain at MAX_ITERATIONS → ESCALATE and wait for the user:

   LOOP ESCALATION
   ───────────────
   Iterations used:   3/3
   Still failing:     <remaining VALIDATION FAILURES>
   What was tried:    <one line per iteration>
   Recommendation:    <revise plan | human decision | blocked on X>

Loop rules, non-negotiable:
- The loop may only fix failures listed by VALIDATE; new work goes to knowledge/retros/.
- If the same criterion fails twice with the same root cause, escalate early.
- "stop" / "abort" / "cancel" interrupts immediately.
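
The loop itself stays prompt-orchestrated, but its control flow is easier to reason about when written out as code. The following is a minimal Python model of the steps and rules above, not part of EPAV: `run_inner_loop`, `Failure`, `LoopResult`, and the `apply` / `validate` callables are all hypothetical names, and "same root cause" is approximated here as an identical (check, suspected cause) pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_ITERATIONS = 3  # default retry budget from the orchestrator step


@dataclass(frozen=True)
class Failure:
    severity: str         # "BLOCKER" or "FIX NOW"
    check: str            # check or criterion that failed
    suspected_cause: str  # one-line root-cause guess reported by VALIDATE


@dataclass
class LoopResult:
    status: str               # "complete" or "escalated"
    iterations_used: int
    remaining: List[Failure]  # empty when status is "complete"


def run_inner_loop(
    apply: Callable[[Optional[List[Failure]]], None],
    validate: Callable[[], List[Failure]],
    max_iterations: int = MAX_ITERATIONS,
) -> LoopResult:
    """Hypothetical model of the APPLY <-> VALIDATE inner loop."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    previous: Optional[List[Failure]] = None
    seen = set()  # (check, suspected_cause) pairs from earlier iterations
    failures: List[Failure] = []
    for iteration in range(1, max_iterations + 1):
        apply(previous)        # iteration 1 gets None; retries get the failures
        failures = validate()  # always the full check stack
        if not failures:
            return LoopResult("complete", iteration, [])
        keys = {(f.check, f.suspected_cause) for f in failures}
        if keys & seen:
            # Same criterion, same root cause, second time: escalate early.
            return LoopResult("escalated", iteration, failures)
        seen |= keys
        previous = failures
    # Budget exhausted with failures remaining: hand back to the human.
    return LoopResult("escalated", max_iterations, failures)
```

With the default budget, a run that fails once and then passes returns `status == "complete"` with `iterations_used == 2`; a run that reports the same check and cause twice escalates at iteration 2 regardless of the remaining budget.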

2. validate.md — harden the verifier

(a) Make verification mechanical, not optional. The check stack always runs in order:

1. Build:        npm run build (or equivalent)   — must exit 0
2. Tests:        npm test (or equivalent)        — must exit 0;
                 new behavior needs a covering test
3. Type check:   tsc --noEmit / mypy, if present
4. Review:       run /code-review on the diff;
                 its blockers are [BLOCKER]s
5. Criteria:     each acceptance criterion — PASS / FAIL / PARTIAL
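
To make "mechanical" concrete, here is a rough Python sketch of the command-driven part of the stack (build, tests, type check). It is an illustration only: `run_check_stack` is a hypothetical helper, EPAV runs these checks through the agent rather than through code like this, and the review and criteria stages are left out because they are judgments, not exit codes. The sketch assumes every check runs even after an earlier one fails, so the failure list covers all stages.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

Check = Tuple[str, Sequence[str]]  # (stage name, command argv)


def run_check_stack(checks: List[Check]) -> List[Tuple[str, int, str]]:
    """Run every check in list order.

    Returns one (stage, exit code, last output line) tuple per failed check.
    An empty list means every command exited 0.
    """
    failures = []
    for stage, argv in checks:
        proc = subprocess.run(list(argv), capture_output=True, text=True)
        if proc.returncode != 0:
            lines = (proc.stdout + proc.stderr).strip().splitlines()
            evidence = lines[-1] if lines else ""
            failures.append((stage, proc.returncode, evidence))
    return failures


if __name__ == "__main__":
    # Stand-in commands so the demo runs anywhere; a real project would pass
    # e.g. ["npm", "run", "build"], ["npm", "test"], ["tsc", "--noEmit"].
    demo = [
        ("build", [sys.executable, "-c", "print('built')"]),
        ("tests", [sys.executable, "-c", "import sys; print('1 failed'); sys.exit(1)"]),
    ]
    for stage, code, evidence in run_check_stack(demo):
        print(f"[BLOCKER] {stage}: exit {code} ({evidence})")
```

Treating any non-zero exit code as a failure keeps the verifier's verdict independent of how the agent interprets the output.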

(b) Emit failures in a fixed, machine-readable block so /apply can consume them next iteration. This block is the loop's feedback signal — what makes a retry targeted instead of a blind re-attempt:

VALIDATION FAILURES (iteration N)
─────────────────────────────────
- [BLOCKER|FIX NOW] <check that failed>: <exact error / criterion>
  evidence: <test name, build output line, or review finding>
  suspected cause: <one line>
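
Because the block has a fixed shape, it can also be read by ordinary tooling. The sketch below shows one way to parse it in Python, assuming entries follow the template above exactly; `parse_failures` and the `Failure` dataclass are hypothetical names, and the evidence and cause text in `SAMPLE` is made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# "- [BLOCKER] tests: some error text"
ENTRY = re.compile(r"^- \[(BLOCKER|FIX NOW)\] ([^:]+): (.+)$")


@dataclass
class Failure:
    severity: str
    check: str
    detail: str
    evidence: str = ""
    suspected_cause: str = ""


def parse_failures(block: str) -> List[Failure]:
    """Parse a VALIDATION FAILURES block into Failure records.

    Header, separator, and unrecognized lines are ignored.
    """
    failures: List[Failure] = []
    for raw in block.splitlines():
        line = raw.strip()
        match = ENTRY.match(line)
        if match:
            failures.append(Failure(*match.groups()))
        elif failures and line.startswith("evidence:"):
            failures[-1].evidence = line[len("evidence:"):].strip()
        elif failures and line.startswith("suspected cause:"):
            failures[-1].suspected_cause = line[len("suspected cause:"):].strip()
    return failures


# Illustrative input only; the evidence and cause lines are invented.
SAMPLE = """VALIDATION FAILURES (iteration 1)
─────────────────────────────────
- [BLOCKER] tests: exports_csv_handles_empty_rows FAILED
  evidence: test exports_csv_handles_empty_rows
  suspected cause: empty input not handled
- [FIX NOW] review: unhandled null in formatRow()
"""

if __name__ == "__main__":
    for failure in parse_failures(SAMPLE):
        print(failure.severity, failure.check, failure.detail)
```

The optional `evidence` and `suspected cause` lines attach to the most recent entry, so an entry that omits them still parses.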

3. apply.md — accept retry context

## Retry mode (loop iterations 2+)

If a VALIDATION FAILURES block is provided as context:
- Scope for this iteration is fixing exactly those failures. The
  approved plan still bounds all work; failures narrow it further.
- Do not re-implement steps that already passed validation.
- A failure that can't be fixed within the plan's scope is a plan
  problem, not an apply problem — stop and report it.

Choosing the Retry Budget

| Budget | Behavior | When to use |
| --- | --- | --- |
| 1 (no loop) | Today's behavior | High-risk changes, migrations |
| 3 (default) | Fixes the common "test failed on first pass" cases | Normal feature work |
| 5 | For flaky / integration-heavy suites | Only with fast, reliable verifiers |

Two failures of the same criterion with the same root cause should escalate immediately regardless of remaining budget — a loop that isn't converging by iteration 2 almost never converges by iteration 5. Optionally expose the budget as an argument: /epav <task> --max-loops 5.

Loop Telemetry

Append one line per completed cycle to knowledge/retros/loop-log.md:

| date | task | iterations | exit (pass/escalated) | failing stage(s) |

After a few sprints this answers the questions that let you tune the loop: How often does iteration 1 pass? Where do failures cluster — tests, review, or criteria? Is the budget too small or wasted?
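
Those questions can be answered straight from the log file. The Python sketch below is one possible way to do it, assuming rows follow the five-column format above, the exit column holds the literal `pass` or `escalated`, and multiple failing stages are comma-separated; `format_row` and `summarize` are hypothetical helpers, not part of EPAV.

```python
from collections import Counter
from typing import Dict, List


def format_row(date: str, task: str, iterations: int,
               exit_status: str, stages: List[str]) -> str:
    """Build one loop-log row in the five-column format."""
    return f"| {date} | {task} | {iterations} | {exit_status} | {', '.join(stages)} |"


def summarize(log_text: str) -> Dict[str, object]:
    """Summarize loop-log rows; header and separator lines are skipped.

    Assumes task names contain no "|" characters.
    """
    rows = []
    for line in log_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 5 and cells[2].isdigit():
            rows.append(cells)
    first_pass = sum(1 for r in rows if r[2] == "1" and r[3] == "pass")
    stages = Counter(
        s.strip() for r in rows for s in r[4].split(",") if s.strip()
    )
    return {
        "cycles": len(rows),
        "first_pass_rate": first_pass / len(rows) if rows else 0.0,
        "escalated": sum(1 for r in rows if r[3] == "escalated"),
        "failing_stages": dict(stages),
    }
```

A low `first_pass_rate` combined with few escalations suggests the budget is doing useful work; frequent escalations concentrated in one stage point at the verifier or the plans rather than the budget.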

Closing the Outer Learning Loop (optional, phase 2)

/validate already writes discovered patterns to knowledge/, but nothing forces the next cycle to read them. One-line fix in evaluate.md: make knowledge/patterns/ and the loop log a mandatory context load (currently "check if present"). That turns EPAV from a loop that self-corrects within a task into one that improves across tasks.

What Deliberately Does NOT Change

Rollout

| # | Step |
| --- | --- |
| 1 | Edit the three skill files in tools/epav/skills/ |
| 2 | Bump the version and publish to PyPI |
| 3 | Consuming projects pick it up with nexus update + nexus sync |
| 4 | Dogfood on one low-risk task with MAX_ITERATIONS = 3 and review the first escalation report before trusting it on real work |

What a Loop Run Looks Like

> /epav "Add CSV export to the reports page"

EVALUATE SUMMARY ...
PLAN ... blast radius ...
Plan ready. Reply go to implement.

> go

[iteration 1/3] APPLY ... APPLY COMPLETE
[iteration 1/3] VALIDATE ... 2 failures:
  - [BLOCKER] tests: exports_csv_handles_empty_rows FAILED
  - [FIX NOW] review: unhandled null in formatRow()
Iteration 2 of 3: re-entering APPLY with 2 failures.

[iteration 2/3] APPLY (retry mode: fixing 2 listed failures) ... COMPLETE
[iteration 2/3] VALIDATE ... all checks pass.

VALIDATE COMPLETE
─────────────────
Criteria passed:  4/4
Iterations used:  2/3
Issues fixed:     empty-rows test, null guard in formatRow
Backlog items:    (none)

Task complete. Ready for the next /evaluate.