AAP Studio

AAP = Agent Action Plan

A studio where you approve an AI coding agent's plan before it writes a single line of code. The plan is the thing you review, not 800 lines of generated diff after the fact.

Product Design | AI Agents | Developer Tools | Human-in-the-Loop

Concept, design + build, solo · 2026

Try the concept ↓

The problem

An autonomous coding agent is only as useful as it is trustable, and right now the trust runs backwards.

The pitch for these agents is autonomy: describe the work, walk away, come back to finished code. But autonomy is exactly what makes them frightening to ship. You find out what the agent did after it did it, by reading the diff, or worse, in production. The faster the agent, the more code there is to audit, and the audit is the slowest, most human part of the whole loop. So you either rubber-stamp work you didn't really read, or you read everything and lose the speed you were paying for. Autonomy gives with one hand and takes with the other.

The instinct is to put a human "in the loop." But where in the loop they sit is the entire design problem. Review every line and you've just hired yourself as the agent's intern. Review nothing and you've shipped a stranger's code under your name. Most tools pick one of those two failure modes and call it a workflow.

The real question isn't "should a human review the agent's work." It's "what is the smallest, highest-leverage thing a human can review that still earns the trust." Reviewing output is too late and too much. There has to be an earlier, smaller object, something you can read in two minutes that determines what the next two hours produce.

The approach

The reviewable unit is the plan, not the code. That's the whole thesis. Before the agent writes anything, it produces an Agent Action Plan, a structured, human-readable proposal of what it intends to do, and waits. You read it, you change it, you approve it. Only then does it build.

The studio walks one job through four stages, and the human's weight sits almost entirely in stage two.

Requirements

Describe the work in plain English and point the agent at the repos. The messy human intent goes in here, once.

Review Plan

the design

The agent drafts a tech spec and an action plan: the files it will touch, the approach, the dependencies, the risks it spotted. This is the screen that matters. You read the plan, revise what's wrong, flag what's missing, and approve. Small enough to actually read, consequential enough to be worth reading. One reviewer, one object, two minutes.

Execution

Once the plan is approved, the agent builds. Runtime validation checks its work against the spec as it goes, so the plan stays honest during execution, not just at approval.

Delivery

Pull requests across the affected repos, plus a last-mile guide for the parts a human should finish. Not "here's a black box of done," but "here's what I did, here's what's left, here's where to look."
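As a rough illustration of the four stages above, the gate can be modelled as a small state machine in which execution is unreachable until a human approves the plan. This is a minimal sketch under assumptions, not the studio's implementation: the names `Stage`, `ActionPlan`, and `Job`, and all of their fields, are hypothetical and chosen only to mirror what the plan is described as containing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Stage(Enum):
    REQUIREMENTS = 1
    REVIEW_PLAN = 2
    EXECUTION = 3
    DELIVERY = 4


@dataclass
class ActionPlan:
    # Hypothetical fields mirroring the plan contents described above.
    approach: str
    files: list[str]
    dependencies: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    approved: bool = False


@dataclass
class Job:
    requirements: str
    repos: list[str]
    stage: Stage = Stage.REQUIREMENTS
    plan: Optional[ActionPlan] = None

    def propose(self, plan: ActionPlan) -> None:
        """The agent drafts (or redrafts) a plan, then waits for the human."""
        self.plan = plan
        self.stage = Stage.REVIEW_PLAN

    def approve(self) -> None:
        """The single human decision point."""
        if self.stage is not Stage.REVIEW_PLAN or self.plan is None:
            raise RuntimeError("there is no plan waiting for review")
        self.plan.approved = True

    def execute(self) -> None:
        """Refuse to build anything without an approved plan."""
        if self.plan is None or not self.plan.approved:
            raise PermissionError("the plan has not been approved")
        self.stage = Stage.EXECUTION

    def deliver(self) -> None:
        if self.stage is not Stage.EXECUTION:
            raise RuntimeError("nothing has been built yet")
        self.stage = Stage.DELIVERY


# Example values are invented for illustration.
job = Job("Add rate limiting to the public API", repos=["api"])
job.propose(ActionPlan(approach="Token bucket per API key", files=["api/limits.py"]))
job.approve()
job.execute()
job.deliver()
print(job.stage.name)  # DELIVERY
```

The point of the shape is that `execute` checks the approval flag itself, so skipping stage two is an error rather than a shortcut.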

~2 min to review the plan

Instead of auditing hundreds of lines of generated code after the fact.

One decision point

One human at one approval gate carries the trust for the whole build, instead of spreading it thin across every line.

Continuous validation

Runs through execution, not just once at the end, so drift from the approved plan is caught mid-build.

Hours from prompt → reviewed PRs

The human spends minutes; the agent spends hours.

Try it

Read the plan. Approve it. Watch it build.

This is the actual interactive concept, not a video. Move through the four stages. Stage 02 is the design: the plan is the thing you approve, before any code exists.

Where this breaks

Putting the human at the plan instead of the code solves the speed-vs-trust problem by moving it. Here's where it moves to, honestly.

A plan you don't fully understand is still a rubber stamp.

Reviewing the plan only earns trust if the reviewer can actually evaluate it. Approve a plan whose implications you don't grasp and you've recreated the original problem one level up, now with a false sense of having done diligence. The interface makes approval easy; it can't make judgment real.

Plan-level review is blind to implementation-level bugs.

A perfect plan, faithfully executed, can still produce subtly broken code: an off-by-one, a race condition, a wrong assumption the plan was too coarse to mention. Approving the "what" doesn't approve the "how," and the how is where most bugs live. Runtime validation catches some of this; it doesn't catch the bug nobody specified a check for.
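The limit is easy to see in a sketch. Assume, hypothetically, that runtime validation is a set of named checks derived from the approved plan and run after each build step. The function names, the check name, and the file paths here are all invented for illustration.

```python
from typing import Callable

# A check receives the set of files touched so far and returns True if it passes.
Check = Callable[[set[str]], bool]


def make_scope_check(planned_files: set[str]) -> Check:
    """Pass only while every touched file was named in the approved plan."""
    return lambda touched: touched <= planned_files


def validate(touched: set[str], checks: dict[str, Check]) -> list[str]:
    """Return the names of the checks that failed."""
    return [name for name, check in checks.items() if not check(touched)]


checks = {"stays_in_scope": make_scope_check({"api/limits.py", "api/routes.py"})}

print(validate({"api/limits.py"}, checks))                   # []
print(validate({"api/limits.py", "db/schema.sql"}, checks))  # ['stays_in_scope']
```

The first call passes even if `api/limits.py` contains an off-by-one, because no check in the dictionary looks inside the file. Validation only ever reports on what someone thought to specify.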

Trust decays into habit.

The first ten plans, you read carefully. By the fiftieth approval, "Review Plan" becomes a button you press on the way to lunch. The pattern's whole value depends on the human staying engaged, and nothing in a smooth approval flow fights the slide toward autopilot. Designing against my own frictionlessness is an unsolved tension here.

What happens when execution diverges from the approved plan?

Right now the concept assumes the build follows the plan. Real agents hit surprises mid-execution and adapt. The honest version needs a story for re-approval: when does a deviation require coming back to the human, and when is that just more interruption tax? I haven't designed that gate yet.
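One possible shape for that gate, offered purely as a thought experiment and not something the concept implements: treat a deviation as needing re-approval only when it reaches outside what the human already signed off on. The rule itself and the names `ApprovedPlan`, `Deviation`, and `needs_reapproval` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedPlan:
    files: frozenset[str]
    dependencies: frozenset[str]


@dataclass(frozen=True)
class Deviation:
    reason: str
    new_files: frozenset[str] = frozenset()
    new_dependencies: frozenset[str] = frozenset()


def needs_reapproval(plan: ApprovedPlan, deviation: Deviation) -> bool:
    """Interrupt the human only if the agent leaves the approved scope."""
    outside_files = deviation.new_files - plan.files
    outside_deps = deviation.new_dependencies - plan.dependencies
    return bool(outside_files or outside_deps)


plan = ApprovedPlan(
    files=frozenset({"api/limits.py"}),
    dependencies=frozenset({"redis"}),
)

# Staying inside approved files: the agent carries on without interrupting.
print(needs_reapproval(plan, Deviation("split a helper", frozenset({"api/limits.py"}))))  # False

# Touching a file nobody approved: back to the human.
print(needs_reapproval(plan, Deviation("needs a migration", frozenset({"db/schema.sql"}))))  # True
```

A scope-only rule like this keeps the interruption tax low, but it would wave through a change of approach inside an approved file, which is exactly the kind of case a real design would have to decide on.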

The “taste in the loop” is one canned voice.

The reviewer in this concept is essentially me, encoded once. A real version would model different reviewers with different standards: a security-minded reviewer, a ship-it reviewer, a junior who needs more scaffolding. The plan that's "good enough to approve" depends on who's reading. That variability is the next design problem, and it's a big one.
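To make the variability concrete, here is a minimal sketch in which the same plan is ready for one reviewer and not for the others. Nothing here exists in the concept: `ReviewerProfile`, `PlanSummary`, the two criteria, and the thresholds are hypothetical stand-ins for whatever a real reviewer model would measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewerProfile:
    name: str
    max_open_risks: int       # unaddressed risks this reviewer will tolerate
    requires_rationale: bool  # whether the plan must explain why, not just what


@dataclass(frozen=True)
class PlanSummary:
    open_risks: int
    has_rationale: bool


def ready_for(reviewer: ReviewerProfile, plan: PlanSummary) -> bool:
    """Decide whether a plan meets this particular reviewer's bar."""
    if plan.open_risks > reviewer.max_open_risks:
        return False
    if reviewer.requires_rationale and not plan.has_rationale:
        return False
    return True


# Invented thresholds, for illustration only.
SECURITY = ReviewerProfile("security-minded", max_open_risks=0, requires_rationale=False)
SHIP_IT = ReviewerProfile("ship-it", max_open_risks=3, requires_rationale=False)
JUNIOR = ReviewerProfile("junior", max_open_risks=1, requires_rationale=True)

plan = PlanSummary(open_risks=1, has_rationale=False)
print([r.name for r in (SECURITY, SHIP_IT, JUNIOR) if ready_for(r, plan)])  # ['ship-it']
```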

What I learned

The hardest part wasn't designing the plan review; it was noticing that the smoother I made approval, the easier it got to approve without reading. Good UX and real diligence were quietly working against each other.

"Human in the loop" turned out to be a placement problem, not a feature: the same person reviewing the same work is either essential or useless depending entirely on where in the sequence you put them.

And building this with a coding agent taught me the thing the concept is about: I trusted the output far more once I could read the plan first. That is either proof the pattern works or just confirmation bias I designed for myself, and I'm honestly not sure which.
