Our Kitchen’s On Fire and the Chef is a Robot

If a chef can cook 10x faster, what happens to the rest of the restaurant?

I can’t open a browser without running into an ad about someone building an app in two hours with an “AI IDE.” And look, I’m sure they did. I’m also sure I’d never ship it. I think the whole conversation is measuring the wrong thing.

I don’t think code was ever the bottleneck. Delivering working software was. I’m increasingly convinced of three things:

1. Shipping software requires teams to focus on shipping software. Performing work != delivery.
2. Limiting AI to generating code risks being counterproductive.
3. If we’re writing less code, we need to spend more time thinking about what gets written.

Work != delivery

A chef who cooks 10x faster still needs runners, plating, timing across tables. The kitchen doesn’t ship a dish when the protein hits temperature. It ships when the guest has everything they ordered, together, hot, correct.

Software isn’t delivered when an agent writes the code. It’s delivered when it’s planned, implemented, reviewed, tested, merged, deployed, and confirmed working. The code is one step. The other steps, the ones that involve human judgment, coordination, and attention, take exactly as long as they always did.
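
Here’s a back-of-the-envelope version of that claim. The stage names follow the list above, but the hours are made-up placeholders, not measurements from any real project. The arithmetic is the same idea as Amdahl’s law: speeding up one stage only helps in proportion to that stage’s share of the total.

```python
# Hypothetical hours per delivery stage for one feature (illustrative only).
STAGE_HOURS = {
    "plan": 2.0,
    "implement": 4.0,
    "review": 2.0,
    "test": 1.5,
    "merge_and_deploy": 1.0,
    "confirm_working": 0.5,
}


def delivery_hours(stage_hours, implement_speedup=1.0):
    """Total hours when only the 'implement' stage gets faster."""
    total = 0.0
    for stage, hours in stage_hours.items():
        if stage == "implement":
            hours = hours / implement_speedup
        total += hours
    return total


baseline = delivery_hours(STAGE_HOURS)                # 11.0 hours
with_agent = delivery_hours(STAGE_HOURS, 10.0)        # about 7.4 hours
overall_speedup = baseline / with_agent               # well short of 10x

print(f"baseline: {baseline:.1f}h, with agent: {with_agent:.1f}h, "
      f"overall speedup: {overall_speedup:.2f}x")
```

With these placeholder numbers, a 10x faster implementation step makes delivery roughly 1.5x faster, and that assumes review doesn’t get slower because there’s more code to read.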

I’ve watched this play out on my own projects. An agent finishes implementation in minutes. Then I spend an hour reviewing it, another hour realizing the approach doesn’t fit the broader architecture, and a third hour unwinding it. The “fast” part was fast. The delivery wasn’t.

Evan Phoenix is candid about this in his piece on working inside the loop at Miren. Past three parallel agents, he says his supervision quality degrades faster than the output improves. More code, less understanding, worse outcomes. The bottleneck was never typing speed.

I think AI tooling needs to address the entire software delivery lifecycle, not just the “write code” step. If we’re not doing that, we’re not making anything better. We’re cheerfully gumming up the works, generating more code, faster, with less context, and then blaming our old processes for not keeping up instead of evolving them alongside the tooling.

AI that only writes code is a footgun

An agent with no context beyond “implement this feature” will produce something. It’ll compile. It might even pass tests. But it won’t necessarily reflect our architecture, our conventions, our team’s decisions about how the system should grow. Left unsupervised, agents average out every codebase they’ve ever seen and hand us the median. That’s fine for scaffolding. It’s dangerous for design decisions.

Markus Eisele describes this as the AI responsibility gap, the structural problem where AI-assisted workflows spread authorship thin enough that nobody can honestly defend the result. The code exists. Who decided it should work this way? Who checked? The answer, too often, is nobody.

I’ve screwed this up personally. I’ve been dealing with client-side state management since Flash (seriously, since ActionScript 2). I know how render cycles work, I know what re-entrant updates look like, I know the shape of the problem. But when I jumped to my first production AI app, built in React, I trusted the agent’s output and ran headlong into endless state update loops that I didn’t realize were baked into the architecture until the app ground to a halt. The agent produced clean, idiomatic React. It also produced a state management pattern I hadn’t vetted, built on assumptions I hadn’t reviewed, and I didn’t catch it because I’d let the speed of delivery substitute for the work of understanding.

The problem isn’t that the agent wrote bad code. The problem is that I used it as a code generator when I should have used it as a delivery tool. There’s a difference. A code generator takes a prompt and produces files. A delivery tool understands the broader workflow: what’s the plan, what are the constraints, what happens after the code is written, who reviews it and how.

I think limiting agents to code generation, treating them as fast typists and nothing else, actually makes things worse. We get more code, faster, with less context, and the humans downstream have to absorb all the complexity that used to get worked out during the slower, more deliberate process of writing it ourselves.

If we’re writing less code, we need to think more

If agents handle more of the implementation, engineers don’t have less to do. We have different things to do. The time that used to go into typing out a feature can go into planning it better, reviewing it more carefully, thinking harder about whether this is the right approach before anyone, human or agent, starts writing.

This is the “same as it ever was” part. Good teams have always planned before building. Good engineers have always scoped work into small vertical slices, one concern per PR, reviewable by a human with a real attention span. Good organizations have always encoded their standards somewhere durable (architectural decision records, linting rules, merge gates) rather than relying on tribal knowledge.

What’s changed is the cost of skipping those things. When the human was also the typist, slow implementation was an accidental backstop. We had time to think while we coded. We noticed architectural drift because we were reading every file we touched. The slowness was load-bearing.

Agents removed that backstop. Now, if we don’t plan deliberately, scope intentionally, and review with real attention, there’s nothing in the pipeline to catch it. The code arrives too fast for laziness to be self-correcting.

Evan Phoenix nails it: if a PR is too big for a human to review, it was too big for an agent to write. Not because the agent can’t produce it. It can, happily, across twelve files and a thousand lines. Because nobody on the other end can honestly evaluate it. And “LGTM” on a thousand-line diff isn’t a review. It’s a rubber stamp with a keyboard shortcut.
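
One way to turn that rule into a merge gate is a plain size check on the diff. This is a minimal sketch, not something from Evan’s piece or from Cate: it assumes the input is the text output of `git diff --numstat`, and the limits (400 changed lines, 10 files) and the function names are placeholders a team would tune for itself.

```python
def parse_numstat(numstat_text):
    """Parse `git diff --numstat` output into (added, deleted, path) tuples."""
    rows = []
    for line in numstat_text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files are reported with "-" in both count columns;
        # treat them as zero changed lines.
        rows.append((
            0 if added == "-" else int(added),
            0 if deleted == "-" else int(deleted),
            path,
        ))
    return rows


def review_gate(numstat_text, max_lines=400, max_files=10):
    """Return a list of reasons the diff is too big to review; empty means OK."""
    rows = parse_numstat(numstat_text)
    changed_lines = sum(added + deleted for added, deleted, _ in rows)
    problems = []
    if changed_lines > max_lines:
        problems.append(
            f"{changed_lines} changed lines exceeds the limit of {max_lines}"
        )
    if len(rows) > max_files:
        problems.append(
            f"{len(rows)} files touched exceeds the limit of {max_files}"
        )
    return problems


# Hypothetical diffs: one small, one the size of the example above.
small_diff = "12\t3\tsrc/cart.py\n5\t0\ttests/test_cart.py\n"
big_diff = "".join(f"80\t5\tsrc/module_{i}.py\n" for i in range(12))

print(review_gate(small_diff))
print(review_gate(big_diff))
```

In CI, the input could come from something like `git diff --numstat main...HEAD`, failing the build whenever the returned list is non-empty. A line count is a crude proxy for reviewability, but it is cheap to enforce and hard to argue with.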

But I’d go further: it’s not just about reviewing. Agents can introduce new concepts, patterns, and techniques we’ve never seen before. That’s wonderful. That’s one of the genuinely exciting parts of working this way. But if we don’t learn them, understand them, and evaluate whether they’re right for our system, as part of both review and planning, we’re letting ourselves and our teams down.

What we’re building into Cate

This is one of many hypotheses built into Cate: that the right place to enforce delivery discipline is the orchestration layer, the thing that sits between “I have a problem to solve” and “an agent is writing code.”

Planning before typing. Cate’s Build workflow starts with a planning agent that researches the problem, proposes an approach, and decomposes it into discrete issues in the team’s issue tracker, not a text file on someone’s desktop. Each issue captures scope, acceptance criteria, and architectural context. The plan is a shared artifact, reviewable and debatable before implementation starts, when changes are cheap.

Scoped units of work by default. Because Cate decomposes plans into individual tracked issues, each agent session maps to one issue. One concern. One PR. Big concern? An epic as a graph of many small PRs. The constraint isn’t willpower, it’s the workflow. The agent doesn’t get a vague prompt and infinite runway. It gets a scoped issue with context and boundaries.
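
To make “an epic as a graph of many small PRs” concrete, here is a rough illustration of the data shape. It is not Cate’s implementation: the `Issue` fields, the issue keys, and the `work_order` helper are all hypothetical, and it needs Python 3.9+ for the standard-library `graphlib` module.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Issue:
    """One scoped unit of work: one concern, one PR (hypothetical shape)."""
    key: str
    scope: str
    acceptance_criteria: list[str]
    depends_on: list[str] = field(default_factory=list)


def work_order(issues):
    """Return issue keys in an order that respects their dependencies.

    Raises graphlib.CycleError if the issues depend on each other in a loop.
    """
    graph = {issue.key: set(issue.depends_on) for issue in issues}
    return list(TopologicalSorter(graph).static_order())


# A made-up epic, split into three small PRs that build on each other.
epic = [
    Issue("EPIC-3", "Wire the new field into the UI",
          ["Field is visible and editable"], depends_on=["EPIC-2"]),
    Issue("EPIC-1", "Add the schema migration",
          ["Migration applies and rolls back cleanly"]),
    Issue("EPIC-2", "Expose the field in the API",
          ["Endpoint returns the new field"], depends_on=["EPIC-1"]),
]

print(work_order(epic))
```

Each key in the resulting order maps to one agent session and one reviewable PR, and a dependency cycle fails loudly at planning time instead of surfacing as a merge conflict later.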

Your rig, your rules. Cate works with existing repo-level instructions, rule files, and architectural constraints. The agents work inside your environment, with your toolchain, subject to your merge gates. We’re not replacing your setup. We’re putting it to work in a structured loop. The judgment about what’s allowed in your system stays where it belongs: with your team, encoded in artifacts that survive contact with a Tuesday afternoon.

Human checkpoints that aren’t optional. An agent finishes work, surfaces what it did and why, and the PR moves to human review with that context attached. The reviewer isn’t staring at a raw diff wondering what the agent was thinking. They’re looking at decisions, rationale, and trade-offs alongside the code.

None of this is revolutionary. It’s planning, scoping, guardrails, and review: same stuff, different day! What Cate adds is the structure to make those things happen consistently, even when the agents are fast enough to tempt you into skipping them.

The typing got faster. Now what?

2025 was the year we sped up typing. Agents got fast, tools got flashy, and the demos were genuinely impressive. But demos aren’t delivery.

I think 2026 needs to be the year we remember that software delivery is more than code generation. It’s planning, scoping, reviewing, integrating, and owning the result. The teams that figure out how to apply AI to that whole pipeline, not just the typing step, are the ones that’ll actually ship faster. Everyone else will generate a lot of code and wonder why it doesn’t feel like progress.

Maybe I’m wrong. But I haven’t seen the version of this that skips the fundamentals and works. What I’ve seen is “same as it ever was,” just compressed. The question isn’t whether you can generate code faster. It’s whether your delivery pipeline keeps up.

That’s the bet we’re making with Cate: a workflow that enforces the discipline instead of hoping for it. Download Cate or get in touch.