Zero Compensation
What thirty years of software engineering assumed — and what AI removed.
Every developer using AI coding tools has had this experience.
The PR looks fine. The review catches nothing. It merges.
Three days later, something breaks.
The problem wasn’t in the PR. It was baked in long before — in a vague prompt, an undiscussed assumption, a decision not written down.
You know the feeling.
Not anger at the code.
Frustration at the gap that let it through.
The AI coding tool did what was specified. The problem was that the specification itself had defects.
Why CI exists
Not to catch bugs at the end. To move the cost of failure as early as possible in the pipeline.
A failure caught at spec costs almost nothing. The same failure caught in production invalidates all the work built on top of it.
CI is just the engineering formalisation of that principle: the earlier you catch something wrong, the cheaper it is to fix.
Which raises a question that is easy to miss…
If that principle is so well understood, why wasn’t it applied at every stage?
Why CI at the PR and not at the spec, the plan, the build, the verification?
The answer is not that nobody thought of it.
The answer is that they couldn’t afford to.
The compromise that worked
Human developers are expensive and scarce.
Every gate in the process is another person’s time.
A rigorous spec-audit-plan-build-verify loop on every card, with a senior engineer at each stage — that is not a process. It’s a budget crisis.
So teams made rational compromises. CI at the critical points. Trust experience and judgment to cover the gaps.
And it worked.
Because humans are remarkable gap-fillers.
A senior developer reads a vague spec and mentally rewrites it before touching code.
They feel that something is architecturally wrong before they can say why.
They remember the similar change that caused a problem six months ago.
They notice when a ticket description doesn’t match the actual system state. They push back on requirements that sound reasonable but will create technical debt.
That implicit compensation is invisible, constant, and almost impossible to replicate.
The whole model depends on one assumption: that the gaps between gates will be filled by experience, intuition, and professional judgment.
That was a safe assumption when every dev in the pipeline had years of context to draw on.
What AI coding changes
Tokens are not scarce.
Running a full audit pass on every card before the build starts costs cents, not hours. The economic constraint that forced the “CI at critical points only” compromise simply does not exist in the same way.
Execution is suddenly cheap.
But AI has a deficit precisely where humans had abundance.
The implicit compensation — experience, intuition, pattern recognition, professional judgment — is exactly what AI does not have.
It has no memory of the similar change six months ago.
It does not feel that something is architecturally wrong.
It will not push back on a vague spec.
It will build what it is told, confidently, even if what it is told is wrong.
Zero compensation.
Not a flaw. Not a bug. A fact.
And the output does not signal that anything went wrong.
That is the part that makes zero compensation genuinely dangerous.
Traditional bad code has tells. Rushed code looks rushed — inconsistent naming, missing tests, scattered comments.
A senior engineer scanning a pull request can spot it in seconds.
AI-generated code has none of those signals.
The naming is consistent. The tests exist. The documentation is present.
Everything looks like it was written by a careful, thoughtful person.
But it was not written by a careful, thoughtful person.
It was generated by a model that had no idea the spec was wrong, built confidently on assumptions nobody discussed, and produced something that will pass every surface-level check you throw at it.
The problems are architectural, not syntactic. They live in the decisions the model was never told to question.
Where the cost lands
Most teams adopting AI coding tools have left their CI gates exactly where they were.
The code is written fast. PRs arrive more frequently. The output is clean and well-structured.
And the silent decisions — the unresolved assumptions, the undefined error states, the architectural choices nobody discussed — are all still in there.
The model made them at build time, without flagging them, because that is what models do.
A senior developer opens one of those PRs at three in the afternoon. The code is clean. The tests pass.
But something is off — the auth layer has been touched, and there is no mention of session handling anywhere in the ticket.
They dig.
The spec never defined it. The model chose an approach. It is not the approach the team would have chosen.
Unpicking it means unpicking the whole PR.
That developer is now doing the gap-filling that should have happened at spec. Except they are doing it at the end of the pipeline, on code that is already complete, in a review queue that keeps getting longer.
The numbers are starting to confirm what experienced developers already feel.
PRs per author are up twenty percent year-over-year. Incidents per pull request are up twenty-three percent.
Code review time has nearly doubled.
More code, moving faster, with more problems embedded in it, landing on the same number of humans who were already stretched.
One team reported thirty pull requests per day across six reviewers. A reviewer on that team described the experience as being “the first human being to ever lay eyes on this code.”
That sentence should sit uncomfortably with anyone who manages a software team.
The CI has not moved.
The compensation that used to be distributed across the whole pipeline is now compressed into one gate, at the worst possible point to catch a problem.
And the people absorbing it have less time per review than they had last quarter.
Where memory falls short
You might be thinking: the model has context. It has been briefed on the project. Doesn’t that help?
It does. But less than it sounds.
What actually happens with AI memory is this: key information from past sessions is extracted, summarised, and injected back into context at the start of each new conversation.
The AI is not remembering. It is being briefed.
The memory lives in a database. The model starts fresh every time — with notes pre-loaded.
That distinction matters enormously for coding.
A briefing can hold broad context. It cannot hold the full state of a real codebase.
Every decision made in every file? Every convention established across every sprint? The subtle architectural choice from six weeks ago that turns out to matter today?
That level of detail does not survive summarisation.
And code is unforgiving of imprecision in a way that conversation is not. A memory system that gets a function signature slightly wrong, or misremembers an architectural decision, produces confidently wrong code.
A human developer with an imperfect memory compensates with judgment. They feel something is off, and they check.
Memory systems make AI less stateless. They do not make it stateful enough.
They move the problem.
They do not solve it.
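The briefing pattern can be sketched in a few lines. Everything here is illustrative: the session notes, the character budget, and the keep-until-full summariser are stand-ins for whatever extraction a real memory system performs. The sketch exists to make one point concrete: whatever falls outside the budget is simply gone, and the model cannot tell that anything is missing.

```python
# Hypothetical sketch of a memory layer briefing a stateless model.
# Notes from past sessions are summarised and prepended to the new
# context. The model is not remembering; it is reading notes.

MEMORY_BUDGET = 120  # characters of briefing the context window gets

past_sessions = [
    "Decision: repository layer returns Result objects, never raises.",
    "Decision: session tokens rotate on every privilege change.",
    "Convention: feature flags live in flags.yaml, checked at startup.",
]

def summarise(notes: list[str], budget: int) -> str:
    """Naive summariser: keep whole notes until the budget runs out.
    Whatever falls outside the budget is silently dropped."""
    briefing, used = [], 0
    for note in notes:
        if used + len(note) > budget:
            break
        briefing.append(note)
        used += len(note)
    return " ".join(briefing)

def start_session(prompt: str) -> str:
    """Every session starts fresh: injected briefing plus the new
    prompt, nothing else. No state survives between calls."""
    return summarise(past_sessions, MEMORY_BUDGET) + "\n\n" + prompt

context = start_session("Add a password-reset endpoint.")
# The session-token rotation decision may or may not have survived
# summarisation, and nothing in the context signals the gap.
```

With this budget, the second decision (token rotation on privilege change) does not fit and never reaches the model, which is exactly the failure mode that matters for a password-reset endpoint.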
The inversion
The resource picture has inverted. But the process picture, for most teams, has not.
Human developer
Execution cost: Expensive
Implicit compensation: Rich — experience, intuition, judgment
Where the human gap-filling occurs: Distributed across every stage
AI developer
Execution cost: Cheap
Implicit compensation: Zero
Where the human gap-filling occurs: Compressed at PR
For thirty years, the rational response was to save execution and spend judgment.
That was not laziness. It was resource allocation.
The resource profile has changed. The process, for most teams, has not caught up.
Adversarial review
If execution is cheap and judgment is absent, the response is straightforward: gates at every handoff.
Not just at PR.
At spec. At audit. At plan. At build. At verify.
But here is the part that is easy to miss when you first hear “more gates.”
The gates are not passive checks. They are adversarial reviews.
At each handoff, a separate model — running a separate prompt, with no stake in the previous output — is given one job: find what is wrong with this before it moves forward.
Not summarise it. Not improve it. Challenge it. Return specific, numbered objections.
The card does not proceed until each objection is resolved or explicitly accepted.
This matters because of a specific failure mode.
If you ask the same model that wrote the spec to review the spec, you get agreement. The model has no distance from its own output. It will find what it was looking for. It will miss what it was not.
A separate model, with no prior context, reads the spec and asks: what is ambiguous here? What assumption is unexamined? What has been left undefined that the build stage will have to invent?
In practice, this looks like a spec coming back with numbered flags before a line of code is written.
Flag one: the acceptance criteria do not define the error state.
Flag two: the proposed approach touches the authentication layer, but there is no mention of session handling.
Flag three: “user can edit” — which user roles?
Those are the decisions that, left unresolved, the build model will make silently on its own. And there will be no senior developer at the other end to notice.
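The gate mechanics described above can be sketched as a small, runnable mechanism. The reviewer here is a stand-in: a few hard-coded ambiguity checks play the role of a separate model's objections, and `Objection`, `review_spec`, and `gate` are illustrative names, not any real tool's API. What the sketch shows is the blocking discipline itself: numbered objections, and no progress until each one is resolved or explicitly accepted.

```python
from dataclasses import dataclass

@dataclass
class Objection:
    """One numbered flag raised against a spec before build starts."""
    number: int
    text: str
    resolved: bool = False   # the spec was fixed
    accepted: bool = False   # the risk was explicitly accepted instead

def review_spec(spec: str) -> list[Objection]:
    """Stand-in for a separate reviewer model with no stake in the
    spec. A real implementation would call a different model with a
    different prompt; here a few keyword checks make the gate runnable."""
    lowered = spec.lower()
    flags = []
    if "error" not in lowered:
        flags.append("Acceptance criteria do not define the error state.")
    if "auth" in lowered and "session" not in lowered:
        flags.append("Touches the auth layer; session handling is undefined.")
    if "user can" in lowered and "role" not in lowered:
        flags.append("'User can...' -- which user roles?")
    return [Objection(i + 1, text) for i, text in enumerate(flags)]

def gate(objections: list[Objection]) -> bool:
    """The card proceeds only when every objection is resolved or
    explicitly accepted. An empty objection list passes."""
    return all(o.resolved or o.accepted for o in objections)
```

Run against a vague spec, the stand-in reviewer raises the same three flags as the example above, and the gate blocks until each one is dealt with; a spec that defines roles and error states passes clean.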
This is what the senior engineer was doing all along — internalised, invisible, without a prompt template.
The difference is that what used to be instinct now has to be process, because there is no judgment left in the pipeline to rely on.
What this looks like at scale
In March 2026, Amazon’s e-commerce site went down for six hours.
Checkout, pricing, account pages — all offline.
The cause was traced to AI-assisted code changes deployed to production without adequate review.
The estimated cost was over six million lost orders.
Amazon’s response was a ninety-day safety reset across three hundred and thirty-five critical systems, mandatory two-person review for all production changes, and senior engineer sign-off on all AI-assisted deployments.
The interesting thing about that story is not the outage. Outages happen.
The interesting thing is the response. Every measure Amazon implemented after the fact — the mandatory review, the senior sign-off, the structured approval process — is a gate that should have existed before the code was written.
They built the process after the damage, not before it.
And the process they built is exactly what this essay describes: structured adversarial review at every handoff, because the model will not catch itself.
Amazon had thousands of engineers and could afford to learn the lesson expensively.
Most teams cannot.
A bad spec caught at the authoring stage costs one prompt. The same bad spec caught after the build costs a full rebuild, plus whatever the model broke along the way that nobody noticed.
The question was never whether to have process.
It was always how much you could afford.
For thirty years, the answer was: not much. Trust the people to fill the gaps.
Execution is now cheap. Judgment is absent. The code looks right even when the thinking behind it is wrong.
Zero compensation is not a flaw in AI. It is a fact about AI.
CI throughout the dev cycle is no longer a question.
It is a requirement.