Pilots stall, or pilots compound. There isn't a third option.

Sid Mookerji wrote a piece last week about the pilot trap in retail AI. He named something every engineering leader I talk to has felt: the pilot dazzles in a controlled environment, then collapses the moment it touches the real system. His diagnosis pointed at the usual suspects. No structured data. Mismatched processes. Unclear ownership.

He’s right about all of it. But the trap he’s describing isn’t a retail problem. It’s what happens any time AI adoption gets treated as a tool rollout instead of a change in how the work gets done.

Greenfield was always the easy half

The current generation of AI tools was tuned and demoed on greenfield code, fresh data, and one team’s well-bounded workflow. Those are the conditions that make any software tooling look like magic. A new repo. A clean slice of the problem. A single owner who can say yes.

Then the pilot leaves the lab.

Now it’s looking at a .NET monolith stitched together from three acquisitions. A billing platform whose business logic lives in a stored procedure that nobody has touched since 2014, because the person who wrote it left in 2017 and nobody else understands what it does. Four teams with four definitions of “customer.” A compliance posture that will not accept “the model decided” as an answer to anything.

The AI was good. The conditions it was good in just don’t exist in production.

What the diagnosis actually means

When people say “no structured data,” what they usually mean is that the codebase was never required to be queryable by anything other than the application that wrote it. The data is there. It’s encoded in twenty years of business rules, edge cases, and conventions that live in people’s heads. A retrieval pipeline can’t extract what was never written down.

“Differing processes” is the polite framing. The real version is that the business runs on locally optimized workflows, and nobody has the authority or the budget to harmonize them. AI doesn’t fix that. It inherits it.

“Unclear ownership” is what the org chart looks like the moment something breaks. A pilot can dodge that question. Production can’t.

The shape of the fix

You don’t escape this by buying better AI. You escape it by changing the practice the AI lands in.

Three things have to be true.

First, the system has to be readable before the model touches it. Business rules surfaced. Unsafe surfaces flagged. Dependencies traced. We use PlayerZero as the spine for this work, with its simulation layer letting agents propose changes without breaking production. Whatever stack a team uses, the principle holds: a model can only be as good as its model of your system.

Second, the work has to be decomposed into measurable blocks. A pilot has no exit criteria, which is part of why so many of them quietly become permanent. A block does. Two to six weeks, fixed price, a defined deliverable, an outcome you can measure from the outside. Blocks compound. Each one permanently lowers the cost of working in the surface area it touched.

Third, somebody has to operate the system instead of handing it off. Pilots fail at the handoff. Practices don’t have a handoff. The agents live across CI, support, ops, and reporting, and they get tended like anything else in production.

Compounding is the real metric

The pilot trap doesn’t announce itself as failure. It shows up as a slow stall. The pilot worked. The next one was harder. The third one stalled out. The fourth got deferred to next quarter, then the quarter after that. Eighteen months in, the team is back to sustenance work and there’s a procurement bill stapled to it.

The right question to ask isn’t whether the pilot worked. It’s whether each block of work makes the next block cheaper. That’s the curve that matters, and it’s the one I’d push leadership to track.

In the engagements where this has worked for us, that curve bends. A lean healthcare engineering team cut maintenance overhead roughly in half. A research SaaS company in a regulated space saw productivity climb about 4× over eighteen months while their customer NPS went up 51 points. The curve bends because the practice got better, not because the tool got newer.

The take

Sid named the symptom accurately. The cause is bigger than retail. Any enterprise running AI on systems that were built before AI will hit the same wall, in roughly the same way: the pilot survives the lab, meets the legacy, and stops compounding.

A different pilot won’t fix that. A different practice might.

If you’re somewhere on that curve and it’s starting to flatten, get in touch. We’ve seen this one before.

Greenfield was always the easy half

What the diagnosis actually means

The shape of the fix

Compounding is the real metric

The take

New field notes, when they ship.

Sound familiar? Let's talk.

Sound familiar?
Let's talk.