Pilots stall, or pilots compound. There isn't a third option.
Tools can be rewritten. Applications have to be re-engineered. Why every enterprise AI pilot stalls — and why replacing the legacy starts the same loop over.
Sid Mookerji wrote a piece last week about the pilot trap in retail AI. He named something every engineering leader I talk to has felt: the pilot dazzles in a controlled environment, then collapses the moment it touches the real system. His diagnosis pointed at the usual suspects. No structured data. Mismatched processes. Unclear ownership.
He’s right about all of it. But the trap he’s describing isn’t a retail problem. It’s what happens any time AI adoption gets treated as a tool rollout instead of a change in how the work gets done.
Greenfield was always the easy half
The current generation of AI tools was tuned and demoed on greenfield code, fresh data, and one team’s well-bounded workflow. Those are the conditions that make any software tooling look like magic. A new repo. A clean slice of the problem. A single owner who can say yes.
Then the pilot leaves the lab.
Now it’s looking at a .NET monolith stitched together from three acquisitions. A billing platform whose business logic lives in a stored procedure that nobody has touched since 2014. The person who wrote it left in 2017, and nobody else understands what it does. Four teams with four definitions of “customer.” A compliance posture that will not accept “the model decided” as an answer to anything.
The AI was good. The conditions it was good in just don’t exist in production.
What the diagnosis actually means
When people say “no structured data,” what they usually mean is that the codebase was never required to be queryable by anything other than the application that wrote it. The data is there. It’s encoded in twenty years of business rules, edge cases, and conventions that live in people’s heads. A retrieval pipeline can’t extract what was never written down.
“Mismatched processes” is the polite framing. The real version is that the business runs on locally optimized workflows, and nobody has the authority or the budget to harmonize them. AI doesn’t fix that. It inherits it.
Replace, and the loop restarts
Some teams hit the wall and decide the legacy was the problem. They bet on the new pilot as the replacement. Eighteen months later the pilot is in production. Now it’s the legacy. New business rules sit on top of it. New integrations depend on it. New people joined the team who don’t know how it was put together. The wall is back, in a different room.
Tools can be rewritten. Applications have to be re-engineered. AI is one of the most powerful tools ever invented for re-engineering an enterprise application — as long as the practitioners using it know what they’re doing. The dump-and-replace move treats the new tool as the answer and the old system as the obstacle. Both readings are wrong. The obstacle is the practice. The fix is engineering.
There isn’t a third option. There’s the practice you adopt now and the loop you keep paying for until you do.
The shape of the fix
You don’t escape this by buying better AI. You escape it by changing the practice the AI lands in.
Three things have to be true.
First, the system has to be readable before the model touches it. Business rules surfaced. Unsafe surfaces flagged. Dependencies traced. We use PlayerZero as the spine for this work, with its simulation layer letting agents propose changes without breaking production. Whatever stack a team uses, the principle holds: a model can only be as good as its model of your system.
Second, the work has to be decomposed into measurable blocks. A pilot has no exit criteria, which is part of why so many of them quietly become permanent. A block does. Two to six weeks, a defined deliverable, an outcome you can measure from the outside. Blocks compound. Each one permanently lowers the friction of working in the surface area it touched.
Third, somebody has to operate the system instead of handing it off. Pilots fail at the handoff. Practices don’t have a handoff. The agents live across CI, support, ops, and reporting, and they get tended like anything else in production.
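To make the second condition concrete, here is a minimal sketch of a block as a record that cannot exist without exit criteria. Everything in it is an assumption for illustration: the `Block` class, its field names, and the sample billing metric are hypothetical, not part of any tool named in this piece. The only things taken from the text are the two-to-six-week bound, the defined deliverable, and the outcome measured from the outside.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A hypothetical unit of work with built-in exit criteria."""
    name: str
    weeks: int                    # planned duration, bounded to two to six weeks
    deliverable: str              # the defined deliverable
    metric: str                   # an outcome measurable from outside the team
    target: float                 # the value of the metric that counts as done
    lower_is_better: bool = True  # direction of the metric (an assumption)

    def __post_init__(self) -> None:
        # Reject anything that looks like an open-ended pilot.
        if not 2 <= self.weeks <= 6:
            raise ValueError("a block runs two to six weeks")
        if not self.deliverable.strip():
            raise ValueError("a block needs a defined deliverable")

    def exit_met(self, measured: float) -> bool:
        """Return True when the measured outcome reaches the target."""
        if self.lower_is_better:
            return measured <= self.target
        return measured >= self.target


# Illustrative values only; none of these numbers come from a real engagement.
billing_block = Block(
    name="billing-rules-inventory",
    weeks=4,
    deliverable="Written inventory of the rules in the billing stored procedure",
    metric="median hours to ship a billing change",
    target=16.0,
)
```

The design choice worth noting is that the duration and deliverable checks run at construction time, so an open-ended effort cannot be recorded as a block in the first place.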
Compounding is the real metric
The pilot trap doesn’t announce itself as failure. It shows up as a slow stall. The pilot worked. The next one was harder. The third one stalled out. The fourth got deferred to next quarter, then the quarter after that. Eighteen months in, the team is back to maintenance work and there’s a procurement bill stapled to it.
The right question to ask isn’t whether the pilot worked. It’s whether each block of work makes the next block cheaper. That’s the curve that matters, and it’s the one I’d push leadership to track.
In the engagements where this has worked for us, that curve bends. A lean healthcare engineering team cut maintenance overhead roughly in half. A regulated healthcare SaaS company saw productivity climb about 4× over eighteen months while their customer NPS reversed from −41 to +52. The curve bends because the practice got better, not because the tool got newer.
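One rough way to track the curve described above is to compare what each block cost against the one before it. The sketch below assumes a team records a comparable cost per block, for example engineer-days for similarly sized work. The function names and the sample numbers are hypothetical, and a real measure would need to account for blocks of different scope.

```python
from typing import List, Sequence


def compounding_ratios(costs: Sequence[float]) -> List[float]:
    """Return each block's cost divided by the previous block's cost.

    A ratio below 1.0 means that block was cheaper than the one before it.
    """
    if any(cost <= 0 for cost in costs):
        raise ValueError("block costs must be positive")
    return [later / earlier for earlier, later in zip(costs, costs[1:])]


def is_compounding(costs: Sequence[float]) -> bool:
    """Return True only if every block was cheaper than its predecessor."""
    ratios = compounding_ratios(costs)
    # With fewer than two blocks there is no curve to judge yet.
    return bool(ratios) and all(ratio < 1.0 for ratio in ratios)


# Illustrative engineer-days for three successive blocks (made-up numbers).
bending = [40, 30, 24]
# A stalling sequence: each block costs more than the last.
stalling = [20, 25, 40]
```

Here `is_compounding(bending)` is true and `is_compounding(stalling)` is false. The strict every-block-cheaper rule is deliberately harsh; a team might prefer a rolling average, but the question being asked stays the same.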
The take
Sid named the symptom accurately. The cause is bigger than retail. Any enterprise running AI on systems that were built before AI will hit the same wall, in roughly the same way: the pilot survives the lab, meets the legacy, and stops compounding.
A different pilot won’t fix that. A different practice might.
If you’re somewhere on that curve and it’s starting to flatten, get in touch. We’ve seen this one before.