AI Program Patterns

Why Bank AI Pilots Fail to Reach Production — and the Six Patterns That Get Them Unstuck

The model isn't the problem. It almost never is. After watching dozens of AI initiatives stall inside large financial institutions, I've found the patterns remarkably consistent, and so are the things that actually get them moving again.

Here is something I've come to believe after twenty years inside large banks and a couple of years specifically focused on AI program work: the failure mode of an AI pilot is almost never the model.

The model performed in the proof-of-concept. The vendor demo went well. The data scientist showed the chart with the impressive AUC. Everyone clapped. And then nothing happened for nine, twelve, eighteen months.

This is the most common situation I'm called into. The institution has done what everyone assumed was the hard part, generating real evidence that AI can deliver value, and now they're stuck. The board wants to know when production happens. The CFO wants to know when the budget produces revenue. The data science team is frustrated. The vendor is frustrated. The CIO is frustrated.

And usually no one can quite articulate what's actually wrong.

So let me try. In my experience, the blockers fall into six patterns. Most stalled programs have at least three of them at once.

1. Model Risk Management is asking questions nobody can answer

Inside a regulated bank, an AI model is a model. It goes through Model Risk Management. MRM was built for credit scorecards and probability-of-default and loss-given-default (PD/LGD) models — for systems where you can write down the math, validate it, monitor it, and explain it to a regulator. A modern LLM-based or agentic system breaks most of those assumptions.

The data science team submits the model for validation. MRM comes back with thirty-seven questions. Some are reasonable. Some are unanswerable as written. The team writes a partial response. MRM comes back with twenty more questions. The team takes three months to respond. By the time they do, the regulatory environment has shifted, and MRM has new questions.

This isn't bad faith on anyone's part. MRM is doing its job. The data science team is doing its job. They're just not having the same conversation.

What gets it unstuck: Someone who can sit between the two functions and translate. The data science team needs to understand which questions are about substance and which are about evidence. MRM needs to understand which controls are equivalent across model classes. This is a conversation, not an artifact, and it usually takes a few weeks of dedicated facilitation.

2. The integration work was scoped at zero

The pilot ran on a sandbox. The data was extracted, cleaned, and loaded into a notebook. The output was a CSV. Everyone agreed this was a successful proof of concept.

Production is different. Production means a live data pipeline. It means authentication and authorization. It means logging and audit. It means handling failures gracefully when an upstream system is down. It means SLAs. It means deployment infrastructure that can be patched, monitored, and rolled back.

None of that was in the pilot. None of it was scoped. None of it was budgeted. And the team that built the pilot is often not the team that can do the integration work.

What gets it unstuck: An honest re-scoping. The integration work is usually three to five times the effort of the model work, and that's a number nobody wants to say out loud. Saying it out loud, with a credible plan, is often the unblock.

3. The data is not as clean as the pilot suggested

The pilot used a curated dataset. Someone, somewhere, did several days of data wrangling to make the pilot possible. That work didn't get documented because it was just "getting the data ready."

In production, the data comes from the actual upstream systems, with all of their actual quirks. Missing fields. Inconsistent encodings. Records that look the same but aren't. The model that performed beautifully on the pilot dataset performs noticeably worse on the production dataset, and nobody is sure why.

What gets it unstuck: A short, focused effort to characterize the production data, document the gaps, and either fix them upstream or build them into the model preprocessing. This is unsexy work that nobody wants to do, and it is almost always the actual blocker.

4. The change management plan is the slide that got skipped

An AI system that affects how humans do their work requires the humans to change how they do their work. This is true whether the system is making decisions, recommending decisions, or just providing information.

The pilot didn't surface this because the pilot didn't touch real users. Production will. And the operational team — the people who will actually use the system day-to-day — were not in the room when the pilot was scoped, were not consulted on the workflow design, and are now being asked to adopt something they didn't help build.

Adoption stalls. Usage metrics look bad. Leadership concludes the model isn't working. The model is working. The change management plan was the slide that got skipped.

What gets it unstuck: Bringing the operational owner in as a real partner, not a stakeholder to be managed. Designing the workflow with them, not for them. Investing in training and feedback loops. This sounds obvious; it is rarely done.

5. The vendor relationship has drifted

The vendor sold a vision. The procurement contract was negotiated against that vision. Eight months in, the vision and the reality have diverged. The vendor is over budget. The internal team is over budget. The contract is unclear about who owes what.

Nobody wants to renegotiate because renegotiation is unpleasant and exposes everyone to scrutiny. So the project drifts in a kind of polite stalemate.

What gets it unstuck: A clean conversation about what the contract actually says, what each party can realistically deliver, and what an honest revised scope and timeline look like. This is sometimes uncomfortable but rarely as bad as the parties fear, because both sides usually want the project to succeed.

6. The executive sponsor stopped paying attention

Every successful program inside a large bank has an executive sponsor who pays attention. Every failed program has a sponsor who started paying attention, got pulled into something else, and now drops in once a quarter to ask why nothing is moving.

The sponsor's attention is the single most reliable predictor of whether a program reaches production. Without it, governance defaults to no, integration teams deprioritize the work, vendors lose urgency, and change management never happens.

What gets it unstuck: Re-engaging the sponsor — but with a specific, narrow ask. "Stay engaged" is not an ask. "Approve this scope, attend the steering committee for the next three months, and resolve these two specific cross-functional disputes" is an ask. Sponsors respond to specificity.

The pattern behind the patterns

If you read those six patterns carefully, you'll notice something: almost none of them is a technical problem. Five are purely organizational. The one exception, the data, is technical only in the most generous reading.

This is the thing I want financial services leaders to internalize. The AI hype cycle is loud. The vendor pitches are loud. The conferences are loud. But the actual work of getting AI into production inside a regulated institution is mostly the work of moving an organization — which is the work financial services has always required, and which the people running large programs at large banks have always done.

The technology has changed. The work of moving an institution has not.

When an AI program is stuck, ask which of the six patterns is in play. It's usually three or four of them. Naming them out loud is often the first real progress the program has made in months.

If your program is in this place (the pilot worked, production hasn't happened, and nobody can quite say why), I'd be glad to compare notes. The first conversation is free, and even if it doesn't lead to an engagement, I'll usually be able to tell you something useful.

Schedule a call