Why Most AI Use Case Prioritization Exercises at Banks Produce the Wrong List

The prioritization workshop produced a ranked list of twenty-seven use cases. Number one was real-time fraud detection. It required real-time data pipelines the bank didn't have, a latency SLA the current infrastructure couldn't meet, and MRM approval for a system type the bank had never deployed before. It was still number one on the list two years later — still in design, still blocked on the same three things that were visible in the original workshop if anyone had looked closely enough.

I've run enough of these exercises to recognize the pattern immediately. The list looks authoritative. It has scores, weights, and color coding. The problem is not the framework — it is the inputs. The feasibility scores are wrong, and they are wrong in a predictable direction.

Why conventional prioritization frameworks produce misleading rankings

The standard AI prioritization framework scores use cases across two dimensions: value and feasibility. Value is usually estimated by the business line — revenue impact, cost reduction, risk reduction. Feasibility is usually assessed by the data science team — data availability, model complexity, development timeline.

This division of labor produces a systematic bias. The data science team is good at estimating whether a model can be built. They are not the right people to estimate whether a model can be deployed into a bank. Building a model and deploying it into a regulated financial institution are two entirely different undertakings, and the gap between them is where most prioritized use cases go to stall.

A fraud detection model is technically straightforward for a capable data science team. Scoring transactions against a model is solved engineering. What the data science team's feasibility score does not capture is that real-time fraud detection at a bank requires a real-time event stream that most institutions do not have, a feature store capable of sub-100ms lookups, integration with the core transaction system that will require months of IT prioritization and testing, and an MRM validation process for a system type — real-time decisioning — that may be new territory for the bank's Model Risk function. None of that is a data science problem. None of it shows up in a data science team's feasibility estimate.

The result is a list ranked by what is technically buildable, not by what will actually reach production fastest or produce the most reliable value.

The three dimensions that most prioritization frameworks get wrong

Feasibility is assessed without integration cost. Integration cost is the single most consistently underestimated dimension in bank AI programs, and it is underestimated because the people assessing feasibility do not own the systems being integrated into. The data science team can estimate model development time with reasonable accuracy. They cannot accurately estimate how long it will take to get a real-time data feed from the core banking system, how many sprints the infrastructure team will need to allocate, or what the testing and change management load will be on the business line. These are the costs that turn a six-month use case into an eighteen-month use case, and they are invisible to the feasibility scoring exercise because the right people are not in the room.

I've written about this specifically in the context of AI program budgeting — the integration and infrastructure costs are where most AI program budgets fall apart, and they fall apart because feasibility was assessed by a team with only partial visibility. For prioritization purposes, the fix is to require an integration cost estimate from IT, not just a model complexity estimate from data science, before any use case is scored.

Data readiness is part of this. A use case that requires clean, structured, real-time data from a source that currently produces batch exports with known quality issues is not a high-feasibility use case. It is a high-technical-debt use case. The distinction matters. I've seen it explained well in the context of what AI-ready data actually requires at a financial institution — the short version is that data readiness is almost always more work than it looks like from the model side.

Governance complexity is treated as a fixed multiplier when it varies enormously by use case. Most prioritization frameworks apply a governance adjustment — something like a 20% complexity penalty for "regulatory sensitivity" — uniformly across regulated use cases. This is wrong. The governance complexity of a document summarization tool used by internal analysts is not comparable to the governance complexity of a model that affects credit decisions or flags transactions for review. Those use cases require different MRM tiers, different fair lending analysis, different explainability standards, and potentially different regulatory disclosures to customers.

The difference is not a 20% multiplier. For a bank that has never deployed a real-time decisioning system, getting that use case through model risk management for the first time might add six to nine months to the timeline — not because MRM is obstructionist, but because the bank is establishing a new validation framework for a new class of system. That is a full program-level decision, not a line-item adjustment in a scoring matrix.

The practical implication is that governance complexity needs to be assessed by the Model Risk function and the Compliance team — not inferred from a category label. Any prioritization exercise that doesn't include input from MRM on the governance complexity of each scored use case is missing the dimension that most frequently determines whether a use case reaches production in year one or year three.
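To see how much a flat multiplier hides, here is a minimal Python sketch that compares the two approaches. The function names, the six-month base estimate, and the one-month figure for document summarization are assumptions made up for illustration. Only the 20% penalty and the upper end of the six-to-nine-month range come from the discussion above.

```python
def flat_penalty_timeline(base_months: float, penalty: float = 0.20) -> float:
    """Timeline under a uniform 'regulatory sensitivity' multiplier."""
    return base_months * (1 + penalty)


def assessed_timeline(base_months: float, mrm_added_months: float) -> float:
    """Timeline using an added-months estimate supplied by Model Risk."""
    return base_months + mrm_added_months


base = 6.0  # hypothetical build-and-integrate estimate, in months

# Hypothetical MRM estimates: the 1-month figure is invented for illustration;
# 9 months is the upper end of the six-to-nine-month range discussed above.
mrm_estimates = {
    "internal document summarization": 1.0,
    "first real-time decisioning system": 9.0,
}

for name, added in mrm_estimates.items():
    flat = flat_penalty_timeline(base)
    assessed = assessed_timeline(base, added)
    print(f"{name}: flat={flat:.1f} months, assessed={assessed:.1f} months")
```

Under the flat penalty both use cases land on the same timeline. With per-use-case input from MRM they end up many months apart, which is the information the ranking needs.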

Time-to-value ignores the change management timeline for adoption. A use case can be technically complete — model built, integrated, validated, deployed — and still produce no value for twelve months because the business line hasn't adopted it. Change management for AI systems at banks is consistently underestimated in prioritization exercises because it is treated as an implementation footnote rather than a critical path dependency.

Adoption timelines vary by user population, by how much the AI system changes existing workflows, by whether the system makes recommendations or makes decisions, and by how much training and trust-building is required before the intended users actually use it. A loan officer who has spent fifteen years developing judgment about credit risk will not immediately defer to a model recommendation, and the time it takes to earn that trust — or to redesign the workflow so it doesn't require trust — is not zero. Prioritization frameworks that estimate time-to-value as "model deployment date plus 30 days" are measuring the wrong thing.

A better prioritization framework: value times speed-to-first-dollar times governance simplicity

The three inputs that actually predict which use cases will deliver value fastest are: the potential value of the use case, the time from project start to the first dollar of realized value (not deployment, but realized value), and the governance complexity relative to what the institution has already done.

Speed-to-first-dollar is a more useful measure than feasibility because it forces the scoring exercise to include integration time, change management time, and adoption ramp, not just model development time. A use case with a six-month model development timeline and a fourteen-month adoption ramp has a twenty-month speed-to-first-dollar. A use case with a two-month model development timeline and a three-month adoption ramp has a five-month speed-to-first-dollar. In a conventional feasibility scoring exercise, the first use case will almost always outscore the second because the technical complexity favors large, sophisticated models. In a speed-to-first-dollar framework, the second use case is four times faster.
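The arithmetic can be sketched in a few lines of Python. This is an illustration, not a prescribed tool: the `Timeline` class, its field names, and the use case labels are hypothetical, and the integration and validation fields are assumed extra components that are left at zero to match the two-part example above.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    """Months on the critical path from project start to first realized value."""
    name: str
    model_development: int
    adoption_ramp: int
    integration: int = 0  # assumed extra component; zero in the example above
    validation: int = 0   # assumed extra component; zero in the example above

    def months_to_first_dollar(self) -> int:
        # Phases are treated as strictly sequential, which is a simplification.
        return (self.model_development + self.integration
                + self.validation + self.adoption_ramp)


use_case_a = Timeline("large model", model_development=6, adoption_ramp=14)
use_case_b = Timeline("small model", model_development=2, adoption_ramp=3)

for uc in (use_case_a, use_case_b):
    print(f"{uc.name}: {uc.months_to_first_dollar()} months to first dollar")
```

Filling in the integration and validation fields with estimates from IT and Model Risk is where the number usually moves the most.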

Governance simplicity favors use cases that fit within validation frameworks the institution has already established. Not because novel use cases aren't worth pursuing — they are — but because the first time a bank deploys a specific class of system, the governance process takes longer. The second deployment of the same class is dramatically faster because the framework exists. Prioritizing use cases that extend existing validated patterns over use cases that require building new governance frameworks is not timidity; it is sequence awareness.

Before any use case gets a production-level prioritization score on this framework, a structured proof of concept should answer the hardest feasibility questions, particularly integration complexity and data readiness. A PoC doesn't need to be a full build. It needs to answer the questions that would change the score if the answers turned out to be different from what was assumed.
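One way to turn the three inputs into a ranking is sketched below in Python. The formula is an assumption on my part: speed is taken as one over months to first dollar, and governance simplicity is a 0-to-1 factor where 1.0 means the use case fits a validation pattern the bank already has. The `UseCase` class, the `priority_score` function, and every number in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    annual_value: float            # estimated by the business line (hypothetical units)
    months_to_first_dollar: float  # includes integration, validation, adoption
    governance_simplicity: float   # 0 < g <= 1; 1.0 = fits an existing validated pattern


def priority_score(uc: UseCase) -> float:
    """Value x speed x governance simplicity, with speed taken as 1 / months."""
    if uc.months_to_first_dollar <= 0:
        raise ValueError("months_to_first_dollar must be positive")
    if not 0 < uc.governance_simplicity <= 1:
        raise ValueError("governance_simplicity must be in (0, 1]")
    speed = 1.0 / uc.months_to_first_dollar
    return uc.annual_value * speed * uc.governance_simplicity


# Illustrative inputs only; none of these figures come from a real bank.
candidates = [
    UseCase("real-time fraud detection", annual_value=10.0,
            months_to_first_dollar=30, governance_simplicity=0.3),
    UseCase("loan document extraction", annual_value=3.0,
            months_to_first_dollar=8, governance_simplicity=0.9),
]

ranked = sorted(candidates, key=priority_score, reverse=True)
for uc in ranked:
    print(f"{uc.name}: {priority_score(uc):.3f}")
```

With these made-up inputs the lower-value use case ranks first, because it reaches realized value sooner and reuses existing governance. Different weights or a different functional form could be equally defensible; what matters is that the months and the governance factor come from IT, MRM, and the business line rather than from the model team alone.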

How to use the first successfully deployed use case as a platform

The value of the first production AI deployment at a bank is not just the direct value of that use case. It is the institutional infrastructure that deployment creates for every subsequent use case.

The first deployment forces the bank to establish: an AI system inventory and classification process, an MRM validation track for that class of system, a human oversight and audit logging standard, a monitoring cadence, an incident response protocol, and a change management playbook for user adoption. All of those artifacts exist after the first deployment and are directly reusable for the second, third, and fourth.

This means that use case one should be chosen partly for its ability to establish durable governance infrastructure, not only for its standalone value. A use case that is simple enough to complete cleanly, visible enough to build internal confidence, and representative enough of a category the bank wants to pursue repeatedly — that is the right first use case, even if it is not the highest-value item on the original list.

The first production deployment is not the most important use case. It is the platform that makes all subsequent use cases significantly cheaper and faster to deploy.

The use case category that almost always wins the corrected prioritization

When I re-score a bank's use case list using speed-to-first-dollar and governance simplicity, the category that consistently rises to the top is process automation and document handling — specifically, use cases where an AI system processes unstructured documents or text, extracts structured information, and routes it into an existing workflow.

The reasons are consistent across institutions. Document handling use cases typically run on existing data — documents the bank already has, in formats it already manages — which eliminates most data readiness complexity. They fit within existing governance categories: document processing is a well-understood use case type in MRM, it typically does not involve direct customer decisions, and it usually carries lower fair lending and adverse action risk than credit or fraud models. The adoption curve is faster because the output of the system (a completed form, a routed request, a flagged exception) is immediately legible to the user without a trust-building period. And the integration requirements are usually much lighter than real-time decisioning or customer-facing systems.

Loan document extraction, contract review, regulatory filing assistance, internal knowledge retrieval — these use cases consistently underperform on conventional prioritization matrices because their headline value is lower than fraud detection or revenue optimization. They consistently outperform on speed-to-first-dollar because they can actually get to production.

The institutions I have seen build durable AI programs did not start with the most ambitious item on the original list. They started with the use case that would reach production inside twelve months, establish the governance infrastructure, demonstrate value clearly, and give the team the credibility to move on to more complex use cases with institutional wind at their backs.

If your prioritization list still has the same number-one item it had eighteen months ago, that is not a sign that the institution is being appropriately ambitious. It is a sign that the inputs to the prioritization were wrong.

If you're working through a use case prioritization or trying to diagnose why the top items keep stalling, I'm happy to look at the list with you.
