Governance Frameworks That Actually Work for AI in Banking

Most AI governance frameworks are designed by people who have never had to defend a model to a regulator. The ones that work treat governance as an enabler, not a gate — and start with the audit trail in mind.

Every bank I've worked with in the last two years has produced an AI governance framework. Most of them sit in a SharePoint folder, get referenced in board decks, and have almost no effect on how AI actually gets built or deployed inside the institution.

This isn't because the people writing the frameworks are bad at their jobs. It's because most of these documents are written to satisfy three different audiences at once — the board, the regulators, and the internal risk function — and end up serving none of them well. They become defensive artifacts. They describe what the bank says it does, not what would actually be useful when an AI program is moving through the organization.

What I've come to believe is that a useful AI governance framework looks structurally different from the ones most banks publish. It's smaller. It's more concrete. And it starts from a question that most frameworks never ask out loud.

The question every framework should answer first

The question is: when this AI system causes a problem, what will we be able to tell the regulator?

Not "what controls do we have." Not "what is our risk appetite." The specific, narrative question: a journalist calls, or an examiner opens a review of a Matter Requiring Attention (MRA), or a customer complaint hits the Bureau. What does the audit trail actually look like, and is it good enough?

If you start there, the rest of the framework writes itself. You know what data you need to capture, what decisions need human sign-off, what monitoring needs to be live, and what documentation has to exist before deployment. If you don't start there, you end up with a framework that lists eighteen principles and provides no operational guidance for any of them.

The five things a useful framework actually has to do

Strip away the consulting deck overhead, and an AI governance framework only needs to do five things. Most published frameworks do one or two of these well and the rest poorly.

1. Classify systems by risk, with thresholds that bite

Every framework has a tiering system. Tier 1, 2, 3. High, medium, low. Critical, significant, limited. The tiering isn't the hard part. The hard part is making sure the thresholds actually change behavior.

The test: when a new initiative gets classified as Tier 1, does anything actually become harder? If the answer is "more documentation" but not "different approval path, different monitoring, different sign-offs," the tiering is decorative. Useful tiers translate to real, named, gated steps that are different across tiers.

The other test: are the criteria for tiering specific enough that two different people would assign the same system to the same tier? If "materiality" or "customer impact" is the criterion, you'll get inconsistent classifications, which means you'll get inconsistent governance. Use specifics — number of customers affected, dollar exposure, whether the output is customer-facing, whether a human reviews each decision.
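To make "specific enough that two people agree" concrete, here is a minimal sketch of a tiering rule built only from measurable facts. The field names and every threshold value are my own illustrative assumptions, not figures from any regulation or any bank's policy; the point is that nothing in the rule requires a judgment call about "materiality."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Facts about an AI system that two reviewers can verify independently."""
    customers_affected: int            # customers touched per year
    dollar_exposure: float             # maximum financial exposure, in USD
    customer_facing: bool              # does the output reach a customer directly?
    human_reviews_each_decision: bool  # is every decision reviewed by a person?


# Hypothetical thresholds, for illustration only; a real framework sets its own.
TIER1_CUSTOMERS = 100_000
TIER1_EXPOSURE = 10_000_000.0
TIER2_CUSTOMERS = 1_000
TIER2_EXPOSURE = 250_000.0


def classify_tier(p: SystemProfile) -> int:
    """Return 1 (highest risk), 2, or 3 using only measurable criteria."""
    unreviewed_customer_output = (
        p.customer_facing and not p.human_reviews_each_decision
    )
    if (p.customers_affected >= TIER1_CUSTOMERS
            or p.dollar_exposure >= TIER1_EXPOSURE
            or unreviewed_customer_output):
        return 1
    if (p.customers_affected >= TIER2_CUSTOMERS
            or p.dollar_exposure >= TIER2_EXPOSURE
            or p.customer_facing):
        return 2
    return 3
```

Under these assumed thresholds, a customer-facing system with no per-decision human review lands in Tier 1 no matter how small it is, which is the kind of rule that changes behavior rather than just labeling it.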

2. Define the model lifecycle gates with named owners

An AI system goes through stages: ideation, design, development, validation, deployment, monitoring, retirement. A useful framework names each gate, defines what has to be true to pass through it, and names a specific role accountable for that gate. Not "the team." A role.

"Validation has been completed" is not a gate. It's a description. A gate is: "The Model Validator has produced a written validation memo, which has been reviewed and signed by the Chief Model Risk Officer or designate, and has been archived in the Model Inventory with a unique identifier."

That sentence is doing five things at once: it names the artifact, names the responsible role, names the reviewer role, defines storage, and creates auditability. Every gate in a useful framework reads like that. Most published frameworks don't have a single sentence that specific.
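One way to hold every gate to that standard is to record it as structured data rather than prose. The sketch below is an assumption about how such a record might look; the class and field names are hypothetical, and only the validation example's roles and storage location come from the sentence quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gate:
    """One lifecycle gate, spelled out the way the validation sentence is."""
    name: str
    artifact: str          # what must be produced
    owner_role: str        # role accountable for producing it
    reviewer_role: str     # role that reviews and signs it
    storage: str           # where the signed artifact is archived
    artifact_id: Optional[str] = None  # unique identifier, set once archived
    signed_by: Optional[str] = None    # named person who signed

    def is_passed(self) -> bool:
        """A gate is passed only when the artifact is both archived and signed."""
        return bool(self.artifact_id) and bool(self.signed_by)


validation_gate = Gate(
    name="validation",
    artifact="written validation memo",
    owner_role="Model Validator",
    reviewer_role="Chief Model Risk Officer or designate",
    storage="Model Inventory",
)
```

A gate defined this way cannot be marked complete by assertion: until an identifier and a signer are recorded, `is_passed()` stays false.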

3. Spell out what the audit trail captures, and where it lives

This is the part most frameworks skip almost entirely. They mention "logging" and "documentation" but don't specify what gets logged, in what format, where it's stored, who has access, or how long it's retained.

For an AI system in a regulated environment, the audit trail has to capture, at minimum: the model version that produced each decision, the input data the decision was based on, the output the model produced, what (if anything) a human did with that output, the version of the policy or threshold the system was operating under, and a timestamp. That's the floor. Many systems need more.

It also has to live somewhere that's queryable years later. Not in application logs that get rotated every 90 days. Not in a vendor's system that the bank can't access independently. In a system the bank controls, with a retention policy aligned to the regulatory requirement (often 5–7 years for banking records, though the exact period depends on the regulation and jurisdiction).
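As a rough illustration of that floor, the record below carries exactly the six items listed above plus a decision identifier. The identifier, the field names, and the choice of JSON lines as a storage format are my assumptions for the sketch; what matters is that each decision yields one self-describing record the bank can store and query on its own.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AuditRecord:
    decision_id: str             # hypothetical unique key for this decision
    model_version: str           # model version that produced the decision
    inputs: Dict[str, Any]       # input data the decision was based on
    output: Any                  # what the model produced
    human_action: Optional[str]  # what a human did with the output, if anything
    policy_version: str          # policy or threshold version in force
    timestamp: str               # ISO 8601 timestamp in UTC


def make_record(decision_id: str, model_version: str, inputs: Dict[str, Any],
                output: Any, human_action: Optional[str],
                policy_version: str) -> AuditRecord:
    """Build a record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return AuditRecord(decision_id, model_version, inputs, output,
                       human_action, policy_version, now)


def to_json_line(record: AuditRecord) -> str:
    """Serialize to one JSON line, suitable for an append-only store."""
    return json.dumps(asdict(record), sort_keys=True)
```

The record is frozen on purpose: an audit entry that can be edited after the fact is much harder to defend than one that can only be appended.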

If you can't tell me, with specificity, where the audit trail of a Tier 1 AI system lives and who can query it, you don't have a usable framework.

4. Define the human-in-the-loop standard for each tier

"Human in the loop" is one of the most overused and underspecified phrases in AI governance. There are at least four different things it can mean:

Human approves every decision. The model recommends; a human signs off. Slow, expensive, defensible.
Human approves a sample. The model decides; humans audit a percentage. Faster, requires statistical rigor on the sampling.
Human reviews exceptions. The model decides for "normal" cases; flags edge cases for review. Requires a clear definition of "edge case."
Human reviews periodically. Aggregate review of model outputs over time, looking for drift or bias. Doesn't catch individual errors.

Each of these is appropriate in some contexts and wrong in others. A useful framework defines, by tier and by use case category, which version of "human in the loop" applies — and what training, authority, and time the human reviewer needs to do the job meaningfully.
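The four meanings above can be written down as an explicit standard instead of a slogan. In this sketch the use case categories, the tier assignments, and the 5% sample rate are all hypothetical placeholders; a real framework would set its own and justify the sampling statistically.

```python
import random
from enum import Enum


class ReviewMode(Enum):
    EVERY_DECISION = "human approves every decision"
    SAMPLE = "human approves a sample"
    EXCEPTIONS = "human reviews exceptions"
    PERIODIC = "human reviews periodically"


# Hypothetical assignment of review mode by (tier, use case category).
REVIEW_STANDARD = {
    (1, "credit_decision"): ReviewMode.EVERY_DECISION,
    (2, "fraud_alert"): ReviewMode.EXCEPTIONS,
    (2, "marketing"): ReviewMode.SAMPLE,
    (3, "internal_search"): ReviewMode.PERIODIC,
}

SAMPLE_RATE = 0.05  # hypothetical: audit 5% of decisions


def needs_individual_review(tier: int, category: str, is_edge_case: bool,
                            rng: random.Random) -> bool:
    """Decide whether this single decision must go to a human reviewer."""
    mode = REVIEW_STANDARD[(tier, category)]
    if mode is ReviewMode.EVERY_DECISION:
        return True
    if mode is ReviewMode.SAMPLE:
        return rng.random() < SAMPLE_RATE
    if mode is ReviewMode.EXCEPTIONS:
        return is_edge_case
    # PERIODIC: outputs are reviewed in aggregate, never one by one.
    return False
```

Note what the code cannot express: whether the reviewer has the training, authority, and time to do the job. That part of the standard still has to be written in words and staffed.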

A human who has 30 seconds to review a model recommendation, no authority to override it, and no training in what to look for, is not a control. They're a compliance theater prop. Regulators are increasingly able to tell the difference, and they're increasingly skeptical of human-in-the-loop claims that aren't backed by real review capacity.

5. Define what triggers a re-review, and how that gets enforced

An AI system is not static. The model gets retrained. The data drifts. The use case expands. The vendor updates the underlying foundation model. Any of these can change the risk profile of the system materially — and most governance frameworks have nothing to say about how those changes get caught.

A useful framework defines specific triggers that require re-review:

Material change in training data composition
Material change in model architecture or hyperparameters
Vendor-side update to a foundation model the system depends on
Expansion of the use case beyond originally approved scope
Performance drift beyond defined thresholds in production monitoring
Time-based: every 12 or 24 months regardless of changes

And then it defines who is responsible for catching each trigger, what the re-review process looks like, and what happens if the trigger fires and the re-review hasn't happened. Most frameworks describe the triggers and stop there. The accountability for catching them — and the consequences of missing them — is where the framework actually has teeth.
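A sketch of what that accountability can look like once it is written down: each trigger from the list above is paired with a check and an owning role. The role names, the drift threshold, and the 12-month review age are assumptions for illustration, and the boolean flags presume someone has already judged whether a change was material.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

DRIFT_THRESHOLD = 0.10      # hypothetical tolerance on a monitored metric
MAX_REVIEW_AGE_DAYS = 365   # the 12-month variant of the time-based trigger


@dataclass
class SystemState:
    last_review: date
    training_data_changed: bool   # material change in training data
    architecture_changed: bool    # material change in architecture or hyperparameters
    vendor_model_updated: bool    # vendor updated the foundation model
    scope_expanded: bool          # use case grew beyond approved scope
    drift: float                  # observed drift on the monitored metric


def fired_triggers(s: SystemState, today: date) -> List[Tuple[str, str]]:
    """Return (trigger, owning role) pairs for every trigger that has fired."""
    overdue = (today - s.last_review).days > MAX_REVIEW_AGE_DAYS
    checks = [
        (s.training_data_changed, "training data changed", "Data Owner"),
        (s.architecture_changed, "architecture changed", "Model Developer"),
        (s.vendor_model_updated, "vendor model updated", "Vendor Manager"),
        (s.scope_expanded, "scope expanded", "Business Owner"),
        (abs(s.drift) > DRIFT_THRESHOLD, "performance drift", "Monitoring Lead"),
        (overdue, "time-based review overdue", "Model Risk Officer"),
    ]
    return [(name, owner) for fired, name, owner in checks if fired]
```

A non-empty result is where enforcement starts: the framework still has to say what happens to the system, and to the named owner, if the re-review does not follow.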

What's usually wrong with the framework you already have

If your institution has produced an AI governance framework in the last two years, here's what's most likely wrong with it. Not all of these will apply, but most banks I've looked at have at least three of them.

The framework reads as a set of principles, not a set of procedures. Principles like "AI must be explainable" or "we will use AI responsibly" tell no one what to do. A useful framework is mostly verbs and named roles.

Model Risk Management was the primary author. MRM frameworks are excellent at validating models that look like traditional models — credit scorecards, PD/LGD models, asset pricing models. They are often awkwardly applied to LLM-based or agentic systems, where the validation methodology is genuinely different. If MRM wrote the AI framework alone, the framework will treat LLMs as if they were credit models, and the data science teams will route around it.

The framework doesn't distinguish between models the bank owns and models the bank uses. A model the bank trains itself, a model the bank fine-tunes from a vendor base, and a model the bank consumes as an API are three fundamentally different governance situations. A framework that treats them identically is either too strict for the API case or too loose for the owned case.

There is no one accountable for the framework itself. The framework was published by a working group that has since dissolved. Updates require a new working group. The document is therefore static in a domain that is moving faster than any other domain in financial services right now.

The framework was reviewed by Legal and Compliance, but not by anyone who would actually have to operate under it. The result is a document that's defensible but not workable, and the operational teams quietly do something different.

The structural fix

The fix isn't a bigger framework. It's a smaller one — but with the five things above done well, and with named ownership that survives turnover.

A useful framework can be 15–25 pages, not 80. Its governance bodies can fit on a single org chart, and its tiering on a single matrix. It can name specific people in specific roles, with named successors. It can be reviewed every six months by a standing committee, not annually by a working group that re-forms each time.

This kind of framework is harder to write than the longer kind, because every sentence has to do work. But it's the only kind that survives contact with the institution actually trying to deploy AI under it.

The honest test

Here is a test I sometimes run with clients. Take your AI governance framework, pick a specific in-flight AI initiative at the bank, and ask three questions:

Which tier is this initiative in, and who decided?
Which gates has it passed, and where is the documentation for each?
If this system causes a problem tomorrow, what does the audit trail show, and where does it live?

If the people running the initiative can't answer all three within a few minutes, the framework isn't operational — regardless of how well-written it is. That gap, between the framework as published and the framework as practiced, is where most AI programs in financial services actually live right now.

Closing that gap is mostly an organizational design problem, not a documentation problem. And it's solvable — but it requires someone willing to look at the framework you already have and tell you, honestly, which parts of it work and which parts of it don't.

Working through governance for an AI program right now? I'd be glad to compare notes on what's working and what isn't — even if it doesn't lead to an engagement.
