What Regulators Are Actually Asking About AI: A Field Guide for Bank Executives

The examiner scheduled a targeted review of the bank's AI systems. The CRO asked for a summary of everything currently in production. It took three weeks to compile it, and when it came back it was incomplete. Nobody had been keeping a list.

This is not an unusual situation. Most financial institutions that have been deploying AI for the past two or three years have done so program by program, initiative by initiative, with governance applied to each one individually. The result is a collection of AI systems with no common inventory, no consistent documentation standard, and no unified view of what the institution has deployed, who owns it, or what decisions it affects.

Regulators have noticed. The guidance coming from the OCC, the Federal Reserve, and the FDIC over the past several years has become progressively more specific about what examiners expect to find — and the gap between what most institutions have prepared and what examiners are actually looking for is wider than most executives realize.

This article is not legal advice, and it is not a compliance checklist. It is a field guide — the kind of briefing a former insider would give you before an examination, based on what the guidance actually says and what examiners are actually asking.

The regulatory landscape: three bodies, overlapping concerns

The three primary federal banking regulators have each issued guidance on AI, and while the specific emphasis varies, the core concerns are consistent enough that preparing for one prepares you substantially for all three.

The OCC has been the most explicit about AI as a distinct supervisory focus area. Its guidance and examination procedures make clear that examiners will assess AI governance as part of broader technology and operational risk examinations and, for institutions with material AI deployments, as a targeted review in its own right. The OCC's primary concerns center on model risk management, third-party (vendor) risk, and the fair lending implications of algorithmic decision-making.

The Federal Reserve's approach is grounded in SR 11-7, the 2011 model risk guidance that predates modern AI by a decade but that examiners apply to AI systems as best they can. SR 11-7 was not written with LLMs and other generative systems in mind and doesn't translate cleanly to them, but the framework remains the baseline. Fed examiners will ask whether AI models are inventoried, validated, and monitored consistently with SR 11-7's expectations, and where they aren't, they will want to understand why and what alternative controls exist. The challenge of applying SR 11-7 to modern AI is real, and examiners generally understand this, but they need to see evidence that the institution has thought carefully about the gap.

For community and mid-market banks, the FDIC has emphasized vendor risk: the risk that an institution deploys AI through a third-party vendor without understanding what the vendor's model does, how it works, or how to validate and monitor it. This reflects the reality that most mid-market institutions are deploying AI through vendors rather than building it themselves, and that the vendor relationship creates governance obligations that many institutions are not meeting.

What examiners are actually asking

Strip away the guidance documents and the examination manuals, and what an examiner conducting an AI-focused review wants to understand reduces to five areas. Not all of them will receive equal attention in every examination — the depth depends on the institution's size, its AI maturity, and what the examiner finds when they start pulling threads.

Inventory. What AI systems does the institution have in production? Who owns each one? What decisions does each one affect, and at what scale? What is the tier or risk classification of each system?

This is consistently the first place examinations surface problems, because most institutions do not have a clean, current, centralized answer to these questions. Systems have been deployed across business lines with different governance standards. Some have gone through MRM validation; others haven't. Some are vendor-supplied and treated as standard software rather than models. The examiner's first request is often simply: show me your model inventory. The institutions that are prepared hand over a document. The institutions that aren't spend the next two weeks scrambling.

The inventory doesn't need to be elaborate. It needs to be accurate, current, and complete. For each system: the name, the business owner, the use case, the risk tier, the validation status, and the date of last review. That's the floor.
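A minimal sketch of that floor as a data structure, in Python, assuming a numeric tier scale and a per-tier review interval that your own policy would actually set. The class name, field names, tier numbers, and interval values are hypothetical illustrations, not a regulatory schema. The two helper functions show the kind of mechanical check that keeps an inventory current rather than merely existing.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class InventoryEntry:
    """One inventory row holding the six 'floor' fields."""
    name: str
    business_owner: str
    use_case: str
    risk_tier: int           # hypothetical scale: 1 = highest risk
    validation_status: str   # e.g. "validated", "in progress", "not validated"
    last_review: date

# Hypothetical maximum days between reviews, by tier; set these from policy.
REVIEW_INTERVAL_DAYS = {1: 365, 2: 730, 3: 1095}

def missing_fields(entry: InventoryEntry) -> list[str]:
    """Names of required text fields that are blank."""
    text_fields = ("name", "business_owner", "use_case", "validation_status")
    return [f for f in text_fields if not getattr(entry, f).strip()]

def overdue_for_review(entry: InventoryEntry, today: date) -> bool:
    """True if the last review is older than the entry's tier allows."""
    max_age = timedelta(days=REVIEW_INTERVAL_DAYS[entry.risk_tier])
    return today - entry.last_review > max_age
```

Run against the full inventory on a schedule, checks like these turn "is the inventory current?" into a short list of named entries, each with a specific problem.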

Model risk governance. How are AI models validated before deployment? Who conducts the validation — internal MRM, an external validator, or something else? What is the ongoing monitoring framework? How does the institution know when a model is drifting or degrading? What triggers a re-validation?

Examiners will ask these questions and then ask to see evidence: the validation memo for a specific system, the monitoring reports, the trigger criteria and what happened the last time a threshold was crossed. A governance framework that lives in a document but hasn't been operationalized will not satisfy this line of questioning. The examiner wants to see the artifact, not the policy.
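One common way to make a drift trigger concrete is the Population Stability Index (PSI), which compares the current distribution of model scores with the distribution at validation. The sketch below assumes scores have already been bucketed into the same bins for both periods. The 0.10 and 0.25 cut-offs are widely used rules of thumb, not supervisory thresholds; an institution would set and justify its own, and PSI is only one of several metrics a monitoring framework might track.

```python
import math

def psi(expected_counts: list[int], actual_counts: list[int], eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a current score distribution.

    Both arguments are counts per score bin, using the same bin edges.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    if e_total == 0 or a_total == 0:
        raise ValueError("each distribution needs at least one observation")
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor each share at eps so an empty bin doesn't produce log(0).
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        value += (a_share - e_share) * math.log(a_share / e_share)
    return value

# Hypothetical trigger levels: common rules of thumb, not supervisory thresholds.
WATCH, REVALIDATE = 0.10, 0.25

def monitoring_action(psi_value: float) -> str:
    """Map a PSI value to the action the monitoring policy prescribes."""
    if psi_value >= REVALIDATE:
        return "escalate for re-validation"
    if psi_value >= WATCH:
        return "investigate"
    return "no action"
```

The point for an examiner is less the specific metric than the chain it documents: a defined measurement, a pre-set threshold, and a named action when the threshold is crossed.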

Fair lending and disparate impact. For any AI system that touches a credit decision — directly or indirectly — examiners will want to know how the institution is monitoring for disparate impact. This includes systems that rank, score, or prioritize customers for credit-related outreach, not just systems that make explicit credit decisions. The fair lending inquiry has expanded as AI has expanded, and institutions that have deployed AI in marketing or customer segmentation without considering fair lending implications have been caught off guard.

What examiners need to see here is not a guarantee of zero disparate impact — they understand that statistical disparities can exist without discrimination. They need to see that the institution is monitoring for disparate impact, has a methodology for doing so, and has a process for investigating and remediating disparities when they appear.
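As one illustration of a methodology, the sketch below computes an adverse impact ratio: each group's approval (or selection) rate divided by a reference group's rate. The 0.8 threshold borrows the "four-fifths" heuristic from employment-discrimination practice; it is a screening convention, not a fair lending safe harbor, and a ratio below it signals a disparity to investigate rather than a conclusion. The function names and group labels are hypothetical, and the sketch assumes group membership is already known or estimated, which in practice is often the hard part.

```python
def approval_rate(outcomes: list[int]) -> float:
    """Share of favorable outcomes (1 = approved or selected, 0 = not)."""
    if not outcomes:
        raise ValueError("no outcomes for group")
    return sum(outcomes) / len(outcomes)

def adverse_impact_ratios(outcomes_by_group: dict[str, list[int]],
                          reference_group: str) -> dict[str, float]:
    """Each group's approval rate divided by the reference group's rate."""
    ref_rate = approval_rate(outcomes_by_group[reference_group])
    if ref_rate == 0:
        raise ValueError("reference group has no approvals; ratio undefined")
    return {
        group: approval_rate(outcomes) / ref_rate
        for group, outcomes in outcomes_by_group.items()
        if group != reference_group
    }

def groups_to_investigate(ratios: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Groups whose ratio falls below the screening threshold."""
    return sorted(g for g, r in ratios.items() if r < threshold)
```

The same calculation applies to systems that rank or prioritize customers for outreach: treat "selected for the offer" as the favorable outcome.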

Third-party AI governance. For every AI system supplied by a vendor, the institution is responsible for understanding what the model does, validating that it performs as described, and monitoring it in production. The vendor's attestations are not a substitute for the institution's own due diligence. Examiners will ask: what did you do to validate this vendor's model, and what are you doing to monitor it?

This is the area where mid-market institutions are most frequently unprepared, because vendor-supplied AI has often been purchased and deployed as standard software — evaluated for functionality, not for model risk. The institution may not have validated it, may not have an ongoing monitoring framework, and may not even know what data the vendor's model was trained on. Each of those gaps is an examination finding waiting to happen.

Explainability and adverse action. For AI systems that affect consumer-facing decisions (credit, insurance, account terms), the institution needs to be able to explain the decision to the consumer and to the examiner. Adverse action notices must state the specific reasons for the decision. "The model said no" is not a reason. The institution needs to be able to translate the model's output into the specific factors that drove the decision, in language that satisfies the regulatory requirement and that a consumer can actually understand.

For traditional credit scorecards, this is well-established practice. For newer AI systems, it is frequently an afterthought. Building explainability into the system from the start is considerably cheaper than retrofitting it after deployment — and the institutions that haven't done it are finding out during examinations.
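For an additive model such as a points-based scorecard, one common way to produce reasons is to rank each feature by how much it pulled the applicant's score down and map the top features to pre-approved reason statements. The sketch assumes those per-feature effects have already been computed (for a more complex model, by whatever attribution method the institution has validated). The feature names and reason text are hypothetical, and the default of four reasons reflects common practice rather than a hard rule.

```python
# Hypothetical mapping from model features to consumer-readable reason statements.
REASON_TEXT = {
    "utilization": "Balances are high relative to credit limits",
    "delinquencies": "Recent delinquency on one or more accounts",
    "history_length": "Length of credit history is limited",
    "inquiries": "Number of recent credit inquiries",
}

def principal_reasons(contributions: dict[str, float], max_reasons: int = 4) -> list[str]:
    """Reason statements for the features that lowered the score the most.

    `contributions` maps feature name -> signed effect on this applicant's score
    (negative means the feature hurt the applicant).
    """
    # Sort (effect, feature) pairs so the most negative effect comes first.
    adverse = sorted((c, f) for f, c in contributions.items() if c < 0)
    # REASON_TEXT[f] raises KeyError for an unmapped feature, so a missing
    # reason statement surfaces as an error instead of leaking a raw feature name.
    return [REASON_TEXT[f] for _, f in adverse[:max_reasons]]
```

Failing loudly on an unmapped feature is deliberate: a new model input without an approved reason statement is exactly the retrofit problem described above.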

What SR 11-7 does and doesn't cover for modern AI

SR 11-7 is the baseline, and it's worth understanding both what it requires and where it breaks down for LLMs and generative AI.

What it covers that translates directly to modern AI: the expectation of a model inventory, independent validation, ongoing monitoring, documentation of model purpose and limitations, and governance oversight at the board and senior management level. These expectations apply regardless of model type.

Where it breaks down: the specific methodology for validation was designed for deterministic models with documented inputs and outputs. An LLM doesn't have a validation dataset in the traditional sense. Its "performance" can't be reduced to a single accuracy metric. Its failure modes are probabilistic and context-dependent rather than enumerable. Examiners are generally aware of this tension, and the most sophisticated among them are looking for institutions that have thought carefully about it: institutions that have developed equivalent controls, rather than ones that forced LLMs through a scorecard validation framework and produced documentation that doesn't quite fit.

The institutions that handle this best produce documentation that acknowledges the framework mismatch explicitly, describes the alternative validation approach they used, and explains why that approach satisfies the underlying risk management objective. That kind of transparency tends to land better than documentation that tries to make a generative AI system look like a credit scorecard.
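One shape an alternative validation approach can take is scenario-based testing: a library of test prompts grouped by risk category, each with a pass/fail check, run several times per prompt because generative output can vary between runs, and scored as a pass rate per category against thresholds set in advance. The sketch below is a minimal harness under those assumptions; `generate` stands in for whatever system is under test, and the categories, checks, and thresholds are hypothetical. Real checks are usually richer than string matching.

```python
from collections import defaultdict
from typing import Callable

# A test case: (risk category, prompt, check applied to the model's output).
Case = tuple[str, str, Callable[[str], bool]]

def pass_rates(generate: Callable[[str], str], cases: list[Case],
               samples_per_case: int = 5) -> dict[str, float]:
    """Run every case several times and return the pass rate per category."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for category, prompt, check in cases:
        for _ in range(samples_per_case):  # repeat: outputs can differ run to run
            total[category] += 1
            if check(generate(prompt)):
                passed[category] += 1
    return {c: passed[c] / total[c] for c in total}

def failing_categories(rates: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Categories below their pre-set threshold.

    thresholds[c] raises KeyError if a category has no threshold, so every
    tested category must have an acceptance level defined before testing.
    """
    return sorted(c for c, r in rates.items() if r < thresholds[c])
```

The design choice worth documenting is that thresholds are fixed before the run; the validation memo can then explain why each threshold meets the underlying risk objective.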

The three things most institutions are missing

Across the institutions I've worked with and observed, three gaps appear most consistently when an AI examination happens.

A current, complete model inventory. Not an inventory that was accurate eighteen months ago. Not an inventory that covers the systems MRM validated but not the vendor-supplied systems that slipped through as software purchases. A current, complete list of every AI system in production, with the governance status of each. Building and maintaining this inventory is not technically difficult. It requires organizational discipline and a clear owner. Most institutions have neither.
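Part of that discipline can be mechanized. A minimal sketch, assuming the institution can export a list of deployed systems from some independent register (a procurement, vendor-management, or IT asset register, for example) and compare it with the AI inventory. The function names and system names are hypothetical; deciding which deployed systems count as AI still takes human judgment, so the output is a review queue, not a verdict.

```python
from typing import Iterable

def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so 'Chatbot ' and 'chatbot' match."""
    return " ".join(name.lower().split())

def reconcile(inventory: Iterable[str], deployed: Iterable[str]) -> dict[str, list[str]]:
    """Compare the AI inventory with an independent register of deployed systems."""
    inv = {normalize(n) for n in inventory}
    dep = {normalize(n) for n in deployed}
    return {
        # Deployed per the other register but absent from the AI inventory.
        "missing_from_inventory": sorted(dep - inv),
        # In the inventory but not in the register: retired, renamed, or misrecorded?
        "not_found_in_register": sorted(inv - dep),
    }
```

The first list is where vendor systems bought as software purchases tend to show up.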

Documented monitoring for vendor AI. Institutions that bought AI from a vendor and haven't established independent monitoring — haven't defined what "performing as expected" means for that system, haven't established a cadence for reviewing performance, haven't defined what would trigger re-evaluation — are exposed. The vendor's monitoring is not the institution's monitoring. The examiner will ask what the institution is doing, not what the vendor is doing.
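Defining "performing as expected" is easier to evidence when it is written down as explicit metrics with bounds and a review cadence. The sketch below assumes the institution receives or computes periodic metric values for a vendor system; the class names, metric names, bounds, and 90-day cadence are hypothetical. Treating a metric that stops being reported as a breach is a design choice worth considering, since silence in a vendor report is itself a monitoring gap.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

@dataclass
class MetricExpectation:
    """One measurable definition of 'performing as expected'."""
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def breached(self, value: float) -> bool:
        too_low = self.minimum is not None and value < self.minimum
        too_high = self.maximum is not None and value > self.maximum
        return too_low or too_high

@dataclass
class VendorMonitoringPlan:
    system: str
    review_every_days: int
    expectations: list[MetricExpectation] = field(default_factory=list)

    def breaches(self, observed: dict[str, float]) -> list[str]:
        """Metrics outside their bounds; a metric missing from the report also counts."""
        found = []
        for exp in self.expectations:
            if exp.name not in observed:
                found.append(f"{exp.name} (not reported)")
            elif exp.breached(observed[exp.name]):
                found.append(exp.name)
        return found

    def next_review(self, last_review: date) -> date:
        return last_review + timedelta(days=self.review_every_days)

# Hypothetical plan for a vendor-supplied model.
plan = VendorMonitoringPlan(
    system="Vendor fraud score",
    review_every_days=90,
    expectations=[
        MetricExpectation("alert_rate", maximum=0.05),
        MetricExpectation("precision", minimum=0.30),
    ],
)
```

A plan like this is the institution's own definition of expected performance, which is what the examiner asks for, independent of whatever the vendor monitors.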

A clear answer to "what decisions does this system affect." When an examiner asks this question about a specific AI system and the answer is uncertain — when the business owner and the technology owner give different answers, or when nobody is quite sure how the output is used in practice — that uncertainty itself is a finding. If the institution doesn't know what decisions an AI system is affecting, it cannot be managing the risk of those decisions. That is the examiner's conclusion, and it is a correct one.

Preparing for an AI examination is not primarily a documentation exercise. It is an organizational exercise — getting clarity on what you have deployed, who owns it, what it does, and what you are doing to ensure it continues to do what you intended. The documentation is evidence of that clarity, not a substitute for it.

How to prepare without over-preparing

The documentation trap is real: institutions that receive advance notice of an AI examination sometimes respond by producing massive documentation packages that take months to assemble and don't actually address what examiners will ask. The examination takes longer than it should, the documentation contradicts itself in places, and the examiner comes away with less confidence, not more.

What examiners respond well to is an institution that can demonstrate it has thought carefully about AI risk — that has an inventory, has applied governance consistently, has identified the gaps in its governance, and has a credible plan for closing them. An institution that presents its AI program honestly, including the gaps, and demonstrates active management of those gaps, is in a better position than one that produces a documentation package designed to make everything look resolved when it isn't.

If you are preparing for an examination and are not confident in your model inventory, your vendor governance, or your monitoring frameworks, the right move is to build those things — not to document around the gaps. Examiners are experienced at reading documentation that is trying to cover something up. They are considerably more forgiving of institutions that identify their own gaps before the examiner does.

Preparing for an AI-focused examination or trying to get your governance in order before one happens? I'd be glad to talk through what you have and what you're missing.
