AI Program Patterns

Agentic AI in Financial Services: What It Is and Whether Your Institution Is Ready

The vendor said "autonomous." The demo showed the AI drafting a customer email, populating the subject line, choosing a tone. The room was impressed. But the demo ended with a human clicking send. That is not autonomy. Autonomy is when the system sends it — without the human in the loop, or with a human whose role is to veto rather than initiate. That distinction is not semantic. It is the entire governance problem.

Every major AI vendor has discovered the word "agentic" in the last eighteen months. It is now applied to systems that, in many cases, cannot do anything a well-prompted chatbot couldn't do in 2023. The hype is understandable — genuinely agentic AI is a real and significant development in the capability curve. But the gap between what the term implies and what most vendors are currently shipping is wide enough that financial services executives need a more precise frame before they evaluate anything in this category.

The meaningful question is not whether agentic AI is coming to financial services. It is. The question is whether a given institution has the governance infrastructure, audit architecture, and risk appetite to deploy something genuinely autonomous safely — and most mid-market financial institutions do not yet. That is not a criticism. It is a sequencing observation, and understanding it is worth more than any vendor demo.

What "agentic" actually means

The term describes a spectrum, not a binary. At one end is an AI that answers questions — it takes an input and produces an output, and nothing changes in the world as a result of its response. The conventional chatbot, the document summarizer, the internal search tool. Useful, often genuinely valuable, but not agentic.

One step further is an AI with tools: a system that can retrieve information from external sources, run calculations, look up records, and produce outputs that incorporate real-time data. A customer service AI that can look up an account balance and answer a question about a recent transaction is using tools. It still isn't acting — it's answering.

Further along the spectrum is an AI that can take multi-step actions within a defined environment. It doesn't just retrieve information; it uses that information to make a sequence of decisions and execute them. This is where the governance complexity starts. Each step in the sequence is a decision. Each decision needs to be traceable, auditable, and bounded by policy. The chain of decisions is what regulators and auditors will want to reconstruct when something goes wrong — and something will go wrong.

At the far end of the spectrum is a system that can initiate consequential actions in the real world — send a payment, execute a trade, file a report — without requiring human confirmation of each action. This is genuinely autonomous execution. It is also, in a regulated financial institution, the category that requires the most rigorous infrastructure before deployment. Most vendors demoing "autonomous AI" today are somewhere in the middle of this spectrum. Some are closer to the chatbot end than they admit.

Why agentic AI is genuinely different for regulated institutions

The governance frameworks that most financial institutions use for AI were designed for a simpler model: a system takes inputs, produces outputs, humans evaluate those outputs and decide what to do. Model Risk Management, fair lending review, explainability requirements — all of these assume there is a decision point where a human is in the loop before any action is taken.

Agentic systems break that assumption in three specific ways.

The audit trail problem. When an agentic system takes ten steps to reach an outcome, the audit trail is not "we ran a model and got a result." It is a chain of decisions, each of which needs to be explainable and each of which could be a point of failure. SR 11-7, the Federal Reserve's supervisory guidance on model risk management, was not written with this in mind. Neither was the typical model validation memo. The institutions that handle this well are logging every step, not just the final output, and treating each node in the chain as a separately auditable decision. That is a different architecture from most AI systems in production today.
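
As a rough illustration of what logging every step could look like, here is a minimal Python sketch. The schema and the names (ActionRecord, ActionLog, the field names) are assumptions made for illustration, not a regulatory standard or any vendor's format. A production log would also need durable, tamper-evident storage, which this in-memory version does not attempt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class ActionRecord:
    """One auditable decision in an agent run (hypothetical schema)."""
    run_id: str
    step: int
    action: str              # what the agent did, e.g. "lookup_invoice"
    inputs: dict[str, Any]   # what the agent saw when it decided
    rationale: str           # why it chose this action
    outcome: str             # what happened as a result
    recorded_at: str         # UTC timestamp in ISO 8601 form


@dataclass
class ActionLog:
    """In-memory log that only ever appends; nothing is edited or removed."""
    _records: list[ActionRecord] = field(default_factory=list)

    def record(self, run_id: str, action: str, inputs: dict[str, Any],
               rationale: str, outcome: str) -> ActionRecord:
        # Step numbers count up within each run, starting at 1.
        step = sum(1 for r in self._records if r.run_id == run_id) + 1
        entry = ActionRecord(
            run_id, step, action, dict(inputs), rationale, outcome,
            datetime.now(timezone.utc).isoformat(),
        )
        self._records.append(entry)
        return entry

    def reconstruct(self, run_id: str) -> list[ActionRecord]:
        """Return the full decision chain for one run, in the order taken."""
        return [r for r in self._records if r.run_id == run_id]
```

The point of the design is that reconstruct returns every intermediate decision for a run, so a reviewer can see where in the chain things went wrong rather than only the final result.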

The policy problem. Existing AI governance frameworks assume a human is accountable for decisions that affect customers or the institution. An agentic system that initiates actions creates a policy question that most governance documents have not addressed: under what conditions can this system act without human confirmation, and what is the explicit scope of its authority? This is not a technology question. It is a policy question, and most institutions have not written the policy. I have seen programs where the technical team built genuinely capable agentic functionality and then had to pause the deployment for six months while Legal and Risk figured out whether existing governance documents permitted it.

The failure mode problem. When a recommendation system is wrong, a human has the chance to catch the error before it turns into an action. When an agentic system is wrong, it may have taken three actions before anyone notices. In a financial services context, those actions could have customer-facing consequences, regulatory implications, or both. The blast radius of a failure is larger, and it compounds if the system is designed to act quickly.

The institutions that will deploy agentic AI successfully are not the ones with the most advanced models. They're the ones that built the audit, policy, and rollback infrastructure first — and treated the model as the last piece, not the first.

What needs to be in place before you deploy anything genuinely agentic

These are not aspirational requirements. They are the minimum viable infrastructure for agentic deployment in a regulated financial institution. I have worked through the deployment of systems that meet this bar and systems that don't. The ones that don't either stall in governance review or produce incidents that consume more institutional capital than the AI created.

A complete action inventory. Before the first agentic action is deployed, the institution needs a written document that specifies exactly what the system can do and, just as explicitly, what it cannot. This is not a capability list; it is a policy document. "The system may send payment reminders to customers with invoices overdue by more than 30 days, using only pre-approved message templates, to customers with active accounts in good standing, between 8 AM and 6 PM local time, no more than three times per invoice." That is the level of specificity required. The action boundary must be narrow enough that the system cannot surprise you.
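
One way to keep a policy statement like the payment reminder example honest is to encode it as a check that runs before every action. The sketch below is a hypothetical Python translation of that one sentence. The template IDs, the field names, and the decision to treat 6 PM itself as outside the window are all assumptions; the written policy, not the code, remains the source of truth.

```python
from dataclasses import dataclass
from datetime import datetime

APPROVED_TEMPLATES = {"reminder_v1", "reminder_v2"}  # hypothetical template IDs
MIN_DAYS_OVERDUE = 30   # policy: overdue by MORE than 30 days
MAX_REMINDERS = 3       # policy: no more than three times per invoice
SEND_WINDOW = (8, 18)   # policy: 8 AM to 6 PM local (6 PM excluded here by assumption)


@dataclass
class ReminderRequest:
    days_overdue: int
    template_id: str
    account_active: bool
    account_in_good_standing: bool
    customer_local_time: datetime
    reminders_already_sent: int


def violations(req: ReminderRequest) -> list[str]:
    """Return every policy clause the request breaks; empty means allowed."""
    found = []
    if req.days_overdue <= MIN_DAYS_OVERDUE:
        found.append("invoice is not overdue by more than 30 days")
    if req.template_id not in APPROVED_TEMPLATES:
        found.append("message template is not pre-approved")
    if not (req.account_active and req.account_in_good_standing):
        found.append("account is not active and in good standing")
    start_hour, end_hour = SEND_WINDOW
    if not (start_hour <= req.customer_local_time.hour < end_hour):
        found.append("outside the 8 AM to 6 PM local send window")
    if req.reminders_already_sent >= MAX_REMINDERS:
        found.append("reminder limit for this invoice already reached")
    return found


def may_send(req: ReminderRequest) -> bool:
    return not violations(req)
```

Returning the list of violated clauses, rather than a bare yes or no, gives the audit trail a reason for every refusal.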

Dual-control architecture for high-stakes actions. Any agentic action above a defined threshold — dollar amount, customer impact, regulatory relevance — should require human confirmation before execution. The threshold and the confirmation workflow need to be designed before deployment, not added after the first incident. The confirmation step should not be a rubber stamp: the human reviewer needs enough information to actually evaluate the action, not just a notification that something is about to happen.
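
A minimal sketch of such a gate, in Python, might look like the following. The threshold value, the ProposedAction fields, and the reviewer callback are hypothetical. In practice the reviewer step would be a workflow in a case-management or approval system rather than a function call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

APPROVAL_THRESHOLD_USD = 1_000.0  # hypothetical threshold, set by policy


class Decision(Enum):
    EXECUTED = "executed"
    REJECTED = "rejected"


@dataclass
class ProposedAction:
    description: str
    amount_usd: float
    customer_impacting: bool
    evidence: list[str]  # what the reviewer needs in order to evaluate it


def needs_human_confirmation(action: ProposedAction) -> bool:
    """High-stakes here means at or above the dollar threshold, or customer-impacting."""
    return action.amount_usd >= APPROVAL_THRESHOLD_USD or action.customer_impacting


def run_with_dual_control(
    action: ProposedAction,
    execute: Callable[[ProposedAction], None],
    reviewer: Callable[[ProposedAction], bool],
) -> Decision:
    if needs_human_confirmation(action):
        # Refuse to ask for a rubber stamp: no evidence, no approval request.
        if not action.evidence:
            raise ValueError("cannot request approval without supporting evidence")
        if not reviewer(action):
            return Decision.REJECTED
    execute(action)
    return Decision.EXECUTED
```

The evidence check is the part that matters most. It forces the system to hand the reviewer something to evaluate before it is allowed to ask for a decision.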

A rollback mechanism. Every agentic action should either be reversible or have a documented procedure for remediation if it cannot be fully reversed. This needs to exist before the action type is deployed. "We'll figure it out if something goes wrong" is not a rollback mechanism.
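
One way to enforce this, sketched below in Python under assumed names, is a registry that refuses to accept any action type arriving without either an undo function or a reference to a written remediation procedure. ActionType, ActionRegistry, and the document reference format are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionType:
    name: str
    # Compensating action for reversible actions, e.g. cancel a scheduled email.
    undo: Optional[Callable[[dict], None]] = None
    # Reference to a written remediation procedure for actions that cannot be undone.
    remediation_doc: Optional[str] = None


class ActionRegistry:
    """Only registered action types are deployable, and registration needs a rollback plan."""

    def __init__(self) -> None:
        self._types: dict[str, ActionType] = {}

    def register(self, action_type: ActionType) -> None:
        if action_type.undo is None and not action_type.remediation_doc:
            raise ValueError(
                f"{action_type.name}: no undo function or remediation procedure defined"
            )
        self._types[action_type.name] = action_type

    def is_deployable(self, name: str) -> bool:
        return name in self._types
```

The check runs at registration time, not at incident time, which is the sequencing the requirement calls for.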

A governance framework that specifically addresses agentic behavior. Most AI governance documents cover model development, validation, and monitoring. They say nothing about AI systems that initiate actions. The institution needs a governance document — signed off by Legal, Risk, and the relevant business lines — that covers who approves an agentic use case, what the ongoing monitoring requirements are, and what triggers a suspension of autonomous action. This document is usually six to twelve months behind where the technical team wants to be. Start writing it now.

The use cases where agentic AI is closer to ready than most institutions think

The readiness question is not binary across the entire category. There are use cases where the action inventory is naturally narrow, the failure mode is contained, and the governance lift is manageable — and institutions that wait for a "fully ready" posture before deploying any of them are leaving real value on the table.

Document processing workflows are the clearest example. An agentic system that reads a loan application, extracts relevant fields, flags missing items, populates a checklist, and routes the file to the appropriate reviewer is taking multiple autonomous actions, and none of them has irreversible consequences. The blast radius of an error is a delayed file, not customer harm. These workflows can be deployed with existing governance infrastructure and modest monitoring requirements.

Internal operational workflows — scheduling, summarization, report generation, data quality flagging — follow similar logic. The actions are consequential to the institution's efficiency but not to customers or regulators. They can move faster through governance.

Fraud detection alert triage is more complex, but manageable for institutions that have already gone through model risk management for a conventional fraud model. The agentic component is taking a flagged transaction and gathering supporting evidence across systems before presenting it to a human analyst. The human still makes the decision. The agent reduces the analyst's workload. That is a narrow and defensible action scope.

The use cases that aren't ready yet — and why

Autonomous payment initiation. Customer-facing communications that go out without human review. Credit decisions without human confirmation. Anything that directly affects a customer's account or financial position in a way that cannot be instantly reversed.

The reason is not that the AI is incapable. The reason is that the governance, audit, and remediation infrastructure at most financial institutions is not built for autonomous action in these categories. When an autonomous payment system makes an error, and it will, the institution needs a complete audit trail explaining every step, a remediation process that satisfies the customer and the regulator, and a monitoring system that catches the error before the customer does. Building that infrastructure takes longer than building the model. The institutions that will succeed with autonomous payment AI in three years are starting that infrastructure work now.

One more category worth naming: anything that involves generating and sending external-facing communications without human review at the final step. I have seen vendors demo systems that draft and send customer service responses autonomously. The demo looks smooth. The edge cases are not in the demo: a customer in financial distress who receives a tone-deaf automated response, or a customer complaint that triggers a regulatory inquiry before anyone from the institution has read the exchange. They are in the compliance team's incident log six months after deployment.

The right question to ask any vendor pitching agentic AI

Ask them to walk you through the last three times their system took an action it shouldn't have, what the consequences were, and what stopped it from happening again. If they cannot answer that question specifically — not theoretically, specifically — the system has not been deployed at sufficient scale in a production environment to know its failure modes. That is not necessarily disqualifying, but it should change how you evaluate the deployment risk.

The vendors who are genuinely ahead in this category have a detailed answer. They know exactly where their systems fail, under what conditions, and what the monitoring infrastructure catches versus misses. That knowledge comes from production deployments, not from labs.

If your institution is evaluating agentic AI deployments and wants an independent assessment of readiness — governance infrastructure, audit architecture, action scope — I'm available for that conversation.
