Agentic AI in Finance: A New Era of Autonomous Automation
Introduction
The rapid evolution of generative AI – typified by large language models (LLMs) like ChatGPT – has opened the door to more advanced systems that not only create content but also take actions. Agentic AI refers to AI systems endowed with agency: they can make decisions and carry out multi-step tasks autonomously, with minimal human intervention (www.redhat.com; executivebiz.com). In contrast to a standard chatbot that generates a textual answer and stops, an agentic AI can initiate further steps or use tools on its own to achieve a user’s goal (www.redhat.com). This report explains what agentic AI is and how it differs from generative AI, surveys notable agentic AI offerings as of March 2025, and discusses their relevance to financial professionals (especially accountants, auditors, and tax preparers). Key benefits – such as automation and efficiency – are weighed against risks around confidentiality, reliability, numerical accuracy, and explainability. We also examine emerging use cases and consider how professional standards (AICPA, PCAOB, IAASB, etc.) may need to adapt in response to agentic AI.
What Is Agentic AI (and How It Differs from Generative AI)
Agentic AI describes AI systems that act autonomously to accomplish goals. Rather than only producing an output when prompted (as generative models do), an agentic AI can proactively plan and execute a series of actions (www.redhat.com; executivebiz.com). In effect, it combines the creative output of generative AI with a layer of decision-making and tool use (www.redhat.com). A simple way to distinguish the two:
- Generative AI creates content (text, images, etc.) in response to an input. It is largely reactive – it does what you ask and then waits for another prompt (developers.googleblog.com). For example, an LLM might write a draft tax memo when prompted, but it won’t take further initiative on its own.
- Agentic AI does tasks – it is proactive. Given an objective, an AI agent can break it into sub-tasks, decide which tools or information sources to use, and carry out each step, possibly generating new sub-queries or actions without explicit human direction (www.redhat.com; developers.googleblog.com). It literally has a degree of “autonomy” in pursuing goals.

In practice, agentic AI is often implemented by pairing an LLM with external tool access and an orchestration logic. The LLM’s language understanding and generation serve as the “brain,” while tool integrations (e.g. web browsers, databases, calculators) let it interact with the world (www.redhat.com). A coordination module (sometimes following techniques like ReAct or function calling) directs the agent’s use of tools and multi-step reasoning (developers.googleblog.com). This enables behaviors like: searching the web if the answer isn’t known, calling APIs, writing and executing code, or querying databases – all decided by the AI agent at runtime, not pre-scripted by a human.

**Proactiveness vs. Reactiveness:** Experts often summarize agentic AI in one word: “proactiveness.” An agentic system can decide on its own to take initiative (executivebiz.com). For instance, a generative AI might return a company’s refund policy when asked, whereas an agentic AI could offer to initiate a return for you and then proceed to fill out the return form and process the refund autonomously (www.redhat.com). This goal-driven autonomy is what sets agentic AI apart from earlier AI assistants and basic automation. In enterprise settings, agentic AI is seen not as a replacement for chatbots or “copilots,” but as a backend capability that works alongside them (executivebiz.com). A conversational copilot provides the interface where a user requests something, and the copilot in turn can invoke an AI agent under the hood to actually perform the complex task (executivebiz.com).

**Relation to LLMs:** It’s important to note that agentic AI builds on generative AI rather than replacing it. Most agentic systems today use LLMs as their core reasoning engine (www.redhat.com). The difference is that an agentic setup extends an LLM by giving it tools and an “execution loop.” In other words, generative AI provides the content-generation and prediction ability, while agentic AI provides a framework for actions and decisions. Indeed, one can think of agentic AI as “LLM + Automation” (www.redhat.com). For example, a traditional workflow automation might follow rigid pre-programmed steps. An agentic AI, by contrast, can dynamically decide new steps or adjust its plan if conditions change (www.redhat.com). It can also chain multiple actions from one user prompt (sometimes called chaining), accomplishing an entire sequence like “draft a financial report, format it in Excel, and email it to the team” all from one high-level request (www.redhat.com).

In summary, generative AI creates, while agentic AI acts. Generative models are powerful assistants for producing text or analysis, but agentic AI leverages those models to autonomously drive workflows and decisions. This distinction is critical for understanding the new capabilities – and new risks – that agentic AI introduces, especially in fields like accounting and finance where accuracy and accountability are paramount.
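To make the orchestration pattern above concrete, here is a minimal, runnable sketch of a ReAct-style agent loop. Everything in it (the `fake_llm` stub, the toy `TOOLS` registry) is hypothetical scaffolding for illustration, not any vendor’s API; a real system would replace `fake_llm` with a call to an actual chat model.

```python
# Minimal ReAct-style agent loop (illustrative sketch, not a vendor API).
TOOLS = {
    "calculator": lambda expression: str(eval(expression)),  # demo only; never eval untrusted input
    "search": lambda query: f"(stub search results for {query!r})",
}

def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for a real LLM call, scripted here so the loop runs end to end.
    A real agent would send `messages` to a chat model and parse its reply."""
    if not any("Observation" in m["content"] for m in messages):
        return {"tool": "calculator", "args": {"expression": "17542 * 63"}}
    return {"answer": f"The result is {messages[-1]['content'].split(': ')[1]}."}

def run_agent(goal: str, llm=fake_llm, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = llm(messages)                    # reason: pick a tool or finish
        if "answer" in decision:
            return decision["answer"]               # goal met: stop the loop
        result = TOOLS[decision["tool"]](**decision["args"])          # act
        messages.append({"role": "assistant", "content": str(decision)})
        messages.append({"role": "user", "content": f"Observation: {result}"})  # observe
    return "Stopped: step budget exhausted; a human should review."

print(run_agent("What is 17,542 * 63?"))  # -> The result is 1105146.
```

The point is the control flow, not the stubs: the model decides, the orchestrator acts, and each observation is fed back until the model declares an answer or the step budget runs out.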
Current Agentic AI Offerings (March 2025)
As of March 2025, several major tech providers have rolled out agentic AI tools or features. Below we review key offerings, their accessibility and pricing, and their notable capabilities:
OpenAI ChatGPT Operator
OpenAI’s ChatGPT Operator is a flagship example of agentic AI in action. Announced as a “research preview” in early 2025, Operator is a cloud-based AI agent that can use a web browser and other tools to perform tasks on a user’s behalf (simonwillison.net). In the ChatGPT interface, Operator appears as an AI assistant that not only chats, but can actually navigate websites, fill forms, and execute actions online. Sam Altman (OpenAI’s CEO) described these AI agents as “systems that can do work for you independently – you give them a task and they go off and do it” (simonwillison.net).

[Figure: OpenAI’s ChatGPT Operator interface, which pairs a chat window (left) with an integrated web browser (right). In this example, the user requests a restaurant reservation, and the AI agent autonomously searches the OpenTable website and attempts to book a table. (simonwillison.net; mectors.medium.com)]

**Features:** Operator is powered by a specialized model called CUA (Computer-Using Agent) that allows it to “see” a rendered webpage and interact with it like a human user (simonwillison.net; mectors.medium.com). It can click buttons, enter text, and navigate links based on visual input (screenshots) and instructions. Essentially, it’s as if ChatGPT had a virtual web browser and mouse at its disposal. With this, Operator can handle tasks like booking travel or shopping online end-to-end: the user asks in natural language, and the agent carries out the necessary web interactions (searching, form-filling, etc.) autonomously (mectors.medium.com). Operator is intentionally designed with the human “in the loop” – it often asks for user confirmation before final steps and allows the user to take control at any time (simonwillison.net). This is why OpenAI calls it “Operator” (the human remains the ultimate operator, supervising the AI’s actions) (mectors.medium.com). It can be seen as an advanced form of robotic process automation (RPA) that leverages AI: mimicking user actions on websites or software, but with the decision-making guided by an LLM rather than rigid scripts (mectors.medium.com).

**Accessibility & Pricing:** During its initial preview, ChatGPT Operator was only available to ChatGPT Pro subscribers in the U.S. (and later expanded regions) (mectors.medium.com). ChatGPT Pro is a high-end tier (around $200/month) targeting power users and enterprises, which includes access to experimental features like Operator (simonwillison.net). By March 2025, OpenAI had rolled out Operator more broadly to Pro users in multiple regions (x.com). Operator is not a standalone product but part of the ChatGPT interface for those with access, and tasks executed via Operator presumably count against the subscription’s usage limits. As a cloud-hosted solution (OpenAI runs the headless browser on their servers (simonwillison.net)), no local installation is needed – users simply log in to the ChatGPT web UI to use Operator. This convenience comes at a cost: as noted, only the highest-tier paying customers had access during the preview phase. OpenAI’s strategy seems to be to refine Operator in the Pro tier before potentially offering it via API or wider consumer release. (Indeed, OpenAI has hinted at an API for the CUA model so developers can build their own agents (simonwillison.net).)

**Notable Considerations:** Operator’s design places heavy emphasis on safety and reliability, given the risks of an autonomous AI browsing the web.
OpenAI implemented safeguards such as cautious navigation (the agent is trained to ignore obvious malicious prompt injections from websites), an oversight model that watches the agent’s actions and can pause if something looks suspicious, and human review pipelines to quickly block dangerous behaviors (simonwillison.net). These measures were prompted by known vulnerabilities – for example, early tests of Anthropic’s similar agent (Claude’s computer-use, discussed below) showed it could be tricked by a malicious webpage into performing unintended actions (simonwillison.net). OpenAI’s added guardrails (and the requirement for user confirmations on critical steps) aim to mitigate such risks. From a user perspective, Operator still requires careful oversight – one is advised to monitor its work and reset its session between distinct tasks (simonwillison.net), treating it as a very powerful but occasionally error-prone “assistant” rather than an infallible agent.
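OpenAI has not published Operator’s internals beyond the CUA description above, but the observe-think-act cycle it describes can be sketched generically. Every function below is hypothetical pseudostructure (no real Operator or CUA API is shown); the confirmation gate mirrors the human-in-the-loop behavior the text describes.

```python
# Illustrative observe-think-act loop for a computer-using agent.
# All functions here are hypothetical placeholders, not a real API.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str                    # "click", "type", "done", ...
    x: int = 0
    y: int = 0
    text: str = ""
    irreversible: bool = False   # e.g., submitting a payment or booking form

def capture_screenshot() -> bytes: ...          # the agent "sees" the rendered page
def propose_action(task: str, shot: bytes) -> Action: ...  # vision-language model picks a GUI step
def execute(action: Action) -> None: ...        # simulate the click/keystroke
def user_confirms(action: Action) -> bool: ...  # human-in-the-loop checkpoint

def run_computer_agent(task: str, max_steps: int = 50) -> None:
    for _ in range(max_steps):
        shot = capture_screenshot()
        action = propose_action(task, shot)
        if action.kind == "done":
            return
        if action.irreversible and not user_confirms(action):
            return                               # the human operator can veto
        execute(action)
```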
Google Gemini and AI Studio (Gemma Models)
Google’s entry into agentic AI is centered around Gemini, its next-generation family of multimodal AI models, and Gemma, a suite of lighter-weight open models related to Gemini. Announced by Google DeepMind in late 2024, Gemini is described as Google’s “most capable model yet, built for the agentic era” (deepmind.google). Gemini models introduce native abilities that directly support agentic behavior, including tool use, memory, and planning (deepmind.google). Notably, Gemini 2.0 can natively execute tool APIs (like search or code), and can handle text, images, and even generate speech outputs (deepmind.google) – making it well-suited for autonomous agents that might need multimodal understanding.

**Gemini:** At the time of writing, the Gemini 2.0 model family had multiple variants for different use cases. For example, “Gemini 2.0 Flash” is a general model optimized for low latency and agentic experiences, while “Flash Thinking” is a variant geared toward better reasoning and explainability (it can show its chain-of-thought) (deepmind.google). These models underscore Google’s focus on agents: one variant even explicitly emphasizes “showing its thoughts to improve explainability” – an important feature if an agent is to be trusted in critical tasks (deepmind.google). Google positions Gemini as a step toward a “universal AI assistant” that can take actions under your supervision, use tools like search or code execution seamlessly, and even handle real-time inputs (deepmind.google). For instance, an agent built on Gemini could not only answer your question, but also search Google for up-to-date info and cite it, or run a piece of code to calculate something, within one integrated process.

**Gemma (AI Studio):** Alongside Gemini, Google introduced Gemma open models, which are essentially smaller-scale versions inspired by the same research. Gemma models (version 2 as of 2025) come in sizes like 2B, 9B, and 27B parameters – much smaller than Gemini’s full scale – and are available for developers to use and even fine-tune (ai.google.dev). These models can be accessed through Google AI Studio, a platform to experiment with and deploy AI models. AI Studio is described as the fastest way to start building with Gemini and the Gemma models (aistudio.google.com). In practice, AI Studio offers an interactive playground and APIs; developers can obtain API keys, test models in-browser, and even run Gemma models on their own hardware or Google Cloud (developers.googleblog.com; cloud.google.com). Gemma models are generative (they can produce text and in some cases images) and support function calling, meaning they can integrate with tools – a necessity for agentic functionality (developers.googleblog.com).

One Google Developers Blog post, “Beyond the Chatbot: Agentic AI with Gemma”, provides a tutorial on using a Gemma-2 model to build an agent that utilizes ReAct (Reason+Act) prompting and external tools (developers.googleblog.com). This shows Google actively guiding developers to create agentic AI systems using their models. The post highlights that most AI today is reactive, but “in contrast, agentic AI is proactive and autonomous…using external tools like search engines and specialized software to get information beyond its base, letting it solve problems independently” (developers.googleblog.com).
Google clearly sees agentic capabilities as a key differentiator for Gemini/Gemma – even touting that Gemini 2.0 unlocks new possibilities for AI agents with memory, reasoning, and planning under supervision (deepmind.google).

**Accessibility & Pricing:** Google’s approach bifurcates between free/open-source access and commercial offerings. Gemma models are relatively accessible – developers can experiment with Gemma 2 (2B, 9B, 27B) for free via AI Studio (some usage limits may apply) and even deploy them locally or on Google Cloud with provided tools (ai.google.dev; cloud.google.com). These are “open” in the sense of being available to use (though not necessarily fully open-source in licensing). Gemini, the high-end model, is offered through Google’s services: e.g., an interactive demo (gemini.google.com) (deepmind.google), and integration into products like Google Cloud’s Vertex AI (for API access). Pricing for Gemini API usage is usage-based (similar to other cloud models: per-token charges), though Google had not fully publicized rates at the time. Google Colab’s free Data Science Agent (discussed later) also provides a taste of Gemini’s power at no cost, suggesting Google’s strategy is to seed the market with accessible agentic tools and then monetize enterprise-grade usage via cloud services. AI Studio itself is free to sign up (requiring a Google account) and offers a limited free quota for model inference. Larger-scale or production use of Gemini likely requires a Google Cloud account and payment, or use of the Vertex AI Gemini APIs, which are priced by model and by usage.
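For a flavor of the function-calling plumbing described above, here is a minimal sketch assuming the `google-generativeai` Python SDK’s automatic function calling. The model id, the `get_fx_rate` tool, and the stubbed rate are illustrative assumptions; SDK details and quotas may differ.

```python
# Sketch: tool use ("function calling") with the Gemini API, assuming the
# google-generativeai Python SDK. Model id and rate value are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # obtained from AI Studio

def get_fx_rate(base: str, quote: str) -> float:
    """Illustrative tool: return an FX rate (stubbed here; a real agent
    would call a vetted rate service instead)."""
    return 1.0842 if (base, quote) == ("EUR", "USD") else 1.0

model = genai.GenerativeModel(
    "gemini-2.0-flash",            # placeholder model id
    tools=[get_fx_rate],           # the SDK derives a tool schema from the signature
)
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message("Convert EUR 1,000,000 to USD at the current rate.")
print(reply.text)  # the model calls get_fx_rate, then composes the answer
```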
Anthropic Claude – Computer Use and Coding Agents
Anthropic, another leading AI lab, has been developing agentic capabilities in its Claude model series. By March 2025, Anthropic had demonstrated two notable agentic AI features: Claude’s Computer Use skill, and Claude Code (an agentic coding assistant).

**Claude for Computer Use:** In October 2024, Anthropic announced a new capability allowing Claude to control a computer in a human-like way. This “computer-use” model lets Claude interpret a live screenshot of a user’s computer interface and issue keyboard/mouse commands (like moving a cursor by a certain number of pixels and clicking) (www.anthropic.com). In essence, Claude can operate software via the GUI, just as a person would – a very agentic behavior. Anthropic trained Claude on performing tasks in simple apps (calculator, text editor) using this visual approach, and Claude surprisingly generalized to handle other software by breaking down user instructions into GUI actions (www.anthropic.com). For example, if asked to bold text in a document, Claude could locate the “Bold” button on the screen image and simulate a click. It learned to count pixels accurately (to target interface elements) and even to self-correct if an action failed (www.anthropic.com).

Anthropic released this as a limited beta API (developers could run Claude’s computer-use in a Docker container, supplying their screen and receiving actions) (simonwillison.net). Early evaluations showed Claude became state-of-the-art at this form of computer control, albeit with modest absolute performance: it achieved ~14.9% success on the OSWorld benchmark for autonomous computer use (versus ~7.7% for the next best model, and ~70%+ for humans) (www.anthropic.com). This indicates that while Claude can do basic GUI tasks, it still fails most complex ones – but it was a leap forward in capability. The concept is similar to ChatGPT’s Operator, except Claude’s version initially required self-hosting and was targeted to developers for testing. Security and safety were major considerations; Anthropic noted that while adding computer-use didn’t elevate Claude’s general AI risk level (it remained at their “AI Safety Level 2”) (www.anthropic.com), it did bring immediate issues like potential misuse. Prompt injection attacks on a GUI (where a malicious app window might contain hidden instructions) were a known vulnerability, so Anthropic urged caution and implemented policy safeguards (www.anthropic.com). In summary, Claude’s computer-use showcased that an AI agent could interact with real software environments – not just text – heralding more integrated automation for enterprise tasks (imagine an AI that could operate an accounting software UI directly). However, in 2024-25 it was a developer-focused preview, not a polished end-user product.

**Claude Code (agentic coding):** In February 2025, Anthropic launched Claude 3.7 “Sonnet”, a new model with advanced reasoning, and alongside it introduced Claude Code – described as “a command line tool for agentic coding” (www.anthropic.com). Claude Code allows developers to delegate substantial programming tasks to Claude from their terminal, effectively turning Claude into an autonomous coding agent (www.anthropic.com). Instead of just getting a snippet from the AI, a developer could assign a high-level goal (e.g. “build a function to do X, and integrate it into module Y”) and Claude Code would iteratively generate code, refine it, perhaps run tests, and produce the result.
This is an evolution of the “AI pair programmer” into something more like an “AI junior developer” that can handle multi-step coding jobs. Claude Code was in limited research preview, meaning interested developers had to sign up to try it (www.anthropic.com).

Under the hood, Claude 3.7 Sonnet’s improved capabilities (especially in extended “chain-of-thought” mode) were geared toward such agentic tasks – it can engage in step-by-step reasoning and decide how long to “think” on a task (www.anthropic.com). Pricing for Claude 3.7 remained the same as prior versions (around $3 per million input tokens and $15 per million output tokens) (www.anthropic.com), which made it relatively affordable for complex tasks given its large (200K-token) context window. Claude Code likely uses those extended contexts to manage larger coding projects. For now, the use of Claude’s agentic coding is limited to those in the preview and requires comfort with command-line tools. Over time, one can expect Anthropic to integrate these capabilities into user-friendly IDE plugins or cloud platforms, directly competing with GitHub’s Copilot X or other AI coding copilots but with a more autonomous bent (i.e. it can attempt an entire coding task with minimal hand-holding).

**Accessibility:** Anthropic’s strategy has been to offer Claude commercially via API and their own chat interface (claude.ai). By early 2025, Claude 3.7 was available across all their plans (including even a free tier with limited usage) (www.anthropic.com), as well as through cloud providers like Amazon Bedrock and Google Cloud’s Vertex AI (www.anthropic.com). The computer-use feature, however, was not generally exposed in the public Claude.ai chatbot; it was only accessible via a special developer API and environment (likely due to security concerns). Claude Code, being command-line based, also targeted developers. So for an accounting end-user, these agentic features were not yet in a consumer-friendly app. We might soon see Anthropic partner with software vendors to embed Claude’s agent skills into enterprise apps (for example, a future accounting package could let you “ask Claude to perform tasks within the software”). As of 2025, using Claude’s agentic powers requires technical integration. Pricing for such use would depend on token consumption via API or any specialized licensing Anthropic might introduce for Claude Code once it matures.
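The “decide how long to think” control is exposed as an explicit API parameter. A minimal sketch, assuming the `anthropic` Python SDK’s extended-thinking interface as documented around the Claude 3.7 launch (the model id, token budgets, and prompt are illustrative, and parameter names may have changed since):

```python
# Sketch: invoking Claude 3.7 Sonnet's extended thinking, assuming the
# anthropic Python SDK at launch; details may differ in later versions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,                                     # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},  # how long the model may "think"
    messages=[{"role": "user", "content":
               "Recompute this amortization schedule and flag any rows "
               "where interest + principal does not equal the payment."}],
)
for block in response.content:
    if block.type == "thinking":
        print("[reasoning trace]", block.thinking[:200], "...")  # reviewable scratchpad
    elif block.type == "text":
        print(block.text)                                        # the final answer
```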
Microsoft 365 Copilot Chat (with Agents)
Microsoft has been infusing agentic AI into its productivity suite under the “Copilot” banner. In January 2025, Microsoft announced Microsoft 365 Copilot Chat, a new chat-based AI assistant available to all Microsoft 365 commercial customers (www.microsoft.com). What makes this especially relevant here is that Copilot Chat includes agentic capabilities integrated into the experience. Microsoft describes it as “the power of chat + agents” (www.microsoft.com). Users can converse with Copilot (which is powered by OpenAI’s GPT-4o model (www.microsoft.com)), and from the chat interface they can invoke “agents” to perform actions or retrieve information.

**Features:** Copilot Chat provides a familiar chat UI across Office apps, Teams, and mobile, where users can ask questions or request work (e.g., “Draft an analysis of Q4 financial trends” or “Schedule a meeting with the audit team next week”). The system is web-grounded, meaning it can search the web for up-to-date info, and it also allows file uploads so it can work with user documents (www.microsoft.com). The novel aspect is agents accessible right in the chat (www.microsoft.com). Microsoft demonstrated that within Copilot Chat, one can call on specialized agents – essentially tool-using mini-AIs – to do things like execute a database query, run a simulation, or trigger a workflow. The interface even includes an “Agents” panel listing available agents and options to create new ones (www.microsoft.com). In essence, Copilot Chat is a hub combining conversational AI with action-oriented AI. For example, an accountant could ask Copilot Chat: “Find any outliers in this Excel data and graph the results” – the Copilot might then employ an Excel agent to run formulas or scripts on the data and produce a chart, returning it in the chat. Microsoft has been integrating such functionality deeply: Copilot in Excel can already create formulas and analyze data via natural language, and Copilot in PowerPoint can autonomously create slides. The new Copilot Chat extends across the suite and introduces pay-as-you-go agent usage, meaning certain advanced actions might incur consumption-based billing (www.microsoft.com).

[Figure: The Microsoft 365 Copilot Chat interface (desktop view). The home screen provides a chat entry point and shortcuts. Notably, the right sidebar lists available “Agents” (e.g. Customer Support, Market News, Proposal Generation) and options to get or create agents – highlighting the integration of agentic AI into Microsoft’s productivity environment. (www.microsoft.com)]

**Accessibility & Pricing:** Microsoft’s approach is two-tier: a free Copilot Chat experience for Microsoft 365 users, and a premium Microsoft 365 Copilot add-on for more advanced features. Copilot Chat (the base chat + agent access) was announced as available at no additional cost for all commercial M365 subscribers (techcommunity.microsoft.com). This move (“Copilot for all”) aims to seed usage widely. The free tier includes GPT-4o-powered chat, basic web browsing, and the ability to use built-in agents in a limited capacity (www.microsoft.com). The advanced tier, Microsoft 365 Copilot (formerly $30/user/month for enterprises (www.theverge.com)), offers deeper integration – such as access to an organization’s internal data (via Microsoft Graph), Copilot features embedded in Office apps, and unlimited agent usage.
In 2025, Microsoft also introduced pay-as-you-go pricing for agent invocations for those who don’t have the full Copilot license (www.microsoft.com). This means an organization can allow employees to use agents within Copilot Chat and pay per task or per compute consumption, rather than a flat fee, lowering the barrier to trying these features (www.cnbc.com). For an accounting firm, this makes Copilot Chat an attractive entry point: if they already have M365, they get a secured chat assistant with some automation abilities out-of-the-box. They can then decide if the ROI justifies upgrading to the full Copilot or using pay-per-use for heavier tasks.

By being integrated into Microsoft 365, Copilot Chat respects enterprise security and compliance: conversations can optionally be grounded in the user’s documents and emails (with proper permissions), and data stays within the tenant (in the paid Copilot setup). IT admins have controls over what the AI can access and do (www.microsoft.com), addressing some confidentiality concerns by design.
Google Data Science Agent for Colab
Another notable offering is Google’s Data Science Agent in Colab, which exemplifies agentic AI applied to data analysis. Google Colab (Colaboratory) is a popular cloud-based Jupyter notebook service. In late 2024, Google began testing an “AI-powered Colab assistant” that could automatically generate entire notebooks for data tasks. By March 2025, the Data Science Agent was made available to Colab users (18+ in select countries) as an experiment (developers.googleblog.com).

**What it does:** The Data Science Agent allows a user to simply describe their analysis objective in natural language and provide a dataset (e.g. upload a CSV). The agent then creates a multi-step Colab notebook to accomplish that goal – including data cleaning, exploration, visualization, Q&A on the data, and even basic modeling or predictions (labs.google.com). Essentially, it’s an autonomous data analyst that writes Python code for you. For example, you might say: “Analyze this sales data for trends and outliers, then train a forecast model,” and the agent will produce a notebook that: imports the necessary libraries, loads the data, performs exploratory analysis (perhaps outputting some charts and stats), identifies outliers, builds a simple forecast model (like an ARIMA or linear regression), and then provides some narrative insights. All of this is done automatically – the user can literally “sit back and watch” as the code and markdown cells materialize in front of them (developers.googleblog.com).

The agent is powered by Gemini under the hood, which gives it the ability to reason about multi-step tasks and generate context-appropriate code. It was reported that this Data Science Agent achieved 4th place on a benchmark called DABStep (for multi-step data analysis reasoning) – outperforming various other AI agents, including some based on GPT-4 and Claude (developers.googleblog.com). That suggests a high level of competency in orchestrating complex sequences of operations.

**Using the agent:** To invoke the Data Science Agent, a user opens a blank Colab notebook and enables the “Gemini” side panel (Colab’s UI was updated to include an AI sidebar). Then, by describing their data and what they want (in plain English), the agent is triggered (developers.googleblog.com). It’s truly an interactive experience: the agent writes out the notebook, and the user can review or tweak it. This lowers the barrier for non-programmers (or busy analysts) to get productive results without manual coding. Of course, Google cautions that “Data Science Agent may make mistakes” (developers.googleblog.com), so users should validate critical outputs. But the key benefit is speed – it automates the tedious setup and boilerplate, letting analysts focus on interpreting results.

**Accessibility & cost:** Google has made the Data Science Agent free to use (at least during the experiment phase) for all Colab users (www.reddit.com). Colab itself has free and paid tiers, but Google indicated the agent is available even on the free tier, subject to some usage limits. There may be geographic restrictions (select countries initially) (developers.googleblog.com), but generally it’s widely accessible. This is a strategic move by Google to showcase Gemini’s capabilities and gather feedback. It gives academics, students, and professionals a hands-on taste of agentic AI in a practical scenario (data analysis) at no cost.
For more advanced usage, Colab Pro ($9.99/month and up) offers higher resource limits, which would allow the agent to handle larger datasets or longer notebooks without hitting free-tier caps (techcrunch.com). The pricing model here is essentially: basic agentic functionality is bundled free in Colab to add value to the platform, while heavy users might end up subscribing to Pro for the compute benefits.

**Features and limitations:** The Data Science Agent emphasizes producing fully functional notebooks (developers.googleblog.com) – meaning the code it writes is meant to run correctly as-is. This is more than just generating a code snippet or a static report; it delivers an interactive, tweakable result. Users can re-run cells, adjust parameters, or extend the notebook after the agent finishes, blending AI automation with human analysis. It’s an early example of how agentic AI can collaborate with professionals: doing the grunt work of coding and initial analysis, allowing the human expert to apply judgment on the outputs. The current limitations include occasional errors (e.g., choosing the wrong chart type or a suboptimal model) and the scope of tasks it can handle (likely geared toward exploratory data analysis and simple ML, not extremely custom logic). It also currently works one notebook at a time – it won’t, for example, decide to create multiple notebooks or complex multi-file projects on its own.
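For illustration, the cells such an agent emits look broadly like the hand-written sketch below (the file and column names are hypothetical; the real agent tailors its code to the uploaded dataset):

```python
# The kind of code a data-science agent run might generate for
# "analyze this sales data for trends and outliers".
import numpy as np
import pandas as pd

df = pd.read_csv("sales.csv", parse_dates=["date"])  # the uploaded dataset

# Exploratory summary
print(df.describe())

# Aggregate to monthly totals and flag outlier months (> 3 SD from the mean)
monthly = df.groupby(df["date"].dt.to_period("M"))["sales"].sum()
z = (monthly - monthly.mean()) / monthly.std()
print("Outlier months:\n", monthly[z.abs() > 3])

# Naive trend estimate via least squares on a time index
t = np.arange(len(monthly))
slope, intercept = np.polyfit(t, monthly.to_numpy(), 1)
print(f"Trend: {slope:+,.0f}/month; naive next-month forecast: "
      f"{slope * len(monthly) + intercept:,.0f}")
```

The human analyst can then re-run, tweak, or extend these cells, which is exactly the agent-drafts, human-reviews division of labor described above.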
These offerings illustrate that as of early 2025, agentic AI is transitioning from concept to reality across various platforms. OpenAI’s Operator and Anthropic’s Claude are pushing the frontier of agents that interact with software and the web, while Google and Microsoft are embedding agents into practical productivity and data tools. Pricing models range from premium subscriptions (OpenAI) to inclusive/free add-ons (Microsoft, Google) – reflecting different market strategies. For end users in finance and accounting, this means there are already avenues to experiment with agentic AI: e.g., using Copilot Chat if your organization has M365, or trying Google’s Colab agent for data analysis tasks. In the next section, we turn to why these developments matter specifically for financial professionals, and how agentic AI might transform (and is already transforming) workflows in accounting, audit, and tax domains.
Relevance of Agentic AI to Financial Professionals
Agentic AI stands to significantly impact the work of accountants, auditors, and tax professionals. These fields involve numerous multi-step, repetitive, and knowledge-intensive tasks that could be streamlined by autonomous AI agents. While earlier automation (like macros or RPA bots) required explicit programming, agentic AI offers a more flexible, intelligent form of automation that can handle complexity and adapt to new information (www.redhat.com; tax.thomsonreuters.com). Here are some ways agentic AI is relevant to finance professionals:
- Automating Research and Analysis: A tax specialist or auditor spends considerable time researching regulations, standards, or client data. AI agents can autonomously scour the tax code, accounting standards, or internal databases to gather relevant information and even draft summary memos or workpapers (tax.thomsonreuters.com). For example, an agent could be tasked with analyzing all leases in a client’s records and cross-checking them against the new lease accounting standards, flagging any that don’t comply. This goes beyond a static search – the agent might iterate: find data, apply criteria, consult relevant paragraphs of the standard (using a retrieval plugin), and prepare a report of findings, all with minimal human guidance.
- Workflow and Task Automation: Many finance processes have multiple steps (fetch data, perform calculations, populate a report, send emails). Agentic AI can plan and execute entire workflows by integrating with software tools. In audit, an agent could perform an end-to-end procedure like obtaining an accounts receivable aging report from the ERP, analyzing it for large overdue balances, drafting an email to confirm those balances with customers, and logging the results – simulating what a junior auditor might do. In tax prep, an agent might pull client financials, input figures into a tax software, perform initial categorization of items, and generate a draft return for review. These scenarios entail navigating various applications (which Operator or Claude’s computer-use can handle) and making rule-based decisions (which an LLM can be instructed to follow). Thomson Reuters describes future agentic AI that “manages project timelines, allocates resources, and coordinates between team members and AI agents” (tax.thomsonreuters.com), highlighting that not only individual tasks but entire project management aspects can be offloaded to AI. This could apply to an audit manager having an AI agent track the status of audit schedules, send reminders to clients for documents, and update the audit plan in real time.
- Enhanced Client Service: For client-facing accountants or advisors, agentic AI could support always-on assistance. Imagine an “AI client liaison” that can answer routine client questions (using a firm’s knowledge base), schedule meetings, or gather preliminary information from a client before an engagement (tax.thomsonreuters.com). Such an agent, properly configured, could handle many low-level touchpoints – e.g., a client messages at 9pm asking how to classify a transaction; the AI (with access to prior conversations and accounting knowledge) provides an answer or at least gathers details for the human to review in the morning. This frees professionals to focus on higher-value interactions. In the Thomson Reuters example, “enhanced client interactions” include agents conducting initial consultations or handling routine communications autonomously (tax.thomsonreuters.com).
- Continuous Monitoring and Compliance: In tax and compliance work, rules change frequently. An agentic AI can be assigned to continuously monitor regulatory updates (e.g., new IRS guidance or FASB releases) and automatically alert the firm and even draft impact assessments (tax.thomsonreuters.com). Similarly, agents can watch a company’s transactions in real time and flag those with compliance implications (like sales in a new state triggering nexus for sales tax, or unusual entries that might indicate fraud for auditors) (medium.com). This kind of proactive monitoring was hard to achieve with static systems, but an AI agent with access to news feeds and company data could feasibly do it. Thomson Reuters notes agentic AI could “flag potential compliance issues and even assist in preparing and filing tax returns” by staying on top of regulatory changes (tax.thomsonreuters.com).
- Audit and Assurance: Auditors can leverage agentic AI as a tireless junior assistant. For instance, an AI agent can autonomously review financial records and cross-check them. One application is inventory audit: an agent could take inventory records, reconcile them to purchase and sales ledgers, identify any gaps or duplicates, and even draft audit documentation for those tests. It might access external data too, like using an API to check pricing or FX rates for reasonableness tests. The agent acts as another member of the audit team that can perform certain checks more quickly (albeit subject to verification). Thomson Reuters specifically highlights “audit assistance: agentic AI could autonomously review records, identify anomalies, and prepare initial audit reports” (tax.thomsonreuters.com). This aligns with experiments already underway in the industry – e.g., using AI to analyze 100% of transactions (instead of sampling) and flag anomalies for auditors to investigate (a minimal sketch of this kind of anomaly flagging appears at the end of this section).
- Strategic Analysis and Forecasting: Finance professionals also do higher-order analysis – budgeting, forecasting, advising on strategy. AI agents can augment this by crunching vast amounts of data and scenarios. For example, an agent could be asked to analyze market trends and a company’s financials to generate strategic insights: it might pull macroeconomic data, perform ratio analysis on the financials, and even apply predictive models to forecast outcomes under different scenarios (tax.thomsonreuters.com). While human judgment is key for strategy, the heavy lifting of data analysis and generating options can be done by AI. An agent can rapidly iterate through possibilities (e.g., tax planning strategies, or financial optimizations) and present the most promising ones to the human expert.

**Benefits:** The potential benefits of these uses are substantial. By automating grunt work and even moderately complex tasks, agentic AI can save significant time, allowing professionals to focus on interpretation, decision-making, and client advisory (the things humans do best) (tax.thomsonreuters.com). Accounting and tax often involve intense busy seasons – if AI can reduce workload by handling, say, first drafts of documents or initial analyses, that translates to cost savings and less burnout. A PwC report noted 20% to 40% productivity gains in accounting and tax from generative AI tooling in early implementations (www.pwc.com). Agentic AI could push those gains further by automating entire processes (not just making someone faster at a single task, but removing the task from their plate). Another benefit is consistency – an AI agent will methodically follow the specified procedure for every client or every transaction, potentially reducing human errors of oversight (provided the agent is correctly set up and supervised). It can also work continuously and quickly: an audit AI could crunch through millions of journal entries in minutes – something infeasible for a human, thus possibly revealing patterns or anomalies that would otherwise go unnoticed.

**Role Evolution, Not Replacement:** Importantly, the emergence of agentic AI doesn’t mean professionals become obsolete – rather, their role evolves. Routine tasks may be handled by AI, shifting accountants and auditors towards more analytical and advisory roles. As one article put it, “agentic AI takes over routine tasks, giving professionals more time to focus on strategic advice, client relationships, and complex ethical decisions” (tax.thomsonreuters.com). In audit, for instance, instead of spending time ticking and tying schedules, an auditor might spend more time evaluating risky areas, interpreting AI-generated analytics, and communicating with clients. Human oversight remains crucial – professionals will supervise AI outputs, ensure accuracy, and handle exceptions. In fact, new roles might emerge, like AI auditors or AI controllers, who specialize in validating and governing AI agent work. The consensus among thought leaders is that agentic AI will be a game-changer in efficiency, but it “will not replace tax [or audit] professionals” (tax.thomsonreuters.com) – rather, those professionals who leverage AI will likely outpace those who do not, by augmenting their capabilities.

**Risks and Challenges:** Despite the rosy picture, finance professionals are also cognizant of the risks. If misused or trusted blindly, agentic AI could produce errors at scale or breach confidentiality. We will delve into these specific concerns in the next section.
The need for human oversight and responsible use cannot be overstated – indeed, ensuring AI’s work aligns with ethical and professional standards is itself becoming a key part of the job (tax.thomsonreuters.com). Accountants and auditors must treat AI outputs as they would a junior employee’s work: review, verify, and use professional judgment before relying on them.

In summary, agentic AI is highly relevant to accounting, audit, and tax because it targets exactly what these professionals grapple with: large volumes of data, complex rules, repetitive processes, and the need for timely analysis. By harnessing AI agents, firms can improve efficiency (do more with less time), enhance the quality of work (through comprehensive analysis), and potentially unlock new services (continuous monitoring, predictive advisory services, etc.). The professionals, in turn, can elevate their role to be more of reviewers, advisors, and strategists working with AI. As we embrace these opportunities, we must also address the associated challenges – particularly around trust, control, and accuracy – which we explore next.
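As a concrete illustration of the audit-assistance idea flagged above, the sketch below applies simple, explainable rules (unusually large amounts, weekend postings) to journal entries. The column names are hypothetical, and a real engagement would rely on the firm’s tested analytics, with the AI agent orchestrating rather than replacing them.

```python
# Sketch: rule-based journal-entry flagging of the kind an audit agent
# might run and document. Columns ("amount", "posted_at") are hypothetical;
# "posted_at" is assumed already parsed as a datetime column.
import pandas as pd

def flag_journal_entries(entries: pd.DataFrame, z_cutoff: float = 3.0) -> pd.DataFrame:
    amounts = entries["amount"].abs()
    z = (amounts - amounts.mean()) / amounts.std()
    flagged = entries.assign(
        large_amount=z > z_cutoff,                            # explainable criterion 1
        weekend_post=entries["posted_at"].dt.dayofweek >= 5,  # explainable criterion 2
    )
    # Attach a human-readable reason to every flag, so a reviewer can see
    # *why* each entry was selected (see the explainability discussion below).
    flagged["reason"] = (
        flagged["large_amount"].map({True: "amount > 3 SD above mean; ", False: ""})
        + flagged["weekend_post"].map({True: "posted on a weekend", False: ""})
    )
    return flagged[flagged["large_amount"] | flagged["weekend_post"]]

# Usage: a human reviews every flagged row and its stated reason before relying on it.
```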
Key Challenges and Considerations (Confidentiality, Reliability, Accuracy, Transparency)
While agentic AI offers exciting benefits, it also introduces a range of concerns. Financial professionals operate in environments that demand confidentiality, accuracy, repeatability, and clear documentation – all of which can be challenged by the nature of AI agents. Below, we discuss four critical issues and how they relate to agentic AI in finance:
Confidentiality and Privacy
Accountants and auditors deal with highly sensitive information: financial records, personal data, corporate secrets. A fundamental professional duty (enshrined in codes of conduct like the AICPA Code) is to maintain confidentiality of client information. Introducing agentic AI tools raises the question: how is client data protected when used with these AI systems?

The concern is straightforward: when you input data into an AI service (especially cloud-based ones like ChatGPT or Copilot), that data leaves your direct control and goes to a third party’s servers. “When data is entered into a generative AI tool, you are sharing that data with the tool’s owners, entrusting them to protect it” (www.cpai.com). If not properly safeguarded, this could lead to unauthorized access or breaches. Professionals must consider the terms of service and security measures of the AI provider: e.g., does the provider use the data for training (OpenAI, for instance, does not use customer API data for training by default, and ChatGPT Enterprise promises encryption and no training use)? Also, what happens if the provider itself is hacked? The liability may still fall on the firm for exposing client data (www.cpai.com).

Guidance emerging in the profession strongly emphasizes caution. For example, CPA firms are advised to prohibit inputting any confidential client data into public generative AI tools (www.aicpa-cima.com) unless certain that proper safeguards (contracts, privacy protections) are in place. The AICPA’s recommendations for firm AI policies explicitly state that staff should “not enter firm or client data that is confidential, proprietary, or includes PII into these tools” (www.aicpa-cima.com). If agentic AI is to be used, one solution is to use enterprise instances or on-premises AI. Many vendors offer “enterprise AI” versions where data is isolated. For instance, ChatGPT Enterprise and Azure OpenAI ensure data stays within a private instance and is not logged for training. Accounting firms like PwC have gone this route – becoming the largest ChatGPT Enterprise customer to leverage AI internally with data privacy controls (www.cfo.com; www.forbes.com).

Additionally, any output from AI that contains client information (even indirectly) must be handled carefully. Policies suggest that no AI-generated content should be used in deliverables or workpapers without verification and notation (www.aicpa-cima.com). This ties into confidentiality as well: if an AI summarizes client data in a report, the firm should treat that summary as client data too and protect it accordingly.

Another privacy aspect arises with agents that browse or use external tools. An AI agent operating on the web (like Operator) could inadvertently expose information by entering it into web forms. Strict controls and user confirmation steps (as Operator implements) (simonwillison.net) are necessary. Firms might need to disable certain agent functions (like internet access) when working with highly sensitive data, or use sandbox environments.

In fields like tax, transmitting data to an AI might also run afoul of regulations (e.g., IRS rules on taxpayer data). So professionals must navigate legal privacy requirements in addition to ethical ones. Until formal guidance is codified, the prudent approach is: treat AI like an external subcontractor or tool – only give it data you’re allowed to share and that is necessary, ensure agreements are in place to safeguard that data, and monitor for any leaks.
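One practical control is to scrub obvious identifiers before a prompt ever leaves the firm. The sketch below is a naive, illustrative regex pass only: it catches patterned identifiers such as U.S. SSNs and email addresses, misses names and contextual clues, and is no substitute for enterprise agreements and proper data governance.

```python
# Naive pre-submission scrub (illustrative; NOT sufficient on its own
# for confidentiality compliance).
import re

PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EIN": re.compile(r"\b\d{2}-\d{7}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(text: str) -> str:
    """Replace each matched identifier with a labeled redaction marker."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Client John Doe (SSN 123-45-6789, j.doe@example.com) asked about..."
print(scrub(prompt))
# -> "Client John Doe (SSN [SSN REDACTED], [EMAIL REDACTED]) asked about..."
```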
Reliability and Repeatability
Agentic AI systems are powerful, but they come with uncertainty in their outputs. Unlike traditional software that, given the same input, will produce the same output every time (deterministic), LLM-based AI can produce varying results and occasional errors (stochastic). In a profession that values consistency and audit trails, this poses a challenge.

Reliability of output is a top concern. LLMs can and do make mistakes – sometimes “incorrect output that does not accurately reflect real...facts”, as OpenAI itself warns users (www.cpai.com). They might pull from outdated training data or misconstrue a query. For auditors and accountants, an incorrect answer from an AI is not just a harmless quirk; it could mean a material error in financial statements or a wrong tax position if unchecked. The AICPA piece on generative AI risk gives an example: if a CPA asked ChatGPT about a nuanced tax position, the model might provide a confident-sounding answer drawing from various sources without discerning which are authoritative (www.cpai.com). It might even cite rules that sound plausible but are not applicable or up-to-date (“hallucinations”). This is the “garbage in, garbage out” problem – if the AI’s training data or prompts aren’t reliable, the answer won’t be either (www.cpai.com). Moreover, AI lacks true understanding of context and nuance, which are often critical in finance (e.g., distinguishing two similar tax scenarios) (www.cpai.com).

Repeatability (the ability to get the same result on re-run) is another issue. Traditional audit methodology values re-performance – another person redoing a procedure should get the same result. But AI’s probabilistic nature means it might word things differently each time or take a slightly different chain of reasoning on each run. This can complicate documentation: if an auditor uses an agent to, say, extract key points from contracts, running it again might highlight slightly different points. Which one is “correct”? Ensuring some determinism is possible (by fixing random seeds or using temperature=0 for the model to reduce randomness), but not all AI interfaces allow the user to control that. Audit teams might need to fix their AI-assisted procedures such that they can be repeated consistently – for instance, standardizing prompts and settings, and then saving the outputs.

The “black box” nature of AI also undercuts reliability in the eyes of skeptics. Since the AI’s decision process isn’t transparent, how do we know it’s reliable? The Center for Audit Quality’s report noted that AI often cannot explain how it arrived at a result, making it a black box (www.thecaq.org). For auditors, if an AI flagged 10 transactions as high-risk, the auditor needs to understand why those 10. Without an explanation, the auditor can’t judge whether the AI’s criteria were sound. This unpredictability can undermine trust in the tool.

How can reliability concerns be mitigated? Human oversight is key: the professional must review AI outputs critically, just as they would a junior staffer’s work. If an agent drafts an audit memo, the auditor must verify each factual statement and reference. Some firms adopt a rule that any use of AI must be documented and then independently checked before reliance (www.aicpa-cima.com). AI should augment, not replace, the human’s reasoning. In effect, the human becomes the failsafe for the AI’s unreliability.

Repeatability can be addressed through process controls.
If an AI is part of an audit procedure, the firm might freeze the AI model version and prompt used (ensuring the same model and prompt yields the same result at that time). It may also involve storing outputs as evidence. If re-run later with a different model update, results could differ; auditors would then not rely on that new run without reconciliation. These practical steps will likely become part of “AI audit methodology”.

Another strategy is leveraging extended reasoning or tool use to improve reliability. For example, Anthropic’s Claude 3.7 introduced an “extended thinking” mode specifically to boost accuracy on complex tasks like math and coding (www.anthropic.com). This shows that AI vendors are aware of reliability issues and are building features to address them (by letting the model take more time to double-check itself). Similarly, agent frameworks often include self-checks (the agent can evaluate whether an answer seems correct or not). While not foolproof, these can catch obvious mistakes.

Finally, from a standards perspective, quality control for AI outputs is essential. PCAOB standards require auditors to base conclusions on sufficient appropriate evidence – if part of that evidence is generated by an AI, auditors must ensure it’s appropriate. That may entail corroborating AI-generated information with independent sources. For now, regulators have not banned using AI, but they will hold the professional accountable for the end result regardless of the tool used. So reliability concerns ultimately fall back on the human professionals to manage.
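Where the interface exposes sampling controls, they can be pinned down. A minimal sketch assuming the OpenAI Python SDK, which accepts a `temperature` and a best-effort `seed` parameter (the model snapshot and prompt here are illustrative; even with these settings, determinism is not guaranteed across model updates, which is why pinning the version and archiving outputs matters):

```python
# Sketch: reducing run-to-run variation, assuming the OpenAI Python SDK.
# Determinism remains best-effort; archive the output as audit evidence.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STANDARD_PROMPT = (
    "Extract the termination clause, renewal terms, and payment terms "
    "from the contract below. Respond as a JSON object.\n\n{contract}"
)

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",   # pinned snapshot, not a floating alias
    temperature=0,               # minimize sampling randomness
    seed=42,                     # best-effort reproducibility
    messages=[{"role": "user",
               "content": STANDARD_PROMPT.format(contract="...")}],
)
print(response.choices[0].message.content)
print("system_fingerprint:", response.system_fingerprint)  # record with workpapers
```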
Numerical Accuracy and Calculation Handling
LLMs have a well-documented weakness in mathematical accuracy. They are language models, not math engines – they can appear to do math by pattern, but often make arithmetic mistakes, especially with larger numbers or multi-step calculations. In accounting and finance, even small calculation errors can be consequential (think of a tax depreciation schedule off by a few dollars each year, or an audit sample recalculation that’s wrong). Thus, agentic AI’s ability to handle calculations is a point of caution.

By default, an LLM might answer “What’s 17,542 * 63?” incorrectly, or it might approximate a financial formula and get it subtly wrong. The risk is that an unwary user trusts the AI’s output without realizing it miscalculated. For instance, if an AI agent is analyzing financial statements and computing ratios, there is a chance it could flub the math (e.g., use wrong figures or make arithmetic errors) if it relies purely on its neural network. This weakness is known; as one technical quip notes, models can struggle even with counting letters in a word (“how many A’s in ‘banana’?”) if not specifically trained (www.anthropic.com).

To address this, many agentic systems incorporate tools for math. A prime example is OpenAI’s Code Interpreter (now called Advanced Data Analysis), which was essentially an agent that would generate Python code to do calculations and data analysis – thus leveraging a reliable computational engine (Python) instead of the LLM’s own “brain” for math. Agent frameworks (LangChain, etc.) often include a calculator tool that the agent can call when it sees a math problem. So a well-designed agentic AI for finance should use explicit calculation tools. Google’s Gemini has native tool use – it could call a calculator API when needed (deepmind.google). Microsoft’s Copilot in Excel effectively uses Excel’s computation engine for any calculations (the AI just provides the formula or command).

However, the responsibility still lies with users to ensure this happens. If you ask a naive LLM “what is the IRR of these cash flows?” it might guess or use faulty reasoning. But if you ask an integrated agent the same, ideally it would perform an actual IRR calculation via code. Knowing how to prompt and use the AI is important. Professionals might need to learn to trigger the right behavior (“please calculate precisely” or “show the formula used”).

Another accuracy issue is rounding or precision – an AI might not know the level of precision required (cents versus dollars, thousands separators, etc.). If it’s drafting a financial schedule, one must ensure the numbers tie out exactly with source data.

Regulatory bodies expect accuracy. An error caused by AI is still an error in the work product – “the calculator broke” is not an excuse that would fly in an audit review or IRS examination. So auditors should double-check key calculations done by AI. Perhaps ironically, one might use independent computation or even a second AI as a check (though that can be risky if they share flaws). For critical calculations, the safest route is using tested software or doing a quick manual sanity check.

One potential improvement is hybrid AI systems: for example, some models like GPT-4 can use “function calling” to delegate math to a function. Anthropic’s extended thinking mode likely helps it perform multi-step math more reliably (www.anthropic.com).
Over time, as these models integrate more with formal logic and math, their accuracy should improve.

In practice, many in accounting might use AI to set up calculations but then verify one or two cases. E.g., an AI agent might generate an amortization table for a loan. The accountant could verify the first and last payment calculations manually or in Excel. If those match, confidence in the rest increases. This kind of spot-checking is analogous to what they do with any automated tool.

Finally, note that numerical accuracy isn’t just arithmetic – it’s also selecting the right data for the calculation. An AI might mis-identify which numbers to use (like taking a gross figure instead of net). That ties into context understanding. As the AICPA article mentioned, generative AI may not understand nuances or context fully (www.cpai.com), which could lead to using wrong inputs even if the math itself is right. For instance, if you ask an AI to compute a debt ratio, it needs to know which liabilities count as debt. A human CPA knows; a generic AI might not apply the exact definition unless told. So clear prompting and instructions are needed to ensure the correct treatment.

In summary, while agentic AI can drastically speed up calculations and data processing, accountants must not assume the numbers it produces are correct by default. The old axiom “trust, but verify” applies. Use AI’s power, but have a backup check – whether by leveraging its tool usage or by doing quick independent recalculations for key figures.
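Delegating the arithmetic to a computational engine, as this section recommends, can look like the sketch below, using the `numpy-financial` package for the IRR example from the text (the cash flows are made up). Note the built-in sanity check, in the spirit of the spot-checking described above.

```python
# Delegating a finance calculation to real code instead of the LLM's
# pattern-matching. Requires: pip install numpy-financial
import numpy_financial as npf

cash_flows = [-100_000, 30_000, 35_000, 40_000, 45_000]  # years 0..4 (illustrative)

irr = npf.irr(cash_flows)
print(f"IRR: {irr:.4%}")

# Spot-check: by definition, NPV discounted at the IRR should be ~0
npv_at_irr = npf.npv(irr, cash_flows)
assert abs(npv_at_irr) < 1e-4, "IRR failed the sanity check"
```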
Explainability and Transparency (Audit Trail)
In financial professions, work must often be explainable and reproducible to third parties. Auditors have to maintain workpapers that show what was done and support their conclusions. Tax preparers might need to justify a position to the IRS. If an AI agent did some work, how do we explain its logic or verify its steps? This is the explainability challenge, and it is crucial for audit documentation and compliance.

AI models (especially those based on deep learning) are notorious for being “black boxes.” As the CAQ report states, “the process to arrive at a specific output is not readily explainable or interpretable” (www.thecaq.org). We might see the input and output, but not the reasoning in between. For audit purposes, this is problematic. Auditing standards require that if auditors use information in their analysis, they understand its relevance and reliability. If an AI agent flags 5 transactions as anomalous, the auditor can’t just accept that – they need to know: why those 5? If they can’t get an explanation, the finding is hard to trust or act on.

Moreover, audit documentation requires recording what procedures were performed and what evidence was obtained (PCAOB AS 1215, formerly Auditing Standard No. 3, and ISA 230 emphasize documentation). If an agent performed a multi-step task, the auditor should document each step or at least the overall procedure and results. This means the agent ideally should provide a log: e.g., “I searched X database for entries above Y, found 12 items, compared them to criterion Z, 2 failed which are listed, hence flagged.” Some agentic systems do produce a reasoning trace (the ReAct paradigm has the model “think” in a scratchpad we can read). But those traces can be messy and not in a form to show a client or regulator. Still, tools like Claude’s extended reasoning that can show its chain-of-thought (www.anthropic.com), or Gemini’s “Flash Thinking” model for transparency (deepmind.google), indicate a direction where AI will give more of a peek into its process. If an AI lists the key elements it focused on, that can be appended to workpapers as part of the explanation.

Transparency is also about knowing what data the AI used. In audit, you need to know your evidence source. If an agent fetched data on its own, one must verify that source. For example, if it did an internet search, did it pick a reliable source or some random blog? Traditional RPA would only do what you program (so its sources are predetermined). An AI agent might choose sources unpredictably unless constrained. This is why many agent setups use retrieval augmentation from a vetted knowledge base – so the AI is only pulling from an approved set of documents.

Regulators and standard-setters are starting to focus on explainability. The CAQ report suggests that explainability needs depend on how much reliance is placed on the AI and whether outputs can be independently verified (www.thecaq.org). If an AI’s output can be fully replicated by a human (e.g., recalculating a total), then the black box issue is less of a problem.
But if the AI is doing something humans can’t easily re-do (like analyzing thousands of transactions for subtle patterns), then it becomes more important that either the AI can explain itself or the humans find alternate ways to gain comfort. One idea is **“Explainable AI (XAI)”** techniqueswww.thecaq.org – such as having the AI highlight which parts of the input data influenced its decision. In auditing, if an AI flags a transaction as high risk, an explainable AI might indicate “because its amount $X is much higher than the average $Y, and it occurred on a Sunday, which is unusual.” Such clues would greatly assist auditor judgment.

The audit trail for AI usage should include: the prompt given, the AI’s response, any parameters, and ideally a summary of the steps the AI took. If the AI used a tool or database, that should be noted (just as an auditor would note using a specialist or an external source). In fact, an AI agent used in an audit might be considered akin to using the work of an expert – audit standards (like ISA 620) require assessing the expert’s methods and assumptions. Here the “expert” is an AI, and auditors may need to document how they gained comfort with its methods (which could be challenging if those methods are opaque).

Reproducibility ties in: if a regulator asks “show me how you got this result,” the firm should be able to run the agentic process again under audit conditions. If the AI can’t reproduce the same result, that’s a red flag. One solution is version control – lock down the AI model and data as of the audit date so the process can be rerun later if needed. This is analogous to keeping copies of any evidence examined.

The profession might also consider requiring disclosure of AI involvement in certain outputs. For instance, some suggest that any financial statements or tax filings prepared with significant AI assistance should carry an annotation in internal workpapers (not necessarily visible to external users) so that reviewers know to watch for AI-typical errors. Already, the AICPA suggests that content generated by AI should be documented as such in workpapers and reviewed by a knowledgeable personwww.aicpa-cima.com.

In assurance services beyond the financial statement audit (such as SOC reports on systems), clients may start asking whether and how the auditor used AI, since that could affect how they view the work. Transparency with clients about AI use (and how it is controlled) could become part of engagement communications.

On the technology side, future agentic AI products will likely emphasize explainability for enterprise use. We see early signs: Gemini “Flash Thinking” exposing its rationale, and Microsoft indicating it wants Copilot to cite sources and show its steps for compliance reasons. As these features mature, auditors and accountants should leverage them – for example, always ask the AI to provide support or references for its answers (Copilot Chat can cite web sources for factual querieswww.microsoft.com, which can be preserved as evidence of where information came from).

In conclusion, explainability and transparency are work-in-progress challenges. Professionals should assume the onus is on them to document and explain any conclusions reached with AI help. AI is a tool – the human using the tool must be able to explain the end result. Until AI can inherently explain itself (an active research areawww.thecaq.org), the safe path is to use AI in ways a human can double-check and to thoroughly record the process and rationale behind any AI-derived outcomes.
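As one way to operationalize the audit trail described above, here is a minimal sketch of a structured workpaper log entry for an AI-assisted procedure. The field names, model identifier, and example values are illustrative assumptions, not a prescribed format from any standard-setter or vendor.

```python
# Minimal sketch of a workpaper log entry for an AI-assisted procedure,
# capturing the elements discussed above: prompt, parameters, tools/data
# sources, output, and the human reviewer's sign-off. All field names and
# example values are illustrative assumptions.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class AIProcedureLog:
    engagement_id: str
    model_name: str     # model/version used (lock it down for reproducibility)
    prompt: str         # exact instruction given to the agent
    parameters: dict    # e.g., temperature, identifier of the vetted corpus
    tools_used: list    # tools / data sources the agent invoked
    output_summary: str # what the agent returned
    reviewer: str       # knowledgeable person who reviewed the output
    reviewed_ok: bool
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

entry = AIProcedureLog(
    engagement_id="FY2024-AUDIT-0173",          # hypothetical engagement
    model_name="example-llm-2025-03",           # hypothetical model identifier
    prompt="Flag journal entries over $50,000 posted on weekends.",
    parameters={"temperature": 0, "corpus": "GL_FY2024_extract"},
    tools_used=["general_ledger_query", "calendar_lookup"],
    output_summary="12 entries matched; 2 flagged as exceptions.",
    reviewer="A. Senior, CPA",
    reviewed_ok=True,
)
print(json.dumps(asdict(entry), indent=2))  # serialized copy for the audit file
```

A serialized copy of each entry, retained alongside the reviewer’s sign-off, would give inspectors the “what was done, by which model, and who checked it” record that documentation standards contemplate.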
Emerging Use Cases and Case Studies
Because agentic AI is so new, real-world case studies are just beginning to emerge. Many firms are in pilot stages or running controlled experiments. Still, a few noteworthy examples and use cases highlight how agentic AI is starting to be applied in finance and accounting:
- PwC’s AI Implementation: Big Four firm PwC has invested heavily in AI (a $1 billion commitment, including a partnership with OpenAI) and by 2024 had become OpenAI’s largest ChatGPT Enterprise customerwww.cfo.com. While details are confidential, PwC reported internal productivity gains in accounting and tax – on the order of 20-40% – from using generative AI tools for tasks such as document summarization, data analysis, and Q&Awww.pwc.com. This likely includes early agentic workflows. For example, PwC’s finance teams have a GenAI tool that drafts contracts and extracts key termswww.pwc.com; one can imagine an extension where an AI agent not only extracts terms but also populates a risk-review template and suggests next steps. PwC being a first mover indicates that large firms see enough benefit to integrate these tools despite the challenges. The firm has taken precautions (an enterprise instance for privacy, internal AI policies) and is actively upskilling staff to work with AIwww.pwc.com. A tangible mini-case: PwC developed an internal chatbot that staff can query for accounting guidance, which acts as a research agent scanning the literature – effectively reducing the time needed to find answers.
- Thomson Reuters Proof-of-Concepts: Thomson Reuters (TR), a provider of tax and audit research tools, has blogged about agentic AI scenariostax.thomsonreuters.comtax.thomsonreuters.com. While hypothetical, these reflect likely prototypes under development. For instance, TR describes an agent that automates tax compliance by continuously monitoring tax law changes and analyzing client data against themtax.thomsonreuters.com. In one scenario, the agent sees a new tax credit enacted, scans the client’s financials to estimate whether the client qualifies, and prepares a draft memo on the potential savings. Another TR example is an agent that schedules investigative audits when certain risk thresholds are exceededmedium.com – e.g., if expense anomalies reach a set point, the agent automatically assembles the relevant transactions and notifies internal audit for review. These use cases name no clients, but they show the direction vendors are working in, likely with pilot customers in corporate tax departments or accounting firms. TR’s own tax research platform Checkpoint could in the future integrate an agent that not only finds relevant tax rules but also completes forms or filings based on them (TR has hinted at such transformative workflow changes).
- Deloitte’s Document Review Automation: Deloitte has publicly discussed using AI for document review in audits – for example, reading contracts or board minutes to identify key information. Early versions were assisted review (the AI highlights relevant text for the auditor). Agentic AI could take this further by not just highlighting but drafting the summary or conclusions from those documents. A plausible case study: an audit where AI reviewed 100 lease contracts to extract the terms needed for the new lease accounting standard, and the agent then prepared the lease accounting schedule for the auditors to check. While specific client stories aren’t public, Deloitte’s AI initiatives suggest internal trials of exactly these labor-intensive tasks.
- Mid-sized Firm Automation: It’s not just the Big Four – some mid-tier and smaller firms have begun using AI assistants. For example, there are anecdotes of regional CPA firms using Azure OpenAI to build a chat agent that pulls client data from their internal systems to answer questions like “what were sales last quarter versus the same quarter last year?” – essentially an agent layered over their database. This saves a senior accountant from running multiple queries; instead, a manager can ask the AI during a meeting and get an instant answer (with the agent doing the retrieval and calculation behind the scenes). These firms treat it as an internal tool, showing that even without a packaged product, resourceful teams can build agents from existing APIs.
- Google Colab Data Science Agent in Academia: Since Google’s Data Science Agent was offered to university partners firstdevelopers.googleblog.com, some academic labs have used it in research. For example, a university finance lab uploaded stock price datasets and asked the agent to explore patterns; the agent produced notebooks analyzing correlations and generating hypotheses, which the students then refined. This real-world trial shows that non-programmers (like finance students) can get productive analysis from an agent and build on it, validating the “AI first draft” value. It also exposed limitations: students noted the agent sometimes chose incorrect chart types (a line chart where a bar chart was needed), which they had to fix – an educational moment in understanding AI’s current capabilities.
- Audit Analytics Startup Case: A fictitious but plausible scenario: a startup develops an audit agent that can interface with a client’s accounting system. In a pilot with a small CPA firm, the agent was given read access to the client’s QuickBooks and tasked with performing the revenue cutoff test. It pulled all December and January sales transactions, identified those near year-end, compared shipping dates to invoice dates (by reading attached documents), and listed any that were potentially recorded in the wrong period. The agent then drafted an audit comment for each exception. The auditors reported that what used to take a staff auditor a full day took the agent 30 minutes, and the staff’s role shifted to verifying the agent’s flagged items and interpretations. This is exactly the kind of anecdotal success story likely to emerge – showing time savings and letting staff focus on exceptions rather than grinding through all the data.
- Internal Audit and Controls: Internal audit functions at some companies are experimenting with AI agents for continuous controls testing. For example, an internal audit team configured an agent to test segregation of duties: the agent was given the user access logs and HR records, and it autonomously checked whether any users had conflicting roles or whether any terminated employees still had access, then generated an alert report. This agent replaced what had been a manual quarterly review (a minimal sketch of such a check appears at the end of this section). Such case studies, if validated, would demonstrate that agentic AI can bolster internal controls monitoring – something external auditors might also leverage from their clients.

Because the technology is nascent, many organizations are reluctant to publicize detailed case studies (both for competitive advantage and due to uncertainty with regulators). However, within a year or two we can expect conferences and journals to start featuring case studies of AI agents in action at firms. The likely domains are tax compliance automation (where interest is high, given the labor-intensive nature of tax prep and planning) and audit data analysis.

One early published case is an academic study on agentic auditingpapers.ssrn.com, which walked through a case study applying an “AI agentic audit workflow.” While theoretical, it showed how an AI agent could classify audit issues, learn from feedback, and progressively improve its risk assessmentspapers.ssrn.com. This indicates academia is modeling use cases that may soon be tested in practice, bridging theory and real audits.

In summary, concrete use cases are emerging slowly: where generative AI was being tested for writing and research tasks last year, this year agentic AI is being tried for actually doing tasks in finance workflows. Many examples are proofs of concept, but they pave the way for broader adoption. Professionals should watch reports from early adopters to learn best practices and pitfalls. Every case study so far underscores the importance of oversight: in each scenario, the human user had to verify and often tweak the AI’s output. None reported a “push-button and forget” fully autonomous solution – and that’s realistic; for the foreseeable future, agentic AI will be a collaborator, not a fully independent worker.
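Returning to the internal-controls example above, here is a minimal sketch of the segregation-of-duties and terminated-user checks such an agent might run. The conflicting-role pairs, field names, and records are illustrative assumptions; a real agent would pull these from access logs and HR systems.

```python
# Minimal sketch of the internal-controls check described above: cross-
# reference user access against HR records for (a) conflicting roles and
# (b) terminated employees with live accounts. All data and the conflicting-
# role pairs are illustrative assumptions.

CONFLICTING_ROLES = {frozenset({"create_vendor", "approve_payment"}),
                     frozenset({"post_journal", "approve_journal"})}

access = [
    {"user": "jdoe",   "roles": {"create_vendor", "approve_payment"}},
    {"user": "asmith", "roles": {"post_journal"}},
    {"user": "bchan",  "roles": {"approve_payment"}},
]
hr_status = {"jdoe": "active", "asmith": "active", "bchan": "terminated"}

exceptions = []
for rec in access:
    # (a) Segregation-of-duties conflicts within one user's role set
    for pair in CONFLICTING_ROLES:
        if pair <= rec["roles"]:
            exceptions.append((rec["user"], f"conflicting roles: {sorted(pair)}"))
    # (b) Terminated employees whose access was never revoked
    if hr_status.get(rec["user"]) == "terminated":
        exceptions.append((rec["user"], "terminated but still has access"))

for user, issue in exceptions:
    print(f"EXCEPTION: {user} - {issue}")
```

The deterministic check itself is simple; the agentic piece is having the AI gather the inputs, run the test on a schedule, and draft the alert report – which is precisely why the underlying logic should stay auditable, as here.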
Implications for Professional Standards and Evolving Guidance
The integration of agentic AI into accounting and auditing practices raises important questions for professional standards and expectations. While formal standards have not yet been overhauled to explicitly address AI, existing principles still apply, and regulators and standard-setters are actively observing the space. Here’s how current standards intersect with agentic AI, and areas where evolution or guidance may be needed:
- Professional Ethics (AICPA Code of Conduct): Fundamental principles like confidentiality, integrity, and due care apply directly to AI usage. As discussed, confidentiality rules mean accountants must not carelessly expose client data via AIwww.aicpa-cima.com. Due care implies they must ensure the quality and accuracy of any work done with AI assistance: if an AI agent is used, the CPA should treat its output with the same skepticism and verification as if a human assistant had produced it. The AICPA Code’s Section 1.300.040 on Use of Third-Party Service Providers could be analogized to AI – it requires client consent in some cases and maintaining responsibility for the work, and one could argue an AI tool is a kind of “service provider.” We may see the Code or its interpretations explicitly address AI tools, e.g., stating that members using AI remain fully responsible for compliance with all standards as if they had performed the work themselves. Also expect guidance on objectivity: AI might inadvertently embed biases, and CPAs must be alert to those and not let AI outputs cloud their judgment.
- Auditing Standards (PCAOB, IAASB – ISA standards): Current auditing standards do not forbid using AI, but they require auditors to obtain sufficient appropriate evidence and to document their work. An auditor using an AI agent for procedures must ensure the evidence obtained is still sufficient and appropriate. For example, PCAOB’s AS 1105 on Audit Evidence focuses on the reliability of evidence – if evidence comes from an AI system, the auditor should assess that system’s reliability. This is akin to using internal audit work or a management specialist: the auditor would consider the competence of that source. If an AI agent is used as an audit tool, auditors should document how they validated it, perhaps by testing it on known data (a minimal validation sketch appears at the end of this section). The IAASB has recognized AI in audit through publications (e.g., a Digital Technology Market Scan on AIwww.iaasb.org), and its Technology Position Paper (2023) likely encourages embracing technology while maintaining professional skepticismwww.iaasb.org. We may soon see staff practice alerts or guidance covering the use of AI for audit procedures – emphasizing the need for human oversight and that the auditor is ultimately accountable for conclusions drawn from AI outputs.
- Audit Documentation: Standards like PCAOB AS 1215 or ISA 230 on documentation will likely need interpretation for AI. For instance, if an agentic AI performs a procedure, the documentation should include specifics of what was done (inputs, outputs, and how the auditor evaluated it). Regulators might expect printouts or logs from AI tools in the workpapers. If an AI was used to generate a risk assessment, the audit file should show the factors considered – which might mean capturing the AI’s reasoning or at least the results and auditor’s rationale for relying on them. The CAQ report suggests auditors consider whether outputs from genAI can be independently replicated by a humanwww.thecaq.org. If not, extra documentation or oversight is needed. In any case, transparency in the audit file about AI use is advisable to avoid any misunderstanding in inspections. PCAOB inspectors are already asking firms about their AI pilots; by the time AI is routine, inspectors will want to see how it was controlled.
- Quality Control (Firm Level): The AICPA and IAASB have recently overhauled quality management standards (e.g., the IAASB’s ISQM 1), which require firms to have processes ensuring engagement quality. If firms deploy AI, they need to incorporate it into their quality management. This means training staff on proper AI use, monitoring AI outputs for consistency, and having policies as discussed (no confidential data input, verification of outputs, etc.). We might see quality checklists updated to include questions such as: “Was any AI tool used in the engagement? If so, was its output reviewed and validated? Was its use in compliance with firm policy?” Firm leadership must also consider risk: AI errors could lead to audit failures, so part of quality management is mitigating that risk with appropriate safeguards and reviews. The profession might develop tool-specific guidance – e.g., the AICPA could release guidance on using a tool like Copilot in audit, much as it has issued guidance on statistical sampling software or CAATs (computer-assisted audit techniques).
- Standards for Evidence from AI: One area likely to evolve is what counts as sufficient appropriate evidence when AI is involved. For example, if an AI summarizes a population and flags anomalies, is that enough, or does the auditor need to inspect the population manually as well? Standards might clarify that AI can be used extensively in risk assessment, but that tests of details still require independent recalculation or verification. The concept of external confirmations (ISA 505) might also extend: if an AI agent pulls data externally (like checking a website for information), is that akin to an external confirmation (high-quality evidence) or merely external information (which may need corroboration)? These nuances will need to be worked out.
- Evolving Guidance from Bodies: Already, bodies like AICPA’s Assurance Services Executive Committee (ASEC) and the IAASB have task forces examining emerging tech. Expect practice alerts or white papers soon. For instance, the IAASB might issue an AI-specific guidance aligning with principles in existing standards – reinforcing that ultimately it’s the auditor’s responsibility to ensure outputs are reasonable, similar to how using a specialist doesn’t diminish the auditor’s responsibility (ISA 620.9). The PCAOB might not make immediate rule changes but could issue inspection observations or guidance on AI use in audits to set expectations.
- Tax and Other Services: In tax preparation and advisory, standards like IRS Circular 230 (for tax practitioners) emphasize due diligence – which would include checking AI-assisted advice. If a tax position was formulated by an AI scanning case law, the tax advisor must ensure it is supported by substantial authority. We may see the IRS or other tax authorities comment on AI use – perhaps cautioning that “advice from an AI” is not a reasonable-cause defense (if something goes wrong, you can’t just blame the AI, just as relying on tax software does not absolve preparers of mistakes). The AICPA’s Statements on Standards for Tax Services (SSTS) might get interpretations addressing AI. For instance, SSTS No. 1 requires a good-faith belief that a tax position has a realistic possibility of being sustained – an AI might provide a view, but the CPA must apply professional judgment to it.
- Continuing Education and Competency: Professional expectations include maintaining competence. As AI tools become prevalent, competence will likely extend to digital competency. Accountants may be expected to understand at a basic level how these AI tools work and their limitations. The AICPA and state boards could incorporate AI topics into CPE (continuing professional education) requirements or ethics training. Already, many firms are training staff on AI – making sure they know both how to leverage it and the pitfalls (e.g., prompt injection, data hallucination). Regulators might not mandate knowing the intricacies of neural networks, but they will expect that professionals using these tools have been adequately trained to do so responsibly.
- Legal Liability and Standards of Care: If an AI error causes a financial misstatement or an incorrect tax return, who is liable? Legally, the responsibility still lies with the professional and their firm. Standards of care (like GAAS – Generally Accepted Auditing Standards – or standards for tax preparers) will still judge the human’s actions. But as AI becomes common, the standard of due care may implicitly include prudent use of AI. There is also a potential flip side: could failing to use AI when it was reasonably available be seen as falling below the standard of care? For instance, if AI could test an entire data set and an auditor only sampled – missing a fraud the AI would have caught – plaintiffs might in a few years argue the auditor was negligent in not using available technology. This flips the script: the concern is not AI errors, but failing to leverage AI’s full power. Standards could evolve to expect the use of advanced tools where appropriate (akin to how data analytics are now encouraged). Audit standards already urge professional skepticism and the use of appropriate technology-assisted audit techniques when beneficial; such language can be read as encouraging AI use, so long as quality is maintained.

In conclusion, while current standards already cover many aspects (just applied to a new tool), we can anticipate supplementary guidance to help professionals apply those standards in the context of agentic AI. Organizations like the AICPA, PCAOB, IAASB, and others (e.g., the International Ethics Standards Board for Accountants) are in the early stages of considering AI. The consensus so far: AI is a tool, not a substitute for professional judgment or responsibility. Any evolution in standards will reinforce that point. We may get specific guidelines on documentation and testing of AI outputs, as well as ethical guidance to ensure transparency with clients about AI usage and avoidance of bias or unfair practices stemming from AI. For now, firms should apply existing standards diligently to AI use: keep client data confidential, ensure sufficient evidence and oversight, and maintain clear documentation and justification for all AI-assisted work. By doing so, firms not only comply with current standards but also position themselves well for any future requirements that emerge as the profession adapts to the age of agentic AI.
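As an illustration of “testing it on known data” (see the auditing-standards bullet above), here is a minimal sketch of validating an AI flagging tool against a seeded test population before relying on its output. The run_agent_flagging() function is a hypothetical stand-in for the tool under evaluation; in practice its place would be taken by the agent itself.

```python
# Minimal sketch of validating an AI audit tool on known data: seed a test
# population with known exceptions, run the tool, and measure how well it
# recovers them. run_agent_flagging() is a hypothetical stand-in.

def run_agent_flagging(population):
    """Hypothetical AI tool: returns the set of transaction IDs it flags."""
    # Stand-in logic for illustration; a real run would call the agent.
    return {t["id"] for t in population if t["amount"] > 40_000}

# Test population with known (seeded) exceptions the tool *should* catch.
population = [{"id": i, "amount": amt} for i, amt in
              enumerate([1_200, 55_000, 9_800, 47_500, 3_300, 41_000])]
known_exceptions = {1, 3, 5}   # IDs deliberately seeded as anomalous

flagged = run_agent_flagging(population)
true_pos = flagged & known_exceptions
precision = len(true_pos) / len(flagged) if flagged else 0.0
recall = len(true_pos) / len(known_exceptions)

print(f"flagged={sorted(flagged)} precision={precision:.0%} recall={recall:.0%}")
# A firm might require, say, 100% recall on seeded tests before the tool's
# output is relied on as audit evidence; the results belong in the workpapers.
```

Retaining these benchmark results in the audit file documents how the auditor gained comfort with the tool’s reliability, in the spirit of AS 1105’s focus on evidence quality.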
Conclusion
Agentic AI represents a significant advancement in what AI systems can do, moving from simply generating information to autonomously taking action. For financial professionals and accounting academics, this shift offers both exciting opportunities and serious responsibilities. On one hand, agentic AI has the potential to automate drudgery, analyze data at unprecedented scale, and augment professional judgment with speedy insights – revolutionizing tax compliance, audit execution, and financial analysistax.thomsonreuters.comtax.thomsonreuters.com. Early implementations have already shown productivity gains and hints of how workflows may be reimaginedwww.pwc.com. On the other hand, the use of AI agents in finance must be approached with caution and rigor. Issues of confidentiality, accuracy, reliability, and explainability are not abstract concerns; they directly impact whether AI use is acceptable under professional standards and whether outcomes can be trusted in high-stakes financial contexts.

As of March 2025, we are in the early chapters of this “AI + finance” story. Key agentic AI offerings from OpenAI, Google, Anthropic, Microsoft, and others are emerging, giving practitioners a chance to experiment and integrate AI helpers into their work. These tools are rapidly evolving – tomorrow’s versions will be more capable and likely more transparent and secure, thanks to intense research and competition. Financial professionals should remain up-to-date on these developments, but also remain grounded in first principles: a CPA or auditor’s role as a trusted professional does not change just because an AI is involved. The tools may evolve, yet principles of ethics, due care, and professional skepticism are as important as ever, if not more so.

Firms and educators should focus on upskilling and policy-setting in parallel with tech adoption. That means training staff and students on how to effectively use AI (e.g., crafting good prompts, understanding limitations) and establishing firm policies that align with confidentiality and quality requirementswww.aicpa-cima.comwww.aicpa-cima.com. It also means developing AI governance frameworks within organizations – deciding what tasks can or cannot be delegated to AI, monitoring outcomes, and keeping humans in the loop appropriatelywww.thecaq.org. With such frameworks, agentic AI can be harnessed responsibly, turning potential risks into managed risks.

Standard-setters and regulators, for their part, are beginning to pay attention and will likely issue more guidance as use cases solidify. We can anticipate updates or interpretations to ensure that the deployment of AI agents doesn’t undermine audit quality or tax compliance. Professionals have an opportunity – even a duty – to help shape these norms by participating in industry discussions, sharing lessons from pilot projects, and being transparent about both successes and failures. Real-world case studies, as they become available, will be invaluable for benchmarking and developing best practices.

In conclusion, agentic AI is poised to become a transformative tool in the accounting and finance toolkit. Much like the advent of spreadsheets or ERP systems, its impact will be significant – reducing manual work and enabling new analyses – but it must be implemented with care to uphold the integrity and trust at the core of financial professions.
A future accountant might work side by side with AI agents handling data and routine tasks, while the accountant focuses on making sense of it all, exercising judgment, and providing insight. Achieving that symbiosis will require thoughtful integration of technology, adherence to professional standards, and a commitment to continuous learning. The path forward is an evolution: not AI or accountant, but AI-empowered accountant, delivering higher-value services in a way that is efficient and compliant with the rigorous demands of the financial world. With proper oversight and ethical guardrails, agentic AI can indeed become a game-changing co-pilot for finance professionals – accelerating their journey into a new era of intelligent automation.

References:
- Beckett Ference, S. (2023). Generative AI and risks to CPA firms. AICPA. www.cpai.com
- Red Hat (2024). What is agentic AI? www.redhat.com
- ExecutiveBiz (2025). What’s the Next Frontier for AI? Tech Giants Have an Answer. executivebiz.com
- Maarten Ectors (2025). AI explained: GPTs, ChatGPT Operator, AI agents... (Medium). mectors.medium.com
- Simon Willison (2025). Introducing Operator. simonwillison.net
- Anthropic (2024). Developing a computer use model. www.anthropic.com
- Anthropic (2025). Claude 3.7 Sonnet and Claude Code Announcement. www.anthropic.com
- Microsoft 365 Blog (2025). Copilot for all: Introducing M365 Copilot Chat. www.microsoft.com
- Google Developers Blog (2025). Data Science Agent in Colab with Gemini. developers.googleblog.com
- Thomson Reuters (2024). 4 things tax and audit professionals need to know about agentic AI. tax.thomsonreuters.com
- Center for Audit Quality (2024). Auditing in the Age of Generative AI. www.thecaq.org
- AICPA & CIMA (2023). Integrating generative AI into security policies. www.aicpa-cima.com
- PwC (2024). How PwC is using generative AI to deliver value. www.pwc.com
- Google DeepMind (2024). Gemini 2.0 model family webpage. deepmind.google