A Practical Look at AI Implementation in Finance Departments

This article is meant to guide you in leading AI-driven optimization in finance and operations departments. It presents the practical applications possible with existing technology, including real-world usage examples, how to use these tools, their limitations, exposure risks, and future potential. We focus on tools, protocols, and solutions built for businesses, emphasizing use in finance and operations departments, although the same approach can be applied in any corporate framework.

From Excel to AI

As a young accountant, I was introduced to a tool we all know – Microsoft Excel – and was immediately hooked. As I learned the software, I realized how powerful it was for analyzing and processing data, and more importantly, I understood the significant advantage that those who knew it well had over those who hesitated to use it. Data analysis tasks required me to learn new formulas and understand how to build complete files – a kind of 'architecture' – and I gained a reputation in the office as an Excel expert. After fielding calls from coworkers, mainly about Excel formulas, I prepared a video explaining how to use VLOOKUP – for those familiar, one of the most useful functions at the basic Excel level. I shared the video in the office, and surprisingly, instead of freeing up my time with fewer questions, the number of inquiries only increased… The tutorial video had an unexpected effect: it revealed the software's potential to those who previously hadn't grasped its value. Following this, I opened a Facebook group called 'Excel Help' where I posted tutorial videos and answered questions.

Today, with existing AI technology, a group like 'Excel Help' looks as outdated as a horse and carriage in the age of autonomous vehicles. A human expert has become unnecessary; you can simply ask the friendly bot, which only gets smarter over time. In the span of a few years – from the appearance of GPT-3 in 2020, through the launch of ChatGPT in late 2022, to GPT-4 in 2023 – the world has changed, and the expectation is that future change will be even more significant and extensive.

My personal story with Excel illustrates the evolution of technological tools in finance departments – from the days when a human expert was needed to explain how to use the VLOOKUP function, to today, when AI provides professional, immediate answers to far more complex questions. But to use the new technology effectively, and to avoid costly mistakes in the finance department, it's important to understand its technological basis and limitations.

Factors Contributing to AI’s Surge

Contrary to popular belief, AI technology is not new. The term “artificial intelligence” was coined in the 1950s, along with research on neural networks that forms the basis for today’s existing technology.

The current surge in AI capabilities is attributed to three main factors. The first is computing power: improvements in processor capabilities since 2010, particularly in GPUs suited to AI workloads, made it possible to significantly increase the scale of data a model can process and, consequently, to expand the model's capabilities.

The second factor is model architecture. In a 2017 paper titled 'Attention Is All You Need', Google researchers laid the groundwork by presenting the Transformer architecture and the self-attention mechanism. On this foundation, several research labs, including OpenAI, developed Large Language Models (LLMs). Such a model trains on language data at enormous scale, learning to recognize patterns in human language – specifically, by attempting to predict the next word in a sentence. From this training, cognitive abilities emerge: the model can provide logical answers to problems across disciplines, from mathematics through biology to psychology.
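To make the mechanism a little more concrete, here is a toy sketch of scaled dot-product self-attention in Python – a single head, with no learned projections or training loop, just the core computation the paper introduced:

```python
import numpy as np

def self_attention(X):
    """Toy scaled dot-product self-attention: each token's new representation
    is a weighted mix of all tokens, weighted by similarity."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                     # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax -> attention weights
    return weights @ X                                # context-aware token vectors

tokens = np.random.rand(3, 4)        # three "tokens" as 4-dimensional embeddings
print(self_attention(tokens).shape)  # (3, 4): same shape, now context-mixed
```

In a real Transformer this operation is repeated across many layers and heads, with learned weights, which is what next-word prediction at scale trains.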

The third factor is OpenAI's innovation in how we communicate with the model. With ChatGPT, a revolutionary interface was introduced for the first time that made 'conversation' with the model intuitive, in the form of a chat, paving the way for the technology we know today.

AI Evolution

Why is this background story essential for practical, day-to-day implementation? To make effective use of AI, we need to understand the model precisely and map out which tasks it can perform for us and where its capabilities are limited.

The existing technology, relying on models trained on language (as well as images, video, and audio), works as follows: an enormous amount of data was fed into the model so that it 'learned' the connections and patterns and can draw conclusions. This means that, despite differences in performance, all models are essentially based on similar technology – whether ChatGPT, Claude, or Llama. The differences in capability stem mainly from the scale of data on which the models were trained, and the main constraint today is computing. The more access a developer has to AI processors (a field dominated almost entirely by Nvidia), the more capable its model will be.

When GPT-3 was launched, people were surprised by the model's language capabilities and intelligence. Suddenly it was possible to converse with software in a way that could simulate a conversation with a human being. At this stage, however, the model struggled to answer professional questions effectively, making it difficult to integrate its outputs into company processes. In March 2023, GPT-4 was launched, showing a significant leap in capabilities and enabling professional-level responses for the first time. The answers it provided were formulated at a level equal to or exceeding human composition, and many began using the model for text writing, responding to professional emails, translation, and answering professional questions.

Very quickly, these initial efforts with the model's new capabilities ran into what remains one of the central challenges in professional use today – the hallucination problem (yes, that's the technical term). A defining characteristic of a language model is creativity, which on one hand allows for flexibility in responses, and on the other hand generates answers that don't necessarily align with the base data it was trained on. This reliability issue prevents many practical applications that require accuracy, although there are ways to reduce the effects of this characteristic, which we'll discuss later.

How do we integrate this technology into our work?

In this article, we’ll focus on using it as a tool in operational work processes for companies, emphasizing processes in finance departments and financial operations. These processes include activities in accounting and financial reporting such as financial calculations and book entries according to accounting standards, preparation of financial and business reports, processing and analysis of financial data, comparisons and reconciliations.

In bookkeeping, the new tools can assist with invoice processing, payment tracking, expense classification, bank reconciliations, customer and vendor reconciliations, regulatory reporting, and more. There are also many uses in financial administration, such as processing payments, collecting data for payroll preparation, handling work orders, price quotes, service terms, and billing. Additionally, there are applications in general tasks such as responding to emails, summarizing conversations and meetings, and handling questions related to internal databases.

Although we’ll refer specifically to use in finance departments, the implementation method will work for other professional segments with relevant adjustments.

What are the possible uses of AI within finance departments?

The uses can be divided into three main categories, for which we'll discuss current possibilities, including advantages and disadvantages, possible risks, recommended usage methods, and future outlook.

First Category – Narrow AI: the use of tools and software with an AI component. Examples include DOKKA, which enables vendor invoice identification and processing, or the Israeli-based Trullion, which creates accounting entries under lease standards by processing rental agreements – saving valuable accountant hours and streamlining processes in companies where implementing the standard is material and requires precision and investment. These applications, although they use AI components for analysis, processing, and operation, are 'narrow intelligence', meaning they suit specific, non-general tasks. The Datarails solution, for instance, enables business and economic analysis, streamlines budget preparation, and interfaces with ERP systems. Underneath, this is software developed by engineers to provide specific solutions in the FP&A world. If we want to add functionality, make changes, or cover new edge cases, we'll need to rely on the software developers for version updates; the software is built to provide a specific, point solution. This category requires identifying significant processes in the organization, usually led by team managers – who might resist automating them because of the implications for their staff (despite the promise of efficiency) – so there's an inherent conflict here.

Second Category – Generative AI: the use of general-purpose models, whether through a chat interface like ChatGPT or through wrapper software such as Microsoft 365 Copilot. The fundamental difference from the first category is that this is 'broad intelligence' – a theoretical capability to address general issues.

Third Category – Agents: the most advanced category is the use of AI agents. How do we define an AI agent? Giving AI the ability to take action makes it an agent – for example, giving it permission to send emails from our inbox. The operational advantage is enormous: instead of a question-answer interface that only allows information exchange, agents can create more dynamic and complete work processes.

Exposures, Risks, and Limitations

Information Security Limitations

When we use an AI model through a provider's interface, the data is processed in the provider's cloud. In other words, if we write a query to ChatGPT, the query goes to OpenAI's servers, where the model processes it and returns an answer. Although the query is transmitted over a secure connection, a third party – the model provider – receives the information. With sensitive information, such as financial reports or employee salaries, this is an exposure.

Companies like OpenAI use the information sent to them to train their models, and a model trained on certain information can potentially surface it if asked about the topic in the future. OpenAI offers options to block the use of your data for training and even offers a corporate subscription program with data privacy policies, as do other model providers (Google, Anthropic, etc.). We suggest reviewing this topic with your IT expert and making sure the terms of use and privacy policies, including regulatory requirements for data retention, are understood, to minimize exposure.

As for the models themselves, it's possible to use an open-source model, like Meta's Llama, in a way that doesn't send data to third parties. How does this work? Meta took an open-source approach and published its main model, Llama, for free. Anyone can download this model and use it without connecting to the internet. The limitation is that running the large model, Llama 3.1 405B, requires serious computing infrastructure at high cost, making this option uneconomical for most organizations. Smaller models can be run locally, but they won't have the same level of intelligence and cognition as the large ones. Running small models locally suits specific tasks that don't require high intelligence, such as classification, summaries, and simple translation.
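As a rough illustration, here is a minimal sketch of running a small open model locally with the Hugging Face transformers library. The model name and prompt are illustrative (Meta's models require accepting a license); any small instruction-tuned model your hardware can handle would work the same way:

```python
from transformers import pipeline

# Runs entirely on the local machine; no query data leaves it.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # illustrative small model
)

prompt = "Classify this expense as Travel, Office, or Software: 'Zoom annual license'."
result = generator(prompt, max_new_tokens=20)
print(result[0]["generated_text"])
```

A classification or summarization task like this is exactly the kind of job a small local model can handle when the data must not leave the organization.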

Reliability Limitation

Another limitation and risk to be aware of is the reliability of model responses. One way to assess reliability is to measure 'hallucinations' – model responses that aren't grounded in facts. As explained at the beginning of the article, because a language model is built to complete text plausibly, and wasn't engineered specifically to answer correctly, in some cases its answer will be 'creative' to the point of distorted facts, flawed logic, or completely wrong answers.

While more advanced models show lower hallucination rates, even the most sophisticated systems still grapple with this fundamental challenge. There is also no single measure of hallucination; it can be measured in different ways. In tests presented by Vectara, models' summarization abilities showed hallucination rates between 1% and 4% (Figure 1), whereas testing models on specific questions about a given text recorded higher hallucination rates (Figure 2).

Figure 1 – text summary accuracy by model

https://github.com/vectara/hallucination-leaderboard/

Figure 2 – text hallucination rate by model

https://github.com/lechmazur/confabulations/

The implication is that the reliability of a model's responses depends on how it is used, and the required reliability level differs per action: summaries are one thing, questions about a text another. For our practical purposes, this means we need to examine each solution and work method separately to assess its reliability level and what controls its use requires.

Limitations in Types of Tasks

Another limitation worth noting is the gap in model capabilities across different types of tasks. In performance measurements on tests covering language, software engineering, history, psychology, statistics, and the bar exam, the newest models show results equivalent to the top decile of human test-takers. However, the models struggle with anything involving basic logic and multi-stage planning. (Figure 3)

Figure 3 – performance benchmark by subject, OpenAI

https://openai.com/index/gpt-4-research

The implication for us: even though the model excels on tests, when implementing actual tasks we need to evaluate it ourselves, per task. It is recommended to test every task given to the model – whether a query or an agent task – multiple times, to assess the success rate on that specific task. (Figure 4)

Figure 4 – performance benchmark by subject and by model

https://www.vellum.ai/blog/llm-benchmarks-overview-limits-and-model-comparison

The distinction between AI's capabilities on different tasks is crucial because it affects how we should implement AI in different scenarios:

  1. Tasks where models excel (language processing, professional standards application)
  2. Tasks requiring careful monitoring (multi-step logical processes)
  3. Tasks where human oversight is still essential (complex decision-making)

What are some of the tools we have at our disposal?

Just before we share implementation examples, we need to become familiar with the tools at our disposal. Today there is a wide range of AI solutions, from advanced language models, through dedicated tools for finance departments, to automated AI agents. Each of these tools suits specific tasks and requires understanding its optimal use.

Advanced AI Tools and Models

Among the market-leading LLMs from OpenAI are GPT-4o and a specialized model called o1 (similar, confusing names indeed, but these are two different models) – the two models from this family most relevant to our uses. GPT-4o can receive text, images, and files, and can also search the web. The o1 model is one of the first of its kind trained to reason by planning its steps (chain of thought). The difference shows in handling more complex issues, which makes it very useful for outputs requiring multi-step planning and code generation (Python and VBA are the relevant languages for us; we'll touch on this later).
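For teams that want to go beyond the chat window, these models can also be queried programmatically. Here is a minimal sketch using the official openai Python library (it assumes an OPENAI_API_KEY environment variable; the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an assistant for a finance department."},
        {"role": "user", "content": "Write an Excel formula that sums column B "
                                    "where column A equals 'Vendor X'."},
    ],
)
print(response.choices[0].message.content)  # e.g. a SUMIF formula to paste into Excel
```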

Anthropic's Claude 3.5 is considered the strongest model in terms of language capabilities, especially in code generation, although the o1 model launched in mid-December surpassed it. Anthropic also has an agent protocol called MCP, which can connect to models from other companies as well; we expand on it below.

From Google comes the Gemini family, with version 1.5 considered inferior to the models mentioned above. Google has since caught up, and version 2 of its flagship model is expected to be a significant improvement. A particularly powerful tool, launched in December 2024, is Google Deep Research: it sends an AI agent to research as many as 100 different websites and return a summary report on its findings. The tool has been tested by experts in various fields and has entered use as a research aid.

Last on this list is Llama 3.1 from Meta. What distinguishes it is that it's open source, meaning anyone can download and use it without connecting to Meta. However, as explained, the flagship model requires computing power that small and medium-sized companies don't have, so the practical solution is a cloud connection (though not necessarily with Meta – any cloud computing provider will do).

One of the most important tools today is Microsoft 365 Copilot, the version that connects directly to Microsoft software, including Excel, PowerPoint, Teams, Outlook, and more. The practical significance is enormous: not only is there less friction, since we're operating AI inside the applications, but we also avoid exposure and keep our private data within the organization as part of the process. Note that this is a different version from the Copilot offered free in the Edge browser or as part of Windows. The 365 version integrates with the software and also enables actions – for example, in Excel you can request changes verbally and receive them within the file. Additionally, Copilot enables the creation of agents: defining a process participant that receives a role and can perform actions in Microsoft software or through interfaces with other software.

A more advanced tool worth mentioning is agents. The term describes taking a base LLM, for example GPT-4o, and providing it with a role definition, a method of executing that role, and actions. For example, an email summary and classification agent might be defined as: 'Your role is to review incoming emails to inbox X and classify them into three categories: urgent, routine, and spam. When an email is urgent, send it to manager Y, use a professional tone, and summarize the essence of the urgent email.'

In this case, we defined the agent's role, the actions it needs to perform, and the response method; of course, we can go into more specific detail. Setting up agents today can be complex and code-based, through frameworks like CrewAI, or through established features such as the agents built into Salesforce and those we mentioned in Microsoft 365 Copilot.
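Here is a minimal sketch of that email-triage agent in Python: the model does the classification, and plain code performs the 'action'. The forwarding function is a stub standing in for a real mail API, and the names are illustrative:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = ("Your role is to review incoming emails and classify each one as "
                 "exactly one of: urgent, routine, spam. Reply with the category only.")

def forward_to_manager(email_body: str) -> None:
    # Stub: in practice this would call a mail API with a summarized, professional note.
    print("Forwarding to manager:", email_body[:60], "...")

def triage(email_body: str) -> str:
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": email_body}],
    )
    category = reply.choices[0].message.content.strip().lower()
    if category == "urgent":
        forward_to_manager(email_body)  # the action that makes this an agent, not a chat
    return category

print(triage("The bank rejected today's payroll transfer, please advise."))
```

The frameworks mentioned above wrap this same pattern – role, classification, action – in configuration rather than hand-written code.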

We should also mention Anthropic's MCP – Model Context Protocol – which provides a framework for connecting agents to tools and data. Using this protocol allows, among other things, a defined agent to perform actions on the personal computer, such as opening folders and files, performing browser operations, and more. This is a very powerful tool, though still limited in capability. The main limitation in using agents today is the models' difficulty with multi-stage planning and logic; except for OpenAI's o1, most models fail as agents in overly complex operations. o1 represents a change and a basis for the future development of more capable agents, which are expected to be significant in 2025.

A final tool worth mentioning, and one of the most useful for us, is slightly different from the above: Perplexity. This is a search tool, Google-style, except that instead of pointing to websites, it provides answers based on them. Overall, it's a very efficient interface for obtaining professional information that meets reliability and efficiency needs, and is therefore recommended for basic information-search processes.

How do we implement this technology in practice in the finance department?

Having covered the capabilities and limitations of the technology, along with some of the available tools, let's discuss how to actually implement it in a corporate environment.

Implementation Framework

The implementation process includes several key stages:

  1. Organizational Process Mapping: Similar to SOX requirements, this means defining the organization's key processes and breaking them down by who performs them, how they are executed, software interfaces, data sensitivity level, and automation potential.
  2. Pre-Implementation Feasibility Assessment: The more significant a process is to the company, the more it's worth investing in AI planning and implementation. As the article has shown, today's models specialize in some types of problems and struggle with others, so we need to break each process into sub-processes where possible.
  3. Solution Testing in a Control Environment: Effective AI implementation hinges on rigorous testing – no amount of theoretical mapping can substitute for practical validation. Only by examining practical usage can we evaluate whether the task suits AI, what quality it produces, whether additional controls are needed, what efficiency is gained, and so on. Remember that AI technology is characterized by variance: even if we produced the desired result once in testing, it's important to try several times and in different variations to get a realistic assessment of future field performance (see the sketch after this list).
  4. Evaluation and Recommendations: In the article’s examples, we’ll refer to ways to improve output reliability. Evaluating the solution in the control environment will provide you with an assessment of the solution’s fit for your work method. Once you see value in the solution according to parameters of time efficiency, quality, and/or ease of use, you can recommend general and cross-organizational implementation for employees and interfaces related to the process.
  5. Field Implementation: Before implementation, it’s recommended to train employees who will use or interface with the system to create optimal conditions.
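To make stage 3 concrete, here is a minimal sketch of a repeat-testing harness: it runs the same task several times and measures how often the output passes a check you define. The prompt and the check are illustrative placeholders for your own task:

```python
from openai import OpenAI

client = OpenAI()

def run_once(prompt: str) -> str:
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

def success_rate(prompt: str, check, n: int = 10) -> float:
    """Run the same task n times; return the share of runs that pass the check."""
    return sum(check(run_once(prompt)) for _ in range(n)) / n

rate = success_rate(
    "What is 17.5% VAT on 1,250? Answer with the number only.",
    check=lambda answer: "218.75" in answer,
)
print(f"Success rate: {rate:.0%}")  # repeat with varied phrasings before field rollout
```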

Examples of practical implementation:

Information Queries: Finance departments must meet professional requirements and work according to changing, complex standards and procedures. The tax world, for example, is continuously updated, and the professional material produced by tax authorities, experts, court rulings, etc. is complex and abundant. When looking for an answer, a simple keyword search doesn't really help – we want an answer grounded in the data. For tax-related questions, we took BDO's 2025 tax updates booklet (useful and recommended) and asked the model questions on various topics. To prevent hallucinations, we asked the model to quote the location in the text from which it took each answer. In this case there's no restriction on exposing the file to public models, so you can get quality responses from the leading ones. In this example, the models produced relatively accurate answers efficiently, significantly reducing execution time while also improving quality.
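A minimal sketch of this quote-the-source technique: the model is instructed to answer only from the supplied document and to cite the passage it relied on, which makes hallucinations much easier to catch. The file name and question are illustrative:

```python
from openai import OpenAI

client = OpenAI()
booklet_text = open("tax_updates_2025.txt", encoding="utf-8").read()  # extracted booklet text

SYSTEM = ("Answer only from the provided document. After your answer, quote verbatim "
          "the sentence(s) you based it on. If the answer is not in the document, say so.")

question = "What changed in the reporting thresholds this year?"
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": SYSTEM},
              {"role": "user", "content": f"Document:\n{booklet_text}\n\nQuestion: {question}"}],
)
print(response.choices[0].message.content)  # answer plus a verbatim quote to verify
```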

Professional Outputs: Our accountants need to prepare accounting records according to standards. Whether it's IFRS, US GAAP, or Israeli standards, a high professional level is required. IFRS 16, for instance, is a common standard with technical implementation. Again, we can take the basic data (for this standard: rental payments, period, discount rate, etc.) and ask the models to prepare not only the calculation but also the journal entries required for each period. The leading models will produce detailed answers, including the solution method, so the output can be integrated into financial statement audits.
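For a sense of the arithmetic involved, here is a minimal sketch of the IFRS 16 calculation with illustrative figures (no initial direct costs or incentives assumed) – the kind of computation and journal entries you would ask the model to produce and then verify:

```python
# Illustrative lease: 120,000 paid at the end of each year for 5 years, 6% rate.
annual_rent, years, rate = 120_000, 5, 0.06

# Initial lease liability = present value of the payment stream (= ROU asset here).
liability = sum(annual_rent / (1 + rate) ** t for t in range(1, years + 1))
rou_asset = liability
print(f"Initial: Dr Right-of-use asset {rou_asset:,.0f} / Cr Lease liability {liability:,.0f}")

# Year 1: unwind interest on the liability, pay rent, depreciate straight-line.
interest = liability * rate
principal = annual_rent - interest
depreciation = rou_asset / years
print(f"Year 1: Dr Interest expense {interest:,.0f}, "
      f"Dr Lease liability {principal:,.0f} / Cr Cash {annual_rent:,.0f}")
print(f"Year 1: Dr Depreciation {depreciation:,.0f} / Cr Accumulated depreciation {depreciation:,.0f}")
```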

Financial Statement Review: For sensitive data, always consult your IT expert regarding exposure. Using the built-in Copilot, we don't need to leave Word and can examine financial statement text for errors, number reconciliations, or version comparisons. Currently, Copilot's performance in financial statement review falls short of professional standards, but the solution is still worth knowing, because Copilot's rate of improvement is tremendous, and by mid-2025 it is expected to be able to take a substantial part in preparing financial statements.

Interfaces and Data Editing: Excel work requires knowledge of formula writing. Today, you can describe the desired result to the models, or directly to Copilot, and get an appropriate formula. This use extends to more powerful tools, such as Power BI or Power Pivot, which require more complex languages typically unfamiliar to finance people. For many databases, you can generate SQL queries for direct data extraction: we ask the model to generate the query and use it to get processed output. In projects involving complex data processing with Power Pivot, the models created the complex formulas needed. Additionally, when data rearrangement is required, such as converting files from one format to another, Excel macros are typically used. Macros are written in VBA, and we use the models to write the code needed for complex macros, enabling format conversions we previously couldn't achieve.
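A minimal sketch of the 'generate the query, then run it' pattern: the SQL string below stands in for what a model might return when asked for open vendor balances, and Python executes it. The database file, table, and column names are hypothetical:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("erp_extract.db")  # a local extract of ERP data

model_generated_sql = """
SELECT vendor_name, SUM(amount) AS open_balance
FROM vendor_invoices
WHERE status = 'open'
GROUP BY vendor_name
ORDER BY open_balance DESC;
"""

# Processed output, ready for reconciliation work.
print(pd.read_sql(model_generated_sql, conn))
```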

Data Analysis and Anomaly Detection: When examining organizational charges (customers/vendors), we deploy the model on large data sets to look for anomalies. For example, an Excel file of employee time records containing tens of thousands of manually entered rows (which makes exhaustive searching difficult). We deploy the model for analysis and define what counts as anomalous. In performing this task we received mixed results, with the AI sometimes struggling to find everything we wanted and defined.
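A minimal sketch of that batch-review approach: the time records are sent to the model in chunks with explicit anomaly definitions. The file, column names, and thresholds are illustrative, and given the mixed results noted above, the flags should be spot-checked by a person:

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()
df = pd.read_excel("time_records.xlsx")  # e.g. columns: employee, date, hours, note

INSTRUCTIONS = ("Flag rows that look anomalous: shifts over 14 hours, duplicate "
                "employee/date pairs, or notes inconsistent with the hours. "
                "Return the row indexes with a one-line reason for each.")

CHUNK = 200  # keep each batch comfortably within the context window
for start in range(0, len(df), CHUNK):
    chunk_csv = df.iloc[start:start + CHUNK].to_csv(index=True)
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": INSTRUCTIONS},
                  {"role": "user", "content": chunk_csv}],
    )
    print(reply.choices[0].message.content)
```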

Recommendations – and why to start today

You'll only know whether these applications are relevant to your business if you try them. And if the tools in their current version don't perform 100% of the task, know that the entire field is advancing in leaps at an accelerating pace. The new o1 model was given to a PhD physicist, and in his experience the software managed to generate, in a matter of hours, a solution to research he had worked on for 10 months.

The implication is that there are real use cases where AI is going to change the way we work. A close look at the technology, its advantages and limitations, shows that the key to success lies in smart, gradual implementation. In today's finance departments, the combination of professional understanding and proper use of technology can lead to significant efficiency gains. We recommend that everyone experiment firsthand with this technology in a business context – and this is why you must start now.

As economist Richard Baldwin recently said – a sentiment echoed by Nvidia CEO Jensen Huang: 'AI won't take your job, it's somebody using AI that will take your job.'

*This article was written manually, and after being written was rewritten by AI. It was then rewritten again by another person and represents a human+machine product.
