[Copilot Studio] Tried Handling KPI Aggregation Deterministically Without Relying on LLM: Two Approaches Using Code Interpreter and Office Script


2026.06.22

This page has been translated by machine translation.

Introduction

Hello, I'm Keema.

When creating KPI reports, the source data is often monthly, yet the report frequently needs quarterly totals, averages, and year-over-year comparisons.
The dilemma is whether to let an LLM calculate those averages and year-over-year figures.
Generative AI is good at generating text, but its arithmetic can slip: it may misplace digits or mix up which figures it is comparing, which makes it hard to trust with monetary amounts or ratios.

This article covers two approaches for performing numerical aggregation deterministically without relying on an LLM, verified hands-on as of June 2026.
One is the code interpreter, which executes Python inside an agent; the other is Office Script (TypeScript) for Excel, run from a flow. Both rest on the same principle: write the aggregation code in advance and keep it fixed.
I hope this serves as a reference for those who want to accurately aggregate collected data and turn it into materials.

This article is the 6th installment in a series about building agents in Copilot Studio.
The series as a whole aims for an agent that handles "collection → aggregation → charts → insights → documentation" end-to-end, and this article covers the "aggregation" part.

Target audience: Those who want to perform accurate numerical aggregation in Copilot Studio without relying on an LLM

Series Article List

Part | Theme | Article
Part 1 | First Agent | Creating Your First Agent
Part 2 | Knowledge | Trying File-Based Answers with Knowledge
Part 3 | Topics, Tools, Flows | Building "Actions" with Topics, Tools, and Agent Flows
Part 4 | Templates, Autonomous Triggers, Multi-Agent | Expanding the Configuration with Templates, Autonomous Triggers, and Multi-Agent
Part 5 | Collection (How to Pass Data) | Comparing Methods for Giving an Agent KPI Data for Aggregation
Part 6 | Aggregation | (This article)

In the previous article (Part 5), we compared methods for giving an agent KPI data. This time, as the next step, we move on to methods for performing "aggregation" of that data deterministically.

1. What We're Doing This Time

We perform aggregation (calculations such as averages and year-over-year comparisons) deterministically without relying on an LLM.
In this article, we try this with two approaches.

  • Aggregate quarterly averages and year-over-year comparisons from monthly raw data
  • Approach 1: Aggregate using a code interpreter (Python). Embed Python in the prompt to fix the calculation
  • Approach 2: Write processing in Office Script (TypeScript) and execute it from a Copilot Studio agent flow
  • Organize how to choose between the two approaches

The verification uses the same monthly KPI data (fictional) for three fictional SaaS companies (CloudNova / StreamForge / Datapeak) as before.

2. Why Not Let the LLM Handle Aggregation

Generative AI is good at generating text, but numerical calculations are not always accurate.
Even when explicitly instructed to calculate averages or year-over-year comparisons, it can misplace digits or mix up which figures it is comparing.

Because report figures must be exact, we leave calculations to deterministic means.
We divide the work so that the LLM is responsible only for "text generation (insights) based on aggregation results" and "insertion" of those results into materials, while the calculations themselves are handled by a separate mechanism.
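The division of labor can be sketched as follows: numbers are computed deterministically first, and the LLM only receives them as fixed text to comment on. This is a minimal sketch in Python; the function name build_insight_prompt and the prompt wording are hypothetical and not a Copilot Studio API.

```python
def build_insight_prompt(figures: dict) -> str:
    """Build an LLM prompt that carries pre-computed figures as fixed text.

    `figures` maps company -> {"arr_yoy": str, "nrr_2025": int, "opm_2025": float},
    all produced beforehand by a deterministic step (hypothetical structure).
    """
    lines = [
        f"- {company}: ARR YoY {v['arr_yoy']}, 2025 NRR avg {v['nrr_2025']}%, "
        f"2025 OPM avg {v['opm_2025']}%"
        for company, v in figures.items()
    ]
    return (
        "Write a short commentary on the KPI figures below.\n"
        "Quote the numbers exactly as given; do not recalculate or round them.\n"
        + "\n".join(lines)
    )


figures = {
    "CloudNova": {"arr_yoy": "+20.0%", "nrr_2025": 120, "opm_2025": 13.5},
    "StreamForge": {"arr_yoy": "+10.2%", "nrr_2025": 109, "opm_2025": 1.0},
}
print(build_insight_prompt(figures))
```

The LLM never sees raw monthly rows here, so it has nothing to recalculate; its only job is wording.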

3. Two Approaches to Fixing Aggregation

This article tries two approaches for not having the LLM perform aggregation.
Both share the point of writing the code used for aggregation in advance and fixing it. The differences are "where the calculation runs" and "billing."

  • Approach 1: Code Interpreter (Python). Copilot Studio executes Python inside the agent to aggregate. If you embed the Python code used for aggregation in the prompt, that calculation logic will be executed mostly as-is (as long as the code interpreter doesn't rewrite it), allowing you to fix the calculation. This is a premium feature.
  • Approach 2: Office Script (TypeScript). Write the aggregation processing in Excel's Office Script and execute it from a Copilot Studio agent flow. Since the pre-written code runs as-is, the result is exactly the same every time. It works with standard connectors and, unlike the code interpreter, does not incur premium billing (Copilot Credits).

Looking at where the calculation runs: the code interpreter executes Python inside Copilot Studio, Office Script executes TypeScript inside Excel, and both return their aggregation results to the agent.
The main differences are as follows.

Perspective | Code Interpreter | Office Script
Where the calculation runs | Python executed by Copilot Studio | Office Script (TypeScript) executed by Excel
How the calculation logic is written | Embed Python in the prompt (executed mostly as-is) | Write it in the script (executed as-is every time)
Required skills | Natural language + Python | TypeScript
License / Billing | Premium (consumes Copilot Credits) | Standard (no premium billing; production flows consume capacity for actions)
How data is passed | Attach to chat, or Knowledge | Read files on OneDrive/SharePoint via a flow
Notes | Response may take minutes | Script must be written in advance

The "no premium billing" for Office Script is in contrast to the code interpreter (premium, consumes Copilot Credits) and means that Copilot Credits for a premium feature are not required. However, when agent flows run in a production channel, even flows using only standard connectors consume Copilot Studio capacity for agent flow actions (test runs are excluded).

4. Data Used for Verification

To compare the two approaches on equal footing, we prepare common verification data first.
We use the same monthly KPI (fictional) for three fictional SaaS companies (CloudNova / StreamForge / Datapeak) as before.
Since we need the same period from the previous year to calculate year-over-year comparisons, we prepare an Excel file with monthly KPI for 3 companies × (Q3 FY2024 and Q3 FY2025, both October–December) on a single RawMonthly sheet.
Since aggregation (averages, year-over-year) is left to the deterministic means of each approach, this file contains no aggregation formulas or summaries—only raw data.

Script to generate the raw data Excel:

```python
#!/usr/bin/env python3
"""Generate Excel for year-over-year (YoY) aggregation verification.

Raw data to be read by the code interpreter (CI).
Has monthly KPIs for 3 companies × (Q3 FY2024 and Q3 FY2025, both October–December) on a single RawMonthly sheet.
Since aggregation (total, average, YoY) is performed deterministically with CI's Python (pandas),
this file does not include aggregation formulas or summary sheets (only raw data is passed).
"""
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

OUT = "kpi-raw-data-yoy.xlsx"

# Raw data (fictional). Q3 FY2024 has lower values before growth, Q3 FY2025 has values that grew from the previous year.
# (Company, Month, ARR (millions of yen), NRR (%), Operating Profit Margin (%))
RAW = [
    ("CloudNova", "2024-10", 1600, 116, 11),
    ("CloudNova", "2024-11", 1625, 117, 11.5),
    ("CloudNova", "2024-12", 1650, 118, 12),
    ("StreamForge", "2024-10", 1160, 105, -1),
    ("StreamForge", "2024-11", 1180, 106, -0.5),
    ("StreamForge", "2024-12", 1200, 107, 0),
    ("Datapeak", "2024-10", 2450, 124, 16.5),
    ("Datapeak", "2024-11", 2500, 125, 17),
    ("Datapeak", "2024-12", 2550, 126, 17.5),
    ("CloudNova", "2025-10", 1900, 119, 13),
    ("CloudNova", "2025-11", 1950, 120, 13.5),
    ("CloudNova", "2025-12", 2000, 121, 14),
    ("StreamForge", "2025-10", 1280, 108, 0.5),
    ("StreamForge", "2025-11", 1300, 109, 1),
    ("StreamForge", "2025-12", 1320, 110, 1.5),
    ("Datapeak", "2025-10", 2750, 127, 18.5),
    ("Datapeak", "2025-11", 2800, 128, 19),
    ("Datapeak", "2025-12", 2880, 129, 19.5),
]

def main() -> None:
    wb = Workbook()
    ws = wb.active
    ws.title = "RawMonthly"
    ws.append(["Company", "Month", "ARR", "NRR", "OPM"])
    for row in RAW:
        ws.append(list(row))
    table = Table(displayName="RawMonthly", ref=f"A1:E{len(RAW) + 1}")
    table.tableStyleInfo = TableStyleInfo(
        name="TableStyleMedium2", showRowStripes=True
    )
    ws.add_table(table)
    wb.save(OUT)
    print(f"saved {OUT}")

if __name__ == "__main__":
    main()
```

Save the above script as make_excel_yoy.py and run it.

```shell
pip install openpyxl
python make_excel_yoy.py
```

Output example:

```text
saved kpi-raw-data-yoy.xlsx
```

Opening the generated kpi-raw-data-yoy.xlsx in Excel shows the monthly KPI for 3 companies × 6 months (Q3 2024 and Q3 2025) arranged on the RawMonthly sheet.

RawMonthly sheet of kpi-raw-data-yoy.xlsx. Monthly KPIs (ARR, NRR, OPM) for October–December 2024 and October–December 2025 are listed for the three companies CloudNova / StreamForge / Datapeak. No aggregation formulas or summaries are included; only raw data is passed.

When each approach aggregates this file, we judge it as "correctly aggregated" if the results match the following values (pass criteria).
Rounding is 0 decimal places for ARR and NRR, 1 decimal place for operating profit margin (OPM), and 1 decimal place for year-over-year.
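These rounding rules are simple to state, but Python and JavaScript break exact ties differently: Python's built-in round() rounds half to even (round(2.5) is 2), while JavaScript's Number.prototype.toFixed, used by the Office Script in Chapter 6, rounds an exact tie away from zero ((2.5).toFixed(0) is "3"). None of the pass-criteria values below lands exactly on a tie, so both approaches agree here. If you want one explicit rule in Python, a minimal sketch using the standard decimal module could look like this; round_half_up is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: float, places: int) -> Decimal:
    """Round to `places` decimals, sending exact ties away from zero.

    str(value) is used so the digits Python prints (e.g. "0.125") are rounded,
    not the full binary expansion of the float.
    """
    quantum = Decimal(1).scaleb(-places)  # places=1 -> 1E-1, i.e. one decimal
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


# Built-in round() sends exact ties to the nearest even digit.
print(round(2.5), round(0.125, 2))                     # 2 0.12
print(round_half_up(2.5, 0), round_half_up(0.125, 2))  # 3 0.13
```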

Company | 2024 ARR Average (million yen) | 2025 ARR Average (million yen) | ARR YoY | 2025 NRR Average (%) | 2025 OPM Average (%)
CloudNova | 1625 | 1950 | +20.0% | 120 | 13.5
StreamForge | 1180 | 1300 | +10.2% | 109 | 1.0
Datapeak | 2500 | 2810 | +12.4% | 128 | 19.0

The calculation method for each value is as follows.

  • ARR Average, NRR Average, OPM Average: Simple average of the 3 months October–December of that year (same for both 2024 and 2025). For example, CloudNova's 2025 NRR average is (119 + 120 + 121) ÷ 3 = 120, and 2025 OPM average is (13 + 13.5 + 14) ÷ 3 = 13.5. Similarly for 2024: NRR average (116 + 117 + 118) ÷ 3 = 117, OPM average (11 + 11.5 + 12) ÷ 3 = 11.5.
  • ARR Year-over-Year (YoY): (2025 ARR Average ÷ 2024 ARR Average − 1) × 100. For CloudNova: 2025 ARR average (1900 + 1950 + 2000) ÷ 3 = 1950, 2024 ARR average (1600 + 1625 + 1650) ÷ 3 = 1625, so (1950 ÷ 1625 − 1) × 100 = +20.0%.
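To double-check the pass criteria independently of both approaches, the same arithmetic can be recomputed with Python's standard library alone. A minimal sketch: the rows are copied from the generator script above, and pass_criteria is a hypothetical helper, not part of either approach.

```python
from statistics import mean

# Monthly rows copied from the generator script: (company, month, ARR, NRR, OPM)
RAW = [
    ("CloudNova", "2024-10", 1600, 116, 11),
    ("CloudNova", "2024-11", 1625, 117, 11.5),
    ("CloudNova", "2024-12", 1650, 118, 12),
    ("StreamForge", "2024-10", 1160, 105, -1),
    ("StreamForge", "2024-11", 1180, 106, -0.5),
    ("StreamForge", "2024-12", 1200, 107, 0),
    ("Datapeak", "2024-10", 2450, 124, 16.5),
    ("Datapeak", "2024-11", 2500, 125, 17),
    ("Datapeak", "2024-12", 2550, 126, 17.5),
    ("CloudNova", "2025-10", 1900, 119, 13),
    ("CloudNova", "2025-11", 1950, 120, 13.5),
    ("CloudNova", "2025-12", 2000, 121, 14),
    ("StreamForge", "2025-10", 1280, 108, 0.5),
    ("StreamForge", "2025-11", 1300, 109, 1),
    ("StreamForge", "2025-12", 1320, 110, 1.5),
    ("Datapeak", "2025-10", 2750, 127, 18.5),
    ("Datapeak", "2025-11", 2800, 128, 19),
    ("Datapeak", "2025-12", 2880, 129, 19.5),
]


def pass_criteria(rows):
    """Average each KPI per (company, year) and derive ARR YoY in percent."""
    groups = {}
    for company, month, arr, nrr, opm in rows:
        # "2025-10" -> "2025": the year is the first four characters of Month
        bucket = groups.setdefault((company, month[:4]), {"ARR": [], "NRR": [], "OPM": []})
        bucket["ARR"].append(arr)
        bucket["NRR"].append(nrr)
        bucket["OPM"].append(opm)

    result = {}
    for company in ("CloudNova", "StreamForge", "Datapeak"):
        y24, y25 = groups[(company, "2024")], groups[(company, "2025")]
        arr24, arr25 = mean(y24["ARR"]), mean(y25["ARR"])
        yoy = (arr25 / arr24 - 1) * 100
        result[company] = (
            round(arr24),               # 0 decimal places
            round(arr25),               # 0 decimal places
            f"{yoy:+.1f}%",             # 1 decimal place, explicit sign
            round(mean(y25["NRR"])),    # 0 decimal places
            round(mean(y25["OPM"]), 1), # 1 decimal place
        )
    return result


for company, row in pass_criteria(RAW).items():
    print(company, *row)
```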

Both Approach 1 (code interpreter) and Approach 2 (Office Script) use this same kpi-raw-data-yoy.xlsx as-is.

Let's start by looking at Approach 1 (code interpreter).

5. Approach 1: Aggregating with the Code Interpreter

5.1 What Is the Code Interpreter

The code interpreter is a feature that allows an agent to generate and execute Python as needed, enabling aggregation and analysis of structured files like CSV and Excel.
Feature details, billing (premium feature that consumes Copilot Credits), how to pass data, and activation procedures were covered in Part 5.
For details, please refer to Part 5: "Comparing Methods for Giving an Agent KPI Data for Aggregation".
In this article, we aggregate using "attach to chat," the most straightforward method.

5.2 Embedding Python in the Prompt to Fix the Calculation

If you attach an Excel file to the chat and ask it to "aggregate," the code interpreter generates and executes Python to perform the aggregation.
This basic usage was covered in Part 5, so in this article we go a step further and examine whether aggregation can be fixed deterministically.

Since the code interpreter has AI regenerate Python code for each request, there's lingering uncertainty: "the code changes every time even though I'm asking for the same aggregation" and "am I really getting the same calculation?" So we tried embedding the Python code used for aggregation wholesale in the prompt, instructing it to "execute this code as-is without regenerating," to see if the calculation can be fixed.

In the test chat, attach the kpi-raw-data-yoy.xlsx prepared in Chapter 4 and request the following.

Please execute the following Python code in the code interpreter exactly as-is, without regenerating it. The input is the attached Excel (RawMonthly sheet). Please replace INPUT in the code with the path of the attached file. Please show the executed code and output as-is.

```python
import pandas as pd
df = pd.read_excel(INPUT, sheet_name="RawMonthly")
df["Year"] = df["Month"].astype(str).str[:4]
g = df.groupby(["Company","Year"], as_index=False).agg(ARR=("ARR","mean"), NRR=("NRR","mean"), OPM=("OPM","mean"))
order = ["CloudNova","StreamForge","Datapeak"]
rows = []
for c in order:
    a24 = g[(g.Company==c)&(g.Year=="2024")].iloc[0]
    a25 = g[(g.Company==c)&(g.Year=="2025")].iloc[0]
    yoy = (a25.ARR / a24.ARR - 1) * 100
    rows.append([c, round(a24.ARR), round(a25.ARR), f"{round(yoy,1):+}%", round(a25.NRR), round(a25.OPM, 1)])
result = pd.DataFrame(rows, columns=["Company","2024 ARR Average","2025 ARR Average","ARR YoY","2025 NRR Average","2025 OPM Average"])
print(result.to_string(index=False))
```

I ran this prompt a total of 6 times in my environment. All used the same prompt (same embedded Python).
The results were as follows. Output values were the same all 6 times (matching the pass criteria from Chapter 4), and the calculations did not fluctuate.
On the other hand, the actually executed code differed between runs. The differences appeared in the following 2 places.

  • The line that reads the file: Some runs executed the embedded pd.read_excel(INPUT, sheet_name="RawMonthly") as-is, while other runs converted the Excel to CSV first and rewrote it to pd.read_csv(...) (since the prompt instructs "replace INPUT with the path of the attached file," how to read it is left to the code interpreter).
  • Quote style: In 2 of the 6 runs, the double quotes "..." throughout the entire code were reformatted to single quotes '...'.

Conversely, everything other than these 2 places (the aggregation logic itself, consisting of groupby for averages, year-over-year, rounding, and output) was the same all 6 times. The breakdown of the 6 runs is as follows.

Run | Read line | Quotes | Output
1 | pd.read_csv('…RawMonthly.csv') | Double | Matches pass criteria
2 | pd.read_excel(INPUT, sheet_name=…) | Double | Matches pass criteria
3 | pd.read_csv(INPUT) | Single | Matches pass criteria
4 | pd.read_csv(INPUT) | Single | Matches pass criteria
5 | pd.read_excel(INPUT, sheet_name=…) | Double | Matches pass criteria
6 | pd.read_excel(INPUT, sheet_name=…) | Double | Matches pass criteria
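Whether a difference such as the quote style actually matters can be checked mechanically: Python's standard ast module parses both versions, and comparing the syntax trees statement by statement ignores purely cosmetic changes. A minimal sketch; statement_diffs is a hypothetical helper, and the excerpts are shortened illustrations, not the verbatim code logged in the runs.

```python
import ast
from typing import List


def statement_diffs(code_a: str, code_b: str) -> List[int]:
    """Return the indexes of top-level statements whose syntax trees differ.

    ast.dump() ignores source formatting, so a change of quote style or
    spacing is not reported, while a different function call is.
    """
    body_a = ast.parse(code_a).body
    body_b = ast.parse(code_b).body
    if len(body_a) != len(body_b):
        raise ValueError("the two versions have a different number of statements")
    return [i for i, (a, b) in enumerate(zip(body_a, body_b)) if ast.dump(a) != ast.dump(b)]


# Shortened, illustrative excerpts (parsed only, never executed).
embedded = '''import pandas as pd
df = pd.read_excel(INPUT, sheet_name="RawMonthly")
df["Year"] = df["Month"].astype(str).str[:4]
'''
quotes_changed = embedded.replace('"', "'")
read_changed = '''import pandas as pd
df = pd.read_csv(INPUT)
df["Year"] = df["Month"].astype(str).str[:4]
'''

print(statement_diffs(embedded, quotes_changed))  # []
print(statement_diffs(embedded, read_changed))    # [1]
```

In other words, the quote-style rewrites seen in runs 3 and 4 leave the program unchanged, while the read-line rewrite is a real (if harmless here) change in what gets executed.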

In the actual screens of run 1 (left) and run 2 (right), the aggregation result tables match even though the read lines differ.

Left is run 1, right is run 2 (2 of the 6 runs). The line reading the file differs (run 1: pd.read_csv(...)/run 2: pd.read_excel(INPUT, ...)), but the aggregation logic and output (CloudNova +20.0%, StreamForge +10.2%, Datapeak +12.4%) are the same

With the code interpreter, it is difficult to pin the executed code down to every single character: whatever you enter in the prompt passes through an LLM, so it may be altered.
If you want the executed code to be exactly the same, character for character, every time, a method where pre-written code runs as-is, such as Office Script in the next chapter, is more reliable.

5.3 Code Interpreter Note: Responses Take Time

An important thing to keep in mind when aggregating with the code interpreter is response time. In my environment, it took about 2.5 minutes on the first run and about 3 minutes on the second. Results still came back because the requests were made directly in the test chat, where the agent flow time limit does not apply.

In production, this aggregation would be called as a tool via an agent flow, but agent flows have a time constraint of "failure if it takes longer than 2 minutes." The code interpreter, which can take several minutes, is prone to hitting this constraint. Since the response time constraint for agent flows is common to both Office Script and the code interpreter, it is summarized in Chapter 7.

Note that all verifications in this article were run in the test chat, the most straightforward setup.

6. Approach 2: Aggregating with Office Script

Approach 2 is a method of writing aggregation processing in Excel's Office Script (TypeScript) and executing it from a Copilot Studio agent flow.
As noted in 5.2, the code interpreter can mostly fix the calculation logic by embedding it in the prompt, but the file read line is not fixed character-for-character.
If you want to "write the aggregation code once and execute it exactly the same way every time," a method that fixes the code in advance like Office Script is more suitable.
Office Script works with standard connectors, does not incur premium billing (Copilot Credits) unlike the code interpreter, and its code is never regenerated by AI.
In the tool addition screen of Copilot Studio as well, agent flows are described as "predictable automation that is executed the same way every time," which is a different nature from the code interpreter's "generate every time."

6.1 Writing the Office Script

When it comes to "writing processing in Excel," some people might think of VBA macros. The reason we use Office Script instead of VBA this time is that only Office Script can be called from a flow (a Copilot Studio agent flow). The main differences are as follows.

Perspective | Office Script | VBA Macro
Execution environment | Cloud / cross-platform (mainly Excel for the web; also available in desktop Excel on Windows/Mac) | Desktop only (Windows/Mac)
Language | TypeScript (JavaScript) | VBA

The fundamental difference is that VBA macros are developed for desktop solutions and Office Scripts are designed for secure, cross-platform, cloud-based solutions.

Source: Differences between Office Scripts and VBA macros | Microsoft Learn

Open the aggregation target kpi-raw-data-yoy.xlsx in Excel for the web, and from "Automate" tab → "New Script" → "Create in Code Editor", save the following script (here named KPIQuarterAggYoY).
It reads the RawMonthly sheet, calculates the average of ARR, NRR, and OPM by company × year, calculates the ARR year-over-year comparison, and returns it as CSV-formatted text.

Office Script:

```typescript
function main(workbook: ExcelScript.Workbook): string {
  // Read RawMonthly (Company, Month, ARR, NRR, OPM), average by company × year, and deterministically aggregate ARR YoY
  const values = workbook.getWorksheet("RawMonthly").getUsedRange().getValues();
  const map: { [k: string]: { arr: number[]; nrr: number[]; opm: number[] } } = {};
  for (let i = 1; i < values.length; i++) {
    const company = String(values[i][0]);
    const year = String(values[i][1]).substring(0, 4);
    const key = company + "|" + year;
    if (!map[key]) map[key] = { arr: [], nrr: [], opm: [] };
    map[key].arr.push(Number(values[i][2]));
    map[key].nrr.push(Number(values[i][3]));
    map[key].opm.push(Number(values[i][4]));
  }
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  let out = "Company,2024ARR Average,2025ARR Average,ARR YoY,2025NRR Average,2025OPM Average\n";
  for (const c of ["CloudNova", "StreamForge", "Datapeak"]) {
    const a24 = mean(map[c + "|2024"].arr);
    const a25 = mean(map[c + "|2025"].arr);
    const yoy = (a25 / a24 - 1) * 100;
    out += `${c},${a24.toFixed(0)},${a25.toFixed(0)},${yoy >= 0 ? "+" : ""}${yoy.toFixed(1)}%,`
         + `${mean(map[c + "|2025"].nrr).toFixed(0)},${mean(map[c + "|2025"].opm).toFixed(1)}\n`;
  }
  return out;
}
```

The above code is the complete script.
Office Script does not have import statements like Python. main(workbook: ExcelScript.Workbook) is the sole entry point, and the ExcelScript API for operating Excel is available from the start, so no additional loading is required.

Each script must contain a main function with the ExcelScript.Workbook type as its first parameter. When the function runs, the Excel application invokes the main function by providing the workbook as its first parameter.

Source: Fundamentals for Office Scripts in Excel | Microsoft Learn

After saving the script and clicking "Run," in my environment it displayed "Script ran successfully," confirming that the aggregation logic works.

6.2 Running the Script from an Agent Flow

From the agent's "Add a tool" → "Agent flow", create a new flow.
The template comes with the trigger "When an agent calls a flow" and "Respond to the agent", so insert "Run script" (Excel Online (Business) connector) between them.

  • Run script: Location = OneDrive for Business, Document Library = Documents, File = /kpi-raw-data-yoy.xlsx, Script = KPIQuarterAggYoY
  • Respond to the agent: Add one text output result, and select the return value (result) of "Run script" from "Dynamic content" (in formula: body/result)

Trigger for agent calling the flow (Skills) → "Run script" (Location = OneDrive for Business, File = /kpi-raw-data-yoy.xlsx, Script = KPIQuarterAggYoY) → "Respond to the agent" passes the script's return value (result) to output result

Save and publish the created flow (this is publishing the flow itself, not the agent). Once published, it is added to the agent as a tool.

6.3 Testing

Request from the test chat to run this flow.

Please run the "KPIScriptAgg Quarterly Aggregation (Office Script · Deterministic)" tool and display the returned aggregation result text as-is (without processing) in a code block.

On the first run, allow the connection to Excel Online (Business) (this is a connection using your own credentials).
In my environment, the agent called the flow and the CSV returned by Office Script came back as-is. The values match the pass criteria from Chapter 4.

1st run. After allowing the connection to Excel Online, the flow ran and the aggregation CSV returned by Office Script (CloudNova +20.0%, StreamForge +10.2%, Datapeak +12.4%) came back as-is

Running it again with a new request returned exactly the same values. Since the pre-written script runs as-is, the results are naturally stable.

2nd run. Running the same tool returned exactly the same CSV as the 1st run. Unlike the code interpreter, the code is never regenerated
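To confirm that repeated runs return byte-for-byte the same text, rather than comparing screenshots by eye, you can compare hashes of the returned results. A minimal sketch using Python's hashlib; the helper names are hypothetical, and the CSV is reconstructed from the script and sample data above rather than copied from a logged run.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of a result text, normalising only CRLF line endings."""
    return hashlib.sha256(text.replace("\r\n", "\n").encode("utf-8")).hexdigest()


def runs_identical(outputs) -> bool:
    """True when every run returned exactly the same text."""
    return len({fingerprint(o) for o in outputs}) == 1


# CSV the script above returns for the sample data (reconstructed, not a logged run).
run1 = (
    "Company,2024ARR Average,2025ARR Average,ARR YoY,2025NRR Average,2025OPM Average\n"
    "CloudNova,1625,1950,+20.0%,120,13.5\n"
    "StreamForge,1180,1300,+10.2%,109,1.0\n"
    "Datapeak,2500,2810,+12.4%,128,19.0\n"
)
run2 = run1.replace("\n", "\r\n")  # same content, Windows line endings
print(runs_identical([run1, run2]))                              # True
print(runs_identical([run1, run1.replace("+10.2%", "+10.1%")]))  # False
```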

On the 2nd run, results were returned within about ten seconds of sending, well within the agent flow response time constraint (synchronous response times out at 2 minutes, recommended within 100 seconds; details in the next chapter).

The strength of Office Script is not that it has no time limit, but that it executes fixed code the same way every time. This deterministic aggregation is inherently lightweight and finished in about ten seconds, well within the default synchronous response limit.

7. Agent Flow Response Time (Common Notes)

Both the code interpreter and Office Script, in production, involve the agent calling an "agent flow" as a tool (the verification in this article was done in test chat, but actual operation goes through a flow).
Since this flow has "response time" constraints common to both approaches, I'll summarize them here.

7.1 Synchronous Responses Time Out After 2 Minutes

When an agent flow is called as a tool, it responds synchronously by default: the agent waits on the spot for the flow's response. Since the agent can't wait indefinitely, the call fails if the response takes longer than 2 minutes. In addition, the Respond to the agent action is recommended to return within 100 seconds.

By default, an agent or app initiates an agent flow that fails if it takes longer than two minutes to respond to the calling agent or app.

Source: Speed up agent flow execution with express mode (preview) | Microsoft Learn

Respond to the agent within the 100 second action limit. Optimize the flow logic, queries, and the amount of data returned so that a typical run is below this 100 second limit.

Source: Modify an existing flow to use with an agent | Microsoft Learn

In other words, putting heavy processing like the code interpreter (which took 2.5–3 minutes in my environment) directly into a synchronous flow risks a timeout under the 2-minute limit. Lightweight aggregation like today's Office Script, on the other hand, finishes in about ten seconds with no issues. Using a flow does not remove the time limit.
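As a rough pre-flight check, the two thresholds can be encoded and compared against a locally measured duration. A minimal sketch; classify and timed are hypothetical helpers, and a local timing leaves out connector and agent overhead, so treat an "ok" as necessary rather than sufficient.

```python
import time

SYNC_TIMEOUT_S = 120  # a synchronous agent flow call fails beyond two minutes
RECOMMENDED_S = 100   # documented target for reaching "Respond to the agent"


def classify(seconds: float) -> str:
    """Compare a measured duration with the two documented thresholds."""
    if seconds > SYNC_TIMEOUT_S:
        return "times out"
    if seconds > RECOMMENDED_S:
        return "over recommendation"
    return "ok"


def timed(fn, *args):
    """Run fn(*args) locally and return (result, verdict)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, classify(time.perf_counter() - start)


# Durations observed in this article: Office Script ~10 s, code interpreter 2.5-3 min.
for seconds in (10, 150, 180):
    print(seconds, classify(seconds))
```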

7.2 Heavy Subsequent Processing Can Be Placed After the Response (Up to 30 Days)

While the response itself must be returned within 2 minutes (recommended 100 seconds), this doesn't mean all processing must fit within 2 minutes.
Processing placed after Respond to the agent can continue in the background for up to 30 days after the response is returned.
However, the results of that subsequent processing cannot be included in the response (because the response has already been returned).

Respond to the agent within the 100 second action limit. ... Actions in the flow that need to run longer can be placed after the Respond to the agent action to continue to run up to the flow run duration limit of 30 days.

Source: Modify an existing flow to use with an agent | Microsoft Learn

Call the flow as a tool, complete "aggregation/file operations (Office Script/code interpreter) → Respond to the agent" within 2 minutes (recommended 100 seconds), and return the result. Subsequent processing placed after Respond to the agent (e.g., Teams notification, log writing) continues in the background for up to 30 days after responding to the user
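The respond-first, continue-later shape can be illustrated in plain Python with a background thread. This is only an analogy and assumes nothing about how Power Automate runs flows internally: the caller gets its result immediately, and the follow-up work cannot change what was already returned. handle_tool_call is a hypothetical name.

```python
import threading


def handle_tool_call(values, log):
    """Return the result right away and let follow-up work continue afterwards.

    Plain-Python analogy of placing actions after "Respond to the agent";
    not how Power Automate is implemented.
    """
    result = sum(values) / len(values)  # stand-in for the fixed aggregation

    def follow_up():
        # e.g. a Teams notification or a log write; it cannot alter `result`,
        # which has already been handed back to the caller
        log.append(f"aggregated average {result}")

    worker = threading.Thread(target=follow_up)
    worker.start()
    return result, worker


log = []
response, worker = handle_tool_call([1900, 1950, 2000], log)
print(response)  # 1950.0 (returned before the worker is joined)
worker.join()
print(log)       # ['aggregated average 1950.0']
```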

8. How to Choose Between the Two Approaches

Having run both approaches hands-on, the choice comes down to whether it's acceptable to have AI generate the code every time.

  • If you want to request aggregation flexibly in natural language (year-over-year comparisons, forecasts, table joins), choose Approach 1 (code interpreter). It can calculate in Python and generate aggregation tables, charts, and Excel files, but it incurs premium billing and responses can take minutes. Embedding code in the prompt fixes the content of the calculation (aggregation logic) and the output (all 6 trials produced matching output), though not every character of the executed code, such as the file read line or quote style.
  • If you want to write the aggregation logic once and execute it exactly the same way every time, choose Approach 2 (Office Script). Because the processing written in TypeScript is executed from a flow, there is no premium billing (Copilot Credits), and the code is never regenerated.

Both approaches keep calculations away from the LLM, and both can fix the content of the calculation.
The difference is that the code interpreter has AI assemble the code for each request, so while embedding fixes the calculation and its output, the read line and quote style can vary from run to run. Office Script, in contrast, runs the pre-written code exactly as-is, without changing a single character.
For the series' goal of an end-to-end agent, a practical split would be Office Script for lightweight, fixed aggregation and the code interpreter for flexible analysis.

9. Summary

By performing aggregation deterministically without relying on the LLM, the accuracy of report figures can be maintained.
This article tried two approaches.
Office Script runs pre-written TypeScript exactly as-is, always returning the same result for the same input.
The code interpreter, when the Python code used for aggregation is embedded in the prompt, can fix the content of the calculation (aggregation logic) and output, confirmed to return the same values every time across 6 trials. However, it's not possible to fix every single character of the execution code, such as the file read line or quote style.

The axis for choosing is simple: if the aggregation content is determined and you want to always execute with the same processing, choose Office Script with no premium billing; if flexible analysis like year-over-year comparisons or forecasts is needed, choose the code interpreter (premium).
Note that the synchronous response time limit (times out at 2 minutes, recommended within 100 seconds) applies to both the code interpreter and Office Script whenever they are called through an agent flow; using a flow does not remove the limit.

In this verification, both approaches returned the same values.
However, they match only because both were written with the same aggregation logic and the same rounding. Since they are separate implementations (Python and Office Script), the results can diverge if the code differs, for example in how exact halves are rounded.
"Because there are two means, the values will always match" is not guaranteed, so if using both in parallel, it's safer to cross-check the results to confirm they match.
In any case, we confirmed the division of labor: leave the calculations themselves to deterministic means, and have the LLM handle only the insertion and text generation for insights.
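If both approaches are run in parallel, the cross-check can be automated by parsing both outputs into numbers and comparing them per company. A minimal sketch with Python's csv module; the parsers assume the Office Script CSV and a pandas to_string()-style table with the company in the first column, values are compared after float() (so "1625" and "1625.0" count as equal), and the interpreter table layout shown is illustrative.

```python
import csv
import io


def parse_csv_result(text):
    """Office Script style CSV -> {company: [numbers...]} (header row skipped)."""
    rows = list(csv.reader(io.StringIO(text)))[1:]
    return {row[0]: [float(v.rstrip("%")) for v in row[1:]] for row in rows if row}


def parse_table_result(text):
    """pandas to_string()-style table -> same shape (header line skipped)."""
    result = {}
    for line in text.strip().splitlines()[1:]:
        company, *values = line.split()
        result[company] = [float(v.rstrip("%")) for v in values]
    return result


def cross_check(a, b):
    """List the companies whose figures differ or appear on only one side."""
    return [c for c in sorted(set(a) | set(b)) if a.get(c) != b.get(c)]


office_script_csv = (
    "Company,2024ARR Average,2025ARR Average,ARR YoY,2025NRR Average,2025OPM Average\n"
    "CloudNova,1625,1950,+20.0%,120,13.5\n"
    "StreamForge,1180,1300,+10.2%,109,1.0\n"
    "Datapeak,2500,2810,+12.4%,128,19.0\n"
)
# Illustrative layout of the code interpreter's printed table (spacing approximate).
code_interpreter_table = """
    Company  2024 ARR Average  2025 ARR Average ARR YoY  2025 NRR Average  2025 OPM Average
  CloudNova              1625              1950  +20.0%               120              13.5
StreamForge              1180              1300  +10.2%               109               1.0
   Datapeak              2500              2810  +12.4%               128              19.0
"""

a = parse_csv_result(office_script_csv)
b = parse_table_result(code_interpreter_table)
print(cross_check(a, b))  # []
```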
Next time, we'll turn these aggregated values into charts.
