[Copilot Studio] I tried answer generation based on files in Knowledge: A comparison with Web search off

This article shows how to create an agent that gives grounded answers based on your company's materials using the Knowledge feature of Copilot Studio. I uploaded KPI data for three fictional SaaS companies to test it, and also compared the behavior with Web search turned on and off.
2026.06.15

This page has been machine translated.

Introduction

Hello, I'm Keema.

After creating an agent, the next request that usually comes up is "I want it to answer based on our company's materials." An agent that answers only from general knowledge cannot respond about internal figures or the contents of proprietary documents. In Copilot Studio, adding "Knowledge" lets the agent base its responses on uploaded materials, which is known as grounding.

In this article, I walk through the Knowledge feature of Copilot Studio, verified hands-on as of June 2026. I uploaded KPI data (docx) for three fictional SaaS companies and checked whether the agent can give grounded responses based on the materials. I also compare what changes when Web search, which was already on in my environment, is turned off to restrict answers to the materials alone. I hope this helps anyone who wants to build an agent that answers from their company's materials.

This article is the second installment in a series on building agents with Copilot Studio.
The series aims to build an agent that handles "collection → aggregation → graphs → insights → documentation" end-to-end, and this article covers the first step: "collection (Knowledge)."

Target audience: Those who want to build an agent in Copilot Studio that answers based on their company's materials (files)

Series Article List

| # | Theme | Article |
|---|---|---|
| Part 1 | First Agent | Creating Your First Agent |
| Part 2 | Knowledge | This article |

In the previous article (Part 1), we created an agent and tested it with instructions. This time, we load company materials into that agent to enable grounded responses.

What We'll Do This Time

  • Upload a KPI comparison report (docx) for 3 fictional SaaS companies as Knowledge
  • Wait for indexing into Dataverse and test grounded responses
  • Compare response sources with Web search on vs. off

What Is Knowledge?

Knowledge is an information source that an agent can use as the basis for its responses. There are multiple ways to add it.

  • File upload: Supports text-based formats such as Word (doc/docx), Excel (xls/xlsx), PowerPoint (ppt/pptx), PDF, text (txt/md/log), HTML, CSV, XML, JSON, YAML, etc. Images, audio, video, and executable files are not supported. Images are handled only when embedded in PDFs with annotations (alt-text, i.e. text that describes the content of an image).
  • External source connections: SharePoint, OneDrive, public websites, Dataverse (tables), and enterprise data via connectors (organizational data indexed by Microsoft Search, including knowledge articles from Salesforce, ServiceNow, Confluence, Zendesk, etc.)

Uploads are limited to 512 MB per file and 500 files per agent. The number of files you can actually upload also depends on the Dataverse file storage capacity of the environment.

Image, video, executables, and audio files can't be used as uploaded documents.

Images are only supported when they're embedded in PDF files.

Source: Upload files as a knowledge source | Microsoft Learn
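To catch unsupported or oversized files before opening the upload dialog, a quick local check can help. The following is a minimal Python sketch, not an official tool: the extension set mirrors the list above (which ends in "etc.", so it is not exhaustive), `check_upload_batch` is a hypothetical helper name, and it does not check the environment's Dataverse storage capacity.

```python
from pathlib import Path

# Extensions listed in this article as supported for file upload.
# The article's list ends with "etc.", so this set is illustrative, not exhaustive.
SUPPORTED_EXTENSIONS = {
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".pdf",
    ".txt", ".md", ".log", ".html", ".csv", ".xml", ".json", ".yaml",
}
MAX_FILE_BYTES = 512 * 1024 * 1024  # 512 MB per file
MAX_FILES_PER_AGENT = 500           # per agent (storage capacity is NOT checked here)


def check_upload_batch(files: dict[str, int], already_uploaded: int = 0) -> list[str]:
    """Return a list of problems for a planned upload (empty list = looks OK).

    `files` maps file names to their sizes in bytes.
    """
    problems = []
    total = already_uploaded + len(files)
    if total > MAX_FILES_PER_AGENT:
        problems.append(f"too many files: {total} > {MAX_FILES_PER_AGENT}")
    for name, size in files.items():
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported extension {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds 512 MB")
    return problems
```

For example, with made-up sizes, `check_upload_batch({"2025Q2-saas-kpi-summary.docx": 48_000, "chart.png": 1_000})` returns `['chart.png: unsupported extension .png']`.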

When a file is uploaded, it is stored in Dataverse and indexed automatically. When answering a question (for example, in the test chat), the agent searches the Knowledge and responds while citing the matched content as reference sources.

Copilot Studio agents require Dataverse search to use this knowledge source. If you can't add a Dataverse-enabled file to an agent, ask your administrator to turn on Dataverse search in your environment.

Source: Unstructured data as a knowledge source | Microsoft Learn

This time, we'll try the simplest approach: "file upload."

Step 1: Prepare the Knowledge File

For verification, I prepared a docx file summarizing quarterly KPI data for three fictional SaaS companies.

| Company Name | ARR (million yen) | NRR (%) | Churn Rate (monthly, %) | Operating Profit Margin (%) | Number of Customers (companies) | ARPA (10K yen/month) |
|---|---:|---:|---:|---:|---:|---:|
| CloudNova | 1,800 | 118 | 1.2 | 12.5 | 1,250 | 12.0 |
| StreamForge | 1,150 | 104 | 2.1 | -3.0 | 2,400 | 4.0 |
| Datapeak | 2,600 | 126 | 0.8 | 18.4 | 640 | 33.9 |

All data is fictional. It has no relation whatsoever to any real companies.
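Before testing, it helps to know what a correct grounded answer should say. The ARPA column appears to be derived as ARR / customers / 12 months (my assumption; the figures in the table are consistent with it), and the ranking by operating profit margin is what an answer to the Step 4 test question should reflect. A small Python sketch to check both, using only the table's values:

```python
# KPI figures copied from the table above (fictional data).
# name: (ARR in million yen, customers, operating margin %, ARPA in 10K yen/month)
kpis = {
    "CloudNova": (1_800, 1_250, 12.5, 12.0),
    "StreamForge": (1_150, 2_400, -3.0, 4.0),
    "Datapeak": (2_600, 640, 18.4, 33.9),
}


def implied_arpa(arr_million_yen: float, customers: int) -> float:
    """ARR / customers / 12 months, expressed in units of 10,000 yen per month."""
    yen_per_month = arr_million_yen * 1_000_000 / customers / 12
    return round(yen_per_month / 10_000, 1)


for name, (arr, customers, margin, arpa) in kpis.items():
    print(f"{name}: table ARPA {arpa}, implied {implied_arpa(arr, customers)}")

# Companies ranked by operating profit margin, highest first.
ranking = sorted(kpis, key=lambda n: kpis[n][2], reverse=True)
print(ranking)
```

All three implied ARPA values match the table (12.0, 4.0, 33.9), and the ranking is `['Datapeak', 'CloudNova', 'StreamForge']`, so a grounded answer should put Datapeak's 18.4% at the top.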

Step 2: Upload the File

From the Knowledge tab, click "Add Knowledge" and select file upload. You can select files by drag and drop or by clicking "Browse from device."

[Screenshot: File upload dialog]

Select 2025Q2-saas-kpi-summary.docx and press "Add to agent."

Step 3: Confirm Indexing is Complete

Returning to the Knowledge tab, the uploaded file appears in the list. While a spinner is shown in the "Status" column, indexing is in progress. After waiting a few minutes, it changes to a green checkmark, and the agent is ready to search and cite it.

[Screenshot: Knowledge list (indexing complete)]

The list has the columns Name, Type, Available to, Usage, Last modified date, and Status. "Available to" indicates which agents can use this Knowledge.

Step 4: Test Grounded Responses

In the test chat, ask a question about the content of the uploaded file.

What is Datapeak's operating profit margin?

[Screenshot: Grounded response (with reference source)]

It correctly answered that Datapeak's operating profit margin is 18.4%, and added a comparison table and commentary covering CloudNova (12.5%) and StreamForge (-3.0%). At the bottom of the response, 2025Q2-saas-kpi-summary.docx is explicitly shown as a "Referenced Source."

Grounded responses based on file content are working correctly.

Note on Web Search Being On

Here I noticed something: in my environment, Web search was already enabled right after the agent was created. Checking the search activity showed that, besides the docx, real financial materials publicly available on the web (such as listed companies' earnings presentations) had been included as candidate reference sources.

This did not affect the final response this time, but in situations where you want responses based only on internal data, there is a risk of unintended external information being mixed in.

Step 5: Turn Off Web Search and Compare

You can toggle Web search on/off in the Knowledge section on the Overview page (or in the "Generative AI" settings). This Web search ("Use information from the web") setting is only available for agents with generative orchestration enabled.

You can find the Use information from the web setting on the Generative AI settings page. You can also find the Web Search setting in the Knowledge section of the agent's Overview page. This setting requires that the agent has generative orchestration turned on.

Source: Knowledge sources summary | Microsoft Learn

[Screenshot: Web search toggle turned off]

Turn off the "Web search" toggle. In my environment, this change took effect immediately (no save operation was required).

Test the Same Question with Web Search Off

With Web search turned off, test the same question: "What is Datapeak's operating profit margin?"

[Screenshot: Grounded response with Web search off (only the docx referenced)]

The referenced sources were limited to the docx file. No external web materials were referenced at all, and the response content is based solely on the data in the uploaded materials.

Comparison: Web Search On vs. Off

| | Web Search On | Web Search Off |
|---|---|---|
| Reference sources | Uploaded file + public web | Uploaded file only |
| External data mixing | Possible | None |
| Use case | When you want broad responses including general knowledge | When you want responses based only on internal materials or specific data |

In cases where mixing in external information would be a problem, such as with confidential internal data or with fictional data like in this example, turning Web search off is the safer option. Conversely, if you want to supplement answers with public information, you can leave it on.
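The comparison above boils down to a simple filter over candidate sources. This Python sketch is only a mental model of the table, not how Copilot Studio actually retrieves or ranks sources; the `Source` class, its `kind` labels, and `allowed_sources` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    title: str
    kind: str  # hypothetical labels: "file" for uploaded Knowledge, "web" for public web results


def allowed_sources(candidates: list[Source], web_search: bool) -> list[Source]:
    """Mental model: with web search off, web results are dropped and Knowledge remains."""
    if web_search:
        return list(candidates)
    return [s for s in candidates if s.kind != "web"]


candidates = [
    Source("2025Q2-saas-kpi-summary.docx", "file"),
    Source("Listed company earnings presentation", "web"),
]
print([s.title for s in allowed_sources(candidates, web_search=False)])
```

With `web_search=False` only the docx title is printed, which matches what I observed in the test chat; with `web_search=True` both candidates stay in play, as in the earlier search activity.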

Stumbling Points This Time

  1. Indexing takes a few minutes: If you test immediately after uploading, the Knowledge won't be found and responses will be based on general knowledge. Make sure the "Status" in the Knowledge list shows a green checkmark before testing.
  2. Pay attention to supported file formats: Only text-based files are supported. Images, video, audio, and executable files cannot be uploaded (images are supported only when embedded in PDFs with alt-text annotations).
  3. Web search may already be on: In my environment, Web search was enabled right after the agent was created. If you want responses based only on internal materials, turn it off explicitly. Note that this setting is only available when generative orchestration is enabled.

Summary

Simply by uploading a single file, the agent is now able to return responses grounded in its content. The "Referenced Sources" are explicitly shown, making it immediately clear which materials a response is based on. Being able to control the range of response sources by toggling Web search on or off is also practically useful.

