I tried integrating data from Amazon FSx for NetApp ONTAP with Bedrock Knowledge Bases via S3 Access Point

Let's integrate various AWS services using S3 Access Points
2026.04.19

Moving data to Amazon FSx for NetApp ONTAP makes me want to integrate with other AWS services

Hello, this is nonpi (@non____97).

Have you ever wanted to integrate with other AWS services after migrating data to Amazon FSx for NetApp ONTAP (FSxN)? I have.

If you've gone through the effort of migrating your file server from on-premises to AWS, you'll naturally be interested in its compatibility with other AWS services like AI-related offerings.

FSxN has a feature to attach S3 Access Points to volumes.

https://dev.classmethod.jp/articles/amazon-fsx-netapp-ontap-s3-access/

This means that by using this feature, you can integrate FSxN with any service that works with S3.

As a test, I'll try connecting it with Bedrock Knowledge Bases to enable asking questions about files stored in FSxN.

Quick Summary

  • Data in Amazon FSx for NetApp ONTAP can be integrated with Bedrock Knowledge Bases via S3 Access Point
    • Multimodal search is also possible
  • If the Windows username specified for an Amazon FSx for NetApp ONTAP S3 Access Point contains a backslash (\), data sync from Bedrock Knowledge Bases fails
    • The bare name Administrator resolves to the SVM's disabled local account, so the domain Administrator effectively can't be used
    • Create a separate domain user instead

My Implementation

Test Environment

Here's my test environment:

Test environment architecture diagram.png

I prepared two types of volumes in the FSxN file system - one with UNIX security style and one with NTFS. I'll query information about files written via NFS and SMB using Bedrock Knowledge Bases.

Here are the volumes:

7.vol.list.png

I created the FSxN file system and AD DC through the management console.

The following articles are helpful for AD DC configuration and domain joining:

https://dev.classmethod.jp/articles/amazon-fsx-for-netapp-ontap-workgroup-cifs-server-to-join-domain/

https://dev.classmethod.jp/articles/amazon-fsx-netapp-ontap-single-availability-zone-deployment/

The OU for the FSxN SMB server's computer objects and service accounts is OU=FSxN,DC=corp,DC=non-97,DC=net, and the service account name is FSxServiceAccount.

> New-ADOrganizationalUnit -Name FSxN -Path "DC=corp,DC=non-97,DC=net" -ProtectedFromAccidentalDeletion $True

> New-ADUser `
  -Name "FSxServiceAccount" `
  -UserPrincipalName "FSxServiceAccount@corp.non-97.net" `
  -Accountpassword (Read-Host -AsSecureString "AccountPassword") `
  -Path "OU=FSxN,DC=corp,DC=non-97,DC=net" `
  -PasswordNeverExpires $True `
  -Enabled $True

One important note when referencing these articles: you need to grant the service account permission to "set msDS-SupportedEncryptionTypes on computer objects" when delegating OU management. This wasn't required before but has become mandatory recently.

Without this permission, you'll get the following error when trying to join the SMB server to the domain:

::> cifs create -vserver svm -cifs-server SVM -domain corp.non-97.net -ou OU=FSxN,DC=corp,DC=non-97,DC=net

In order to create an Active Directory machine account for the CIFS server, you must supply the name and password of a
Windows account with sufficient privileges to add computers to the "OU=FSxN,DC=corp,DC=non-97,DC=net" container within
the "CORP.NON-97.NET" domain.

Enter the user name: FSxServiceAccount

Enter the password:

Error: Machine account creation procedure failed
  [    23] Loaded the preliminary configuration.
  [    75] Created a machine account in the domain
  [    75] SID to name translations of Domain Users and Admins
           completed successfully
  [    76] Successfully connected to ip 10.0.1.11, port 88 using TCP
  [    78] Successfully connected to ip 10.0.1.11, port 464 using TCP
  [   124] Kerberos password set for 'SVM$@CORP.NON-97.NET' succeeded
  [   124] Set initial account password
**[   143] FAILURE: Unable to set machine account attribute
**         'msDS-SupportedEncryptionTypes': Insufficient access
  [   147] Deleted existing account
           'CN=SVM,OU=FSxN,DC=corp,DC=non-97,DC=net'

Error: command failed: Failed to create the Active Directory machine account "SVM". Reason: LDAP Error: The user has
       insufficient access rights.

Since building this repeatedly would be cumbersome, I used AWS CDK to create resources for Bedrock Knowledge Bases, vector buckets, and S3 buckets for multimodal storage.

The code I used is available in this GitHub repository:

https://github.com/non-97/aws-cdk-bedrock-kb-with-s3-access-point

Attaching S3 Access Points to FSxN Volumes

Let's attach S3 Access Points to FSxN volumes. I'll do the same work for both NTFS and UNIX security style volumes.

First, for the NTFS volume:

Since it's NTFS, I specified Windows for the user type. I also specified Administrator as the username to access all files in FSxN. The access policy can be left empty since access is from the same account and will be authorized via IAM role.

1.non-97-fsxn-vol-ntfs.png

This resulted in an error: Failed to lookup the provided user in ONTAP.

2.Failed to lookup the provided user in ONTAP..png

The AWS official documentation shows an ONTAP CLI command to check if a user exists:

vserver services access-check authentication show-creds -node FsxId0fd48ff588b9d3eee-01 -vserver svm_name -unix-user-name root -show-partial-unix-creds true

https://docs.aws.amazon.com/ja_jp/fsx/latest/ONTAPGuide/troubleshooting-access-points-for-fsxn.html#name-mapping

When I tried running it, it indeed showed an error:

::> set -privilege diagnostic

::*> vserver services access-check authentication show-creds -node FsxId0f6a3b21513fb2e32-01 -vserver svm -win-name Administrator

Vserver: svm (internal ID: 4)

Error: Get user credentials procedure failed
  [  0 ms] Using cached S-1-5-21-200422539-3497150653-4195399322-500
           to 'SVM\Administrator' mapping
**[     0] FAILURE: Account is disabled for local user
**         'Administrator'
  [     0] Could not get credentials for Windows user
           'Administrator' or SID
           'S-1-5-21-200422539-3497150653-4195399322-500'

Error: command failed: Failed to get user credentials. Reason: "cifs: user or machine account is disabled".

Based on the error, it seems Administrator alone is being interpreted as the SVM's local Administrator, which is disabled.

As a solution, I'll try explicitly specifying the domain Administrator user as CORP\Administrator.

::*> vserver services access-check authentication show-creds -node FsxId0f6a3b21513fb2e32-01 -vserver svm -win-name CORP\Administrator

 UNIX UID: root <> Windows User: CORP\Administrator (Windows Domain User)

 GID: daemon
 Supplementary GIDs:
  daemon

 Primary Group SID: CORP\Domain Users (Windows Domain group)

 Windows Membership:
  CORP\Domain Users (Windows Domain group)
  CORP\Domain Admins (Windows Domain group)
  CORP\Group Policy Creator Owners (Windows Domain group)
  CORP\Enterprise Admins (Windows Domain group)
  CORP\Schema Admins (Windows Domain group)
  CORP\Denied RODC Password Replication Group (Windows Alias)
  Service asserted identity (Windows Well known group)
  BUILTIN\Users (Windows Alias)
  BUILTIN\Administrators (Windows Alias)
 User is also a member of Everyone, Authenticated Users, and Network Users

 Privileges (0x22b7):
  SeBackupPrivilege
  SeRestorePrivilege
  SeTakeOwnershipPrivilege
  SeSecurityPrivilege
  SeChangeNotifyPrivilege

It appears to recognize the user properly now.

Let's attach the S3 Access Point using this:

3.Retry with CORP\Administrator.png

4. non-97-fsxn-vol-ntfs creation completed.png

This time it completed successfully. The creation process took only about 10 seconds.

I'll also do this for the UNIX volume. The username is root.

5.non-97-fsxn-vol-unix.png
6. non-97-fsxn-vol-unix creation completed.png

This was also created successfully.

For reference, checking with the same ONTAP CLI command shows:

# Check NSS
::*> vserver services name-service ns-switch show
                               Source
Vserver         Database       Order
--------------- ------------   ---------
svm             hosts          files,
                               dns
svm             group          files
svm             passwd         files
svm             netgroup       files
svm             namemap        files
5 entries were displayed.

# Check root user
::*> vserver services access-check authentication show-creds -node FsxId0f6a3b21513fb2e32-01 -vserver svm -unix-user-name root  -show-partial-unix-creds true

 UNIX UID: root <>

 GID: daemon
 Supplementary GIDs:
  daemon

# Check with root UID
::*> vserver services access-check authentication show-creds -node FsxId0f6a3b21513fb2e32-01 -vserver svm -uid 0  -show-partial-unix-creds true

 UNIX UID: root <>

 GID: daemon
 Supplementary GIDs:
  daemon

# Check with a non-existent user
::*> vserver services access-check authentication show-creds -node FsxId0f6a3b21513fb2e32-01 -vserver svm -unix-user-name ssm-user

Vserver: svm (internal ID: 4)

Error: Acquire UNIX credentials procedure failed
  [  1 ms] Entry for user-name: ssm-user not found in the current
           source: FILES. Entry for user-name: ssm-user not found in
           any of the available sources
**[     2] FAILURE: Unable to retrieve UID for UNIX user ssm-user

Error: command failed: Failed to resolve user name to a UNIX ID. Reason: "SecD Error: object not found".

Creating Bedrock Knowledge Bases Resources with AWS CDK

I'll create Bedrock Knowledge Bases resources using AWS CDK.

I'll specify the two S3 Access Points I created earlier:

./config.ts
export const appConfig: AppConfig = {
  vectorBucketName: "non-97-bedrock-kb-vectors",
  s3AccessPoints: [
    {
      alias:
        "arn:aws:s3:::non-97-fsxn-vol-zsdzd5qsozmrdsjdkxp5sj6syerzquse1b-ext-s3alias",
      arn: "arn:aws:s3:us-east-1:<AWSAccountID>:accesspoint/non-97-fsxn-vol-unix",
    },
    {
      alias:
        "arn:aws:s3:::non-97-fsxn-vol-ckrnrkqm9ufbcc7daz8tx91qajtk1use1b-ext-s3alias",
      arn: "arn:aws:s3:us-east-1:<AWSAccountID>:accesspoint/non-97-fsxn-vol-ntfs-fsxadmin",
    },
  ],
  syncSchedule: "rate(1 hour)",
};

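I'm specifying both the S3 Access Point alias and ARN because the two places that reference the access point validate different formats: the Bedrock Knowledge Bases data source accepts only a bucket-style ARN (built from the alias), while the IAM policy Resource element needs the access point ARN.

Ideally, I'd prefer to use just one format. However, IAM policies expect the access point ARN in the Resource element rather than the alias, as in this example: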

{
    "Version":"2012-10-17",
    "Statement": [
    {
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::123456789012:user/Jane"
        },
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:us-west-2:123456789012:accesspoint/my-access-point/object/Jane/*"
    }]
}

Excerpt from: Configuring IAM policies for using access points - Amazon Simple Storage Service

This is also mentioned in the FSxN documentation:

Access denied by default S3 Access Point permissions in automatically created service roles

Some S3-integrated AWS services create custom service roles with attached permissions tailored for specific use cases. When specifying S3 access point aliases as S3 resources, the attached permissions may include access points in bucket ARN format (e.g., arn:aws:s3:::my-fsx-ap-foo7detztxouyjpwtu8krroppxytruse1a-ext-s3alias) rather than access point ARN format (e.g., arn:aws:s3:us-east-1:1234567890:accesspoint/my-fsx-ap). To resolve this, modify the policy to use the access point ARN.

Troubleshooting S3 access point issues - FSx for ONTAP
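As a rough sketch of the IAM side, the Resource values for the knowledge base role can be derived from the access point ARN in config.ts. The helper name buildReadStatements, the S3AccessPointConfig interface, and the choice of s3:ListBucket and s3:GetObject as actions are assumptions made for illustration and are not taken from the repository above. 123456789012 is a placeholder account ID.

```typescript
// Assumed shape of one entry in s3AccessPoints (hypothetical interface).
interface S3AccessPointConfig {
  alias: string; // bucket-style ARN built from the access point alias
  arn: string; // access point ARN
}

interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// Hypothetical helper: build read-only statements that use the access point ARN,
// not the alias, in the Resource element.
function buildReadStatements(
  accessPoints: S3AccessPointConfig[]
): PolicyStatement[] {
  const arns = accessPoints.map((ap) => ap.arn);
  return [
    // Listing is authorized against the access point itself.
    { Effect: "Allow", Action: ["s3:ListBucket"], Resource: arns },
    // Object reads are authorized against <access point ARN>/object/<key>.
    {
      Effect: "Allow",
      Action: ["s3:GetObject"],
      Resource: arns.map((arn) => `${arn}/object/*`),
    },
  ];
}

const exampleStatements = buildReadStatements([
  {
    alias:
      "arn:aws:s3:::non-97-fsxn-vol-zsdzd5qsozmrdsjdkxp5sj6syerzquse1b-ext-s3alias",
    arn: "arn:aws:s3:us-east-1:123456789012:accesspoint/non-97-fsxn-vol-unix",
  },
]);
```

The alias is deliberately unused here; it is only needed for the data source definition.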

On the other hand, when specifying an S3 bucket in Bedrock Knowledge Bases data sources, the S3 Access Point ARN format doesn't match the pattern and causes an error:

BucketArn

The Amazon Resource Name (ARN) of the S3 bucket that contains your data.

Required: Yes

Type: String

Pattern: ^arn:aws(-cn|-us-gov|-eusc|-iso(-[b-f])?)?:s3:::[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$

Minimum: 1

Maximum: 2048

Update requires: No interruption

AWS::Bedrock::DataSource S3DataSourceConfiguration - AWS CloudFormation
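The BucketArn pattern quoted above can be checked locally before deploying. This is a minimal sketch: the regular expression is copied from the CloudFormation reference excerpt, while the function name isValidDataSourceBucketArn and the placeholder account ID 123456789012 are assumptions for illustration.

```typescript
// Pattern copied from the BucketArn property excerpt above.
const BUCKET_ARN_PATTERN =
  /^arn:aws(-cn|-us-gov|-eusc|-iso(-[b-f])?)?:s3:::[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/;

// Hypothetical helper: true if the value is acceptable as a data source BucketArn.
function isValidDataSourceBucketArn(value: string): boolean {
  return BUCKET_ARN_PATTERN.test(value);
}

// Bucket-style ARN built from the access point alias: matches the pattern.
const aliasArn =
  "arn:aws:s3:::non-97-fsxn-vol-zsdzd5qsozmrdsjdkxp5sj6syerzquse1b-ext-s3alias";

// Access point ARN: has a region and account ID, so it does not match.
const accessPointArn =
  "arn:aws:s3:us-east-1:123456789012:accesspoint/non-97-fsxn-vol-unix";

const aliasAccepted = isValidDataSourceBucketArn(aliasArn);
const accessPointAccepted = isValidDataSourceBucketArn(accessPointArn);
```

This is why config.ts carries both values: the alias form for the data source, and the access point ARN for IAM.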

Syncing Data

Now I'll sync the data.

The NTFS-style volume failed with Call to Amazon S3 Source did not succeed, while the UNIX one succeeded.

  • Syncing with the S3 Access Point for NTFS-style volume:

    8.Call to Amazon S3 Source did not succeed.png

  • Syncing with the S3 Access Point for UNIX-style volume:

    9.Unix side data sync worked.png

After investigating, I found this was because the Windows username in the S3 Access Point contained a backslash (\).

Since Administrator didn't work as seen earlier, I'll create a domain user belonging to the delegated file system administrators group and create an S3 Access Point without specifying the domain's NetBIOS name.
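The two pitfalls so far (a backslash in the username, and the bare name Administrator resolving to the disabled local account) can be caught before creating the access point. The sketch below encodes only what I observed in this test environment, not documented rules, and the function name checkAccessPointWindowsUser is hypothetical.

```typescript
type UserCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical pre-check for the Windows username given to an FSxN S3 Access Point.
function checkAccessPointWindowsUser(userName: string): UserCheck {
  // Observed in this test: DOMAIN\user creates fine but Knowledge Bases sync fails.
  if (userName.indexOf("\\") !== -1) {
    return {
      ok: false,
      reason: "contains a backslash; Bedrock Knowledge Bases sync failed with this",
    };
  }
  // Observed in this test: the bare name maps to the SVM's disabled local account.
  if (userName.toLowerCase() === "administrator") {
    return {
      ok: false,
      reason: "resolves to the SVM's local Administrator, which is disabled",
    };
  }
  return { ok: true };
}

const domainQualified = checkAccessPointWindowsUser("CORP\\Administrator");
const bareAdministrator = checkAccessPointWindowsUser("Administrator");
const dedicatedUser = checkAccessPointWindowsUser("FSxAdmin");
```

Passing this check does not prove the user exists; the show-creds command shown earlier is still the way to confirm that ONTAP can resolve the name.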

For more on delegated file system administrators groups, refer to this article:

https://dev.classmethod.jp/articles/amazon-fsx-for-netapp-ontap-delegated-file-system-administrators-group/

Here's how I created the domain user and security group. After adding FSxAdmin to FSxAdminGroup, I confirmed the membership with Get-ADGroupMember:

> New-ADUser `
  -Name "FSxAdmin" `
  -UserPrincipalName "FSxAdmin@corp.non-97.net" `
  -Accountpassword (Read-Host -AsSecureString "AccountPassword") `
  -Path "OU=FSxN,DC=corp,DC=non-97,DC=net" `
  -PasswordNeverExpires $True `
  -Enabled $True

> New-ADGroup `
  -Name FSxAdminGroup `
  -GroupCategory Security `
  -GroupScope Global `
  -Path "OU=FSxN,DC=corp,DC=non-97,DC=net"

> Get-ADGroupMember -Identity FSxAdminGroup

distinguishedName : CN=FSxAdmin,OU=FSxN,DC=corp,DC=non-97,DC=net
name              : FSxAdmin
objectClass       : user
objectGUID        : 60c64b2c-c96c-4c63-843a-d8b175d33ec0
SamAccountName    : FSxAdmin
SID               : S-1-5-21-2619605940-2058633766-932412019-1109

To make the security group part of the delegated file system administrators group, I added it to BUILTIN\Administrators:

::*> cifs users-and-groups local-group add-members -vserver svm -group-name BUILTIN\Administrators -member-names  CORP\FSxAdminGroup

::> cifs users-and-groups local-group show-members
Vserver        Group Name                   Members
-------------- ---------------------------- ------------------------
svm            BUILTIN\Administrators       SVM\Administrator
                                            CORP\Domain Admins
                                            CORP\FSxAdminGroup
               BUILTIN\Guests               CORP\Domain Guests
               BUILTIN\Users                CORP\Domain Users
3 entries were displayed.

Now I'll check whether the domain user FSxAdmin is recognized without the domain NetBIOS name and backslash in front of it:

::*> vserver services access-check authentication show-creds -node FsxId0f6a3b21513fb2e32-01 -vserver svm -win-name FSxAdmin -show-partial-unix-creds true

 UNIX UID: root <> Windows User: CORP\FSxAdmin (Windows Domain User)

 GID: daemon
 Supplementary GIDs:
  daemon

 Primary Group SID: CORP\Domain Users (Windows Domain group)

 Windows Membership:
  CORP\FSxAdminGroup (Windows Domain group)
  CORP\Domain Users (Windows Domain group)
  Service asserted identity (Windows Well known group)
  BUILTIN\Administrators (Windows Alias)
  BUILTIN\Users (Windows Alias)
 User is also a member of Everyone, Authenticated Users, and Network Users

 Privileges (0x22b7):
  SeBackupPrivilege
  SeRestorePrivilege
  SeTakeOwnershipPrivilege
  SeSecurityPrivilege
  SeChangeNotifyPrivilege

It can be recognized properly.

Now I'll create the S3 Access Point again:

10.FSxAdmin.png

Creation successful:

11.non-97-fsxn-vol-ntfs-fsxadmin.png

Now I'll try syncing with this:

12.Data sync now works.png

Syncing worked! Success!

After the sync, I found that objects whose keys start with aws/bedrock/knowledge_bases/<knowledge base ID>/<data source ID>/ had been created in the S3 bucket specified for multimodal storage, one per image file.

Multimodal storage.png

Opening 18191294-f1e3-4786-a274-150038b7aeeb.png shows it's a normal image file:

30.18191294-f1e3-4786-a274-150038b7aeeb.png.png

The AWS official documentation explains that images are stored in multimodal storage:

Nova Multimodal Embeddings

Required: You must configure a multimodal storage destination. This destination stores copies of your multimedia files for retrieval and ensures availability even if source files are modified or deleted.

Prerequisites for multimodal knowledge bases - Amazon Bedrock

By the way, the documentation recommends using a different S3 bucket for multimodal storage than the one for your data source:

Multimodal storage destination configuration

When configuring your multimodal storage destination, consider the following:

Use separate buckets (recommended): Configure different Amazon S3 buckets for your data source and multimodal storage destination. This provides the simplest setup and avoids potential conflicts.

If using the same bucket: You must specify an inclusion prefix for your data source that limits which content is ingested. This prevents re-ingesting extracted media files.

Avoid "aws/" prefix: When using the same bucket for both data source and multimodal storage destination, do not use inclusion prefixes starting with "aws/" as this path is reserved for extracted media storage.
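The same-bucket rules above can be expressed as a small check. This is only a sketch of the quoted guidance; the function name checkInclusionPrefixes and its parameters are hypothetical and not part of any AWS API.

```typescript
// Path reserved for extracted media, per the guidance quoted above.
const RESERVED_MEDIA_PREFIX = "aws/";

// Hypothetical helper: returns a list of problems (empty means no issue found).
function checkInclusionPrefixes(
  sameBucket: boolean,
  inclusionPrefixes: string[]
): string[] {
  const problems: string[] = [];
  // With separate buckets (recommended), there is nothing to check.
  if (!sameBucket) {
    return problems;
  }
  // Same bucket: an inclusion prefix is required to avoid re-ingesting extracted media.
  if (inclusionPrefixes.length === 0) {
    problems.push("an inclusion prefix is required when sharing one bucket");
  }
  inclusionPrefixes.forEach((prefix) => {
    if (prefix.indexOf(RESERVED_MEDIA_PREFIX) === 0) {
      problems.push(`inclusion prefix "${prefix}" starts with the reserved "aws/" path`);
    }
  });
  return problems;
}

const separateBuckets = checkInclusionPrefixes(false, []);
const sharedNoPrefix = checkInclusionPrefixes(true, []);
const sharedReserved = checkInclusionPrefixes(true, ["aws/docs/"]);
const sharedOk = checkInclusionPrefixes(true, ["documents/"]);
```

In this article the data source is an FSxN access point and the multimodal storage is a regular S3 bucket, so the separate-bucket case applies.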

Operation Check

Let's perform an operational check.

I'll place the following text file on a volume with NTFS security style via SMB.

13.non-97.txt.png

Now let's sync again, this time via Step Functions.

14.Sync.png

When checking the synchronization history, I confirmed that the file I just placed was synchronized.

15.File was added.png

Let's ask a question.

16.Got an answer.png

There was an answer based on the file content.

Since we're using Amazon Nova Multimodal Embeddings as the embedding model, let's also perform image searches.

For details on Amazon Bedrock Knowledge Bases multimodal search, please see the following article.

https://dev.classmethod.jp/articles/kb-multimodal-retrieval-reinvent2025/

I placed photos of koalas, elephants, and other animals I took at the zoo on the FSxN volume.

17.Images.png

After synchronizing, some files failed to sync as shown below.

Error warning
Encountered error: Unknown failure code: UNKNOWN [Files: null, null]. Call to Customer Source did not succeed.
Encountered error: Ignored 2 files as their file format was not supported. [Files: s3://non-97-fsxn-vol-ckrnrkqm9ufbcc7daz8tx91qajtk1use1b-ext-s3alias/IMG_2970.jpeg, s3://non-97-fsxn-vol-ckrnrkqm9ufbcc7daz8tx91qajtk1use1b-ext-s3alias/IMG_3540.jpeg]. Call to Customer Source did not succeed.
Encountered error: Ignored 1 files as their file format was not supported. [Files: s3://non-97-fsxn-vol-ckrnrkqm9ufbcc7daz8tx91qajtk1use1b-ext-s3alias/IMG_1493.jpeg]. Call to Customer Source did not succeed.

23.Data sync.png

These files fail no matter how many times I sync, so there seems to be a specific cause rather than a transient error.

It's a mystery, though, since they don't appear to violate any of the limits in the documentation quoted below, such as image file size.

When you connect to a supported data source, the content is ingested into your knowledge base.

If you use Amazon S3 to store your files or your data source includes attached files, then you first must check that each source document file adheres to the following:

  • The source files are of the following supported formats:

    Format                        Extension
    Plain text (ASCII only)       .txt
    Markdown                      .md
    HyperText Markup Language     .html
    Microsoft Word document       .doc/.docx
    Comma-separated values        .csv
    Microsoft Excel spreadsheet   .xls/.xlsx
    Portable Document Format      .pdf

  • Each file size doesn't exceed the quota of 50 MB.

If you use an Amazon S3 or custom data source, you can use multimodal data, including JPEG (.jpeg) or PNG (.png) images or files that contain tables, charts, diagrams, or other images.

Note

The maximum size of .JPEG and .PNG files is 3.75 MB.

Prerequisites for your Amazon Bedrock knowledge base data - Amazon Bedrock
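To rule out the documented limits quickly, the quoted prerequisites can be turned into a local pre-check. This is a sketch under assumptions: the function name checkSourceFile is hypothetical, "MB" is treated as 1024 × 1024 bytes (the documentation excerpt doesn't say which definition it uses), and only the extensions listed in the excerpt are accepted.

```typescript
// Extensions taken from the documentation excerpt above.
const DOCUMENT_EXTENSIONS = ["txt", "md", "html", "doc", "docx", "csv", "xls", "xlsx", "pdf"];
const IMAGE_EXTENSIONS = ["jpeg", "png"];

// Assumption: 1 MB = 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;
const MAX_DOCUMENT_BYTES = 50 * BYTES_PER_MB;
const MAX_IMAGE_BYTES = 3.75 * BYTES_PER_MB;

// Hypothetical helper: returns null if the file passes, otherwise the reason it fails.
function checkSourceFile(key: string, sizeBytes: number): string | null {
  const dot = key.lastIndexOf(".");
  const extension = dot === -1 ? "" : key.slice(dot + 1).toLowerCase();

  if (IMAGE_EXTENSIONS.indexOf(extension) !== -1) {
    return sizeBytes <= MAX_IMAGE_BYTES ? null : "image exceeds 3.75 MB";
  }
  if (DOCUMENT_EXTENSIONS.indexOf(extension) !== -1) {
    return sizeBytes <= MAX_DOCUMENT_BYTES ? null : "file exceeds 50 MB";
  }
  return "unsupported extension";
}

// The sizes below are made-up values for illustration, not the real file sizes.
const smallJpeg = checkSourceFile("IMG_2970.jpeg", 2 * BYTES_PER_MB);
const largePng = checkSourceFile("photo.png", 4 * BYTES_PER_MB);
const archive = checkSourceFile("backup.zip", 1024);
```

A file that passes this check can still fail ingestion, as the three .jpeg files above did, so it only narrows down the cause.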

When I asked "find koala images," it indeed returned only koala images.

18.Find koala images.png

Next, when I asked "Are there elephants?", it returned elephant images but also koala images. Perhaps the similar gray coloring makes them hard to tell apart.

19.Are there elephants.png

When I asked for "bird images", it returned bird images, koala images, and text files.

20.Bird images.png

I'd like to improve accuracy with reranking, but currently, rerank models don't support multimodal content.

Reranker model limitations: Reranker models are not supported for multimodal content

Troubleshooting multimodal knowledge bases - Amazon Bedrock

To improve accuracy, it seems we'd need to use a foundation model as the parser instead of the default parser.

When I asked for "animal images," it returned various animal images.

21.Animal images.png

Let me also query about the contents of a PDF file. The data I've saved is from the following presentation materials converted to PDF.

https://dev.classmethod.jp/articles/onpremise-network-to-vpc-connection-considerations-with-government-cloud/

Now I'll ask about my favorite AWS services that I always include in my self-introduction.

24.Tell me nonpi's favorite AWS services.png

It answered with the source information, which is good.

I'll also ask "What are the key points to consider when connecting on-premises networks to VPC?"

25.What are the key points to consider when connecting on-premises networks to VPC.png

It gave me a reasonable response based on the PDF file contents.

Let me also query information about files placed on a volume with UNIX security style. The synchronized file contains names of koalas that I like.

26.NFS synced too.png

When I queried, I got "Unable to generate response as the retrieved content contains non-text data".

27.Unable to generate response as the retrieved content contains non-text data..png

This happens because the retrieved results include images as well as text: the koala photos I placed on the NTFS volume and synced earlier also match the query.

Since I want only text returned this time, I'll add a metadata filter that restricts x-amz-bedrock-kb-source-file-modality to TEXT and run the same query again.
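For reference, the filter can be sketched as a plain object. The metadata key and the value TEXT come from this article; the surrounding field names (vectorSearchConfiguration, filter, equals) reflect my understanding of the retrieval configuration in the Bedrock Agent Runtime API, so please verify them against the API reference. The interface and function names are hypothetical.

```typescript
// Metadata key and value used in this article.
const MODALITY_KEY = "x-amz-bedrock-kb-source-file-modality";
const TEXT_MODALITY = "TEXT";

// Assumed shapes, written out locally so the sketch has no SDK dependency.
interface MetadataEqualsFilter {
  equals: { key: string; value: string };
}

interface TextOnlyRetrievalConfiguration {
  vectorSearchConfiguration: {
    numberOfResults: number;
    filter: MetadataEqualsFilter;
  };
}

// Hypothetical helper: retrieval configuration that returns only text chunks.
function buildTextOnlyRetrievalConfiguration(
  numberOfResults: number
): TextOnlyRetrievalConfiguration {
  return {
    vectorSearchConfiguration: {
      numberOfResults,
      filter: { equals: { key: MODALITY_KEY, value: TEXT_MODALITY } },
    },
  };
}

const textOnlyConfig = buildTextOnlyRetrievalConfiguration(5);
```

An object like this would be passed as the retrieval configuration of a query, so image chunks are excluded before the response is generated.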

28.Which koalas does nonpi like.png

Now I got a proper response.

Let's Integrate Various AWS Services Using S3 Access Points

I've demonstrated how to integrate data within Amazon FSx for NetApp ONTAP with Bedrock Knowledge Bases via S3 Access Points.

Data often moves from on-premises to AWS for reasons such as availability and reduced operational maintenance. With this feature, you can go a step further and put that data to use.

This time I used Bedrock Knowledge Bases, but you can also analyze structured files with Athena, perform ETL processing with Glue, or integrate with SaaS services like Snowflake and Databricks.

Let's try connecting various AWS services using S3 Access Points. As of April 19, 2026, FSx for Windows File Server doesn't offer this feature or similar functionality, so if you want to manipulate data stored on SMB file servers using the S3 API, you'll need to choose FSxN.

With FSxN, SnapMirror can replicate changes at the block level, so if you already run ONTAP on-premises, you don't need DataSync to make your data available for use on AWS. Replicating storage-to-storage also means there's no DataSync Agent to add as an extra component, which helps with performance, cost, and operational burden.

Note that some S3 API operations such as object copying are not yet supported. Please check the following documentation.

https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/access-points-for-fsxn-object-api-support.html

Other precautions are the same as for the S3 Access Point feature of FSx for OpenZFS. For details, please see the following article.

https://dev.classmethod.jp/articles/amazon-fsx-openzfs-amazon-s3-access/

I hope this article helps someone.

That's all from nonpi (@non____97) of the Consulting Department, Cloud Business Division!
