[REPORT] Building Multi-tenant SaaS Agents with Amazon Bedrock AgentCore #AWSreInvent #SAS407

2025.12.02

Introduction

Hello, this is Kamino from the Consulting Department, who loves supermarkets.

I'm energetically participating in AWS re:Invent 2025!
This time I attended the session "Building multi-tenant SaaS agents with Amazon Bedrock AgentCore (SAS407)"!

I've been using Amazon Bedrock AgentCore since it became generally available in October, and I was curious about how to implement multi-tenancy when offering agents as SaaS. The session introduced concrete patterns for building multi-tenant SaaS agents with AgentCore, so here is my report.

Session Overview

  • Title: Building multi-tenant SaaS agents with Amazon Bedrock AgentCore (SAS407)
  • Date & Time: Mon, December 1, 1:00 PM - 2:00 PM PST
  • Location: Wynn | Upper Convention Promenade | Bollinger
  • Speakers: Bill Tarr (Principal Partner Solutions Architect), Ujwal Bukka (Senior Partner Solutions Architect)
  • Level: 400 – Expert

Official Abstract:

The introduction of Amazon Bedrock AgentCore equips builders with a range of new tools and technologies. These tools enable a range of new strategies and techniques that will directly impact how teams build multi-tenant AI solutions and agents. This session will dig into working examples how tenancy is leveraged and landed in an intelligence-as-a-service environment, highlighting the multi-tenant nuances and possibilities that can built with the latest AI-powered AWS services. This includes a deep dive AgentCore multi-tenant pattern, spanning the identity, memory, gateway, observability, and run-time elements of the AgentCore experience. The goal here is to understand how SaaS providers can introduce tenant-context into their agents to support core SaaS mechanisms and constructs (onboarding, isolation, data partitioning, identity, etc.).

The venue was fairly large, yet it filled up, which shows how much interest there is in AgentCore.
Everyone wants to know how to actually implement AgentCore in SaaS, right?

Session Content

Five Architectural Challenges in SaaS

IMG_6208

At the beginning of the session, the speaker introduced five major challenges faced when building SaaS.
Some of these may sound familiar. As I listened, I had a vague sense of having heard them before.

Tenant Onboarding

Tenant onboarding refers to the challenge of shortening the time from when customers learn about the product to when they actually derive value from it. In SaaS, if the onboarding experience is poor and it takes a week to start using the product, customers will leave.

SaaS Identity

The challenge of how to authenticate and authorize users, and how to propagate tenant ID and tenant context throughout the solution.

Data Partitioning

How to manage tenant data properly. For example, tenant data needs to be separated into logical or physical buckets and protected from other tenants' data.

Tenant Isolation

Policies that define which resources tenants can and cannot access. Tenant resources need to be clearly and explicitly defined.

SaaS Observability

The ability to monitor tenant health. Especially in agent solutions with many moving parts, observability of who is doing what becomes important.

Looking at these challenges again, every one of them matters, and addressing them all properly looks difficult.

Introduction to Amazon Bedrock AgentCore

CleanShot 2025-12-02 at 15.59.48@2x

The following features of Amazon Bedrock AgentCore were introduced to address the above challenges.
For more detailed information on each feature, please refer to the blog post below:

https://dev.classmethod.jp/articles/amazon-bedrock-agentcore-developersio-2025-osaka/

Here I'm describing these at the introductory level as presented in the session.

AgentCore Runtime

An execution environment for deploying and securely scaling agent code. It's convenient because you can host AI agents as a managed service, minimizing infrastructure management.

AgentCore Gateway

A feature for deploying tool code. It was described in the session as "MCP as a Service". Agents running in AgentCore Runtime can call tools deployed in Gateway through MCP.

This description alone may not be enough to fully understand Gateway, so if you are interested, refer to the blog post below for more details on how to use it:

https://dev.classmethod.jp/articles/amazon-bedrock-agentcore-production-tips-ai-builders-day-2025/

My impression is that it's useful for cases where you want to convert existing assets (like Lambda functions) into MCP tools or centralize the MCP Servers you use.

AgentCore Identity

Manages authentication and authorization for inbound calls (user to agent) and outbound calls (agent to external resources).

AgentCore Memory

Stores the agent's conversation memory and maintains context for resolving tasks. There are two types: short-term memory and long-term memory.

AgentCore Observability

Provides monitoring capabilities to see what agents are doing.

It's a useful service that visualizes agent activities on a dashboard from logs in OpenTelemetry format.
You can also see the elapsed time for each process, allowing you to analyze performance bottlenecks.

Multi-tenant Models

IMG_6214

When building SaaS solutions, you need to consider different deployment models depending on customer types.

Silo/Dedicated Model

A model where each tenant has its own stack. Web servers, compute, everything has an independent architecture. It's a simple architecture, but as the number of tenants increases, operation and maintenance become difficult.

Picture AWS resources multiplying with every tenant (for example, 10 EC2 instances per tenant), which drives up costs.

Pooled/Shared Model

A model where infrastructure is shared among multiple tenants. At runtime, you need to determine "who are you?", "what are you trying to do?", and "do you have permission to perform that operation?". It's complex but efficient in terms of resource costs. It's a trade-off.

Hybrid/Bridge Model

A combination of the above two. It's a flexible configuration where some components are shared and others are dedicated. For example, EC2 might be shared but RDS is independent.

In actual SaaS, you might offer the Basic Tier as a pooled model and Premium Tier as a silo model. None is necessarily superior; it's important to consider the optimal model according to customer requirements.

Tenant Onboarding

IMG_6217

As a SaaS best practice, building a control plane is recommended. The control plane includes onboarding services, tenant provisioning services, and tenant management services.

When onboarding a tenant, appropriate infrastructure is provisioned according to which model (silo or pool) the tenant maps to.

For the silo model, dedicated resources are provisioned during tenant onboarding, while for the pool model, the tenant is linked to pre-provisioned shared resources.
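
The routing decision described above can be sketched as a small planning function. This is only an illustration under stated assumptions: the tier names follow the Basic/Premium example mentioned earlier, while `TIER_TO_MODEL`, `ProvisioningPlan`, `plan_onboarding`, and the stack names are hypothetical and do not come from the session.

```python
from dataclasses import dataclass

# Hypothetical tier-to-model mapping; a real control plane would load this from configuration.
TIER_TO_MODEL = {"basic": "pool", "premium": "silo"}


@dataclass
class ProvisioningPlan:
    tenant_id: str
    model: str              # "silo" or "pool"
    create_resources: bool  # True when a dedicated stack must be created
    stack_name: str         # hypothetical naming scheme


def plan_onboarding(tenant_id: str, tier: str) -> ProvisioningPlan:
    """Decide what the onboarding service has to do for a new tenant."""
    model = TIER_TO_MODEL.get(tier.lower())
    if model is None:
        raise ValueError(f"unknown tier: {tier}")
    if model == "silo":
        # Silo: provision a dedicated stack for this tenant during onboarding.
        return ProvisioningPlan(tenant_id, model, True, f"agent-stack-{tenant_id}")
    # Pool: no new infrastructure; link the tenant to the pre-provisioned shared stack.
    return ProvisioningPlan(tenant_id, model, False, "agent-stack-pooled")
```

Keeping the decision separate from the actual provisioning call makes it easy to unit test the mapping before any infrastructure is touched.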

SaaS Identity and AgentCore Identity

IMG_6221

An example using Amazon Cognito for SaaS identity management was introduced. It adopts the "User Pool per Tenant" approach as a tenant boundary, clearly separating tenants: users of Tenant A in pool A, users of Tenant B in pool B.

Custom attributes, which surface as custom claims in JWT tokens, can be set on users, for example:

  • Tenant ID
  • Status
  • Tier

There are two types of JWT tokens: the Identity Token (ID token) and the Access Token. The Identity Token automatically includes custom claims, but the Access Token does not. Since AgentCore uses the Access Token, a pre-token generation Lambda trigger is an effective way to copy the custom claims into it.

Adding a Lambda at this step was new to me. Keep in mind that Access Tokens do not inherit custom claims by default.
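
A pre-token generation trigger along these lines might look like the sketch below. It assumes the trigger is configured with the V2 event format, which is the version that allows access token customization. The attribute names `custom:tenantId`, `custom:tier`, and `custom:status` and the resulting claim names are hypothetical. This is not the code shown in the session.

```python
def lambda_handler(event, context):
    """Cognito pre token generation trigger (V2 event): copy tenant attributes into the access token."""
    attrs = event["request"]["userAttributes"]

    # Hypothetical custom attribute names; use whatever the user pool actually defines.
    claims = {
        "tenantId": attrs.get("custom:tenantId", ""),
        "tier": attrs.get("custom:tier", ""),
        "status": attrs.get("custom:status", ""),
    }
    # Skip attributes the user does not have instead of emitting empty claims.
    claims = {name: value for name, value in claims.items() if value}

    event.setdefault("response", {})["claimsAndScopeOverrideDetails"] = {
        "accessTokenGeneration": {"claimsToAddOrOverride": claims}
    }
    return event
```

Because the handler only reads and returns the event, it can be exercised locally with a hand-written event before it is attached to a user pool.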

Inbound Authentication

IMG_6223

Inbound authentication is the authentication when users call the agent.

If you set up an IdP (discovery URL) for AgentCore Runtime, it will handle authentication, so developers don't need to implement authentication logic themselves, which is nice.

Outbound Authentication

IMG_6227

For authentication when agents access external resources, there are the following patterns:

Access to AWS Resources

Authorized by attaching IAM execution roles to AgentCore Runtime.
This is the same as with Lambda or ECS.

Access to External Resources (OAuth/API Keys)

Each agent can access external resources by obtaining API keys or OAuth permissions using Outbound Auth.

When obtaining access tokens for external resources, there's a convenient feature: adding the @requires_access_token decorator to a function in your code is enough to acquire the access token dynamically.

AgentCore Gateway Authentication

IMG_6228

Authentication for AgentCore Gateway was also explained. The design principles explained in AgentCore Identity can be similarly applied to Gateway.

  • AgentCore Runtime agents call AgentCore Gateway using JWT tokens
  • Configure AgentCore Identity on AgentCore Gateway to authenticate requests

Additionally, AgentCore Gateway has a new feature called Gateway Interceptor that can intercept requests to the Gateway and retrieve header information. It's possible to obtain JWT tokens for use in tools or extract and utilize tenant context.

The feature is described in the post below. I'd like to try it out and write about it as well:

https://aws.amazon.com/jp/blogs/machine-learning/apply-fine-grained-access-control-with-bedrock-agentcore-gateway-interceptors/
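
To illustrate what "extracting tenant context" from a request header can mean, here is a minimal sketch that reads the claims of a bearer token. It assumes the token's signature has already been validated upstream by the Gateway's inbound authentication, so it only decodes the payload. The claim names `tenantId` and `tier` are hypothetical. The interceptor's real event shape is not reproduced here, so the function simply takes a plain dict of headers.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def tenant_context_from_headers(headers: dict) -> dict:
    """Pull tenant context out of an 'Authorization: Bearer <jwt>' header."""
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise PermissionError("missing bearer token")
    claims = decode_jwt_claims(token)
    # "tenantId" and "tier" are assumed claim names, not ones confirmed by the session.
    if "tenantId" not in claims:
        raise PermissionError("token carries no tenant context")
    return {
        "tenant_id": claims["tenantId"],
        "tier": claims.get("tier"),
        "subject": claims.get("sub"),
    }
```

Failing closed when the tenant claim is missing keeps a tool from running without a tenant boundary.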

Data Partitioning

IMG_6230

The session explained how to separate data access for AgentCore Memory and AWS resources.

AgentCore Memory

The session explained methods for partitioning AgentCore Memory in both silo and pool models.

For Silo Model

IMG_6233

In the silo model, dedicated AgentCore Memory is created for each tenant.

To create events in short-term memory, the following are required:

  • Memory ID: A unique ID generated when creating AgentCore Memory
  • Session ID: A unique ID for each session
  • Actor ID: A unique key for the user

The convention for Actor ID was introduced as Tenant ID : Subject.

However, the speaker explained: "If you just want to partition by tenant, Actor ID could just be Tenant ID. But in that case, in the silo model, you're already creating dedicated Memory for each tenant, so this convention isn't very useful. This convention becomes useful when you transition to the pool model."

Looking at the diagram, I had the same thought: in the silo model the memories are already separated, so the actor_id doesn't need to include the tenant ID. The explanation made sense to me.

For Pool Model

In the pool model, shared AgentCore Memory is used across multiple tenants. In this case, partitioning by Actor ID becomes important.

Actor ID = Tenant ID : Subject

This convention separates data per tenant and per user within the shared Memory.

Long-term Memory

Long-term memory uses Namespaces to partition data. Actor ID is also utilized here.

Namespace = {actor_id}/...
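
The two conventions above (Actor ID = Tenant ID : Subject, and a namespace that starts with the actor ID) can be sketched with a few helper functions. The function names, the validation rules, and the `suffix` parameter are assumptions added for illustration. The session did not specify what follows the actor ID in the namespace.

```python
def make_actor_id(tenant_id: str, subject: str) -> str:
    """Compose the Actor ID as 'TenantID:Subject'."""
    for name, value in (("tenant_id", tenant_id), ("subject", subject)):
        # Reject the separators so one tenant's value cannot imitate another's prefix.
        if not value or ":" in value or "/" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return f"{tenant_id}:{subject}"


def memory_namespace(actor_id: str, suffix: str) -> str:
    """Build a long-term memory namespace rooted at the actor ID (suffix is caller-defined)."""
    return f"{actor_id}/{suffix}"


def belongs_to_tenant(actor_id: str, tenant_id: str) -> bool:
    """Check that an Actor ID was issued for the given tenant."""
    return actor_id.split(":", 1)[0] == tenant_id
```

For example, `make_actor_id("tenant-a", "user-123")` yields `tenant-a:user-123`, and `belongs_to_tenant` compares the whole tenant segment rather than a string prefix, so `tenant-a` is never confused with `tenant-ab`.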

As a personal observation, I find it questionable whether Memory should be shared in the pool model. If the design is wrong, there's a risk of accessing others' memories or extracting long-term memories from different tenants, so I feel inclined to divide memory by tenant. Of course, it depends on requirements, and as a premise, permission separation settings need careful consideration.

AWS Resources

IMG_6235

Amazon Bedrock Knowledge Base

In the silo model, dedicated Knowledge Base and vector database are created for each tenant. In the pool model, when incorporating tenant-specific data into a shared Knowledge Base, Tenant ID is added as metadata.

Amazon DynamoDB

In the silo model, tables are created for each tenant. In the pool model, a single table is shared, and Tenant ID is used as a partition key.

Amazon S3

In the silo model, buckets are created for each tenant. In the pool model, Tenant ID is used as a prefix within a single bucket.

While I understand the approach here, I personally think a bridge model where resources are divided by customer rather than forcibly using shared Knowledge Base, S3, or DynamoDB might be better in some cases.
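
For the pool model, the three partitioning rules above come down to putting the Tenant ID into every key or filter. The sketch below shows one way to do that. The attribute names `TenantId` and `ItemId` and the metadata key `tenant_id` are hypothetical. The filter dict follows the `equals` shape used by the Knowledge Base retrieval metadata filter.

```python
def s3_object_key(tenant_id: str, object_name: str) -> str:
    """Pool model for S3: the Tenant ID is the key prefix inside the shared bucket."""
    return f"{tenant_id}/{object_name.lstrip('/')}"


def dynamodb_item_key(tenant_id: str, item_id: str) -> dict:
    """Pool model for DynamoDB: the Tenant ID is the partition key of the shared table."""
    # "TenantId" and "ItemId" are assumed attribute names.
    return {"TenantId": {"S": tenant_id}, "ItemId": {"S": item_id}}


def knowledge_base_filter(tenant_id: str) -> dict:
    """Pool model for a shared Knowledge Base: filter retrieval on tenant metadata."""
    # "tenant_id" is an assumed metadata key attached to documents at ingestion time.
    return {"equals": {"key": "tenant_id", "value": tenant_id}}
```

Routing every data access through helpers like these means the Tenant ID is never assembled ad hoc inside tool code.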

Tenant Isolation

Partitioning data alone is insufficient. You need to ensure tenants can only access their own resources.

Silo Model

IMG_6239

For dedicated resources, attach IAM execution roles to AgentCore Runtime that only allow access to tenant-specific resources.

Pool Model

IMG_6240

For shared resources, use ABAC (Attribute Based Access Control) roles.

IAM permission separation example using TenantId
{
  "Condition": {
    "ForAllValues:StringEquals": {
      "dynamodb:LeadingKeys": ["${aws:PrincipalTag/TenantId}"]
    }
  }
}

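Retrieve the tenant context from the JWT token, then use STS (Security Token Service) to assume this role, passing the Tenant ID as a session tag so that ${aws:PrincipalTag/TenantId} resolves to the calling tenant. The resulting temporary credentials can only access that tenant's data.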
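
The role assumption step might look like the following sketch. It takes an STS client as a parameter (for example one created with `boto3.client("sts")`) and passes the Tenant ID as a session tag named `TenantId`, matching the principal tag in the policy above. The function name and the session naming scheme are assumptions. The role's trust policy would also need to allow `sts:TagSession` for the tag to be accepted.

```python
def assume_tenant_role(sts_client, role_arn: str, tenant_id: str) -> dict:
    """Assume the shared ABAC role with TenantId as a session tag; return temporary credentials."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        # Hypothetical naming scheme; session names are limited to 64 characters.
        RoleSessionName=f"tenant-{tenant_id}"[:64],
        # The session tag is what ${aws:PrincipalTag/TenantId} resolves to in the policy.
        Tags=[{"Key": "TenantId", "Value": tenant_id}],
        DurationSeconds=900,  # shortest allowed lifetime, to limit exposure
    )
    return response["Credentials"]
```

The returned credentials can then be used to build tenant-scoped clients, so even a bug in the query logic cannot reach another tenant's partition.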

Observability

IMG_6244

The speaker emphasized that SaaS cannot be discussed without observability.

Tenant ID is Not a First-Class Concept

An important point is that Tenant ID is not a first-class concept in CloudWatch or third-party observability solutions because not all solutions are SaaS.

Therefore, when building multi-tenant solutions, you need to prepare SaaS-specific custom metrics and custom dashboards.

Understanding Operational Costs for Individual Tenants

In SaaS, it's important to understand the operational costs for individual tenants. Especially in AI agent applications, inference costs are the main expense.

Implement custom metrics like the following to measure input/output token counts:

IMG_6246

Collected logs can be aggregated by tenant using CloudWatch Logs Insights.
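
The slide's code is not reproduced in this report. As one possible approach, the sketch below writes token counts as a log line in CloudWatch Embedded Metric Format with the Tenant ID attached. The namespace `SaaS/AgentUsage`, the field names, and the query string are assumptions, not the session's implementation.

```python
import json
import time


def token_usage_record(tenant_id, tier, input_tokens, output_tokens, timestamp_ms=None):
    """Build one Embedded Metric Format log line carrying per-tenant token counts."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": "SaaS/AgentUsage",  # hypothetical namespace
                    "Dimensions": [["TenantId", "Tier"]],
                    "Metrics": [
                        {"Name": "InputTokens", "Unit": "Count"},
                        {"Name": "OutputTokens", "Unit": "Count"},
                    ],
                }
            ],
        },
        "TenantId": tenant_id,
        "Tier": tier,
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
    }
    return json.dumps(record)


# A Logs Insights query that sums the logged counts per tenant (field names as above).
USAGE_BY_TENANT_QUERY = (
    "stats sum(InputTokens) as input_tokens, "
    "sum(OutputTokens) as output_tokens by TenantId"
)
```

Printing the returned string from the agent process is enough when stdout is shipped to CloudWatch Logs. The same fields then serve both as metrics and as queryable log data.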

Cost Visualization

By storing collected data in DynamoDB and visualizing it with BI tools like QuickSight, dashboards like the following can be created:

IMG_6247

  • Cost per Tenant: Cost for each tenant
  • Feature Cost per Tier: Cost by tier

This allows business users to understand the operational costs of tenants and set appropriate pricing.
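
The arithmetic behind such a dashboard is simple: multiply token counts by a unit price and group by tenant or tier. The sketch below assumes usage records shaped as plain dicts and takes the prices as parameters. The record fields and the function name are hypothetical, and no real model pricing is implied.

```python
from collections import defaultdict


def cost_by(records, key, input_price_per_1k, output_price_per_1k):
    """Sum inference cost per value of `key` (e.g. 'tenant_id' or 'tier')."""
    totals = defaultdict(float)
    for record in records:
        cost = (
            record["input_tokens"] / 1000 * input_price_per_1k
            + record["output_tokens"] / 1000 * output_price_per_1k
        )
        totals[record[key]] += cost
    return dict(totals)
```

Calling it with `key="tenant_id"` gives the cost per tenant, and `key="tier"` gives the cost by tier, which correspond to the two views listed above.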

Supplementary: For Those Who Want to See Actual Implementations

The content presented in the session is also available as a hands-on workshop. The sample code is published on GitHub, so looking through how it is structured should be educational. I also plan to take the workshop tomorrow to deepen my understanding.

https://github.com/aws-samples/sample-saas-multi-agents-workshop

Summary

This session provided comprehensive patterns for building multi-tenant SaaS agents using Amazon Bedrock AgentCore.

Seeing how AgentCore can be used to address each SaaS challenge helped me organize my understanding and recognize which areas need careful attention!

I hope this article was helpful. Thank you for reading to the end!
