Building Serverless RAG with GenU (Generative AI Use Cases JP) v5.4 and S3 Vectors

2026.02.19
I'm Ishikawa from the Consulting Division of the Cloud Business Headquarters. By using S3 Vectors with the GenU (Generative AI Use Cases JP) v5.4 environment I set up recently, you can build a serverless RAG with dramatically lower running costs. In this article, I'll show how to use S3 Vectors, which became generally available (GA) at the end of last year, and AgentCore from GenU.
https://dev.classmethod.jp/articles/20260216-genu54-for-claude46opus/
Prerequisites
Vector store: Amazon S3 Vectors
RAG: Amazon Bedrock Knowledge Base
S3 Vectors and Bedrock Knowledge Base are in the Tokyo Region (ap-northeast-1)
Embedding model: Amazon Titan Text Embeddings V2
Create and deploy Amazon Bedrock AgentCore that calls Amazon Bedrock Knowledge Base
Deploy GenU (Generative AI Use Cases JP) that calls Amazon Bedrock Knowledge Base
Resource Configuration
Preparing the S3 Source Bucket
Place the documents that will be stored in RAG (Amazon S3 Vectors) in an S3 bucket.
Creating an S3 Vector Bucket
Create a Vector Bucket to store vector data.

Setting item	Value
Vector bucket name	`cm-rag-vector-bucket`
Encryption	SSE-S3 (default)

Creating an S3 Vector Index
Create a vector index within the Vector Bucket.

Setting item	Value
Vector index name	`cm-rag-index`
Dimension	`512`
Distance metric	`Cosine`
Non-filterable metadata keys	`AMAZON_BEDROCK_TEXT`, `AMAZON_BEDROCK_METADATA`
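For readers who prefer scripting over the console, the bucket and index settings above could be expressed roughly as follows. This is a minimal sketch, assuming the boto3 `s3vectors` client and its `create_vector_bucket` and `create_index` operations. The helper names `build_index_request` and `create_vector_store` are hypothetical, and the client is passed in as an argument so the request can be inspected without touching AWS.

```python
from typing import Any

# Metadata keys written by Bedrock Knowledge Base; kept non-filterable as in the settings above.
NON_FILTERABLE_KEYS = ["AMAZON_BEDROCK_TEXT", "AMAZON_BEDROCK_METADATA"]


def build_index_request(bucket: str, index: str, dimension: int = 512) -> dict[str, Any]:
    """Build keyword arguments for s3vectors create_index (hypothetical helper)."""
    return {
        "vectorBucketName": bucket,
        "indexName": index,
        "dataType": "float32",
        "dimension": dimension,  # must match the embedding model's output size
        "distanceMetric": "cosine",
        "metadataConfiguration": {"nonFilterableMetadataKeys": NON_FILTERABLE_KEYS},
    }


def create_vector_store(client: Any, bucket: str, index: str) -> None:
    """Create the vector bucket first, then the index inside it."""
    # No encryption settings are passed, so the service default applies.
    client.create_vector_bucket(vectorBucketName=bucket)
    client.create_index(**build_index_request(bucket, index))
```

With real credentials, this would be called as `create_vector_store(boto3.client("s3vectors", region_name="ap-northeast-1"), "cm-rag-vector-bucket", "cm-rag-index")`.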

Creating an IAM Service Role
Create a service role (cm-japanese-rag-kb-role) for Bedrock Knowledge Base to access various resources.
4-1. Setting up the Trust Policy
Trusted entity type: Select Custom trust policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "<AccountID>"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:bedrock:<Region>:<AccountID>:knowledge-base/*"
        }
      }
    }
  ]
}
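The `<AccountID>` and `<Region>` placeholders have to be filled in before the policy is pasted into the console. The following is a minimal sketch that renders the same trust policy from those two values. `build_trust_policy` is a hypothetical helper name, and the account ID used in the example is a dummy value.

```python
import json


def build_trust_policy(account_id: str, region: str) -> dict:
    """Return the Bedrock Knowledge Base trust policy with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    # Only this account's knowledge bases may assume the role
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {
                        "aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/*"
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Dummy account ID, for illustration only
    print(json.dumps(build_trust_policy("123456789012", "ap-northeast-1"), indent=2))
```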
4-2. Creating and Attaching Permission Policies
Create and attach the following inline policy (cm-japanese-rag-kb-policy).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3SourceReadAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::cm-rag-source-bucket",
        "arn:aws:s3:::cm-rag-source-bucket/*"
      ]
    },
    {
      "Sid": "S3VectorsAccess",
      "Effect": "Allow",
      "Action": [
        "s3vectors:PutVectors",
        "s3vectors:GetVectors",
        "s3vectors:DeleteVectors",
        "s3vectors:QueryVectors",
        "s3vectors:GetIndex"
      ],
      "Resource": "arn:aws:s3vectors:<Region>:<AccountID>:bucket/cm-rag-vector-bucket/index/cm-rag-index"
    },
    {
      "Sid": "BedrockModelInvocation",
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "arn:aws:bedrock:<Region>::foundation-model/amazon.titan-embed-text-v2:0"
    }
  ]
}
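The role and its inline policy can also be created through the IAM API instead of the console. This is a minimal sketch, assuming boto3's IAM `create_role` and `put_role_policy` calls. `create_kb_role` is a hypothetical helper, and both policy documents are passed in as dicts (for example, the JSON shown above loaded with `json.loads` after replacing the placeholders).

```python
import json
from typing import Any


def create_kb_role(
    iam: Any,
    trust_policy: dict,
    permission_policy: dict,
    role_name: str = "cm-japanese-rag-kb-role",
    policy_name: str = "cm-japanese-rag-kb-policy",
) -> str:
    """Create the service role, attach the inline policy, and return the role ARN."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )
    # Inline policies are attached by name directly to the role
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(permission_policy),
    )
    return role["Role"]["Arn"]
```

With real credentials, the first argument would be `boto3.client("iam")`.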
Creating Amazon Bedrock Knowledge Base + Data Source
Here, we'll create an Amazon Bedrock Knowledge Base.
Step 1: Knowledge Base Details
Create an Amazon Bedrock Knowledge Base and data source.

Setting item	Value
Knowledge base name	`cm-japanese-rag-kb`
Knowledge base description	`Japanese text RAG with S3 Vectors`
IAM permissions	`cm-japanese-rag-kb-role`

Step 2: Setting up Data Source and Chunking Strategy
Set up the data source and chunking strategy. Semantic chunking can keep meaningful segments together while respecting sentence boundaries. However, it makes additional model calls during processing, so data ingestion incurs extra cost.
Semantic chunking determines split positions based on semantic connections in the text, not just character count. Making this determination requires calling a model (an embedding model).
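To make the idea concrete, here is a toy illustration of splitting on semantic distance. It is not Bedrock's implementation: the bag-of-words `embed` function stands in for a real embedding model, and all function names are hypothetical. It only shows the general pattern of comparing neighbouring sentences and breaking where the distance reaches a percentile threshold.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy embedding: word counts (a real system would call an embedding model here)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)


def semantic_chunks(sentences: list[str], percentile: float = 90) -> list[list[str]]:
    """Start a new chunk wherever the neighbour distance reaches the given percentile."""
    if len(sentences) < 2:
        return [sentences] if sentences else []
    vectors = [embed(s) for s in sentences]
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    ranked = sorted(distances)
    threshold = ranked[min(len(ranked) - 1, int(len(ranked) * percentile / 100))]
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance >= threshold and distance > 0:
            chunks.append(current)  # topic shift: close the current chunk
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


if __name__ == "__main__":
    text = [
        "Paid leave is granted yearly.",
        "Paid leave requests need approval.",
        "Travel expenses are reimbursed monthly.",
        "Travel expenses need receipts.",
    ]
    for chunk in semantic_chunks(text):
        print(chunk)
```

In this example the only break falls between the leave sentences and the travel-expense sentences, because that pair shares no words. The extra ingestion cost mentioned above comes from the fact that a real system has to call a model for every sentence it compares.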


Setting item	Value
Data source name	`cm-japanese-rag-kb-s3-source`
S3 URI	`s3://cm-rag-source-bucket/documents/`

Select Semantic chunking for Chunking strategy.

Setting item	Value
Maximum buffer size for comparing sentence groups	`1`
Maximum token size of chunks	`15`
Breakpoint threshold for sentence group similarity	`90`

Recommended values are based on the AWS official blog and should be adjusted according to the document type.
https://aws.amazon.com/jp/blogs/machine-learning/amazon-bedrock-knowledge-bases-now-supports-advanced-parsing-chunking-and-query-reformulation-giving-greater-control-of-accuracy-in-rag-based-applications/
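The console settings in this step map onto the `vectorIngestionConfiguration` of the `bedrock-agent` `create_data_source` API. The following is a minimal sketch of that mapping, assuming the request field names as I understand them from the API reference. `build_data_source_request` is a hypothetical helper, and its default values simply mirror the settings used in this article.

```python
from typing import Any


def build_data_source_request(
    knowledge_base_id: str,
    name: str = "cm-japanese-rag-kb-s3-source",
    bucket_arn: str = "arn:aws:s3:::cm-rag-source-bucket",
    prefix: str = "documents/",
    buffer_size: int = 1,
    max_tokens: int = 15,
    breakpoint_threshold: int = 90,
) -> dict[str, Any]:
    """Build keyword arguments for bedrock-agent create_data_source (hypothetical helper)."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn, "inclusionPrefixes": [prefix]},
        },
        "vectorIngestionConfiguration": {
            "chunkingConfiguration": {
                "chunkingStrategy": "SEMANTIC",
                "semanticChunkingConfiguration": {
                    "bufferSize": buffer_size,  # sentences compared around each candidate split
                    "maxTokens": max_tokens,  # upper bound on chunk size
                    "breakpointPercentileThreshold": breakpoint_threshold,
                },
            }
        },
    }
```

With real credentials, the request would be sent as `boto3.client("bedrock-agent").create_data_source(**build_data_source_request("<Knowledge Base ID>"))`.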
Step 3: Setting up the Embedding Model and Vector Store
Embedding model

Setting item	Value
Embeddings model	Amazon Titan Text Embeddings V2
Vector dimensions	512

Note: Since the default dimension for Titan v2 is 1024, be sure to explicitly select 512. If it doesn't match the Vector Index dimension (512), you'll get a "Query vector contains invalid values or is invalid for this index" error.
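A quick way to catch a dimension mismatch before it surfaces as a query error is to request an embedding directly and compare its length with the index dimension. This is a minimal sketch, assuming the Titan Text Embeddings V2 request fields `inputText`, `dimensions`, and `normalize`. The helper names are hypothetical.

```python
import json

MODEL_ID = "amazon.titan-embed-text-v2:0"
SUPPORTED_DIMENSIONS = (256, 512, 1024)  # output sizes offered by Titan Text Embeddings V2


def build_embedding_body(text: str, dimensions: int = 512) -> str:
    """Build the JSON request body, rejecting sizes the model does not offer."""
    if dimensions not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"dimensions must be one of {SUPPORTED_DIMENSIONS}, got {dimensions}")
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})


def check_dimension(embedding: list[float], index_dimension: int = 512) -> None:
    """Raise if the embedding length differs from the vector index dimension."""
    if len(embedding) != index_dimension:
        raise ValueError(
            f"embedding has {len(embedding)} values but the index expects {index_dimension}"
        )
```

With real credentials, the body would be sent with `boto3.client("bedrock-runtime").invoke_model(modelId=MODEL_ID, body=build_embedding_body("..."))`, and the vector read back with `json.loads(response["body"].read())["embedding"]` before passing it to `check_dimension`.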
Vector database

Setting item	Value
Vector store creation method	Create from existing vector store
Vector store	S3 Vectors
S3 Vector bucket	Select `cm-rag-vector-bucket`
S3 Vector index	Select `cm-rag-index`
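The same knowledge base settings can be sketched as a `bedrock-agent` `create_knowledge_base` request. Treat this as an assumption-laden outline rather than a verified call: the `S3_VECTORS` storage type and the `s3VectorsConfiguration` field names reflect my understanding of the API and should be checked against the current reference. `build_knowledge_base_request` is a hypothetical helper.

```python
from typing import Any


def build_knowledge_base_request(
    role_arn: str,
    vector_bucket_arn: str,
    index_arn: str,
    region: str = "ap-northeast-1",
    dimensions: int = 512,
) -> dict[str, Any]:
    """Build keyword arguments for bedrock-agent create_knowledge_base (hypothetical helper)."""
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/amazon.titan-embed-text-v2:0"
    return {
        "name": "cm-japanese-rag-kb",
        "description": "Japanese text RAG with S3 Vectors",
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": model_arn,
                "embeddingModelConfiguration": {
                    # Must match the dimension of the S3 vector index
                    "bedrockEmbeddingModelConfiguration": {"dimensions": dimensions}
                },
            },
        },
        "storageConfiguration": {
            "type": "S3_VECTORS",  # assumption: storage type name for S3 Vectors
            "s3VectorsConfiguration": {
                "vectorBucketArn": vector_bucket_arn,
                "indexArn": index_arn,
            },
        },
    }
```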

Step 4: Review and Create
Review your settings and click Create knowledge base.
Syncing the Data Source (Ingestion)
Select cm-japanese-rag-kb-s3-source in the Data Sources section
Click Sync
Wait until the status becomes Available
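The sync can also be started and monitored from code. This is a minimal sketch, assuming the `bedrock-agent` operations `start_ingestion_job` and `get_ingestion_job`. Note that the API reports the status of the ingestion job (for example `COMPLETE`), which is separate from the data source status shown in the console. `sync_data_source` is a hypothetical helper, and the `sleep` parameter exists only so the polling loop can be tested without waiting.

```python
import time
from typing import Any, Callable

TERMINAL_STATES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(
    client: Any,
    knowledge_base_id: str,
    data_source_id: str,
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start an ingestion job and poll until it reaches a terminal state."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]
    while job["status"] not in TERMINAL_STATES:
        sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]
```

With real credentials, the first argument would be `boto3.client("bedrock-agent", region_name="ap-northeast-1")`.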
Verifying Amazon Bedrock Knowledge Base
Select the knowledge base (cm-japanese-rag-kb), open Test knowledge base, specify a model, enter a prompt, and confirm that a response is returned.
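The same check can be scripted. This is a minimal sketch, assuming the `bedrock-agent-runtime` `retrieve_and_generate` operation. `ask_knowledge_base` is a hypothetical helper, and the knowledge base ID and model ARN are values you would supply yourself.

```python
from typing import Any


def ask_knowledge_base(
    client: Any, knowledge_base_id: str, model_arn: str, question: str
) -> tuple[str, list[str]]:
    """Return the generated answer and the distinct S3 URIs it cites."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    answer = response["output"]["text"]
    sources: list[str] = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)  # keep first-seen order, skip duplicates
    return answer, sources
```

With real credentials, the first argument would be `boto3.client("bedrock-agent-runtime", region_name="ap-northeast-1")`. An empty source list suggests the answer was not grounded in the ingested documents.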
Now that we've created the RAG with S3 Vectors, let's incorporate it into GenU.
Creating and Deploying Amazon Bedrock AgentCore
Create and deploy an Amazon Bedrock AgentCore agent that calls the Amazon Bedrock Knowledge Base.
Preparing the AgentCore Program
Here, we'll create a Strands agent that calls the Knowledge Base we created earlier. This agent retrieves information about "company regulations".
Create agent.py in the agentcore_s3vectors directory. Replace KNOWLEDGE_BASE_ID with the ID of the Knowledge Base you created earlier.
Note: Strands Agents and AgentCore deserve an explanation of their own, but I'll omit it here because it would make this article too long.
agentcore_s3vectors/agent.py
import os
import boto3
from botocore.config import Config
from strands import Agent, tool
from strands.models import BedrockModel
from bedrock_agentcore.runtime import BedrockAgentCoreApp

# Knowledge Base settings
KNOWLEDGE_BASE_ID = os.getenv("KNOWLEDGE_BASE_ID", "<Enter your KNOWLEDGE_BASE_ID here>")
KNOWLEDGE_BASE_REGION = os.getenv("KNOWLEDGE_BASE_REGION", "ap-northeast-1")

model_id = os.getenv("BEDROCK_MODEL_ID", "global.anthropic.claude-haiku-4-5-20251001-v1:0")
model = BedrockModel(
    model_id=model_id,
    max_tokens=4096,
    temperature=0.1,
    region_name="ap-northeast-1"
)

app = BedrockAgentCoreApp()

@tool
def get_rag(query: str, number_of_results: int = 5) -> str:
    """
    Search for information about company regulations, work rules, benefits, procedure guidelines.
    Always call this function when employees ask about "vacation system", "expense reimbursement", "service", 
    "childcare/nursing care leave", "condolence money", "wages", "travel expenses", "condolence leave", etc.,
    regarding internal rules or public systems.
    Provide accurate company regulations as answers based on the retrieved information.

    Args:
        query: Search query (user's question or content to search for)
        number_of_results: Maximum number of results to retrieve (default: 5)

    Returns:
        Text of search results
    """
    client = boto3.client(
        "bedrock-agent-runtime",
        region_name=KNOWLEDGE_BASE_REGION,
        config=Config(retries={"mode": "standard", "total_max_attempts": 3})
    )

    try:
        response = client.retrieve(
            knowledgeBaseId=KNOWLEDGE_BASE_ID,
            retrievalQuery={'text': query},
            retrievalConfiguration={
                'vectorSearchConfiguration': {
                    'numberOfResults': number_of_results
                }
            }
        )

        results = response.get('retrievalResults', [])
        if not results:
            return "No relevant information found."

        # Format results
        output_lines = []
        for i, result in enumerate(results, 1):
            text = result.get('content', {}).get('text', '')
            score = result.get('score', 0.0)
            source = result.get('location', {}).get('s3Location', {}).get('uri', 'Unknown')
            output_lines.append(f"--- Result {i} (Score: {score:.2f}) ---")
            output_lines.append(f"Source: {source}")
            output_lines.append(text)
            output_lines.append("")

        return "\n".join(output_lines)

    except Exception as e:
        return f"Failed to search Knowledge Base: {str(e)}"

@app.entrypoint
async def entrypoint(payload):
    agent = Agent(model=model, tools=[get_rag])
    message = payload.get("prompt", "")
    # return {"result": agent(message).message}
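    # Stream agent events back to the caller as they arrive
    # (the loop variable is named "event" so it does not shadow the prompt in "message")
    async for event in agent.stream_async(message):
        if "event" in event:
            yield event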

if __name__ == "__main__":
    app.run()
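Before deploying, the agent can be exercised locally by running `python agent.py` and posting a prompt to it. The following is a minimal sketch under two assumptions that should be verified against your installed version: that `app.run()` serves `POST /invocations` on `localhost:8080`, and that the streamed response arrives as server-sent-event lines prefixed with `data:`. All helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

LOCAL_URL = "http://localhost:8080/invocations"  # assumption: default local address of app.run()


def build_request(prompt: str, url: str = LOCAL_URL) -> urllib.request.Request:
    """Build a POST request carrying the same payload shape that entrypoint() reads."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_sse_line(line: bytes) -> Optional[Any]:
    """Decode one 'data: ...' line; return None for blank or non-data lines."""
    text = line.decode("utf-8").strip()
    if not text.startswith("data:"):
        return None
    payload = text[len("data:"):].strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return payload  # fall back to the raw string if it is not JSON


def invoke_local(prompt: str) -> None:
    """Send the prompt to the locally running agent and print each streamed event."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        for line in response:
            event = parse_sse_line(line)
            if event is not None:
                print(event)
```

For example, `invoke_local("How many days of condolence leave do I get?")` would print the events as the agent streams them.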
Installing Dependent Modules
Create requirements.txt in the agentcore_s3vectors directory.
agentcore_s3vectors/requirements.txt
strands-agents
strands-agents-tools
bedrock-agentcore
bedrock-agentcore-starter-toolkit
Install the required modules.
pip install -r requirements.txt
Agent Deployment Configuration
Run agentcore configure --entrypoint agent.py to create a configuration file (agentcore_s3vectors/bedrock_agentcore/agent/.bedrock_agentcore.yaml). This time, I accepted the default settings at every prompt, so the file was generated automatically.
agentcore_s3vectors % agentcore configure --entrypoint agent.py
Configuring Bedrock AgentCore...
✓ Using file: agent.py

🏷️  Inferred agent name: agent
Press Enter to use this name, or type a different one (alphanumeric without '-')
Agent name [agent]:
✓ Using agent name: agent

🔍 Detected dependency file: requirements.txt
Press Enter to use this file, or type a different path (use Tab for autocomplete):
Path or Press Enter to use detected dependency file: requirements.txt
✓ Using requirements file: requirements.txt

🚀 Deployment Configuration
Select deployment type:
  1. Direct Code Deploy (recommended) - Python only, no Docker required
  2. Container - For custom runtimes or complex dependencies
Choice [1]: 1

Select Python runtime version:
  1. PYTHON_3_10
  2. PYTHON_3_11
  3. PYTHON_3_12
  4. PYTHON_3_13
Note: Current Python 3.14 not supported, using python3.11
Choice [2]: 2
✓ Deployment type: Direct Code Deploy (python.3.11)

🔐 Execution Role
Press Enter to auto-create execution role, or provide execution role ARN/name to use existing
Execution role ARN/name (or press Enter to auto-create):
✓ Will auto-create execution role

🏗️  S3 Bucket
Press Enter to auto-create S3 bucket, or provide S3 URI/path to use existing
S3 URI/path (or press Enter to auto-create):
✓ Will auto-create S3 bucket

🔐 Authorization Configuration
By default, Bedrock AgentCore uses IAM authorization.
Configure OAuth authorizer instead? (yes/no) [no]:
✓ Using default IAM authorization

🔒 Request Header Allowlist
Configure which request headers are allowed to pass through to your agent.
Common headers: Authorization, X-Amzn-Bedrock-AgentCore-Runtime-Custom-*
Configure request header allowlist? (yes/no) [no]:
✓ Using default request header configuration
Configuring BedrockAgentCore agent: agent

Memory Configuration
Tip: Use --disable-memory flag to skip memory entirely

✅ MemoryManager initialized for region: ap-northeast-1
No existing memory resources found in your account

Options:
  • Press Enter to create new memory
  • Type 's' to skip memory setup

Your choice:
✓ Short-term memory will be enabled (default)
  • Stores conversations within sessions
  • Provides immediate context recall

Optional: Long-term memory
  • Extracts user preferences across sessions
  • Remembers facts and patterns
  • Creates session summaries
  • Note: Takes 120-180 seconds to process

Enable long-term memory? (yes/no) [no]:
✓ Using short-term memory only
Will create new memory with mode: STM_ONLY
Memory configuration: Short-term memory only
Network mode: PUBLIC
Setting 'agent' as default agent
Deploy Agent to AgentCore
Run agentcore launch to deploy the agent to AgentCore.
agentcore_s3vectors % agentcore launch
🚀 Launching Bedrock AgentCore (cloud mode - RECOMMENDED)...
   • Deploy Python code directly to runtime
   • No Docker required (DEFAULT behavior)
   • Production-ready deployment

💡 Deployment options:
   • agentcore deploy                → Cloud (current)
   • agentcore deploy --local        → Local development

Launching with direct_code_deploy deployment for agent 'agent'
Creating memory resource for agent: agent
✅ MemoryManager initialized for region: ap-northeast-1
⠸ Launching Bedrock AgentCore...Creating new STM-only memory...
⏳ Creating memory resource (this may take 30-180 seconds)...
⠇ Launching Bedrock AgentCore...Created memory: agent_mem-Jh9uGT50Zu
Created memory agent_mem-Jh9uGT50Zu, waiting for ACTIVE status...
Waiting for memory agent_mem-Jh9uGT50Zu to return to ACTIVE state and strategies to reach terminal states...
⠋ Launching Bedrock AgentCore...[23:31:07]    ⏳ Memory: CREATING (10s elapsed)                                                                                                                                            manager.py:1029
⠏ Launching Bedrock AgentCore...[23:31:17]    ⏳ Memory: CREATING (20s elapsed)                                                                                                                                            manager.py:1029
:                                                                                   :                                                                                   :                                                                                   

⠹ Launching Bedrock AgentCore...[23:33:21]    ⏳ Memory: CREATING (144s elapsed)                                                                                                                                           manager.py:1029
⠙ Launching Bedrock AgentCore...[23:33:32]    ⏳ Memory: CREATING (155s elapsed)                                                                                                                                           manager.py:1029
⠦ Launching Bedrock AgentCore...Memory agent_mem-Jh9uGT50Zu is ACTIVE and all strategies are in terminal states (took 160 seconds)
[23:33:37]    ✅ Memory is ACTIVE (took 160s)                                                                                                                                              manager.py:1043
⠇ Launching Bedrock AgentCore...ObservabilityDeliveryManager initialized for region: ap-northeast-1, account: 123456789012
⠋ Launching Bedrock AgentCore...Created log group: /aws/vendedlogs/bedrock-agentcore/memory/APPLICATION_LOGS/agent_mem-Jh9uGT50Zu
⠋ Launching Bedrock AgentCore...✅ Logs delivery enabled for memory/agent_mem-Jh9uGT50Zu
⠧ Launching Bedrock AgentCore...Failed to enable observability for memory/agent_mem-Jh9uGT50Zu: ValidationException - X-Ray Delivery Destination is supported with CloudWatch Logs as a Trace Segment Destination. Please enable the CloudWatch Logs destination for your traces using the UpdateTraceSegmentDestination API (https://docs.aws.amazon.com/xray/latest/api/API_UpdateTraceSegmentDestination.html)
⚠️ Failed to enable observability: ValidationException: X-Ray Delivery Destination is supported with CloudWatch Logs as a Trace Segment Destination. Please enable the CloudWatch Logs destination for your
traces using the UpdateTraceSegmentDestination API (https://docs.aws.amazon.com/xray/latest/api/API_UpdateTraceSegmentDestination.html)
Memory created and active: agent_mem-Jh9uGT50Zu
Ensuring execution role...
Getting or creating execution role for agent: agent
Using AWS region: ap-northeast-1, account ID: 123456789012
Role name: AmazonBedrockAgentCoreSDKRuntime-ap-northeast-1-d4f0bc5a29
⠏ Launching Bedrock AgentCore...Role doesn't exist, creating new execution role: AmazonBedrockAgentCoreSDKRuntime-ap-northeast-1-d4f0bc5a29
Starting execution role creation process for agent: agent
✓ Role creating: AmazonBedrockAgentCoreSDKRuntime-ap-northeast-1-d4f0bc5a29
Creating IAM role: AmazonBedrockAgentCoreSDKRuntime-ap-northeast-1-d4f0bc5a29
⠹ Launching Bedrock AgentCore...✓ Role created: arn:aws:iam::123456789012:role/AmazonBedrockAgentCoreSDKRuntime-ap-northeast-1-d4f0bc5a29
⠴ Launching Bedrock AgentCore...✓ Execution policy attached: BedrockAgentCoreRuntimeExecutionPolicy-agent
Role creation complete and ready for use with Bedrock AgentCore
Execution role available: arn:aws:iam::123456789012:role/AmazonBedrockAgentCoreSDKRuntime-ap-northeast-1-d4f0bc5a29
Using entrypoint: agent.py (relative to /Users/ishikawa.satoru/workspaces/cc/rd/generative-ai-use-cases/agentcore_s3vectors)
Creating deployment package...
📦 No cached dependencies found, will build
Building dependencies (this may take a minute)...
Building dependencies for Linux ARM64 Runtime (manylinux2014_aarch64)
Installing dependencies with uv for aarch64-manylinux2014 (cross-compiling for Linux ARM64)...
⠴ Launching Bedrock AgentCore...✓ Dependencies installed with uv
Creating dependencies.zip...
⠏ Launching Bedrock AgentCore...✓ Dependencies cached
Packaging source code...
⠋ Launching Bedrock AgentCore...Creating deployment package...
⠋ Launching Bedrock AgentCore...✓ Deployment package ready: 44.26 MB
Getting or creating S3 bucket for agent: agent
Bucket doesn't exist, creating new S3 bucket: bedrock-agentcore-codebuild-sources-123456789012-ap-northeast-1
✅ Created S3 bucket: bedrock-agentcore-codebuild-sources-123456789012-ap-northeast-1
⠏ Launching Bedrock AgentCore...S3 bucket available: s3://bedrock-agentcore-codebuild-sources-123456789012-ap-northeast-1
Uploading deployment package to S3...
Uploading to s3://bedrock-agentcore-codebuild-sources-123456789012-ap-northeast-1/agent/deployment.zip...
⠴ Launching Bedrock AgentCore...✓ Deployment package uploaded: s3://bedrock-agentcore-codebuild-sources-123456789012-ap-northeast-1/agent/deployment.zip
Deploying to Bedrock AgentCore Runtime...
⠸ Launching Bedrock AgentCore...✅ Agent created/updated: arn:aws:bedrock-agentcore:ap-northeast-1:123456789012:runtime/agent-s921Yn5WVW
Waiting for agent endpoint to be ready...
⠏ Launching Bedrock AgentCore...Enabling observability...
⠹ Launching Bedrock AgentCore...Created/updated CloudWatch Logs resource policy
⠼ Launching Bedrock AgentCore...Configured X-Ray trace segment destination to CloudWatch Logs
X-Ray indexing rule already configured
Transaction Search configured: resource_policy, trace_destination
🔍 GenAI Observability Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=ap-northeast-1#gen-ai-observability/agent-core
✅ Deployment completed successfully - Agent: arn:aws:bedrock-agentcore:ap-northeast-1:123456789012:runtime/agent-s921Yn5WVW
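Once deployment reports success, the runtime can be smoke-tested from code before wiring it into GenU. This is a minimal sketch, assuming the boto3 `bedrock-agentcore` client and its `invoke_agent_runtime` operation. `invoke_agent` is a hypothetical helper, and the session ID is a random UUID because, as far as I know, the service expects session IDs of at least 33 characters.

```python
import json
import uuid
from typing import Any


def invoke_agent(client: Any, runtime_arn: str, prompt: str) -> Any:
    """Send one prompt to the deployed AgentCore runtime and return the raw response."""
    session_id = str(uuid.uuid4())  # 36 characters, unique per conversation
    return client.invoke_agent_runtime(
        agentRuntimeArn=runtime_arn,
        runtimeSessionId=session_id,
        # Same payload shape that entrypoint() in agent.py reads
        payload=json.dumps({"prompt": prompt}).encode("utf-8"),
    )
```

With real credentials, the first argument would be `boto3.client("bedrock-agentcore", region_name="ap-northeast-1")`, and `runtime_arn` is the ARN printed on the last line of the deployment output.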
Call Amazon Bedrock AgentCore from GenU
Edit parameter.ts
Edit generative-ai-use-cases/packages/cdk/parameter.ts and add or override agentCoreRegion and agentCoreExternalRuntimes in the envs section so that GenU can call Amazon Bedrock AgentCore. Replace the AgentCore ARN with the ARN displayed on the last line of the AgentCore deployment output.
✅ Deployment completed successfully - Agent: arn:aws:bedrock-agentcore:ap-northeast-1:123456789012:runtime/agent-s921Yn5WVW
  '': {
    selfSignUpEnabled: false,
    agentCoreRegion: 'ap-northeast-1',
    agentCoreExternalRuntimes: [
      {
        name: 'Company Policies (S3 Vectors RAG)',
        arn: '<Replace with your AgentCore ARN>',
        description: 'S3 Vectors for RAG'
      }
    ],
  },

After configuration, deploy again.
npm run cdk:deploy
Testing
[AgentCore Experimental] has been added to the left navigation. Clicking it shows "Company Policies (S3 Vectors RAG)".
For a prompt like "I got married. Do I get any money?", which would be difficult to match with keywords alone, semantic search finds the relevant passages and an answer is generated based on company information.
Conclusion
In this article, I introduced how to build a cost-effective, serverless RAG by combining GenU v5.4 and Amazon S3 Vectors.
Compared to conventional options such as Amazon Kendra or OpenSearch Serverless, S3 Vectors provides vector search in a simple, serverless configuration, which significantly reduces running costs. In addition, Amazon Bedrock AgentCore lets GenU call the Bedrock Knowledge Base seamlessly, which is very practical.
Getting appropriate answers to queries such as "I got married. Do I get any money?", which are difficult to match by keywords, demonstrates the power of RAG with semantic search. I hope you can apply this to use cases tied directly to daily business operations, such as company policies and FAQs.