I tried Bedrock stream responses with API Gateway
On November 19, 2025, Amazon API Gateway was updated to support response streaming.
Until now, API Gateway imposed limits such as a 29-second integration timeout and a 10MB payload size. With this update, generated data can be streamed to clients in chunks, bypassing these limits.
I had the opportunity to test this functionality by building a Lambda function that utilizes Bedrock (Claude Haiku 4.5) stream responses and an API Gateway with streaming support using CloudFormation. I'd like to share my experience.
Test Environment
For this test, I deployed the following Lambda function and API Gateway using CloudFormation:
- Runtime: Node.js 20.x
- Model: Claude Haiku 4.5 (Bedrock)
- Feature: API Gateway Response Streaming
Lambda Function
- Instead of the usual `exports.handler = async ...` handler, I wrapped the handler in `awslambda.streamifyResponse`
- Wrote chunked data to the stream with `responseStream.write`
// ... (omitted) ...
exports.handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
  const httpResponseMetadata = { /* ... */ };

  // Required: attach HTTP metadata before writing any data
  responseStream = awslambda.HttpResponseStream.from(responseStream, httpResponseMetadata);

  // ... (Bedrock invocation process) ...

  for await (const chunk of response.body) {
    // ... (decode chunk.chunk.bytes into `text`; see the full template) ...
    // Important: write each piece of text to the stream as it arrives
    responseStream.write(text.delta.text);
  }
  responseStream.end();
});
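The decoding step elided in the excerpt can be sketched as a small helper. This mirrors the parsing done in the full template's Lambda code; the helper name `extractDelta` is mine:

```javascript
// Each event from InvokeModelWithResponseStream carries chunk.bytes:
// a JSON-encoded Anthropic streaming event. Only content_block_delta
// events contain generated text; other event types return null here.
function extractDelta(chunkBytes) {
  const event = JSON.parse(new TextDecoder().decode(chunkBytes));
  if (event.type === 'content_block_delta' && event.delta?.text) {
    return event.delta.text;
  }
  return null; // e.g. message_start, content_block_stop, message_delta
}
```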
API Gateway
- Specified `ResponseTransferMode: STREAM` in the integration
- Appended `/response-streaming-invocations` to the end of the integration `Uri`
StreamMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    RestApiId: !Ref RestApi
    ResourceId: !Ref StreamResource
    HttpMethod: POST
    AuthorizationType: NONE
    Integration:
      Type: AWS_PROXY
      IntegrationHttpMethod: POST
      Uri: !Sub 'arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${StreamingLambda.Arn}/response-streaming-invocations'
      ResponseTransferMode: STREAM
      TimeoutInMillis: 300000
CloudFormation
CloudFormation template used for testing
AWSTemplateFormatVersion: '2010-09-09'
Description: 'API Gateway Response Streaming with Lambda and Bedrock Haiku 4.5'

Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: BedrockInvokePolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - bedrock:InvokeModelWithResponseStream
                Resource: '*'

  StreamingLambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: BedrockStreamingFunction
      Runtime: nodejs20.x
      Handler: index.handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Timeout: 300
      Code:
        ZipFile: |
          const { BedrockRuntimeClient, InvokeModelWithResponseStreamCommand } = require("@aws-sdk/client-bedrock-runtime");
          const client = new BedrockRuntimeClient({ region: process.env.AWS_REGION });

          exports.handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
            const httpResponseMetadata = {
              statusCode: 200,
              headers: {
                'Content-Type': 'text/plain',
                'Access-Control-Allow-Origin': '*'
              }
            };
            responseStream = awslambda.HttpResponseStream.from(responseStream, httpResponseMetadata);

            try {
              const body = JSON.parse(event.body || '{}');
              const prompt = body.prompt || "こんにちは";

              const command = new InvokeModelWithResponseStreamCommand({
                modelId: "jp.anthropic.claude-haiku-4-5-20251001-v1:0",
                contentType: "application/json",
                accept: "application/json",
                body: JSON.stringify({
                  anthropic_version: "bedrock-2023-05-31",
                  max_tokens: 100000,
                  messages: [{
                    role: "user",
                    content: prompt
                  }]
                })
              });

              const response = await client.send(command);

              for await (const chunk of response.body) {
                if (chunk.chunk?.bytes) {
                  const text = JSON.parse(new TextDecoder().decode(chunk.chunk.bytes));
                  if (text.type === 'content_block_delta' && text.delta?.text) {
                    responseStream.write(text.delta.text);
                  }
                }
              }
              responseStream.end();
            } catch (error) {
              responseStream.write(`Error: ${error.message}`);
              responseStream.end();
            }
          });

  LambdaInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref StreamingLambda
      Action: lambda:InvokeFunction
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub 'arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${RestApi}/*'

  RestApi:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: BedrockStreamingAPI
      Description: API Gateway with response streaming for Bedrock

  StreamResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId: !Ref RestApi
      ParentId: !GetAtt RestApi.RootResourceId
      PathPart: stream

  StreamMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      RestApiId: !Ref RestApi
      ResourceId: !Ref StreamResource
      HttpMethod: POST
      AuthorizationType: NONE
      Integration:
        Type: AWS_PROXY
        IntegrationHttpMethod: POST
        Uri: !Sub 'arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${StreamingLambda.Arn}/response-streaming-invocations'
        ResponseTransferMode: STREAM
        TimeoutInMillis: 300000

  OptionsMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      RestApiId: !Ref RestApi
      ResourceId: !Ref StreamResource
      HttpMethod: OPTIONS
      AuthorizationType: NONE
      Integration:
        Type: MOCK
        IntegrationResponses:
          - StatusCode: 200
            ResponseParameters:
              method.response.header.Access-Control-Allow-Headers: "'Content-Type,X-Amz-Date,Authorization,X-Api-Key'"
              method.response.header.Access-Control-Allow-Methods: "'POST,OPTIONS'"
              method.response.header.Access-Control-Allow-Origin: "'*'"
            ResponseTemplates:
              application/json: ''
        RequestTemplates:
          application/json: '{"statusCode": 200}'
      MethodResponses:
        - StatusCode: 200
          ResponseParameters:
            method.response.header.Access-Control-Allow-Headers: true
            method.response.header.Access-Control-Allow-Methods: true
            method.response.header.Access-Control-Allow-Origin: true

  Deployment:
    Type: AWS::ApiGateway::Deployment
    DependsOn:
      - StreamMethod
      - OptionsMethod
    Properties:
      RestApiId: !Ref RestApi

  Stage:
    Type: AWS::ApiGateway::Stage
    Properties:
      RestApiId: !Ref RestApi
      DeploymentId: !Ref Deployment
      StageName: prod

Outputs:
  ApiEndpoint:
    Description: API Gateway endpoint URL
    Value: !Sub 'https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/prod/stream'
  TestCommand:
    Description: Test command using curl
    Value: !Sub |
      curl --no-buffer -X POST https://${RestApi}.execute-api.${AWS::Region}.amazonaws.com/prod/stream \
        -H "Content-Type: application/json" \
        -d '{"prompt":"日本のAWSリージョンについて教えてください"}'
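The template can be deployed with the AWS CLI. A minimal sketch, assuming it is saved as template.yaml (the stack name here is arbitrary); `--capabilities CAPABILITY_IAM` is needed because the template creates an IAM role:

```shell
# Deploy the stack; CAPABILITY_IAM is required because the
# template creates the Lambda execution role.
aws cloudformation deploy \
  --template-file template.yaml \
  --stack-name bedrock-streaming-test \
  --capabilities CAPABILITY_IAM

# Retrieve the endpoint URL from the stack outputs.
aws cloudformation describe-stacks \
  --stack-name bedrock-streaming-test \
  --query "Stacks[0].Outputs[?OutputKey=='ApiEndpoint'].OutputValue" \
  --output text
```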
Testing
Execution Command
I sent a request to the deployed API using the curl command.
To observe the streaming behavior, I added the --no-buffer option.
curl --no-buffer -X POST https://****.ap-northeast-1.amazonaws.com/prod/stream \
-H "Content-Type: application/json" \
-d '{"prompt":"日本のAWSリージョンについて、以下の観点から詳しく説明してください:1) 各リージョンの歴史と開設時期、2) 提供されているサービスの違い、3) アベイラビリティゾーンの構成、4) レイテンシーとパフォーマンス特性、5) 料金体系の違い、6) ディザスタリカバリー戦略での活用方法、7) コンプライアンスと規制対応、8) 今後の展望。各項目について具体例を交えて説明してください。"}'
Results
I received about 52KB of text streamed over approximately 1.5 minutes.
100 55068 0 54536 100 532 571 5 0:01:46 0:01:35 0:00:11 472
このような選択により、ビジネス要件に応じた最適なAWS利用戦略が実現できます。
(Tail of the streamed output: "With these choices, you can realize an optimal AWS usage strategy suited to your business requirements.")
- Completed in 95.500s
The request ran for 95 seconds, well past the 29-second integration timeout that would have disconnected it in previous versions of API Gateway, yet thanks to streaming the connection stayed open and I received the complete response.
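On the client side, the same streamed response can be consumed incrementally with fetch in a browser or Node.js. A minimal sketch (the helper name `readStream` and the endpoint placeholder are mine):

```javascript
// Minimal consumer for a streamed HTTP response body (a web
// ReadableStream of Uint8Array chunks). Calls onChunk with each
// decoded text fragment as it arrives, and returns the full text.
async function readStream(stream, onChunk) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  return full;
}

// Usage against the deployed endpoint (URL is a placeholder):
// const res = await fetch('https://****.execute-api.ap-northeast-1.amazonaws.com/prod/stream', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify({ prompt: '...' })
// });
// await readStream(res.body, (t) => process.stdout.write(t));
```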
Summary
Previously, when implementing responses from AI generation models that run for extended periods or large data downloads, we often had to choose configurations like ALB (Application Load Balancer) + Fargate to avoid API Gateway limitations.
With this update, we can handle these requirements while keeping a fully serverless configuration (API Gateway + Lambda). It's worth considering a Lambda-based implementation as a way to simplify such architectures.
