Benefits gained from migrating Claude invocations on Bedrock from boto3 to the Anthropic SDK

A practical record of migrating from boto3's Converse API to the Anthropic Python SDK's AnthropicBedrock client. This article introduces the benefits along with concrete code diffs, including simplified async support, improved type safety, removal of the $defs/$ref workaround, and enabling prompt caching.
lin-yuchen
2026.05.29
This page has been translated by machine translation. View original
Introduction
I run a Python backend that calls Claude (Sonnet/Haiku) on Amazon Bedrock. I originally used boto3's converse() / converse_stream() APIs, but after discovering that the official Anthropic Python SDK provides an AnthropicBedrock client, I migrated to it.
To get straight to the point: code volume decreased by approximately 30%, async support became natural, and Anthropic's latest features (such as prompt caching) became usable on Bedrock with very little extra work. I recommend this migration for projects that use Claude via Bedrock.
This article introduces what actually changed in the migration, and what benefits were gained, along with concrete code diffs.
Prerequisites & Environment

| Item | Value |
| --- | --- |
| Python | 3.13 |
| Framework | FastAPI (async) |
| Before migration | boto3 (converse / converse_stream API) |
| After migration | anthropic[bedrock] (Messages API) |
| Model | Claude Sonnet 4.5 / Claude Haiku |
| Authentication | EC2 instance role (IAM) |
| Region | ap-northeast-1 (Tokyo) |

Why I Migrated
When using the Bedrock Converse API with boto3, I faced the following challenges.
1. Complexity of Async Support
boto3 is a synchronous client. To use it with an async framework like FastAPI, I needed to wrap blocking calls with asyncio.to_thread().
# boto3 era: running synchronous functions in a thread pool
def _blocking_call() -> str:
    response = _bedrock_client.converse(
        modelId=settings.bedrock_haiku_model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        system=[{"text": system}],
        inferenceConfig={"maxTokens": 100, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"].strip()

return await asyncio.to_thread(_blocking_call)
Streaming was even more complex, requiring a thread-bridge using asyncio.Queue and loop.call_soon_threadsafe().
# boto3 era: thread bridge for streaming
loop = asyncio.get_running_loop()
queue: asyncio.Queue = asyncio.Queue()

def _blocking_stream() -> None:
    response = _bedrock_client.converse_stream(...)
    for event in response["stream"]:
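        if "contentBlockDelta" not in event:
            continue  # skip messageStart, contentBlockStart, metadata, etc.
        chunk = event["contentBlockDelta"].get("delta", {}).get("toolUse", {}).get("input", "")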
        loop.call_soon_threadsafe(queue.put_nowait, chunk)
    loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel

task = asyncio.create_task(asyncio.to_thread(_blocking_stream))
while True:
    item = await queue.get()
    if item is None:
        break
    yield item
2. Responses Are Untyped Dictionaries
boto3 responses are plain Python dictionaries. IDE completion didn't work, and defensive access was necessary.
# boto3: dictionary access — typos are hard to notice
for block in response["output"]["message"]["content"]:
    if block.get("toolUse", {}).get("name") == "classify_problem":
        return ProblemClassification(**block["toolUse"]["input"])
3. No Support for JSON Schema $defs/$ref
The Bedrock Converse API's toolConfig.inputSchema does not support JSON Schema $defs/$ref. Because Pydantic's model_json_schema() emits $ref entries for nested models, I needed a workaround function that expands every $ref before the schema is sent.
from typing import Any

# boto3 era: helper to manually inline-expand $defs/$ref
def _resolve_schema_refs(schema: dict) -> dict:
    defs = schema.pop("$defs", {})

    def _resolve(obj: Any) -> Any:
        if isinstance(obj, dict):
            if "$ref" in obj:
                ref_name = obj["$ref"].split("/")[-1]
                return _resolve(dict(defs[ref_name]))
            return {k: _resolve(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [_resolve(item) for item in obj]
        return obj

    return _resolve(schema)

# Call sites (3 places)
tool_schema = _resolve_schema_refs(ChatToolInput.model_json_schema())
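To make the effect of this workaround concrete, here is a minimal, self-contained sketch. The schema is hand-written to resemble what a nested Pydantic model might produce (the Ticket and Tag definitions are hypothetical, not from this project), and inline_refs is a non-mutating variant of the helper above. It assumes the schema is not self-referential; a recursive model would make this kind of expansion loop forever.

```python
import copy
from typing import Any

# Hand-written stand-in for a nested model's JSON Schema (hypothetical fields).
NESTED_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "ticket": {"$ref": "#/$defs/Ticket"},
        "tags": {"type": "array", "items": {"$ref": "#/$defs/Tag"}},
    },
    "required": ["ticket"],
    "$defs": {
        "Ticket": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "tag": {"$ref": "#/$defs/Tag"}},
        },
        "Tag": {"type": "string", "enum": ["bug", "question"]},
    },
}


def inline_refs(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` with every local $ref replaced by its definition."""
    schema = copy.deepcopy(schema)  # leave the caller's dict untouched
    defs = schema.pop("$defs", {})

    def resolve(node: Any) -> Any:
        if isinstance(node, dict):
            if "$ref" in node:
                name = node["$ref"].split("/")[-1]
                return resolve(defs[name])  # definitions may contain further $refs
            return {key: resolve(value) for key, value in node.items()}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(schema)


flat = inline_refs(NESTED_SCHEMA)
```

After the call, flat contains no $defs or $ref keys, and the Tag definition appears inline both under tags.items and inside the expanded Ticket.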
4. Inability to Use Anthropic's Latest Features
boto3's Converse API is an AWS-proprietary interface, and even when Anthropic releases new features, they can't be used until AWS adds support for them in the Converse API. Features like prompt caching and extended thinking fall into this category.
The Actual Migration
Installation
uv add "anthropic[bedrock]>=0.104.0"
The [bedrock] extra pulls in boto3 and botocore, which the SDK uses internally for credential lookup and request signing, so there's no need to import boto3 directly for LLM calls. However, boto3 is still needed if you're using other AWS services like S3 or Knowledge Base.
Client Initialization
# Before
import boto3
_bedrock_client = boto3.client("bedrock-runtime", region_name=settings.aws_region)

# After
from anthropic import AsyncAnthropicBedrock  # also exported from the package top level
_client = AsyncAnthropicBedrock(aws_region=settings.aws_region)
The default AWS credential chain works as-is, so the EC2 instance role is picked up automatically. You can pass aws_access_key / aws_secret_key explicitly, but this is unnecessary when you're using an IAM role.
Non-Streaming Calls
The changes were surprisingly simple.
# Before (boto3)
def _blocking_call() -> str:
    response = _bedrock_client.converse(
        modelId=settings.bedrock_haiku_model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        system=[{"text": system}],
        inferenceConfig={"maxTokens": 100, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"].strip()

return await asyncio.to_thread(_blocking_call)

# After (Anthropic SDK)
response = await _client.messages.create(
    model=settings.bedrock_haiku_model_id,
    messages=[{"role": "user", "content": prompt}],
    system=system,
    max_tokens=100,
    temperature=0.0,
)
block = response.content[0]
return block.text.strip() if block.type == "text" else ""
What changed:
asyncio.to_thread() + inner function are no longer needed (AsyncAnthropicBedrock is natively async)
messages format is simpler (no need to wrap in [{"text": "..."}])
system accepts a string directly (no need for [{"text": "..."}])
Response is a typed object (block.text, block.type have completion support)
Tool Use
Tool Use definitions also became flatter.
# Before (boto3 Converse API)
toolConfig={
    "tools": [
        {
            "toolSpec": {
                "name": "classify_problem",
                "description": "...",
                "inputSchema": {"json": _resolve_schema_refs(tool_schema)},
            }
        }
    ],
    "toolChoice": {"tool": {"name": "classify_problem"}},
}

# After (Anthropic SDK)
tools=[
    {
        "name": "classify_problem",
        "description": "...",
        "input_schema": tool_schema,  # $defs/$ref can be used as-is
    }
],
tool_choice={"type": "tool", "name": "classify_problem"},
Since the Messages API natively supports $defs/$ref, I was able to entirely remove the _resolve_schema_refs() helper (20 lines) and its 3 call sites.
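If tool definitions already exist in Converse format, the flattening shown above can be done mechanically. The following is a rough sketch only: convert_tool_config is a hypothetical helper name (it is not part of this project or of either SDK), and the description string and empty schema are placeholders.

```python
from typing import Any


def convert_tool_config(tool_config: dict[str, Any]) -> dict[str, Any]:
    """Translate a Converse-style toolConfig into Messages API keyword arguments."""
    tools = []
    for entry in tool_config.get("tools", []):
        spec = entry["toolSpec"]
        tools.append(
            {
                "name": spec["name"],
                "description": spec.get("description", ""),
                # Converse nests the schema under "json"; the Messages API takes it directly.
                "input_schema": spec["inputSchema"]["json"],
            }
        )

    kwargs: dict[str, Any] = {"tools": tools}
    choice = tool_config.get("toolChoice")
    if choice is None:
        return kwargs  # no forced choice was configured
    if "tool" in choice:
        kwargs["tool_choice"] = {"type": "tool", "name": choice["tool"]["name"]}
    elif "any" in choice:
        kwargs["tool_choice"] = {"type": "any"}
    elif "auto" in choice:
        kwargs["tool_choice"] = {"type": "auto"}
    else:
        raise ValueError(f"unsupported toolChoice: {choice!r}")
    return kwargs


converted = convert_tool_config(
    {
        "tools": [
            {
                "toolSpec": {
                    "name": "classify_problem",
                    "description": "Classify the reported problem",  # placeholder text
                    "inputSchema": {"json": {"type": "object", "properties": {}}},
                }
            }
        ],
        "toolChoice": {"tool": {"name": "classify_problem"}},
    }
)
```

The resulting dictionary can be splatted into the call, for example messages.create(**converted, ...), which keeps the Converse-shaped definitions as the single source of truth during a gradual migration.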
Response retrieval also becomes typed.
# Before
for block in response["output"]["message"]["content"]:
    if block.get("toolUse", {}).get("name") == "classify_problem":
        return ProblemClassification(**block["toolUse"]["input"])

# After
for block in response.content:
    if block.type == "tool_use" and block.name == "classify_problem":
        return ProblemClassification(**block.input)
Streaming
The most dramatic improvement was in streaming. The thread bridge of asyncio.Queue, call_soon_threadsafe, sentinel values, and asyncio.create_task(asyncio.to_thread(...)) was replaced by async with ... stream().
# Before (boto3): thread bridge of 60+ lines
loop = asyncio.get_running_loop()
queue: asyncio.Queue = asyncio.Queue()

def _blocking_stream() -> None:
    try:
        response = _bedrock_client.converse_stream(...)
        for event in response["stream"]:
            if "contentBlockDelta" not in event:
                continue
            chunk = event["contentBlockDelta"].get("delta", {}).get("toolUse", {}).get("input", "")
            # ...processing...
            loop.call_soon_threadsafe(queue.put_nowait, token)
    except Exception as exc:
        loop.call_soon_threadsafe(queue.put_nowait, exc)
    finally:
        loop.call_soon_threadsafe(queue.put_nowait, None)

task = asyncio.create_task(asyncio.to_thread(_blocking_stream))
try:
    while True:
        item = await queue.get()
        if item is None:
            break
        if isinstance(item, BaseException):
            raise item
        yield item
finally:
    await task

# After (Anthropic SDK): native async stream
async with _client.messages.stream(
    model=settings.bedrock_model_id,
    messages=sdk_messages,
    system=CHAT_SYSTEM_PROMPT,
    max_tokens=8192,
    temperature=0.3,
    tools=[...],
    tool_choice={"type": "tool", "name": "submit_response"},
) as stream:
    async for event in stream:
        if event.type != "input_json":
            continue
        chunk = event.partial_json
        # ...same state machine for token processing...
        yield token
The state machine (token extraction for general_advice and JSON parsing of TicketResult) uses exactly the same logic as the boto3 era, but the infrastructure code surrounding it was dramatically reduced.
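The project's state machine is not reproduced in this article, so the sketch below is a deliberately simplified stand-in: instead of emitting tokens incrementally, it only buffers the partial_json fragments and parses them once the stream ends. The fragments come from a fake async generator rather than a real stream, and the "ticket" key is a made-up field used for illustration.

```python
import asyncio
import json
from collections.abc import AsyncIterator
from typing import Any


async def fake_partial_json_events() -> AsyncIterator[str]:
    """Stand-in for the partial_json fragments of a real stream (hypothetical payload)."""
    for fragment in ['{"general_ad', 'vice": "Restart the ', 'router.", "ticket"', ": null}"]:
        await asyncio.sleep(0)  # yield control to the event loop, as a network read would
        yield fragment


async def collect_tool_input(fragments: AsyncIterator[str]) -> dict[str, Any]:
    """Concatenate JSON fragments and parse them once the stream has ended."""
    buffer: list[str] = []
    async for fragment in fragments:
        buffer.append(fragment)  # individual fragments are not valid JSON on their own
    return json.loads("".join(buffer))


result = asyncio.run(collect_tool_input(fake_partial_json_events()))
```

The point of the sketch is that fragment boundaries are arbitrary (here one falls in the middle of a key name), which is why any incremental processing has to track state across chunks rather than parse each chunk in isolation.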
Benefits Gained After Migration: Prompt Caching
As a concrete example of migration benefits, I'll introduce prompt caching.
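What Is Prompt Caching
Anthropic's prompt caching stores the common prefix shared across requests (system prompt, tool definitions, and so on) so that processing is faster from the second request onward. Tokens read from the cache are billed at about 10% of the normal input price, a 90% reduction, and TTFT (time to first token) also drops. Writing a prefix into the cache costs somewhat more than normal input tokens, so the saving comes from repeated reads.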
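With the Messages API, caching is controlled through Anthropic's own cache_control field, exactly as described in Anthropic's documentation. The Converse API does not accept that field; AWS exposes caching there through its own cachePoint blocks, which follow AWS's format and rollout schedule rather than Anthropic's. Being able to use the feature as Anthropic documents it was one of the deciding factors for migration.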
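To get a feel for the economics, here is a back-of-the-envelope sketch. The multipliers are assumptions based on published pricing for 5-minute caches (reads at 10% and writes at 125% of the base input price) and should be checked against the current Bedrock price list; the 1,500-token prefix and 200-token user input are hypothetical figures.

```python
CACHE_READ_MULTIPLIER = 0.10   # assumed: cache reads billed at 10% of the base input price
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: 5-minute cache writes billed at 125% of base


def billed_input_units(input_tokens: int, cache_read: int = 0, cache_write: int = 0) -> float:
    """Return the input cost of one request in "uncached-token equivalents"."""
    return (
        input_tokens
        + cache_read * CACHE_READ_MULTIPLIER
        + cache_write * CACHE_WRITE_MULTIPLIER
    )


# Hypothetical request: 1,500-token static prefix plus 200 tokens of user input.
uncached = billed_input_units(input_tokens=1700)
first_call = billed_input_units(input_tokens=200, cache_write=1500)
later_call = billed_input_units(input_tokens=200, cache_read=1500)
```

Under these assumptions the first request is slightly more expensive than an uncached one (2,075 versus 1,700 units), while every later hit costs 350 units, so the cache pays for itself from the second request.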
Implementation
Enabling prompt caching in the post-migration code requires only a single line change.
tools=[
    {
        "name": "submit_response",
        "description": "Submit the structured helpdesk response",
        "input_schema": tool_schema,
        "cache_control": {"type": "ephemeral"},  # Add this one line
    }
],
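cache_control marks the end of a cacheable prefix: prompt caching works on a prefix basis, so every token up to and including the marked block becomes eligible for caching. Note that Anthropic's documentation gives the prefix order as tools, then system, then messages, so a breakpoint on the last tool definition is documented to cover the tool definitions; if the system prompt must be part of the cached prefix, the breakpoint belongs on the last system block (or on both).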
For Sonnet, a minimum prefix of 1,024 tokens is required. Since this project's system prompt (~1,000+ tokens) combined with tool definitions (~300–500 tokens) totals ~1,500+ tokens, the condition is met.
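Anthropic's documentation describes the cacheable prefix as being assembled in the order tools, system, messages. Taking that as an assumption, the sketch below builds request arguments with a breakpoint at the end of both static parts, so the system prompt is covered regardless of where the tools sit. with_cache_breakpoints is a hypothetical helper, and the system prompt text and schema are placeholders.

```python
from typing import Any

EPHEMERAL = {"type": "ephemeral"}


def with_cache_breakpoints(system_prompt: str, tools: list[dict[str, Any]]) -> dict[str, Any]:
    """Build system/tools kwargs with a cache breakpoint at the end of each static part."""
    cached_tools = [dict(tool) for tool in tools]  # shallow copies; inputs stay untouched
    if cached_tools:
        cached_tools[-1]["cache_control"] = dict(EPHEMERAL)
    return {
        "tools": cached_tools,
        # A list of text blocks is the form of `system` that can carry cache_control.
        "system": [
            {"type": "text", "text": system_prompt, "cache_control": dict(EPHEMERAL)}
        ],
    }


kwargs = with_cache_breakpoints(
    "You are a helpdesk assistant.",  # placeholder system prompt
    [
        {
            "name": "submit_response",
            "description": "Submit the structured helpdesk response",
            "input_schema": {"type": "object"},  # placeholder schema
        }
    ],
)
```

The returned dictionary can be passed as messages.stream(**kwargs, ...). Checking cache_creation_input_tokens on the first request is a quick way to confirm how many tokens actually ended up in the cached prefix.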
Verifying the Cache
After stream completion, cache metrics can be retrieved from the usage object.
final_msg = await stream.get_final_message()
usage = final_msg.usage
if usage.cache_read_input_tokens:
    logger.info("prompt cache HIT: %d tokens read from cache", usage.cache_read_input_tokens)
elif usage.cache_creation_input_tokens:
    logger.info("prompt cache MISS (created): %d tokens written", usage.cache_creation_input_tokens)
Actual log output:
INFO: prompt cache MISS (created): 1587 tokens written   # 1st request
INFO: prompt cache HIT: 1587 tokens read from cache       # 2nd request onward (within 5 minutes)
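The cache is shared across requests in the same account, region, and model, so even requests from different users hit the cache as long as the prefix is identical and the entry has not expired. The default lifetime is 5 minutes, refreshed each time the entry is read. No session management or cache key design is required.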
Other Benefits
Here are additional benefits discovered after migration.
API Consistency
Anthropic's official documentation and SDK reference can be used as-is. boto3's Converse API is an AWS-proprietary interface, and parameter names and response formats differ subtly from the Anthropic API (maxTokens vs max_tokens, toolConfig vs tools, etc.). This eliminates the need to mentally translate to the Converse API when referencing documentation.
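The naming differences are small but easy to trip over. As an illustration, the sketch below maps the top-level fields of a Converse request onto Messages API keyword arguments. converse_to_messages_kwargs is a hypothetical helper written for this example, it covers only the fields listed in the mapping, and the model ID is a placeholder.

```python
from typing import Any

# Converse inferenceConfig key -> Messages API keyword argument
INFERENCE_KEY_MAP = {
    "maxTokens": "max_tokens",
    "temperature": "temperature",
    "topP": "top_p",
    "stopSequences": "stop_sequences",
}


def converse_to_messages_kwargs(converse_request: dict[str, Any]) -> dict[str, Any]:
    """Rename the model, inference, and system fields of a Converse request."""
    kwargs: dict[str, Any] = {"model": converse_request["modelId"]}

    inference = converse_request.get("inferenceConfig", {})
    for converse_key, messages_key in INFERENCE_KEY_MAP.items():
        if converse_key in inference:
            kwargs[messages_key] = inference[converse_key]

    # Converse takes system as a list of {"text": ...}; the Messages API accepts a string.
    system_blocks = converse_request.get("system", [])
    if system_blocks:
        kwargs["system"] = "\n".join(block["text"] for block in system_blocks)
    return kwargs


translated = converse_to_messages_kwargs(
    {
        "modelId": "example-model-id",  # placeholder, not a real model ID
        "system": [{"text": "Be brief."}],
        "inferenceConfig": {"maxTokens": 100, "temperature": 0.0},
    }
)
```

A table-driven mapping like this is mainly useful as a temporary shim while call sites are being moved over; once the migration is done, writing the Messages API arguments directly is simpler.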
Type Safety
IDE completion and Pyright type checking work for things like response.content[0].text. With boto3, responses are dict[str, Any], so typos in key names can't be discovered until runtime.
Readiness for Future Features
When Anthropic releases new features, the client side is usually ready as soon as the anthropic package is updated, provided Bedrock has enabled the feature for the model. With boto3's Converse API, you additionally need to wait for AWS to add support to the Converse interface, and depending on the feature this can mean a lag of weeks to months. Extended thinking and future features can be handled the same way.
Caveats
boto3 Is Not Entirely Unnecessary
anthropic[bedrock] only covers LLM calls (Messages API). boto3 is still needed for other AWS services such as S3, Knowledge Base, and Cost Explorer. In this project, boto3 remains in the dependencies.
Message Format Differences
If existing code constructs messages in Bedrock Converse format ([{"role": "user", "content": [{"text": "..."}]}]), conversion to Anthropic SDK format ([{"role": "user", "content": "..."}]) is necessary. In this project, rather than modifying the upstream route handlers, a thin conversion helper was added to the service layer.
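from typing import Any

def _convert_messages(bedrock_messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert Converse-style messages to Messages API format (text blocks only)."""
    result: list[dict[str, Any]] = []
    for msg in bedrock_messages:
        content = msg["content"]
        # Fail loudly on toolUse/image blocks instead of an opaque KeyError below
        if any("text" not in block for block in content):
            raise ValueError(f"unsupported content block in {msg['role']} message")
        if len(content) == 1:
            # A single text block collapses to a plain string
            result.append({"role": msg["role"], "content": content[0]["text"]})
        else:
            result.append({
                "role": msg["role"],
                "content": [{"type": "text", "text": b["text"]} for b in content],
            })
    return result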
Model ID Prefixes
Model IDs on Bedrock may carry a cross-region inference profile prefix (e.g., jp.anthropic.claude-sonnet-4-5-20250929-v1:0). The AnthropicBedrock client accepts this format as-is, so no configuration changes were needed.
Summary
The migration from boto3 to the Anthropic SDK was completed in about half a day, partly because LLM calls were consolidated in a single file. The results:
Code volume: 474 lines → 265 lines deleted, 430 lines added (net ~30% reduction)
Async support: asyncio.to_thread + Queue → native async/await
Type safety: dict[str, Any] → typed objects
Schema compatibility: _resolve_schema_refs() workaround → unnecessary (native $defs/$ref support)
Latest features: Prompt caching can be enabled with one added line
The barrier to migration is low and the benefits are large. For projects using Claude on Bedrock, I think it's worth considering.