Benefits gained from migrating Claude invocations on Bedrock from boto3 to the Anthropic SDK

A practical account of migrating from the boto3 Converse API to the Anthropic Python SDK's AnthropicBedrock client. This article introduces the benefits along with concrete code diffs, including simplified async support, improved type safety, removal of the $defs/$ref workaround, and enabling prompt caching.
2026.05.29

Introduction

I run a Python backend that calls Claude (Sonnet/Haiku) on Amazon Bedrock. It originally used boto3's converse() / converse_stream() APIs, but after learning that the official Anthropic Python SDK provides an AnthropicBedrock client, I migrated to it.

To cut to the chase: the code volume decreased by about 30%, async support became natural, and Anthropic features such as prompt caching could be enabled on Bedrock exactly as Anthropic's documentation describes. I recommend this migration for projects using Claude via Bedrock.

This article introduces what changed in the actual migration and what benefits it brought, along with concrete code diffs.

Prerequisites & Environment

Item              Value
Python            3.13
Framework         FastAPI (async)
Before migration  boto3 (converse / converse_stream API)
After migration   anthropic[bedrock] (Messages API)
Model             Claude Sonnet 4.5 / Claude Haiku
Authentication    EC2 instance role (IAM)
Region            ap-northeast-1 (Tokyo)

Why We Migrated

When using the Bedrock Converse API with boto3, we faced the following challenges.

1. Complexity of Async Support

boto3 is a synchronous client. To use it with an async framework like FastAPI, blocking calls had to be wrapped with asyncio.to_thread().

# boto3 era: running sync function in thread pool
def _blocking_call() -> str:
    response = _bedrock_client.converse(
        modelId=settings.bedrock_haiku_model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        system=[{"text": system}],
        inferenceConfig={"maxTokens": 100, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"].strip()

return await asyncio.to_thread(_blocking_call)

For streaming in particular, it was even more complex, requiring a cross-thread bridge using asyncio.Queue and loop.call_soon_threadsafe().

# boto3 era: cross-thread bridge for streaming
loop = asyncio.get_running_loop()
queue: asyncio.Queue = asyncio.Queue()

def _blocking_stream() -> None:
    response = _bedrock_client.converse_stream(...)
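    for event in response["stream"]:
        # Only contentBlockDelta events carry tool-input fragments; skip the rest.
        if "contentBlockDelta" not in event:
            continue
        chunk = event["contentBlockDelta"].get("delta", {}).get("toolUse", {}).get("input", "")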
        loop.call_soon_threadsafe(queue.put_nowait, chunk)
    loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel

task = asyncio.create_task(asyncio.to_thread(_blocking_stream))
while True:
    item = await queue.get()
    if item is None:
        break
    yield item

2. Responses as Untyped Dictionaries

boto3 responses are plain Python dictionaries. IDE completion didn't work, and defensive access was required.

# boto3: dictionary access — typos are hard to catch
for block in response["output"]["message"]["content"]:
    if block.get("toolUse", {}).get("name") == "classify_problem":
        return ProblemClassification(**block["toolUse"]["input"])

3. No Support for JSON Schema $defs/$ref

The Bedrock Converse API's toolConfig.inputSchema does not support JSON Schema's $defs/$ref. Because a Pydantic model's model_json_schema() emits $ref for nested models, we needed a workaround function that expands every $ref before sending the schema.

# boto3 era: helper for manually inlining $defs/$ref expansions
def _resolve_schema_refs(schema: dict) -> dict:
    defs = schema.pop("$defs", {})

    def _resolve(obj: Any) -> Any:
        if isinstance(obj, dict):
            if "$ref" in obj:
                ref_name = obj["$ref"].split("/")[-1]
                return _resolve(dict(defs[ref_name]))
            return {k: _resolve(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [_resolve(item) for item in obj]
        return obj

    return _resolve(schema)

# Call sites (3 locations)
tool_schema = _resolve_schema_refs(ChatToolInput.model_json_schema())
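The helper above removes $defs from the dict it is given and would recurse indefinitely on a self-referencing model. As a minimal sketch of the same inlining idea, assuming only local "#/$defs/Name" references, the variant below leaves its input untouched and raises on recursion. The name inline_refs and the sample schema are hypothetical and not part of the original project.

```python
from typing import Any


def inline_refs(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` with local $defs references expanded."""
    defs = schema.get("$defs", {})

    def resolve(node: Any, active: tuple[str, ...]) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str):
                name = ref.rsplit("/", 1)[-1]
                if name in active:
                    # A self-referencing model cannot be flattened.
                    raise ValueError(f"recursive $ref cannot be inlined: {name}")
                return resolve(defs[name], active + (name,))
            # Drop the $defs table itself; everything else is copied.
            return {k: resolve(v, active) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item, active) for item in node]
        return node

    return resolve(schema, ())


# Hand-written schema in the shape Pydantic produces for a nested model.
sample_schema = {
    "$defs": {
        "Address": {"type": "object", "properties": {"city": {"type": "string"}}}
    },
    "type": "object",
    "properties": {"home": {"$ref": "#/$defs/Address"}},
}
flat_schema = inline_refs(sample_schema)
```

In flat_schema the "home" property contains the Address definition directly, while sample_schema still holds its $defs table.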

4. Inability to Use Latest Anthropic Features

boto3's Converse API is an AWS-proprietary interface, so a new Anthropic feature can be used only after AWS exposes it in the Converse API, often under field names that differ from Anthropic's documentation. Prompt caching and extended thinking were the features we cared about most in this category.

The Actual Migration

Installation

uv add "anthropic[bedrock]>=0.104.0"

The [bedrock] extra installs boto3 and botocore, which the SDK uses internally for credential resolution and request signing, so LLM calls no longer need to import boto3 directly. However, boto3 is still needed if you're using other AWS services like S3 or Knowledge Base.

Client Initialization

# Before
import boto3
_bedrock_client = boto3.client("bedrock-runtime", region_name=settings.aws_region)

# After
from anthropic.lib.bedrock import AsyncAnthropicBedrock
_client = AsyncAnthropicBedrock(aws_region=settings.aws_region)

The default credential chain picks up the EC2 instance role without any extra configuration. You can also pass aws_access_key and aws_secret_key explicitly, but this is unnecessary if you're using an IAM role.
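For environments without an instance role (local development, for example), the explicit-credential path might look like the sketch below. The helper name bedrock_client_kwargs and the choice of environment variable names are assumptions for illustration. The keyword names aws_region, aws_access_key, aws_secret_key, and aws_session_token are the ones the Bedrock client accepts.

```python
from collections.abc import Mapping


def bedrock_client_kwargs(
    env: Mapping[str, str], default_region: str = "ap-northeast-1"
) -> dict[str, str]:
    """Build keyword arguments for AsyncAnthropicBedrock from an env mapping.

    Explicit keys are passed only when both are present; otherwise the
    client falls back to the default credential chain (e.g. the IAM role).
    """
    kwargs = {"aws_region": env.get("AWS_REGION", default_region)}
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if access_key and secret_key:
        kwargs["aws_access_key"] = access_key
        kwargs["aws_secret_key"] = secret_key
        session_token = env.get("AWS_SESSION_TOKEN")
        if session_token:
            kwargs["aws_session_token"] = session_token
    return kwargs


# Usage (requires the anthropic[bedrock] package):
#   import os
#   from anthropic import AsyncAnthropicBedrock
#   _client = AsyncAnthropicBedrock(**bedrock_client_kwargs(os.environ))
```

On an EC2 instance with a role attached, the mapping contains no keys and the result is just the region, which matches the initialization shown above.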

Non-Streaming Calls

The changes were surprisingly simple.

# Before (boto3)
def _blocking_call() -> str:
    response = _bedrock_client.converse(
        modelId=settings.bedrock_haiku_model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        system=[{"text": system}],
        inferenceConfig={"maxTokens": 100, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"].strip()

return await asyncio.to_thread(_blocking_call)

# After (Anthropic SDK)
response = await _client.messages.create(
    model=settings.bedrock_haiku_model_id,
    messages=[{"role": "user", "content": prompt}],
    system=system,
    max_tokens=100,
    temperature=0.0,
)
block = response.content[0]
return block.text.strip() if block.type == "text" else ""

What changed:

  • asyncio.to_thread() + inner function is no longer needed (AsyncAnthropicBedrock is natively async)
  • The messages format is simpler (no need for [{"text": "..."}] wrapping)
  • system accepts a string directly (no [{"text": "..."}] needed)
  • Response is a typed object (block.text, block.type have completion support)
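Because each content block exposes a type attribute, the text of a response can be collected without assuming that the first block is a text block. The sketch below is one way to do that. collect_text is a hypothetical helper, and the SimpleNamespace objects only stand in for the SDK's typed blocks so the example runs without calling Bedrock.

```python
from collections.abc import Iterable
from types import SimpleNamespace
from typing import Any


def collect_text(blocks: Iterable[Any]) -> str:
    """Concatenate the text of every block whose type is "text"."""
    parts = [b.text for b in blocks if getattr(b, "type", None) == "text"]
    return "".join(parts).strip()


# Stand-ins for response.content; a real response holds typed SDK objects.
fake_content = [
    SimpleNamespace(type="text", text="Category: "),
    SimpleNamespace(type="tool_use", name="classify_problem", input={}),
    SimpleNamespace(type="text", text="network \n"),
]
summary = collect_text(fake_content)
```

With a real response the call would be collect_text(response.content).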

Tool Use

The Tool Use definition also became flatter.

# Before (boto3 Converse API)
toolConfig={
    "tools": [
        {
            "toolSpec": {
                "name": "classify_problem",
                "description": "...",
                "inputSchema": {"json": _resolve_schema_refs(tool_schema)},
            }
        }
    ],
    "toolChoice": {"tool": {"name": "classify_problem"}},
}

# After (Anthropic SDK)
tools=[
    {
        "name": "classify_problem",
        "description": "...",
        "input_schema": tool_schema,  # $defs/$ref can be used as-is
    }
],
tool_choice={"type": "tool", "name": "classify_problem"},

Since the Messages API natively supports $defs/$ref, we could completely remove the _resolve_schema_refs() helper (20 lines) and its 3 call sites.

Response retrieval also becomes typed.

# Before
for block in response["output"]["message"]["content"]:
    if block.get("toolUse", {}).get("name") == "classify_problem":
        return ProblemClassification(**block["toolUse"]["input"])

# After
for block in response.content:
    if block.type == "tool_use" and block.name == "classify_problem":
        return ProblemClassification(**block.input)
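The loop above falls through silently when no matching block exists, for instance when the output was cut off before the tool call. As a hedged sketch, the lookup can be wrapped so that a missing tool call fails loudly. find_tool_input is a hypothetical helper, and the SimpleNamespace objects are stand-ins for the SDK's typed blocks.

```python
from collections.abc import Iterable
from types import SimpleNamespace
from typing import Any


def find_tool_input(blocks: Iterable[Any], tool_name: str) -> dict[str, Any]:
    """Return the input of the first tool_use block named `tool_name`."""
    for block in blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return dict(block.input)
    raise LookupError(f"no tool_use block named {tool_name!r} in the response")


# Stand-ins for response.content.
fake_content = [
    SimpleNamespace(type="text", text="Calling the tool."),
    SimpleNamespace(
        type="tool_use",
        name="classify_problem",
        input={"category": "network", "confidence": 0.9},  # illustrative fields
    ),
]
tool_input = find_tool_input(fake_content, "classify_problem")
```

The call site then reduces to ProblemClassification(**find_tool_input(response.content, "classify_problem")).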

Streaming

Streaming saw the most dramatic improvement. The cross-thread bridge with asyncio.Queue, call_soon_threadsafe, sentinel values, and asyncio.create_task(asyncio.to_thread(...)) was replaced by async with ... stream().

# Before (boto3): 60+ lines of cross-thread bridge
loop = asyncio.get_running_loop()
queue: asyncio.Queue = asyncio.Queue()

def _blocking_stream() -> None:
    try:
        response = _bedrock_client.converse_stream(...)
        for event in response["stream"]:
            if "contentBlockDelta" not in event:
                continue
            chunk = event["contentBlockDelta"].get("delta", {}).get("toolUse", {}).get("input", "")
            # ...processing...
            loop.call_soon_threadsafe(queue.put_nowait, token)
    except Exception as exc:
        loop.call_soon_threadsafe(queue.put_nowait, exc)
    finally:
        loop.call_soon_threadsafe(queue.put_nowait, None)

task = asyncio.create_task(asyncio.to_thread(_blocking_stream))
try:
    while True:
        item = await queue.get()
        if item is None:
            break
        if isinstance(item, BaseException):
            raise item
        yield item
finally:
    await task

# After (Anthropic SDK): natively async stream
async with _client.messages.stream(
    model=settings.bedrock_model_id,
    messages=sdk_messages,
    system=CHAT_SYSTEM_PROMPT,
    max_tokens=8192,
    temperature=0.3,
    tools=[...],
    tool_choice={"type": "tool", "name": "submit_response"},
) as stream:
    async for event in stream:
        if event.type != "input_json":
            continue
        chunk = event.partial_json
        # ...same state machine for token processing...
        yield token

The state machine (token extraction for general_advice and JSON parsing of TicketResult) uses exactly the same logic as the boto3 era, but the infrastructure code surrounding it was dramatically reduced.
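The project's state machine is not shown here, so the following is only a rough sketch of the general idea: accumulate the partial_json fragments and emit the newly decoded characters of one string field as they arrive. StringFieldStreamer is a hypothetical name. The sketch assumes the field name does not also appear inside another string value, and it does not handle surrogate pairs split across fragments.

```python
import json
import re


class StringFieldStreamer:
    """Incrementally extracts one string field from streamed JSON fragments."""

    def __init__(self, field: str) -> None:
        self._pattern = re.compile(r'"' + re.escape(field) + r'"\s*:\s*"')
        self._buffer = ""
        self._emitted = 0  # number of decoded characters already returned

    def feed(self, fragment: str) -> str:
        """Add a fragment and return the newly available decoded text."""
        self._buffer += fragment
        match = self._pattern.search(self._buffer)
        if match is None:
            return ""
        raw = self._decodable_prefix(self._buffer[match.end():])
        decoded = json.loads('"' + raw + '"')
        delta = decoded[self._emitted:]
        self._emitted = len(decoded)
        return delta

    @staticmethod
    def _decodable_prefix(tail: str) -> str:
        """Longest prefix of the string body that contains only whole escapes."""
        out: list[str] = []
        i = 0
        while i < len(tail):
            ch = tail[i]
            if ch == '"':  # closing quote: the value is complete
                break
            if ch == "\\":
                width = 6 if tail[i + 1 : i + 2] == "u" else 2
                if i + width > len(tail):  # escape not fully received yet
                    break
                out.append(tail[i : i + width])
                i += width
                continue
            out.append(ch)
            i += 1
        return "".join(out)


# Fragments as they might arrive in successive input_json events.
fragments = ['{"general_', 'advice": "Hel', "lo\\", "nwor", 'ld", "x": 1}']
streamer = StringFieldStreamer("general_advice")
tokens = [streamer.feed(fragment) for fragment in fragments]
```

Inside the async for loop above, each event.partial_json would be passed to feed() and any non-empty result yielded to the client.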

Benefits Gained After Migration: Prompt Caching

As a concrete example of migration benefits, let me introduce prompt caching.

What Is Prompt Caching

Anthropic's prompt caching stores the common prefix shared across requests (system prompts, tool definitions, etc.) so that later requests can reuse it. Tokens read from the cache are billed at roughly 10% of the normal input price (a 90% reduction), while the request that writes the cache pays a small premium on those tokens. TTFT (time to first token) is also shortened.

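With the Messages API, caching is enabled through the cache_control field exactly as Anthropic's documentation describes. The Converse API does not accept cache_control (it uses its own cachePoint syntax), so Anthropic's examples could not be applied to our boto3 code as-is. This was one of the deciding factors for migration.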
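To make the cost effect concrete, here is a back-of-the-envelope sketch. The multipliers (0.1 for cache reads, 1.25 for cache writes) are assumptions based on Anthropic's published pricing ratios and should be checked against current Bedrock pricing. The 1,587-token prefix is the size reported in the log output later in this article, and the 50-token question size is made up for illustration.

```python
def relative_input_cost(
    uncached_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    read_multiplier: float = 0.1,    # assumed: cache reads cost 10% of input
    write_multiplier: float = 1.25,  # assumed: cache writes cost 125% of input
) -> float:
    """Input cost expressed in units of one normally priced input token."""
    return (
        uncached_tokens
        + cache_read_tokens * read_multiplier
        + cache_write_tokens * write_multiplier
    )


PREFIX_TOKENS = 1587   # cached prefix size from the article's log output
QUESTION_TOKENS = 50   # hypothetical per-request, uncached part

without_cache = relative_input_cost(PREFIX_TOKENS + QUESTION_TOKENS)
first_request = relative_input_cost(QUESTION_TOKENS, cache_write_tokens=PREFIX_TOKENS)
later_request = relative_input_cost(QUESTION_TOKENS, cache_read_tokens=PREFIX_TOKENS)
```

Under these assumptions the first request costs somewhat more than an uncached one, and each later request within the cache lifetime costs roughly 209 units instead of 1,637.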

Implementation

Enabling prompt caching in the migrated code requires only a single line change.

tools=[
    {
        "name": "submit_response",
        "description": "Submit the structured helpdesk response",
        "input_schema": tool_schema,
        "cache_control": {"type": "ephemeral"},  # Add this one line
    }
],

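Prompt caching works on a prefix basis, so all tokens before the position of cache_control become cache targets. Note that Anthropic's documentation describes the prefix as assembled in the order tools, system, messages: a breakpoint on the last tool definition is documented to cover the tool definitions, and a second cache_control on the last system block extends the cached prefix through the system prompt.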

For Sonnet, a minimum prefix of 1,024 tokens is required. This project's system prompt (approximately 1,000+ tokens) combined with the tool definitions (approximately 300–500 tokens) totals around 1,500+ tokens, satisfying the requirement.
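To make sure both the tool definitions and the system prompt fall inside the cached prefix regardless of how the prefix is ordered, breakpoints can be placed on both. The sketch below builds the two request arguments. with_cache_breakpoints is a hypothetical helper, while the list-of-text-blocks form of system and the cache_control field are part of the Messages API.

```python
from typing import Any


def with_cache_breakpoints(
    system_prompt: str, tools: list[dict[str, Any]]
) -> dict[str, Any]:
    """Return `system` and `tools` arguments with ephemeral cache breakpoints.

    The input tool definitions are copied, not modified.
    """
    cached_tools = [dict(tool) for tool in tools]
    if cached_tools:
        # Breakpoint after the last tool definition.
        cached_tools[-1]["cache_control"] = {"type": "ephemeral"}
    system_blocks = [
        {
            "type": "text",
            "text": system_prompt,
            # Breakpoint after the system prompt.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    return {"system": system_blocks, "tools": cached_tools}


# Illustrative inputs; the real prompt and schema come from the project.
example_tools = [
    {
        "name": "submit_response",
        "description": "Submit the structured helpdesk response",
        "input_schema": {"type": "object", "properties": {}},
    }
]
request_args = with_cache_breakpoints("You are a helpdesk assistant.", example_tools)
```

The result can be unpacked into the call, for example _client.messages.stream(model=..., messages=..., max_tokens=..., **request_args).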

Verifying the Cache

Cache metrics can be retrieved from the usage object after stream completion.

final_msg = await stream.get_final_message()
usage = final_msg.usage
if usage.cache_read_input_tokens:
    logger.info("prompt cache HIT: %d tokens read from cache", usage.cache_read_input_tokens)
elif usage.cache_creation_input_tokens:
    logger.info("prompt cache MISS (created): %d tokens written", usage.cache_creation_input_tokens)

Actual log output:

INFO: prompt cache MISS (created): 1587 tokens written   # 1st request
INFO: prompt cache HIT: 1587 tokens read from cache       # 2nd request onward (within 5 minutes)

Since the cache is shared per account/region/model, requests from different users also get cache hits as long as they send an identical prefix within the 5-minute lifetime, which is refreshed each time the cache is read. No session management or cache key design is required.

Other Benefits

Here are additional benefits discovered after migration.

API Consistency

Anthropic's official documentation and SDK reference can be used as-is. boto3's Converse API is an AWS-proprietary interface whose parameter names and response formats differ subtly from the Anthropic API (e.g., maxTokens vs max_tokens, toolConfig vs tools). With the SDK, there is no longer any need to translate documentation examples into the Converse API format.

Type Safety

IDE completion and Pyright type checking work for things like response.content[0].text. With boto3, responses are dict[str, Any], so typos in key names can't be caught until runtime.

Readiness for Future Feature Additions

When Anthropic releases new features, they become available by updating the anthropic package, provided Bedrock serves them for the model in use. With boto3's Converse API, you need to wait for AWS to support them, which can introduce a lag of weeks to months depending on the feature. Extended thinking and future additions can be adopted the same way.

Caveats

boto3 Is Not Entirely Unnecessary

anthropic[bedrock] only covers LLM calls (Messages API). boto3 is still required for other AWS services like S3, Knowledge Base, and Cost Explorer. boto3 remains a dependency in this project as well.

Message Format Differences

If existing code constructs messages in Bedrock Converse format ([{"role": "user", "content": [{"text": "..."}]}]), conversion to Anthropic SDK format ([{"role": "user", "content": "..."}]) is required. In this project, rather than modifying the calling route handlers, a thin conversion helper was added to the service layer.

def _convert_messages(bedrock_messages: list[dict]) -> list[dict]:
    result = []
    for msg in bedrock_messages:
        content = msg["content"]
        if len(content) == 1 and "text" in content[0]:
            result.append({"role": msg["role"], "content": content[0]["text"]})
        else:
            # Assumes every block is a text block; toolUse or image blocks
            # would need their own mapping.
            result.append({
                "role": msg["role"],
                "content": [{"type": "text", "text": b["text"]} for b in content],
            })
    return result

Model ID Prefixes

Model IDs on Bedrock may include a cross-region inference profile prefix such as jp. or global. (e.g., jp.anthropic.claude-sonnet-4-5-20250929-v1:0). The AnthropicBedrock client accepts this format as-is, so no configuration changes were needed.

Summary

The migration from boto3 to the Anthropic SDK was completed in about half a day, partly because LLM calls were consolidated into a single file. The results:

  • Code volume: 474 lines → 265 lines deleted, 430 lines added (net reduction of approximately 30%)
  • Async support: asyncio.to_thread + Queue → native async/await
  • Type safety: dict[str, Any] → typed objects
  • Schema compatibility: _resolve_schema_refs() workaround → unnecessary ($defs/$ref natively supported)
  • Latest features: Prompt caching can be enabled with a single line addition

The barrier to migration is low and the benefits are significant. For projects using Claude on Bedrock, I think it's worth considering.

