
Bedrock Converse API: Tracking Down "No Response from the 10th Call" — The Pitfall When Using Extended Thinking × Tool Use
TL;DR — Notes on Using Extended Thinking × Tool Use
A summary for those who aren't interested in the investigation process and just want the conclusion.
Problem: When using Extended Thinking (reasoningContent) together with tool use in the Bedrock Converse API, a ValidationException occurs when reasoningContent blocks become consecutive due to manipulation of the conversation history.
Cause: reasoningContent blocks are given a cryptographic signature (signature). This signature proves the authenticity of the block (that it was generated by Claude) and is not a hash of the text content. In addition to verifying signature authenticity, the API structurally verifies whether the consecutive pattern of reasoning blocks matches the model's original output. When toolUse/toolResult blocks are excluded from the conversation history, reasoningContent blocks that were originally non-consecutive become adjacent, creating a consecutive pattern that didn't exist in the original output, causing structural validation to fail.
Model output: [reasoning_A, toolUse, reasoning_B, text]
↓ toolUse excluded
After filter: [reasoning_A, reasoning_B, text]
^^^^^^^^^^^^^^^^^^^^^^^^
Consecutive pattern not present in model output → ValidationException
Countermeasures:
- If reasoningContent blocks become consecutive after filtering, combine the text into a single block (since the signature does not verify text content, retaining either signature is sufficient)
- Messages whose content becomes empty after filtering should be excluded from the conversation history (to prevent cascade failures)
Reference: Anthropic Official Documentation — Extended thinking
the entire sequence of consecutive thinking blocks must match the outputs generated by the model during the original request
Introduction
"After asking about 10 questions to the chat assistant, it stopped responding from the 11th question onward. No error is displayed."
When I received this report, my first hypothesis was "context window exceeded." I assumed the token limit had been reached after 10 exchanges, causing an input-too-long error.
As it turned out, this hypothesis was wrong.
The actual cause was a message structure constraint violation in the Bedrock Converse API, and reaching that conclusion required an investigation spanning multiple data sources: CloudWatch, DynamoDB, Bedrock Model Invocation Logging, and direct API calls. This article walks through that investigation process.
Organizing the Symptoms
The application where the problem occurred is a chat assistant using the Bedrock Converse API, operating with the following configuration:
- Model: Claude Sonnet 4.6 (Extended Thinking enabled)
- Tool use: Database queries and other tools via Function Calling
- Conversation history: Stored in DynamoDB, with the full history sent to the API on every request
Reported symptoms:
- Normal responses up to about 10 questions
- From the 11th question onward, neither a response nor an error is displayed, and the UI immediately returns to a state where the next question can be sent
- No application crash or error screen
The "no error displayed" aspect was troublesome.
Chapter 1: Discovering the Silent Error
Nothing in CloudWatch
First, I checked CloudWatch Logs. I searched for logs at the relevant time in the application's log group, but found no logs at WARN (level 40) or above.
fields @timestamp, @message
| filter level >= 40
| sort @timestamp desc
| limit 100
Result: 0 entries. All logs were INFO (level 30) only.
There Was a Clue in DynamoDB
Next, I checked the chat history table in DynamoDB. This application stores chat messages encoded as gzip-compressed Base64. After decoding:
[
  { "type": "note", "key": "InternalServerError" },
  { "type": "note", "key": "InternalServerError" },
  { "type": "note", "key": "InternalServerError" }
]
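For reference, the decoding step can be sketched as follows. This is a minimal sketch that assumes the stored value was produced as JSON → gzip → Base64 (the article only says "gzip-compressed Base64") and that it runs on Node.js; the function name `decodeStoredMessage` and the sample data are hypothetical.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Assumed storage format: JSON text, gzip-compressed, then Base64-encoded.
function decodeStoredMessage(encoded: string): unknown {
  const compressed = Buffer.from(encoded, "base64"); // undo Base64
  const json = gunzipSync(compressed).toString("utf8"); // undo gzip
  return JSON.parse(json);
}

// Round trip with sample data, standing in for an attribute value read from DynamoDB.
const sample = [{ type: "note", key: "InternalServerError" }];
const encoded = gzipSync(Buffer.from(JSON.stringify(sample), "utf8")).toString("base64");
console.log(decodeStoredMessage(encoded));
```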
Errors were being saved to DynamoDB, but were not being sent to the client.
Reviewing the code revealed a problem with error handling during streaming response processing. Although error information was being saved to the DB, both the SSE transmission to the client and the log output were missing.
This is a pitfall specific to streaming. With ordinary request-response handling, an error can simply be returned as an HTTP status code. An error during SSE streaming, however, occurs after the response has already started being returned, so notifying the client requires dedicated handling. That handling had been omitted, and the catch block swallowed the error before it could reach the outer error handler.

Lessons Learned at This Point
- No visible error ≠ No error occurred: The existence of the error could only be confirmed by directly checking the data in the persistence layer
- Silent catch blocks are dangerous: If you catch an error, you must always both log it and notify the user
Chapter 2: The True Nature of ValidationException
When I found InternalServerError in DynamoDB, I was still suspecting "context window exceeded." However, estimating the token count for the stored messages revealed they were using only about 15% of the 200K token limit. I needed to look for a different cause.
| Item | Size |
|---|---|
| System prompt | Approx. 10,000 characters |
| Total text content | Approx. 26,000 characters |
| Total reasoning text | Approx. 7,000 characters |
| Total reasoning signatures | Approx. 19,000 characters |
The total was approximately 62,000 characters (≈ 20,000–30,000 tokens). Against the 200K-token context window of the model in use, Claude Sonnet 4.6, that is at most about 15% of the limit.
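The arithmetic behind this estimate can be sketched as below. The characters-per-token ratios (2 and 3) are assumptions chosen to bracket the 20,000–30,000 token range above; they are a rough heuristic, not the output of a real tokenizer.

```typescript
// Character counts taken from the table above.
const sizes: Array<{ item: string; chars: number }> = [
  { item: "System prompt", chars: 10000 },
  { item: "Text content", chars: 26000 },
  { item: "Reasoning text", chars: 7000 },
  { item: "Reasoning signatures", chars: 19000 },
];

const CONTEXT_WINDOW_TOKENS = 200000;

// Fraction of the context window used, given an ASSUMED chars-per-token ratio.
function estimateUsage(totalChars: number, charsPerToken: number): number {
  const tokens = Math.ceil(totalChars / charsPerToken);
  return tokens / CONTEXT_WINDOW_TOKENS;
}

const totalChars = sizes.reduce((sum, row) => sum + row.chars, 0);
const lowUsage = estimateUsage(totalChars, 3); // optimistic ratio
const highUsage = estimateUsage(totalChars, 2); // pessimistic ratio

console.log("total characters:", totalChars);
console.log("estimated usage:", (lowUsage * 100).toFixed(1) + "% to " + (highUsage * 100).toFixed(1) + "%");
```

Even the pessimistic ratio stays far below the limit, which is enough to rule out the context-window hypothesis without an exact token count.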
The application logs contained no error details whatsoever, and only the code InternalServerError was stored in DynamoDB. To confirm the true nature of the error, I enabled Bedrock Model Invocation Logging.
Enabling Bedrock Model Invocation Logging
- Create an IAM role: A service role for Bedrock to write logs
- Create a CloudWatch Logs log group: Retention period of 1 day (for temporary debugging)
- Enable in Bedrock settings: Amazon Bedrock → Settings → Model invocation logging
As a note on costs, CloudWatch ingestion costs $0.76/GB, but for temporary debugging in a staging environment, this is negligible. I chose CloudWatch this time for the advantage of being able to query immediately with Logs Insights.
Identified Error Type
After reproducing the issue in the staging environment and checking the Invocation Log:
{
  "operation": "ConverseStream",
  "modelId": "jp.anthropic.claude-sonnet-4-6",
  "errorCode": "ValidationException"
}
It was ValidationException, not InternalServerError. The InternalServerError was a code assigned by the application's catch block; the actual error type returned by the Bedrock API was ValidationException, indicating a request structure constraint violation.
Unfortunately, Bedrock Model Invocation Logging does not record the request body or error message details when an error occurs. However, since the body of the immediately preceding successful request is fully recorded, I proceeded with the investigation using this as a clue.
Chapter 3: Root Cause — Consecutive Reasoning Blocks
Conversation History Filter Processing
First, some background. When sending conversation history to the Bedrock API, this application was selecting only the content types needed by the API (text, image, attachment, reasoning) using an allowlist approach.
This allowlist was designed when the application didn't yet have tool use functionality. At that time, only text and attachment existed, and the allowlist was sufficient. When reasoning (Extended Thinking) and image were added later, they were added to the list, but tool blocks introduced afterward were left without being added to the allowlist.

tool blocks are UI display metadata that retain tool execution state (tool name, parameters, results) within the application, and there's normally no problem with them not being included in the allowlist since they don't need to be sent to the Bedrock API.
However, this design had an unexpected side effect.
Differences Between Successful Requests and Stored Data
By cross-referencing the successful request (Invocation Log) with the stored messages (DynamoDB), I made a decisive discovery.
The allowlist-based filter works without issue in most cases. However, when the model "reconsiders" between tool calls (for example, when it reasons again after seeing a tool execution result and before calling another tool), tool blocks can become the only separator between reasoning blocks. Excluding them leaves those reasoning blocks adjacent, as in the [reasoning_A, toolUse, reasoning_B, text] example in the TL;DR.
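The effect of the allowlist can be reproduced with a few lines. This is a minimal sketch with a hypothetical app-internal block shape in which only the `type` field matters; the names `ALLOWED_TYPES` and `applyAllowlist` are illustrative and not taken from the application's code.

```typescript
// Hypothetical app-internal block shape; real blocks carry more fields.
type BlockType = "text" | "image" | "attachment" | "reasoning" | "tool";
interface Block {
  type: BlockType;
}

// The allowlist as described above: tool blocks were never added to it.
const ALLOWED_TYPES: BlockType[] = ["text", "image", "attachment", "reasoning"];

function applyAllowlist(blocks: Block[]): Block[] {
  return blocks.filter((block) => ALLOWED_TYPES.indexOf(block.type) !== -1);
}

// Model output where a tool block is the only separator between two reasoning blocks.
const modelOutput: Block[] = [
  { type: "reasoning" },
  { type: "tool" },
  { type: "reasoning" },
  { type: "text" },
];

const sentToApi = applyAllowlist(modelOutput);
console.log(sentToApi.map((block) => block.type).join(", "));
```

The filter itself behaves as designed; the adjacency appears only because of what it removes.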

Verifying the Hypothesis with AWS CLI
The analysis so far formed the hypothesis that "consecutive reasoning blocks cause an error." However, the error details couldn't be obtained from Bedrock Model Invocation Logging. To confirm the hypothesis, I directly sent test payloads to the Bedrock Converse API using the AWS CLI.
The base for the tests was the payload from the last successful request obtained from Invocation Logging. Since this payload contains actual signed reasoning blocks, it can accurately verify API constraints.
aws bedrock-runtime converse \
--region ap-northeast-1 \
--model-id jp.anthropic.claude-sonnet-4-6 \
--cli-input-json file://test-payload.json
Four tests were conducted, yielding the following results:
| Test | Payload Content | Result |
|---|---|---|
| Baseline | Successful request as-is (tool blocks excluded, reasoning non-consecutive) | Success |
| Test 1 | Make reasoning blocks consecutive (delete text in between) | ValidationException |
| Test 2 | Make reasoning consecutive in a past assistant message (not the latest) | ValidationException |
| Test 3 | Set assistant message content to empty array | ValidationException |
Test 1 result:
An error occurred (ValidationException) when calling the Converse operation:
The model returned the following errors:
messages.1.content.1: `thinking` or `redacted_thinking` blocks in the
latest assistant message cannot be modified. These blocks must remain
as they were in the original response.
Comparing the baseline and Test 1 confirmed that the shift in block position (index) itself is not a problem — an error only occurs when reasoning blocks are adjacent.
Test 2 results showed that the API validates not just the latest assistant message, but all assistant messages in the conversation history. This means that once a message with consecutive reasoning blocks exists in the history, all subsequent requests will fail.
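Because every message in the history is validated, it can help to scan the whole outgoing history before calling the API. The following is a minimal sketch under assumed, hypothetical message shapes (`Message`, `Block`, and `findProblems` are illustrative names). It flags any adjacent reasoning blocks, including ones the model itself may have produced, so treat a hit as a warning to investigate rather than proof of an invalid request.

```typescript
// Hypothetical app-internal shapes; only `role` and block `type` matter here.
interface Block {
  type: string;
}
interface Message {
  role: "user" | "assistant";
  content: Block[];
}
interface Problem {
  index: number;
  reason: "empty-content" | "adjacent-reasoning";
}

// Scan every message, not just the latest one.
function findProblems(history: Message[]): Problem[] {
  const problems: Problem[] = [];
  history.forEach((message, index) => {
    if (message.content.length === 0) {
      problems.push({ index, reason: "empty-content" });
      return;
    }
    if (message.role !== "assistant") return;
    let prevType: string | undefined;
    for (const block of message.content) {
      if (block.type === "reasoning" && prevType === "reasoning") {
        problems.push({ index, reason: "adjacent-reasoning" });
        break;
      }
      prevType = block.type;
    }
  });
  return problems;
}

const history: Message[] = [
  { role: "user", content: [{ type: "text" }] },
  { role: "assistant", content: [{ type: "reasoning" }, { type: "reasoning" }, { type: "text" }] },
  { role: "user", content: [{ type: "text" }] },
  { role: "assistant", content: [] },
];
console.log(findProblems(history));
```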
The True Nature of the Signature — Clarifying the API Validation Mechanism Through Experimentation
From the error message "These blocks must remain as they were in the original response", it's clear that signature-based validation is involved. However, what exactly the signature validates is not clear from the documentation alone.
The Anthropic official documentation states:
the entire sequence of consecutive thinking blocks must match the outputs generated by the model during the original request; you cannot rearrange or modify the sequence of these blocks
To determine what this "sequence" refers to — whether it's the text content or the block structure — I conducted additional experiments to identify the validation target of the signature.
What Does the Signature Validate?
Using the earlier test payload (two consecutive reasoning blocks [reasoning_A, reasoning_B, text]), I conducted four additional tests related to signatures:
| Test | Operation | Result |
|---|---|---|
| Test 4 | Rewrite reasoning_A's text to completely different content, keep signature as-is | Success |
| Test 5 | Swap signatures of reasoning_A and reasoning_B (keep text as-is) | Success |
| Test 6 | Combine text from both reasonings into one block, use either signature | Success |
| Test 7 | Use a completely forged signature string | ValidationException |
Test 7 error message:
messages.1.content.0: Invalid `signature` in `thinking` block
These results clarified the role of the signature:
1. The signature is not a hash of the text content
Test 4 succeeded with the text completely rewritten, and Test 5 succeeded with signatures swapped. The signature is not tied to the content of reasoningText.
2. The signature is proof of authenticity that "Claude generated this"
Only the forged signature in Test 7 failed. The role of the signature is to prove that the block was generated by the Claude API (an authenticity proof). Conceptually, it's similar to a JWT (JSON Web Token): the server signs the data with a key that only it holds and later verifies that signature itself.
3. The API is stateless — the signature encapsulates the "state"
LLM APIs are inherently stateless. Without storing conversation history on the server side, how can it verify "whether it matches the original output"? The answer is that the signature itself contains the information needed for verification. It's the same mechanism by which JWT can verify token authenticity without a server-side session store.
4. Consecutive pattern validation is a separate structural check from the signature
The signature verifies "whether it's a block generated by Claude," while consecutive pattern validation verifies "whether it matches the model's original output structure." These are two separate layers of validation:
- Signature validation (confirmed in Test 7): Whether the block was generated by the Claude API
- Structural validation (confirmed in Test 1): Whether the consecutive pattern of reasoning blocks matches the model's output
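The four signature tests can be derived mechanically from a baseline payload. The sketch below shows one way to build the variants; the article does not show how the payloads were actually edited, so the helper names (`reasoning`, `buildVariants`) are hypothetical. The block shape follows the Converse API's `reasoningContent.reasoningText` structure with `text` and `signature` fields, and the signatures here are placeholders: real tests need signed blocks taken from an actual response.

```typescript
// Converse-style content blocks, reduced to the fields used in this sketch.
interface ReasoningBlock {
  reasoningContent: { reasoningText: { text: string; signature: string } };
}
interface TextBlock {
  text: string;
}
type Block = ReasoningBlock | TextBlock;

interface Variants {
  test4: Block[];
  test5: Block[];
  test6: Block[];
  test7: Block[];
}

function reasoning(text: string, signature: string): ReasoningBlock {
  return { reasoningContent: { reasoningText: { text, signature } } };
}

function buildVariants(a: ReasoningBlock, b: ReasoningBlock, tail: TextBlock): Variants {
  const ra = a.reasoningContent.reasoningText;
  const rb = b.reasoningContent.reasoningText;
  return {
    // Test 4: rewrite reasoning_A's text, keep its signature.
    test4: [reasoning("completely different text", ra.signature), b, tail],
    // Test 5: swap the two signatures, keep both texts.
    test5: [reasoning(ra.text, rb.signature), reasoning(rb.text, ra.signature), tail],
    // Test 6: merge both texts into one block, keep one signature.
    test6: [reasoning(ra.text + "\n\n" + rb.text, ra.signature), tail],
    // Test 7: replace the signature with a made-up string.
    test7: [reasoning(ra.text, "forged-signature"), b, tail],
  };
}

// Placeholder blocks standing in for the signed blocks from the Invocation Log.
const variants = buildVariants(
  reasoning("thinking A", "SIG_A"),
  reasoning("thinking B", "SIG_B"),
  { text: "final answer" },
);
console.log(JSON.stringify(variants.test6, null, 2));
```

Each variant would then replace the assistant message content in the test payload passed to the CLI.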
What the Test Results Mean
This finding directly impacts the countermeasure. Since the signature does not verify text content, it is possible to combine the text of consecutive reasoning blocks into a single block (confirmed in Test 6). This is a better countermeasure than simply deleting blocks, since it resolves the consecutive issue without losing the model's reasoning context.
Note that the same documentation permits omitting entire thinking blocks from previous turns (except when using tools). The problem is "creating a consecutive pattern that doesn't exist in the original output."
Test 3 result:
An error occurred (ValidationException) when calling the Converse operation:
The content field in the Message object at messages.1 is empty.
Add a ContentBlock object to the content field and try again.
This also confirmed the cascade failure mechanism (details in Chapter 4).

Why It Occurs at a Specific Number of Exchanges
This problem does not occur in every exchange. The trigger is the pattern where only reasoning blocks are inserted between tool calls.
For example, when a tool call fails and the model retries:
1. reasoning (thinking about the query) → text → tool (executed, failed)
2. reasoning (thinking about the fix) → tool (re-executed, failed)
3. reasoning (thinking about further fixes) → tool (succeeded) → text (explaining results)
Step 2 has no text block between its reasoning and its tool call, so once the tool blocks are excluded, the reasoning from step 2 sits directly next to the reasoning from step 3.
In early exchanges, text blocks often exist before and after tool blocks, so reasoning doesn't become consecutive after filtering. As the number of exchanges increases, tool retries and complex calls occur more frequently, increasing the probability of this pattern appearing. This is why the issue reproduces "around the 10th exchange."
Chapter 4: Cascade Failure — Once It Breaks, It Stays Broken Permanently
In addition to the root cause, a cascade failure occurs where once a failure happens, all subsequent requests fail permanently. This made the problem even more serious.
This failure pattern is not limited to this case — it can occur in any chat application that persists conversation history and resends it each time. If an incomplete assistant message is saved on error, that broken message continues to be included in all subsequent requests.
The mechanism in this case:
- On initial failure, the assistant message is saved in an incomplete state (without a valid content block)
- On the next request, all blocks in this message are excluded by the filter, sending an assistant message with empty content to Bedrock
- Empty content also results in ValidationException → permanent failure loop
// After filtering, content becomes empty
{
  "role": "assistant",
  "content": [] // Bedrock API constraint violation
}
As confirmed in Test 3 of the previous chapter, an empty content array also returns a ValidationException.
In other words, even if the root cause (consecutive reasoning blocks) is fixed, chats that failed in the past remain permanently broken. Without also addressing the handling to skip empty-content messages, already-broken chats cannot be recovered.

Fixes and Countermeasures
Fix ①: Resolving Silent Errors
Added log output and client notification to the error handling during streaming processing. Errors that occur during SSE streaming need to be notified through a different path than normal HTTP error responses, making this an easy point to overlook.
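A minimal sketch of what such handling might look like is shown below. The application's actual code is not shown in this article, so everything here is an assumption: the `StreamErrorDeps` interface, the function name `reportStreamError`, the SSE event name `error`, and the numeric level 50 for ERROR (extrapolated from the 30 = INFO, 40 = WARN scheme seen in the logs).

```typescript
// Hypothetical dependencies injected by the application (names are illustrative).
interface StreamErrorDeps {
  logError: (entry: { level: number; name: string; message: string }) => void;
  saveNote: (key: string) => void; // persists the error marker to the chat history
  sendSse: (frame: string) => void; // writes one raw SSE frame to the open response
}

// Intended to be called from the catch block that wraps the streaming loop.
function reportStreamError(error: unknown, deps: StreamErrorDeps): void {
  const name = error instanceof Error ? error.name : "UnknownError";
  const message = error instanceof Error ? error.message : String(error);

  // 1. Log the upstream error name so that ValidationException shows up in the logs.
  deps.logError({ level: 50, name, message });
  // 2. Persist the generic marker.
  deps.saveNote("InternalServerError");
  // 3. Notify the client. The HTTP status has already been sent, so use an SSE event.
  const payload = JSON.stringify({ key: "InternalServerError" });
  deps.sendSse("event: error\ndata: " + payload + "\n\n");
}

// Usage with in-memory stand-ins for the logger, the store, and the response.
const logged: string[] = [];
const saved: string[] = [];
const frames: string[] = [];
const failure = new Error("request structure constraint violation");
failure.name = "ValidationException";
reportStreamError(failure, {
  logError: (entry) => logged.push(entry.level + " " + entry.name),
  saveNote: (key) => saved.push(key),
  sendSse: (frame) => frames.push(frame),
});
console.log(logged, saved, frames);
```

Logging the original error name alongside the generic marker would have exposed the ValidationException without needing Invocation Logging.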
Fix ②: Resolving the Root Cause
The following two points need to be addressed in the filter processing when sending conversation history to the Bedrock API.
1. Resolving consecutive reasoning blocks
When reasoning blocks become consecutive after excluding tool blocks, the consecutive reasoning blocks should be combined into one. As noted earlier, since the signature does not verify text content, combining the text and retaining either signature will pass validation. This allows the consecutive pattern to be resolved while preserving the model's reasoning context.
// Example of resolving consecutive reasoning after excluding tool blocks
function sanitizeContentBlocks(blocks: ContentBlock[]): ContentBlock[] {
  const filtered = blocks.filter(b => b.type !== 'toolUse' && b.type !== 'toolResult');
  // Combine consecutive reasoning blocks' text into a single block
  const result: ContentBlock[] = [];
  for (const block of filtered) {
    const prev = result[result.length - 1];
    if (prev?.type === 'reasoning' && block.type === 'reasoning') {
      prev.reasoningText += '\n\n' + block.reasoningText;
      // Since the signature is not tied to text content, the first block's signature is retained as-is
    } else {
      result.push({ ...block });
    }
  }
  return result;
}
2. Skipping empty-content messages
Messages whose content becomes empty after filtering should be excluded from the conversation history. This prevents cascade failures in chats that failed in the past.
// Example of skipping messages with empty content
const messages = history
  .map(msg => ({ ...msg, content: sanitizeContentBlocks(msg.content) }))
  .filter(msg => msg.content.length > 0);
Reflecting on the Investigation Process
Here is a summary of the methods used in this investigation and the effectiveness of each.
| Method | What It Revealed | Limitations |
|---|---|---|
| CloudWatch Logs | The fact that no logs were being output was itself a clue | No direct information since errors were caught and not logged |
| DynamoDB | Error code, overall message structure | Detailed error messages were not stored |
| Bedrock Model Invocation Logging | Actual error type (ValidationException), successful request payload | Request body and error message details on failure are not recorded |
| Cross-referencing DynamoDB × code × Invocation Log | Identifying the root cause | — |
| Direct API calls with AWS CLI (7 patterns) | Exact wording of error messages, confirming hypotheses, that all messages are validated, identifying what the signature validates | — |

The most effective approach was "cross-referencing multiple data sources." A single log source did not provide the full picture; the cause was only identifiable by combining DynamoDB stored data × Invocation Log success payload × static code analysis.
Summary
For Bedrock Converse API Users
- When combining Extended Thinking with tool use, be careful to ensure that reasoningContent blocks don't become consecutive when reconstructing conversation history
- When filtering specific content types from conversation history, verify that the block order after filtering satisfies API constraints
- The signature on reasoning blocks is not a hash of the text content, but a proof of authenticity that Claude generated it. Text combining or rewriting is permissible, but forged signatures or consecutive patterns that don't exist in the original output are rejected
As a Debugging Approach
- Suspect silent errors: Even when users don't see an error, error information may remain in the persistence layer
- Reject hypotheses quickly: Rather than being anchored to the assumption of "context window exceeded," I should have measured the actual token count and dismissed it early
- Leverage Bedrock Model Invocation Logging: The most direct means of confirming the reality of API calls. Since it can be temporarily enabled and then immediately disabled, it should be actively used during debugging
- Cross-reference multiple data sources: When a single log source doesn't give the full picture, analyze across stored data, application logs, and service logs
