Using Structured Outputs with Amazon Bedrock Batch Inference as Well: Stabilizing AI Output
Our technical blog "DevelopersIO" generates AI summaries for over 60,000 articles. This time, we needed to reprocess a large number of summaries to reflect improvements to our prompts. Before processing the full volume at once, we ran a pilot on approximately 7,000 recent articles, and this article presents the results.
Previously, we introduced Amazon Bedrock's batch inference and Structured Outputs separately.
This article is a practical record of combining these two technologies.
Processing one by one on demand would take about 50 hours (about 3 seconds/item × 60,000 items), but Batch Inference promises a significant reduction in processing time and a 50% cost reduction. For basic Batch Inference setup, please refer to the first article.
Challenge: Batch Inference JSON Output Was Unstable
In our first batch inference implementation, we instructed "Please output in JSON format" in the system prompt and parsed the results using regular expressions. This method created a divergence from our production prompt by adding JSON output instructions for batch processing. Additionally, we encountered issues such as JSON being broken by double quotes or line breaks in Japanese text, missing latter parts of multiple fields, and missing closing brackets in long texts.
Along with improving our prompts, we've already migrated the on-demand side to Structured Outputs. If we could process with the same schema on the batch side, we could achieve both unified prompt management and type-safe JSON output.
Solution: Structured Outputs Worked with Batch Inference
In conclusion, Structured Outputs worked as-is with Batch Inference. Initially, we assumed "it probably won't work with batch" and proceeded with implementation using the conventional method, but the official documentation clearly states, "Batch inference - Use structured outputs within batch inference without any additional setup."
The following prompts and schema are minimal samples for reproducing functionality. Details of our production prompts remain confidential.
Sample Schema
We generate four fields in one request: summaries and details in both Japanese and English.
SCHEMA = {
    "type": "object",
    "properties": {
        "summary_ja": {"type": "string"},
        "summary_en": {"type": "string"},
        "detail_ja": {"type": "string"},
        "detail_en": {"type": "string"}
    },
    "required": ["summary_ja", "summary_en", "detail_ja", "detail_en"],
    "additionalProperties": False
}
Since output of all fields specified in required is guaranteed, missing fields or JSON syntax errors fundamentally cannot occur.
Only 3 Changes Needed in JSONL
For existing batch inference JSONL, we simply removed the JSON output instruction from the system prompt and added output_config. Result parsing also simplifies to just json.loads().
{
    "recordId": "article_id",
    "modelInput": {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
-       "system": "... Generate a JSON with summary_ja, summary_en ... Output ONLY valid JSON.",
+       "system": "... Generate summaries in both Japanese and English.",
        "messages": [{"role": "user", "content": "..."}],
+       "output_config": {
+           "format": {
+               "type": "json_schema",
+               "schema": { ... }
+           }
+       }
    }
}
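On the output side, parsing really does reduce to json.loads(). Here is a minimal sketch, assuming the usual batch output layout in which each line carries a recordId and a modelOutput whose first content block holds the schema-conforming JSON (the file name and variable names are illustrative):

import json

results = {}
with open("output.jsonl.out", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        model_output = record.get("modelOutput")
        if not model_output:
            # failed records carry error information instead of a modelOutput
            continue
        text = model_output["content"][0]["text"]
        # the schema guarantees valid JSON with all four fields present
        results[record["recordId"]] = json.loads(text)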
Using the same prompt and schema for both on-demand and batch ensures consistency in quality.
Schema Format Differences Between APIs
When using Structured Outputs across multiple APIs, how schemas are passed differs by API.
| API | Parameter | schema type | name required |
|---|---|---|---|
| Converse API | outputConfig.textFormat.structure.jsonSchema | JSON string | Yes |
| InvokeModel API | modelInput.output_config.format | dict object | No |
| Batch Inference (JSONL) | modelInput.output_config.format | dict object | No |
It's practical to centrally manage the schema as a dict and only use json.dumps() for the Converse API.
# Converse API -- stringify schema, name required
outputConfig={"textFormat": {"type": "json_schema", "structure": {
    "jsonSchema": {"name": "my_schema", "schema": json.dumps(SCHEMA)}}}}

# Batch Inference (in JSONL) -- dict as is, no name
"output_config": {"format": {"type": "json_schema", "schema": SCHEMA}}
When Batch Is Not Suitable: Comparison with Prompt Caching
If a large system prompt is reused across many requests, on-demand processing combined with prompt caching can be more cost-efficient than batch.
Our system includes a tag master CSV (about 3,000 entries, approximately 15,000 tokens) in the system prompt for tag classification processing. For this process, we reduce input costs for the second and subsequent items by 90% using prompt caching.
system=[
    {"text": tag_prompt},
    {"cachePoint": {"type": "default"}}
]
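In context, the cache point sits at the end of the system blocks of an ordinary Converse call, roughly like this sketch (the model ID and variable names are illustrative):

import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    system=[
        {"text": tag_prompt},                 # ~15,000-token tag master prompt
        {"cachePoint": {"type": "default"}},  # everything up to here is cached for ~5 minutes
    ],
    messages=[{"role": "user", "content": [{"text": article_text}]}],
)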
Cost comparison for 1,000 tag classifications (system prompt about 15,000 tokens, article input about 500 tokens, output about 100 tokens):
| Method | Input Cost | Output Cost | Total |
|---|---|---|---|
| Batch (50% OFF) | $6.20 | $0.20 | $6.40 |
| On-demand (no cache) | $12.40 | $0.40 | $12.80 |
| On-demand (with cache) | $1.60 | $0.40 | $2.00 |
*Pricing calculated using Claude 3.5 Haiku (input $0.80/1M, output $4.00/1M)
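As a sanity check, the table can be reproduced with simple arithmetic, assuming Bedrock's usual cache pricing of reads at 10% of the input rate and writes at 125%:

IN, OUT = 0.80 / 1e6, 4.00 / 1e6             # Claude 3.5 Haiku per-token prices
n, sys_tok, art_tok, out_tok = 1000, 15_000, 500, 100

no_cache = n * (sys_tok + art_tok) * IN + n * out_tok * OUT   # ≈ $12.80
batch = no_cache * 0.5                                        # ≈ $6.40
cached = (sys_tok * IN * 1.25                # first request writes the cache
          + (n - 1) * sys_tok * IN * 0.10    # later requests read it at 10%
          + n * art_tok * IN                 # per-article tokens are never cached
          + n * out_tok * OUT)               # output is unaffected by caching, ≈ $2.00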
When a large system prompt is used repeatedly, the 90% discount from prompt caching beats the 50% discount from batch. However, prompt caching has a 5-minute TTL: if more than 5 minutes pass between requests, the cache expires and pricing reverts to normal, so it assumes a workload that can be processed continuously. Batch inference has no such time constraint, offering the operational convenience of "submit it anytime and forget about it."
Decision Flowchart
Batch Inference requires a minimum of 1,000 records per job, and queue wait time alone can exceed 15 minutes, so it is not suited to small volumes. At 1,000+ items, processing time stays nearly constant thanks to parallel processing (measured: 1,000 items in 17 minutes, 7,950 items in 21 minutes).
Our system's approach is as follows:
| Process | System Prompt | Method | Reason |
|---|---|---|---|
| AI Summary Generation | About 800 tokens | Batch Inference | Small prompt, 50% OFF effective |
| Article Evaluation | About 600 tokens | Batch Inference | Small prompt, 50% OFF effective |
| Tag Classification | About 15,000 tokens | On-demand + Cache | Large prompt, 90% OFF with caching is advantageous |
Implementation Points
Adding the us. prefix to model IDs enables Cross-Region Inference Profiles, which helps avoid capacity shortages in specific regions through cross-region load balancing. Additionally, checking for duplicate recordIds, ensuring file size is under 200MB, and confirming record counts are within the 1,000-50,000 range before submission helps prevent failures at the Validating stage.
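A pre-submission check along these lines catches most Validating-stage failures (a sketch using only the limits mentioned above; adjust if your account's quotas differ):

import json
import os

def validate_batch_input(path: str) -> None:
    assert os.path.getsize(path) < 200 * 1024 * 1024, "file must be under 200MB"
    record_ids = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record_ids.append(json.loads(line)["recordId"])
    assert 1_000 <= len(record_ids) <= 50_000, "record count must be within 1,000-50,000"
    assert len(record_ids) == len(set(record_ids)), "duplicate recordId found"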
Execution Results
First Run: Pilot (1,000 items)
| Item | Value |
|---|---|
| Processing Time | 17 minutes |
| Success Rate | 973/1,000 (97.3%) |
| JSON Parsing Success Rate | 100% (successful records) |
| Error Breakdown | Grammar compilation timed out: 26 items, Content filtering: 1 item |
No JSON syntax errors occurred in successful records.
The 26 errors were not reproducible; judging from the symptoms, they may have been caused by temporary infrastructure load or cold-start timeouts.
Second Run: Scale-up (6,284 items)
| Item | Value |
|---|---|
| Processing Time | 18 minutes |
| Success Rate | 6,281/6,284 (99.95%) |
| Errors | 3 items |
Processing time remained almost the same despite a 6-fold increase in volume. The error rate improved from 2.7% to 0.05%, suggesting some warm-up effect on the infrastructure side.
Retrying Failed Items
The 30 failed items from the first and second runs were individually retried using the on-demand Converse API, resolving all cases. Since the minimum batch requirement is 1,000 items, it was more reasonable to retry a small number of failures immediately using on-demand.
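The retries reused the same SCHEMA through the Converse-style outputConfig shown earlier, roughly like this sketch (the model ID and variable names are ours; response parsing follows the standard Converse response shape):

summaries = {}
for record_id, system_prompt, article_text in failed_items:   # hypothetical list of failed records
    response = bedrock.converse(
        modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": article_text}]}],
        outputConfig={"textFormat": {"type": "json_schema", "structure": {
            "jsonSchema": {"name": "summary_schema", "schema": json.dumps(SCHEMA)}}}},
    )
    summaries[record_id] = json.loads(response["output"]["message"]["content"][0]["text"])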
Summary of Benefits
| Aspect | On-demand One by One | Batch + Structured Outputs |
|---|---|---|
| Processing Time (1,000 items) | About 50 minutes (3 sec/item) | 17 minutes (parallel processing) |
| Processing Time (7,000 items) | About 6 hours | About 20 minutes (2 batches) |
| Cost | On-demand pricing 100% | 50% reduction |
| JSON Parsing | Regular expressions + validation | Only json.loads() |
| Parsing Failure Rate | Could occur at several % | 0% (schema guaranteed) |
| Prompt Management | Separately managed for batch | Completely identical to production |
Improvements were achieved in processing time, cost, and quality. In particular, the ability to unify prompt management was a significant gain for future maintainability.
Conclusion
By combining Batch Inference and Structured Outputs, we achieved both type-safe JSON output and 50% cost reduction simultaneously. Since prompts and schemas can be completely shared between on-demand and batch processing, we avoided the operational burden of "managing separate prompts for batch."
In the future, when bulk update maintenance of existing data becomes necessary due to new model releases or prompt tuning, we plan to utilize Batch Inference and Structured Outputs.