Optimizing Summarization of 14,000 Articles with Amazon Bedrock Batch Inference

We generated summary texts for 14,000 technical blog articles. By switching to Amazon Bedrock batch inference, we cut costs by 50% and improved processing throughput roughly 75-fold.
2026.01.25

I recently had an opportunity to use an LLM to generate summary texts for about 60,000 technical blog articles; the summaries are used for indexing and SEO.

Initially I ran the generation with Bedrock on-demand inference, but cost and processing-time issues soon became apparent.

Partway through the project I switched to Amazon Bedrock batch inference, which cut processing costs by 50%. In this post, I'll share how I improved processing efficiency and throughput.

Why I Chose Batch Inference

With on-demand execution, I faced high costs (about $40 per 10,000 articles) and long processing times (about 11 articles per minute, or roughly 15 hours per 10,000 articles). Since real-time responses weren't necessary, I considered switching to batch inference.

Amazon Bedrock's batch inference has these features:

  • Cost: 50% of on-demand
  • Processing time: jobs typically complete within 24 hours (scaling is handled automatically by AWS)

It's ideal for processing large amounts of data when real-time response isn't required.
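
Before committing, I found it useful to sanity-check the trade-off with a quick back-of-the-envelope estimate. This is only a rough sketch based on the measured on-demand figures above (about $40 and roughly 15 hours per 10,000 articles) and the 50% batch discount; actual batch completion time depends on AWS-side capacity.

articles = 15_000                             # approximate number of articles to process
ondemand_cost = 40.0 * articles / 10_000      # about $40 per 10,000 articles (measured)
ondemand_hours = articles / 11 / 60           # about 11 articles per minute (measured)
batch_cost = ondemand_cost * 0.5              # batch inference is billed at 50% of on-demand

print(f"on-demand: ~${ondemand_cost:.0f}, ~{ondemand_hours:.0f} hours")
print(f"batch:     ~${batch_cost:.0f}, completion time depends on the batch queue")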

For this project, I used Claude Haiku 4.5 (released October 2025) in the us-west-2 (Oregon) region.

Preparation

Data Preparation

I processed article records obtained from CMS and extracted the necessary information for generating summaries.

Cleaning process (a code sketch of the image-link and Markdown steps follows the list):

  • Removing HTML tags (using BeautifulSoup)
  • Removing code blocks (using regex)
  • Removing image links
  • Normalizing Markdown syntax
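
The cleaning code shown later in this post is simplified, so as an illustration, the image-link removal and Markdown normalization steps could look something like this (the regex patterns below are my own examples, not the exact ones used):

import re

def strip_images_and_normalize(text):
    # Remove Markdown image links, e.g. ![alt](https://example.com/image.png)
    text = re.sub(r'!\[[^\]]*\]\([^)]*\)', '', text)
    # Remove any remaining HTML <img> tags
    text = re.sub(r'<img[^>]*>', '', text, flags=re.IGNORECASE)
    # Normalize Markdown: drop heading markers and emphasis, keep the plain text
    text = re.sub(r'^#{1,6}\s*', '', text, flags=re.MULTILINE)
    text = re.sub(r'(\*\*|__|\*|_)', '', text)
    # Collapse runs of blank lines
    text = re.sub(r'\n{3,}', '\n\n', text)
    return text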

AWS Environment Setup

1. Creating S3 Bucket

aws s3 mb s3://my-bedrock-batch-bucket --region us-west-2

Bucket structure:

s3://my-bedrock-batch-bucket/
├── bedrock-batch/input/   # input JSONL
└── bedrock-batch/output/  # output results

2. Creating IAM Role

Trust policy (trust-policy.json):

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "bedrock.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}

Permission policy (permissions-policy.json):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bedrock-batch-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "*"
    }
  ]
}

Role creation commands:

# Create role
aws iam create-role \
  --role-name BedrockBatchInferenceRole \
  --assume-role-policy-document file://trust-policy.json

# Attach permissions
aws iam put-role-policy \
  --role-name BedrockBatchInferenceRole \
  --policy-name BedrockBatchS3Access \
  --policy-document file://permissions-policy.json

Common Mistake: Insufficient IAM Permissions

With batch inference, permission errors are only discovered 10-15 minutes after execution begins. Especially when using cross-region inference, resource specification becomes complex because it includes the inference profile ARN.

In this case, I set the resource for bedrock:InvokeModel to "*" to avoid errors caused by insufficient permissions.

If you need to narrow down resources, pay attention to:

  • Model ARN (arn:aws:bedrock:*::foundation-model/*)
  • Inference Profile ARN (arn:aws:bedrock:*:ACCOUNT_ID:inference-profile/*)
  • For Cross-Region Inference Profiles, region and account ID specifications can be complex

If you do want to narrow the resources following the principle of least privilege, I recommend first validating the policy with a test job at the minimum batch inference billing unit (1,000 records).
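
For reference, a narrowed-down statement might look like the following. This is only a sketch based on the ARN patterns above; replace ACCOUNT_ID and verify the exact ARNs for your cross-region inference profile in your own account before relying on it.

{
  "Effect": "Allow",
  "Action": "bedrock:InvokeModel",
  "Resource": [
    "arn:aws:bedrock:*::foundation-model/*",
    "arn:aws:bedrock:*:ACCOUNT_ID:inference-profile/*"
  ]
}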

Batch Input JSONL Format

The input for batch inference is in JSONL (JSON Lines) format. Each line contains one JSON object.

Format example:

{
  "recordId": "wp-post-12345",
  "modelInput": {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,
    "temperature": 0.1,
    "system": "You are a technical blog summary expert. Generate summaries in Japanese and English from articles, and output in JSON format.",
    "messages": [
      {
        "role": "user",
        "content": "Title: How to use Amazon Bedrock\n\nArticle:\nMain text content..."
      }
    ]
  }
}

JSONL creation script (simplified):

import json
import re

from bs4 import BeautifulSoup

def clean_text(text):
    """Strip HTML tags and code blocks and compact whitespace for batch input (simplified)."""
    soup = BeautifulSoup(text, 'html.parser')
    text = soup.get_text()                        # strip HTML tags
    text = re.sub(r'```[\s\S]*?```', '', text)    # strip fenced code blocks
    text = re.sub(r'\s+', ' ', text).strip()      # collapse whitespace
    return text[:50000]  # limit to 50,000 characters per record

# Load article data
articles = load_articles()  # Article data obtained from CMS

# Create JSONL
with open('batch_input.jsonl', 'w', encoding='utf-8') as f:
    for article in articles:
        record = {
            "recordId": article['id'],
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "temperature": 0.1,
                "system": "You are a technical blog summary expert. Generate summaries in Japanese and English from articles, and output in JSON format.",
                "messages": [{
                    "role": "user",
                    "content": f"Title: {article['title']}\n\nArticle:\n{clean_text(article['content'])}"
                }]
            }
        }
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

print(f"✓ JSONL creation completed: {len(articles)} records")

Adjusting Batch Size

Batch inference has AWS quotas (limits). I adjusted the batch size to stay within these limits.

AWS Bedrock Quotas

Item                Limit
Minimum records     1,000
Maximum records     50,000
Maximum file size   200 MB
Maximum job size    1 GB

Calculating Optimal Batch Size

The target was about 15,000 articles. The plan was to run a small test first (billed at the 1,000-record minimum), then process the rest in a single batch within the quota limits.

From measurements, I found the average size per article was 6.2KB.

  • 1,000 records = about 6.2MB
  • 14,000 records = about 87MB
  • Quota limits: 50,000 records, 200MB

Since 14,000 records fell within both quota limits, I decided to process them in one batch.

Pre-submission Validation

I recommend confirming that your batch won't violate quota limits before submission. Errors may only become apparent after about 10 minutes of execution, causing wasted time. I prepared the following validation script:

Validation script (validate_batch_input.py):

import json
from pathlib import Path

MIN_RECORDS = 1000
MAX_RECORDS = 50000
MAX_FILE_SIZE_MB = 200

def validate_batch_input(file_path):
    path = Path(file_path)

    # Check file size
    size_mb = path.stat().st_size / (1024 * 1024)
    print(f"File size: {size_mb:.2f} MB (limit: {MAX_FILE_SIZE_MB} MB)")
    assert size_mb <= MAX_FILE_SIZE_MB, "File size exceeded"

    # Check record count
    record_count = 0
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            record = json.loads(line)
            assert 'recordId' in record, "recordId missing"
            assert 'modelInput' in record, "modelInput missing"
            record_count += 1

    print(f"Record count: {record_count:,} (min: {MIN_RECORDS:,}, max: {MAX_RECORDS:,})")
    assert MIN_RECORDS <= record_count <= MAX_RECORDS, "Record count out of range"

    print(f"✓ Validation complete: Ready for submission")
    return True

# Execute
validate_batch_input('batch_input.jsonl')

Execution result:

File size: 87.78 MB (limit: 200 MB)
✓ File size OK
Record count: 14,108 (min: 1,000, max: 50,000)
✓ Record count OK
✓ Format check OK
✓ Required fields OK

✓ Validation complete: Ready for submission

100-Record Test Run

Before the production run, I conducted a test with 100 records to verify the output format and error handling.

Batch Job Submission

Submission script (run_batch_job.py):

import boto3
from datetime import datetime

S3_BUCKET = 'my-bedrock-batch-bucket'
# Cross-Region Inference Profile model ID
# To avoid throttling on large jobs, use the US cross-region profile (us. prefix) instead of a single-region model ID
MODEL_ID = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
ROLE_ARN = 'arn:aws:iam::ACCOUNT_ID:role/BedrockBatchInferenceRole'

# S3 upload
s3 = boto3.client('s3', region_name='us-west-2')
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
s3_key = f"bedrock-batch/input/batch_{timestamp}.jsonl"

s3.upload_file('batch_input.jsonl', S3_BUCKET, s3_key)
input_uri = f"s3://{S3_BUCKET}/{s3_key}"
print(f"✓ S3 upload complete: {input_uri}")

# Create batch job
bedrock = boto3.client('bedrock', region_name='us-west-2')
job_name = f"blog-summary-batch-{timestamp.replace('_', '')}"

response = bedrock.create_model_invocation_job(
    jobName=job_name,
    roleArn=ROLE_ARN,
    modelId=MODEL_ID,
    inputDataConfig={
        's3InputDataConfig': {
            's3Uri': input_uri
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': f"s3://{S3_BUCKET}/bedrock-batch/output/"
        }
    }
)

job_arn = response['jobArn']
job_id = job_arn.split('/')[-1]

print(f"✓ Batch job submission complete")
print(f"Job name: {job_name}")
print(f"Job ID: {job_id}")
print(f"ARN: {job_arn}")

Execution result:

✓ S3 upload complete: s3://my-bedrock-batch-bucket/bedrock-batch/input/batch_20260125_083843.jsonl
✓ Batch job submission complete
Job name: blog-summary-batch-20260125083843
Job ID: 3utuwhgemacy
ARN: arn:aws:bedrock:us-west-2:123456789012:model-invocation-job/3utuwhgemacy

Job Monitoring

# Check status
aws bedrock get-model-invocation-job \
  --region us-west-2 \
  --job-identifier 3utuwhgemacy
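
If you prefer polling from Python instead of the CLI, a simple loop over the same API works (a sketch; the job ARN comes from the submission script above):

import time
import boto3

bedrock = boto3.client('bedrock', region_name='us-west-2')

def wait_for_job(job_arn, interval_sec=60):
    """Poll a batch inference job until it reaches a terminal state."""
    while True:
        job = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
        status = job['status']
        print(f"status: {status}")
        if status in ('Completed', 'PartiallyCompleted', 'Failed', 'Stopped', 'Expired'):
            return job
        time.sleep(interval_sec)

# wait_for_job('arn:aws:bedrock:us-west-2:123456789012:model-invocation-job/3utuwhgemacy')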

Test Results

  • Processing time: about 10 minutes
  • Success rate: 100% (all 100 records successful)
  • Input tokens: 255,324
  • Output tokens: 60,592

The test completed successfully, and I confirmed that Japanese and English summaries were correctly generated.

Note: Even for the 100-record test, the minimum billing unit of 1,000 records was charged. If you run a preliminary test, it makes sense to use 1,000 records, since you will pay for that many anyway.

Output sample:

{
  "recordId": "wp-post-898154",
  "modelOutput": {
    "content": [{
      "type": "text",
      "text": "{\"title_en\":\"Creating Multiple Amazon WorkSpaces for Users in a Single Directory\",\"summary_ja\":\"Simple ADディレクトリを1つ作成し、複数のユーザー名を登録。同じディレクトリ内で異なるWorkSpaces 2つを作成...\",\"summary_en\":\"Create a single Simple AD directory with multiple user accounts. Deploy two WorkSpaces in the same directory...\"}"
    }],
    "usage": {
      "input_tokens": 1281,
      "output_tokens": 529
    }
  }
}

Note: The modelOutput.content[0].text in the output is a JSON-formatted string. When using it in a program, this string needs to be parsed again with json.loads().
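
For example, a downloaded .jsonl.out file can be post-processed like this (a sketch assuming the output format shown above; the summary field names come from my prompt):

import json

summaries = {}
with open('batch_20260125_083843.jsonl.out', 'r', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)
        if 'modelOutput' not in record:
            continue  # skip records without modelOutput (failed records)
        # content[0].text is itself a JSON string, so it has to be parsed a second time
        summary = json.loads(record['modelOutput']['content'][0]['text'])
        summaries[record['recordId']] = {
            'summary_ja': summary.get('summary_ja'),
            'summary_en': summary.get('summary_en'),
        }

print(f"✓ Parsed {len(summaries)} summaries")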

Production Run (14,108 records)

After the successful test, I ran the production batch with the remaining 14,108 records.

Data Preparation

I created a JSONL file with the remaining approximately 14,000 records, excluding the 100 already tested.

Batch Submission

I submitted the new JSONL using the same script as in the test. Since the script references the input file by name (batch_input.jsonl), I only needed to replace the file to run the production batch.

# Create JSONL for the remaining ~14,000 records (excluding the 100 already tested)
python3 create_remaining_batch.py

# Confirm record count
wc -l batch_input.jsonl
# 14108 batch_input.jsonl

# Submit using the same script
python3 run_batch_job.py

Execution result:

✓ S3 upload complete: s3://my-bedrock-batch-bucket/bedrock-batch/input/batch_remaining_20260125_090512.jsonl
✓ Batch job submission complete

Job name: blog-summary-remaining-20260125090512
Job ID: siasz3eopo31
ARN: arn:aws:bedrock:us-west-2:123456789012:model-invocation-job/siasz3eopo31

Processing Results

  • Processing time: about 17 minutes
  • Success rate: 100% (all 14,108 records successful)
  • Input tokens: 28,385,402
  • Output tokens: 8,044,345

manifest.json:

{
  "totalRecordCount": 14108,
  "processedRecordCount": 14108,
  "successRecordCount": 14108,
  "errorRecordCount": 0,
  "inputTokenCount": 28385402,
  "outputTokenCount": 8044345
}

Processing of over 14,000 records was completed in about 17 minutes. There were 0 errors, and summaries were successfully generated for all articles.

Note: Output files are not written directly under the specified S3 output path; they are placed in a subdirectory named after the job ID. A manifest.json.out file is also generated in the same directory.

s3://my-bedrock-batch-bucket/bedrock-batch/output/
├── 3utuwhgemacy/                    # Job ID
│   ├── batch_20260125_083843.jsonl.out
│   └── manifest.json.out
└── siasz3eopo31/                    # Job ID
    ├── batch_remaining_20260125_090512.jsonl.out
    └── manifest.json.out
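
To retrieve the results for a specific job, listing the job-ID prefix with boto3 is enough (a sketch following the layout above):

import boto3

s3 = boto3.client('s3', region_name='us-west-2')
bucket = 'my-bedrock-batch-bucket'
job_id = 'siasz3eopo31'
prefix = f"bedrock-batch/output/{job_id}/"

# Download every output file under the job's prefix (.jsonl.out and manifest.json.out)
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get('Contents', []):
    key = obj['Key']
    s3.download_file(bucket, key, key.rsplit('/', 1)[-1])
    print(f"✓ Downloaded {key}")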

Results and Benefits

Cost Comparison

Method       Input Cost   Output Cost   Total     Savings
On-demand    $22.71       $32.18        $54.89    -
Batch        $11.35       $16.09        $27.44    $27.44 (50%)

Token pricing (Haiku 4.5):

  • On-demand: Input $0.80/1M, Output $4.00/1M
  • Batch: Input $0.40/1M, Output $2.00/1M

For this batch of about 14,000 records, switching to batch inference saved $27.44 (50%).
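
The table figures follow directly from the manifest token counts and the prices above:

input_tokens = 28_385_402
output_tokens = 8_044_345

on_demand = input_tokens / 1e6 * 0.80 + output_tokens / 1e6 * 4.00   # $22.71 + $32.18
batch = input_tokens / 1e6 * 0.40 + output_tokens / 1e6 * 2.00       # $11.35 + $16.09

print(f"on-demand: ${on_demand:.2f}, batch: ${batch:.2f}, savings: ${on_demand - batch:.2f}")
# on-demand: $54.89, batch: $27.44, savings: $27.44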

Processing Time

Batch inference:

  • Processing time: about 17 minutes (14,108 records)
  • Throughput: about 830 records/minute

On-demand (based on measured values):

  • Throughput: about 11 records/minute (Haiku 4.5 measured value)
  • Estimated processing time: about 21 hours

In this case, batch inference improved throughput by about 75 times, saving approximately 21 hours of processing time.

Note: Batch inference processing time can vary depending on resource availability and concurrent jobs. In this case, resources may have been relatively available, resulting in the fast 17-minute processing time.

Summary

Using Amazon Bedrock's batch inference, I was able to efficiently generate summaries for about 14,000 articles, achieving a 50% cost reduction and a large reduction in processing time.

If you need to process 1,000 or more records with an LLM, I highly recommend trying Amazon Bedrock's batch inference.
