Optimizing Summarization of 14,000 Articles with Amazon Bedrock Batch Inference

We generated summary texts for 14,000 technical blog articles. By switching to Amazon Bedrock batch inference, we cut costs by 50% and improved processing throughput roughly 75-fold.
2026.01.25

I recently had an opportunity to use an LLM to generate summary texts for about 60,000 technical blog articles; the summaries are used for indexing and SEO.

Initially I ran the generation with Bedrock on-demand inference, but cost and processing-time issues soon became apparent.

Partway through the project I switched to Amazon Bedrock batch inference, which cut processing costs by 50%. In this post, I'll share how I improved processing efficiency and throughput.

Why I Chose Batch Inference

With on-demand execution, I faced high costs (about $40 per 10,000 articles) and long processing times (about 11 articles per minute, or roughly 15 hours per 10,000 articles). Since real-time responses weren't necessary, I considered switching to batch inference.

Amazon Bedrock's batch inference has these features:

  • Cost: 50% of on-demand
  • Processing time: jobs typically complete within 24 hours (scaling is handled automatically by AWS)

It's ideal for processing large amounts of data when real-time response isn't required.
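
Before committing, I found it useful to sanity-check the trade-off with a quick back-of-the-envelope estimate. This is only a rough sketch based on the measured on-demand figures above (about $40 and roughly 15 hours per 10,000 articles) and the 50% batch discount; actual batch completion time depends on AWS-side capacity.

articles = 15_000                             # approximate number of articles to process
ondemand_cost = 40.0 * articles / 10_000      # about $40 per 10,000 articles (measured)
ondemand_hours = articles / 11 / 60           # about 11 articles per minute (measured)
batch_cost = ondemand_cost * 0.5              # batch inference is billed at 50% of on-demand

print(f"on-demand: ~${ondemand_cost:.0f}, ~{ondemand_hours:.0f} hours")
print(f"batch:     ~${batch_cost:.0f}, completion time depends on the batch queue")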

For this project, I used Claude Haiku 4.5 (released October 2025) in the us-west-2 (Oregon) region.

Preparation

Data Preparation

I processed article records obtained from CMS and extracted the necessary information for generating summaries.

Cleaning process (a code sketch of the image-link and Markdown steps follows the list):

  • Removing HTML tags (using BeautifulSoup)
  • Removing code blocks (using regex)
  • Removing image links
  • Normalizing Markdown syntax
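
The cleaning code shown later in this post is simplified, so as an illustration, the image-link removal and Markdown normalization steps could look something like this (the regex patterns below are my own examples, not the exact ones used):

import re

def strip_images_and_normalize(text):
    # Remove Markdown image links, e.g. ![alt](https://example.com/image.png)
    text = re.sub(r'!\[[^\]]*\]\([^)]*\)', '', text)
    # Remove any remaining HTML <img> tags
    text = re.sub(r'<img[^>]*>', '', text, flags=re.IGNORECASE)
    # Normalize Markdown: drop heading markers and emphasis, keep the plain text
    text = re.sub(r'^#{1,6}\s*', '', text, flags=re.MULTILINE)
    text = re.sub(r'(\*\*|__|\*|_)', '', text)
    # Collapse runs of blank lines
    text = re.sub(r'\n{3,}', '\n\n', text)
    return text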

AWS Environment Setup

1. Creating S3 Bucket

aws s3 mb s3://my-bedrock-batch-bucket --region us-west-2

Bucket structure:

s3://my-bedrock-batch-bucket/
├── bedrock-batch/input/   # input JSONL
└── bedrock-batch/output/  # output results

2. Creating IAM Role

Trust policy (trust-policy.json):

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "bedrock.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}

Permission policy (permissions-policy.json):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bedrock-batch-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "*"
    }
  ]
}

Role creation commands:

# Create role
aws iam create-role \
  --role-name BedrockBatchInferenceRole \
  --assume-role-policy-document file://trust-policy.json

# Attach permissions
aws iam put-role-policy \
  --role-name BedrockBatchInferenceRole \
  --policy-name BedrockBatchS3Access \
  --policy-document file://permissions-policy.json

Common Mistake: Insufficient IAM Permissions

With batch inference, permission errors are only discovered 10-15 minutes after execution begins. Especially when using cross-region inference, resource specification becomes complex because it includes the inference profile ARN.

In this case, I set the resource for bedrock:InvokeModel to "*" to avoid errors caused by insufficient permissions.

If you need to narrow down resources, pay attention to:

  • Model ARN (arn:aws:bedrock:*::foundation-model/*)
  • Inference Profile ARN (arn:aws:bedrock:*:ACCOUNT_ID:inference-profile/*)
  • For Cross-Region Inference Profiles, region and account ID specifications can be complex

If you do want to narrow the resources following the principle of least privilege, I recommend first validating the policy with a test job at the minimum batch inference billing unit (1,000 records).
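
For reference, a narrowed-down statement might look like the following. This is only a sketch based on the ARN patterns above; replace ACCOUNT_ID and verify the exact ARNs for your cross-region inference profile in your own account before relying on it.

{
  "Effect": "Allow",
  "Action": "bedrock:InvokeModel",
  "Resource": [
    "arn:aws:bedrock:*::foundation-model/*",
    "arn:aws:bedrock:*:ACCOUNT_ID:inference-profile/*"
  ]
}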

Batch Input JSONL Format

The input for batch inference is in JSONL (JSON Lines) format. Each line contains one JSON object.

Format example:

{
  "recordId": "wp-post-12345",
  "modelInput": {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,
    "temperature": 0.1,
    "system": "You are a technical blog summary expert. Generate summaries in Japanese and English from articles, and output in JSON format.",
    "messages": [
      {
        "role": "user",
        "content": "Title: How to use Amazon Bedrock\n\nArticle:\nMain text content..."
      }
    ]
  }
}

JSONL creation script (simplified):

import json
import re

from bs4 import BeautifulSoup

def clean_text(text):
    """Strip HTML tags and code blocks and compact whitespace for batch input (simplified)."""
    soup = BeautifulSoup(text, 'html.parser')
    text = soup.get_text()                        # strip HTML tags
    text = re.sub(r'```[\s\S]*?```', '', text)    # strip fenced code blocks
    text = re.sub(r'\s+', ' ', text).strip()      # collapse whitespace
    return text[:50000]  # limit to 50,000 characters per record

# Load article data
articles = load_articles()  # Article data obtained from CMS

# Create JSONL
with open('batch_input.jsonl', 'w', encoding='utf-8') as f:
    for article in articles:
        record = {
            "recordId": article['id'],
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "temperature": 0.1,
                "system": "You are a technical blog summary expert. Generate summaries in Japanese and English from articles, and output in JSON format.",
                "messages": [{
                    "role": "user",
                    "content": f"Title: {article['title']}\n\nArticle:\n{clean_text(article['content'])}"
                }]
            }
        }
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

print(f"✓ JSONL creation completed: {len(articles)} records")

Adjusting Batch Size

Batch inference has AWS quotas (limits). I adjusted the batch size to stay within these limits.

AWS Bedrock Quotas

Item                Limit
Minimum records     1,000
Maximum records     50,000
Maximum file size   200 MB
Maximum job size    1 GB

Calculating Optimal Batch Size

The target was about 15,000 articles. The plan was to run a small test first (billed at the 1,000-record minimum), then process the rest in a single batch within the quota limits.

From measurements, I found the average size per article was 6.2KB.

  • 1,000 records = about 6.2MB
  • 14,000 records = about 87MB
  • Quota limits: 50,000 records, 200MB

Since 14,000 records fell within both quota limits, I decided to process them in one batch.

Pre-submission Validation

I recommend confirming that your batch won't violate quota limits before submission. Errors may only become apparent after about 10 minutes of execution, causing wasted time. I prepared the following validation script:

Validation script (validate_batch_input.py):

import json
from pathlib import Path

MIN_RECORDS = 1000
MAX_RECORDS = 50000
MAX_FILE_SIZE_MB = 200

def validate_batch_input(file_path):
    path = Path(file_path)

    # Check file size
    size_mb = path.stat().st_size / (1024 * 1024)
    print(f"File size: {size_mb:.2f} MB (limit: {MAX_FILE_SIZE_MB} MB)")
    assert size_mb <= MAX_FILE_SIZE_MB, "File size exceeded"

    # Check record count
    record_count = 0
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            record = json.loads(line)
            assert 'recordId' in record, "recordId missing"
            assert 'modelInput' in record, "modelInput missing"
            record_count += 1

    print(f"Record count: {record_count:,} (min: {MIN_RECORDS:,}, max: {MAX_RECORDS:,})")
    assert MIN_RECORDS <= record_count <= MAX_RECORDS, "Record count out of range"

    print(f"✓ Validation complete: Ready for submission")
    return True

# Execute
validate_batch_input('batch_input.jsonl')

Execution result:

File size: 87.78 MB (limit: 200 MB)
✓ File size OK
Record count: 14,108 (min: 1,000, max: 50,000)
✓ Record count OK
✓ Format check OK
✓ Required fields OK

✓ Validation complete: Ready for submission

100-Record Test Run

Before the production run, I conducted a test with 100 records to verify the output format and error handling.

Batch Job Submission

Submission script (run_batch_job.py):

import boto3
from datetime import datetime

S3_BUCKET = 'my-bedrock-batch-bucket'
# Cross-Region Inference Profile model ID
# To avoid throttling on large jobs, use the US cross-region profile (us. prefix) instead of a single-region model ID
MODEL_ID = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
ROLE_ARN = 'arn:aws:iam::ACCOUNT_ID:role/BedrockBatchInferenceRole'

# S3 upload
s3 = boto3.client('s3', region_name='us-west-2')
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
s3_key = f"bedrock-batch/input/batch_{timestamp}.jsonl"

s3.upload_file('batch_input.jsonl', S3_BUCKET, s3_key)
input_uri = f"s3://{S3_BUCKET}/{s3_key}"
print(f"✓ S3 upload complete: {input_uri}")

# Create batch job
bedrock = boto3.client('bedrock', region_name='us-west-2')
job_name = f"blog-summary-batch-{timestamp.replace('_', '')}"

response = bedrock.create_model_invocation_job(
    jobName=job_name,
    roleArn=ROLE_ARN,
    modelId=MODEL_ID,
    inputDataConfig={
        's3InputDataConfig': {
            's3Uri': input_uri
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': f"s3://{S3_BUCKET}/bedrock-batch/output/"
        }
    }
)

job_arn = response['jobArn']
job_id = job_arn.split('/')[-1]

print(f"✓ Batch job submission complete")
print(f"Job name: {job_name}")
print(f"Job ID: {job_id}")
print(f"ARN: {job_arn}")

Execution result:

✓ S3 upload complete: s3://my-bedrock-batch-bucket/bedrock-batch/input/batch_20260125_083843.jsonl
✓ Batch job submission complete
Job name: blog-summary-batch-20260125083843
Job ID: 3utuwhgemacy
ARN: arn:aws:bedrock:us-west-2:123456789012:model-invocation-job/3utuwhgemacy

Job Monitoring

# Check status
aws bedrock get-model-invocation-job \
  --region us-west-2 \
  --job-identifier 3utuwhgemacy
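
If you prefer polling from Python instead of the CLI, a simple loop over the same API works (a sketch; the job ARN comes from the submission script above):

import time
import boto3

bedrock = boto3.client('bedrock', region_name='us-west-2')

def wait_for_job(job_arn, interval_sec=60):
    """Poll a batch inference job until it reaches a terminal state."""
    while True:
        job = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
        status = job['status']
        print(f"status: {status}")
        if status in ('Completed', 'PartiallyCompleted', 'Failed', 'Stopped', 'Expired'):
            return job
        time.sleep(interval_sec)

# wait_for_job('arn:aws:bedrock:us-west-2:123456789012:model-invocation-job/3utuwhgemacy')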

Test Results

  • Processing time: about 10 minutes
  • Success rate: 100% (all 100 records successful)
  • Input tokens: 255,324
  • Output tokens: 60,592

The test completed successfully, and I confirmed that Japanese and English summaries were correctly generated.

Note: Even for the 100-record test, the minimum billing unit of 1,000 records was charged. If you run a preliminary test, it makes sense to use 1,000 records, since you will pay for that many anyway.

Output sample:

{
  "recordId": "wp-post-898154",
  "modelOutput": {
    "content": [{
      "type": "text",
      "text": "{\"title_en\":\"Creating Multiple Amazon WorkSpaces for Users in a Single Directory\",\"summary_ja\":\"Simple ADディレクトリを1つ作成し、複数のユーザー名を登録。同じディレクトリ内で異なるWorkSpaces 2つを作成...\",\"summary_en\":\"Create a single Simple AD directory with multiple user accounts. Deploy two WorkSpaces in the same directory...\"}"
    }],
    "usage": {
      "input_tokens": 1281,
      "output_tokens": 529
    }
  }
}

Note: The modelOutput.content[0].text in the output is a JSON-formatted string. When using it in a program, this string needs to be parsed again with json.loads().
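
For example, a downloaded .jsonl.out file can be post-processed like this (a sketch assuming the output format shown above; the summary field names come from my prompt):

import json

summaries = {}
with open('batch_20260125_083843.jsonl.out', 'r', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)
        if 'modelOutput' not in record:
            continue  # skip records without modelOutput (failed records)
        # content[0].text is itself a JSON string, so it has to be parsed a second time
        summary = json.loads(record['modelOutput']['content'][0]['text'])
        summaries[record['recordId']] = {
            'summary_ja': summary.get('summary_ja'),
            'summary_en': summary.get('summary_en'),
        }

print(f"✓ Parsed {len(summaries)} summaries")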

Production Run (14,108 records)

After the successful test, I ran the production batch with the remaining 14,108 records.

Data Preparation

I created a JSONL file with the remaining approximately 14,000 records, excluding the 100 already tested.

Batch Submission

I submitted the new JSONL using the same script as in the test. Since the script references the input file by name (batch_input.jsonl), I only needed to replace the file to run the production batch.

# Create JSONL for the remaining ~14,000 records (excluding the 100 already tested)
python3 create_remaining_batch.py

# Confirm record count
wc -l batch_input.jsonl
# 14108 batch_input.jsonl

# Submit using the same script
python3 run_batch_job.py

Execution result:

✓ S3 upload complete: s3://my-bedrock-batch-bucket/bedrock-batch/input/batch_remaining_20260125_090512.jsonl
✓ Batch job submission complete

Job name: blog-summary-remaining-20260125090512
Job ID: siasz3eopo31
ARN: arn:aws:bedrock:us-west-2:123456789012:model-invocation-job/siasz3eopo31

Processing Results

  • Processing time: about 17 minutes
  • Success rate: 100% (all 14,108 records successful)
  • Input tokens: 28,385,402
  • Output tokens: 8,044,345

manifest.json:

{
  "totalRecordCount": 14108,
  "processedRecordCount": 14108,
  "successRecordCount": 14108,
  "errorRecordCount": 0,
  "inputTokenCount": 28385402,
  "outputTokenCount": 8044345
}

Processing of over 14,000 records was completed in about 17 minutes. There were 0 errors, and summaries were successfully generated for all articles.

Note: Output files are not written directly under the specified S3 output path; they are placed in a subdirectory named after the job ID. A manifest.json.out file is also generated in the same directory.

s3://my-bedrock-batch-bucket/bedrock-batch/output/
├── 3utuwhgemacy/                    # Job ID
│   ├── batch_20260125_083843.jsonl.out
│   └── manifest.json.out
└── siasz3eopo31/                    # Job ID
    ├── batch_remaining_20260125_090512.jsonl.out
    └── manifest.json.out
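
To retrieve the results for a specific job, listing the job-ID prefix with boto3 is enough (a sketch following the layout above):

import boto3

s3 = boto3.client('s3', region_name='us-west-2')
bucket = 'my-bedrock-batch-bucket'
job_id = 'siasz3eopo31'
prefix = f"bedrock-batch/output/{job_id}/"

# Download every output file under the job's prefix (.jsonl.out and manifest.json.out)
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get('Contents', []):
    key = obj['Key']
    s3.download_file(bucket, key, key.rsplit('/', 1)[-1])
    print(f"✓ Downloaded {key}")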

Results and Benefits

Cost Comparison

Method       Input Cost   Output Cost   Total     Savings
On-demand    $22.71       $32.18        $54.89    -
Batch        $11.35       $16.09        $27.44    $27.44 (50%)

Token pricing (Haiku 4.5):

  • On-demand: Input $0.80/1M, Output $4.00/1M
  • Batch: Input $0.40/1M, Output $2.00/1M

For this batch of about 14,000 records, switching to batch inference saved $27.44 (50%).
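
The table figures follow directly from the manifest token counts and the prices above:

input_tokens = 28_385_402
output_tokens = 8_044_345

on_demand = input_tokens / 1e6 * 0.80 + output_tokens / 1e6 * 4.00   # $22.71 + $32.18
batch = input_tokens / 1e6 * 0.40 + output_tokens / 1e6 * 2.00       # $11.35 + $16.09

print(f"on-demand: ${on_demand:.2f}, batch: ${batch:.2f}, savings: ${on_demand - batch:.2f}")
# on-demand: $54.89, batch: $27.44, savings: $27.44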

Processing Time

Batch inference:

  • Processing time: about 17 minutes (14,108 records)
  • Throughput: about 830 records/minute

On-demand (based on measured values):

  • Throughput: about 11 records/minute (Haiku 4.5 measured value)
  • Estimated processing time: about 21 hours

In this case, batch inference improved throughput by about 75 times, saving approximately 21 hours of processing time.

Note: Batch inference processing time can vary depending on resource availability and concurrent jobs. In this case, resources may have been relatively available, resulting in the fast 17-minute processing time.

Summary

Using Amazon Bedrock's batch inference, I was able to efficiently generate summaries for about 14,000 articles, achieving a 50% cost reduction and a large reduction in processing time.

If you need to process 1,000 or more records with an LLM, I highly recommend trying Amazon Bedrock's batch inference.
