I compared Gemma 4 31B and Claude Sonnet 4.6 for OCR and structured extraction of business documents, and evaluated their costs and accuracy
This page has been translated by machine translation. View original
Hello, I'm Kema.
Google's Gemma 4 31B is now available on Amazon Bedrock.
Since it's a low-cost model with image input support, one potential use case is OCR for business documents.
However, when it comes to actual use, questions like "how does the accuracy compare to frontier models?" and "how much does the cost differ?" naturally arise.
In this article, I compare Gemma 4 31B and Claude Sonnet 4.6, summarizing verification results as of June 13, 2026.
I performed structured extraction of fictional business documents with both models and organized the comparison around three points: cost, speed, and accuracy.
Verification was conducted in both PNG and PDF formats.
I hope this serves as a reference for considering where to use low-cost models.
Target audience: Those considering document OCR on Amazon Bedrock, and those weighing the balance between cost and accuracy when selecting a model.
1. Verification Premises
1.1 Target Models and Connection Methods
Gemma 4 31B is a multimodal model that accepts both text and image inputs.
Gemma 4 31B is Google's 30.7-billion parameter dense model with built-in reasoning, native function calling, and multimodal input across text and image, supporting a 256K token context window.
Source: Model Card: Gemma 4 31B | AWS
Gemma 4 is exclusively for the bedrock-mantle endpoint, not the conventional bedrock-runtime.
Gemma 4 models are available only on the
bedrock-mantleendpoint.
Source: Model Card: Gemma 4 31B | AWS
On the other hand, Claude Sonnet 4.6 does not support bedrock-mantle.
Since image input is required, I used the conventional bedrock-runtime Converse API.
Here is a summary of the connection conditions for both models.
| Item | Gemma 4 31B | Claude Sonnet 4.6 |
|---|---|---|
| Model ID | google.gemma-4-31b |
us.anthropic.claude-sonnet-4-6 |
| Endpoint | bedrock-mantle | bedrock-runtime |
| API | OpenAI-compatible Chat Completions | Converse |
| Region | us-east-1 | us-east-1 |
| Authentication | IAM authentication (token generation) | IAM authentication |
Gemma 4 31B is available in a limited number of regions.
According to the model card, as of June 13, 2026, the available regions are us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-2 (Oregon), and eu-central-1 (Frankfurt).
All are In-Region inference only, with no support for Geo / Global Cross-Region inference.
The Tokyo region is not included. I'm hoping it will be supported soon!
Source: Model Card: Gemma 4 31B | AWS
1.2 Documents Under Verification
I prepared 3 types that are complex and likely to be used in practice, in both PNG and PDF formats.
All companies and individuals are fictional.
| Document | Features |
|---|---|
| Invoice | Itemized table with tax calculation |
| Pay Stub | Landscape orientation with 3 tables: earnings, deductions, and attendance. The most complex in this verification |
| Business Travel Expense Report | Line items by date |
Each test was run once, and the exchange rate was fixed at 1 USD = 150 JPY.
1.3 Common Prompt
The same prompt was used for both models.
This document is a Japanese business document. Please read all the information
described and output it in structured JSON format. Also determine the document type (doc_type).
No extra explanations or markdown decorations are needed; please return JSON only.
2. Verification Scripts
Both models process the 3 types (invoice, pay stub, business travel expense report) sequentially in PNG/PDF format, measuring response time, input/output tokens, and cost, then saving the results as JSON.
The cost is calculated by multiplying the actual token count returned in the response by the unit price of each model (exchange rate fixed at 1 USD = 150 JPY).
Install the required packages in advance.
python3 -m venv .venv
.venv/bin/pip install openai aws-bedrock-token-generator boto3 "botocore[crt]"
2.1 Gemma 4 31B (bedrock-mantle / OpenAI-compatible)
Point the OpenAI SDK's base_url to bedrock-mantle, and pass the IAM token to api_key.
Images are sent in image_url format, and PDFs in file format.
Save the following as scripts/gemma4_ocr.py.
Full content of scripts/gemma4_ocr.py (click to expand)
"""Performs OCR and structured extraction of documents using Gemma 4 31B (Bedrock Mantle), measuring speed/tokens/cost.
Tries both PNG and PDF for the same document to compare differences by input format.
Connection method:
- Endpoint: https://bedrock-mantle.{REGION}.api.aws/openai/v1 (OpenAI-compatible Chat Completions)
- Authentication: Generate IAM token using provide_token(region) from aws-bedrock-token-generator
- Gemma 4 31B available regions: us-east-1, us-east-2, us-west-2, eu-central-1
Usage:
export AWS_PROFILE="<your-profile>" # Specify the profile to use
export AWS_REGION="us-east-1"
python3 scripts/gemma4_ocr.py
"""
from __future__ import annotations
import base64
import json
import os
import time
from dataclasses import dataclass, asdict
from pathlib import Path
from openai import OpenAI
from aws_bedrock_token_generator import provide_token
# --- Configuration ---
MODEL_ID = "google.gemma-4-31b"
REGION = os.environ.get("AWS_REGION", "us-east-1")
BASE_URL = f"https://bedrock-mantle.{REGION}.api.aws/openai/v1"
# Pricing (USD per 1M tokens, Bedrock official on-demand)
PRICE_INPUT_PER_1M = 0.14
PRICE_OUTPUT_PER_1M = 0.40
USD_JPY = 150.0 # Exchange rate is approximate and subject to change
ROOT = Path(__file__).resolve().parent.parent
PNG_DIR = ROOT / "png"
PDF_DIR = ROOT / "pdf"
RESULT_DIR = ROOT / "results" / "gemma"
# Verification targets (3 types that are complex and commonly used in practice)
SELECTED = ["invoice", "payslip", "travel_expense"]
PROMPT = (
"This document is a Japanese business document. Please read all the information "
"described and output it in structured JSON format. Also determine the document type (doc_type). "
"No extra explanations or markdown decorations are needed; please return JSON only."
)
@dataclass
class OcrResult:
file: str
fmt: str # "png" or "pdf"
elapsed_sec: float
input_tokens: int
output_tokens: int
total_tokens: int
finish_reason: str
cost_usd: float
cost_jpy: float
output_text: str
def build_content(path: Path) -> list[dict]:
"""Assembles content for Chat Completions based on file format.
PNG is passed as image_url, PDF is passed in file format (OpenAI-compatible).
"""
b64 = base64.b64encode(path.read_bytes()).decode()
if path.suffix.lower() == ".pdf":
media = {
"type": "file",
"file": {
"filename": path.name,
"file_data": f"data:application/pdf;base64,{b64}",
},
}
else:
media = {
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{b64}"},
}
return [media, {"type": "text", "text": PROMPT}]
def build_client() -> OpenAI:
"""Creates an OpenAI-compatible client for Bedrock Mantle.
Authentication uses provide_token(region) to generate an IAM token passed to api_key.
Uses the standard credentials chain (AWS_PROFILE, etc.).
"""
if not os.environ.get("AWS_PROFILE") and not os.environ.get("AWS_ACCESS_KEY_ID"):
raise SystemExit("AWS credentials not found. Please specify the profile to use.")
return OpenAI(base_url=BASE_URL, api_key=provide_token(region=REGION))
def ocr_one(client: OpenAI, path: Path, fmt: str) -> OcrResult:
"""Performs OCR and structured extraction of a single file and returns measurements."""
start = time.perf_counter()
resp = client.chat.completions.create(
model=MODEL_ID,
messages=[{"role": "user", "content": build_content(path)}],
)
elapsed = time.perf_counter() - start
usage = resp.usage
choice = resp.choices[0]
in_tok = usage.prompt_tokens
out_tok = usage.completion_tokens
cost_usd = (
in_tok / 1_000_000 * PRICE_INPUT_PER_1M
+ out_tok / 1_000_000 * PRICE_OUTPUT_PER_1M
)
return OcrResult(
file=path.stem,
fmt=fmt,
elapsed_sec=round(elapsed, 2),
input_tokens=in_tok,
output_tokens=out_tok,
total_tokens=in_tok + out_tok,
finish_reason=choice.finish_reason,
cost_usd=round(cost_usd, 6),
cost_jpy=round(cost_usd * USD_JPY, 4),
output_text=choice.message.content or "",
)
def run_format(client: OpenAI, fmt: str, results: list[OcrResult]) -> None:
"""Processes targets in the specified format (png/pdf) and appends results to the list."""
base_dir = PNG_DIR if fmt == "png" else PDF_DIR
ext = ".png" if fmt == "png" else ".pdf"
print(f"\n=== Format: {fmt.upper()} ===")
print(f"{'File':<24}{'Sec':>7}{'InputTok':>9}{'OutputTok':>9}{'Finish':>9}{'JPY':>9}")
print("-" * 67)
for stem in SELECTED:
path = base_dir / f"{stem}{ext}"
if not path.exists():
print(f"{stem:<24} Not found: {path}")
continue
try:
r = ocr_one(client, path, fmt)
except Exception as e: # noqa: BLE001 Catch-all for verification script
print(f"{stem:<24} Error: {str(e)[:120]}")
continue
results.append(r)
print(
f"{r.file:<24}{r.elapsed_sec:>7}{r.input_tokens:>9}"
f"{r.output_tokens:>9}{r.finish_reason:>9}{r.cost_jpy:>9}"
)
(RESULT_DIR / f"{r.file}__{fmt}.json").write_text(
r.output_text, encoding="utf-8"
)
def summarize(fmt: str, results: list[OcrResult]) -> dict | None:
"""Displays summary by format and returns a summary dict."""
rs = [r for r in results if r.fmt == fmt]
if not rs:
return None
n = len(rs)
tin = sum(r.input_tokens for r in rs)
tout = sum(r.output_tokens for r in rs)
tjpy = sum(r.cost_jpy for r in rs)
tusd = sum(r.cost_usd for r in rs)
tsec = sum(r.elapsed_sec for r in rs)
print(
f"[{fmt.upper()}] {n} items | Avg {tsec/n:.2f}sec/item | "
f"Input {tin//n}tok/item Output {tout//n}tok/item | "
f"{tjpy/n:.3f}JPY/item → 1000 items approx {tjpy/n*1000:.0f}JPY"
)
return {
"format": fmt,
"count": n,
"avg_sec": round(tsec / n, 2),
"avg_input_tokens": tin // n,
"avg_output_tokens": tout // n,
"total_cost_jpy": round(tjpy, 4),
"total_cost_usd": round(tusd, 6),
"per_1000_jpy": round(tjpy / n * 1000, 1),
}
def main() -> None:
client = build_client()
print(f"Endpoint: {BASE_URL}")
print(f"Model: {MODEL_ID} / Targets: {len(SELECTED)} types x PNG/PDF")
results: list[OcrResult] = []
run_format(client, "png", results)
run_format(client, "pdf", results)
if not results:
raise SystemExit("No successful results.")
print("\n=== Summary (by format) ===")
png_sum = summarize("png", results)
pdf_sum = summarize("pdf", results)
summary = {
"model": MODEL_ID,
"region": REGION,
"usd_jpy": USD_JPY,
"by_format": [s for s in (png_sum, pdf_sum) if s],
"results": [asdict(r) for r in results],
}
(RESULT_DIR / "summary.json").write_text(
json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8"
)
print(f"\nDetails: {RESULT_DIR}/summary.json")
if __name__ == "__main__":
main()
The execution command is as follows.
AWS_PROFILE=<your-profile> AWS_REGION=us-east-1 .venv/bin/python scripts/gemma4_ocr.py
2.2 Claude Sonnet 4.6 (bedrock-runtime / Converse)
Uses the boto3 Converse API.
Images are sent as image blocks, and PDFs as document blocks.
Save the following as scripts/sonnet46_ocr.py.
Full content of scripts/sonnet46_ocr.py (click to expand)
"""Performs OCR and structured extraction of documents using Claude Sonnet 4.6 (Bedrock Runtime), measuring speed/tokens/cost.
A script for comparison under the same conditions as Gemma 4. Since Sonnet 4.6 does not support bedrock-mantle,
the conventional bedrock-runtime Converse API (with image/PDF input support) is used.
Connection method:
- API: bedrock-runtime Converse (boto3)
- Model: us.anthropic.claude-sonnet-4-6 (inference profile)
- Input: PNG=image block, PDF=document block
Usage:
export AWS_PROFILE="<your-profile>" # Specify the profile to use
export AWS_REGION="us-east-1"
python3 scripts/sonnet46_ocr.py
"""
from __future__ import annotations
import json
import os
import time
from dataclasses import dataclass, asdict
from pathlib import Path
import boto3
# --- Configuration ---
MODEL_ID = "us.anthropic.claude-sonnet-4-6"
REGION = os.environ.get("AWS_REGION", "us-east-1")
# Pricing (USD per 1M tokens, Bedrock official on-demand)
PRICE_INPUT_PER_1M = 3.00
PRICE_OUTPUT_PER_1M = 15.00
USD_JPY = 150.0 # Exchange rate is approximate and subject to change
ROOT = Path(__file__).resolve().parent.parent
PNG_DIR = ROOT / "png"
PDF_DIR = ROOT / "pdf"
RESULT_DIR = ROOT / "results" / "sonnet"
# Verification targets (same 3 types as Gemma 4)
SELECTED = ["invoice", "payslip", "travel_expense"]
PROMPT = (
"This document is a Japanese business document. Please read all the information "
"described and output it in structured JSON format. Also determine the document type (doc_type). "
"No extra explanations or markdown decorations are needed; please return JSON only."
)
@dataclass
class OcrResult:
file: str
fmt: str
elapsed_sec: float
input_tokens: int
output_tokens: int
total_tokens: int
stop_reason: str
cost_usd: float
cost_jpy: float
output_text: str
def build_content(path: Path) -> list[dict]:
"""Assembles content blocks for the Converse API.
PNG is passed as an image block, PDF as a document block.
A fixed name is used for document name since only alphanumeric characters are allowed.
"""
raw = path.read_bytes()
if path.suffix.lower() == ".pdf":
media = {
"document": {
"format": "pdf",
"name": "document",
"source": {"bytes": raw},
}
}
else:
media = {
"image": {
"format": "png",
"source": {"bytes": raw},
}
}
return [media, {"text": PROMPT}]
def ocr_one(client, path: Path, fmt: str) -> OcrResult:
"""Performs OCR and structured extraction of a single file and returns measurements."""
start = time.perf_counter()
resp = client.converse(
modelId=MODEL_ID,
messages=[{"role": "user", "content": build_content(path)}],
)
elapsed = time.perf_counter() - start
usage = resp["usage"]
in_tok = usage["inputTokens"]
out_tok = usage["outputTokens"]
text = resp["output"]["message"]["content"][0]["text"]
cost_usd = (
in_tok / 1_000_000 * PRICE_INPUT_PER_1M
+ out_tok / 1_000_000 * PRICE_OUTPUT_PER_1M
)
return OcrResult(
file=path.stem,
fmt=fmt,
elapsed_sec=round(elapsed, 2),
input_tokens=in_tok,
output_tokens=out_tok,
total_tokens=in_tok + out_tok,
stop_reason=resp.get("stopReason", ""),
cost_usd=round(cost_usd, 6),
cost_jpy=round(cost_usd * USD_JPY, 4),
output_text=text,
)
def run_format(client, fmt: str, results: list[OcrResult]) -> None:
"""Processes targets in the specified format (png/pdf) and appends results to the list."""
base_dir = PNG_DIR if fmt == "png" else PDF_DIR
ext = ".png" if fmt == "png" else ".pdf"
print(f"\n=== Format: {fmt.upper()} ===")
print(f"{'File':<24}{'Sec':>7}{'InputTok':>9}{'OutputTok':>9}{'Stop':>10}{'JPY':>9}")
print("-" * 68)
for stem in SELECTED:
path = base_dir / f"{stem}{ext}"
if not path.exists():
print(f"{stem:<24} Not found: {path}")
continue
try:
r = ocr_one(client, path, fmt)
except Exception as e: # noqa: BLE001 Catch-all for verification script
print(f"{stem:<24} Error: {str(e)[:120]}")
continue
results.append(r)
print(
f"{r.file:<24}{r.elapsed_sec:>7}{r.input_tokens:>9}"
f"{r.output_tokens:>9}{r.stop_reason:>10}{r.cost_jpy:>9}"
)
(RESULT_DIR / f"{r.file}__{fmt}.json").write_text(
r.output_text, encoding="utf-8"
)
def summarize(fmt: str, results: list[OcrResult]) -> dict | None:
"""Displays summary by format and returns a summary dict."""
rs = [r for r in results if r.fmt == fmt]
if not rs:
return None
n = len(rs)
tin = sum(r.input_tokens for r in rs)
tout = sum(r.output_tokens for r in rs)
tjpy = sum(r.cost_jpy for r in rs)
tusd = sum(r.cost_usd for r in rs)
tsec = sum(r.elapsed_sec for r in rs)
print(
f"[{fmt.upper()}] {n} items | Avg {tsec/n:.2f}sec/item | "
f"Input {tin//n}tok/item Output {tout//n}tok/item | "
f"{tjpy/n:.3f}JPY/item → 1000 items approx {tjpy/n*1000:.0f}JPY"
)
return {
"format": fmt,
"count": n,
"avg_sec": round(tsec / n, 2),
"avg_input_tokens": tin // n,
"avg_output_tokens": tout // n,
"total_cost_jpy": round(tjpy, 4),
"total_cost_usd": round(tusd, 6),
"per_1000_jpy": round(tjpy / n * 1000, 1),
}
def main() -> None:
if not os.environ.get("AWS_PROFILE") and not os.environ.get("AWS_ACCESS_KEY_ID"):
raise SystemExit("AWS credentials not found. Please specify the profile to use.")
client = boto3.client("bedrock-runtime", region_name=REGION)
print(f"Model: {MODEL_ID} / Targets: {len(SELECTED)} types x PNG/PDF")
results: list[OcrResult] = []
run_format(client, "png", results)
run_format(client, "pdf", results)
if not results:
raise SystemExit("No successful results.")
print("\n=== Summary (by format) ===")
png_sum = summarize("png", results)
pdf_sum = summarize("pdf", results)
summary = {
"model": MODEL_ID,
"region": REGION,
"usd_jpy": USD_JPY,
"by_format": [s for s in (png_sum, pdf_sum) if s],
"results": [asdict(r) for r in results],
}
(RESULT_DIR / "summary.json").write_text(
json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8"
)
print(f"\nDetails: {RESULT_DIR}/summary.json")
if __name__ == "__main__":
main()
The execution command is as follows.
AWS_PROFILE=<your-profile> AWS_REGION=us-east-1 .venv/bin/python scripts/sonnet46_ocr.py
When executed, the result JSON for each document is output to results/gemma/ and results/sonnet/, and the summaries are output to each summary.json.
3. Verification Results
For each of the 3 types, I include the input image and the output JSON from each model and format.
The JSON is presented with formatting applied for readability from what each model actually returned (the ```json code fences added by Gemma 4 have been removed, and some objects have been combined into single lines). The values themselves have not been changed.
3.1 Invoice

Invoice (fictional, A4)
Gemma 4 31B / PNG output (click to expand)
{
"doc_type": "請求書",
"issuer": {
"name": "合同会社テストワークス",
"address": "〒150-0001 東京都渋谷区架空1-1-1 架空ビル5F",
"tel": "03-1234-5678",
"registration_number": "T1234567890123",
"contact_person": "鈴木太郎"
},
"recipient": {
"name": "架空システム開発株式会社 御中"
},
"invoice_details": {
"invoice_number": "INV-2026-0612",
"issue_date": "2026年6月12日",
"payment_deadline": "2026年7月31日",
"total_amount": 495000,
"currency": "JPY"
},
"items": [
{ "description": "Webアプリ設計・開発(6月分)", "quantity": 1, "unit_price": 350000, "amount": 350000 },
{ "description": "保守運用サポート", "quantity": 1, "unit_price": 80000, "amount": 80000 },
{ "description": "サーバー利用料", "quantity": 2, "unit_price": 10000, "amount": 20000 }
],
"summary": {
"subtotal": 450000,
"consumption_tax_rate": "10%",
"consumption_tax_amount": 45000,
"total_amount": 495000
},
"payment_info": {
"bank_name": "架空銀行",
"branch_name": "渋谷支店",
"account_type": "普通",
"account_number": "1234567",
"account_holder": "ゴウドウガイシャテストワークス"
}
}
Gemma 4 31B / PDF output (click to expand)
{
"doc_type": "請求書",
"bill_to": "架空システム開発 株式会社 御中",
"issuer": {
"company_name": "合同会社テストワークス",
"address": "〒150-0001 東京都渋谷区架空1-1-1 架空ビル5F",
"tel": "03-1234-5678",
"registration_number": "T1234567890123",
"contact_person": "鈴木 太郎"
},
"invoice_details": {
"invoice_number": "INV-2026-0612",
"issue_date": "2026年6月12日",
"payment_deadline": "2026年7月31日",
"total_amount": 495000,
"currency": "JPY"
},
"line_items": [
{ "description": "Webアプリ設計・開発 (6月分)", "quantity": 1, "unit_price": 350000, "amount": 350000 },
{ "description": "保守運用サポート", "quantity": 1, "unit_price": 80000, "amount": 80000 },
{ "description": "サーバー利用料", "quantity": 2, "unit_price": 10000, "amount": 20000 }
],
"summary": {
"subtotal": 450000,
"tax_rate": "10%",
"tax_amount": 45000,
"total": 495000
},
"payment_info": {
"bank_name": "架空銀行",
"branch_name": "渋谷支店",
"account_type": "普通",
"account_number": "1234567",
"account_holder": "ゴウドウガイシャテストワークス"
}
}
Claude Sonnet 4.6 / PNG output (click to expand)
{
"doc_type": "請求書",
"invoice_number": "INV-2026-0612",
"issue_date": "2026-06-12",
"payment_due_date": "2026-07-31",
"bill_to": { "company_name": "架空システム開発 株式会社" },
"bill_from": {
"company_name": "合同会社テストワークス",
"postal_code": "150-0001",
"address": "東京都渋谷区架空1-1-1 架空ビル5F",
"tel": "03-1234-5678",
"registration_number": "T1234567890123",
"contact_person": "鈴木 太郎"
},
"total_amount_with_tax": 495000,
"items": [
{ "description": "Webアプリ設計・開発(6月分)", "quantity": 1, "unit_price": 350000, "amount": 350000 },
{ "description": "保守運用サポート", "quantity": 1, "unit_price": 80000, "amount": 80000 },
{ "description": "サーバー利用料", "quantity": 2, "unit_price": 10000, "amount": 20000 }
],
"subtotal": 450000,
"tax_rate": 0.10,
"tax_amount": 45000,
"total": 495000,
"bank_transfer": {
"bank_name": "架空銀行",
"branch_name": "渋谷支店",
"account_type": "普通",
"account_number": "1234567",
"account_holder": "ゴウドウガイシャテストワークス"
}
}
Claude Sonnet 4.6 / PDF output (click to expand)
{
"doc_type": "請求書",
"invoice_number": "INV-2026-0612",
"issue_date": "2026-06-12",
"bill_to": { "company_name": "架空システム開発 株式会社" },
"bill_from": {
"company_name": "合同会社テストワークス",
"postal_code": "150-0001",
"address": "東京都渋谷区架空1-1-1 架空ビル5F",
"tel": "03-1234-5678",
"registration_number": "T1234567890123",
"contact_person": "鈴木 太郎"
},
"items": [
{ "description": "Webアプリ設計・開発(6月分)", "quantity": 1, "unit_price": 350000, "amount": 350000 },
{ "description": "保守運用サポート", "quantity": 1, "unit_price": 80000, "amount": 80000 },
{ "description": "サーバー利用料", "quantity": 2, "unit_price": 10000, "amount": 20000 }
],
"subtotal": 450000,
"tax_rate": 0.10,
"tax_amount": 45000,
"total_amount": 495000,
"payment_due_date": "2026-07-31",
"bank_transfer_info": {
"bank_name": "架空銀行",
"branch_name": "渋谷支店",
"account_type": "普通",
"account_number": "1234567",
"account_holder": "ゴウドウガイシャテストワークス"
}
}
For the invoice, the invoice number, 3 line items, subtotal, tax, total, and bank transfer details were all numerically accurate across all 4 patterns.
Key names for the billing party and issuer varied by model and format (billing party as recipient / bill_to, issuer as issuer / bill_from), and in Gemma 4's PDF output, the billing party was returned as a string, and the parentheses in the line item were also half-width ((6月分)).
3.2 Pay Stub

Pay stub (fictional, A4 landscape)
Gemma 4 31B / PNG output (click to expand)
{
"doc_type": "給与明細書",
"employee_info": {
"employee_number": "E-10293",
"name": "佐藤 健太",
"department": "開発部",
"company": "架空テクノロジー株式会社"
},
"pay_period": {
"target_month": "2026年5月分",
"payment_date": "2026年6月15日"
},
"earnings": {
"basic_salary": 320000,
"executive_allowance": 40000,
"overtime_allowance": 52300,
"commuting_allowance": 14800,
"total_earnings": 427100
},
"deductions": {
"health_insurance": 21300,
"employees_pension": 39000,
"employment_insurance": 2562,
"income_tax": 11200,
"residence_tax": 23500,
"total_deductions": 97562
},
"attendance": {
"working_days": 20,
"absent_days": 0,
"overtime_hours": 22.5,
"paid_leave_days": 12
},
"net_pay": 329538
}
Gemma 4 31B / PDF output (click to expand)
{
"doc_type": "給与明細書",
"employee_info": {
"employee_number": "E-10293",
"name": "佐藤 健太",
"department": "開発部",
"company": "架空テクノロジー株式会社"
},
"pay_period": {
"target_month": "2026年5月分",
"payment_date": "2026年6月15日"
},
"earnings": {
"breakdown": [
{ "item": "基本給", "amount": 320000 },
{ "item": "役職手当", "amount": 40000 },
{ "item": "残業手当", "amount": 52300 },
{ "item": "通勤手当", "amount": 14800 }
],
"total_earnings": 427100
},
"deductions": {
"breakdown": [
{ "item": "健康保険", "amount": 21300 },
{ "item": "厚生年金", "amount": 39000 },
{ "item": "雇用保険", "amount": 2562 },
{ "item": "所得税", "amount": 11200 },
{ "item": "住民税", "amount": 23500 }
],
"total_deductions": 97562
},
"attendance": {
"working_days": 20,
"absent_days": 0,
"overtime_hours": 22.5,
"paid_leave_remaining": 12
},
"net_payment": 329538
}
Claude Sonnet 4.6 / PNG output (click to expand)
{
"doc_type": "給与明細書",
"period": { "year": 2026, "month": 5, "label": "2026年5月分" },
"payment_date": "2026年6月15日",
"employee": {
"employee_id": "E-10293",
"name": "佐藤 健太",
"department": "開発部",
"company": "架空テクノロジー株式会社"
},
"earnings": {
"base_salary": 320000,
"position_allowance": 40000,
"overtime_allowance": 52300,
"commuting_allowance": 14800,
"total_earnings": 427100
},
"deductions": {
"health_insurance": 21300,
"welfare_pension": 39000,
"employment_insurance": 2562,
"income_tax": 11200,
"resident_tax": 23500,
"total_deductions": 97562
},
"attendance": {
"working_days": 20,
"absent_days": 0,
"overtime_hours": 22.5,
"paid_leave_remaining": 12
},
"net_payment": 329538
}
Claude Sonnet 4.6 / PDF output (click to expand)
{
"doc_type": "給与明細書",
"period": { "year": 2026, "month": 5, "label": "2026年5月分" },
"payment_date": "2026年6月15日",
"employee": {
"employee_number": "E-10293",
"name": "佐藤 健太",
"department": "開発部"
},
"company": { "name": "架空テクノロジー株式会社" },
"earnings": {
"base_salary": 320000,
"position_allowance": 40000,
"overtime_allowance": 52300,
"commuting_allowance": 14800,
"total_earnings": 427100
},
"deductions": {
"health_insurance": 21300,
"welfare_pension": 39000,
"employment_insurance": 2562,
"income_tax": 11200,
"resident_tax": 23500,
"total_deductions": 97562
},
"attendance": {
"working_days": 20,
"absent_days": 0,
"overtime_hours": 22.5,
"remaining_paid_leave_days": 12
},
"net_payment": 329538
}
The pay stub has a 3-table structure, making it the most complex document in this verification, but all 5 earnings items, 5 deduction items, attendance details, and the net pay (329,538 yen) were numerically accurate across all 4 patterns.
However, even within the same Gemma 4 model, the PNG output returned earnings and deductions as flat keys, while the PDF returned them as arrays (breakdown), resulting in different structures.
3.3 Business Travel Expense Report

Business travel expense report (fictional, A4)
Gemma 4 31B / PNG output (click to expand)
{
"doc_type": "出張旅費精算書",
"application_date": "2026年6月12日",
"department": "営業部",
"name": "渡辺 三郎",
"business_trip_period": {
"start_date": "2026年6月3日",
"end_date": "2026年6月5日"
},
"destination": "名古屋(空架商談)",
"expenses": [
{ "date": "6/3", "description": "東京→名古屋", "transportation": "新幹線", "amount": 11300 },
{ "date": "6/3", "description": "宿泊費(1泊)", "transportation": "ホテル", "amount": 9800 },
{ "date": "6/4", "description": "市内移動", "transportation": "地下鉄・タクシー", "amount": 2400 },
{ "date": "6/4", "description": "宿泊費(1泊)", "transportation": "ホテル", "amount": 9800 },
{ "date": "6/5", "description": "名古屋→東京", "transportation": "新幹線", "amount": 11300 },
{ "date": "6/3-5", "description": "日当(3日分)", "transportation": null, "amount": 9000 }
],
"total_amount": 53600,
"advance_payment": 50000,
"settlement_amount": 3600
}
Gemma 4 31B / PDF output (click to expand)
{
"doc_type": "出張旅費精算書",
"application_date": "2026年6月12日",
"department": "営業部",
"name": "渡辺 三郎",
"trip_period": {
"start_date": "2026年6月3日",
"end_date": "2026年6月5日"
},
"destination": "名古屋(架空商談)",
"expenses": [
{ "date": "6/3", "description": "東京→名古屋", "transportation": "新幹線", "amount": 11300 },
{ "date": "6/3", "description": "宿泊費(1泊)", "transportation": "ホテル", "amount": 9800 },
{ "date": "6/4", "description": "市内移動", "transportation": "地下鉄・タクシー", "amount": 2400 },
{ "date": "6/4", "description": "宿泊費(1泊)", "transportation": "ホテル", "amount": 9800 },
{ "date": "6/5", "description": "名古屋→東京", "transportation": "新幹線", "amount": 11300 },
{ "date": "6/3-5", "description": "日当(3日分)", "transportation": "―", "amount": 9000 }
],
"total_amount": 53600,
"advance_payment": 50000,
"settlement_amount": 3600
}
Claude Sonnet 4.6 / PNG output (click to expand)
{
"doc_type": "出張旅費精算書",
"申請日": "2026年6月12日",
"所属": "営業部",
"氏名": "渡辺 三郎",
"出張期間": { "開始": "2026年6月3日", "終了": "2026年6月5日" },
"出張先": "名古屋",
"出張目的": "架空商談",
"明細": [
{ "日付": "6/3", "区間・内容": "東京→名古屋", "交通手段": "新幹線", "金額": 11300 },
{ "日付": "6/3", "区間・内容": "宿泊費(1泊)", "交通手段": "ホテル", "金額": 9800 },
{ "日付": "6/4", "区間・内容": "市内移動", "交通手段": "地下鉄・タクシー", "金額": 2400 },
{ "日付": "6/4", "区間・内容": "宿泊費(1泊)", "交通手段": "ホテル", "金額": 9800 },
{ "日付": "6/5", "区間・内容": "名古屋→東京", "交通手段": "新幹線", "金額": 11300 },
{ "日付": "6/3-5", "区間・内容": "日当(3日分)", "交通手段": null, "金額": 9000 }
],
"合計金額": 53600,
"仮払金": 50000,
"精算額": { "種別": "追加支給", "金額": 3600 },
"承認欄": { "申請": null, "上長": null, "経理": null }
}
Claude Sonnet 4.6 / PDF output (click to expand)
{
"doc_type": "出張旅費精算書",
"申請日": "2026年6月12日",
"所属": "営業部",
"氏名": "渡辺 三郎",
"出張期間": { "開始": "2026年6月3日", "終了": "2026年6月5日" },
"出張先": "名古屋(架空商談)",
"明細": [
{ "日付": "6/3", "区間・内容": "東京→名古屋", "交通手段": "新幹線", "金額": 11300 },
{ "日付": "6/3", "区間・内容": "宿泊費(1泊)", "交通手段": "ホテル", "金額": 9800 },
{ "日付": "6/4", "区間・内容": "市内移動", "交通手段": "地下鉄・タクシー", "金額": 2400 },
{ "日付": "6/4", "区間・内容": "宿泊費(1泊)", "交通手段": "ホテル", "金額": 9800 },
{ "日付": "6/5", "区間・内容": "名古屋→東京", "交通手段": "新幹線", "金額": 11300 },
{ "日付": "6/3-5", "区間・内容": "日当(3日分)", "交通手段": null, "金額": 9000 }
],
"合計金額": 53600,
"仮払金": 50000,
"精算額": { "種別": "追加支給", "金額": 3600 },
"承認欄": { "申請": null, "上長": null, "経理": null }
}
The business travel expense report has 6 line items, but the amount for each line, the total of 53,600 yen, the advance payment of 50,000 yen, and the additional reimbursement of 3,600 yen were all accurate across all 4 patterns.
However, only Gemma 4's PNG misread the destination "Nagoya (Fictional Business Meeting)" as "Nagoya (Kūka Business Meeting)", confusing "架空" (fictional) with "空架" (a reversal of the characters).
{ "destination": "名古屋(空架商談)" }
Even the same Gemma 4 model read it correctly in PDF format.
While not a numeric value or line item, this is an example where misreading of proper nouns can occur.
This document also showed differences in how blank fields were handled and how items were divided, depending on the model and format.
The approval section was blank, but Sonnet 4.6 output null for both PNG and PDF, while Gemma 4 omitted the approval section key entirely for both PNG and PDF.
For the destination, only Sonnet 4.6's PNG split it into two separate fields — "destination" and "trip purpose" — while the other 3 combined them into one.
4. Cost and Speed Comparison
Here is a summary by format and model.
Costs were calculated by multiplying the actual measured input and output tokens by each model's unit price.
| Model | Format | Avg. Speed | Avg. Input tok | Avg. Output tok | Cost/request | Per 1,000 requests |
|---|---|---|---|---|---|---|
| Gemma 4 31B | PNG | 3.83 sec | 337 | 465 | ¥0.035 | ¥35.0 |
| Gemma 4 31B | 3.85 sec | 337 | 504 | ¥0.037 | ¥37.3 | |
| Sonnet 4.6 | PNG | 8.00 sec | 1,648 | 485 | ¥1.834 | ¥1,833.6 |
| Sonnet 4.6 | 6.15 sec | 2,020 | 482 | ¥1.995 | ¥1,995.0 |
The cost difference between Gemma 4 PNG (¥0.035/request) and Sonnet 4.6 PNG (¥1.834/request) is approximately 52x.
The primary reason the gap exceeds the unit price multiplier is the difference in input token counts.
Gemma 4 processes images and PDFs at approximately 337 tokens, whereas Sonnet 4.6 processes images at 1,648 to 2,020 tokens.
In addition to the unit price difference, the gap in input token counts further widens the cost difference.
Note that the number of input tokens for an image is not determined by file size (KB).
Gemma 4 is designed to convert images into a fixed number of tokens, with a default of 280 tokens per image.
As a result, the token count for images remains constant whether PNG or PDF is used, and in this case it stabilized at approximately 337 tokens for both formats after adding the Japanese prompt.
the default is 280 soft tokens per image
Source: Gemma 4 | Hugging Face Transformers
For unit pricing, Gemma 4 31B is $0.14 per 1M input tokens and $0.40 per 1M output tokens (US region).
"Gemma 4 31B" costs "$0.14" per 1M input tokens and "$0.40" per 1M output tokens.
Source: Amazon Bedrock Pricing | AWS
Claude Sonnet 4.6 is $3.00 per 1M input tokens and $15.00 per 1M output tokens.
Claude Sonnet 4.6 — Price per 1M input tokens: $3.00 / Price per 1M output tokens: $15.00
Source: Amazon Bedrock Pricing | AWS
In terms of speed, Gemma 4 was approximately 1.5 to 2 times faster in the measurements taken this time.
However, response times are affected by API routing and network conditions, so these results should not be treated as a rigorous latency comparison of the models themselves.
5. Accuracy Comparison
Each major field in each document was verified one item at a time against the source data.
The fields verified were amounts, line items, dates, and totals.
| Document | Key fields verified | Gemma PNG | Gemma PDF | Sonnet PNG | Sonnet PDF |
|---|---|---|---|---|---|
| Invoice | Invoice number, 3 line items, subtotal/tax/total, bank transfer details, registration number | Match | Match | Match | Match |
| Pay stub | 5 earnings items, 5 deduction items, 4 attendance items, net pay | Match | Match | Match | Match |
| Business travel expense report | 6 line items, total, advance payment, additional reimbursement | Match | Match | Match | Match |
There were no errors in amounts or numerical values across all 12 patterns.
Calculated results such as the net pay of 329,538 yen on the pay stub also matched the source data in all cases.
The accuracy of key amounts and numerical values was on par between Gemma 4 31B and Claude Sonnet 4.6.
However, when including proper nouns, Gemma 4's PNG had one misreading in this verification, while Sonnet 4.6 had none.
Within the scope of our observations, Sonnet 4.6 was slightly more stable.
On the other hand, as shown for each document in Section 3, there were differences and inconsistencies in supplementary fields.
These were not misreadings of amounts or line items, but rather differences in how blank fields, compound expressions, structures, and key names were interpreted.
- Invoice: Key names for the billing party and issuer varied by model and format
- Pay stub: Even within the same Gemma 4 model, the structure of earnings and deductions differed between flat keys and arrays
- Business travel expense report: Gemma 4's PNG misread one proper noun. Handling of the approval section and destination also differed
These inconsistencies arise from run to run because no schema was specified.
If you need to fix field names or structure for business use, consider combining this with Structured Outputs schema specification.
6. Summary
For JSON structured extraction of standard business documents (3 types: invoice, pay stub, and business travel expense report), the accuracy of key amounts and numerical values was on par between Gemma 4 31B and Claude Sonnet 4.6.
However, in this verification, Gemma 4's PNG had one instance of a proper noun misreading, while Sonnet 4.6 had none, and within the scope of our observations, Sonnet 4.6 was slightly more stable.
Costs were significantly lower with Gemma 4, approximately 1/50th of the cost based on unit prices in this verification.
Speed was also faster with Gemma 4, approximately 2x faster in our measurements.
For tasks like OCR and structured extraction that require processing large volumes, low unit costs are impactful, making Gemma 4 31B a strong candidate.
It is worth considering a division of labor where frontier models handle quality-critical processing, while low-cost models handle high-volume processing.
