Translate Word documents while preserving layout with Cloud Translation API

I used the Cloud Translation API's Document Translation feature to translate a Word (DOCX) file into English while preserving its layout. I also tested how text boxes are handled and how a glossary can fix proper nouns to set translations.
2026.06.03

Hi, I'm Kema.

There are many situations where you want to translate a proposal or manual created in Word into another language while preserving the layout.
For example, you might need to translate a document with tables, figures, and text boxes into English without breaking the appearance.
However, if you extract only the text from Word and translate it, the tables and column layouts will break and the original appearance cannot be reproduced.
Furthermore, proper nouns such as product names and character names often need to be rendered with the same fixed translation every time.

Previously, I wrote an article about translating PDFs using the Document Translation feature of Google Cloud's Cloud Translation API.
In this article, I used the same feature to translate a Word (DOCX) file directly and ran it to verify three things: how well the layout is preserved, how text boxes and tables are handled, and whether custom terminology can be fixed using a glossary.

This article is the Word edition of a series that verifies document translation by format.
I confirmed the specifications and behavior against Google Cloud's official documentation and quote the relevant sections.
I then ran the feature to check whether it behaved as documented.

Series Articles

Format | Article
PDF Edition | Cloud Translation API で PDF をレイアウトを保ったまま翻訳する
Word Edition | (this article)

Target audience: Those considering automating the translation of Word documents

1. Conclusion: Official Documentation vs. Verified Results

For those who want to get straight to the conclusion, here is a summary of the official Google Cloud documentation and the results I verified by actually translating a Word (DOCX) file.
Detailed steps and before/after images are in §3 and beyond.

Aspect | Official Documentation | Verified Result (confirmed in this article)
Body text, formatting, layout | Translated while preserving format and layout | Headings, body text, tables, multi-column layout, colored text, and headers/footers were preserved
Text inside text boxes | Not translated and remains in the source language | As documented, remained in Japanese (all of rectangle, rounded rectangle, and ellipse shapes)
Text inside images | (Not mentioned) | Not translated (because it is not text data)
Glossary (fixing custom terminology) | Glossary can fix translations of terms | Nearly consistent throughout body text and tables
Region where glossaries can be created | Custom resources only in us-central1 | Created in us-central1 (Tokyo region not available)

The official documentation is quoted at the relevant section in each part of this article.
The most important point is that the content inside text boxes is not translated.
This is a behavior explicitly stated in the official documentation, and it was reproduced exactly as described.
If you are in a hurry, you can grasp the overview from this table and the images in §3 and §4.

2. What Is Document Translation in Cloud Translation API?

In addition to the feature that translates text, Cloud Translation API has a feature called Document Translation that translates files as-is.
When you pass a PDF or DOCX file, it translates the content while preserving the formatting and layout, and returns the result.
An overview of this feature, the differences between Basic (v2) and Advanced (v3), when to use synchronous vs. batch, and authentication (API keys are not supported; use ADC or a service account) were all covered in detail in the previous PDF edition.

Reference: DevelopersIO: Cloud Translation API で PDF をレイアウトを保ったまま翻訳する

The supported input formats include not only PDF but also Word, PowerPoint, and Excel formats (including older formats).
DOCX, which is covered in this article, is one of the Word formats.

Input Format | MIME Type | Output Format
DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document | DOCX
DOC | application/msword | DOC or DOCX

Reference: Official Documentation: Translate documents | Google Cloud

For Word, there is one important note explicitly stated in the official documentation.
That is, text inside text boxes is not translated and remains in the source language.

Content inside text boxes aren't translated and remain in the source language.

Reference: Official Documentation: Translate documents | Google Cloud

This behavior was reproduced exactly in the verification in this article (confirmed in §4).

3. Preparing for Verification

3.1 Environment

The verification in this article was run in the following environment.

Item | Environment Used
OS | macOS
Python | 3.12.x (venv created with this version)
Google Cloud SDK (gcloud) | 565.0.0
google-cloud-translate | 3.26.0
GCP Project | Personal project (billing enabled)

In addition, the following prerequisites are assumed to be in place.

  • A Google Cloud project with billing enabled
  • The gcloud command is available
  • IAM permissions to manage translations and glossaries (equivalent to roles/cloudtranslate.editor; if using a glossary, write permissions to the target bucket are also required)

3.2 Sample Document (Original Fictional Anime)

For verification purposes, I had Claude create a Word document formatted as official setting materials for a fictional anime called "Hoshirei Monogatari: Lumina Chronicle."
The content is completely original and has no relation to any real works, people, or organizations.
It shares the same worldview as the PDF edition, with consistent proper nouns (coined terms).

This document intentionally includes elements that are prone to breaking and elements I wanted to verify in translation.

  • Headings, body text, multiple tables (Hoshirei Codex, glossary, broadcast information), 2-column layout (multi-column)
  • Colored text, hyperlinks, headers/footers
  • Text boxes (shapes) with rounded rectangles, rectangles, and ellipses containing text (for verifying text box translation behavior)
  • Coined terms such as 星霊, 共鳴進化, and 雷狼ボルテ (for verifying glossary behavior)

The Word document translated this time consists of 4 pages in total.
All pages before translation are shown below.

Pre-translation Word page 1 (Japanese)
Pre-translation Word page 1: Title, key visual image, orange-bordered text box ("Highlights"), 2-column world setting, and Hoshirei Codex table

Pre-translation Word page 2 (Japanese)
Pre-translation Word page 2: Continuation of Hoshirei Codex table, glossary table, and synopsis

Pre-translation Word page 3 (Japanese)
Pre-translation Word page 3: Popularity ranking chart image, production notes text box, and rectangle/rounded rectangle/ellipse shapes

Pre-translation Word page 4 (Japanese)
Pre-translation Word page 4: Broadcast information table, footnotes, and footer

3.3 Environment Setup

The setup process is the same as in the PDF edition.

First, enable the Cloud Translation API.

gcloud services enable translate.googleapis.com --project <YOUR_PROJECT_ID>

Next, authentication.
Document Translation (v3 / Advanced) does not support API keys, so use ADC (Application Default Credentials).

gcloud auth application-default login
gcloud auth application-default set-quota-project <YOUR_PROJECT_ID>

Create a virtual environment (venv) and install the library inside it.

python3 -m venv .venv
source .venv/bin/activate
pip install google-cloud-translate

4. Translating Word (DOCX)

I'll try synchronous translation, which sends the file's byte stream directly in the request.
The script used for translation is designed to determine the MIME type from the file extension and handle PDF, DOCX, XLSX, and PPTX with the same code.
PDF-specific options (native detection and skew correction) are only applied for PDFs and are not included for Office formats.

Full text of translate_document_handson.py:
from __future__ import annotations

import argparse
import time
from pathlib import Path

from google.cloud import translate_v3 as translate

# Glossaries and custom models must be placed in us-central1.
DEFAULT_LOCATION = "us-central1"

# Extension → MIME type (Document Translation supported formats)
MIME_BY_SUFFIX = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

def mime_for(path: str) -> str:
    """Returns the MIME type based on the input file extension. Raises ValueError for unsupported extensions."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"Unsupported extension: {suffix} (supported: {', '.join(MIME_BY_SUFFIX)})")
    return MIME_BY_SUFFIX[suffix]

def parse_args() -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Cloud Translation API Document Translation (synchronous)")
    p.add_argument("--project", required=True, help="GCP project ID")
    p.add_argument("--input", required=True, help="Input file path")
    p.add_argument("--output", required=True, help="Output file path (result without glossary)")
    p.add_argument("--source", default="ja", help="Source language code (default: ja)")
    p.add_argument("--target", default="en", help="Target language code (default: en)")
    p.add_argument("--location", default=DEFAULT_LOCATION, help=f"Location (default: {DEFAULT_LOCATION})")
    p.add_argument("--glossary-id", default=None, help="Glossary ID (if specified, outputs both with and without glossary)")
    return p.parse_args()

def build_request(args: argparse.Namespace, content: bytes) -> dict:
    """Builds the request dictionary for translate_document.

    Specifying the source language is required when using a glossary (per official spec).
    """
    parent = f"projects/{args.project}/locations/{args.location}"
    mime_type = mime_for(args.input)
    request: dict = {
        "parent": parent,
        "source_language_code": args.source,
        "target_language_code": args.target,
        "document_input_config": {"content": content, "mime_type": mime_type},
    }
    if args.glossary_id:
        glossary_path = (
            f"projects/{args.project}/locations/{args.location}"
            f"/glossaries/{args.glossary_id}"
        )
        request["glossary_config"] = translate.TranslateTextGlossaryConfig(glossary=glossary_path)
    return request

def write_bytes(path: str, data: bytes) -> None:
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_bytes(data)

def main() -> None:
    args = parse_args()
    content = Path(args.input).read_bytes()
    print(f"Input: {args.input} ({len(content):,} bytes, {args.source}→{args.target})")

    client = translate.TranslationServiceClient()
    request = build_request(args, content)

    started = time.perf_counter()
    response = client.translate_document(request=request)
    elapsed = time.perf_counter() - started

    base = response.document_translation
    write_bytes(args.output, base.byte_stream_outputs[0])
    print(f"Processing time: {elapsed:.2f} seconds")
    print(f"Output (without glossary): {args.output} ({len(base.byte_stream_outputs[0]):,} bytes)")

    # With glossary: a single call returns a separate output in glossary_document_translation
    if args.glossary_id and response.glossary_document_translation.byte_stream_outputs:
        out = Path(args.output)
        glossary_out = str(out.with_name(f"{out.stem}_glossary{out.suffix}"))
        write_bytes(glossary_out, response.glossary_document_translation.byte_stream_outputs[0])
        print(f"Output (with glossary): {glossary_out}")

if __name__ == "__main__":
    main()

First, translate the Word document from Japanese to English without a glossary.

python translate_document_handson.py \
    --project <YOUR_PROJECT_ID> \
    --input hoshirei_ja.docx \
    --output hoshirei_en.docx
# Example output
Input: hoshirei_ja.docx (125,305 bytes, ja→en)
Processing time: 1.30 seconds
Output (without glossary): hoshirei_en.docx (119,172 bytes)

The 4-page DOCX was translated in about 1 second.
Let me show a side-by-side comparison of before and after translation (without glossary) for all 4 pages.

Before/after translation comparison (Word, page 1)
Page 1: Left is before translation (Japanese), right is after translation (without glossary). Body text, headings, colored text, and headers/footers are translated to English, but the text box "Highlights" remains in Japanese

Before/after translation comparison (Word, page 2)
Page 2: Left is before translation, right is after translation. The Hoshirei Codex and glossary tables, along with the synopsis, are translated to English while preserving the layout

Before/after translation comparison (Word, page 3)
Page 3: Left is before translation, right is after translation. The body text is translated to English, but the production notes and the content inside the rectangle, rounded rectangle, and ellipse shapes remain in Japanese without being translated

Before/after translation comparison (Word, page 4)
Page 4: Left is before translation, right is after translation. The broadcast information table, footnotes, and footer are translated to English

Headings, body text, colored text, the 2-column layout, tables, and even headers/footers all came through with the layout preserved almost entirely intact.

What stood out particularly in terms of formatting preservation was how background and text colors were handled.
In places where text color changed mid-sentence in the original, or where background color was applied, those colors were carried over exactly in the translated output.
Sections that were bold for emphasis also remained bold after translation.
This shows that the translation does more than replace text: it also preserves per-character formatting information.
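One way to spot-check formatting preservation without opening Word is to count formatted runs in the DOCX XML before and after translation. The following is a minimal sketch using only the standard library; `count_run_formatting` is a hypothetical helper that is not part of the article's scripts, and it only inspects `word/document.xml`. The counts need not match exactly, because translation can split or merge runs, so treat a large drop as a hint rather than proof.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML main namespace used by DOCX files
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _is_on(element) -> bool:
    """A toggle such as <w:b/> is on unless w:val is explicitly 0/false."""
    if element is None:
        return False
    return element.get(f"{{{W_NS}}}val", "true") not in ("0", "false")


def count_run_formatting(docx_bytes: bytes) -> Counter:
    """Count runs in word/document.xml that are bold or have a text color."""
    counts: Counter = Counter()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for run in root.iter(f"{{{W_NS}}}r"):
        props = run.find(f"{{{W_NS}}}rPr")
        if props is None:
            continue
        if _is_on(props.find(f"{{{W_NS}}}b")):
            counts["bold"] += 1
        if props.find(f"{{{W_NS}}}color") is not None:
            counts["color"] += 1
    return counts
```

Running it on both `hoshirei_ja.docx` and `hoshirei_en.docx` (read with `Path(...).read_bytes()`) and comparing the two counters gives a rough signal of whether bold and colored runs survived.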

On the other hand, elements that were not translated were also clearly distinguishable.

First, text boxes.
The orange-bordered box on page 1 ("Highlights"), the production notes text box on page 3, and the rectangle, rounded rectangle, and ellipse shapes all had their content remain in Japanese.
This is exactly the behavior described in the official documentation quoted in §2: "Content inside text boxes aren't translated and remain in the source language."
Note that in the PDF edition the font size was reduced so that the layout matched the pre-translation version, whereas with Word the font size did not change and only the page break positions shifted slightly.
However, page breaks occurred at natural positions, so there were no issues with the content.
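Because text boxes are skipped, it can help to list their contents before sending a file for translation. The sketch below is one possible approach, assuming that text box content is stored under `w:txbxContent` elements in the DOCX XML parts; `find_textbox_text` is a hypothetical helper, not something provided by the Cloud Translation client. Duplicates are dropped because Word often stores the same text box twice (a drawing and a fallback copy).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by DOCX files
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_textbox_text(docx_bytes: bytes) -> list[str]:
    """Return the distinct text found inside text boxes, in document order."""
    found: list[str] = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        for name in zf.namelist():
            # Body, headers, and footers all live in word/*.xml parts
            if not (name.startswith("word/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            for box in root.iter(f"{{{W_NS}}}txbxContent"):
                text = "".join(t.text or "" for t in box.iter(f"{{{W_NS}}}t")).strip()
                if text and text not in found:
                    found.append(text)
    return found
```

If the returned list is non-empty, those strings are candidates for moving into body text or tables, or for translating separately with the text translation API.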

The other element was the bar chart inserted as an image.
The popularity ranking chart on page 3 was inserted as an image and therefore was not subject to translation.
Since text within images is not text data, Document Translation cannot handle it.

5. Fixing Custom Terminology with a Glossary

I'll verify whether proper nouns and custom terms can be fixed to consistent translations.
To use a glossary, you need to (1) prepare a TSV file mapping source terms to translations, (2) upload it to Cloud Storage, and (3) create a glossary resource.
When you specify the created glossary during translation, the registered terms will be aligned to their fixed translations.

5.1 Prepare the Glossary TSV

The glossary is prepared as a TSV file where source terms (Japanese) and target translations (English) are listed one per line, tab-separated.
No header row is needed; the left column is the source coined term and the right column is the desired fixed translation.
This time, I prepared 20 coined terms scattered throughout the sample as glossary_ja_en.tsv.

星霊	Hoshirei
共鳴進化	Reso-Evolution
輝光石	Lumina Shard
雷狼ボルテ	Voltefang
焔狐ココ	Pyrofox Coco
水亀アクオ	Aquortle
草鹿リーフィ	Leafawn
輝竜ルミナ	Lumidragon
月読の祠	Moonread Shrine
守護者	Warden
星導士	Starwright
星霊守護協会	Hoshirei Warden Guild
共鳴値	Reso-Value
共鳴の灯	Resonance Flame
絆ゲージ	Bond Gauge
星霊酔い	Hoshirei-sickness
星霊図鑑	Hoshirei Codex
ルミナ群島	Lumina Archipelago
七つの祠	Seven Shrines
共鳴結界	Reso-Barrier
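Before uploading, it may be worth checking that every line really has exactly two tab-separated columns, since a stray space instead of a tab is easy to miss. This is a minimal sketch of such a check; `parse_glossary_tsv` is a hypothetical helper, and rejecting duplicate source terms is my own assumption about what is desirable, not a rule quoted from the official documentation.

```python
def parse_glossary_tsv(text: str) -> dict[str, str]:
    """Parse a headerless two-column TSV into {source_term: target_term}.

    Raises ValueError on malformed lines or duplicate source terms.
    """
    entries: dict[str, str] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        columns = line.split("\t")
        if len(columns) != 2 or not all(c.strip() for c in columns):
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, got {line!r}"
            )
        source, target = (c.strip() for c in columns)
        if source in entries:
            raise ValueError(f"line {line_no}: duplicate source term {source!r}")
        entries[source] = target
    return entries
```

For example, `parse_glossary_tsv(Path("glossary_ja_en.tsv").read_text(encoding="utf-8"))` should return 20 entries for the file above, after importing `Path` from `pathlib`.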

5.2 Upload the TSV to Cloud Storage and Create the Glossary Resource

Rather than uploading the TSV directly, you first place it in Cloud Storage and then create the glossary resource by specifying that GCS URI.
Create the bucket in the same us-central1 region as the glossary.

# Create the bucket (skip if it already exists)
gcloud storage buckets create gs://<YOUR_BUCKET> \
    --project <YOUR_PROJECT_ID> --location us-central1

# Upload the TSV
gcloud storage cp glossary_ja_en.tsv \
    gs://<YOUR_BUCKET>/glossaries/glossary_ja_en.tsv

Next, create the glossary resource from the uploaded TSV.
Creation is a long-running operation (LRO), so run it from the client library and wait for completion.
Save the following code as setup_glossary.py and run it within the venv from §3.3.

Full text of setup_glossary.py:
from google.cloud import translate_v3 as translate

PROJECT_ID = "<YOUR_PROJECT_ID>"
LOCATION = "us-central1"   # Glossaries are only available in us-central1
GLOSSARY_ID = "hoshirei-ja-en"
INPUT_URI = "gs://<YOUR_BUCKET>/glossaries/glossary_ja_en.tsv"

client = translate.TranslationServiceClient()
name = client.glossary_path(PROJECT_ID, LOCATION, GLOSSARY_ID)
glossary = translate.Glossary(
    name=name,
    # Unidirectional (ja→en) glossary
    language_pair=translate.Glossary.LanguageCodePair(
        source_language_code="ja", target_language_code="en"
    ),
    input_config=translate.GlossaryInputConfig(
        gcs_source=translate.GcsSource(input_uri=INPUT_URI)
    ),
)
parent = f"projects/{PROJECT_ID}/locations/{LOCATION}"
operation = client.create_glossary(parent=parent, glossary=glossary)
result = operation.result(180)   # Wait up to 180 seconds for completion
print(f"Creation complete: {result.name} (entry count: {result.entry_count})")

python setup_glossary.py
# Example output
Creation complete: projects/.../locations/us-central1/glossaries/hoshirei-ja-en (entry count: 20)

The official documentation also explicitly states that custom resources must use us-central1.

Note: All of your resources in a single request to Cloud Translation - Advanced must have the same location. Currently, only global and us-central1 locations are supported. For all custom resources—AutoML models, glossaries, long-running-operations—you must use us-central1.

Reference: Official Documentation: Migrate to Cloud Translation - Advanced (v3) | Google Cloud

5.3 Comparing Results With and Without Glossary

Translate using the created glossary.
Passing the glossary ID created in 5.2 to --glossary-id will return both a "without glossary" and a "with glossary" result in a single response.

python translate_document_handson.py \
    --project <YOUR_PROJECT_ID> \
    --input hoshirei_ja.docx \
    --output hoshirei_en.docx \
    --glossary-id <YOUR_GLOSSARY_ID>

Looking at the result without the glossary first, the translations of proper nouns were quite inconsistent (per Claude's analysis).
Among the 26 occurrences of 星霊 in the source text, the breakdown of main translations in the version without a glossary was as follows.

Translation of 星霊 (without glossary) | Occurrences
Star Spirit | 8
star spirits | 6
Celestial Spirit | 3
star spirit | 3
Other variants | Several

The same word was split into multiple variants, including differences in capitalization and singular/plural.
Since machine translation selects the most appropriate translation based on context, this is what happens without a mechanism to align proper nouns.
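A tally like the one above can also be reproduced mechanically once the translated text has been extracted from the DOCX. The following is a rough sketch, not the method used for the table (that came from Claude's analysis): the regular expression only covers the "Star Spirit" and "Celestial Spirit" families, and `count_variants` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches "Star Spirit", "star spirits", "Celestial Spirit", and so on.
# Word boundaries keep "star spirits" from also counting as "star spirit".
VARIANT_RE = re.compile(r"\b(?:Star|Celestial) Spirits?\b", re.IGNORECASE)


def count_variants(text: str) -> Counter:
    """Count each surface form exactly as it appears (case and plural kept)."""
    return Counter(VARIANT_RE.findall(text))
```

Keeping case and number distinct in the counter is deliberate, because those differences are exactly the inconsistency being measured. The same function can be run on the with-glossary output to look for leftovers.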

Switching to the version with a glossary, these were neatly unified.
The places where the translation actually changed were the title (page 1, where many proper nouns appear) and the glossary explanation table (page 2).

Comparison without/with glossary (Word, page 1)
Page 1: Left is without glossary (title is "Star Spirit Story," 星霊 varies as Star Spirit), right is with glossary (fixed as "Hoshirei Story," 星霊 unified as Hoshirei)

Comparison without/with glossary (Word, page 2)
Page 2: Left is without glossary, right is with glossary. In the glossary explanation table, 共鳴進化 becomes Reso-Evolution, 星導士 becomes Starwright, and 共鳴結界 becomes Reso-Barrier, with all registered coined terms fixed to their designated translations

The title changed from "Star Spirit Story" to "Hoshirei Story," and 星霊, which had been split into at least four variants, was unified to Hoshirei.
Other coined terms were also fixed to their registered translations: 守護者 became Warden, 共鳴進化 became Reso-Evolution, 輝光石 became Lumina Shard, and 雷狼ボルテ became Voltefang.
The glossary was applied not only to the body text but also to table cells (text boxes were already excluded from translation to begin with, so they are also outside the scope of the glossary).

However, when checking the full output with the glossary (per Claude's analysis), it was not 100% consistent.
One occurrence of 星霊 remained as star spirits instead of being fixed to Hoshirei.
In the body text "Hoshirei become Warden who form bonds with these star spirits," within the same sentence, one 星霊 was fixed to Hoshirei and 守護者 was fixed to Warden, but the other 星霊 remained as star spirits.
Since the glossary applies matches on a per-occurrence basis, it seems that occasionally the same word may be missed depending on the surrounding context.
That said, the vast majority were fixed, so the practical conclusion is "nearly unified, but not guaranteed to be zero missed cases."

6. Pricing and Processing Time

The pricing for document translation varies by the translation model used.
For the standard NMT model, document translation is billed per page.

Item | Unit Price
NMT document translation | $0.08 / page

Reference: Pricing Page: Pricing | Google Cloud

The sample in this article (a 4-page DOCX) cost well under $1 even when combining both with and without glossary results.
Measured processing time was approximately 1–2 seconds.
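As a back-of-the-envelope check, per-page pricing can be turned into a small estimate. This is only a sketch based on the $0.08 per page figure quoted above; the `runs` parameter is a hypothetical knob for counting repeated requests, and whether the with-glossary output is billed separately is not something I confirmed, so check the pricing page before relying on the numbers.

```python
from decimal import Decimal

# NMT document translation unit price quoted in this article (USD per page)
NMT_PRICE_PER_PAGE_USD = Decimal("0.08")


def estimate_nmt_cost(pages: int, runs: int = 1) -> Decimal:
    """Estimate the cost in USD of translating `pages` pages `runs` times."""
    if pages < 0 or runs < 0:
        raise ValueError("pages and runs must be non-negative")
    return NMT_PRICE_PER_PAGE_USD * pages * runs
```

For the 4-page sample, `estimate_nmt_cost(4)` gives 0.32, and two requests (`runs=2`) give 0.64, both consistent with "well under $1."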

7. Attribution ("Machine Translated by Google")

In the PDF edition, a "Machine Translated by Google" attribution appeared in the upper left of the translated PDF.
This time with Word (DOCX), I did not specify a custom attribution either, yet no such label was visible in the translated file.
Since the same conditions yielded different behavior between PDF and Office formats, I looked into the specification.

The attribution text can be specified as customizedAttribution in the API request, and the default when not specified is "Machine Translated by Google."
This field is not PDF-specific; it is common to the entire translateDocument (document translation) feature.

customizedAttribution string

Optional. This flag is to support user customized attribution. If not provided, the default is Machine Translated by Google. Customized attribution should follow rules in https://cloud.google.com/translate/attribution#attribution_and_logos

Reference: API Reference: Method: projects.locations.translateDocument | Google Cloud
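In the Python client, the REST field customizedAttribution corresponds to the snake_case key `customized_attribution` in the request. The sketch below shows one way to add it to a request dictionary like the one built by build_request in §4; `with_custom_attribution` is a hypothetical helper, and the attribution string in the usage note is a placeholder that would still need to follow the attribution rules linked above. Given the behavior observed in this article, setting it may have no visible effect on DOCX output.

```python
def with_custom_attribution(request: dict, attribution: str) -> dict:
    """Return a copy of a translate_document request with a custom attribution.

    The original dictionary is left unchanged.
    """
    if not attribution.strip():
        raise ValueError("attribution must not be empty")
    updated = dict(request)
    updated["customized_attribution"] = attribution
    return updated
```

Usage would look like `client.translate_document(request=with_custom_attribution(request, "<YOUR_ATTRIBUTION_TEXT>"))`.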

The important thing to note here is that customizedAttribution only describes the "attribution text," and there is no explicit statement in the official documentation about how or whether that text is embedded in the output depending on the file format.
The reason why the presence of attribution differs by format could not be confirmed from official sources.
Based on what I actually ran, the attribution was embedded in the PDF output but was not present in Office format outputs (DOCX/XLSX/PPTX).
My hypothesis is that PDFs are reconstructed by overlaying the translation on the page to preserve layout, which is a fundamentally different output construction method from editable Office formats, but this is not backed by any official documentation.

Another separate aspect worth noting is the explicit attribution requirement in the brand guidelines.
This is distinct from "whether it is embedded in the output file" and requires that whenever translation results are shown to users, it must be made clear that they are viewing machine translations, regardless of format.

Whenever you display translation results from Google Translate directly to users, you must make it clear to users that they are viewing automatic translations from Google Translate using the appropriate text or brand elements.

Reference: Brand Guidelines: Attribution requirements | Google Cloud

In other words, while it is not a problem in itself that Office format outputs do not have attribution embedded, when publishing or distributing translation results, it is the user's responsibility to make clear that the content is machine-translated, regardless of the format.

8. Summary

Cloud Translation API's Document Translation can translate a Word (DOCX) file as-is, preserving the layout of body text, tables, multi-column layouts, colors, and headers/footers, and using a glossary can fix the translation of proper nouns for the most part.
As explicitly stated in the official documentation, the content inside text boxes and shapes is not translated and remains in the source language.
It is worth considering placing text that needs to be translated in the body text or tables rather than in text boxes.

Note that how text boxes are handled differs by format.
I will also be writing blog posts covering the Excel and PowerPoint editions, so please look forward to those.

I hope this article is helpful for those considering automating the translation of Word documents.
