I tried translating an entire Word document with the Cloud Translation API

I tried using the Cloud Translation API's document translation feature to translate a Word (DOCX) file into English while preserving the layout. I also verified how text boxes are handled and how to fix proper nouns using a glossary.
2026.06.03

Hi, I'm Keema.

There are many situations where you want to translate a proposal or manual created in Word into another language while preserving the layout.
For example, you might need to translate a document with tables, figures, and text boxes into English without breaking the appearance.
However, if you extract only the text from Word and translate it, tables and multi-column layouts break and the original appearance cannot be reproduced.
Furthermore, unique proper nouns such as product names and character names often need to be consistently translated to a fixed term every time.

Previously, I wrote an article about translating PDFs using the Document Translation feature of Google Cloud's Cloud Translation API.
In this article, I used the same feature to translate a Word (DOCX) file directly, and verified how well the layout is preserved, how text boxes and tables are handled, and whether custom terminology can be fixed using a glossary — all by actually running it.

This article is the Word edition of a series verifying document translation by format.
The specifications and behavior were confirmed in Google Cloud's official documentation, with relevant sections quoted.
I then verified whether the behavior matches the official documentation by actually running it.

Series Articles

Format | Article
PDF Edition | Cloud Translation API で PDF をレイアウトを保ったまま翻訳する (Translating a PDF with the Cloud Translation API while preserving its layout)
Word Edition | (this article)

Target audience: Those considering automating the translation of Word documents

1. Conclusion: Official Documentation vs. Verified Results

For those who want to know the conclusion quickly, here is a summary of the official Google Cloud documentation statements and the results actually confirmed through translation for Word (DOCX).
Detailed steps and before/after images are in §3 and beyond.

Aspect | Official Documentation Statement | Verified Result (confirmed in this article)
Body text, formatting, layout | Translates while preserving format and layout | Headings, body text, tables, multi-column layouts, colored text, and headers/footers were preserved
Text inside text boxes | Not translated and remains in the source language | As stated, remained in Japanese (all shapes: rectangle, rounded rectangle, and ellipse)
Text inside images | (Not stated) | Not translated (because it is not text data)
Glossary (fixing custom terms) | Can fix translations using a glossary | Almost consistently unified in both body text and tables
Regions where glossaries can be created | Custom resources only in us-central1 | Created in us-central1 (Tokyo region not available)

The official statements are quoted in the relevant sections.
The most important point is that the content inside text boxes is not translated.
This is a behavior explicitly stated in the official documentation, and it was reproduced exactly as described.
If you're in a hurry, reviewing this table and the images in §3 will give you an overview.

2. What Is Document Translation in Cloud Translation API?

Cloud Translation API has a feature called Document Translation, separate from the plain text translation feature, which translates entire files.
When you pass a PDF or DOCX file, it returns the translated result while preserving the formatting and layout.
An overview of this feature, the differences between Basic (v2) and Advanced (v3), how to choose between synchronous and batch processing, and authentication (API keys cannot be used; ADC or a service account must be used) were covered in detail in the previous PDF edition.

Reference: DevelopersIO: Cloud Translation API で PDF をレイアウトを保ったまま翻訳する (Translating a PDF with the Cloud Translation API while preserving its layout)

Supported input formats include not only PDF but also Word, PowerPoint, and Excel formats (including legacy formats).
The Word format covered in this article, DOCX, is one of them.

Input Format | MIME Type | Output Format
DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document | DOCX
DOC | application/msword | DOC or DOCX

Reference: Official Documentation: Translate documents | Google Cloud

For Word, the official documentation explicitly states one important caveat.
That is, text inside text boxes is not translated and remains in the source language.

Content inside text boxes aren't translated and remain in the source language.

Reference: Official Documentation: Translate documents | Google Cloud

This behavior was reproduced exactly in the verification in this article (confirmed in §4).

3. Preparing for Verification

3.1 Environment Prerequisites

The verification in this article was run in the following environment.

Item | Environment Used
OS | macOS
Python | 3.12.x (with a venv created inside)
Google Cloud SDK (gcloud) | 565.0.0
google-cloud-translate | 3.26.0
GCP Project | Personal project (billing enabled)

In addition, the following prerequisites are assumed to be in place.

  • A Google Cloud project with billing enabled
  • The gcloud command is available
  • IAM permissions to manage translations and glossaries (equivalent to roles/cloudtranslate.editor; write permissions to the target bucket are also required when using a glossary)

3.2 Sample Document (Original Fictional Anime)

For verification purposes, I had Claude create a Word document styled as official setting materials for a fictional anime called "Hoshirei Monogatari: Lumina Chronicle."
This is entirely original content with no relation to any real works, persons, or organizations.
It shares the same world as the PDF edition, with matching proper nouns (coined terms).

This document intentionally includes elements that are prone to layout breakage and elements I wanted to verify for translation.

  • Headings, body text, multiple tables (Hoshirei Codex, glossary, broadcast information), two-column layout (multi-column)
  • Colored text, hyperlinks, headers/footers
  • Text boxes (shapes) with rounded rectangles, rectangles, and ellipses containing text (for verifying text box translation behavior)
  • Coined terms such as 星霊, 共鳴進化, and 雷狼ボルテ (for verifying glossary behavior)

The Word document translated this time consists of 4 pages in total.
All pages before translation are shown below.

Pre-translation Word page 1 (Japanese)
Pre-translation Word page 1: Title, key visual image, orange-bordered text box ("Highlights"), two-column world setting, Hoshirei Codex table

Pre-translation Word page 2 (Japanese)
Pre-translation Word page 2: Continuation of the Hoshirei Codex table, glossary table, synopsis

Pre-translation Word page 3 (Japanese)
Pre-translation Word page 3: Popularity ranking bar chart image, production notes text box, rectangle/rounded rectangle/ellipse shapes

Pre-translation Word page 4 (Japanese)
Pre-translation Word page 4: Broadcast information table, footnotes, footer

3.3 Environment Setup

The setup flow is the same as in the PDF edition.

First, enable the Cloud Translation API.

gcloud services enable translate.googleapis.com --project <YOUR_PROJECT_ID>

Next, set up authentication.
Document Translation (v3 / Advanced) does not support API keys, so ADC (Application Default Credentials) is used.

gcloud auth application-default login
gcloud auth application-default set-quota-project <YOUR_PROJECT_ID>

Create a virtual environment (venv) and install the library inside it.

python3 -m venv .venv
source .venv/bin/activate
pip install google-cloud-translate

4. Translating Word (DOCX)

Let's try synchronous translation, which loads the file's byte stream directly into the request.
The script used for translation determines the MIME type from the file extension and can handle PDF, DOCX, XLSX, and PPTX with the same code.
PDF-specific options (native detection and skew correction) are only added for PDFs and are not included for Office formats.
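The listing that follows does not include the PDF-only branch, so here is a minimal sketch of how it could be added. The field names `is_translate_native_pdf_only` and `enable_rotation_correction` come from the v3 `TranslateDocumentRequest` message; that these are the two options meant by "native detection and skew correction" is my assumption, and `add_format_options` is a hypothetical helper, not part of the original script.

```python
from pathlib import Path

# Assumption: these two TranslateDocumentRequest fields correspond to the
# "native detection" and "skew correction" options mentioned in the text.
PDF_ONLY_OPTIONS = {
    "is_translate_native_pdf_only": True,  # translate only native (text-based) PDF pages
    "enable_rotation_correction": True,    # let the service auto-correct rotated pages
}


def add_format_options(request: dict, input_path: str) -> dict:
    """Return a copy of the request with PDF-only options added for .pdf inputs."""
    updated = dict(request)
    if Path(input_path).suffix.lower() == ".pdf":
        updated.update(PDF_ONLY_OPTIONS)
    return updated


# Office formats pass through unchanged; PDFs gain the extra fields.
docx_request = add_format_options({"parent": "projects/p/locations/us-central1"}, "doc.docx")
pdf_request = add_format_options({"parent": "projects/p/locations/us-central1"}, "doc.pdf")
```

Returning a copy rather than mutating the input keeps the base request reusable when the same script handles several files in a row.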

Full text of translate_document_handson.py:
from __future__ import annotations

import argparse
import time
from pathlib import Path

from google.cloud import translate_v3 as translate

# Glossaries and custom models must be placed in us-central1.
DEFAULT_LOCATION = "us-central1"

# Extension → MIME type (Document Translation supported formats)
MIME_BY_SUFFIX = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

def mime_for(path: str) -> str:
    """Returns the MIME type based on the input file extension. Raises ValueError for unsupported extensions."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"Unsupported extension: {suffix} (supported: {', '.join(MIME_BY_SUFFIX)})")
    return MIME_BY_SUFFIX[suffix]

def parse_args() -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Cloud Translation API Document Translation (synchronous)")
    p.add_argument("--project", required=True, help="GCP project ID")
    p.add_argument("--input", required=True, help="Input file path")
    p.add_argument("--output", required=True, help="Output file path (result without glossary)")
    p.add_argument("--source", default="ja", help="Source language code (default: ja)")
    p.add_argument("--target", default="en", help="Target language code (default: en)")
    p.add_argument("--location", default=DEFAULT_LOCATION, help=f"Location (default: {DEFAULT_LOCATION})")
    p.add_argument("--glossary-id", default=None, help="Glossary ID (when specified, outputs both with and without glossary)")
    return p.parse_args()

def build_request(args: argparse.Namespace, content: bytes) -> dict:
    """Builds the request dictionary for translate_document.

    Specifying the source language is required when using a glossary (official spec).
    """
    parent = f"projects/{args.project}/locations/{args.location}"
    mime_type = mime_for(args.input)
    request: dict = {
        "parent": parent,
        "source_language_code": args.source,
        "target_language_code": args.target,
        "document_input_config": {"content": content, "mime_type": mime_type},
    }
    if args.glossary_id:
        glossary_path = (
            f"projects/{args.project}/locations/{args.location}"
            f"/glossaries/{args.glossary_id}"
        )
        request["glossary_config"] = translate.TranslateTextGlossaryConfig(glossary=glossary_path)
    return request

def write_bytes(path: str, data: bytes) -> None:
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_bytes(data)

def main() -> None:
    args = parse_args()
    content = Path(args.input).read_bytes()
    print(f"Input: {args.input} ({len(content):,} bytes, {args.source}→{args.target})")

    client = translate.TranslationServiceClient()
    request = build_request(args, content)

    started = time.perf_counter()
    response = client.translate_document(request=request)
    elapsed = time.perf_counter() - started

    base = response.document_translation
    write_bytes(args.output, base.byte_stream_outputs[0])
    print(f"Processing time: {elapsed:.2f} seconds")
    print(f"Output (without glossary): {args.output} ({len(base.byte_stream_outputs[0]):,} bytes)")

    # With glossary: a single call returns a separate output in glossary_document_translation
    if args.glossary_id and response.glossary_document_translation.byte_stream_outputs:
        out = Path(args.output)
        glossary_out = str(out.with_name(f"{out.stem}_glossary{out.suffix}"))
        write_bytes(glossary_out, response.glossary_document_translation.byte_stream_outputs[0])
        print(f"Output (with glossary): {glossary_out}")

if __name__ == "__main__":
    main()

First, translate the Word document from Japanese to English without a glossary.

python translate_document_handson.py \
    --project <YOUR_PROJECT_ID> \
    --input hoshirei_ja.docx \
    --output hoshirei_en.docx
# Example output
Input: hoshirei_ja.docx (125,305 bytes, ja→en)
Processing time: 1.30 seconds
Output (without glossary): hoshirei_en.docx (119,172 bytes)

The 4-page DOCX was translated in about 1.3 seconds.
Let me compare all 4 pages before and after translation (without glossary).

Before/after translation comparison (Word, page 1)
Page 1: Left is before translation (Japanese), right is after translation (without glossary). Body text, headings, colored text, and headers/footers are translated into English, but the "Highlights" text box remains in Japanese

Before/after translation comparison (Word, page 2)
Page 2: Left is before translation, right is after translation. The Hoshirei Codex table, glossary table, and synopsis are translated into English while preserving the layout

Before/after translation comparison (Word, page 3)
Page 3: Left is before translation, right is after translation. Body text is translated into English, but the production notes and the content inside the rectangle, rounded rectangle, and ellipse shapes remain in Japanese without being translated

Before/after translation comparison (Word, page 4)
Page 4: Left is before translation, right is after translation. The broadcast information table, footnotes, and footer are translated into English

Headings, body text, colored text, two-column layout, tables, and even headers/footers — the layout is almost entirely preserved.

What stood out in particular for formatting preservation was the handling of background colors and text colors.
In sections where the text color changed mid-sentence or where background colors were applied, those colors were carried over exactly as-is after translation.
Portions that were bold for emphasis also remained bold after translation.
This shows that the translation preserves not just the text replacement but also character-level formatting information.

On the other hand, two kinds of elements were clearly left untranslated.

First, text boxes.
The orange-bordered shape on page 1 ("Highlights"), the production notes text box on page 3, and the rectangle, rounded rectangle, and ellipse shapes — the text inside all of them remained in Japanese.
This matches the official statement quoted in §2: "Content inside text boxes aren't translated and remain in the source language."
Note that in the PDF edition, font sizes were reduced so that the translated text fit the original layout. In the Word case, font sizes did not change, and page breaks shifted slightly instead.
However, page breaks occurred at natural positions and there were no issues with the content.

The other element was the bar chart inserted as an image.
The popularity ranking chart on page 3 was inserted as an image, so it was not subject to translation.
Text inside images is not text data and cannot be handled by Document Translation.
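Since text boxes are skipped, it can help to list their contents before sending a file for translation. The following is a minimal sketch using only the standard library. It assumes that text box content sits in `w:txbxContent` elements of `word/document.xml`, which is how WordprocessingML stores it, and it only looks at the main document part (not headers or footers). The function name `text_box_texts` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def text_box_texts(docx_bytes: bytes) -> list[str]:
    """Return the distinct text found inside text boxes of the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    found: list[str] = []
    for box in root.iter(f"{{{W_NS}}}txbxContent"):
        text = "".join(t.text or "" for t in box.iter(f"{{{W_NS}}}t")).strip()
        # Word often stores a shape twice (DrawingML plus a VML fallback),
        # so identical strings are reported only once.
        if text and text not in found:
            found.append(text)
    return found
```

Calling `text_box_texts(Path("hoshirei_ja.docx").read_bytes())` on the sample would be one way to get a checklist of strings that need manual translation or relocation into body text.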

5. Fixing Custom Terms with a Glossary

Let's verify whether proper nouns and custom terms can be fixed to a consistent translation.
To use a glossary, you need to (1) prepare a TSV with source and target term pairs, (2) upload it to Cloud Storage, and (3) create a glossary resource.
By specifying the created glossary at translation time, registered terms will be aligned to their fixed translations.

5.1 Preparing the Glossary TSV

The glossary is prepared as a TSV file with one term pair per line, tab-separated, with the source language (Japanese) on the left and the target language (English) on the right.
No header row is needed; the left side is the coined source term and the right side is the fixed translation.
For this verification, I prepared 20 coined terms from the sample as glossary_ja_en.tsv.

星霊	Hoshirei
共鳴進化	Reso-Evolution
輝光石	Lumina Shard
雷狼ボルテ	Voltefang
焔狐ココ	Pyrofox Coco
水亀アクオ	Aquortle
草鹿リーフィ	Leafawn
輝竜ルミナ	Lumidragon
月読の祠	Moonread Shrine
守護者	Warden
星導士	Starwright
星霊守護協会	Hoshirei Warden Guild
共鳴値	Reso-Value
共鳴の灯	Resonance Flame
絆ゲージ	Bond Gauge
星霊酔い	Hoshirei-sickness
星霊図鑑	Hoshirei Codex
ルミナ群島	Lumina Archipelago
七つの祠	Seven Shrines
共鳴結界	Reso-Barrier
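Before uploading, it can be worth checking the TSV locally. This is a minimal sketch that enforces the format described above (two tab-separated columns, no empty cells). Rejecting duplicate source terms is my own precaution, not a documented API rule, and `check_glossary_tsv` is a hypothetical helper name.

```python
def check_glossary_tsv(text: str) -> list[tuple[str, str]]:
    """Parse glossary TSV text; raise ValueError on malformed or duplicate rows."""
    pairs: list[tuple[str, str]] = []
    seen: set[str] = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = [cell.strip() for cell in line.split("\t")]
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"line {line_no}: expected 'source<TAB>target', got {line!r}")
        source, target = parts
        if source in seen:
            raise ValueError(f"line {line_no}: duplicate source term {source!r}")
        seen.add(source)
        pairs.append((source, target))
    return pairs


sample = "星霊\tHoshirei\n共鳴進化\tReso-Evolution\n"
pairs = check_glossary_tsv(sample)
print(f"{len(pairs)} valid entries")
```

A common failure this catches is an editor silently converting tabs to spaces, which would otherwise only surface when the glossary is created.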

5.2 Uploading the TSV to Cloud Storage and Creating the Glossary Resource

Rather than uploading the TSV directly to create the glossary resource, you first place it in Cloud Storage and then specify that GCS URI to create the resource.
Create the bucket in the same us-central1 region as the glossary.

# Create bucket (skip if already exists)
gcloud storage buckets create gs://<YOUR_BUCKET> \
    --project <YOUR_PROJECT_ID> --location us-central1

# Upload the TSV
gcloud storage cp glossary_ja_en.tsv \
    gs://<YOUR_BUCKET>/glossaries/glossary_ja_en.tsv

Next, create the glossary resource from the uploaded TSV.
Creation is a long-running operation (LRO), so run it from the client library and wait for completion.
Save the following code as setup_glossary.py and run it with the venv from §3.3 still active.

Full text of setup_glossary.py:
from google.cloud import translate_v3 as translate

PROJECT_ID = "<YOUR_PROJECT_ID>"
LOCATION = "us-central1"   # Glossaries are only available in us-central1
GLOSSARY_ID = "hoshirei-ja-en"
INPUT_URI = "gs://<YOUR_BUCKET>/glossaries/glossary_ja_en.tsv"

client = translate.TranslationServiceClient()
name = client.glossary_path(PROJECT_ID, LOCATION, GLOSSARY_ID)
glossary = translate.Glossary(
    name=name,
    # Unidirectional (ja→en) glossary
    language_pair=translate.Glossary.LanguageCodePair(
        source_language_code="ja", target_language_code="en"
    ),
    input_config=translate.GlossaryInputConfig(
        gcs_source=translate.GcsSource(input_uri=INPUT_URI)
    ),
)
parent = f"projects/{PROJECT_ID}/locations/{LOCATION}"
operation = client.create_glossary(parent=parent, glossary=glossary)
result = operation.result(timeout=180)   # Wait up to 180 seconds for completion
print(f"Creation complete: {result.name} (entry count: {result.entry_count})")

python setup_glossary.py
# Example output
Creation complete: projects/.../locations/us-central1/glossaries/hoshirei-ja-en (entry count: 20)

The official documentation also explicitly states that custom resources must use us-central1.

Note: All of your resources in a single request to Cloud Translation - Advanced must have the same location. Currently, only global and us-central1 locations are supported. For all custom resources—AutoML models, glossaries, long-running-operations—you must use us-central1.

Reference: Official Documentation: Migrate to Cloud Translation - Advanced (v3) | Google Cloud

5.3 Comparing Results With and Without a Glossary

Translate with the created glossary specified.
Passing the glossary ID created in 5.2 to --glossary-id returns both "without glossary" and "with glossary" results in a single response.

python translate_document_handson.py \
    --project <YOUR_PROJECT_ID> \
    --input hoshirei_ja.docx \
    --output hoshirei_en.docx \
    --glossary-id <YOUR_GLOSSARY_ID>

Looking first at the result without a glossary, the translations of proper nouns were quite inconsistent (per Claude's analysis).
Out of 26 occurrences of 星霊 in the source, the breakdown of main translations in the English output without a glossary was as follows.

Translation of 星霊 (without glossary) | Occurrences
Star Spirit | 8
star spirits | 6
Celestial Spirit | 3
star spirit | 3
Other variations | Several

The same word ended up split into multiple variations, including differences in capitalization and singular/plural form.
Machine translation chooses the most contextually appropriate translation each time, so without a mechanism to align proper nouns, this is the result.
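A tally like the one above can also be produced with a short script. The sketch below is not how the counts in this article were obtained (those came from Claude's analysis); it simply counts case-sensitive, whole-word occurrences of candidate variants in already-extracted English text. The function name and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def count_variants(text: str, variants: list[str]) -> Counter:
    """Count case-sensitive, whole-word occurrences of each variant in the text."""
    # Longest variants first, so overlapping candidates resolve to the longest match.
    ordered = sorted(variants, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(v) for v in ordered) + r")\b")
    counts: Counter = Counter()
    for match in pattern.finditer(text):
        counts[match.group(1)] += 1
    return counts


sample = "The Star Spirit guards the star spirits. Each star spirit has a name."
print(count_variants(sample, ["Star Spirit", "star spirits", "star spirit"]))
```

Keeping the match case-sensitive is deliberate here, because capitalization differences are exactly the kind of inconsistency being measured.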

Switching to the glossary-enabled result, these were neatly unified.
The places where translations actually changed were the title page where proper nouns appear frequently (page 1) and the glossary table (page 2).

Comparison without/with glossary (Word, page 1)
Page 1: Left is without glossary (title is "Star Spirit Story," with inconsistent translations of 星霊 as Star Spirit), right is with glossary ("Hoshirei Story" is fixed and 星霊 is unified as Hoshirei)

Comparison without/with glossary (Word, page 2)
Page 2: Left is without glossary, right is with glossary. In the glossary table, 共鳴進化 becomes Reso-Evolution, 星導士 becomes Starwright, and 共鳴結界 becomes Reso-Barrier — all coined terms are fixed to their registered translations

The title changed from "Star Spirit Story" to "Hoshirei Story," and 星霊, which had been split across four main translations plus several minor variations, was unified to Hoshirei.
Other coined terms were also fixed to their registered translations: 守護者 to Warden, 共鳴進化 to Reso-Evolution, 輝光石 to Lumina Shard, and 雷狼ボルテ to Voltefang.
The glossary was applied not only to body text but also to table cells (text boxes are excluded from translation in the first place, so they are also outside the scope of glossary application).

However, checking the full glossary-enabled output showed that the unification was not 100% complete (per Claude's analysis).
Just one occurrence of 星霊 remained as star spirits instead of being fixed to Hoshirei.
In the body text "Hoshirei become Warden who form bonds with these star spirits," within the same sentence 星霊 was fixed to Hoshirei and 守護者 was fixed to Warden, yet one more 星霊 remained as star spirits.
Since the glossary applies by matching each occurrence based on surrounding context, occasionally the same term may be missed depending on the context.
That said, the vast majority were fixed correctly, so the actual outcome was "almost unified, but with occasional misses."

6. Pricing and Processing Time

The pricing for document translation varies by the translation model used.
For the standard NMT model, document translation is priced per page.

Item | Unit Price
NMT document translation | $0.08 / page

Reference: Pricing Page: Pricing | Google Cloud

For this sample (a 4-page DOCX), the combined cost with and without the glossary was well under $1.
Processing time was approximately 1–2 seconds as measured.
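As a rough check on the figures above, the per-page price can be turned into a small estimator. This is a sketch under stated assumptions: it uses only the $0.08 per page NMT price quoted in this section, and the `runs` parameter (how many times the document is translated) is a hypothetical knob. How a glossary-enabled request is billed is not covered here, so treat the result as an upper-level estimate and check the pricing page for details.

```python
from decimal import Decimal

NMT_PRICE_PER_PAGE = Decimal("0.08")  # USD per page, from the pricing table in this section


def estimate_cost(pages: int, runs: int = 1) -> Decimal:
    """Estimate the NMT document translation cost in USD for `runs` translations."""
    if pages < 0 or runs < 0:
        raise ValueError("pages and runs must be non-negative")
    return NMT_PRICE_PER_PAGE * pages * runs


print(estimate_cost(4))          # one pass over the 4-page sample
print(estimate_cost(4, runs=2))  # assuming two billed passes
```

`Decimal` is used instead of `float` so that the per-page price multiplies out without binary rounding noise.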

7. About Attribution ("Machine Translated by Google")

In the PDF edition, a "Machine Translated by Google" attribution was added to the upper left of the translated PDF.
For this Word (DOCX) translation, I did not specify a custom attribution either, yet the default text did not appear in the translated file.
Since the presence or absence of the attribution differed between PDF and Office under the same conditions, I checked the specifications.

The attribution text can be specified in the API request as customizedAttribution, and the default when not specified is "Machine Translated by Google."
This field is not PDF-specific; it is common to the entire translateDocument (document translation) feature.

customizedAttribution string

Optional. This flag is to support user customized attribution. If not provided, the default is Machine Translated by Google. Customized attribution should follow rules in https://cloud.google.com/translate/attribution#attribution_and_logos

Reference: API Reference: Method: projects.locations.translateDocument | Google Cloud
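For reference, here is a minimal sketch of how the field could be set on the request dictionary used by the script in §4. In the Python client the REST field `customizedAttribution` is spelled `customized_attribution`. The helper `with_attribution` and the example string are hypothetical, and this only builds the request; it does not call the API.

```python
def with_attribution(request: dict, text: str) -> dict:
    """Return a copy of the request dict with a custom attribution string set."""
    if not text.strip():
        raise ValueError("attribution text must not be empty")
    updated = dict(request)
    updated["customized_attribution"] = text
    return updated


base_request = {
    "parent": "projects/<YOUR_PROJECT_ID>/locations/us-central1",
    "source_language_code": "ja",
    "target_language_code": "en",
}
request = with_attribution(base_request, "Machine translated by Google (unreviewed)")
```

Any custom string should still follow the attribution rules linked from the field description quoted above.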

What's important to note here is that customizedAttribution only describes the "attribution text" itself — there is no official documentation explicitly stating how or whether that text is reflected in the output (i.e., whether it is burned into the file) for each format.
The reason why the presence of attribution differs by format could not be confirmed in the official documentation.
Based on what was actually observed, the attribution was burned into the PDF output but did not appear in Office format outputs (DOCX/XLSX/PPTX).
This is presumably because PDF reconstructs translated text overlaid on the page to preserve layout, and the output is produced differently from editable Office formats — but this is speculation and is not backed by any official statement.

Another point to keep in mind, from a different angle, is the explicit disclosure requirement under the brand guidelines.
This is separate from the question of whether attribution is burned into the output file: whenever translation results are shown to users, regardless of format, it must be made clear that the content is a machine translation.

Whenever you display translation results from Google Translate directly to users, you must make it clear to users that they are viewing automatic translations from Google Translate using the appropriate text or brand elements.

Reference: Brand Guidelines: Attribution requirements | Google Cloud

In other words, the fact that attribution is not burned into Office format outputs is not itself a problem, but when publishing or distributing translation results, the responsibility to clearly indicate that the content is a machine translation lies with the user, regardless of format.

8. Summary

Cloud Translation API's Document Translation can translate a Word (DOCX) file directly while preserving the layout of body text, tables, multi-column layouts, colors, and headers/footers, and using a glossary allows proper nouns to be almost consistently fixed.
As explicitly stated in the official documentation, content inside text boxes and shapes is not translated and remains in the source language.
It is worth considering placing text that needs to be translated in the body text or tables rather than in text boxes.

Note that the handling of text boxes differs by format.
I will also write blog posts covering Excel and PowerPoint editions in the future, so please check those out.

I hope this article is helpful for those considering automating the translation of Word documents.
