I used Amazon Translate to translate multiple source language documents into numerous destination languages.

2023.02.20


To connect with a worldwide audience of consumers, clients, and investors, businesses must translate business-critical content such as promotional materials, guidebooks, and online ordering pages into different languages. When the documents arrive in a mix of languages, having to determine the source language of each document before starting a translation job is an extra challenge.

Overview

The automatic language detection capability for batch translation jobs in Amazon Translate now allows you to translate a batch of documents written in different languages with a single translation job. This removes the need to orchestrate the workflow yourself by first detecting the dominant language of each document and sorting the documents by language. Amazon Translate also supports translating into multiple target languages (up to 10) in one job.

Automatic source language detection for batch translation jobs lets you translate documents written in different supported languages in a single operation. You can also specify up to ten target languages. Amazon Translate uses Amazon Comprehend to determine the dominant language of each source document and uses it as the source language.
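For context, the manual step that automatic detection replaces can be sketched roughly as follows. This is only an illustration of the "detect and classify" workflow, not a description of how Amazon Translate works internally. The helper names `comprehend_detector` and `group_by_language` are hypothetical, and the sketch assumes each document's text is within the size limits of Amazon Comprehend's `detect_dominant_language` call.

```python
from collections import defaultdict


def comprehend_detector(text):
    """Return the dominant language code of `text` using Amazon Comprehend."""
    import boto3  # imported lazily so the grouping logic can run without AWS access

    comprehend = boto3.client("comprehend")
    result = comprehend.detect_dominant_language(Text=text)
    # Keep the candidate language with the highest confidence score.
    best = max(result["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"]


def group_by_language(documents, detect=comprehend_detector):
    """Group a {file name: text} mapping by detected dominant language."""
    groups = defaultdict(list)
    for name, text in documents.items():
        groups[detect(text)].append(name)
    return dict(groups)
```

Without automatic detection, each resulting group would need its own translation job with an explicit source language code. With `auto` as the source language, this grouping step is no longer needed.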

Create a batch translation job via the console

In this blog, we will use batch translation to automatically identify the source language of each document and translate it into multiple languages (Japanese and Spanish). Both the input and output locations are in Amazon S3.

NOTE: Batch translation is supported in the following AWS Regions:

  • US East (N. Virginia)
  • US East (Ohio)
  • US West (Oregon)
  • Asia Pacific (Seoul)
  • Europe (Frankfurt)
  • Europe (Ireland)
  • Europe (London)
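If you create jobs from code, a small guard can catch an unsupported Region early. The sketch below simply maps the Regions listed above to their Region codes. The constant and function names are hypothetical, and the list reflects this article only, so check the current AWS documentation before relying on it.

```python
# Region codes for the Regions listed in this article (may be out of date).
BATCH_TRANSLATION_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-east-2": "US East (Ohio)",
    "us-west-2": "US West (Oregon)",
    "ap-northeast-2": "Asia Pacific (Seoul)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-west-1": "Europe (Ireland)",
    "eu-west-2": "Europe (London)",
}


def supports_batch_translation(region_name):
    """Return True if `region_name` is in the list above."""
    return region_name in BATCH_TRANSLATION_REGIONS
```

For example, `supports_batch_translation("eu-west-1")` returns `True` with this list.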

You can choose whether the output should use a formal or informal tone, and you can also enable profanity masking for profane words or phrases.
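When the job is started through the SDK instead of the console, these options correspond to the `Settings` parameter of `start_text_translation_job`. The helper below is a minimal sketch with a hypothetical name (`build_translation_settings`). It only builds the dictionary and does not check whether the chosen target languages support each setting.

```python
def build_translation_settings(formality=None, mask_profanity=False):
    """Build the optional Settings dict for a translation job.

    formality: "FORMAL", "INFORMAL", or None to leave the tone unspecified.
    mask_profanity: when True, ask Amazon Translate to mask profane words.
    """
    settings = {}
    if formality is not None:
        value = formality.upper()
        if value not in ("FORMAL", "INFORMAL"):
            raise ValueError("formality must be 'FORMAL' or 'INFORMAL'")
        settings["Formality"] = value
    if mask_profanity:
        settings["Profanity"] = "MASK"
    return settings
```

If the returned dictionary is not empty, it could be passed as `Settings=...` when starting the job.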

Then, as part of the configuration, we create an AWS Identity and Access Management (IAM) role. The role must have access to both the input and output S3 buckets.
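The console can create this role for you. If you define it yourself, the role needs a trust policy that lets the Amazon Translate service assume it, plus S3 permissions on the two buckets. The following is a sketch of what those policy documents might look like. The function names and bucket names are placeholders, and the exact permissions should be checked against the Amazon Translate documentation.

```python
import json


def build_trust_policy():
    """Trust policy that allows Amazon Translate to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "translate.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_s3_access_policy(input_bucket, output_bucket):
    """Permissions policy: read the input bucket, write to the output bucket."""
    input_arn = f"arn:aws:s3:::{input_bucket}"
    output_arn = f"arn:aws:s3:::{output_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": f"{input_arn}/*"},
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": [input_arn, output_arn]},
            {"Effect": "Allow", "Action": "s3:PutObject", "Resource": f"{output_arn}/*"},
        ],
    }


# Placeholder bucket names, for illustration only.
policy = build_s3_access_policy("my-input-bucket", "my-output-bucket")
print(json.dumps(policy, indent=2))
```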
Once the job has been created, you can track the progress of the batch translation job in the Translation jobs area.
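Progress can also be checked from code with `describe_text_translation_job`. The polling helper below is a minimal sketch. The function name, the polling interval, and the retry limit are assumptions chosen for illustration, and `client` is expected to be a Boto3 Translate client.

```python
import time

# Job statuses after which the job will not change state any more.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_ERROR", "FAILED", "STOPPED"}


def wait_for_translation_job(client, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a batch translation job until it reaches a terminal status."""
    for _ in range(max_polls):
        response = client.describe_text_translation_job(JobId=job_id)
        status = response["TextTranslationJobProperties"]["JobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```

A call such as `wait_for_translation_job(boto3.client('translate'), job_id)` would block until the job finishes, fails, or is stopped.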

After the translation job is completed, check the output S3 bucket location to confirm that the documents were translated into each of the target languages.

The input consists of two files in two different languages, so we expect four output documents: each of the two source documents translated into each of the two target languages.
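The expected count is simply the number of input documents multiplied by the number of target languages. As a quick sanity check, the pairs can be enumerated as below. The file names are hypothetical, and the sketch makes no claim about how Amazon Translate names the output objects.

```python
from itertools import product


def expected_translations(input_files, target_languages):
    """Return every (input file, target language) pair the job should produce."""
    return list(product(input_files, target_languages))


# Two input documents (hypothetical names) and the two targets used in this post.
pairs = expected_translations(["first.txt", "second.txt"], ["ja", "es"])
print(len(pairs))  # 4
```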

Create a batch translation job via the AWS SDK

The batch translation call in Python Boto3, start_text_translation_job, translates the documents in your source S3 bucket. Provide the following values:

  • InputDataConfig: The location of your input documents in the S3 bucket.
  • OutputDataConfig: The S3 location where your output documents will be stored.
  • DataAccessRoleArn: The ARN of an IAM role that grants Amazon Translate access to your input and output S3 buckets.
  • SourceLanguageCode: Use auto so that the source language of each document is detected automatically.
  • TargetLanguageCodes: You can specify up to ten target languages.

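import boto3

# Create the Amazon Translate client once, outside the handler, so it is reused.
client = boto3.client('translate')


def lambda_handler(event, context):
    # Start a batch job that detects each document's source language
    # and translates it into Japanese and Spanish.
    response = client.start_text_translation_job(
        JobName='Translation-job',
        InputDataConfig={
            'S3Uri': 's3://<<REPLACE-WITH-YOUR-INPUT-BUCKET>>/input',
            'ContentType': 'text/plain'
        },
        OutputDataConfig={
            'S3Uri': 's3://<<REPLACE-WITH-YOUR-OUTPUT-BUCKET>>/output'
        },
        DataAccessRoleArn='<<REPLACE-WITH-THE-IAM-ROLE-ARN>>',
        SourceLanguageCode='auto',
        TargetLanguageCodes=['ja', 'es']
    )
    # Return the job ID and initial status so the caller can track the job.
    return {
        'JobId': response['JobId'],
        'JobStatus': response['JobStatus']
    }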