Hands-on with Amazon Polly

Hi, this is Charu from Classmethod. In this blog post, we will discuss Amazon Polly.

Amazon Polly is a cloud-based service that converts text into lifelike speech. It enables developers to add speech capabilities to various applications, including voice-enabled chatbots, automated call centres, e-learning platforms, and more. With dozens of lifelike voices in multiple languages, Polly allows you to create a natural and engaging experience for your users.

Now that we know what Amazon Polly is, let's dive into some practical examples of how to use it.

Let's get started: Generating Speech from Text

In this example, we'll use the AWS SDK for Python (boto3) to generate speech from a given text string. First, create a Polly client:

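import boto3

# Create a session with explicit credentials (placeholders; replace with your own values)
session = boto3.Session(
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
    region_name='YOUR_REGION'  # for example, 'us-east-1'
)

# Create the Polly client from that session
polly = session.client('polly')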

For better security, avoid hard-coding your credentials in the code as shown above. Instead, configure them once from the terminal with the following command. boto3 then picks them up automatically, so boto3.client('polly') works without any keys in the code:

aws configure

To generate speech using Amazon Polly:

response = polly.synthesize_speech(
    Text='Hello, welcome to my blog!',
    OutputFormat='mp3',
    VoiceId='Joanna'
)
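'Joanna' is only one of the available voices. As a minimal sketch of how the other choices could be looked up, the helper below calls Polly's describe_voices operation and collects the voice IDs for one language. The function name list_voice_ids and its default language are my own choices for illustration, and the sketch ignores pagination (NextToken) for brevity.

```python
def list_voice_ids(polly, language_code="en-US"):
    """Return the sorted voice IDs Polly reports for one language.

    `polly` is a Polly client, e.g. boto3.client('polly').
    """
    # describe_voices returns a dict with a 'Voices' list; each entry has an 'Id'.
    response = polly.describe_voices(LanguageCode=language_code)
    return sorted(voice["Id"] for voice in response["Voices"])


# Usage with a real client (requires AWS credentials):
#   import boto3
#   print(list_voice_ids(boto3.client("polly")))
```

Any ID returned this way can be passed as VoiceId in synthesize_speech.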

Save the synthesised speech to an MP3 file:

with open('output.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())

Running this code produces an MP3 file, output.mp3, containing the speech generated by Amazon Polly.
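If you expect to do this more than once, the two steps can be folded into one small helper. This is only a sketch: the name synthesize_to_file and its parameters are hypothetical, not part of boto3. It takes the Polly client as an argument and closes the audio stream once it has been read.

```python
from contextlib import closing


def synthesize_to_file(polly, text, path, voice_id="Joanna"):
    """Synthesize `text` with Polly and write the MP3 bytes to `path`.

    Returns the number of bytes written.
    """
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; closing() releases it after reading.
    with closing(response["AudioStream"]) as stream:
        data = stream.read()
    with open(path, "wb") as file:
        file.write(data)
    return len(data)


# Usage with a real client (requires AWS credentials):
#   import boto3
#   synthesize_to_file(boto3.client("polly"), "Hello, welcome to my blog!", "output.mp3")
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to reuse with different sessions or regions.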

So far, the code writes an MP3 file that we have to open separately to listen to. But I want to hear the sound as soon as I run the code. For this we can use 'pydub', a third-party Python library (install it with pip install pydub). Add the following lines to the existing code.

from pydub import AudioSegment
from pydub.playback import play

audio = AudioSegment.from_file('output.mp3', format='mp3')

play(audio)

To decode MP3 files, 'pydub' needs the 'ffmpeg' package. On macOS, you can install it with Homebrew as follows. The installation takes a few minutes.

brew install ffmpeg
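To confirm the install worked before running the playback code, a quick check of the PATH can help. The sketch below uses only the standard library; the helper name find_decoder is my own, and checking 'ffmpeg' before 'avconv' is just this sketch's choice (pydub can work with either tool).

```python
import shutil


def find_decoder(which=shutil.which):
    """Return the path of the first decoder found on PATH, or None."""
    # pydub can use either ffmpeg or avconv; this sketch checks ffmpeg first.
    for name in ("ffmpeg", "avconv"):
        path = which(name)
        if path:
            return path
    return None


# Usage:
#   if find_decoder() is None:
#       print("ffmpeg not found; install it before using pydub")
```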

Let's add one more feature: SSML. Speech Synthesis Markup Language (SSML) lets you control various aspects of speech synthesis, including pronunciation, volume, and rate. To use it, we just wrap our 'Text' value in SSML tags and add a 'TextType' field, like this:

response = polly.synthesize_speech(
    Text='<speak>Hello, <prosody volume="x-loud">welcome</prosody> to my blog!</speak>',
    OutputFormat='mp3',
    VoiceId='Joanna',
    TextType='ssml'
)
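Because SSML is XML, characters such as & and < in the spoken text have to be escaped, otherwise the request is rejected as invalid SSML. Here is a minimal sketch of a builder that handles this, using only the standard library. The function name build_ssml and its parameters are hypothetical; volume and rate map to attributes of the prosody tag.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, volume=None, rate=None):
    """Wrap plain text in <speak>, optionally inside a <prosody> element."""
    body = escape(text)  # turns &, < and > into XML entities
    attrs = []
    if volume is not None:
        attrs.append(f"volume={quoteattr(volume)}")
    if rate is not None:
        attrs.append(f"rate={quoteattr(rate)}")
    if attrs:
        body = f"<prosody {' '.join(attrs)}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(build_ssml("Tom & Jerry", volume="x-loud"))
```

The returned string can be passed as Text together with TextType='ssml'.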

Finally, the complete code looks like this:

import boto3
from pydub import AudioSegment
from pydub.playback import play


polly = boto3.client('polly')
response = polly.synthesize_speech(
    Text='<speak>Hello, <prosody volume="x-loud">welcome</prosody> to my blog!</speak>',
    OutputFormat='mp3',
    VoiceId='Joanna',
    TextType='ssml'
)

with open('output.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())

audio = AudioSegment.from_file('output.mp3', format='mp3')

play(audio)

Do give it a try, and thank you for your time.

Happy Learning :)