Create a fully automated YouTube video with Text-to-speech services & comparison of Amazon Polly vs. IBM Watson



Text-to-speech (TTS) is a popular area of machine learning. As the technology has evolved, the number of TTS options has grown dramatically. In recent years, cloud computing companies have improved TTS alongside the growth of big data and artificial intelligence applications. In this article I compare two TTS services that I used to create AWS tutorials on YouTube. The tutorials can be found in this playlist.


Nowadays, big cloud computing companies provide APIs that make speech services easy to use. In contrast to open-source TTS services, TTS APIs provided by cloud computing companies ensure that personal data remains within the user's account. I will share my experience with Amazon Polly and IBM Watson in this article. Note that I used the demo version of IBM Watson, and no personal data was involved.

Link to IBM Watson Text to Speech: 

Link to Amazon Polly:

Other free TTS service:

1. IBM Watson

The first 3 videos were created with the IBM Watson Text to Speech service. This is the link to the demo version I used to create the tutorial videos.


  • 14 languages & variations — 27 voices (13 neural and 14 standard) across 7 languages


  • Lite plan: 500 minutes per month free
  • Standard plan: starting from $0.02 USD/minute


  • The demo doesn’t require creating an account
  • Source code can be forked from GitHub


  • Cannot resolve abbreviations such as AWS or IAM. Workaround: type “A” “W” “S” to force IBM Watson to spell out each letter
  • The downloaded file doesn’t come with a file extension, so “.mp3” has to be appended to each “synthesize” file manually

An example of speech using IBM Watson can be found in this video:

2. Amazon Polly

Amazon Polly doesn’t have a demo site, so you need to log in to an AWS account to use it. If you haven't signed up for an AWS account yet, you can follow this tutorial to create one:


  • 29 languages & variations
  • Standard TTS voices, and Neural Text-to-Speech (NTTS) voices that improve speech quality for more natural and human-like voices.


  • Pay-as-you-go model: Standard voices $4.00 USD per 1M characters, Neural voices $16.00 USD per 1M characters
  • Free Tier: 5M characters per month for Standard voices and 1M characters per month for Neural voices, for the first 12 months starting from the first speech request
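
To get a feel for these numbers, here is a rough cost estimate based only on the rates listed above. It is a sketch under simplifying assumptions: the function name is hypothetical, and it ignores anything beyond the per-character price and the Free Tier allowance.

```python
# Rates copied from the list above (USD per 1 million characters).
RATE_PER_MILLION = {"standard": 4.00, "neural": 16.00}
# Free Tier monthly allowance in characters (first 12 months only).
FREE_TIER_CHARS = {"standard": 5_000_000, "neural": 1_000_000}


def estimate_monthly_cost(characters, voice="standard", free_tier=True):
    """Return the estimated monthly cost in USD for `characters` of text."""
    allowance = FREE_TIER_CHARS[voice] if free_tier else 0
    billable = max(0, characters - allowance)
    return billable / 1_000_000 * RATE_PER_MILLION[voice]


# 6M characters with a Standard voice: 1M billable after the Free Tier.
print(estimate_monthly_cost(6_000_000, "standard"))  # 4.0
```

A short tutorial script is only a few thousand characters, so a handful of videos per month should stay well inside the Free Tier allowance.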


  • Recognizes more abbreviations, such as “AWS”, “IAM”, and “IBM”
  • Smoother speech
  • A wider selection of voices and languages
  • The downloaded file is saved as speech_YYYYMMDDTTTTTTTTT.mp3


  • Doesn’t have a demo site; an AWS account is required
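
Once you have an AWS account, the speech can also be generated from a script instead of the console. The sketch below assumes the boto3 SDK is installed and AWS credentials are configured. The function name, the “Joanna” voice, and the optional `client` parameter are illustrative choices, not the setup I used for the videos.

```python
def synthesize_to_file(text, out_path, voice_id="Joanna", engine="standard", client=None):
    """Send `text` to Amazon Polly and write the returned MP3 audio to `out_path`.

    `client` is optional so that a stub can be passed in for testing.
    """
    if client is None:
        import boto3  # third-party AWS SDK; requires configured credentials
        client = boto3.client("polly")
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,  # "standard" or "neural"
    )
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return out_path
```

Calling `synthesize_to_file("Welcome to this AWS tutorial.", "intro.mp3")` would write the narration to `intro.mp3`. Each call counts toward the character quota described in the pricing list.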

An example of speech using Amazon Polly can be found in this video:
