Create a fully automated YouTube video with Text-to-speech services & comparison of Amazon Polly vs. IBM Watson

2020.05.06

This article was published more than one year ago. Please note that the information may be out of date.

Intro.

Text to speech (TTS) is a popular area in machine learning. As the technology has evolved, the number of TTS options has increased drastically. In recent years, cloud computing companies have improved TTS alongside the growth of big data and artificial intelligence applications. In this article I compare the two TTS services that I used to create AWS tutorials on YouTube. The tutorials can be found in this playlist.

 

Nowadays, the big cloud computing companies provide APIs for speech services, which makes TTS easy to use. In contrast to open-source TTS services, the TTS APIs provided by cloud computing companies ensure that personal data remains within the user's account. I will share my experience with Amazon Polly and IBM Watson in this article. Note that I used the demo version of IBM Watson, and no personal data was involved.

Link to IBM Watson Text to Speech:

https://www.ibm.com/cloud/watson-text-to-speech 

Link to Amazon Polly:

https://aws.amazon.com/polly/

Other free TTS services:

https://text-to-speech.imtranslator.net/

https://activ8.co.jp/

1. IBM Watson

The first 3 videos were created with the IBM Watson Text to Speech service. This is the link to the demo version I used to create the tutorial videos.

https://text-to-speech-demo.ng.bluemix.net/

Features:

  • 14 languages & variations; 27 voices (13 neural and 14 standard) across 7 languages

Pricing:

  • The Lite plan gives you 500 minutes per month free
  • The Standard plan starts from USD 0.02 per minute

Pro

  • No account is needed to use the demo
  • Source code can be forked from GitHub

Con

  • Cannot resolve abbreviations such as AWS and IAM. Workaround: type “A” “W” “S” to force IBM Watson to spell out each letter
  • The downloaded file doesn’t come with a file extension, so “.mp3” has to be appended to each “synthesize” file manually
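
Both workarounds above are easy to script. The following is only a minimal sketch, not the process used for the videos: the helper names `spell_out` and `add_mp3_extension`, the abbreviation list, and the assumption that every downloaded file name starts with “synthesize” are hypothetical choices for illustration.

```python
import re
from pathlib import Path

# Hypothetical list of abbreviations to spell out; extend as needed.
ABBREVIATIONS = ("AWS", "IAM")


def spell_out(text, abbreviations=ABBREVIATIONS):
    """Replace each listed abbreviation with its letters separated by spaces."""
    for abbr in abbreviations:
        # \b keeps "IAM" from matching inside a longer word.
        pattern = r"\b" + re.escape(abbr) + r"\b"
        text = re.sub(pattern, " ".join(abbr), text)
    return text


def add_mp3_extension(folder, prefix="synthesize"):
    """Append ".mp3" to extensionless files in folder whose name starts with prefix."""
    renamed = []
    # sorted() materializes the listing first, so renaming inside the loop is safe.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.startswith(prefix) and path.suffix == "":
            target = path.with_name(path.name + ".mp3")
            if not target.exists():  # never overwrite an existing file
                path.rename(target)
                renamed.append(target)
    return renamed
```

For example, `spell_out("Log in to AWS and open IAM.")` returns `"Log in to A W S and open I A M."`, which can then be pasted into the demo page. Calling `add_mp3_extension` on the download folder renames `synthesize` to `synthesize.mp3` and leaves files that already have an extension untouched.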

An example of speech using IBM Watson can be found in this video:

2. Amazon Polly

Amazon Polly doesn’t have a demo site, so you need to log in to an AWS account to use it. If you haven't signed up for an AWS account yet, you can follow this tutorial to create one:

Features:

  • 29 languages & variations
  • Standard TTS voices, and Neural Text-to-Speech (NTTS) voices that improve speech quality for more natural and human-like voices.

Pricing

  • Pay-as-you-go model: Standard voices USD 4.00 per 1M characters, Neural voices USD 16.00 per 1M characters
  • Free Tier: Standard voices 5M characters per month, Neural voices 1M characters per month, for the first 12 months starting from the first request for speech
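
To get a feel for these numbers, here is a rough cost estimate in Python. It is only a sketch based on the prices and Free Tier allowances listed above, which may have changed since this article was written; the function name `polly_monthly_cost` is made up for this example.

```python
# Prices and Free Tier allowances as quoted in the list above (USD).
PRICE_PER_MILLION = {"standard": 4.00, "neural": 16.00}
FREE_CHARS_PER_MONTH = {"standard": 5_000_000, "neural": 1_000_000}


def polly_monthly_cost(characters, voice="standard", free_tier=True):
    """Estimate the monthly Polly charge in USD for a number of characters."""
    if free_tier:
        # Only characters beyond the monthly free allowance are billed.
        characters = max(0, characters - FREE_CHARS_PER_MONTH[voice])
    return characters * PRICE_PER_MILLION[voice] / 1_000_000
```

With these figures, 6M characters of Standard voice in one month within the Free Tier period comes to USD 4.00 (only 1M characters are billed), and 2.5M characters of Neural voice comes to USD 24.00. A short tutorial script of a few thousand characters stays well inside the free allowance.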

Pro

  • Recognizes more abbreviations, such as “AWS”, “IAM”, and “IBM”
  • Smoother speech
  • A wider selection of voices and languages
  • The downloaded file is saved as speech_YYYYMMDDTTTTTTTTT.mp3, with the file extension included

Con

  • Doesn’t have a demo site, so an AWS account is required
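
Once you have an AWS account, the same speech can also be generated from a script instead of by hand. The following is a minimal sketch using the `synthesize_speech` call of the AWS SDK for Python (boto3), assuming boto3 is installed and AWS credentials and a region are already configured. It is not the workflow used for the videos in this article; the function name `synthesize_to_mp3`, the default voice “Joanna”, and the optional `client` parameter are choices made for illustration.

```python
def synthesize_to_mp3(text, out_path, voice_id="Joanna", engine="standard", client=None):
    """Synthesize text with Amazon Polly and write the MP3 audio to out_path."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3
        client = boto3.client("polly")
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,  # "standard" or "neural"
    )
    # The audio is returned as a stream; read it fully and save it to disk.
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())
    return out_path
```

A call such as `synthesize_to_mp3("Welcome to this AWS tutorial.", "intro.mp3")` would write the narration to `intro.mp3`. Passing `engine="neural"` requests an NTTS voice, which is billed at the higher Neural rate.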

An example of speech using Amazon Polly can be found in this video:

https://www.youtube.com/watch?v=qRaw92dgQ8o
