Best text to speech services

 

Text to speech (TTS) began as a way to enable people with sight or reading difficulties to consume content by listening. This is still a key role for these applications, but TTS is now being applied in a number of new ways. Education services use it to deliver their educational content and courses in audio form. Businesses use it to communicate their services and products, both online via bots or podcasts and in real-world spaces. Within the travel and tourism industry, many businesses and tourist bodies use TTS to create easy-to-consume audio guides for cities, museums, etc. Within the marketing industry, it helps clients extend the reach of their product and marketing information, as businesses can transform their online content into podcasts that customers can access whenever and wherever is convenient for them. We are just at the beginning of the TTS revolution, and already many significant industries and social enterprises are finding valuable roles for text to speech applications that engage audiences in ways more convenient to end users.

 

All of the major technology players in the market now provide text to speech services, from Amazon to Google, Microsoft, and IBM. All of these providers have been active in this market for a number of years and have built robust and scalable services. They all have different strengths and weaknesses. However, it is fair to say that overall Amazon and Google are the stronger services, as their footprint in the smart speaker market has enabled them to create accurate and sophisticated text to audio services.

 

Amazon Polly

Polly’s Text to Speech service uses advanced deep learning technologies to synthesize natural-sounding human speech. With dozens of lifelike voices across a broad set of languages, you can build speech-enabled applications, services and audio content platforms that work in many different countries.

 

In addition to Standard Text to Speech voices, Amazon Polly offers Neural Text to Speech voices that improve speech quality through a new machine learning approach. Polly’s Neural Text to Speech technology also supports two speaking styles that let you match the delivery style of the speaker to the application: a Newscaster reading style tailored to news narration use cases, and a Conversational speaking style suited to two-way communication such as telephony applications.
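As a rough sketch of how a speaking style might be requested, the snippet below builds the SSML and request parameters for a Neural voice using Polly's `amazon:domain` tag. The helper names `build_style_ssml` and `polly_request` are illustrative, the voice `Matthew` is an assumption (check which voices support each style), and the final boto3 call is left commented out because it needs AWS credentials.

```python
from xml.sax.saxutils import escape

# Maps the style names used in this article to the values of the
# <amazon:domain> SSML tag, which applies to Neural voices.
STYLES = {"newscaster": "news", "conversational": "conversational"}


def build_style_ssml(text: str, style: str) -> str:
    """Wrap plain text in SSML that asks for a speaking style."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return (
        "<speak>"
        f'<amazon:domain name="{STYLES[style]}">{escape(text)}</amazon:domain>'
        "</speak>"
    )


def polly_request(text: str, style: str, voice_id: str = "Matthew") -> dict:
    """Keyword arguments for boto3's Polly synthesize_speech call."""
    return {
        "Engine": "neural",
        "VoiceId": voice_id,  # assumed voice, not every voice supports every style
        "TextType": "ssml",
        "Text": build_style_ssml(text, style),
        "OutputFormat": "mp3",
    }


params = polly_request("Markets closed higher today.", "newscaster")
print(params["Text"])

# With AWS credentials configured, the request could be sent like this:
# import boto3
# audio = boto3.client("polly").synthesize_speech(**params)["AudioStream"].read()
```

Keeping the request as a plain dictionary makes it easy to inspect or log the SSML before any audio is generated.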

 

Amazon Polly includes dozens of lifelike voices and support for a variety of languages, so you can select a suitable voice and distribute your speech-enabled applications, services or audio content in many countries.

For a full list of voices go to:

 

https://aws.amazon.com/polly/features/?nc=sn&loc=3

 

Pros

 

  • Quality of the voices
  • Accuracy of transcoding

Cons

 

  • Limited number of Neural Text to Speech voices
  • Limited range of voices outside the five major world languages

 

Google Text to Speech

Google Cloud Text-to-Speech enables developers to synthesise natural-sounding speech with 220+ voices in multiple languages. It applies DeepMind’s ground-breaking research in WaveNet and Google’s powerful neural networks to deliver high-fidelity speech. Through an easy-to-use API, developers can create lifelike interactions with their users across many applications and devices.

The 220+ voices span 40+ languages and variants, including Mandarin, Hindi, Spanish, Arabic, Russian, and more.

 

For a full list of voices, go here:

 

https://cloud.google.com/text-to-speech/docs/voices

 

The platform provides 90+ WaveNet voices, based on DeepMind’s research, that significantly close the gap with human performance. The pitch of a selected voice can be personalised by up to 20 semitones above or below the default. Speech can also be customised with Speech Synthesis Markup Language (SSML) tags, which allow a customer to add pauses, number, date and time formatting, and other pronunciation instructions.
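To make the pitch and SSML options more concrete, here is a minimal sketch that assembles a JSON request body in the shape used by the Cloud Text-to-Speech `text:synthesize` REST method. The function names and the voice name `en-US-Wavenet-D` are assumptions chosen for illustration, and nothing is sent over the network.

```python
import json
from xml.sax.saxutils import escape

PITCH_LIMIT = 20.0  # semitones above or below the voice's default


def announcement_ssml(greeting: str, date_iso: str) -> str:
    """SSML with a half-second pause and a date marked up as a date."""
    return (
        "<speak>"
        f'{escape(greeting)}<break time="500ms"/>'
        "The tour starts on "
        f'<say-as interpret-as="date" format="yyyymmdd" detail="1">'
        f"{escape(date_iso)}</say-as>."
        "</speak>"
    )


def google_tts_request(ssml: str, voice_name: str, language_code: str,
                       pitch: float = 0.0) -> dict:
    """Build a request body, rejecting pitch values outside the allowed range."""
    if not -PITCH_LIMIT <= pitch <= PITCH_LIMIT:
        raise ValueError(f"pitch must be within +/-{PITCH_LIMIT} semitones")
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3", "pitch": pitch},
    }


body = google_tts_request(
    announcement_ssml("Welcome to the museum.", "2024-05-01"),
    voice_name="en-US-Wavenet-D",  # assumed voice name
    language_code="en-US",
    pitch=2.0,
)
print(json.dumps(body, indent=2))
```

Validating the pitch locally is a design choice in this sketch: it surfaces an out-of-range value before a request is ever made.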

 

Pros
• Number of voices to choose from
• Number of WaveNet voices deployed

 

Cons
• Voice quality can vary across the full range of voices

IBM Watson

IBM Watson Text to Speech is a cloud API service that enables a customer to convert written text into natural-sounding audio in a variety of languages and voices, within an existing application or within Watson Assistant.

 

It has controllable speech attributes, which enable a user to adjust pronunciation, volume, pitch and speed using Speech Synthesis Markup Language. Voice quality can be personalised by specifying attributes such as strength, pitch, breathiness, rate, timbre, and more. IBM also provides strong data governance, based on its long-term corporate governance practices.
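As an illustration of adjusting speech attributes with SSML, the sketch below wraps text in a standard `prosody` element. It only sets pitch and rate, the helper name `prosody_ssml` is hypothetical, and which attributes and values a given voice honours should be checked against the service documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text: str, pitch: str = "default", rate: str = "default") -> str:
    """Wrap text in an SSML <prosody> element with the given pitch and rate."""
    return (
        "<speak>"
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{escape(text)}"  # escape &, < and > so the markup stays valid
        "</prosody>"
        "</speak>"
    )


ssml = prosody_ssml("Doors open at 9 & close at 5.", pitch="high", rate="slow")
print(ssml)
```

Escaping the text matters in practice, since content taken from a website often contains characters such as `&` that would otherwise break the markup.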

 

IBM Watson provides 32 different voices. To find out more, go here:

 

https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices

 

Pros

 

  • Data security
  • Ability to personalise the voice

Cons

 

  • Accuracy when transcoding Text to Speech
  • Limited number of voices to choose from
  • Processing of text to speech can be slow

Microsoft

Microsoft’s platform enables clients to build apps and services that speak naturally, choosing from more than 215 voices and 60 languages and variants. A customer can differentiate its brand with a customized voice, and access voices with different speaking styles and emotional tones to fit its use case, from text readers to customer support chatbots.

It is flexible to deploy, as Text to Speech can run anywhere in the cloud or at the edge in containers. It also provides fine-grained audio controls, so voice output can be tuned for different customer scenarios by adjusting rate, pitch, pronunciation, pauses, and more. Microsoft’s neural Text to Speech supports several speaking styles, including chat, newscast, and customer service, and emotions like cheerfulness and empathy.
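The speaking styles and emotions are selected through SSML. The sketch below builds such a document with the `mstts:express-as` element and parses it to confirm it is well formed. The helper name `azure_style_ssml`, the voice `en-US-JennyNeural` and the style `cheerful` are assumptions for illustration, as not every voice supports every style.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def azure_style_ssml(text: str, voice: str, style: str, lang: str = "en-US") -> str:
    """Build SSML that asks a neural voice to speak in a given style."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"
        "</mstts:express-as>"
        "</voice>"
        "</speak>"
    )


ssml = azure_style_ssml(
    "Thanks for calling, how can I help?",
    voice="en-US-JennyNeural",  # assumed voice name
    style="cheerful",           # assumed style value
)

# Parsing raises an error if the markup is not well-formed XML.
root = ET.fromstring(ssml)
style_node = root.find(f".//{{{MSTTS_NS}}}express-as")
print(style_node.get("style"), "->", style_node.text)
```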

There are over 215 voices to choose from. To find out more go here:

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech

 

Pros
• Range of voices
• Flexibility of deployment

 

Cons
• Accuracy of transcoding
• Voice quality, especially for regional voices

Summary

All of the providers have robust and scalable services. However, each of the providers has different strengths and weaknesses, so it is best to choose the service that most suits your use case.

The most important first step is therefore to establish the role TTS plays in your business or social enterprise. That will ensure both that you select the application that best meets those needs and that you get appropriate value from it.

 

Overall, Amazon and Google currently provide the most robust applications. However, all of the providers have different strengths, and all are well-established players in this market.

 

Dominic Finney
