Google launches DeepMind's AI-powered text-to-speech service

IANS San Francisco
Last Updated : Mar 28 2018 | 3:45 PM IST

Google on Wednesday launched a voice synthesiser called "Cloud Text-to-Speech" which is powered by its Britain-based Artificial Intelligence (AI) subsidiary DeepMind.

The service is now available for developers to integrate into their own applications.

A text-to-speech service is a form of speech synthesis that converts text into spoken voice output. Google's text-to-speech powers the voices in services like Google Assistant, Search and Maps.

"'Cloud Text-to-Speech' lets developers choose from 32 different voices from 12 languages and variants," Dan Aharon, Product Manager, Cloud AI, said in a blog post.

"Cloud Text-to-Speech" correctly pronounces complex text such as names, dates, times and addresses for authentic-sounding speech, the company claimed.

It also allows developers to customise pitch, speaking rate and volume gain, and supports a variety of audio formats, including MP3 and WAV.
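As a rough illustration of what those settings look like in practice, the sketch below builds a request body in the shape used by the service's public REST method (`text:synthesize`). The article does not spell out field names; `audioEncoding`, `speakingRate`, `pitch` and `volumeGainDb`, the value ranges, and the use of `LINEAR16` for WAV-style output are taken from the public API reference and should be checked against the current documentation. The helper `build_synthesize_request` is a hypothetical name, and nothing is sent over the network.

```python
import json

# Assumed limits from the public API reference, not from the article.
SPEAKING_RATE_RANGE = (0.25, 4.0)      # 1.0 is the normal speed
PITCH_RANGE = (-20.0, 20.0)            # semitones relative to the default
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)   # decibels relative to the default


def build_synthesize_request(text, language_code="en-US", voice_name=None,
                             audio_encoding="MP3", pitch=0.0,
                             speaking_rate=1.0, volume_gain_db=0.0):
    """Return a dict shaped like a text:synthesize request body (hypothetical helper)."""
    checks = (
        ("speaking_rate", speaking_rate, SPEAKING_RATE_RANGE),
        ("pitch", pitch, PITCH_RANGE),
        ("volume_gain_db", volume_gain_db, VOLUME_GAIN_DB_RANGE),
    )
    for label, value, (low, high) in checks:
        if not low <= value <= high:
            raise ValueError(f"{label}={value} is outside [{low}, {high}]")

    voice = {"languageCode": language_code}
    if voice_name is not None:
        voice["name"] = voice_name  # omit to let the service pick a voice

    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": audio_encoding,  # "MP3", or "LINEAR16" for WAV-style PCM
            "pitch": pitch,
            "speakingRate": speaking_rate,
            "volumeGainDb": volume_gain_db,
        },
    }


if __name__ == "__main__":
    body = build_synthesize_request(
        "Your appointment is on 28 March at 3:45 pm.",
        speaking_rate=1.1, pitch=-2.0, volume_gain_db=3.0,
    )
    print(json.dumps(body, indent=2))
```

A body like this would be sent as an authenticated POST to the service; authentication and response handling are left out of the sketch.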

According to Google, "Cloud Text-to-Speech" can be used in a variety of ways: to power voice response systems (IVRs) for call centres and enable real-time natural language conversations, to let Internet of Things (IoT) devices talk back, and to convert text-based media into spoken format.

Google said that "Cloud Text-to-Speech" includes a selection of high-fidelity voices built using WaveNet -- a neural network trained with a large volume of speech samples that is able to create raw audio waveforms from scratch.

DeepMind introduced the first version of WaveNet in late 2016.

WaveNet synthesises more natural-sounding speech and, on average, produces speech audio that people prefer over other text-to-speech technologies.

During training, the network extracts the structure of the speech, including tones and what shape a realistic speech waveform should have.

When given text input, the trained WaveNet model generates the corresponding speech waveforms, one sample at a time, achieving higher accuracy than alternative approaches.
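The "one sample at a time" idea can be sketched with a toy loop in which each new sample is predicted from the samples already generated. This is not WaveNet: the predictor here is a fixed two-tap recurrence that simply continues a sine wave, standing in for the trained neural network, and the names `toy_next_sample` and `generate` and the 440 Hz test tone are assumptions made for the example.

```python
import math

SAMPLE_RATE = 24000   # samples per second, the figure quoted in the article
TONE_HZ = 440.0       # arbitrary test tone, an assumption for this sketch
OMEGA = 2 * math.pi * TONE_HZ / SAMPLE_RATE  # phase step per sample


def toy_next_sample(history):
    """Stand-in for the trained network: predict the next sample from the last two.

    The recurrence x[n] = 2*cos(w)*x[n-1] - x[n-2] continues a sine wave.
    A real model would run a deep network conditioned on the input text.
    """
    return 2 * math.cos(OMEGA) * history[-1] - history[-2]


def generate(num_samples, predict=toy_next_sample):
    """Build a waveform sample by sample, feeding each output back as input."""
    samples = [0.0, math.sin(OMEGA)]  # two seed samples to start the recurrence
    while len(samples) < num_samples:
        samples.append(predict(samples))
    return samples[:num_samples]


if __name__ == "__main__":
    waveform = generate(SAMPLE_RATE // 100)  # 10 ms of audio
    print(len(waveform), "samples generated")
```

The point of the sketch is the feedback loop: every sample depends on earlier output, which is why the speed of generating each sample matters so much for this kind of model.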

Today's improved WaveNet model generates raw waveforms 1,000 times faster than the original model and can generate one second of speech in just 50 milliseconds.

The model also has higher fidelity and is capable of creating waveforms with 24,000 samples a second.

"We have also increased the resolution of each sample from 8 bits to 16 bits, producing higher quality audio for a more human sound," Aharon added.
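The figures quoted above can be put side by side with some simple arithmetic. The sketch below uses only the numbers in the article (50 milliseconds per second of speech, a 1,000-fold speed-up, 24,000 samples a second, 8 and 16 bits per sample). The implied time for the original model assumes both speed figures describe the same workload, which the article does not state, and the byte rate is for raw, uncompressed samples.

```python
SAMPLE_RATE_HZ = 24_000               # samples per second
GENERATION_MS_PER_AUDIO_SECOND = 50   # time to generate one second of speech
SPEEDUP_OVER_ORIGINAL = 1_000         # "1,000 times faster than the original"
OLD_BITS, NEW_BITS = 8, 16            # bits per sample before and after


def real_time_factor(generation_ms, audio_ms=1000):
    """How many times faster than playback speed the audio is generated."""
    return audio_ms / generation_ms


def quantisation_levels(bits):
    """Number of distinct amplitude values a sample of this width can hold."""
    return 2 ** bits


def raw_bytes_per_second(sample_rate_hz, bits):
    """Size of one second of uncompressed single-channel audio."""
    return sample_rate_hz * bits // 8


if __name__ == "__main__":
    print("real-time factor:", real_time_factor(GENERATION_MS_PER_AUDIO_SECOND))
    # Assumption: the 1,000x figure applies to the same one-second workload.
    implied_original_s = GENERATION_MS_PER_AUDIO_SECOND * SPEEDUP_OVER_ORIGINAL / 1000
    print("implied original time per audio second (s):", implied_original_s)
    per_sample_us = GENERATION_MS_PER_AUDIO_SECOND * 1000 / SAMPLE_RATE_HZ
    print("average time per sample (microseconds):", round(per_sample_us, 2))
    print("levels at 8 bits:", quantisation_levels(OLD_BITS))
    print("levels at 16 bits:", quantisation_levels(NEW_BITS))
    print("raw bytes per second at 16 bits:", raw_bytes_per_second(SAMPLE_RATE_HZ, NEW_BITS))
```

On these numbers, speech is generated about 20 times faster than it plays back, and moving from 8 to 16 bits raises the number of amplitude levels per sample from 256 to 65,536.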

With these adjustments, the latest WaveNet model produces more natural-sounding speech, and people have given the new US English WaveNet voices a mean opinion score (MOS) of 4.1 on a scale of one to five.
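A mean opinion score is simply the average of listener ratings on the one-to-five scale. The sketch below shows the calculation; the ratings are made-up illustrative values, not Google's test data, and `mean_opinion_score` is a hypothetical helper name.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings, each on a scale of 1 to 5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return mean(ratings)


if __name__ == "__main__":
    # Illustrative ratings only; these are not figures from the article.
    example_ratings = [5, 4, 4, 3, 5]
    print("MOS:", mean_opinion_score(example_ratings))
```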

--IANS

sku/nks/bg

Disclaimer: No Business Standard Journalist was involved in creation of this content

First Published: Mar 28 2018 | 3:40 PM IST
