What if your system could convert text to speech the way a real human speaks? If this still reminds you of the legacy speech conversion software that was doing the rounds decades ago, then you are in for a surprise.
Tacotron 2, Google’s AI-based text-to-speech system, does exactly this by creating a spectrogram of the text, a visual representation of how the speech should sound, and then generating audio from it that mimics human speech. According to Google researchers, the model achieves a mean opinion score (MOS) of 4.53, which is comparable to a score of 4.58 for professionally recorded speech.
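To make the idea of a spectrogram more concrete, here is a minimal sketch in plain Python. It is not Tacotron 2's code and it works in the opposite direction: rather than predicting a spectrogram from text, it takes a short synthetic tone and measures how strong each frequency is over time. The function names (`hann`, `magnitude_spectrum`, `spectrogram`), the 8 kHz sample rate, the frame sizes and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this illustration)
FRAME_SIZE = 256     # samples analysed per time slice
HOP = 128            # step between the starts of consecutive slices


def hann(n):
    """Hann window: tapers a frame's edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..n/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_size=FRAME_SIZE, hop=HOP):
    """Rows are time slices, columns are frequency bins, values are magnitudes."""
    window = hann(frame_size)
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        rows.append(magnitude_spectrum(frame))
    return rows


# A quarter of a second of a pure 440 Hz tone stands in for recorded speech.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(2000)]
spec = spectrogram(tone)

# The loudest bin in the first time slice, converted back to hertz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(f"{len(spec)} time slices, strongest frequency near {peak_hz} Hz")
```

Plotting `spec` as an image, with time on one axis and frequency on the other, gives the familiar spectrogram picture. Real speech would show many frequencies shifting over time rather than a single steady band.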
CEO Sundar Pichai announced the company’s shift from a mobile-first to an AI-first focus at the Google I/O conference in 2017, and this product is part of the upcoming line-up. The AI is at a stage where it can detect the contextual difference between the noun ‘desert’ and the verb ‘desert’ and alter its pronunciation accordingly. It can also place emphasis on certain words and apply inflexions to differentiate between questions and statements.
This is a step forward in human-computer interaction, one that could reduce the need for traditional input devices. Does it remind you of the movie ‘Her’?