The Morning After: Microsoft’s VALL-E AI can replicate a voice from a three-second sample

by Shivendra Singh · January 11, 2023

Microsoft’s latest research in text-to-speech AI centers on a new model, VALL-E. Several services can already create copies of your voice, but they usually require much longer recordings. Microsoft claims its model can simulate someone’s voice from just a three-second audio sample. The generated speech can match the speaker’s timbre and emotional tone, and even the acoustics of the room. It could one day be used for customized or high-end text-to-speech applications, but, as with deepfakes, it carries risks of misuse.

Researchers trained VALL-E on 60,000 hours of English-language speech from more than 7,000 speakers in Meta’s Libri-Light audio library. The results aren’t perfect: some samples sound tinny and machine-like, while others are surprisingly realistic.

Microsoft isn’t open-sourcing the code, possibly because of those inherent risks. In the paper, the company said: “Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating.”

We’ve all seen the 1992 movie Sneakers, right? Right?!

– Mat Smith
