Microsoft has introduced a new AI model, named VALL-E, that it says can replicate any voice from a 3-second sample. The model comes out of the company's research on text-to-speech AI.
With VALL-E, Microsoft is taking another step into the AI world. Part of the company's latest text-to-speech research, the model can copy a voice even when the sample is only 3 seconds long.
Can copy voice from a 3-second sample
Several services can already create copies of your voice, but they usually require a fair amount of input. Microsoft claims that its model can mimic someone's voice from an audio sample just three seconds long.
Information found in the report
A report by Windows Central revealed that the AI tool was trained on 60,000 hours of English speech data, along with short clips of specific voices, to improve its output. The report also states that while some of the generated recordings sound natural, others sound robotic. If the model is supplied with a larger sample set, VALL-E may be able to create more realistic results.
What are the potential risks?
VALL-E could have many positive uses, including in the production industry, but it also carries risks. People could use VALL-E to make spam calls that defraud unsuspecting people. The voices of politicians or other socially influential people could also be copied. The model also poses a threat wherever a voice is used as a password.