How do you assert that the cloned voice has been truly permitted by the voice owner? I've had my voice cloned without my consent by other people using Descript and Eleven Labs.
I'd be curious what the false positive rate on that is. Could you clone anyone's voice by collecting a set of ten voices with similar timbre reading the required statement, plus pitch control, to get close enough? A hundred? Or could you trick the neural net by feeding it something that sounds like white noise to humans until the NN triggers in the right way and goes "ok, yep, that's a match, you're authorised now"?
Probably not something we'll get to hear as part of the PR pitch.
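To make the brute-force idea above concrete, here's a toy sketch of a black-box attack against a speaker-verification check. Everything here is hypothetical: the "model" is a stand-in similarity scorer over a made-up 4-dimensional voice embedding, not any real service's verifier. The point is just that if an attacker can query accept/reject (or a score), random hill climbing from noise can eventually cross the acceptance threshold without ever sounding like the target to a human.

```python
import random

# Hypothetical hidden "target voice" embedding the verifier compares against.
# A real attacker would never see this; they'd only see the accept/reject
# (or score) responses from the service's API.
TARGET = [0.2, -0.7, 0.5, 0.9]
THRESHOLD = -0.01  # verifier "accepts" any sample scoring above this


def match_score(sample):
    """Stand-in for the verifier: negative squared distance to the target
    embedding, so higher means a closer match."""
    return -sum((s - t) ** 2 for s, t in zip(sample, TARGET))


def brute_force(dims=4, steps=20000, step_size=0.05, seed=0):
    """Black-box hill climb: start from random noise, keep any random tweak
    that raises the score, stop once the verifier would accept."""
    rng = random.Random(seed)
    sample = [rng.uniform(-1.0, 1.0) for _ in range(dims)]
    best = match_score(sample)
    for _ in range(steps):
        i = rng.randrange(dims)
        candidate = sample[:]
        candidate[i] += rng.uniform(-step_size, step_size)
        score = match_score(candidate)
        if score > best:
            sample, best = candidate, score
        if best > THRESHOLD:
            break  # verifier says "that's a match"
    return sample, best


if __name__ == "__main__":
    _, score = brute_force()
    print(score > THRESHOLD)
```

Real audio is far higher-dimensional and real verifiers are rate-limited, so this is only an illustration of why a raw score or unlimited retries leaks information an attacker can climb on.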
Or is the consent statement itself the thing that gets cloned, with no separate training audio? Then it might actually work: you'd have to get close enough that the human you're trying to fool can't tell the difference anymore, which defeats the need for this tech in the first place (at least in targeted rather than automated cases).
Yeah, good point - don't know. When I tried I actually did get a (personal?) email saying that it didn't match closely enough. After uploading another sample (based on a different text) it went through.
I like your idea of just training on the consent text! That wasn't the case when I tried it, as you needed around 3 hours (optimally) of training data.
With a couple of soundalike voices and changing the pitch in Audacity? That's a far, far cry from cutting-edge neural networks that clone voices from samples of less than half a minute.
If you mean the white noise, I meant that as a brute-force attack: to do it in a more targeted way (to know in advance what the system will accept as sounding like your target voice), you'd likely need their exact model rather than training your own.
It's mentioned in the second demo video that they have a strict process to prevent cases like yours. I think Descript started asking for identity verification after its service was abused. This one probably has a similar process too.
What is your process for verifying consent?