That's a real problem with the vast majority of current TTS. Terrible in things ...

Uehreka · on Dec 13, 2024

This is why it pisses me off when a techy person makes a YouTube video and just uses TTS instead of recording a voiceover. I know some people don’t have a good recording situation, but I get the sense that a lot of people just do it because they either think people can’t tell or they think it’s a clever hack or “the way of the future” or something.

It isn’t. Instead I find myself watching videos and getting a weird creepy feeling when I suddenly hear the voiceover mispronounce a word or put an emphasis in the wrong place. Part of it is the uncanny valley for sure, but the more pernicious thing is this: once I realize that the voice is AI-generated, I start to worry that the script might be too. Now I’m trying to figure out “is this guy just an amateur writer taking a while to get to his point, or is this an LLM-authored script that is never going to go beyond surface-level statements about the topic?”

layer8 · on Dec 13, 2024

I don’t think this is about a “good recording situation”. It’s likely people who think they suck at speaking/narrating or think they have a horrible accent or want to remain anonymous or just find the process annoying, and find it less embarrassing/more privacy-preserving/less of a hassle to use an AI voice.

drtgh · on Dec 14, 2024

Those are the most likely.

Another factor, less common, is when you want or have to speak a non-native language you're not used to pronouncing, in which case you're usually afraid of not being understood.

PS: I think all the text-to-speech systems sound horrible, and the latest generations are even irritating, as the parent commenter said.

Cpoll · on Dec 13, 2024

I think this undersells the difficulty of recording a good voice-over, both technically and performatively.

sandworm101 · on Dec 13, 2024

And on the high end, making everyone sound like a professional public speaker. The machine sees mistakes as errors when in fact every non-Hollywood speech contains multiple mistakes.