
Impressive! I guess the speech synthesis quality is the best available in open source at the moment?

The endgame of this is surely a continuously running wave-to-wave model with no text tokens at all? Or at least none in the main path.



This is Coqui XTTS v2, because it can be tuned to deliver the first token in under 100 ms. It currently gives the best balance between quality and speed, imho. If it's only about quality, I'd say there are better models out there.



