IshKebab 8 months ago | on: Show HN: Real-time AI Voice Chat at ~500ms Latency

Impressive! I guess the speech synthesis quality is the best available in open source at the moment? The endgame of this is surely a continuously running wave-to-wave model with no text tokens at all? Or at least none in the main path.

koljab 8 months ago

This is Coqui XTTSv2, because it can be tuned to deliver the first token in under 100 ms. It currently gives the best balance between quality and speed, imho. If it's only about quality, I'd say there are better models out there.
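
For anyone who wants to check a "first output in under 100 ms" figure on their own setup, here is a rough sketch of how time-to-first-chunk could be measured for any streaming synthesizer. It does not use the XTTS API: `fake_streaming_tts` is a hypothetical stand-in that just sleeps and yields bytes, and `time_to_first_chunk` is an illustrative helper name, not something from the project.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, Iterator[bytes]]:
    """Return (seconds until the first chunk arrived, iterator over all chunks)."""
    start = time.perf_counter()
    it = iter(chunks)
    try:
        first = next(it)  # blocks until the stream yields its first chunk
    except StopIteration:
        raise ValueError("stream produced no chunks") from None
    latency = time.perf_counter() - start

    def replay() -> Iterator[bytes]:
        # Hand the first chunk back so the caller still sees the full stream.
        yield first
        yield from it

    return latency, replay()


def fake_streaming_tts(text: str, first_delay: float = 0.05,
                       chunk_delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS engine (assumption, not XTTS)."""
    time.sleep(first_delay)  # simulated model warm-up before the first chunk
    for word in text.split():
        yield word.encode()  # a real engine would yield audio samples here
        time.sleep(chunk_delay)


if __name__ == "__main__":
    latency, stream = time_to_first_chunk(fake_streaming_tts("hello streaming world"))
    print(f"first chunk after {latency * 1000:.1f} ms")
    for chunk in stream:
        pass  # in a real pipeline, each chunk would go to the audio output
```

To measure a real engine, swap `fake_streaming_tts` for whatever generator your TTS library exposes; the helper only assumes an iterable that yields chunks as they are produced.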
