There are projects that will run Whisper or another transcription service locally on your computer, with great quality. For whatever reason, Google chooses not to use its highest-quality transcription models on YouTube, perhaps due to cost.
I use Whisper running locally for automated transcription of many hours of audio on a daily basis.
For the most part, Whisper does much better than tools I've tried in the past, such as Vosk. That said, it makes a somewhat annoying class of error that I never really experienced with the others.
When the audio is low quality for a moment, it might misinterpret a word. That's fine; any speech-recognition system will do that. The problem with Whisper is that the misinterpreted word can affect the next word, or several words: it tries to align the next bits of audio syntactically with the mistaken word.
With older systems, you'd get a nonsense word where the noise was, but the rest of the transcription would be unaffected. With Whisper, you may get a series of words that completely diverges from the audio. I can look at the start of the divergence and recognize the phonetic similarity that created the initial error, but the following words may not be phonetically close to the audio at all.
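To make the mechanism concrete, here's a toy sketch (not Whisper's actual decoder) of how an autoregressive decoder that mixes acoustic evidence with a language model can propagate one misheard word. All the words and scores below are made up purely for illustration:

```python
# Hypothetical bigram language model: probability of word2 following word1.
# Chosen so that "sail" and "sale" favor very different continuations.
BIGRAM = {
    ("sail", "the"): 0.9, ("sail", "on"): 0.1,
    ("sale", "the"): 0.1, ("sale", "on"): 0.9,
    ("the", "boat"): 0.9, ("the", "shoes"): 0.1,
    ("on", "boat"): 0.1, ("on", "shoes"): 0.9,
}

def decode(first_word, frames):
    """Greedy decode: at each step pick the word maximizing
    acoustic_score * P(word | previous word)."""
    out = [first_word]
    for frame in frames:  # frame: dict of word -> acoustic score
        prev = out[-1]
        out.append(max(frame, key=lambda w: frame[w] * BIGRAM.get((prev, w), 1e-6)))
    return out

# The audio after the first word is IDENTICAL in both runs.
frames = [{"the": 0.5, "on": 0.45}, {"boat": 0.5, "shoes": 0.45}]

print(decode("sail", frames))  # clean first word -> ['sail', 'the', 'boat']
print(decode("sale", frames))  # one misheard word -> ['sale', 'on', 'shoes']
```

One phonetically plausible error ("sale" for "sail") flips every later choice, even though the acoustic evidence for those frames never changed. Relatedly, openai-whisper exposes a `condition_on_previous_text` option that, when disabled, reportedly reduces this kind of runaway divergence at some cost to cross-segment consistency.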
Ah yes, one of the standard replies whenever anyone mentions a way that an AI thing fails: "You're still using [X]? Well of course, that's not state of the art, you should be using [Y]."
You don't actually state whether you believe Parakeet is susceptible to the same class of mistakes...
It's an extremely common goalpost-moving pattern on HN, and without addressing how or whether the outcome would actually be better, it adds little to the conversation.
Try it, or don't. Due to the nature of generative AI, what might be an issue for me might not be an issue for you, especially if we have differing use cases, so no one can give you the answer you seek except for yourself.