That's been done for years already. OpenAI was training on bulk AI-transcribed YouTube videos back in the GPT-4 era. Modern models are all multimodal and co-trained on audio and image tokens alongside text.

The AI companies have not only run out of such data; their access to it is also shrinking as the people who control the hosting sites (YouTube, for example) wall them off.