Fine-tuning text-to-image/video models, perhaps?

For the newest models, unless you quantize the crap out of them, you’re going to be swapping blocks even with a 5090, which slows things down anyway. At least you’d be able to train on them at full precision with a decent batch size.

That said, I can’t imagine there’s enough of a market there to make it worth it.
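
For a rough sense of why block swapping comes up, here is a back-of-envelope sketch of the memory math. It is an estimate, not a measurement: the 12B parameter count is a made-up example size rather than any specific model, the optimizer cost assumes an Adam-style setup with two fp32 moments per parameter, activations are ignored entirely, and the 5090's nominal 32 GB is treated as 32 GiB for simplicity.

```python
GIB = 1024 ** 3


def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / GIB


def full_finetune_gib(n_params: float,
                      weight_bytes: float = 4,
                      grad_bytes: float = 4,
                      optimizer_bytes: float = 8) -> float:
    """Weights + gradients + optimizer state, in GiB.

    Assumption: Adam-style optimizer keeping two fp32 moments per
    parameter (8 bytes). Activations are NOT included, so the real
    requirement is higher and grows with batch size.
    """
    per_param = weight_bytes + grad_bytes + optimizer_bytes
    return n_params * per_param / GIB


N_PARAMS = 12e9   # hypothetical model size, for illustration only
VRAM_GIB = 32     # RTX 5090, nominal 32 GB treated as 32 GiB

if __name__ == "__main__":
    for label, nbytes in [("fp32", 4), ("bf16", 2), ("4-bit", 0.5)]:
        need = weights_gib(N_PARAMS, nbytes)
        verdict = "fits" if need < VRAM_GIB else "does not fit"
        print(f"{label:>5} weights: {need:6.1f} GiB ({verdict} in {VRAM_GIB} GiB)")

    total = full_finetune_gib(N_PARAMS)
    print(f"fp32 full fine-tune, before activations: {total:6.1f} GiB")
```

Under those assumptions, the fp32 weights alone already exceed 32 GiB, and only the heavily quantized copy leaves real headroom for everything else training needs.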