Founder of Replicate here. Yeah, our cold boots suck. Here's what we're doing: -...

reissbaker · on Feb 17, 2024

The warm boot numbers for Replicate are also a bit concerning, though. I know that you're contesting the 800ms latency, and saying that a similar model you tested is 200ms — but that's still 30% slower than Fly (155ms). Even if you fix the cold boot problem, it looks like you're still trailing Fly by quite a bit.

I feel like it would be worth a deep dive with your team on what's happening and maybe writing a blog post on what you found?

Also, I'll gently point out that Fly not having to pull Docker images on "cold" boot isn't something your customers think much about, since a stopped Fly machine doesn't accrue additional cost (other than a few cents a month for rootfs storage). If it's roughly the same price, and roughly the same level of effort, and ends up performing the same function for the customer (inference), whether or not it's doing Docker image pulls behind the scenes doesn't matter so much to most customers. Maybe it's worth adding a pricing tier to Replicate that charges a small amount for storage even for unused models, and results in much better cold boot time for those models since you can skip the Docker image pull — or in the future, model file download — and just attach a storage device?

(I know you're also selling the infinitely autoscaling cluster, but I think for a lot of people the tradeoff between finite-autoscaling vs extremely long cold boot times is not going to be in favor of the long cold boots — so paying a small fee for a block storage tier that can be attached quickly for autoscaling up to N instances would probably make a lot of sense, even if scaling to N+1 instances is slow again and/or requires clicking a button or running a CLI command.)

tptacek · on Feb 18, 2024

For what it's worth: creating and stopping/starting Fly Machines is the whole point of the API. If you're on-demand creating new Machines, rather than allocating AOT and then starting/stopping them JIT, you're holding it wrong. :)

(There's a lot I can say about why I think a benchmark like this is showing us unusually well! I'm not trying to argue that people should take this benchmark too seriously.)

brianjking · on Feb 18, 2024

How do you see the Replicate vs Modal.com overlap?