Replicate has really long boot times for custom models: 2-3 minutes if you are lucky, and up to 30 minutes if they are having problems.
While we loved the dev experience, we just couldn't make it work with frequently switching models / LoRA weights.
We switched to Beam (https://www.beam.cloud) and it's so much better. Their cold start times are consistently short, and they provide a caching layer for model files (i.e., volumes), which makes switching between models a breeze.
Beam also has a much better pricing policy. For custom models on Replicate you pay for boot time (which is very long!), so you end up paying a lot for a single request.
With Beam you only pay for inference and idle time.
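To put rough numbers on the pricing point, here is a back-of-the-envelope sketch. The per-second rate and the inference time are made-up assumptions, not either provider's real pricing; only the boot time is taken from the 2-3 minute figure above.

```python
def cost_per_request(rate_per_second: float, boot_s: float,
                     inference_s: float, bill_boot: bool) -> float:
    """Cost of one cold request, optionally billing the boot time too."""
    billed_seconds = inference_s + (boot_s if bill_boot else 0.0)
    return rate_per_second * billed_seconds

# All three values are illustrative assumptions.
RATE = 0.001        # hypothetical GPU price in $ per second
BOOT_S = 150.0      # 2.5 minutes, the middle of the 2-3 minute range
INFERENCE_S = 5.0   # hypothetical time for one prediction

with_boot = cost_per_request(RATE, BOOT_S, INFERENCE_S, bill_boot=True)
without_boot = cost_per_request(RATE, BOOT_S, INFERENCE_S, bill_boot=False)

print(f"boot billed:     ${with_boot:.3f} per cold request")
print(f"boot not billed: ${without_boot:.3f} per cold request")
print(f"ratio:           {with_boot / without_boot:.0f}x")
```

Under these assumptions the boot time dominates the bill for any request that hits a cold container, which is exactly the case when you switch models often.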
Founder of Replicate here. Our cold boots do suck (see my other comment), but you aren't charged for the boot time on Replicate, just the time that your `setup()` function runs.
Incentives are aligned for us to make it better. :)
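For anyone unsure what "the time that your `setup()` function runs" means in practice, here is a rough sketch of the split. It is plain Python, not Replicate's actual API: the `Predictor` class, its fake weights, and the `timed` helper are all hypothetical, and the sleep just stands in for loading model files.

```python
import time

class Predictor:
    """Hypothetical stand-in for a custom model (not Replicate's real API)."""

    def setup(self) -> None:
        # One-time work when the container starts, e.g. loading weights.
        time.sleep(0.05)  # placeholder for reading model files from disk
        self.weights = [0.5, 1.5]

    def predict(self, x: float) -> float:
        # Per-request work.
        return sum(w * x for w in self.weights)

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

predictor = Predictor()
_, setup_seconds = timed(predictor.setup)
output, predict_seconds = timed(predictor.predict, 2.0)

print(f"setup():   {setup_seconds:.3f}s")
print(f"predict(): {predict_seconds:.6f}s")
```

Whatever happens before `setup()` is called (scheduling, pulling the image, starting the container) is not visible to this code at all, and that is the part of the boot described above as not charged.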
I was not aware of that. You should probably change the docs to better explain what you are charged for. Right now they say you do get charged for boot time:
“[…] Unlike public models, you’ll pay for boot and idle time in addition to the time it spends processing your requests.”
Apart from boot times, we actually find Replicate to be an amazing platform. Congrats!