More

bfirsh · on April 18, 2024

We've got an API out here: https://replicate.com/blog/run-llama-3-with-an-api

You can also chat with it here: https://llama3.replicate.dev/

simonw · on April 18, 2024

The pet names example is my pelican test prompt!

bfirsh · on Feb 17, 2024

Founder of Replicate here. Our cold boots do suck (see my other comment), but you aren't charged for the boot time on Replicate, just the time that your `setup()` function runs.

Incentives are aligned for us to make it better. :)

moscicky · on Feb 17, 2024

Was not aware of that that. You should probably change the docs to better explain what you are charged for. Right now it says you do get charged for boot time:

“[…] Unlike public models, you’ll pay for boot and idle time in addition to the time it spends processing your requests.”

Apart from boot times, we actually find replicate to be an amazing platform, congrats

bfirsh · on Feb 17, 2024

Oops you're absolutely right. "Boot" ambiguously meant "setup" there. Fix deploying!

bfirsh · on Feb 17, 2024

Founder of Replicate here. Yeah, our cold boots suck.

Here's what we're doing:

- Fine-tuned models now boot fast: https://replicate.com/blog/fine-tune-cold-boots

- You can keep models switched on to avoid cold boots: https://replicate.com/docs/deployments

- We've optimized how weights are loaded into GPU memory for some of the models we maintain, and we're going to open this up to all custom models soon.

- We're going to be distributing images as individual files rather than as image layers, which makes pulling images much more efficient.

Although our cold boots do suck, the comparison in this blog post is comparing apples to oranges because Fly machines are much lower level than Replicate models. It is more like a warm boot.

It seems to be using a stopped Fly machine, which has already pulled the Docker image onto a node. When it starts, all it's doing is starting the Docker container. Creating the Fly machine or scaling it up would take much longer.

On Replicate, the models auto-scale on a cluster. The model could be running anywhere in our cluster so we have to pull the image to that node when it starts.

Something funny seems to be going on with the latency too. Our round-trip latency is about 200ms for a similar model. Would be curious to see the methodology, or maybe something was broken on our end.

But we do acknowledge the problem. It's going to get better soon.

reissbaker · on Feb 17, 2024

The warm boot numbers for Replicate are also a bit concerning, though. I know that you're contesting the 800ms latency, and saying that a similar model you tested is 200ms — but that's still 30% slower than Fly (155ms). Even if you fix the cold boot problem, it looks like you're still trailing Fly by quite a bit.

I feel like it would be worth a deep dive with your team on what's happening and maybe writing a blog post on what you found?

Also, I'll gently point out that Fly not having to pull Docker images on "cold" boot isn't something your customers think much about, since a stopped Fly machine doesn't accrue additional cost (other than a few cents a month for rootfs storage). If it's roughly the same price, and roughly the same level of effort, and ends up performing the same function for the customer (inference), whether or not it's doing Docker image pulls behind the scenes doesn't matter so much to most customers. Maybe it's worth adding a pricing tier to Replicate that charges a small amount for storage even for unused models, and results in much better cold boot time for those models since you can skip the Docker image pull — or in the future, model file download — and just attach a storage device?

(I know you're also selling the infinitely autoscaling cluster, but I think for a lot of people the tradeoff between finite-autoscaling vs extremely long cold boot times is not going to be in favor of the long cold boots — so paying a small fee for a block storage tier that can be attached quickly for autoscaling up to N instances would probably make a lot of sense, even if scaling to N+1 instances is slow again and/or requires clicking a button or running a CLI command.)

tptacek · on Feb 18, 2024

For what it's worth: creating and stopping/starting Fly Machines is the whole point of the API. If you're on-demand creating new Machines, rather than allocating AOT and then starting/stopping them JIT, you're holding it wrong. :)

(There's a lot I can say about why I think a benchmark like this is showing us unusually well! I'm not trying to argue that people should take this benchmark too seriously.)

brianjking · on Feb 18, 2024

How do you see the Replicate vs Modal.com overlap?

bfirsh · on Jan 30, 2024

The exact same thing happened to a phpBB bass guitar forum I was a member of about 15 years ago. The owner just disappeared and we knew the bills weren’t being paid.

Hypthetically, I found a remote code execution vulnerability in that version of phpBB, read the configuration file to get the MySQL details, and then used mysqldump to download the database.

You can find exploits by just Googling or looking in Metasploit. They’re usually pretty simple query string things.

We set up a clone of the forum with a similar name just in time, then emailed the user database to tell them about the new site when the old one disappeared.

Sadly this caught the previous owner’s attention and he sent a cease and desist, despite his version of the site not existing any longer. So, we wiped the database, and then all the users just signed up again. It still lives on as https://www.basschat.co.uk/

Hypothetically.

Modified3019 · on Jan 30, 2024

Amazing story. I hypothetically consider you a hero.

>Sadly this caught the previous owner’s attention and he sent a cease and desist, despite his version of the site not existing any longer.

Wow, what an ass. I just can't understand the thought process of some people.

gexla · on Jan 30, 2024

Even though I didn't keep the software maintained, I wouldn't be happy that sensitive member information (email addresses, login info) was leaked (unknown destination) by the community. I would feel some responsibility for that information getting out, and at least feel the need to reach out to past members and anyone using that data. If I wanted to be an "ass" then I would have taken harsher legal routes.

I would just scrape the site. I would even consider anonymizing names, as everyone should have the right to delete old content IMO.

After scraping, I would then watch the analytics. Anything which seems popular could then be given more attention. For example, I would probably create a design for the site itself, then create dedicated pages for anything popular. The pages could be curated "this is what you came for" and a link to the original pages.

Forums probably don't come back. You could start out with some minimal community features such as comments to see if anyone is willing to bite. A Discord server or similar could be another option. Kind of depends on the demographic I guess. Old people like forums. ;)

bfirsh · on Nov 14, 2023

Founder of Replicate here. We open pull requests on models[0] to get them running on Replicate so people can try out a demo of the model and run them with an API. They're also packaged with Cog[1] so you can run them as a Docker image.

Somebody happened to stumble across our fork of the model and submitted it. We didn't submit it nor intend for it to be an ad. I hope the submission gets replaced with the upstream repo so the author gets full credit. :)

[0] https://github.com/Vaibhavs10/insanely-fast-whisper/pull/42

[1] https://github.com/replicate/cog

idonotknowwhy · on Nov 14, 2023

I'm curious, How did you know about this thread here? I've seen this happen where a blog or site is mentioned and the author shows up. It's there software to monitor when you're mentioned on HN or did you just happen to browse it?

bfirsh · on Nov 14, 2023

I read Hacker News a lot...

arvidkahl · on Nov 14, 2023

You might find https://syften.com/ interesting. I use it for monitoring Reddit and all kinds of communities for mentions of my name and the titles of my books.

bfirsh · on Oct 2, 2023

Replicate (YC W20) | San Francisco, CA + Remote | https://replicate.com/

Replicate makes it easy to run machine learning models in the cloud. You can run a big library of open source machine learning models with a few lines of code, or deploy your own models at scale.

We're an experienced team from Spotify, Docker, GitHub, Heroku, Apple, and various other places. We're backed by a16z, Sequoia, Andrej Karpathy, Dylan Field, Guillermo Rauch.

We're hiring for:

- An infrastructure engineer who is an expert at deploying ML systems

- An engineer who is good at humans to look after our customers

- An engineering manager based in SF

... and more: https://replicate.com/about#join-us

Email us: jobs@replicate.com

flatline · on Oct 3, 2023

Do you really have an engineering manager role open? I may be interested, but it is not listed on that page.

bfirsh · on Sept 12, 2023

Founder of Replicate here. It's early indeed.

OpenAI aren't doing anything magic. We're optimizing Llama inference at the moment and it looks like we'll be able to roughly match GPT 3.5's price for Llama 2 70B.

Running a fine-tuned GPT-3.5 is surprisingly expensive. That's where using Llama makes a ton of sense. Once we’ve optimized inference, it’ll be much cheaper to run a fine-tuned Llama.

yixu34 · on Sept 13, 2023

We're working on LLM Engine (https://llm-engine.scale.com) at Scale, which is our open source, self-hostable framework for open source LLM inference and fine-tuning. We have similar findings to Replicate: Llama 2 70B can be comparable to GPT 3.5 price, etc. Would be great to discuss this further!

Dowwie · on Sept 13, 2023

How heavy of a lift is it to optimize inference?

bfirsh · on July 3, 2023

Replicate (YC W20) | Berkeley, CA + Remote | https://replicate.com/

Replicate makes it easy to run machine learning models in the cloud. You can run a big library of open source machine learning models with a few lines of code, or deploy your own models at scale.

We're an experienced team from Spotify, Docker, GitHub, Heroku, Apple, and various other places. We're backed by a16z, Sequoia, Andrej Karpathy, Dylan Field, Guillermo Rauch.

We're hiring:

- ML engineers

- An engineer who is good at humans to look after our customers

- Hackers to build cool things with machine learning to show you how to use Replicate

... and more: https://replicate.com/about#join-us

Email us: jobs@replicate.com

bfirsh · on June 1, 2023

Replicate (YC W20) | Berkeley, CA + Remote | https://replicate.com/ Replicate makes it easy to run machine learning models in the cloud. You can run a big library of open source machine learning models with a few lines of code, or deploy your own models at scale.

We're an experienced team from Spotify, Docker, GitHub, Heroku, Apple, and various other places. We're backed by a16z, Sequoia, Andrej Karpathy, Dylan Field, Guillermo Rauch.

We're hiring for:

- A generalist engineer who has a refined sense of what makes a good developer tool

- An engineer who is good at humans to look after our customers

- Hackers to build cool things with machine learning to show you how to use Replicate

... and more: https://replicate.com/about#join-us

Email us: jobs@replicate.com

bfirsh · on May 2, 2023

Founder of https://replicate.com/ here, which has been mentioned a few times. Happy to help you get set up. :) ben@replicate.com