
Oh nice, 65B! I was planning to try it out sometime but have been waiting for various repos to get their issues sorted out and I'm much less interested in smaller models. Are you using GPUs or CPU? Any tips on what to use? What's the RAM usage? Performance? How's the quality looking?


I'm running LLaMA-65B on an a2-ultragpu-1g instance on GCP with 1x NVIDIA A100 80GB, using this UI: https://github.com/oobabooga/text-generation-webui
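As a side note (my own back-of-the-envelope math, not from the parent comment): 65B weights in fp16 would exceed 80 GB on their own, so the model is presumably loaded quantized (8-bit or 4-bit) to fit on a single A100 80GB:

```python
# Rough VRAM math for fitting LLaMA-65B on one 80 GB A100.
# Illustrative only; real usage adds activation and KV-cache overhead.

PARAMS = 65e9   # 65B parameters
GIB = 1024**3

fp16_gib = PARAMS * 2.0 / GIB   # 2 bytes per weight
int8_gib = PARAMS * 1.0 / GIB   # 1 byte per weight
int4_gib = PARAMS * 0.5 / GIB   # 0.5 bytes per weight

print(f"fp16: {fp16_gib:.0f} GiB")  # ~121 GiB, does not fit in 80 GB
print(f"int8: {int8_gib:.0f} GiB")  # ~61 GiB, fits
print(f"int4: {int4_gib:.0f} GiB")  # ~30 GiB, fits with headroom
```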

The good thing about this UI is that it supports both completion and chat mode (and it's super easy to install).
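The webui can also expose an HTTP API (launched with its `--api` flag). A rough sketch of what a completion request payload might look like — the field names and endpoint below are my assumptions from memory of the project's older `/api/v1/generate` endpoint, so check the repo's current docs before relying on them:

```python
import json

def build_completion_request(prompt: str, max_new_tokens: int = 200,
                             temperature: float = 0.7) -> dict:
    """Build a JSON payload for a text-generation-webui completion call.

    Field names are assumptions based on the project's historical
    /api/v1/generate API; verify against the repo's documentation.
    """
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "do_sample": True,
    }

payload = build_completion_request("The capital of France is")
print(json.dumps(payload))
# You would POST this to http://<host>:5000/api/v1/generate
# (host, port, and path are placeholders here).
```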

I'm using a preemptible instance to save costs. Because the instance has a local SSD, you can't stop it from the console UI (only delete it), but there's a trick if you do it from Cloud Shell:

gcloud compute instances stop <INSTANCE_NAME> --discard-local-ssd
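For reference, here's a sketch of how such an instance might be created from Cloud Shell. The instance name, zone, image, and disk size are placeholders of mine, not from the parent comment; the a2-ultragpu-1g machine type comes with the A100 80GB attached:

```shell
# Hypothetical example: create a preemptible a2-ultragpu-1g instance with
# a local SSD. All names/values are placeholders; check
# `gcloud compute instances create --help` for the current flags.
gcloud compute instances create llama-65b \
  --zone=us-central1-a \
  --machine-type=a2-ultragpu-1g \
  --preemptible \
  --local-ssd=interface=NVME \
  --boot-disk-size=200GB
```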

It's usable, though a bit slow; it's more for playing around and exploring the model than for anything serious.

To answer your questions: from what I see, it's not as good as GPT-4 but much, much better than Google Bard, so somewhere between the two. (As a reference point, from my testing even LLaMA-7B is way better than Bard.)

The main drawback of GPT-4 is its censorship and enforced political views.




