
Oh nice, 65B! I was planning to try it out sometime but have been waiting for various repos to get their issues sorted out and I'm much less interested in smaller models. Are you using GPUs or CPU? Any tips on what to use? What's the RAM usage? Performance? How's the quality looking?


I'm running LLaMA-65B on an a2-ultragpu-1g instance on GCP with 1x NVIDIA A100 80GB, using this UI: https://github.com/oobabooga/text-generation-webui
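As a side note (my own back-of-the-envelope math, not from the parent comment): 65B weights in fp16 would exceed 80 GB on their own, so the model is presumably loaded quantized (8-bit or 4-bit) to fit on a single A100 80GB:

```python
# Rough VRAM math for fitting LLaMA-65B on one 80 GB A100.
# Illustrative only; real usage adds activation and KV-cache overhead.

PARAMS = 65e9   # 65B parameters
GIB = 1024**3

fp16_gib = PARAMS * 2.0 / GIB   # 2 bytes per weight
int8_gib = PARAMS * 1.0 / GIB   # 1 byte per weight
int4_gib = PARAMS * 0.5 / GIB   # 0.5 bytes per weight

print(f"fp16: {fp16_gib:.0f} GiB")  # ~121 GiB, does not fit in 80 GB
print(f"int8: {int8_gib:.0f} GiB")  # ~61 GiB, fits
print(f"int4: {int4_gib:.0f} GiB")  # ~30 GiB, fits with headroom
```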

The good thing about this UI is that it supports both completion and chat mode (and it's super easy to install).
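The webui can also expose an HTTP API (launched with its `--api` flag). A rough sketch of what a completion request payload might look like — the field names and endpoint below are my assumptions from memory of the project's older `/api/v1/generate` endpoint, so check the repo's current docs before relying on them:

```python
import json

def build_completion_request(prompt: str, max_new_tokens: int = 200,
                             temperature: float = 0.7) -> dict:
    """Build a JSON payload for a text-generation-webui completion call.

    Field names are assumptions based on the project's historical
    /api/v1/generate API; verify against the repo's documentation.
    """
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "do_sample": True,
    }

payload = build_completion_request("The capital of France is")
print(json.dumps(payload))
# You would POST this to http://<host>:5000/api/v1/generate
# (host, port, and path are placeholders here).
```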

I'm using a preemptible instance to save costs. Because the instance has a local SSD, you can't stop it from the console UI (only delete it), but there's a trick if you do it from Cloud Shell:

gcloud compute instances stop <INSTANCE_NAME> --discard-local-ssd
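For reference, here's a sketch of how such an instance might be created from Cloud Shell. The instance name, zone, image, and disk size are placeholders of mine, not from the parent comment; the a2-ultragpu-1g machine type comes with the A100 80GB attached:

```shell
# Hypothetical example: create a preemptible a2-ultragpu-1g instance with
# a local SSD. All names/values are placeholders; check
# `gcloud compute instances create --help` for the current flags.
gcloud compute instances create llama-65b \
  --zone=us-central1-a \
  --machine-type=a2-ultragpu-1g \
  --preemptible \
  --local-ssd=interface=NVME \
  --boot-disk-size=200GB
```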

It's usable, though a bit slow; it's more for playing around and exploring the model than for anything serious.

To answer your questions: from what I see, it's not as good as GPT-4 but much, much better than Google Bard, so somewhere between the two. (As a reference point, from my testing even LLaMA-7B is way better than Bard.)

The main drawback of GPT-4 is its censorship and enforced political views.




