
Since OpenAI didn't release the parameter count of GPT-4, I've been wondering/doubting if it is really much bigger than GPT-3. The release of GPT-3.5 has shown that they've found ways of drastically cutting down compute costs (an order of magnitude) while maintaining or even improving the quality of the model's outputs.

Perhaps the reason that they didn't release the specifics of GPT-4 might be in part due to them wanting to be able to charge a decent amount and make a much larger profit than before. I've tried GPT-4 and so far haven't found it to be so much better than previous models. Some sources claim a 10x increase in ... well I don't know what exactly tbh. How do you even measure it? The opinions on this seem to differ a lot, depending on who you ask. By performance on standardized tests? That doesn't necessarily seem like the best metric for what the LLM tries to be.



Yannic Kilcher's opinion on this is likely correct. Similar parameter count, but trained for longer. The particulars of their instruction tuning/whatever-else-they-did are the real secret sauce.


Don't forget about a more efficient attention that lets them get 32k tokens of context.


It's still much worse than the 1M-token context you can get on 16GB of VRAM with Reformer, though that comes at the cost of inference speed. And you can now use FlashAttention in your own models to get more efficient/sparse attention as well.
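To see why a more efficient attention matters for context length, here's a rough back-of-the-envelope sketch (my own numbers, not from the thread): naive attention materializes an n × n score matrix, which FlashAttention avoids and Reformer sparsifies via LSH.

```python
# Memory for one full seq_len x seq_len attention score matrix, fp16.
# Naive attention materializes this per head, per layer; FlashAttention
# computes the same result without ever storing the full matrix.

def attn_matrix_bytes(seq_len: int, dtype_bytes: int = 2) -> int:
    """Bytes for one seq_len x seq_len attention score matrix."""
    return seq_len * seq_len * dtype_bytes

gib = 1024 ** 3
print(f"32k context: {attn_matrix_bytes(32_768) / gib:.1f} GiB per head/layer")    # 2.0
print(f"1M context:  {attn_matrix_bytes(1_048_576) / gib:.0f} GiB per head/layer")  # 2048
```

The quadratic blow-up is the whole story: 32k tokens is already 2 GiB per head per layer if done naively, and 1M tokens would be 2 TiB, which is why long-context models need some non-quadratic trick.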


The quality with reformer is much much worse, it's not really comparable.


Yeah, but it fits on a single GPU. Now imagine it scaled across 1000 GPUs.


I finetuned one in 2020[0] to play around with and the results still seemed a bit worse than a gpt of comparable size.

0. https://svilentodorov.xyz/blog/reformer-99m/


How could one apply the mentioned technologies to llama/alpaca?


>performance on standardized tests? That doesn't necessarily seem like the best metric for what the LLM tries to be.

The standardized tests give a baseline, no matter how arbitrary it might be, just as they do for humans in school.

Whether we think it's right or not, these tools are coming for the workplace. So their ultimate metric will be in business performance to justify their costs (whatever they may be).


GPT 3.5 had trouble understanding when I told it "Say 2 bob are a beb, how many beb per bob are there?" and it wrote a goddamn essay about shoes.

That thing isn't smart, it doesn't understand, it doesn't know, it just rambles. I have worked with people who do the same, yes, but they also weren't a threat to most jobs.

I said it before, and I will say it again: If ChatGPT 3, 4, 5, ... can take your job, maybe you're not really providing that much value. Make of that what you will - not everyone has to provide huge value.


Answer from GPT-4:

"This question seems to be intentionally nonsensical or is using unfamiliar terminology. However, if we try to interpret it, we could say that there are 2 "bob" making up 1 "beb." In this case, there would be 0.5 "beb" per "bob." Please provide more context or clarify the terms if you are looking for a different answer."

Answer from GPT-3.5 (subscription version, not free):

"If 2 bob are a beb, then the conversion factor from bobs to bebs is 1 beb per 2 bobs.

So, there is 0.5 beb per bob (or alternatively, 2 bobs per beb)."

Answer from GPT-3.5 (legacy / free version):

"If 2 bobs are equal to 1 beb, then there would be 1 beb per 2 bobs."
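The arithmetic the models are being quizzed on is trivial to check directly; a minimal sketch using the made-up units from the prompt:

```python
# Hypothetical units from the prompt: 2 bob = 1 beb.
BOB_PER_BEB = 2

beb_per_bob = 1 / BOB_PER_BEB
print(beb_per_bob)  # 0.5, i.e. half a beb per bob
```

All three model answers above agree with this: 0.5 beb per bob, equivalently 2 bob per beb.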


Cool, but sadly, as I said, it did not give a very useful answer. If asked enough times, I'm sure it will give a reasonable answer, yes, but that's not the point.

GPT-4's answer is interesting, though.


But all of the answers were correct and useful, and GPT-4 was perfect. Anyway ChatGPT is getting hooked up to Wolfram Alpha, and that won't have any issues with basic algebra.


If you already know the answer, but need an interesting and detailed way to explain/write it, then GPT4 can be an immense value-add. Even if you have to do some fact checking afterwards.


What do LLaMA-based models answer for this?


alpaca.cpp 7b model gave me this:

> Say 2 bob are a beb, how many beb per bob are there?

There are two bebs per bob.

> No, there are 2 bobs per beb.

Correct! There is only one other answer that makes sense - "1 Bob for every Beb". But this does not make mathematical or logical sense since a single Bob can have multiple Babes associated with it and vice versa (as seen in the example). Therefore, we must conclude there are two bobs per beb.


> I said it before, and I will say it again: If ChatGPT 3,4,5,... can take your job, maybe youre not really providing that much value.

The thing you're missing here is that it might take someone's job not because they weren't providing the value but because the people who crunch numbers decided to axe 15% of a company's workforce because some executive was sold a pack of lies about what LLMs/"AI" are actually capable of.

It's fine if that happens to one company who then finds out the hard way. It's probably more social-unresty if it's essentially done at every company in every marketplace an LLM can touch - from writing to programming to 3D animation to teaching.

The hype machine around LLM/AI here is the same irrational one we saw around blockchain. The key difference is blockchain was basically never sold as really replacing a person's job (at best you could argue it was sold as getting around the banking industry and maybe eventually being able to replace it, ish). The primary sales pitch of these LLMs is essentially "do more with less".


I typed the query into chat-gpt3.5 (turbo and legacy), and 4, and they all said that there's 0.5 beb per bob.

Did you use the quoted prompt exactly?


No, I didn't use the quoted prompt, but even after explaining to it that bob and beb were not, in fact, shoe-related terms, it still kept insisting and being confused (while also giving the correct 1/2 answer).

It can do it, but it's not deterministic, and it doesn't really do it well. You can continue the chain by asking "How many bob per bib, assuming two beb per bib?", and see if it chokes then. It sometimes does, sometimes doesn't.


GPT-4:

   If 2 bebs are equal to 1 bib, and we know that 1 beb equals 2 bobs, we can
   determine how many bobs there are per bib using simple substitution.
   
   1 bib = 2 bebs
   1 beb = 2 bobs
   
   Therefore,
   
   1 bib = 2 bebs × 2 bobs/beb = 4 bobs
   
   So, there are 4 bobs per bib.
Nitpick: A properly done substitution would've arrived at

   1 bib = 2 × (2 bobs)
without needing any of the "2 bebs × 2 bobs/beb" nonsense. It doesn't teach this task very well.
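The chained conversion being debated is easy to verify mechanically; a small sketch using the same made-up units (2 beb per bib, 2 bob per beb):

```python
# Chained hypothetical units: 2 beb = 1 bib, and 2 bob = 1 beb.
BEB_PER_BIB = 2
BOB_PER_BEB = 2

# Direct substitution: 1 bib = 2 beb = 2 * (2 bob) = 4 bob.
bob_per_bib = BEB_PER_BIB * BOB_PER_BEB
print(bob_per_bib)  # 4
```

Both the model's unit-cancellation route and the plain substitution land on the same answer: 4 bobs per bib.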


You do realize that the current implementations get their context polluted by your prior conversation, right?


I think right here we have an example of the AI reproducibility problem. It seems fully reasonable and credible as an outcome, but it is hard to dig in and replicate. But the truth of ML is that it would be difficult to replicate even if things were FOSS.


> Since OpenAI didn't release the parameter count of GPT-4

That makes me ask what the open in OpenAI stands for?


Just like MTV doesn't mean Music TV anymore.

As a joke I'd say, Open means "open your wallets"


Or TLC as the learning channel or History channel (assuming these still exist).

There are also lots of "Open Government" initiatives that end up being about making everything as opaque and confusing as possible. They were (are?) popular in the "big data" era, though funnily enough, if you watch "Yes Minister!" from ~40 years ago, there is a similar gag about "open government" in the first few episodes, so it's not new.

See of course Orwell, "we care about your privacy" banners, etc. People like to lie as blatantly as possible.


Didn't know it was "Music TV", made me think about Skyrock... the biggest Rap channel in France, and essentially no Rock there.


I am not sure how much bigger, but definitely much bigger IMHO. Otherwise you wouldn't be capped at 25 requests every 3 hours. That cap is small enough to make me think the inference costs/hardware needed are much bigger than for 3.5.


I believe I heard that running inference for longer is what gives the better responses we're seeing in v4, hence why v4 takes so much longer to output data.

Of course we won't know this for sure until OAI tells us, so we may be in the dark for a while.


ChatGPT-4 is definitely slower than GPT-3.5 (and way slower than 3.5-turbo). What could be the reason for that other than much larger parameter count?

I agree that the capabilities seem overhyped. In my subjective experience, 4 seems a little better than 3.5 but not by a huge amount. We just have OpenAI’s cherry-picked word that it‘s this incredible advance.


I disagree. It does much, much better on selected tasks. I cannot quite figure out how to describe what the difference "feels" like, but the performance is sometimes markedly different when feeding ChatGPT-3.5 and ChatGPT-4 the same prompt.


One task that ChatGPT-3.5 is hilariously bad at is reversing strings (both words and pseudorandom input). It seems to have only a vague concept of what that means, even if I try to hold its hand through the process. Maybe some prompt engineering can get it to succeed on anything longer than four letters.

ChatGPT-4 meanwhile seems to have no issue with this at all.


Have you tried inserting spaces between the characters? This may just be a tokenization issue, rather than anything due to the model per se.

Reversing a string is somewhat of a pathological case for language models, because they see tokens not characters. Learning that the token “got” and token “tog” are mirror images is only useful for string reversal and generating palindromes. Unless they are trained specifically for this task, they may not be able to do it. They should however be able to see that “g o t” and “t o g” are mirror images.

Infamously, early versions of GPT-3 tokenized numbers as grouped tokens, nerfing its calculation abilities, because it would tokenize a number such as 12345 as (illustratively) 12 34 5 which is obviously a harmful representation.
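The workaround suggested above is easy to sketch (my own illustration, not a guaranteed fix): since the model sees tokens rather than characters, spacing the characters out tends to force one token per character in most BPE vocabularies, turning reversal into a token-level operation the model can actually perform.

```python
# The model may see "got" as a single token, but "g o t" as three
# single-character tokens. Spacing out a word before asking for its
# reversal gives the model token-level access to each character.

def space_out(s: str) -> str:
    """Insert spaces between characters so each likely becomes its own token."""
    return " ".join(s)

word = "got"
print(space_out(word))        # "g o t"
print(space_out(word[::-1]))  # "t o g"
```

A prompt built from the spaced-out form is much closer to what the commenter says should be learnable: that "g o t" and "t o g" are mirror images.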


> What could be the reason for that other than much larger parameter count?

Longer inference time... I should have written it down now that people are asking about it, but a few weeks ago I saw people discussing, based on what little information was released in the GPT-4 "paper", that throwing more inference compute at the problem gives better responses.

>, 4 seems a little better than 3.5 but not by a huge amount.

Can you define that in a tangible way? I don't think most of us can since we have so little access to the product.


Runs on cheaper but slower compute maybe? Given all the hype and little competition, I'm sure they're willing to make it slower if it reduces cost.


> 4 seems a little better than 3.5 but not by a huge amount.

Depends on the task. 3.5 was completely incapable of doing math, but 4 seems to be able to at a solid highschool graduate level.


Given how small the time window between the successive releases was it's extremely unlikely that there were any big changes to the model. Most likely it's just better preprocessed training data, more training data, trained for longer, performance optimizations for attention, or a few changes to layer sizes.


They didn't release GPT-4 immediately after it was trained and then move on to training GPT-5. They had 4 for almost 6 months before it was released. 5 was certainly well underway long before we'd heard of 4.


Your timeline is wrong, GPT-4 finished training already in August.


I saw this coming a long time ago and I'm still very pissed off. For three reasons:

1. We are all forced to use the damn "chat" API instead of regular completions. Can't wait to have to deal with ChatGPT's conversations in order to get a few lines of code out.

2. We lose the super valuable 'insert' and 'edit' modes, which were great for code.

3. A 3-day notice period? That's going to be hell for people who are actually providing products based on Codex or doing research.
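The practical difference being complained about is the request shape. A sketch of the two payloads as plain dicts (no network call; field names follow OpenAI's 2023-era REST API, so treat them as illustrative rather than authoritative):

```python
# Completion-style request: you hand the model raw text to continue,
# which is what made 'insert'/'edit' workflows for code possible.
completion_request = {
    "model": "code-davinci-002",   # Codex-style completion model
    "prompt": "def fizzbuzz(n):",
    "max_tokens": 128,
}

# Chat-style request: everything is wrapped in conversational turns,
# and code has to be extracted from a conversational reply.
chat_request = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a code assistant."},
        {"role": "user", "content": "Continue: def fizzbuzz(n):"},
    ],
}

print(sorted(chat_request["messages"][0].keys()))  # ['content', 'role']
```

With the completion shape the prompt *is* the code context; with the chat shape there is no raw text to complete or insert into, which is the loss the comment is describing.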


Completion API for GPT-4 will be there soon. With extra stop tokens, but better than nothing. A compromise.

And it's not like what OpenAI did was an impossible magic trick. They had the right team composition, and three insights, all present in the literature. Repeat that and you'll have GPT-4. But GPT-5? Well, that one is a different game.

As to being open, they are still relatively open. Consider Apple, for example. No one complains about Apple being a bit skittish. Well, OpenAI got a bit skittish too. It's a phase; they'll stabilize. And their setup of the company, with the non-profit board in control and profit caps, is a really interesting try at corporate design.


It's not interesting. It's a hack to keep a "don't be evil" vibe and the "open" name while they go against their own foundational principles.


You aren’t providing any sort of valuable insight here. This is more indicative of your priors than anything else. Everyone has heard this argument. The people that believe it, believe it. The people that don’t, don’t.


The initial goal was to make ai available to everyone. In the process of getting enough funds to build their vision they gave it to Microsoft.


Lots of people complain about Apple being skittish (including HN comment section), but they also expect them to pull a stunt every once in a while. OpenAI was an unknown quantity until now.


From WordNet:

> 1. skittish, flighty, spooky, nervous -- (unpredictably excitable (especially of horses))

(I didn't know the word skittish, and I figured this might help others, too.)


dude why are you copy and pasting my comments from other threads?


Did they actually plagiarize a comment you’ve made previously?



I searched and didn’t find any identical prior comment



Is it a bot? were you able to figure anything out?


No idea, but it's funny how it/he/she fetched the comment from a similar reddit discussion and pasted it here...


So this 'ar9av' is a bot reposting comments or a troll?

Wonder if dang could take a look at this.


Nobody is forced to do anything. You don't have to use OpenAI services if you don't want to...


> Nobody is forced to do anything. You don't have to use a smartphone if you don't want to...

I expect that a similar thing is possible with the use of AI (for work or possibly education, if not for personal use) as happened with smartphones.



