
Au contraire, no one knows how large GPT-4 is, and size is the single best predictor of performance (for a model trained to convergence). The GPT-4 paper spent much of its time writing about this: they did some small-scale experiments with 1/1000th the compute, then picked a loss level they wanted and trained GPT-4 until it reached it.
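To make the "extrapolate from small runs" idea concrete, here is a minimal sketch, not OpenAI's actual procedure. It assumes a pure power law, loss = a * C^b, fitted in log-log space; as far as I recall, the report's own fit also has an irreducible-loss term, which this omits. The data points are synthetic, generated from an assumed law, and the function names are made up for illustration.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of log(loss) = log(a) + b * log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line in log-log space is the exponent b.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b

def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** b

# Synthetic "small runs": generated from an ASSUMED law loss = 10 * C^-0.05.
# These are not real measurements from any model.
small_compute = [1e18, 3e18, 1e19, 3e19, 1e20]
small_loss = [10.0 * c ** -0.05 for c in small_compute]

a, b = fit_power_law(small_compute, small_loss)

# Extrapolate to 1000x the largest small run.
target_compute = 1000 * small_compute[-1]
predicted = predict_loss(a, b, target_compute)
print(f"fitted a={a:.3f}, b={b:.4f}, predicted loss at {target_compute:.0e}: {predicted:.4f}")
```

The point is only that once the curve is fitted on cheap runs, the final loss at full compute can be read off in advance. Without the fitted constants and the chosen target, an outsider cannot invert this to get the model size.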

Neither the exact loss level nor the number of parameters is revealed by the paper. Unfortunately, it's not possible to guess these from outside observations.

Will this save them from competition? No, but it certainly makes things harder. Everyone immediately aimed at 175B the moment GPT-3 was published. GPT-4 is now a question mark.



I can't believe anyone considers a single number, which would work about equally well if it were 10% higher or lower, to be a trade secret.


This is not really true. The Chinchilla paper showed that a 4% difference in loss between Chinchilla and Gopher led Chinchilla to blow Gopher out of the water at most tasks, including 30x performance in physics.

Empirically, LLMs have been shown to develop emergent abilities at different loss levels. So a 10% difference could really matter.


That ten percent is not loss; it is parameter count.
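For a rough sense of how the two quantities relate, here is a sketch using the parametric loss fit from the Chinchilla paper, L(N, D) = E + A/N^alpha + B/D^beta. The constants are the fitted values as I recall them from that paper, so treat them as approximate. The 70B-parameter, 1.4T-token baseline is Chinchilla's reported configuration, and the token count is held fixed, which is an assumption. None of this says anything about GPT-4 itself.

```python
# Approximate fitted constants from the Chinchilla paper's parametric loss
# (quoted from memory; treat as assumptions, not authoritative values).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def chinchilla_loss(n_params, n_tokens):
    """Parametric loss estimate L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params ** ALPHA + B / n_tokens ** BETA

n_tokens = 1.4e12   # held fixed (assumption)
base_params = 70e9  # Chinchilla-sized baseline

base = chinchilla_loss(base_params, n_tokens)
for scale in (0.9, 1.0, 1.1):
    loss = chinchilla_loss(base_params * scale, n_tokens)
    change = (loss - base) / base * 100
    print(f"params x{scale:.1f}: loss {loss:.4f} ({change:+.2f}% vs baseline)")
```

Under this fit, a 10% swing in parameter count moves the predicted loss by well under 1%, so a 10% difference in parameters and a 4% difference in loss are very different things.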


It's about causing your competition to waste millions of dollars in compute time and power doing something unproductive.

There is not a huge pile of excess TPUs lying around for people to use. Any strategic advantage can quickly compound and put you well ahead of others.


For all we know, they have hit 500B parameters with some clever unpublished optimisation, which would give them an edge and, if revealed, would put a damper on the prevailing belief that LLMs can scale and scale (e.g. 3x more params for less than 3x performance).

As you say, there is absolutely no way for us to find out.


I think the big tech actors probably know. Information leaks, and ultimately Google is spyware. It may not become public knowledge today, but that kind of information is difficult to keep bottled up for long.



