Hacker News

Personally I disagree; there are a lot of interesting tidbits in this paper. More than marketing alone would need, at least.


What good bits did you find? (I'm not sure how fruitful the "OpenAI is a Microsoft department" debate is given that they are almost one and everybody knows it, but I am curious if anyone has found anything good in those many pages.)


I think the most interesting thing is their ability to predict performance, from loss and across a wide range of tasks, using much smaller models. This lets them tune their architecture and hyperparameters on small runs, then do a single large training run to get the full-scale GPT-4. From the paper it sounds like they trained the large model only once, then fine-tuned it with reinforcement learning from human feedback (RLHF).
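As a rough illustration of that kind of extrapolation (all numbers here are made up, not from the paper), the idea is to fit a power law to the final losses of small training runs and extend it to a much larger compute budget. A power law is a straight line in log-log space, so a linear fit suffices:

```python
import numpy as np

# Hypothetical final validation losses from small runs at increasing
# compute budgets (illustrative values only, not real measurements).
compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs
loss = np.array([3.10, 2.60, 2.18, 1.83])

# Fit L(C) = a * C^slope, i.e. a line in log-log space.
# np.polyfit returns coefficients highest-degree first: [slope, intercept].
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

# Extrapolate to a far larger (hypothetically GPT-4-scale) budget.
target_flops = 1e25
predicted = np.exp(intercept + slope * np.log(target_flops))
print(f"predicted loss at {target_flops:.0e} FLOPs: {predicted:.2f}")
```

The hard part in practice is that the fit only transfers if the small runs are trained in a way that is consistent with the big run, which is where the hyperparameter-scaling work comes in.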

Disclaimer: I work at Microsoft, in AI, and have no internal knowledge about GPT-4.


This isn’t that interesting imo. It’s the basic outcome of the scaling laws from the Kaplan and Chinchilla papers, pushed across a larger gap to the final model.

They likely did extensive small-model experiments on the GPT-4 architecture to establish hyperparameter scaling laws, and then did a predicted build in exactly the same way Chinchilla did.
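For reference, the Chinchilla paper's well-known approximations are that training compute is roughly C ≈ 6·N·D FLOPs (N parameters, D tokens) and that the compute-optimal ratio is about 20 tokens per parameter, giving N ≈ sqrt(C / 120). A minimal sketch under those assumptions (the function name is mine):

```python
# Chinchilla-style compute-optimal sizing, assuming C ≈ 6*N*D and D ≈ 20*N
# (the approximate ratios reported in the Chinchilla paper).
def compute_optimal(c_flops: float) -> tuple[float, float]:
    n_params = (c_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Roughly Chinchilla's own training budget (~5.7e23 FLOPs) should recover
# something near its actual 70B parameters / 1.4T tokens.
n, d = compute_optimal(5.7e23)
print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")
```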


I guess, but it's actually not simple to do that, in my experience. There's another paper on exactly that: https://arxiv.org/abs/2203.03466
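The linked paper (µTransfer / Tensor Programs V) is about making hyperparameters found on a small proxy model transfer to a much wider one. One of its rules, for Adam, is that hidden-layer learning rates should scale inversely with width. A hedged sketch of just that rule (the base values and function name are illustrative, not from the paper):

```python
# µP-style learning-rate transfer for Adam: tune the LR on a narrow proxy
# model, then scale hidden-layer LR inversely with width for the big model.
# base_width and base_lr are assumed values "tuned" on the small model.
base_width, base_lr = 256, 1e-2

def mup_hidden_lr(width: int) -> float:
    """Hidden-layer LR for a model of the given width (illustrative)."""
    return base_lr * base_width / width

# An 8x-wider model gets an 8x-smaller hidden-layer learning rate.
print(mup_hidden_lr(2048))
```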

Why isn’t Chinchilla running Google's AI chat or whatever, then?



