Hacker News

Personally I disagree; there are a lot of interesting tidbits in this paper. More than marketing alone would need, at least.


What good bits did you find? (I'm not sure how fruitful the "OpenAI is a Microsoft department" debate is given that they are almost one and everybody knows it, but I am curious if anyone has found anything good in those many pages.)


I think the most interesting thing is their ability to predict performance, from loss and across a wide range of tasks, using much smaller models. This lets them tune their architecture and hyperparameters on small runs, then do a single large training run to get the full-scale GPT-4. From the paper it sounds like they trained the large model only once, then fine-tuned it with reinforcement learning from human feedback (RLHF).
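As a rough illustration of that kind of extrapolation (all numbers here are made up, not from the paper), the idea is to fit a power law to the final losses of small training runs and extend it to a much larger compute budget. A power law is a straight line in log-log space, so a linear fit suffices:

```python
import numpy as np

# Hypothetical final validation losses from small runs at increasing
# compute budgets (illustrative values only, not real measurements).
compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs
loss = np.array([3.10, 2.60, 2.18, 1.83])

# Fit L(C) = a * C^slope, i.e. a line in log-log space.
# np.polyfit returns coefficients highest-degree first: [slope, intercept].
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

# Extrapolate to a far larger (hypothetically GPT-4-scale) budget.
target_flops = 1e25
predicted = np.exp(intercept + slope * np.log(target_flops))
print(f"predicted loss at {target_flops:.0e} FLOPs: {predicted:.2f}")
```

The hard part in practice is that the fit only transfers if the small runs are trained in a way that is consistent with the big run, which is where the hyperparameter-scaling work comes in.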

Disclaimer: I work at Microsoft, in AI, and have no internal knowledge about GPT-4.


This isn’t that interesting imo. It’s the basic outcome of the scaling laws from the Kaplan and Chinchilla papers, pushed across a larger gap to the final model.

They likely did extensive small-model experiments on the GPT-4 architecture to establish hyperparameter scaling laws, and then did a predicted build in exactly the same way Chinchilla did.
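For reference, the Chinchilla paper's well-known approximations are that training compute is roughly C ≈ 6·N·D FLOPs (N parameters, D tokens) and that the compute-optimal ratio is about 20 tokens per parameter, giving N ≈ sqrt(C / 120). A minimal sketch under those assumptions (the function name is mine):

```python
# Chinchilla-style compute-optimal sizing, assuming C ≈ 6*N*D and D ≈ 20*N
# (the approximate ratios reported in the Chinchilla paper).
def compute_optimal(c_flops: float) -> tuple[float, float]:
    n_params = (c_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Roughly Chinchilla's own training budget (~5.7e23 FLOPs) should recover
# something near its actual 70B parameters / 1.4T tokens.
n, d = compute_optimal(5.7e23)
print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")
```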


I guess, but it's actually not simple to do that, in my experience. There's another paper on exactly that: https://arxiv.org/abs/2203.03466
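The linked paper (µTransfer / Tensor Programs V) is about making hyperparameters found on a small proxy model transfer to a much wider one. One of its rules, for Adam, is that hidden-layer learning rates should scale inversely with width. A hedged sketch of just that rule (the base values and function name are illustrative, not from the paper):

```python
# µP-style learning-rate transfer for Adam: tune the LR on a narrow proxy
# model, then scale hidden-layer LR inversely with width for the big model.
# base_width and base_lr are assumed values "tuned" on the small model.
base_width, base_lr = 256, 1e-2

def mup_hidden_lr(width: int) -> float:
    """Hidden-layer LR for a model of the given width (illustrative)."""
    return base_lr * base_width / width

# An 8x-wider model gets an 8x-smaller hidden-layer learning rate.
print(mup_hidden_lr(2048))
```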

Why isn’t Chinchilla running Google's AI chat or whatever, then?



