
Not to be impolite, but this is incorrect. One detail they did share in their paper is that they were able to fine-tune and select their hyperparameters on a model that needed 1,000x less compute than the final GPT-4 model. OpenAI is definitely leading in how to train very large models cost-effectively.
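
To make that concrete, the general idea is just to run the sweep on a cheap proxy model and reuse the winning config at full scale. This is only a sketch of that shape, not OpenAI's actual pipeline; the model sizes, search space, and loss function below are invented:

    import itertools

    # Hypothetical sizes and search space, made up for illustration; only the
    # "tune on a ~1,000x cheaper proxy, reuse the config at full scale" idea
    # comes from the thread.
    PROXY_PARAMS = 10_000_000          # cheap model used for the sweep
    TARGET_PARAMS = 10_000_000_000     # model you actually want to train

    def train_and_eval(params, lr, batch_size, warmup):
        """Stand-in for a real training run; returns a fake validation loss."""
        # A real version would train a `params`-sized model and report held-out
        # loss; this toy formula just keeps the example runnable.
        return abs(lr - 3e-4) * 1e3 + abs(batch_size - 512) / 512 + warmup * 1e-5

    search_space = {
        "lr": [1e-4, 3e-4, 1e-3],
        "batch_size": [256, 512, 1024],
        "warmup": [1000, 4000],
    }

    # Run the whole sweep on the proxy, where each trial is ~1,000x cheaper.
    candidates = [dict(zip(search_space, combo))
                  for combo in itertools.product(*search_space.values())]
    best = min(candidates, key=lambda hp: train_and_eval(PROXY_PARAMS, **hp))
    print("best config found on the proxy:", best)

    # Naive transfer: reuse the proxy's winning config for the full-size run.
    final_loss = train_and_eval(TARGET_PARAMS, **best)
    print("final model loss:", final_loss)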


Toying around with a smaller model for hyperparameter search is nothing ground-breaking.


It’s not a 1:1 transfer; understanding how the hyperparameters scale with model size is also important. See https://arxiv.org/abs/2203.03466
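
Roughly, the point of that paper (µTransfer) is that with the right parametrization you can tune at a small width and then scale the config up by rule rather than copying it verbatim. A minimal sketch of one such rule, with assumed base values (the real recipe also changes initialization and output multipliers):

    # Rough sketch of one scaling rule from the cited paper
    # (https://arxiv.org/abs/2203.03466): under muP with Adam, the learning
    # rate of hidden weight matrices shrinks like 1/width as the model widens.
    # This only illustrates why a small-model config can't be copied as-is.

    BASE_WIDTH = 256        # width the hyperparameter sweep was run at (assumed)
    BASE_HIDDEN_LR = 3e-3   # lr found to work well at BASE_WIDTH (assumed)

    def transferred_hidden_lr(target_width, base_width=BASE_WIDTH, base_lr=BASE_HIDDEN_LR):
        """Learning rate for hidden weight matrices at a larger width."""
        return base_lr * base_width / target_width

    for width in (256, 1024, 4096, 16384):
        print(f"width={width:6d}  hidden-layer lr={transferred_hidden_lr(width):.2e}")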



