Hacker News

> What could be the reason for that other than much larger parameter count?

Longer inference time. I wish I had saved the link now that people are asking about it, but a few weeks ago I saw people discussing what little information was released around the GPT-4 "paper", and the takeaway was that throwing more inference compute at the problem gives better responses.
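One common way to "spend more inference compute" is best-of-n sampling: draw several candidate responses and keep the highest-scoring one. A minimal sketch with a toy stand-in for the model and scorer (everything here is hypothetical for illustration; nothing about GPT-4's actual method is public):

```python
import random

def sample_answer(rng):
    # Toy stand-in for one model forward pass: returns (answer, quality).
    # In a real system the quality score would come from a reward model
    # or from the model's own log-probabilities.
    quality = rng.random()
    return f"answer-{quality:.3f}", quality

def best_of_n(n, seed=0):
    # Spend n times the inference compute, keep the best-scoring sample.
    rng = random.Random(seed)
    samples = [sample_answer(rng) for _ in range(n)]
    return max(samples, key=lambda s: s[1])

# With the same seed, best_of_n(16) searches a superset of what
# best_of_n(1) sees, so the kept answer's score can only go up.
_, q1 = best_of_n(1, seed=42)
_, q16 = best_of_n(16, seed=42)
```

The point of the toy is only that extra sampling at inference time trades compute for output quality without changing the parameter count at all.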

> 4 seems a little better than 3.5 but not by a huge amount.

Can you define that in a tangible way? I don't think most of us can since we have so little access to the product.


