It’s called emergent behavior. We understand how an LLM works, but do not have even a theory about how the behavior emerges from among the math. We understand ants pretty well, but how exactly does anthill behavior come from ant behavior? It’s a tricky problem in systems engineering, where predicting emergent behavior (such as emergencies) would be lovely.
> but do not have even a theory about how the behavior emerges from among the math
Actually we have an awful lot of those.
I'm not sure emergent is quite the right term here. We carefully craft a scenario to produce a usable gradient for a black-box optimizer. We fully expect that nontrivial prediction of future state will, out of necessity, result in increasingly rich world models.
It gets back to the age-old observation that any sufficiently accurate model is as complex as the system it models. "Predict the next word" is but a single example of the general principle at play.
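To make "predict the next word" concrete, here's a minimal sketch of the cross-entropy loss that next-token training minimizes. The toy vocabulary, logits, and function name are all illustrative, not any particular framework's API:

```python
import math

def cross_entropy_next_token(logits, target_index):
    """Negative log-probability of the token that actually came next.

    logits: unnormalized scores over a toy vocabulary.
    target_index: index of the observed next token.
    """
    # Softmax: turn raw scores into a probability distribution.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    prob = exps[target_index] / sum(exps)
    return -math.log(prob)

# Toy example: 4-token vocabulary, model strongly favors token 2.
loss_good = cross_entropy_next_token([0.1, 0.2, 3.0, 0.1], 2)  # correct guess
loss_bad = cross_entropy_next_token([0.1, 0.2, 3.0, 0.1], 0)   # wrong guess
# A confident correct prediction gives a low loss; a wrong one, a high loss.
```

Minimizing this quantity over huge corpora is the entire externally specified objective; everything else is whatever internal structure helps the model do it.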
No, as I said, we have _lots_ of theories about exactly that at various levels of detail. The theories vary based on (at least) the specifics of the loss function being employed to construct the gradient. Giving an overview of that is far beyond the scope of this comment section (but it's well trodden ground so you can just go ask an LLM).
The "black box" bit refers to a generic, interchangeable optimization algorithm that simply makes the number go down (or up or whatever).
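A sketch of what "makes the number go down" means: a generic gradient-descent loop that knows nothing about the thing it optimizes beyond a gradient signal. The quadratic toy loss and all names here are illustrative:

```python
def gradient_descent(loss_grad, x0, lr=0.1, steps=100):
    """Generic optimizer: repeatedly step against the gradient.

    loss_grad: function returning the gradient of the loss at x.
    The optimizer treats the loss as a black box -- it needs only
    the gradient, not any understanding of what is being modeled.
    """
    x = x0
    for _ in range(steps):
        x = x - lr * loss_grad(x)
    return x

# Toy loss (x - 3)^2 has gradient 2*(x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

The same loop (modulo variants like momentum or Adam) is interchangeable across models; the interesting part is the scenario that produces the gradient, not the optimizer.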
There are certainly various details about the internal workings of models that we don't properly understand, but a blanket claim about the whole is erroneous.
The good news is that despite being incredibly complex, it’s still a lot simpler than ants because it is at least all statistical linguistics (as far as LLMs are concerned anyways).
> but do not have even a theory about how the behavior emerges
We fully do. There is a significant quality difference between English language output and other languages which lends a huge hint as to what is actually happening behind the scenes.
> but how exactly does anthill behavior come from ant behavior?
You can't smell what ants can. If you could, I'm sure it would be evident.
The dynamics of ant nest creation are way more complicated than that. It's the evolved biological parallel of a procedural generation algorithm. In addition, the completed structure has to be compatible with the various programmed behaviors of the workers.
OK, but then that goes back to their other assertion that it gives a huge hint at what is going on behind the scenes. Is that huge hint just "more data gives better results"? If so, it doesn't seem at all important, since that is the absolutely central idea of an LLM. That is not behind the scenes at all; that is the introduction to the play as written by the author.
Not your fault obviously, but they have not yet described what that huge hint is, and I'm on the edge of my seat with anticipation here.