Is there any convincing research for how/why the inner monologue capabilities emerge?
It’s extremely unintuitive, but also pretty empirically obvious, that LLMs gain this capability just by scaling, absent any changes in architecture. I assumed that an explicit external memory would be needed, maybe similar to a Neural Turing Machine.
There is none I am aware of. It all focuses on eliciting and measuring and making good use of the capability.
The lack of an explicit external memory is not too surprising, because the text is fed back in at every iteration. That fakes having a memory: the prompt just gets bigger. That's ordinary enough. What's critical, it seems, is being able to decide on the next incremental step and execute it within the space of an iteration, rather than simply 'guessing' the final answer.
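To make the mechanism concrete, here's a minimal sketch of that feedback loop, with a toy deterministic `tiny_model` function standing in for a real LLM's next-token generation (the function name and the counting task are illustrative assumptions, not any real API):

```python
# Sketch of how autoregressive feedback "fakes" external memory:
# each generated step is appended to the prompt, so all working
# state lives in the growing context window, not in the weights.

def tiny_model(prompt: str) -> str:
    """Toy stand-in for an LLM: emits the next incremental step."""
    steps_so_far = prompt.count("Step")
    if steps_so_far < 3:
        return f"Step {steps_so_far + 1}: partial result"
    return "Final answer: done"

def run_with_scratchpad(question: str, max_iters: int = 10) -> str:
    prompt = question
    for _ in range(max_iters):
        step = tiny_model(prompt)
        prompt += "\n" + step  # the "memory" is just a bigger prompt
        if step.startswith("Final answer"):
            break
    return prompt

transcript = run_with_scratchpad("Q: count to three, showing work.")
```

Each iteration only has to produce one small, locally-decidable step; the accumulated transcript carries everything forward, which is why no architectural memory module is needed.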
As to how that actually happens inside a large but not a small Transformer, I suspect there is a phase transition inside the Transformer itself where it changes how it fundamentally thinks, one which doesn't produce any obvious change in the training dynamics because the two ways of thinking are initially equivalent in loss. An example of this, where the Transformer computes in a radically different way before and after a certain point in training, is Anthropic's new work on the "induction bump": https://transformer-circuits.pub/2022/in-context-learning-an...