Is there any convincing research for how/why the inner monologue capabilities emerge?
It’s extremely unintuitive, but also pretty empirically obvious, that LLMs gain this capability just by scaling, absent any changes in architecture. I assumed that an explicit external memory would be needed, maybe similar to a Neural Turing Machine.
There is none I am aware of. It all focuses on eliciting and measuring and making good use of the capability.
The lack of an explicit external memory is not too surprising, because the text is fed back in at every iteration. That fakes having a memory: the prompt just gets bigger. That's ordinary enough. What's critical, it seems, is being able to decide on the next incremental step and execute it within the space of an iteration, rather than simply 'guessing' the final answer.
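To make the mechanism concrete, here's a minimal sketch of that feedback loop, with a toy deterministic `tiny_model` function standing in for a real LLM's next-token generation (the function name and the counting task are illustrative assumptions, not any real API):

```python
# Sketch of how autoregressive feedback "fakes" external memory:
# each generated step is appended to the prompt, so all working
# state lives in the growing context window, not in the weights.

def tiny_model(prompt: str) -> str:
    """Toy stand-in for an LLM: emits the next incremental step."""
    steps_so_far = prompt.count("Step")
    if steps_so_far < 3:
        return f"Step {steps_so_far + 1}: partial result"
    return "Final answer: done"

def run_with_scratchpad(question: str, max_iters: int = 10) -> str:
    prompt = question
    for _ in range(max_iters):
        step = tiny_model(prompt)
        prompt += "\n" + step  # the "memory" is just a bigger prompt
        if step.startswith("Final answer"):
            break
    return prompt

transcript = run_with_scratchpad("Q: count to three, showing work.")
```

Each iteration only has to produce one small, locally-decidable step; the accumulated transcript carries everything forward, which is why no architectural memory module is needed.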
As to how that actually happens inside a large but not a small Transformer, I suspect there is a phase transition inside the Transformer itself where it changes how it fundamentally thinks, one which doesn't produce any obvious change in the training dynamics because the two ways of thinking are initially equivalent in loss. An example of this, where the Transformer computes in a radically different way before and after a certain point in training, is Anthropic's new work on the "induction bump": https://transformer-circuits.pub/2022/in-context-learning-an...