Arguably, one of the central issues with CGPT is that it often fails at common-sense reasoning about the world: things like keeping track of causality.
The data it has been trained on doesn't reliably contain that information; text doesn't have to convey those relationships correctly. It's entirely possible for a text to state that event A was the cause of event B while recounting event B before event A.
It seems likely that humans gain that understanding by interacting with the world, and that kind of data isn't available to train LLMs. Even including basic sensory inputs like images and sound would easily increase the volume of training data by many orders of magnitude.
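To make that scale concrete, here's a rough back-of-envelope sketch in Python. Every constant is an illustrative assumption (reading speed, resolution, sample rate), not a measured figure; it just compares how many bytes per second a person takes in as text versus as raw image and sound.

```python
# Rough per-second comparison of text vs. raw audio-visual input.
# Every constant here is an illustrative assumption, not a measured figure.

# Text: a fast reader at ~250 words per minute, ~5 characters (~bytes) per word.
text_bytes_per_second = 250 / 60 * 5

# Vision: uncompressed 640x480 RGB video at 30 frames per second.
video_bytes_per_second = 640 * 480 * 3 * 30

# Sound: 16-bit mono audio sampled at 44.1 kHz.
audio_bytes_per_second = 2 * 44_100

sensory_bytes_per_second = video_bytes_per_second + audio_bytes_per_second

print(f"text:    ~{text_bytes_per_second:.0f} bytes/s")
print(f"senses:  ~{sensory_bytes_per_second / 1e6:.1f} MB/s")
print(f"ratio:   ~{sensory_bytes_per_second / text_bytes_per_second:,.0f}x")
```

Under those assumptions, even a very modest raw sensory stream works out to roughly a million times more bytes per second than reading text, i.e. around six orders of magnitude.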