If we're trying to quantify what they can NEVER do, I think we'd have to resort to theoretical results rather than a list of empirical evidence of what they can't do now.
The terminology you'd look for in the literature would be "expressibility".
We have to be a bit more honest about the things we can actually do ourselves. Most people I know would flunk most of the benchmarks we use to evaluate LLMs. Not just a little bit, but completely, utterly, and embarrassingly so. It's not even close, or fair. People are surprisingly alright at a narrow set of problems, particularly ones that don't involve knowledge. Most people also suck at reasoning (unless they've had years of training), they suck at factual knowledge, they aren't half bad at visual and spatial reasoning, and they're fairly gullible otherwise.
Anyway, this list looks more like a "hold my beer" moment for AI researchers than any fundamental reason for AI to stop evolving any further. Sure, there are weaknesses, and there are paths to address them. Anyone claiming that this is the end of the road in terms of progress is in for a disappointing reality check, probably a lot sooner than is comfortable.
And of course, by narrowing the scope to just LLMs, the authors give themselves a bit of an escape hatch: they conveniently exclude any further architectures, alternate strategies, and improvements that might otherwise overcome the identified weaknesses. But that's an artificial constraint with no real-world value, because of course AI researchers are already looking beyond the current state of the art. Why wouldn't they?
It's clear that what's missing is flexibility and agency. For anything that can be put into text or a short conversation, if I had to choose between access to ChatGPT or a random human, I know what I'd choose.
Agency is one of those things we probably want to think about quite a bit, especially given people's willingness to hook it up to things that interact with the real world.
Thank you for sharing this here. Rigorous work on the "expressibility" of current LLMs (i.e., which classes of problems can they tackle?) is surely more important, but I suspect it will go over the heads of most HN readers, many of whom have minimal to zero formal training in topics relating to computational complexity.
Sort of moot anyway. If-statements can approximate any function, and most programming languages are effectively Turing complete. What's important about specific architectures like transformers is that they allow comparatively efficient determination of a set of weights that approximates some narrower class of functions. It's finding the weights that matters, not the theoretical representational power.
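As a toy illustration of that expressivity-versus-learnability point, here's a minimal sketch (the function name and choice of target are mine, purely for illustration) of approximating a function with nothing but branching. It's maximally "expressive" in the trivial sense, but tells you nothing about how to learn the branches:

```python
import math

def approx_sin(x: float, steps: int = 1000) -> float:
    """Piecewise-constant approximation of sin on [0, 2*pi) built from plain if-statements."""
    width = 2 * math.pi / steps
    for i in range(steps):
        lo, hi = i * width, (i + 1) * width
        if lo <= x < hi:                     # one branch per bucket
            return math.sin(lo + width / 2)  # constant value on that bucket
    raise ValueError("x outside [0, 2*pi)")

print(approx_sin(1.0), math.sin(1.0))  # close, and closer as steps grows
```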
I think the person you're replying to may have been referring to the problem of an MLP approximating a sine wave for out-of-distribution samples, i.e. over the entire set of real numbers.
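For concreteness, here's a minimal sketch of that failure mode, assuming scikit-learn is available (the network size, training range, and evaluation range are arbitrary choices of mine): fit sin(x) on a bounded interval, then evaluate outside it.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-2 * np.pi, 2 * np.pi, size=(2000, 1))  # training interval
y_train = np.sin(x_train).ravel()

mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
mlp.fit(x_train, y_train)

x_in = np.linspace(-2 * np.pi, 2 * np.pi, 200).reshape(-1, 1)  # in distribution
x_out = np.linspace(4 * np.pi, 8 * np.pi, 200).reshape(-1, 1)  # out of distribution

print("in-distribution MSE:    ", np.mean((mlp.predict(x_in) - np.sin(x_in).ravel()) ** 2))
print("out-of-distribution MSE:", np.mean((mlp.predict(x_out) - np.sin(x_out).ravel()) ** 2))
# Inside the training range the fit is reasonable; outside it the prediction
# typically flattens or drifts instead of continuing to oscillate, which is
# the extrapolation problem being referred to.
```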
There are all sorts of things a neural net isn't doing without a body. Giving birth or free soloing El Capitan come to mind. It could approximate the functions for both in token-land, but who cares?
For a review of this topic, I'd suggest: https://nessie.ilab.sztaki.hu/~kornai/2023/Hopf/Resources/st...
The authors of this review have themselves written several articles on the topic, and there is also empirical evidence connected to these limitations.