If we're trying to quantify what they can NEVER do, I think we'd have to resort to theoretical results rather than a list of empirical evidence of what they can't do now.
The terminology you'd look for in the literature would be "expressibility".
We have to be a bit more honest about the things we can actually do ourselves. Most people I know would flunk most of the benchmarks we use to evaluate LLMs. Not just a little bit, but completely, utterly, and embarrassingly so. It's not even close, or fair. People are surprisingly alright at a narrow set of problems, particularly ones that don't involve knowledge. Most people also suck at reasoning (unless they've had years of training), they suck at factual knowledge, they aren't half bad at visual and spatial reasoning, and they're fairly gullible otherwise.
Anyway, this list looks more like a "hold my beer" moment for AI researchers than any fundamental reason for AI to stop evolving any further. Sure, there are weaknesses, and there are paths to address them. Anyone claiming that this is the end of the road in terms of progress is in for a disappointing reality check, probably a lot sooner than is comfortable.
And of course, by narrowing the scope to just LLMs, the authors give themselves a bit of an escape hatch: they conveniently exclude any further architectures, alternate strategies, and improvements that might otherwise overcome the identified weaknesses. But that's an artificial constraint with no real-world value, because of course AI researchers are already looking beyond the current state of the art. Why wouldn't they?
It's clear that what's missing is flexibility and agency. For anything that can be put into text or a short conversation, if I had to choose between access to ChatGPT or a random human, I know what I'd choose.
Agency is one of those things we probably want to think about quite a bit, especially given people's willingness to hook it up to things that interact with the real world.
Thank you for sharing this here. Rigorous work on the "expressibility" of current LLMs (i.e., which classes of problems can they tackle?) is surely more important, but I suspect it will go over the heads of most HN readers, many of whom have minimal to zero formal training in topics relating to computational complexity.
Sort of moot anyway. If-statements can approximate any function, and most programming languages are effectively Turing complete. What's important about specific architectures like transformers is that they allow comparatively efficient determination of a set of weights that approximates some narrower class of functions. It's finding the weights that matters, not the theoretical representational power.
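As a toy illustration of that expressivity-versus-learnability point, here's a minimal sketch (the function name and choice of target are mine, purely for illustration) of approximating a function with nothing but branching. It's maximally "expressive" in the trivial sense, but tells you nothing about how to learn the branches:

```python
import math

def approx_sin(x: float, steps: int = 1000) -> float:
    """Piecewise-constant approximation of sin on [0, 2*pi) built from plain if-statements."""
    width = 2 * math.pi / steps
    for i in range(steps):
        lo, hi = i * width, (i + 1) * width
        if lo <= x < hi:                     # one branch per bucket
            return math.sin(lo + width / 2)  # constant value on that bucket
    raise ValueError("x outside [0, 2*pi)")

print(approx_sin(1.0), math.sin(1.0))  # close, and closer as steps grows
```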
I think the person you're replying to may have been referring to the problem of an MLP approximating a sine wave for out-of-distribution samples, i.e. over the entire set of real numbers.
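For concreteness, here's a minimal sketch of that failure mode, assuming scikit-learn is available (the network size, training range, and evaluation range are arbitrary choices of mine): fit sin(x) on a bounded interval, then evaluate outside it.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-2 * np.pi, 2 * np.pi, size=(2000, 1))  # training interval
y_train = np.sin(x_train).ravel()

mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
mlp.fit(x_train, y_train)

x_in = np.linspace(-2 * np.pi, 2 * np.pi, 200).reshape(-1, 1)  # in distribution
x_out = np.linspace(4 * np.pi, 8 * np.pi, 200).reshape(-1, 1)  # out of distribution

print("in-distribution MSE:    ", np.mean((mlp.predict(x_in) - np.sin(x_in).ravel()) ** 2))
print("out-of-distribution MSE:", np.mean((mlp.predict(x_out) - np.sin(x_out).ravel()) ** 2))
# Inside the training range the fit is reasonable; outside it the prediction
# typically flattens or drifts instead of continuing to oscillate, which is
# the extrapolation problem being referred to.
```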
There are all sorts of things a neural net isn't doing without a body. Giving birth or free soloing El Capitan come to mind. It could approximate the functions for both in token-land, but who cares?
For a review of this topic, I'd suggest: https://nessie.ilab.sztaki.hu/~kornai/2023/Hopf/Resources/st...
The authors of this review have themselves written several articles on the topic, and there is also empirical evidence connected to these limitations.