You'd get the answer to a riddle wrong or miss something and nobody would start assuming that means you lack a fundamental understanding of how the world works. There are entire fields that study how and why we make various mistakes, and there are riddles and puzzles designed specifically to trip people up.
If you want to test whether these models can solve riddles, or to find where they make mistakes, go right ahead; that's great. It's the assumption that it has some much deeper meaning that seems wrong to me.
>> You'd get the answer to a riddle wrong or miss something and nobody would start assuming that means you lack a fundamental understanding of how the world works. There are entire fields that study how and why we make various mistakes, and there are riddles and puzzles designed specifically to trip people up.
That's because with humans we assume a certain level of competency and intellectual ability. We cannot make the same assumption when testing AI systems like LLMs because their level of competency and intellectual ability is exactly the question we are trying to answer in the first place.
Note that getting an answer a little wrong, because the question looks like a question you already know the answer to, can be catastrophic in real world conditions. Tipping a frying pan over a plate on a table to serve an omelette when you've learned to do the same thing to serve a cooked shrimp works just fine and shows everyone how smart you are and how well you generalise to novel situations, right up to the point where the contents of the frying pan are on fire and you still tip them over a plate, on a table. Made of flammable wood. Oops.
Also note: a human may be confused by the Tsathoggua-Cthuga-Cxaxukluth river-crossing riddle but they'd never be confused about the danger of a frying pan on fire.
> Also note: a human may be confused by the Tsathoggua-Cthuga-Cxaxukluth river-crossing riddle but they'd never be confused about the danger of a frying pan on fire.
Which highlights the problem with using these riddles to assess other capabilities.
I wasn't talking about riddles; I was talking about the real world. Suddenly something is just a little bit different, and if you miss the change, you fail. There's plenty of that in the real world.
I am not sure I understand. It seems very easy: you cannot directly remove an element from an array, but you can create a new array that excludes that element. Arrays have a fixed size once declared, and I cannot imagine anyone who has written some code not knowing that. :/
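To make the point concrete, here is a minimal sketch in Java (since the subthread is about Java arrays). The helper name `removeAt` is my own; the point is just that a plain array is fixed-size, so "removal" means allocating a new array one element shorter and copying around the gap:

```java
import java.util.Arrays;

public class ArrayRemove {
    // Hypothetical helper: returns a NEW array with the element at
    // `index` removed. The source array itself is never resized,
    // because Java arrays have a fixed length once created.
    static int[] removeAt(int[] src, int index) {
        int[] dst = new int[src.length - 1];
        System.arraycopy(src, 0, dst, 0, index);                       // elements before the gap
        System.arraycopy(src, index + 1, dst, index,
                         src.length - index - 1);                      // elements after the gap
        return dst;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(Arrays.toString(removeAt(a, 2)));           // drop the element at index 2
    }
}
```

Running this prints `[1, 2, 4]`; the original array `a` is untouched.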
I "have written some code" but it's been decades since I've done anything significant in Java in particular, and every language handles arrays (and/or data structures that get called "arrays") differently.
The terminology may be confusing, yes, although you would then rather call them dynamic arrays or lists (as in Common Lisp). Plus, you did say "decades", and that is a long time. I was not referring to people who last wrote some code decades ago, of course.
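For contrast with the fixed-size case above, a minimal sketch of what a "dynamic array" looks like in Java: a `List` backed by `ArrayList` does support in-place removal, which is the behavior people coming from other languages often expect of "arrays":

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicRemove {
    public static void main(String[] args) {
        // An ArrayList is a resizable (dynamic) array, so removal
        // mutates the list in place instead of requiring a new array.
        List<Integer> xs = new ArrayList<>(List.of(1, 2, 3, 4));
        xs.remove(Integer.valueOf(3)); // remove the VALUE 3, not index 3
        System.out.println(xs);
    }
}
```

Running this prints `[1, 2, 4]`. Note the `Integer.valueOf(3)` wrapper: with an `int` argument, `List.remove` would instead remove by index, a classic Java gotcha.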
Most people with theory of mind can’t trivially solve this problem, though. So failing the test doesn’t disprove ToM in general; it just shows the model memorized some results.
Would you care to explain how that responds to my point? I didn't feel the need to spell out that "ToM" can be replaced not only with "reasoning" but also with "logic", and my point still stands.
But that's exactly how the real world works too.