You'd get the answer to a riddle wrong or miss something and nobody would start assuming that means you lack a fundamental understanding of how the world works. There are entire fields that study how and why we make various mistakes, and there are riddles and puzzles designed specifically to trip people up.
If you want to test whether these models can solve riddles, or to find where they make mistakes, go right ahead; that's great. It's the assumption that it has some much deeper meaning that seems wrong to me.
>> You'd get the answer to a riddle wrong or miss something and nobody would start assuming that means you lack a fundamental understanding of how the world works. There are entire fields that study how and why we make various mistakes, and there are riddles and puzzles designed specifically to trip people up.
That's because with humans we assume a certain level of competency and intellectual ability. We cannot make the same assumption when testing AI systems like LLMs because their level of competency and intellectual ability is exactly the question we are trying to answer in the first place.
Note that getting an answer a little wrong, because the question looks like a question you already know the answer to, can be catastrophic in real world conditions. Tipping a frying pan over a plate on a table to serve an omelette when you've learned to do the same thing to serve a cooked shrimp works just fine and shows everyone how smart you are and how well you generalise to novel situations, right up to the point where the contents of the frying pan are on fire and you still tip them over a plate, on a table. Made of flammable wood. Oops.
Also note: a human may be confused by the Tsathoggua-Cthuga-Cxaxukluth river-crossing riddle but they'd never be confused about the danger of a frying pan on fire.
> Also note: a human may be confused by the Tsathoggua-Cthuga-Cxaxukluth river-crossing riddle but they'd never be confused about the danger of a frying pan on fire.
Which highlights the problem with using these riddles to assess other capabilities.
I wasn't talking about riddles; I was talking about the real world. Suddenly something is just a little bit different, and if you miss the change, you fail. There's plenty of that in the real world.
I am not sure I understand. It seems very easy: you cannot directly remove an element from an array, but you can create a new array that excludes that element. Arrays have a fixed size once declared, and I cannot imagine anyone who has written some code not knowing that. :/
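To make the point concrete, here is a minimal sketch in Java (since the subthread is about Java arrays). The helper name `removeAt` is my own; the point is just that a plain array is fixed-size, so "removal" means allocating a new array one element shorter and copying around the gap:

```java
import java.util.Arrays;

public class ArrayRemove {
    // Hypothetical helper: returns a NEW array with the element at
    // `index` removed. The source array itself is never resized,
    // because Java arrays have a fixed length once created.
    static int[] removeAt(int[] src, int index) {
        int[] dst = new int[src.length - 1];
        System.arraycopy(src, 0, dst, 0, index);                       // elements before the gap
        System.arraycopy(src, index + 1, dst, index,
                         src.length - index - 1);                      // elements after the gap
        return dst;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(Arrays.toString(removeAt(a, 2)));           // drop the element at index 2
    }
}
```

Running this prints `[1, 2, 4]`; the original array `a` is untouched.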
I "have written some code" but it's been decades since I've done anything significant in Java in particular, and every language handles arrays (and/or data structures that get called "arrays") differently.
The terminology may be confusing, yes, although you would then rather call them dynamic arrays or lists (as in Common Lisp). Plus, you did say "decades", and that is a long time. I was not referring to people who last wrote some code decades ago, of course.
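For contrast with the fixed-size case above, a minimal sketch of what a "dynamic array" looks like in Java: a `List` backed by `ArrayList` does support in-place removal, which is the behavior people coming from other languages often expect of "arrays":

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicRemove {
    public static void main(String[] args) {
        // An ArrayList is a resizable (dynamic) array, so removal
        // mutates the list in place instead of requiring a new array.
        List<Integer> xs = new ArrayList<>(List.of(1, 2, 3, 4));
        xs.remove(Integer.valueOf(3)); // remove the VALUE 3, not index 3
        System.out.println(xs);
    }
}
```

Running this prints `[1, 2, 4]`. Note the `Integer.valueOf(3)` wrapper: with an `int` argument, `List.remove` would instead remove by index, a classic Java gotcha.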
Most people with theory of mind can’t trivially solve this problem, though. So failing the test doesn’t disprove ToM in general; it just shows the model memorized some results.
Would you care to explain how that responds to my point? I didn't feel the need to spell out that "ToM" can be replaced not only with "reasoning" but also with "logic", and my point still stands.
But that's exactly how the real world works too.