Just today I was doing some vibe-coding-ish experiments where I had a todo list and was getting the AI tools to work through it. Claude decided to do an item that was already checked off, something like "write database queries for the app". It first deleted all of the files in the db source directory and wrote new stuff. I stopped it and asked why it was doing an already completed task, and it responded with something like "oh sorry, I thought I was supposed to do that task; I saw the directory already had files, so I deleted them".
Not a big deal, it’s not a serious project, and I always commit changes to git before any prompt. But it highlights that Claude, too, will happily just delete your files without warning.
Why would you ask one of these tools why they did something? There's no capacity for metacognition there. All they'll do is roleplay how a human might answer that question. They'll never give you any feedback with predictive power.
They have no metacognition abilities, but they do have the ability to read the context window. At least with how most of these tools work, where the follow-up request is fed the same context as the original.
There are two sub-reasons why that might make asking them valuable. One is that with some frontends you can't actually get at the raw context window, so the LLM is more capable of seeing what happened than you are. The other is that these context windows are often giant, and making the LLM read one for you and guess at what happened is a lot faster than reading it yourself to guess what happened.
Meanwhile, understanding what happened helps you understand how to make better use of these tools. For example: what patterns in the context window do you need to avoid, and what bugs are there in your tool where it's just outright feeding the model the wrong context? E.g. does it know whether or not a command failed (I've seen it not know this for terminal commands)? Does it have the full output from a command it ran (I've seen this be truncated to the point of making the output useless)? Did the editor just entirely omit the contents of a file you told it to send to the AI (a real bug I've hit...)?
> One is that with some frontends you can't actually get the raw context window so the LLM is actually more capable of seeing what happened than you are. The other is that these context windows are often giant and making the LLM read it for you and guess at what happened is a lot faster than reading it yourself to guess what happened.
I feel like this is some bizarro-world variant of the halting problem. Like... it seems bonkers to me that having the AI re-read the context window would produce a meaningful answer about what went wrong... because it itself is the thing that produced the bad result given all of that context.
It seems like a totally different task to me, one that should have totally different failure conditions. Not being able to work out the right thing to do doesn't mean it shouldn't be able to guess why it did what it did. It's also notable here that these are probabilistic approximators: just because it did the wrong thing (with some probability) doesn't mean it's not also capable of doing the right thing (with some probability)... but that's not even necessary here.
You also see behaviour when using them where they understand that previous "AI turns" weren't perfect, so they aren't entirely over-indexing on "I did the right thing for sure". Here's an actual snippet of a transcript where, without my intervention, Claude realized it did the wrong thing and attempted to undo it:
> Let me also remove the unused function to clean up the warning:
> * Search files for regex `run_query_with_visibility_and_fields`
> * Delete `<redacted>/src/main.rs`
> Oops! I made a mistake. Let me restore the file:
It more or less succeeded, too. `jj undo` is objectively the wrong command to run here, but it was running with a prompt asking it to commit after every terminal command, which meant it had committed just prior to this, which made the undo work basically as intended.
> They have no metacognition abilities, but they do have the ability to read the context window.
Sure, but so can you, and you're going to have more insight into why it did something than it does, because you've actually driven an LLM and have experience from doing so.
It's gonna look at the context window and make something up. The result will sound plausible but have no relation to what it actually did.
A fun example is to just make up the context window yourself, then ask the AI why it did the things in it, and watch it gaslight you. "I was testing to see if you were paying attention", "I forgot that a foobaz is not a bazfoo", etc.
I've found it to be almost universally the case that the LLM isn't better than me, just faster. That applies here: it does a worse job than I would if I did it myself, but it's a useful tool because it enables me to make queries that would otherwise cost too much of my time.
If the query returns something interesting, or just unexpected, that's at least a signal that I might want to invest my own time into it.
I ask it why when it acts stupid, and then ask it to summarize what just happened and how to avoid it into `claude.md`.
With varied success: sometimes it works, sometimes it doesn't. But the more of these `claude.md` patches I let it write, the more unpredictable it becomes after a while.
Sometimes we can clearly identify the misunderstanding. Usually it just mixes prior prompts into something different it can act on.
So after a while I ask it to summarize its changes in the file. And this is where it usually starts making the same mistakes again.