> Again, we have moved past hallucinations and errors to more subtle, and often human-like, concerns.
In my experience we just get both: the constant risk of some catastrophic hallucination buried in the output, on top of more subtle, pervasive concerns. I haven't tried with Gemini 3, but when I prompted Claude to write a 20-page short story it couldn't even keep basic chronology and characters straight. I wonder if the 14-page research paper would stand up to scrutiny.
I feel like hallucinations have changed over time: from factual errors randomly shoehorned into the middle of sentences to LLMs confidently telling you they are right, even providing their own reasoning to back up their claims -- reasoning that, most of the time, rests on references that don't exist.
I recently tasked Claude with reviewing a page of documentation for a framework and writing a fairly simple method using the framework. It spit out some great-looking code but sadly it completely made up an entire stack of functionality that the framework doesn't support.
The conventions even matched the rest of the framework, so it looked kosher, and I had to do some searching to see if Claude had referenced an outdated or beta version of the docs. It hadn't - it just hallucinated the functionality completely.
When I pointed that out, Claude quickly went down a rabbit-hole of writing some very bad code and trying to do some very unconventional things (modifying configuration code in a different part of the project that was not needed for the task at hand) to accomplish the goal. It was almost as if it were embarrassed and trying to rush toward an acceptable answer.
I've seen it do this too. I had it keeping a running tally over many turns, and occasionally it would say something like: "... bringing the total to 304... 306, no 303. Haha, just kidding, I know it's really 310." The last number was the right one. I'm curious whether it's an organic behavior or a taught one. It could be self-learned through reinforcement learning: a way to correct itself, since it doesn't have access to a backspace key.
Disappointingly, that is an exceedingly good story for a high school assignment. The use of an appositive phrase alone would set off alarm bells, though.
It's nitpicking for flaws, but why not -- what lens on an old DSLR, older than a car, will let you take a macro shot, a wide shot, and a zoom shot of a bird?
In any case I'm not surprised. It's a short story, and it is indeed _serviceable_, but literature is more than just service to an assignment.