These models are designed to produce a _plausible_ text output for a given prompt. Nothing more.
They are not designed to produce a _correct_ text output to a question or request, even if sometimes the output is correct. These proverbial stopped clocks might be correct more than twice a day, but that's just the huge training set speaking.
Well, I wasn't, but if you look at the topmost comment of this thread [0], you'll see that the level of human reinforcement being demonstrated there only reinforces my point.