My experience changes throughout the day on the same model; it seems pretty clear that during peak hours (lately most of the daytime) Anthropic is degrading their models to meet demand. Claude becomes a confident idiot and the difference is quite noticeable.
I too have noticed variability, and it's impossible to know for sure, but late one Friday or Saturday night (PST) it seemed brilliant, several iterations in a row. Some of my best output has come in very short windows.
This is through providers such as Cursor, but the consistency of this experience has put me off from directly subscribing to Anthropic since I'm already subscribed up to my eyeballs in various AI services.
Last I checked, Anthropic would not admit to degrading models for obvious (scummy) business reasons, but they are probably quantizing them, reducing beam search, lowering precision, changing sampling, etc., because the model goes from superpowered to completely unusable: constantly dropping code and mangling files, getting caught in loops, taking the weirdest detours, and sometimes completely ignoring my instructions from just one message prior.
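To be clear, that's speculation on my part; none of us can see their serving stack. But quantization in particular is easy to illustrate. Here's a toy sketch (my own variable names, nothing to do with Anthropic's infrastructure) showing how rounding fp32 weights to int8 introduces exactly the kind of small, silent precision loss people suspect:

```python
# Toy illustration of post-training int8 quantization and the rounding
# error it introduces. Purely hypothetical; not anyone's real serving code.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a layer's fp32 weights
weights = rng.normal(0.0, 0.02, size=4096).astype(np.float32)

# Symmetric quantization: map [-max_abs, max_abs] onto integer range [-127, 127]
scale = np.max(np.abs(weights)) / 127.0
quantized = np.round(weights / scale).astype(np.int8)

# Dequantize back to float to see what the model would actually compute with
dequantized = quantized.astype(np.float32) * scale

error = np.abs(weights - dequantized)
print(f"mean abs rounding error: {error.mean():.2e}")
print(f"max  abs rounding error: {error.max():.2e}")
```

Each individual error is tiny, but stacked across billions of weights and dozens of layers it can plausibly show up as the "confident idiot" behavior described above.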
At first I wondered if Cursor was mishandling the context, and while they indeed aren't doing the best with context stuffing, the rest of the issues are not context-related.