For some reason, the guarantee in the format of the response doesn't seem sufficient in preventing backwards incompatible changes that may happen to models.
Yes, the response might be in a standard format. But a well formed response can still be bad/broken.
Another way to think about it, is it can "pass QA" one day, and "fail QA" the next day even if the API response is identically formatted/structured.
i have a feeling those dates are an illusion of sorts.
I get the feeling they frequently deploy hot patches for edge cases. I hate to call them edge cases because they are actually “real cases” - things like adjusting system prompts so one day it might happy answer “Fill in the blank: F _ _ _ you”.
To truly freeze a model, you would need to freeze its weights, freeze its system prompts (no one sees those), and avoid any and all action that might impact its output. Perhaps would even need the default temperature to be 0 so it’s truly a deterministic API, with the option to add in some temperature to the responses.
Until then, I consider those “versions” but only reference the model weights and not the abstractions around the model
Here's the part you're looking for: https://github.com/anthropics/anthropic-quickstarts/blob/mai...