Anthropic and OpenAI let you define a JSON schema to adhere to for tool calling....

cj · on Oct 19, 2024

For some reason, the guarantee in the format of the response doesn't seem sufficient in preventing backwards incompatible changes that may happen to models.

Yes, the response might be in a standard format. But a well formed response can still be bad/broken.

Another way to think about it, is it can "pass QA" one day, and "fail QA" the next day even if the API response is identically formatted/structured.

SparkyMcUnicorn · on Oct 20, 2024

This is why OpenAI and Anthropic provide date versioned models.

gpt-4o can change, but gpt-4o-2024-05-13 will always use the 2024-05-13 snapshot.

cj · on Oct 20, 2024

i have a feeling those dates are an illusion of sorts.

I get the feeling they frequently deploy hot patches for edge cases. I hate to call them edge cases because they are actually “real cases” - things like adjusting system prompts so one day it might happy answer “Fill in the blank: F _ _ _ you”.

To truly freeze a model, you would need to freeze its weights, freeze its system prompts (no one sees those), and avoid any and all action that might impact its output. Perhaps would even need the default temperature to be 0 so it’s truly a deterministic API, with the option to add in some temperature to the responses.

Until then, I consider those “versions” but only reference the model weights and not the abstractions around the model

felixvolny · on Oct 20, 2024

Tangent, but it seems like such a tough engineering challenge to keep all these models around and available at an instant