Wow. Sneaky. They do not even state the rate of impact for the XLA bug afaik, which affected everyone, not just claude code users, very vague. Interesting.
Claude code made almost half a billion so far[1] (>500m in ARR and its like 9 months old) , and 30% of all users have been impacted at least once, just from the first routing bug. Scary stuff.
Their post mortem is basically "evaluations are hard, we relied on vibe checking, now we are going to have even more frequent vibe checking". I believe it was indeed unintentional, but in the future where investor's money wont come down from the skies, serving distilled models will be very tempting. And you can not be liable to any SLA currently, it's just vibes. I wonder how enterprise vendors are going to deal with this going forward, you cannot just degrade quality without client or vendor even being able to really prove it.
Of course not! But usually, you can quantify metrics for quality, like uptime, lost transactions, response time, throughput etc. Then you can have accountability, and remediate. Even for other bugs, you can often reproduce and show clearly the impact. But in this case, other than internal benchmarks, you cannot really prove it. There is no accountability yet
We already kind of have a solution for this with SLAs. Humans, being (probably) non-deterministic, also fuck up. An expectation of a level of service is, I think, reasonable. It's not "zero mistakes ever", just as it can't be "zero bugs ever".
We're firmly in the realms of 'this thing is kind of smarter / faster at a task compared to me my employees, so I am contracting it to do that task'.
That doesn't mean 'if it fails, no payment'.
But I think it's too analogous to non-tech-products to hide behind a 'no refunds' policy. It's that good - there are consequences for it, I think.
Claude code made almost half a billion so far[1] (>500m in ARR and its like 9 months old) , and 30% of all users have been impacted at least once, just from the first routing bug. Scary stuff.
Their post mortem is basically "evaluations are hard, we relied on vibe checking, now we are going to have even more frequent vibe checking". I believe it was indeed unintentional, but in the future where investor's money wont come down from the skies, serving distilled models will be very tempting. And you can not be liable to any SLA currently, it's just vibes. I wonder how enterprise vendors are going to deal with this going forward, you cannot just degrade quality without client or vendor even being able to really prove it.
[1][https://www.anthropic.com/news/anthropic-raises-series-f-at-...]