So they could have had 100% redundant systems at OVH and still be under half the cost of a traditional "cloud" provider?
I would look at architecture and operations first. Their "main" node went down and, assuming they had no hot standby, they had no way to quickly bring another instance of it online on a fresh OVH machine (typically provisioned in a few minutes). If the same had happened to their "main" VM at a "hyperscaler", I would guess they would have been up the same creek. It is not the difference between 120 and 600 seconds to provision a new machine that caused their 10 hours of downtime.
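To put rough numbers on that last point, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above (120 s, 600 s, 10 hours); the function name is made up for illustration and none of this is measured data.

```python
# Figures quoted in the comment above; illustrative only, not measurements.
FAST_PROVISION_S = 120          # quicker provisioning time, in seconds
SLOW_PROVISION_S = 600          # slower provisioning time, in seconds
OUTAGE_S = 10 * 60 * 60         # the 10-hour downtime, in seconds


def provisioning_share(fast_s: int, slow_s: int, outage_s: int) -> float:
    """Fraction of the outage that the provisioning-time gap could explain."""
    return (slow_s - fast_s) / outage_s


share = provisioning_share(FAST_PROVISION_S, SLOW_PROVISION_S, OUTAGE_S)
print(f"extra provisioning time: {SLOW_PROVISION_S - FAST_PROVISION_S} s")
print(f"share of the outage: {share:.1%}")
```

Even taking the slower figure, the gap is 480 seconds out of 36,000, so nearly all of the downtime has to come from somewhere else, i.e. architecture and operations.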
If you're doing VPSes, then maybe, as long as they're not under the same node. If it's dedicated servers, then probably.
But I think "redundancy" is more of a spectrum than a binary thing. You can be more or less redundant, even within the same VPS if you'd like, but that would of course be less redundant than hosting things across multiple data centers.
And it's cheap enough that you can have a replicated setup across two different providers and still be cheaper than one expensive cloud provider.
While AWS is probably towards the safer end if you want to put all your eggs in one basket, people are still putting all their eggs in one basket if they have everything at AWS as well...
But that question remains the same whether you are renting bare metal or VMs. You can rent OVH servers located at different datacentres all over the globe, and their Cloud SLA has higher uptime guarantees than AWS (what that is worth depends on the value you place on an SLA ofc.)