
This vastly oversimplifies the situation, especially when the cloud is involved. Having a backup, much less a replica, of data at that scale carries an enormous infrastructure cost, whether it's your own infrastructure or someone else's. The time it takes to bring that data back to a live and stable state is also considerable (note the stable part).

It's a simple truth that even once you're spending millions of dollars, there is a data size at which you are basically all-in on whatever solution you've chosen. Even for a billion-dollar company, a secondary site can be exceptionally difficult and cost-prohibitive, because moving that sort of data around is hard, again especially when you're heavily dependent on a specific service provider.

Yes, the blame lies in part with the decision to rely on such a provider. At the same time, there are compelling arguments for using existing infrastructure instead of maintaining your own for data and compute at that scale. Redundancy is built into such infrastructures, and perhaps a provider should require hard, reviewed evidence before deciding to kill access to everything.



It might be too expensive for some people, but really there is no solution other than a full backup of everything. Relying on a single point of failure, even an infrastructure with a stellar record, is just a dead man walking.


And then of course there is the important bit that, from a regulatory perspective, 'just a backup' may be enough to make some statements about the past, but it won't get you out of the situation where, because your systems were down, you weren't ingesting real-time data during the gap. For many purposes that makes your carefully made backup not quite worthless, but close to it.

So then you're going to have to look into real-time replication to a completely different infrastructure, and if you ever lose either one you're immediately on very thin ice.

It's like dealing with RAID5 on arrays with lots of very large hard drives.
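To put a rough number on that analogy (a back-of-the-envelope sketch, not from the thread: assume a hypothetical 5-drive RAID5 array of 12 TB disks, the commonly quoted unrecoverable-read-error rate of 1 per 10^14 bits, and independent bit errors), a rebuild after one drive failure has to read all four surviving drives, roughly 3.8 x 10^14 bits, so:

    P(\text{URE during rebuild}) = 1 - (1 - p)^{N} \approx 1 - e^{-pN}
                                 = 1 - e^{-10^{-14} \cdot 3.8 \times 10^{14}} \approx 0.98

The model is crude and real drives often beat the spec sheet, but it's why a degraded array of very large disks, like a lost replica, leaves you one read error away from data loss.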


About six years ago, I was involved in a project where the data would grow by 100 GB per day and the database would also change significantly every day. I vaguely remember having some kind of cron bash script with mysqldump and rsync that maintained a near-identical offsite backup of the data (we also had daily and monthly snapshots). We also had a near-identical staging setup of our production application, which we would use to restore the application from the near-real-time backup. We had to test this setup every other month - an annoying thing to do at first, but we got exceedingly good at it over time. Thankfully we never had to use the backup, but we slept peacefully at night.
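For anyone curious what such a job looks like, here is a minimal sketch along those lines (not the poster's actual script; the paths, the offsite host and the retention period are made up, and it assumes MySQL credentials in ~/.my.cnf):

    #!/usr/bin/env bash
    # Nightly cron job: dump MySQL, compress, copy offsite, prune old dumps.
    # Illustrative only; paths and the offsite host are hypothetical.
    set -euo pipefail

    STAMP=$(date +%F)
    DUMP_DIR=/var/backups/mysql
    OFFSITE=backup@offsite.example.com:/srv/backups/mysql

    mkdir -p "$DUMP_DIR"

    # --single-transaction gives a consistent snapshot of InnoDB tables
    # without locking writers; credentials are read from ~/.my.cnf.
    mysqldump --all-databases --single-transaction --quick \
      | gzip > "$DUMP_DIR/all-databases-$STAMP.sql.gz"

    # Copy new dumps to the offsite host over SSH (no --delete, so the
    # offsite copy keeps older dumps even after local pruning).
    rsync -az "$DUMP_DIR/" "$OFFSITE/"

    # Keep roughly a month of daily dumps locally.
    find "$DUMP_DIR" -name '*.sql.gz' -mtime +31 -delete

The part the poster emphasises, regularly restoring onto a staging copy, is what actually proves the dumps are usable.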

Backup is a bit of an art in itself; everyone has a different backup requirement for their application, and some solutions might not even be financially feasible. You might never end up using your backup at all, but all it takes is one very bad day. And if your data is important enough, you need to do everything possible to be ready for that bad day.


That's a good scheme. Note how things like GCP make it harder rather than easier to set something like that up: you'd almost have to stream your data in real time to two locations, rather than bring it into GCP first and then stream it back out to your backup location.
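One way to fan the live feed out from the producing side, sketched here with made-up bucket names and a hypothetical export_stream.sh producer (both cloud CLIs accept "-" to upload from stdin):

    # bash: tee + process substitution, writing the same stream to two
    # independent providers instead of landing it in one cloud and
    # exporting it again later.
    ./export_stream.sh \
      | tee >(gsutil cp - gs://primary-ingest-bucket/feed-$(date +%F).ndjson) \
      | aws s3 cp - s3://offsite-backup-bucket/feed-$(date +%F).ndjson

A bare pipe like this has no buffering or retry, so in practice you'd put a queue in front of each sink, but it shows the shape of the idea: neither copy is derived from the other provider.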

> Backup is a bit of an art in itself

Fully agreed on that, and what is also an art is to spot those nasty little single points of failure that can kill an otherwise viable business. Just thinking about contingency planning makes you look at a business with different eyes.



