I'm more curious how someone pulls down 56 terabytes in a very short period of time without sysadmins at Parler noticing. I'm surprised they didn't unintentionally DoS them.
Parler used AWS, and AWS is always happy to serve any request without complaint; it only notifies you hours later, and only if you decided to create usage alerts.
AWS can really sting you. I've completely stopped using it for private projects after receiving a $100 bill for something I accidentally provisioned through the command line.
Not only is it very easy to spend real money by accident, but the web interface also makes it incredibly hard to work out what you're spending money on (it took me a long time to even understand what I was paying for, and then longer still to turn it all off).
Even if you know what you're getting into, the pricing is very misleading. A few cents an hour adds up if it's running 24/7, and it's never obvious whether you're in the free tier or not.
If you don't have deep pockets, be very careful with AWS.
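One concrete mitigation if you do stay on AWS: create a budget with an email alert before you provision anything. A minimal sketch with boto3, where the account ID, budget amount and email address are placeholders you'd replace:

    import boto3

    budgets = boto3.client("budgets")

    budgets.create_budget(
        AccountId="123456789012",  # placeholder account ID
        Budget={
            "BudgetName": "monthly-cap",
            "BudgetLimit": {"Amount": "20", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 80.0,          # percent of the budget
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": "you@example.com"}
                ],
            }
        ],
    )

Note that this is still only an alert on billing data that lags by hours, not a hard spending cap, which is exactly the complaint above.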
The first thing I do at companies I join is dig through common services and tag all resources in a consistent way (e.g. app=web-frontend). Then you can create resource groups and break down billing at an 'application level' through Cost Explorer, instead of relying on their default, very general filters. Not perfect, but it gets you 90% of the way towards understanding where your costs are.
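For the record, it takes very little code. A sketch with boto3 (the instance ID and tag values are made up, and the tag key has to be activated as a cost allocation tag in the Billing console before Cost Explorer will group by it):

    import boto3

    # Tag a resource so spend can be attributed to an application
    ec2 = boto3.client("ec2")
    ec2.create_tags(
        Resources=["i-0123456789abcdef0"],  # hypothetical instance
        Tags=[{"Key": "app", "Value": "web-frontend"}],
    )

    # Break down last month's cost by that tag
    ce = boto3.client("ce")
    report = ce.get_cost_and_usage(
        TimePeriod={"Start": "2021-01-01", "End": "2021-02-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": "app"}],
    )
    for group in report["ResultsByTime"][0]["Groups"]:
        print(group["Keys"], group["Metrics"]["UnblendedCost"]["Amount"])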
It amazes me when I get pushback for that, but it's to be expected inside toxic-culture corporations. I'm kinda disgusted with AWS now, so I'm looking to re-tool. The dangerous defaults, and the concern about arbitrary or uncaring TOU enforcement, should be an impetus to diversify service providers.
AFAIK yes. This is why I never get people who use AWS. DigitalOcean, or any other VPS provider for that matter, gives you a flat monthly rate, so you know there will never be any surprises. Why take the risk?
There are genuine architectural and cost benefits for some types of configurations. But you really need to be an expert (or team of experts) to identify those situations, then architect and configure appropriately. Where it bites people is the "AWS by default" mentality many folks have (after a decade or more of lots of positive press) without understanding what they're using or why they're using it. Many people who make these decisions are shielded from the direct impact of any cost overruns too, so there is less reason to be sensitive to that. Almost any time I've worked with orgs using AWS, any reference to cost is met with "an engineer is more expensive!". Which is sort of true, but there's also typically no way a company could accidentally hire 27x more people than they budgeted for in a single day, or that a rival company could force-hire engineers into your company without you knowing about it, sticking you with even just a day's cost for 50 of them, for example.
This is always a possibility, so you take steps to protect yourself. If it's a few static assets, put a CDN in front of it. You won't be charged extra by S3 because the CDN caches it, so repeated requests never hit the bucket.
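A minimal sketch of that setup with a recent boto3, assuming a hypothetical bucket name; the cache policy ID below is meant to be CloudFront's managed "CachingOptimized" policy (verify the current ID in the console), and in practice you'd also lock the bucket down so only CloudFront can read from it:

    import time
    import boto3

    cloudfront = boto3.client("cloudfront")

    # Managed "CachingOptimized" cache policy ID (confirm in the CloudFront console)
    CACHING_OPTIMIZED = "658327ea-f89d-4fab-a63d-7e88639e58f6"

    cloudfront.create_distribution(
        DistributionConfig={
            "CallerReference": str(time.time()),  # any unique string
            "Comment": "CDN in front of static assets",
            "Enabled": True,
            "Origins": {
                "Quantity": 1,
                "Items": [
                    {
                        "Id": "s3-assets",
                        "DomainName": "my-assets-bucket.s3.amazonaws.com",  # hypothetical bucket
                        "S3OriginConfig": {"OriginAccessIdentity": ""},
                    }
                ],
            },
            "DefaultCacheBehavior": {
                "TargetOriginId": "s3-assets",
                "ViewerProtocolPolicy": "redirect-to-https",
                "CachePolicyId": CACHING_OPTIMIZED,  # serve repeats from the edge cache
            },
        }
    )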
If you’re hosting Mastodon, then I assume you’d take steps to ensure that only an authenticated user can access any data. And that user would also need to be authorised to access only specific data. And that authenticated and authorised user would be rate limited so they couldn’t scrape everything they have access to easily.
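The rate-limiting part doesn't have to be fancy; a per-user token bucket is enough to stop bulk scraping. A rough sketch (not Mastodon's actual implementation, just the general idea):

    import time
    from collections import defaultdict

    class TokenBucket:
        """Allow `rate` requests per second per user, with bursts up to `capacity`."""

        def __init__(self, rate: float, capacity: float):
            self.rate = rate
            self.capacity = capacity
            self.tokens = defaultdict(lambda: capacity)  # per-user token counts
            self.last = defaultdict(time.monotonic)      # per-user last refill time

        def allow(self, user_id: str) -> bool:
            now = time.monotonic()
            elapsed = now - self.last[user_id]
            self.last[user_id] = now
            # Refill tokens for the elapsed time, capped at the bucket capacity
            self.tokens[user_id] = min(self.capacity,
                                       self.tokens[user_id] + elapsed * self.rate)
            if self.tokens[user_id] >= 1:
                self.tokens[user_id] -= 1
                return True
            return False

    # e.g. 5 requests/second with bursts of 20; anything past that gets a 429
    limiter = TokenBucket(rate=5, capacity=20)
    if not limiter.allow("user-123"):
        print("429 Too Many Requests")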
Or local storage. The DataHoarder subreddit, where a lot of similar efforts are coordinated, has a lot of info about building dense home storage on the cheap.
I've got two 90 TB servers at Hetzner that I pay peanuts for per month to serve as backup servers. As long as you stay away from the cloud, storage is dirt cheap.
I saw 6 TB HDDs for €114 on my local site, and 16 TB HDDs for €370.
It's not exactly cheap, but if you're doing it for a serious project like archiving an entire politically relevant social media website, I'm sure you'll have one to two thousand euros lying around for a couple of hard disks.
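(At the €370 quoted above for a 16 TB drive, the roughly 56 TB archive fits on four drives, about €1,480 before any redundancy.)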
The storage isn't that big a deal, but I bet that wasn't done on some cheap consumer Internet subscription, as an ISP would have throttled her into oblivion after the first few TB.
You'd be surprised. Google has a lot of storage lying around, and it takes them a while to notice it being used up. I know of at least three people who have on the order of petabytes of data on Google Drive.
They took action against one organisation that had multiple 10 PB+ users, but Google really doesn't seem to care that much about the tertiary institutes giving unlimited GDrive accounts to every data hoarder who pretends to enrol.
You can. Create multiple service accounts, add them to the Shared Drives, and connect the service accounts using rclone, a CLI tool that lets you perform I/O operations against many cloud storage backends.
Once a service account reaches its limit, switch to another.
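A rough sketch of that rotation, assuming the rclone remote (gdrive:) and the service-account key files already exist; rclone's --drive-service-account-file flag lets you swap credentials per run, and re-running a copy skips files that already made it:

    import glob
    import subprocess

    # Hypothetical paths: one JSON key per Google Cloud service account,
    # all of them added as members of the target Shared Drive.
    service_accounts = sorted(glob.glob("keys/sa-*.json"))

    for key_file in service_accounts:
        result = subprocess.run([
            "rclone", "copy", "/data/archive", "gdrive:archive",
            "--drive-service-account-file", key_file,
            "--transfers", "8",
        ])
        if result.returncode == 0:
            break  # everything copied; stop rotating
        # Non-zero exit: assume this account hit its daily upload quota
        # (roughly 750 GB per account) and move on to the next one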