
I'm curious where someone just gets 56.7 TB of storage that quickly.


I'm more curious how someone pulls down 56 terabytes in a very short period of time without sysadmins at Parler noticing. I'm surprised they didn't unintentionally DoS them.


Parler used AWS and AWS is always happy to serve any request without problems and notify you about it hours later if you decided to create usage alerts.


> and notify you about it hours later if you decided to create usage alerts

and bill you at the end of the month.

This is the crucial part: AWS will serve whatever is requested. It brings them money.


Sorry for the OT, but what about a simple, small-time private AWS user like me who uses it for static hosting of my blog and stuff?

Does that mean someone could request a lot of files, over and over, to increase my bill? Or is that served by some cache?

Let's say, for example, I host Mastodon media on S3, about 30 GB of unique data. Could someone use that to drive up my bill?

I do of course have a budget alert set but I have to react to that alert too.
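
For anyone wanting the same safety net, a budget alert is only a few lines with boto3. This is just a sketch, not my actual setup: the account ID, budget name, limit and email address are all placeholders, and as noted the alert is reactive, you still have to act on it.

    # Minimal sketch: a monthly cost budget with an email notification at 80%
    # of the limit. Account ID, budget name, amount and address are placeholders.
    import boto3

    budgets = boto3.client("budgets", region_name="us-east-1")

    budgets.create_budget(
        AccountId="123456789012",  # placeholder account ID
        Budget={
            "BudgetName": "blog-hosting-monthly",  # hypothetical name
            "BudgetLimit": {"Amount": "10", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 80.0,  # percent of the budget limit
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": "me@example.com"}
                ],
            }
        ],
    )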


AWS can really sting you. I've completely stopped using it for private projects after receiving a $100 bill for something I accidentally provisioned through the command line.

Not only is it very easy to spend real money by accident, the web interface makes it incredibly hard to work out what you're spending money on (it took me a long time to even understand what I was paying for, and then longer still to turn it all off).

Even if you know what you're getting into, the pricing is very misleading. A few cents an hour adds up if it's running 24/7, and it's never obvious whether you're in the free tier or not.

If you don't have deep pockets, be very careful with AWS.
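
If you're stuck untangling an existing account, the Cost Explorer API is a bit more direct than the console for working out what you're actually paying for. Rough sketch only: the dates are placeholders, and I believe the API itself bills a small amount per request.

    # Rough sketch: ask Cost Explorer for one month's spend broken down by
    # service. Dates are placeholders; the API is billed per request (roughly
    # a cent each, last I checked).
    import boto3

    ce = boto3.client("ce", region_name="us-east-1")

    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": "2021-01-01", "End": "2021-02-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    for group in resp["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{service}: ${amount:.2f}")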


The first thing I do at companies I join is dig through common services and tag all resources in a consistent way (e.g. app=web-frontend). Then you can create resource groups and break down billing at an ‘application level’ through Cost Explorer, instead of relying on their default, very general filters. Not perfect, but it gets you 90% of the way towards understanding where your costs are.
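
The tagging side is just a couple of API calls. Sketch only, with made-up instance IDs, bucket name and tag value; once applied, the same app tag can be activated as a cost allocation tag and used to group spend in Cost Explorer.

    # Sketch: tag a couple of EC2 instances and an S3 bucket with a consistent
    # "app" tag. IDs, bucket name and tag value are hypothetical.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.create_tags(
        Resources=["i-0123456789abcdef0", "i-0fedcba9876543210"],  # placeholders
        Tags=[{"Key": "app", "Value": "web-frontend"}],
    )

    s3 = boto3.client("s3")
    # Note: put_bucket_tagging replaces the bucket's whole tag set.
    s3.put_bucket_tagging(
        Bucket="example-web-frontend-assets",  # placeholder bucket
        Tagging={"TagSet": [{"Key": "app", "Value": "web-frontend"}]},
    )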


>>tag all resources in a consistent way

It amazes me when I get pushback on that, but it's to be expected inside toxic-culture corporations. I'm kinda disgusted with AWS now, so I'm looking to re-tool. The dangerous defaults, plus the concern about arbitrary or uncaring TOU enforcement, should be an impetus to diversify service providers.


AFAIK yes. This is why I never get people who use AWS. DigitalOcean, or any other VPS provider for that matter, gives you a flat monthly rate, so you know there will never be any surprises. Why take the risk?


There are genuine architectural and cost benefits for some types of configurations. But you really need to be an expert (or a team of experts) to identify those situations, then architect and configure appropriately. Where it bites people is the "AWS by default" mentality many folks have (after a decade or more of lots of positive press) without understanding what they're using or why they're using it.

Many people who make these decisions are also shielded from the direct impact of any cost overruns, so there is less reason to be sensitive to that. Almost any time I've worked with orgs using AWS, any reference to cost gets "an engineer is more expensive!". Which is sort of true, but there's also typically no way a company could just accidentally hire 27x more people than it budgeted for in a single day, or that a rival company could force-hire engineers into your company without you knowing about it, sticking you with even just a day's cost for 50 engineers, for example.


This is always a possibility so you take steps to protect yourself. If it’s a few static assets, put a CDN in front of it. You won’t be charged extra by S3 because the CDN would cache it.

If you’re hosting Mastodon, then I assume you’d take steps to ensure that only an authenticated user can access any data. And that user would also need to be authorised to access only specific data. And that authenticated and authorised user would be rate limited so they couldn’t scrape everything they have access to easily.

If you do all these things, you’ll be fine.
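
Concretely, the "authenticated and authorised" part usually means keeping the bucket private and having your app hand out short-lived presigned URLs instead of raw S3 links. Sketch only; the bucket name and the authorisation check are made up.

    # Sketch: keep the bucket private and serve media through short-lived
    # presigned URLs generated after your own auth check. Names are placeholders.
    import boto3

    s3 = boto3.client("s3")

    def user_may_access(user: str, key: str) -> bool:
        # Hypothetical authorisation hook; replace with your app's own check.
        return key.startswith(f"media/{user}/")

    def media_url_for(user: str, key: str) -> str:
        if not user_may_access(user, key):
            raise PermissionError(key)
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "example-mastodon-media", "Key": key},
            ExpiresIn=300,  # link is only valid for 5 minutes
        )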


Most CDNs also charge for transfer.


If you search HN for "aws bill" (https://hn.algolia.com/?query=aws+bill), there are a lot of interesting things that can go wrong.
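
If you'd rather script it, the same search is exposed through the public Algolia HN API. Quick sketch:

    # Quick sketch: pull HN stories matching "aws bill" from the public
    # Algolia HN Search API and print their scores and titles.
    import requests

    resp = requests.get(
        "https://hn.algolia.com/api/v1/search",
        params={"query": "aws bill", "tags": "story"},
        timeout=30,
    )
    for hit in resp.json()["hits"]:
        print(hit["points"], hit["title"])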


I believe the person that pulled it down is a digital archivist. I’m sure she has plenty of storage laying around for such occasions.


> laying around

ITYM lying around, unless this is a quirk of US English. Sorry to be pedantic!


> unless this is a quirk of US English

It is indeed. Very common in colloquial speech around here.


Presumably just another S3 bucket?

Do all your transferring from an EC2 instance in the same region and it never needs to waste bandwidth going over the public internet anyway.
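
One way to do it (bucket names made up): with boto3's managed copy the bytes move inside S3 rather than through the instance at all, so the EC2 box is really just orchestrating.

    # Sketch: mirror objects from a source bucket into your own bucket in the
    # same region using server-side copies. Bucket names are placeholders, and
    # you need read access to the source.
    import boto3

    s3 = boto3.resource("s3")
    src = s3.Bucket("example-source-bucket")
    dst = s3.Bucket("example-archive-bucket")

    for obj in src.objects.all():
        # copy() performs the transfer inside S3; the data never leaves AWS.
        dst.Object(obj.key).copy({"Bucket": src.name, "Key": obj.key})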


Or local storage. The DataHoarder subreddit, where a lot of similar efforts are coordinated, has a lot of info about building dense home storage on the cheap.


You can get that in 5 drives from Best Buy these days. Not exactly a huge leap for cloud storage.


I've got two 90 TB servers at Hetzner that I pay peanuts for per month to serve as backup servers. As long as you stay away from the cloud, storage is dirt cheap.


AWS.

If not them, because you're worried they'd also shut you down, then probably Backblaze.

Could also just buy a bunch of fairly cheap 100 Mbit unmetered boxes off OVH/Kimsufi for a total cost of probably ~$300/mo.


Modern HDDs store up to 18 TB.

I saw 6 TB HDDs for €114 on my local site, 16 TB HDDs for €370.

It's not exactly cheap, but if you're doing it for a serious project like archiving an entire politically relevant social media website, I'm sure you'll have €1,000-2,000 lying around for a couple of hard disks.
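
Back-of-the-envelope with those prices (ignoring redundancy, enclosures and shipping):

    # Back-of-the-envelope: how many 16 TB drives cover the ~56.7 TB dump, and
    # what that costs at the quoted ~€370 per drive.
    import math

    dump_tb = 56.7
    drive_tb = 16
    drive_eur = 370

    drives = math.ceil(dump_tb / drive_tb)               # 4 drives
    print(drives, "drives,", drives * drive_eur, "EUR")  # 4 drives, 1480 EUR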


Crowdsourced. The crawling and downloading were coordinated and performed by a bunch of people at the same time.


All the content was hosted on S3 and you just needed the URLs: security by obscurity.
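
i.e. once you have (or can guess) the object URLs, plain unauthenticated GETs are all it takes. Sketch with made-up bucket and key names:

    # Sketch: if the objects are public and you have the URLs, downloading is
    # just unauthenticated HTTP GETs. Bucket/key names here are made up.
    import requests

    urls = [
        "https://example-media-bucket.s3.amazonaws.com/video/000001.mp4",
        "https://example-media-bucket.s3.amazonaws.com/video/000002.mp4",
    ]

    for url in urls:
        resp = requests.get(url, timeout=60)
        if resp.ok:
            with open(url.rsplit("/", 1)[-1], "wb") as f:
                f.write(resp.content)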


The storage isn't that much of a deal, but I bet that wasn't done on some cheap consumer internet subscription, as an ISP would have throttled her into oblivion after the first few TB.


Work gave us Google Drive accounts with "unlimited" capacity.


You'll find out what 'unlimited' means in SaaS speak very, very much sooner than you expect if you really try to use it.


You'd be surprised. Google has a lot of storage lying around, and it takes them a while to notice it being used up. I know of at least three people who have on the order of petabytes of data on Google Drive.

They took action against one organisation that had multiple 10 PB+ users, but Google really doesn't seem to care that much about the tertiary institutes giving unlimited GDrive accounts to every data hoarder who pretends to enrol.


Not at 50 TB, though.


Except you can only upload 750 GB a day without somewhat workarounds.


You can. Create multiple service accounts, add them to the Shared Drives, and connect the service accounts using rclone, a CLI tool that lets you perform I/O operations against multiple cloud storage providers.

Once a service account reaches the limit, switch to another.
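
Roughly what that looks like driven from a script. Everything here is a placeholder (the "gdrive:" remote, the key files, the pre-split chunks), and it leans on rclone's --drive-service-account-file flag to swap identities per run; rotating one chunk per account is just a crude way to stay under the daily cap.

    # Rough sketch: rotate through several Google service accounts, each added
    # to the same Shared Drive, and let rclone do the copying. Paths, the
    # "gdrive:" remote and the key file names are all placeholders.
    import subprocess
    from pathlib import Path

    key_files = sorted(Path("service-accounts").glob("sa-*.json"))
    chunks = sorted(Path("upload-queue").iterdir())  # pre-split to <750 GB each

    for key, chunk in zip(key_files, chunks):
        subprocess.run(
            [
                "rclone", "copy", str(chunk), "gdrive:archive/",
                "--drive-service-account-file", str(key),
            ],
            check=True,
        )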


"without somewhat workarounds" sometimes I wonder about people.



