
I'm curious where someone just gets 56.7 TB of storage that quickly.


I'm more curious how someone pulls down 56 terabytes in a very short period of time without sysadmins at Parler noticing. I'm surprised they didn't unintentionally DoS them.


Parler used AWS and AWS is always happy to serve any request without problems and notify you about it hours later if you decided to create usage alerts.


> and notify you about it hours later if you decided to create usage alerts

and bill you at the end of the month.

This is the crucial part: AWS will serve whatever is requested. It brings them money.


Sorry for the OT, but what about a simple, small-time private AWS user like me who uses it for static hosting of my blog and stuff?

Does that mean someone could request a lot of files, over and over, to increase my bill? Or is that served by some cache?

Let's say, for example, I host Mastodon media on S3, about 30 GB of unique data. Could someone use that to drive up my bill?

I do of course have a budget alert set but I have to react to that alert too.
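
For anyone wanting the same safety net, a budget alert is only a few lines with boto3. This is just a sketch, not my actual setup: the account ID, budget name, limit and email address are all placeholders, and as noted the alert is reactive, you still have to act on it.

    # Minimal sketch: a monthly cost budget with an email notification at 80%
    # of the limit. Account ID, budget name, amount and address are placeholders.
    import boto3

    budgets = boto3.client("budgets", region_name="us-east-1")

    budgets.create_budget(
        AccountId="123456789012",  # placeholder account ID
        Budget={
            "BudgetName": "blog-hosting-monthly",  # hypothetical name
            "BudgetLimit": {"Amount": "10", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 80.0,  # percent of the budget limit
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": "me@example.com"}
                ],
            }
        ],
    )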


AWS can really sting you. I've completely stopped using it for private projects after receiving a $100 bill for something I accidentally provisioned through the command line.

Not only is it very easy to spend real money by accident, the web interface makes it incredibly hard to work out what you're spending money on (it took me a long time to even understand what I was paying for, and then longer still to turn it all off).

Even if you know what you're getting into, the pricing is very misleading. A few cents an hour adds up if it's running 24/7, and it's never obvious whether you're in the free tier or not.

If you don't have deep pockets, be very careful with AWS.
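
If you're stuck untangling an existing account, the Cost Explorer API is a bit more direct than the console for working out what you're actually paying for. Rough sketch only: the dates are placeholders, and I believe the API itself bills a small amount per request.

    # Rough sketch: ask Cost Explorer for one month's spend broken down by
    # service. Dates are placeholders; the API is billed per request (roughly
    # a cent each, last I checked).
    import boto3

    ce = boto3.client("ce", region_name="us-east-1")

    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": "2021-01-01", "End": "2021-02-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    for group in resp["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{service}: ${amount:.2f}")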


The first thing I do at companies I join is dig through common services and tag all resources in a consistent way (e.g. app=web-frontend). Then you can create resource groups and break down billing at an ‘application level’ through Cost Explorer, instead of relying on their default, very general filters. Not perfect, but it gets you 90% of the way towards understanding where your costs are.
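
The tagging side is just a couple of API calls. Sketch only, with made-up instance IDs, bucket name and tag value; once applied, the same app tag can be activated as a cost allocation tag and used to group spend in Cost Explorer.

    # Sketch: tag a couple of EC2 instances and an S3 bucket with a consistent
    # "app" tag. IDs, bucket name and tag value are hypothetical.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.create_tags(
        Resources=["i-0123456789abcdef0", "i-0fedcba9876543210"],  # placeholders
        Tags=[{"Key": "app", "Value": "web-frontend"}],
    )

    s3 = boto3.client("s3")
    # Note: put_bucket_tagging replaces the bucket's whole tag set.
    s3.put_bucket_tagging(
        Bucket="example-web-frontend-assets",  # placeholder bucket
        Tagging={"TagSet": [{"Key": "app", "Value": "web-frontend"}]},
    )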


>>tag all resources in a consistent way

It amazes me when I get pushback on that, but it's to be expected inside toxic-culture corporations. I'm kinda disgusted with AWS now, so I'm looking to re-tool. The dangerous defaults, plus the concern about arbitrary or uncaring TOU enforcement, should be an impetus to diversify service providers.


AFAIK yes. This is why I never get people who use AWS. DigitalOcean, or any other VPS provider for that matter, gives you a flat monthly rate, so you know there will never be any surprises. Why take the risk?


There are genuine architectural and cost benefits for some types of configurations. But you really need to be an expert (or a team of experts) to identify those situations, then architect and configure appropriately. Where it bites people is the "AWS by default" mentality many folks have (after a decade or more of lots of positive press) without understanding what they're using or why they're using it.

Many people who make these decisions are also shielded from the direct impact of any cost overruns, so there is less reason to be sensitive to that. Almost any time I've worked with orgs using AWS, any reference to cost gets "an engineer is more expensive!". Which is sort of true, but there's also typically no way a company could just accidentally hire 27x more people than it budgeted for in a single day, or that a rival company could force-hire engineers into your company without you knowing about it, sticking you with even just a day's cost for 50 engineers, for example.


This is always a possibility so you take steps to protect yourself. If it’s a few static assets, put a CDN in front of it. You won’t be charged extra by S3 because the CDN would cache it.

If you’re hosting Mastodon, then I assume you’d take steps to ensure that only an authenticated user can access any data. And that user would also need to be authorised to access only specific data. And that authenticated and authorised user would be rate limited so they couldn’t scrape everything they have access to easily.

If you do all these things, you’ll be fine.
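
Concretely, the "authenticated and authorised" part usually means keeping the bucket private and having your app hand out short-lived presigned URLs instead of raw S3 links. Sketch only; the bucket name and the authorisation check are made up.

    # Sketch: keep the bucket private and serve media through short-lived
    # presigned URLs generated after your own auth check. Names are placeholders.
    import boto3

    s3 = boto3.client("s3")

    def user_may_access(user: str, key: str) -> bool:
        # Hypothetical authorisation hook; replace with your app's own check.
        return key.startswith(f"media/{user}/")

    def media_url_for(user: str, key: str) -> str:
        if not user_may_access(user, key):
            raise PermissionError(key)
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "example-mastodon-media", "Key": key},
            ExpiresIn=300,  # link is only valid for 5 minutes
        )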


Most CDNs also charge for transfer.


If you search HN for "aws bill" (https://hn.algolia.com/?query=aws+bill), there are a lot of interesting things that can go wrong.
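
If you'd rather script it, the same search is exposed through the public Algolia HN API. Quick sketch:

    # Quick sketch: pull HN stories matching "aws bill" from the public
    # Algolia HN Search API and print their scores and titles.
    import requests

    resp = requests.get(
        "https://hn.algolia.com/api/v1/search",
        params={"query": "aws bill", "tags": "story"},
        timeout=30,
    )
    for hit in resp.json()["hits"]:
        print(hit["points"], hit["title"])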


I believe the person that pulled it down is a digital archivist. I’m sure she has plenty of storage laying around for such occasions.


> laying around

ITYM lying around, unless this is a quirk of US English. Sorry to be pedantic!


> unless this is a quirk of US English

It is indeed. Very common in colloquial speech around here.


Presumably just another S3 bucket?

Do all your transferring from an EC2 instance in the same region and it never needs to waste bandwidth going over the public internet anyway.
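
One way to do it (bucket names made up): with boto3's managed copy the bytes move inside S3 rather than through the instance at all, so the EC2 box is really just orchestrating.

    # Sketch: mirror objects from a source bucket into your own bucket in the
    # same region using server-side copies. Bucket names are placeholders, and
    # you need read access to the source.
    import boto3

    s3 = boto3.resource("s3")
    src = s3.Bucket("example-source-bucket")
    dst = s3.Bucket("example-archive-bucket")

    for obj in src.objects.all():
        # copy() performs the transfer inside S3; the data never leaves AWS.
        dst.Object(obj.key).copy({"Bucket": src.name, "Key": obj.key})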


Or local storage. The DataHoarder subreddit, where a lot of similar efforts are coordinated, has a lot of info about building dense home storage on the cheap.


You can get that in 5 drives from Best Buy these days. Not exactly a huge leap for cloud storage.


I've got two 90 TB servers at Hetzner that I pay peanuts for per month to serve as backup servers. As long as you stay away from the cloud, storage is dirt cheap.


AWS.

If not them, because you're worried they'd also shut you down, then probably Backblaze.

Could also just buy a bunch of fairly cheap 100 Mbit unmetered boxes off OVH/Kimsufi for a total cost of probably ~$300/mo.


Modern HDDs store up to 18 TB.

I saw 6 TB HDDs for €114 on my local site, 16 TB HDDs for €370.

It's not exactly cheap, but if you're doing it for a serious project like archiving an entire politically relevant social media website, I'm sure you'll have €1,000-2,000 lying around for a couple of hard disks.
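
Back-of-the-envelope with those prices (ignoring redundancy, enclosures and shipping):

    # Back-of-the-envelope: how many 16 TB drives cover the ~56.7 TB dump, and
    # what that costs at the quoted ~€370 per drive.
    import math

    dump_tb = 56.7
    drive_tb = 16
    drive_eur = 370

    drives = math.ceil(dump_tb / drive_tb)               # 4 drives
    print(drives, "drives,", drives * drive_eur, "EUR")  # 4 drives, 1480 EUR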


Crowdsourced. The crawling and downloading were coordinated and performed by a bunch of people at the same time.


All the content was hosted on S3 and you just needed the URLs: security by obscurity.
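
i.e. once you have (or can guess) the object URLs, plain unauthenticated GETs are all it takes. Sketch with made-up bucket and key names:

    # Sketch: if the objects are public and you have the URLs, downloading is
    # just unauthenticated HTTP GETs. Bucket/key names here are made up.
    import requests

    urls = [
        "https://example-media-bucket.s3.amazonaws.com/video/000001.mp4",
        "https://example-media-bucket.s3.amazonaws.com/video/000002.mp4",
    ]

    for url in urls:
        resp = requests.get(url, timeout=60)
        if resp.ok:
            with open(url.rsplit("/", 1)[-1], "wb") as f:
                f.write(resp.content)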


The storage isn't that much of a deal, but I bet that wasn't done on some cheap consumer internet subscription, as an ISP would have throttled her into oblivion after the first few TB.


Work gave us Google Drive accounts with "unlimited" capacity.


You'll find out what 'unlimited' means in SaaS speak very, very much sooner than you expect if you really try to use it.


You'd be surprised. Google has a lot of storage lying around, and it takes them a while to notice it being used up. I know of at least three people who have on the order of petabytes of data on Google Drive.

They took action against one organisation that had multiple 10 PB+ users, but Google really doesn't seem to care that much about the tertiary institutes giving unlimited GDrive accounts to every data hoarder who pretends to enrol.


Not at 50 TB, though.


Except you can only upload 750 GB a day without somewhat workarounds.


You can. Create multiple service accounts, add them to the Shared Drives, and connect the service accounts using rclone, a CLI tool that lets you perform I/O operations against multiple cloud storage providers.

Once a service account reaches the limit, switch to another.
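
Roughly what that looks like driven from a script. Everything here is a placeholder (the "gdrive:" remote, the key files, the pre-split chunks), and it leans on rclone's --drive-service-account-file flag to swap identities per run; rotating one chunk per account is just a crude way to stay under the daily cap.

    # Rough sketch: rotate through several Google service accounts, each added
    # to the same Shared Drive, and let rclone do the copying. Paths, the
    # "gdrive:" remote and the key file names are all placeholders.
    import subprocess
    from pathlib import Path

    key_files = sorted(Path("service-accounts").glob("sa-*.json"))
    chunks = sorted(Path("upload-queue").iterdir())  # pre-split to <750 GB each

    for key, chunk in zip(key_files, chunks):
        subprocess.run(
            [
                "rclone", "copy", str(chunk), "gdrive:archive/",
                "--drive-service-account-file", str(key),
            ],
            check=True,
        )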


"without somewhat workarounds" sometimes I wonder about people.



