
Where are you going to store this data? It's dozens of petabytes.


It's only a few racks worth of disk servers.

If I were building it from my 5 minutes of googling: using 15 TB U.2 NVMe drives and an easily available server chassis, I can get 24 drives per 2U of a rack. That's 360 TB, plus a couple of server nodes, so ~6U per PB. A full-height rack is 42U, so 6-7 PB per rack once you give up some of the space to networking, etc. So dozens of PB is doable in a short datacenter row.
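The arithmetic above can be sketched out - a minimal back-of-the-envelope script using the comment's own assumptions (15 TB drives, 24 per 2U chassis, ~4U of a 42U rack reserved for networking; the reserved-U figure is my assumption):

```python
# Rack-density back-of-the-envelope, per the numbers in the comment above.
DRIVE_TB = 15        # 15 TB U.2 NVMe drive
DRIVES_PER_2U = 24   # drives per 2U chassis
RACK_U = 42          # full-height rack
RESERVED_U = 4       # assumed space for networking/servers (my guess)

tb_per_2u = DRIVE_TB * DRIVES_PER_2U          # 360 TB per 2U chassis
u_per_pb = 2 * 1000 / tb_per_2u               # ~5.6U of chassis per PB
chassis_per_rack = (RACK_U - RESERVED_U) // 2
pb_per_rack = chassis_per_rack * tb_per_2u / 1000

print(f"{tb_per_2u} TB per 2U, ~{u_per_pb:.1f}U per PB, ~{pb_per_rack:.1f} PB per rack")
```

which lands on the 6-7 PB per rack figure claimed above.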

Realistically you could fit a lot more storage per U, depending on how much compute you need per unit of data. The example above assumes the disks are only at the front of the server; if you also mount them internally, you can fit a lot more. (See Backblaze's Storage Pods for how they did it with spinning disks.)

Dozens of PB is not that much data in 2023.


This is still like tens of thousands of dollars of equipment to store information about a single person's biology.


Probably an order of magnitude or two more. Still something that is feasible in a research context - early MRI and genome sequencing had similar "too much data" problems, but the researchers still built it out to learn things. Tech marched forward, and these days no one really blinks at it. I presume that if such an "all the cells" scanner were invented today, it would only be used for research for a long time - and that by the time it became widespread, data storage would have caught up.


> Dozens of PB is not that much data in 2023.

Yes it is. Just transferring it at data center speeds will take days if not weeks.
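A rough sanity check on "days if not weeks" - the figures here (24 PB payload, a sustained 100 Gb/s link) are illustrative assumptions, not from the thread:

```python
# Transfer time for a dozens-of-PB dataset over a fast datacenter link.
petabytes = 24    # assumed dataset size
link_gbps = 100   # assumed sustained link speed

data_bits = petabytes * 1e15 * 8
seconds = data_bits / (link_gbps * 1e9)
days = seconds / 86400
print(f"{days:.1f} days at {link_gbps} Gb/s")  # ~22 days
```

Even a saturated 100 Gb/s pipe takes on the order of three weeks, so the "weeks" end of the estimate is easy to hit.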


Technically, a truck full of drives has more bandwidth than any network adapter.
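The sneakernet math, with made-up but plausible numbers (1,000 of the 15 TB drives from upthread, a one-day drive to the destination):

```python
# Effective bandwidth of a truck full of drives (illustrative assumptions).
drives = 1000            # assumed drive count in the truck
drive_tb = 15            # 15 TB per drive, as upthread
trip_seconds = 24 * 3600 # assumed one-day trip

payload_bits = drives * drive_tb * 1e12 * 8
gbps = payload_bits / trip_seconds / 1e9
print(f"~{gbps:.0f} Gb/s effective")  # ~1,389 Gb/s
```

That's an order of magnitude beyond any single NIC - latency is terrible, but throughput wins.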


Should theoretical research on data structures and algorithms have been capped at 1 GB in 1980, because that was the biggest single hard drive available back then and you couldn't store, for example, a 2 GB dataset on a disk?


Not at all, I'll still call out fantastic claims when I see them though.


Google has definitely indexed over a trillion pages.


Do you have any sources for this claim?

As far as I am aware Google doesn't publish any statistics about the size of its index, which no doubt varies.



Well what do you know, they contradict the claim made above.


Sorry - they've crawled trillions of pages and narrowed it down to an index of hundreds of billions. Conveniently, the link answers your question of "can you have PB-sized indices?", to which we can clearly say: yes.


Where do you think computers store data?



