I think it's a historically significant dataset. We've seen other datasets be preserved, such as the GitHub Arctic Code Vault.
I agree that it's tenuous. I would give it 20% odds of hitting the 500-year mark at best. And I don't think all of the data will survive.
But if archive.org ever becomes unsustainable to run, the existing data will likely be preserved. Lots of companies will be incentivized to continue hosting the data, as it's excellent PR if nothing else. They don't need to continue gathering the data, just host it.
Hosting is only going to become cheaper as t -> infinity, and given the massive amount of compute I've seen Google wield, it's hard to imagine that an operation like archive.org can't find some way to be preserved.
All that said, the biggest threat is sudden data loss. This only works as long as the data doesn't get lost. Has archive.org posted their operations policies anywhere? It would be interesting reading.
Imagine a future GDPR-like policy that gives people's descendants ownership and copyright over everything they've said. Suddenly every word written into archive.org has an owner, who might come and sue archive.org or its managers. Soon every person alive has some grandparent who wrote something in the archive, and some of them want compensation for all the decades archive.org has been distributing grandpa's words for free.
It's less about the "getting it done" aspect. It's more about whether they are going to be around in 50/100/500 years. Will the tech be around that long? Will they keep up with the conversion of old tech into new tech? In my opinion, any kind of digital archive is just not a sound way to go about it. Analog all the way for long-term archival.
Mm, you're right, but Geocities might be less interesting to historians than an archive of all internet history.
Also, as someone who has trained a few large GPT models, I think ML has a chance of preserving a lot of this data. Training datasets are only growing larger and larger, and although those aren't updated (yet), there's no reason to think they won't last for a long time.
I imagine that in 500 years, imagenet2012 might still be around as a historical curiosity, at least somewhere.
Well, everyone was hyped on Perl at the turn of the millennium. Yet not many people write it anymore. I keep waiting for the resurgence, but it just doesn't look like it is going to happen.
At nuclear waste sites, even the feds have come up with a few ways of saying "Don't enter. It is bad," using different languages, pictorial signs, and such.
It is really tough to figure out what the next few hundred years will look like. And to be a bit political, I don't think anyone saw the invasion of the Capitol building in January coming.
It isn't easy to predict the future. With the original poster in mind, I think the best bet would be archive.org.
Maybe archive.org should provide this service. It could be a way to generate revenue - say "here is a thousand bucks, keep it for eternity."
I'm not sure I would want my thoughts to last that long though.
(And I'm still not sure that it would survive for more than a few hundred years.) Maybe the right thing to do is to do something so great for society that people want to write books about you (e.g., George Washington).
I think the chance of future generations having the motivation to keep preserving OP's specific website would be quite low, but there would be a much greater motivation to maintain a large organised archive.