I dream of having a BTFS that would fix my "damaged" media files, e.g. ones I've media-shifted: if my disk was scratched and portions are missing, or if the codec options I picked suck, it could download the "damaged" portions of my media and fix them seamlessly.
Not the same as what you are talking about, but your comment reminded me of AccurateRip [1] which I used to make extensive use of back when I was ripping hundreds of CDs every year.
Pretty sure AccurateRip is only a collection of checksums to validate your rips. http://cue.tools/wiki/CUETools_Database actually improved on it to provide that healing feature (via some kind of parity, I guess?).
Do you have any tricks you can share on how to rip a large library of CDs? It would be nice to semi-automate the ripping process, but I haven't found any tools to help with that. Also, the MusicBrainz audio tagging library (the only open one I am aware of?) almost never has tags for my CDs that are good enough not to need editing afterwards.
I’ll be honest, this was around 2005-2008 — it was a long time ago and at the time I really enjoyed the ritual of it all.
The main advice I can give you is to use ripping software that integrates with AccurateRip (XLD, EAC, etc) and use a widely supported lossless format (like FLAC).
Also, I can’t remember all the details, but there’s a way to store a CUE file and some metadata alongside your rip such that you can recreate an exact copy of the original physical media.
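IIRC the CUE sheet itself is just a small text file describing the track layout, something like this (values made up, obviously):

    PERFORMER "Example Artist"
    TITLE "Example Album"
    FILE "Example Album.flac" WAVE
      TRACK 01 AUDIO
        TITLE "First Track"
        INDEX 01 00:00:00
      TRACK 02 AUDIO
        TITLE "Second Track"
        INDEX 00 04:31:70
        INDEX 01 04:33:20

I believe both EAC and XLD can write one of these out next to the rip.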
At least for now, I’ve moved on to streaming services, but I’m happy to know that I have a large library of music that I ripped myself to fall back to using instead, should I ever choose to.
I never had it fully working because the last time I tried, I was too focused on using VMs or Docker instead of just dedicating a small, older computer to it. But I think about it often, and I may finally take the time to set up a station to properly rip all the Columbia House CDs I bought when I was a teen and held on to.
In the distant past iTunes was great at this (really). Insert a disc, its metadata is pulled in automatically, it’s ripped and tagged using whatever codec settings you want, and when it’s done the disc is ejected.
Watch a show or do some other work, and when the disc pops out like toast, put a new one in.
Ripping DVDs with HandBrake was almost as easy, but it wouldn’t eject the disc afterwards (though it could have supported running a script at the end, I don’t recall).
It really was. In the early 2000s I had a stack of Mac laptops doing exactly this. Made some decent cash advertising locally to rip people's CD collections!
I was ripping my CDs with KDE's own KIO interface, which also does CDDB checks and embeds the original information in ID3 tags. Passing them through MusicBrainz Picard always gave me good tags, but I remember fine-tuning it a bit.
Now I'll start another round with DBPowerAmp's ripper on macOS, and then I'll see which tool gives me better metadata.
another use of this is to share media after I've imported it into my library. if I voluntarily scan hashes of all my media, a smart torrent client could offer just those files (a partial torrent, since I always remove the superfluous files), and that would help seed a lot of rare media files.
I used to use magnetico and wanted to make something that would use crawled info hashes to fetch the metadata and retrieve the file listing, then search a folder for any matching files. You'd probably want to pre-hash everything in the folder and cache the hashes.
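Something like this is what I had in mind for the pre-hash-and-cache part (just a sketch; the cache filename is a placeholder):

    import hashlib, json, os

    CACHE = "hashes.json"  # hypothetical cache location

    def file_sha1(path, bufsize=1 << 20):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def build_index(root):
        # path -> sha1 for every file under root, reusing cached values
        cache = {}
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                cache = json.load(f)
        for dirpath, _, names in os.walk(root):
            for name in names:
                p = os.path.join(dirpath, name)
                if p not in cache:
                    cache[p] = file_sha1(p)
        with open(CACHE, "w") as f:
            json.dump(cache, f, indent=2)
        # invert so you can ask "do I have a file with this hash?"
        return {v: k for k, v in cache.items()}

One wrinkle: v1 torrents only hash fixed-size pieces, not whole files, so unless a file happens to align with piece boundaries you'd still have to check it piece-by-piece against the torrent's piece list (BitTorrent v2 does have per-file hashes).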
I hope bitmagnet gets that ability, it would be super cool
I’ve done a lot of archival of CD-ROM based games, and it’s not clear to me this is possible without a lot of coordination and consistency (there are like 7 programs that use AccurateRip, and those only deal with audio). I have found zero instances where a bin/cue I’ve downloaded online perfectly matches (hashes) to the same disc I’ve ripped locally. I’ve had some instances where different pressings of the same content hash differently.
I’ve written tools to inspect content (say in an ISO file system), and those will hash to the same value (so different sector data but the same resulting file system). Audio converted to CDDA (16-bit PCM) will hash the same as well.
If audio is transcoded into anything else, there’s no way it would hash the same.
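Not exactly what my tools do, but as a sketch of the idea (assuming ffmpeg is on your PATH): decode whatever lossless container you have back to raw CDDA and hash that, and identical audio will compare equal regardless of how it’s packaged.

    import hashlib, subprocess

    def cdda_hash(path):
        # decode to raw 16-bit / 44.1 kHz stereo PCM (i.e. CDDA) and hash that
        out = subprocess.run(
            ["ffmpeg", "-v", "error", "-i", path, "-f", "s16le",
             "-acodec", "pcm_s16le", "-ar", "44100", "-ac", "2", "-"],
            capture_output=True, check=True).stdout
        return hashlib.sha1(out).hexdigest()

As above, this is only meaningful when both sources are lossless; a lossy transcode will never hash the same.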
At my last job I did something similar for build artifacts. You need the same compiler, same version, same settings, and the ability to look inside the final artifact and ignore all the variable information (e.g. timestamps). That requires a bit of domain-specific knowledge to get right.
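Not what we actually used, but the general shape of it, for something like a zip/jar artifact (a sketch; real builds have more sources of variation than just timestamps):

    import hashlib, zipfile

    def zip_content_hash(path):
        # hash member names and contents in a stable order, skipping the
        # per-entry timestamps and other metadata that vary between builds
        h = hashlib.sha256()
        with zipfile.ZipFile(path) as z:
            for name in sorted(z.namelist()):
                h.update(name.encode("utf-8"))
                h.update(z.read(name))
        return h.hexdigest()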
Sorry I think you misunderstood me. I mean when I download a torrent called "Rachmaninov Complete Discography" I copy the files to the Music/Classical folder on my NAS. I can no longer seed the torrent unless I leave the original in the shared folder. But if I voluntarily let a crawler index and share my Music folder, it could see the hash of track1.flac and know that it associates with a particular file in the original torrent, thus allowing others to download it.
If you’re worried about bit-flipping, you could just store multiple copies of the hash and then do voting, since it’s small. If you’re worried about correlated sources of error that helps less, though.
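Something as simple as this would do for the voting (sketch):

    from collections import Counter

    def voted_hash(copies):
        # return the value a strict majority of the stored copies agree on,
        # or None if the copies disagree too much to trust
        value, count = Counter(copies).most_common(1)[0]
        return value if count > len(copies) // 2 else None

    voted_hash(["abc123", "abc123", "abc124"])  # -> "abc123"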
As someone with no storage expertise I'm curious: does anyone know the likelihood of an error resulting in a bit flip rather than an unreadable sector? Memory bit flips during I/O are another thing, but I'd expect a modern HDD/SSD to return an error if it isn't sure about what it's reading.
Thanks for the link. I think that 10^14 figure is the likelihood of the disk's error correction failing to produce a valid result from the underlying media, returning a read error and adding the block to the pending sector list: a typical read error that is caught by the OS and prompts the user to replace the drive.
What I understand by bit flip is a corruption that gets past that check (i.e. the flips "balance themselves out" and produce a valid ECC) and returns bad data to the OS without producing any errors. Only the few filesystems that maintain their own checksums (like ZFS) would catch this failure mode.
It's one reason I still use ZFS despite the downsides, so I wonder if I'm being too cautious about something that essentially can't happen.
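You can approximate what ZFS does one layer up, e.g. keep a sidecar manifest of checksums and "scrub" it periodically (a sketch; the manifest name is made up, and hashlib.file_digest needs Python 3.11+):

    import hashlib, json, os

    MANIFEST = "checksums.json"  # hypothetical sidecar file

    def digest(path):
        with open(path, "rb") as f:
            return hashlib.file_digest(f, "sha256").hexdigest()

    def record(paths):
        with open(MANIFEST, "w") as f:
            json.dump({p: digest(p) for p in paths}, f, indent=2)

    def scrub():
        # a mismatch with no I/O error is exactly the silent-corruption case
        with open(MANIFEST) as f:
            manifest = json.load(f)
        return [p for p, d in manifest.items()
                if os.path.exists(p) and digest(p) != d]

It only tells you something went bad, though; without parity or a second copy it can't repair anything.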
Maybe this is a joke that’s over my head, but the OP wants a system where damaged media can be repaired. They have the damaged media so there’s no way to make a hash of the content they want.
Distribute parity files together with the real deal, like they do on Usenet? Usenet itself is pretty much this anyway. Not sure if the NNTP filesystem implementations work. Also, there's nzbfs [1]
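If you want that locally, par2cmdline is the usual tool (roughly; tune the redundancy percentage to taste):

    # create ~10% recovery data alongside the files
    par2 create -r10 album.par2 *.flac

    # later: check the files and rebuild damaged ones from the recovery volumes
    par2 verify album.par2
    par2 repair album.par2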