
It's obviously not there as a NAS filesystem, ZFS drop-in replacement, etc. But if what you take away from that is that BTRFS is no good as a filesystem on a single drive system, you're missing out. Just a few weeks ago I used a snapshot to get myself out of some horrible rebase issue that lost half my changes. Could I have gone to the reflog and done other magic? Probably. But browsing my .snapshots directory was infinitely easier!


Snapshots are the best thing in the world for me on Arch. I'm specifically using it because I like to tinker with exotic hardware and it has the most sane defaults for most of the things I care about. Pacman is great, but the AUR can be a bit sketchy sometimes because of the choices package authors make. Having snapshots every time package changes happen that I can roll back to from my boot loader is _awesome_. If you've ever used a distro that keeps kernel backups in your boot loader, it's like that, except it's whole packages at a time! And being able to use subvolumes to control which parts of a snapshot to restore is awesome. I can roll back my system without touching my home directory, even on a single drive setup!
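
To give a flavor of the manual version, here's a minimal sketch (assuming a /.snapshots directory on the same btrfs filesystem; snapper plus a pacman hook automates roughly this):

    # Take a read-only snapshot of the root subvolume before upgrading
    sudo btrfs subvolume snapshot -r / /.snapshots/pre-update-$(date +%F)
    sudo pacman -Syu

    # If the upgrade goes badly, list snapshots to find one to roll back to
    sudo btrfs subvolume list /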


> But browsing my .snapshots directory was infinitely easier!

I second this, though I don't use the filesystem to get this functionality. I most often use XFS and have a cron job that calls an old Perl script called "rsnapshot" [1], which uses hardlinks to deduplicate unchanged content and save space. It can create both local and remote snapshots. Similar to your situation, I have used this to fix corrupted git repos, which I could have done within git itself, but rsnapshot was many times easier and I am lazy.
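
For anyone curious, a minimal sketch of that setup (paths and retention counts are placeholders; rsnapshot wants tabs between fields):

    # /etc/rsnapshot.conf (fields are tab-separated)
    snapshot_root   /backup/snapshots/
    retain  hourly  6
    retain  daily   7
    backup  /home/  localhost/

    # crontab entry: take an hourly snapshot at 30 past the hour
    30 * * * *  /usr/bin/rsnapshot hourly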

[1] - https://wiki.archlinux.org/title/rsnapshot


Let me guess: you use XFS because of the large number of hardlinks it allows out of the box (as opposed to the small number allowed by ext4)?


For me personally it was a matter of benchmarks. In the past I have seen higher performance numbers than ext4, especially when creating and deleting files in directories that contain a very large number of entries. I've not tested since the 5.11 kernel time-frame, however. But to your point, it would be nice to see XFS get an option to grow inode limits dynamically like btrfs does, instead of having to adjust them manually. I've honestly never even tried out btrfs, but it looks like I should.


By the way, "git reflog" can usually get you out of horribly botched rebases without using special filesystem features:

    git reset --hard <sha1 of last good state from reflog>


The reflog can be a PITA to walk through. A less well known trick is that you can spelunk through it with `<ref>@{N}`, which means whatever commit `<ref>` was pointing at N changes to the ref ago. Super handy for double-checking that a rebase squashing commits didn't screw things up if there were merge conflicts in the fixups.
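
A quick sketch of both tricks (branch name and offsets are just examples):

    # See where the branch pointed before the rebase
    git reflog show mybranch

    # Inspect the state of mybranch two reflog entries ago
    git show mybranch@{2}

    # Compare the pre-rebase state against the current one
    git diff mybranch@{2} mybranch

    # And if things are truly botched, reset back to it
    git reset --hard mybranch@{2}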


Synology ships it for their NAS


Sort of. They use it on top of dm-raid and use dm-integrity for the checksum features. They claim BTRFS RAID is unstable.

https://kb.synology.com/en-us/DSM/tutorial/What_was_the_RAID...


Synology is running on top of mdraid, but does not use dm-integrity. Since data checksumming can be enabled per share, and each share is just a btrfs subvolume, doing that at the block layer would be kind of difficult.

For scrubbing, plain old btrfs scrub is being used.
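
For reference, on a plain btrfs volume that's just (assuming /volume1 is the mount point, as on Synology):

    # Kick off a scrub and check on its progress
    btrfs scrub start /volume1
    btrfs scrub status /volume1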


> They claim BTRFS RAID is unstable.

and they'd be correct


Does BTRFS actually need to be augmented like that? I've been afraid of using it for a NAS because it doesn't sound like it's as trustworthy as ZFS when it comes to handling bitrot. But I don't know if that's actually true. When I tried to find info on it a couple weeks ago, a lot of people were trying to claim that bitrot isn't a thing if you have ECC RAM.


BtrFS was previously restricted to crc32c checksums. This has since been extended to several more, including sha256. There is also xxhash, which is in the same fast, non-cryptographic class as crc32c but promises fewer collisions, for safer deduplication. When configured for sha256, BtrFS uses a checksum that is as strong as ZFS's.

However, the checksum must be chosen at the time of filesystem creation. I don't know of any way to upgrade an existing BtrFS filesystem.

Contrast this to ZFS, which allows the checksum to be changed per dataset at any time (though only newly written data uses the new algorithm).
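
Roughly, the two look like this (device and dataset names are placeholders; --csum needs reasonably recent btrfs-progs):

    # btrfs: the checksum algorithm is fixed when the filesystem is created
    mkfs.btrfs --csum sha256 /dev/sdX

    # ZFS: checksum is a per-dataset property and can be changed later,
    # but only blocks written afterwards use the new algorithm
    zfs set checksum=sha256 tank/data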


Hmm well that's good to know. I like the idea of being able to have mismatched drive capacities and easy expandability which is why I had been looking at Btrfs. I will have to look more into the checksum options.


send/receive should allow for checksum rotation; however this isn't in-place.
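
Something like this, as a rough sketch: create the destination with the desired checksum, then replicate into it (device names and paths are placeholders):

    # Destination filesystem with the new checksum algorithm
    mkfs.btrfs --csum sha256 /dev/sdY
    mount /dev/sdY /mnt/new

    # send requires a read-only snapshot of the source
    btrfs subvolume snapshot -r /mnt/old/data /mnt/old/data-ro
    btrfs send /mnt/old/data-ro | btrfs receive /mnt/new/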


I didn't think of that as a migration path.


Breadth/creativity in the search of technical solutions is my specialty. Hope it helped!


Its RAID 5/6 comes with a warning from the developers not to use it, and its RAID 1 is a weird arrangement that keeps exactly two copies spread across however many disks, and can lose data if a disk drops out and comes back, for example because of a bad cable.

Bitrot still happens with ECC.


Bitrot is a thing, but getting random bit flips past all the ECC and data-integrity checks in the storage path is much harder.

Having filesystem-level protection is good, but it's like going from 80% to 85% protected. The most critical part left unprotected in a traditional RAID setup is the application-to-filesystem interface. POSIX and Linux are largely to blame here: the default IO model should be an async interface where completion only fires once the data is persisted, and things like read()/write()/close() should be fully serialized with the persistence layer. Otherwise, even with btrfs, the easiest way to lose data is simply to write it to disk, close the file, and pull the power plug.
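
To make that concrete, a sketch from the shell (GNU coreutils flags; paths are placeholders):

    # A plain copy may still be sitting in the page cache when power is cut
    cp important.dat /mnt/backup/

    # Asking for durability explicitly: fsync the output file after writing it...
    dd if=important.dat of=/mnt/backup/important.dat conv=fsync

    # ...and flush the filesystem containing it so the directory entry is persistent too
    sync -f /mnt/backup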


This heavily depends on your usage model.

For example, if your use case is a file archive (think raw photo or video), then the filesystem interface does not matter much: if the computer crashes soon after a copy, you re-copy from the original media. But bit flips are very real and can ruin your day.


I'm really curious what your storage stack is that you're getting undetected bit flips. This stuff was my day job for ~10 years. I've seen every kind of error you can imagine, and I can't actually remember "bit flips" showing up in end-user data that weren't eventually attributable to something stupid like a lack of ECC RAM, or software bugs. Random bit flips tend to show up in two ways: the interface/storage mechanism gets "slow" (due to retries/etc.), or flat-out read errors. This isn't the 1980s, where you could actually read data back from your storage medium and get flipped bits; there are too many layers of ECC on the storage media for it to go undetected. Combined with media scrubbing, failures tend to be all or nothing: the drive goes from working to 100% dead, or the RAID kicks it when the relocated sector counts start trending up, or the device goes into a read-only mode. What most people don't understand is that IO interfaces these days aren't designed to be 100% electrically perfect. The interface performance/capacity is pushed until the bit error rate (BER) is not insignificant, and then error correction is applied to ensure that the end result is basically perfect.

But as I mentioned, these days I'm pretty sure nearly all storage loss that isn't physical damage is actually software bugs. Just a couple months ago I uploaded a multi-gigabyte file (from my ECC-protected workstation) to a major hyperscaler's cloud storage/sharing option. I sent the link to a colleague halfway around the globe and they reported an unusual crash. So I asked them to md5sum the file and they got a different result from what I got; I then downloaded the file myself and diffed it against the original, and right in the middle there was a ~50k block of garbage. Uploaded it again, and it was fine. Blame it on my browser, or whatever you will, but the end result was quite disturbing because I'm fairly certain my local storage stack was fine. These days I'm really quite skeptical of "advanced" filesystems. What I want is a dumb one where the number-one priority is data consistency. I'm not sure that is an accurate reflection of many of them, where winning the storage perf benchmark, or the feature war, seems to be a higher priority.


The last time I saw data damage was around 2005; it was multiple SATA drives connected to a regular consumer motherboard running Linux (sorry, I don't remember the brands). If I remember right, there was an 8-byte block damaged every few gigabytes transferred or so? So a very high number of damaged files, given I had a few terabytes of data.

I never found the cause, because I just switched to a completely different system to copy the data. I know it was not disk-specific, because this was happening with multiple hard drives; nor was it physical damage, as SMART/syslog were silent and reading the disks again gave correct data. Memory was fine -- not ECC, but I ran a lot of memtests on it.

Later on, I found some blog posts which mentioned a similar problem and claimed it was the result of a bad SATA card, or a bad cable, or even a bad power supply. I remember there was an original one by Jeff Bonwick on his ZFS blog, but I cannot find it anymore. Here is a more modern link instead: https://changelog.complete.org/archives/9769-silent-data-cor...

I now have a homegrown checksumming solution which I run after each major file transfer, and I have not seen any data corruption since (knock on wood).
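
For anyone who wants a minimal version of that kind of check (sha256sum from coreutils; paths are placeholders):

    # On the source, before the transfer
    find /data -type f -exec sha256sum {} + > manifest.sha256

    # On the destination, after the transfer (assuming the same paths)
    sha256sum --check manifest.sha256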


ECC can't protect the interface to your storage device. Bit flips can happen at any point in that chain.


Which part of the chain isn't protected by at least a packet CRC?

I think the answer is it's all covered.


Link-level CRCs don't protect against bit flips that happen during processing or while transitioning between links with different CRCs. For maximum end-to-end integrity you want to calculate the check value over the data in (ECC) RAM before writing it and the check value to storage, and verify the check value after reading all the data back in—this ensures that a bit-flip will be detected no matter where it occurs in the pipeline.


No it isn't. At some point data is in transit in an IC without any error handling. CRCs aren't 100% reliable. Bit flips are guaranteed to happen with non-zero probability in all digital electronics. Bad data eventually makes it to persistent storage.


I don't know what parts of the storage stack you guys are working on, but on the enterprise storage systems I worked on we had 100% coverage in one form or another from the moment data left RAM: the Intel interconnects, PCIe, and the FC adapters were either ECC-protected or used some form of data protection wrapping the entire transaction. So random bit flips in the serdes/etc. didn't cause problems, because the higher-level link protocols protected the data between on-board RAM and the endpoint, and the higher-level protocols also provided their own data integrity.

Looking at even SATA 1: if it is implemented _CORRECTLY_, the packet CRC protects the link data from the point it's formed to the point the endpoint verifies the result. So, like Ethernet, it sometimes doesn't matter if some random piece of junk in the path doesn't do its own ECC validation, because it's covered by a higher level of the stack.

If your adapters are "desktop" grade, then I might consider seeking another vendor if you care about data integrity. Some vendors are definitely shipping crap, but there are vendors whose gear, I can assure you, will detect link failures and the like.

And as a side note, I've seen a lot of bad data, and a huge percentage of it was kernel/filesystem errors. We added a bunch of out-of-band extra metrics to track when and where writes were going, plus our own metadata layers, and it uncovered a whole bunch of software errors.


ZFS can easily protect you from that:

>>it is designed with a focus on data integrity by protecting the user's data on disk against silent data corruption caused by data degradation, power surges (voltage spikes), bugs in disk firmware, phantom writes (the previous write did not make it to disk), misdirected reads/writes (the disk accesses the wrong block), DMA parity errors between the array and server memory or from the driver (since the checksum validates data inside the array), driver errors (data winds up in the wrong buffer inside the kernel), accidental overwrites (such as swapping to a live file system), etc.

https://en.wikipedia.org/wiki/ZFS#Data_integrity
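
In day-to-day use that shows up as scrubs and the per-device error counters (pool name is a placeholder):

    # Walk every block, verify checksums, and repair from redundancy where possible
    zpool scrub tank

    # Per-device read/write/checksum error counters and scrub results
    zpool status -v tank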


You can typically make a branch and push it up to the server, and I usually do this before any rebase. In future development scenarios (Mac, Windows), will you be able to do this reliably?


Or just write down the commit hash. Or just learn your tools and stop worrying, because git reflog has you covered.


I've had issues with reflog in the past that were not easily solved, and it wasn't clear what the fix was.

Also tagging is better than writing it down. Like initials+date.


That sounds extremely unlikely. Did you look into what the reflog actually is? And a tag is literally just a pointer to a commit hash. Sure, you can tag it, but don't embarrass yourself by pushing it to a remote.


So you do realize you're on a website with a bunch of smart people, right? Telling them they have no idea what they're talking about is kinda dumb.


Go ahead and give some examples of cases where the reflog won't help but tagging will?


No.


So you're admitting you were completely wrong?


No.


Well, you are.


No.


Guess you never matured past an 8-year-old's level of arguing: "no no no, nananana, I'm right" *sticks fingers in ears*


Pro Tip:

Don't insult people's intelligence and then demand they spend hours proving you wrong.

Just start with the assumption that you may be wrong next time.


Pro tip: Don't pretend to know better than everyone else, then go "no no no" like an 8 year old when someone asks for info ;) It's ok to admit you're wrong.


No.


Incremental snapshots as a backup tool, and being able to chroot into a new snapshot of root to do updates, are so useful.
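
A rough sketch of both, assuming a btrfs root and placeholder subvolume names:

    # Incremental backup: send only the delta between two read-only snapshots
    btrfs send -p /snapshots/home-yesterday /snapshots/home-today | btrfs receive /backup/

    # Do an update inside a writable snapshot of root instead of the live system
    btrfs subvolume snapshot / /.staging-root
    # bind-mount /dev, /proc and /sys into it as needed, then:
    chroot /.staging-root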



