Debian's actually one of the distros I thought of when I said "properly set up". Their tools packages are very out of date, they don't install the proper maintenance setup by default, and the installer doesn't support subvolumes. Going through the man page, yeah, I see it does mention running btrfs check for various recovery steps, even though that is generally not recommended (see the Arch Wiki[0] or the openSUSE docs[1] for how they warn against it).
> So keep parroting "it's stable!" all you want, my experience has shown btrfs is "stable" until you have a problem.
I've been running it on multiple production machines for years now, as well as my home machine. Facebook has been using it in production for I think over a decade now, and it's used by Google and Synology on some of their products.
I'm not saying it doesn't have problems (I've certainly faced a few), but it is tiresome reading the same cracks against it because they set it up without reading the docs. You never see the same thing against someone running ZFS without regular scrubs or in RAIDZ1.
It just seems weird to me to still be seeing RTFM more than a decade after I last received an actual FM for me to R.
A well-designed, general-audience technology product doesn't require one to be initiated into the mysteries before using it. The phone in my pocket is literally a million times more complicated than my first computer and I haven't read a bit of documentation. It works fine.
If btrfs wants to be something people use, its promoters need to stop blaming users for bad outcomes and start making it so that the default setup is what people need to get results at least as good as the competition. I have never read an extfs manual, but when I've had problems nobody has ever blamed bad outcomes on me not reading an extfs manual.
Particularly as in this case I did read the fucking manual. I read every relevant page of the btrfs wiki and the man pages before building this filesystem. What I've found is that there are still relevant implementation details documented only in the devs' heads or the endless mailing list archives.
Btrfs has not been, is not currently, and is unlikely ever to become a file system usable by the general population, which is a shame, as 10 years ago it looked like a promising move forward.
Its window was when setting up ZFS included lots of hand-waving. That window has now closed. ZFS is stable, does not eat data, does not have a cult of wizards chanting "RTFM", and can be installed on major distributions using an easy-to-follow procedure. In a year or two I expect that procedure to be fully automated, to the point where one could do root on ZFS.
I haven't tried this yet, but supposedly the Ubuntu installer can set up ZFS on root for a very basic[1] install (i.e. no redundancy and no encryption). The former you could trivially add after the fact by attaching a mirror and doing a scrub; the latter you could also do post-install with some zfs send+recv shenanigans, and maybe some initramfs changes.
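For the redundancy part, the after-the-fact mirror attach is roughly this (a sketch only; "rpool" is the pool name Ubuntu's installer usually picks, and the device paths are placeholders):

    # Check the current layout and note the existing vdev
    zpool status rpool
    # Attach a second disk to turn the single-disk root pool into a mirror
    zpool attach rpool /dev/disk/by-id/EXISTING-DISK /dev/disk/by-id/NEW-DISK
    # Wait for the resilver to finish, then scrub to verify both sides
    zpool status rpool
    zpool scrub rpool

(You'd presumably also want to mirror the boot pool and reinstall the bootloader on the new disk, but that's beyond this sketch.)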
I do use the Ubuntu live image pretty regularly when I need to import zpools in a preboot environment and it works great. In general it's not my favorite distro - but I'm happy to see they're doing some of the leg work to bring ZFS to a wider audience.
> In a year or two I expect that procedure to be fully automated, to a point where one could do a root on ZFS.
Ubuntu has been able to install directly to root-on-ZFS automatically since 20.04. I don't think any other major distros are as aggressive about supporting ZFS due to the licensing problem, but the software is already there.
> You never see the same thing against someone running ZFS without regular scrubs or in RAIDZ1.
ZFS doesn't have these kinds of hidden gotchas, and that's the key difference. Yeah, ok, somebody's being dumb if they never scrub and find out they have uncorrectable bad data coming from two drives on a raidz1. That's exactly the advertised limitation of raidz1: it can survive a single complete drive failure, and can't repair data that has been corrupted on two (or more) drives at once.
If you are in the scenario, as the GP was, that you have a two-disk mirror and regular scrubs have assured that one of the disks has only good data, and the other dies, ZFS won't corrupt the data on its own. If you try replacing the bad drive with another bad drive, eventually the bad drive will fail or produce so many errors that ZFS stops trying to use it, and you'll know. The pool will continue on with the good drive and tell you about it. Then you buy another replacement and hope that one is good. No surprises.
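To make that concrete, the workflow is roughly the following (a sketch; "tank" and the device paths are placeholders, not anything from the thread):

    # The pool reports itself DEGRADED and names the failed disk
    zpool status -x tank
    # Swap in the replacement; ZFS resilvers onto it from the surviving mirror side
    zpool replace tank /dev/disk/by-id/FAILED-DISK /dev/disk/by-id/NEW-DISK
    # Confirm the resilver completed, then scrub to verify the new copies
    zpool status tank
    zpool scrub tank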
>ZFS doesn't have these kinds of hidden gotchas, and that's the key difference. Yeah, ok, somebody's being dumb if they never scrub and find out they have uncorrectable bad data coming from two drives on a raidz1. That's exactly the advertised limitation of raidz1: it can survive a single complete drive failure, and can't repair data that has been corrupted on two (or more) drives at once.
Why is ZFS requiring scrubs and understanding the limitations of its RAID implementations okay, but btrfs requiring scrubs and understanding the limitations of its RAID implementations "hidden gotchas"?
> If you are in the scenario, as the GP was, that you have a two-disk mirror and regular scrubs have assured that one of the disks has only good data, and the other dies, ZFS won't corrupt the data on its own.
Honestly, I don't know enough about GP's situation to really comment on what happened there. It could have been btrfs, or perhaps they were using hardware RAID and the controller screwed up. ZFS is definitely very good in that regard, and I want to be clear that I'm not saying ZFS is bad or that btrfs is better than it; I've been using ZFS much longer than I have btrfs, back before ZoL was a thing.
> Why is ZFS requiring scrubs and understanding the limitations of its RAID implementations okay, but btrfs requiring scrubs and understanding the limitations of its RAID implementations "hidden gotchas"?
It read more like btrfs corrupting data despite good scrubbing practice: hosing the file system on the good drive instead of letting it remain good, for instance. That may be a misreading, but that is where my position came from.
Regular scrubs and understanding the limitations of redundancy models is good on both systems, yes.
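For what it's worth, scheduling them is a one-liner on either side, e.g. from cron (a sketch; the pool name "tank", the /data mountpoint, and the binary paths are placeholders for whatever your system uses):

    # Scrub the ZFS pool "tank" at 03:00 on the 1st of each month
    0 3 1 * *   /sbin/zpool scrub tank
    # Scrub the btrfs filesystem mounted at /data on the 15th (-B = run in foreground)
    0 3 15 * *  /sbin/btrfs scrub start -B /data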
My own anecdotal evidence, though: btrfs really does snag itself into surprising and disastrous situations at alarming frequency. Between being unable to reshape a pool (e.g., removing a disk when plenty of free space exists) and not being safe with unclean shutdowns, it's hard to ever trust it. It even went a good few years where it seemed to be abandoned, though I guess it's been picked up again since 2018 or so.
Ah, I understand what you're saying now. Yeah, that's fair assuming it was btrfs's fault for the data loss.
>Between being unable to reshape a pool (e.g., removing a disk when plenty of free space exists) and not being safe with unclean shutdowns, it's hard to ever trust it. It even went a good few years where it seemed to be abandoned, though I guess it's been picked up again since 2018 or so.
FYI, btrfs does support reshaping a pool with the btrfs device commands.
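Roughly like this, assuming a multi-device filesystem mounted at /data (the mountpoint and device names are placeholders):

    # List the devices currently in the filesystem
    btrfs filesystem show /data
    # Remove a disk: btrfs migrates its extents to the remaining devices first,
    # so it needs enough free space elsewhere in the pool
    btrfs device remove /dev/sdX /data
    # Adding a device is the reverse; a balance afterwards spreads existing data onto it
    btrfs device add /dev/sdY /data
    btrfs balance start --full-balance /data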
For reference: this is referring to a systemd service script (systemd-udev-trigger/systemd-udev-settle) with a race condition where the pool may not be mounted by the time systemd tries to use it.
That's (a) not really a bug in ZFS, and (b) "fails to boot sometimes" is pretty different from btrfs shitting the bed and corrupting its pool. There was one of those recently with ZFS, iirc (and specifically only ZFS-on-Linux), but they are fairly rare and notable when they occur!
(As a general statement, ZoL is less mature than ZFS-on-FreeBSD and, given the licensing issues, perhaps likely to continue to be so. I've also run into some problems where I can't send a dataset from FreeBSD to a ZoL pool (but rsync works fine). But again, bugs that actually lead to data loss are exceedingly rare.)
It just kind of sucks, when it happens, you are thrown into recovery console at boot and there is no solution.
From what I saw, it happened if the disk initialization took longer and ZoL looked for its pools before all disks were found. It hints at improper dependencies in ZoL startup.
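If that's what's going on, one workaround I've seen (a sketch, not an official fix; zfs-import-cache.service and systemd-udev-settle.service are the stock ZoL/systemd unit names) is a drop-in that makes the pool import wait until udev has finished enumerating disks:

    # Make zfs-import wait for udev to settle so slow disks are visible before import
    mkdir -p /etc/systemd/system/zfs-import-cache.service.d
    cat > /etc/systemd/system/zfs-import-cache.service.d/wait-for-udev.conf <<'EOF'
    [Unit]
    Wants=systemd-udev-settle.service
    After=systemd-udev-settle.service
    EOF
    systemctl daemon-reload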
btrfs is the default on two distros (that I know of): openSUSE and Fedora. If you're using one of those two distros, don't read the documentation, and then your data gets eaten or similar, that's fair and you can rightfully be upset. I would be too.
But if you're using it in some other setup, then that means you went out of your way to try a more complicated filesystem. I would think it's reasonable to do at least a quick scan of the btrfs wiki or your distro's documentation before continuing with that, the same way I'd expect someone would do the same for ZFS.
> I would think it's reasonable to do at least a quick scan of the btrfs wiki or your distro's documentation before continuing with that, the same way I'd expect someone would do the same for ZFS.
I would argue that the defaults should not be dangerous. If a filesystem is released into the world as stable and ready for use, it's not absurd to expect that running mkfs.<whatever> /dev/<disk> will get you something that's not going to eat your data. It might not be optimized, but it shouldn't be dangerous.
Dangerous defaults are loaded footguns and should be treated as bugs.
If there is no safe option, there should not be a default and the user should be required to explicitly make a choice. At that point you can blame them for not reading the documentation.
What dangerous defaults do you think may exist with btrfs? A simple mkfs.btrfs won't set you up for later trouble unless you go out of your way to give it multiple devices and ask for one of the parity RAID modes.
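For illustration, the difference looks roughly like this (device names are placeholders; the single-device defaults are typically the "single" profile for data and DUP for metadata):

    # Plain single-device filesystem: nothing controversial here
    mkfs.btrfs /dev/sdX
    # You only reach the widely-warned-about parity modes by asking for them explicitly
    mkfs.btrfs -d raid5 -m raid1 /dev/sdX /dev/sdY /dev/sdZ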
There are some arguable footguns in the btrfs-tools workflows for repairing a damaged filesystem, but that's exactly the fragile situation where asking the user to RTFM before making more changes is perfectly reasonable.
> What dangerous defaults do you think may exist with btrfs?
The ones being discussed in the parent posts in this thread. I am not myself particularly familiar with btrfs internals, having only used it once, but I have heard of there being issues along those lines.
> There are some arguable footguns in the btrfs-tools workflows for repairing a damaged filesystem, but that's exactly the fragile situation where asking the user to RTFM before making more changes is perfectly reasonable.
My point is that in the cases where RTFM is a requirement there should not be a default behavior. Doing the dangerous thing should always require an explicit request and not be something that one can autocomplete their way to. If there is a "doing X is only safe when you also fill in Y parameter and put Z in mode W" then it either shouldn't let me do X without those other things at all or should require a "--yes-really-do-the-stupid-thing" type flag.
Take the thing mentioned upthread where it's supposedly possible to mount a damaged filesystem RW, but only once. If that's true, then attempting to do so through a normal command that someone who knows Unixy systems might just run without thinking should scream bloody murder to make sure the user knows what's going on, and should then require some explicit confirmation of intent to move forward with the dangerous and/or irreversible operation.
You're being too vague. It's silly to opine on what btrfs should do without checking whether that's already the case. You should actually point out any specific default behaviors that are questionable rather than simply speculating ones that may or may not exist. The only specific thing you've mentioned so far is one that you could trivially have found out requires a non-standard mount option.
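For reference, if the "mount RW only once" thing is the old degraded-raid1 behaviour I'm thinking of, it already isn't reachable by accident: with a device missing, a plain mount refuses and you have to opt in explicitly (device and mountpoint are placeholders):

    # Fails with a missing device and complains about it
    mount /dev/sdX /mnt
    # The explicit "yes, really" switch
    mount -o degraded /dev/sdX /mnt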
[0] https://wiki.archlinux.org/title/Btrfs#btrfs_check
[1] https://en.opensuse.org/SDB:BTRFS