
> You can register a list of held futexes with the kernel using sys_set_robust_list, and when the thread dies the kernel will set a specific bit in each entry and wake a waiter if there is one.

My biggest worry with that kind of thing is that the lock was guarding something which is now in an inconsistent state.

Without thoroughly understanding how/why the particular thread crashed, there's no guarantee that the data is in any sort of valid or recoverable state. In that case, crashing the whole app is absolutely a better thing to do.

It's really cool that the capabilities exist to do cleanup/recovery after a single thread crashes. But I think (off-the-cuff guess) that 95% of engineers won't know how to properly combine robust locks with robust data structures, 4% won't have the time to engineer (and document) that kind of solution, and the last 1% are really, really well-paid (or should be) and would find better ways to prevent the crash from happening in the first place.



The concern about state consistency applies to all error conditions, not just those that occur while holding a mutex lock.

It doesn’t matter if multiple threads are running or just one - the process could be in the middle of updating a memory-mapped file, or communicating with another process via shared memory, or a thousand other things.

Ensuring consistency is excruciatingly hard. If it truly matters, the best we have is databases.


One option is to use a non-blocking algorithm, which by definition maintains consistency at every instruction boundary. In fact, you won't even need robust mutexes this way.

But of course consistency is only maintained if the algorithm is implemented correctly; if you are trying to robustly protect against a process randomly crashing, you might not want to rely on the correctness of that process (or on it not scribbling over shared memory on its way to a segfault).


That also assumes that your data only needs consistency within a machine word (i.e., data types where the CPU can support atomic instructions). If you're just trying to ensure that 64 bits of data are consistent, that's fine, but that usually wasn't so hard anyway.


You need atomic operations at the synchronization points, but you can build arbitrarily complex data structures on top of them. For example, a lock-free tree will stay consistent even if a process dies in the middle of an update (you might need to garbage collect orphaned, unpublished nodes).
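
To make the "single atomic publish" idea concrete, here is a rough sketch using a push-only Treiber-style stack instead of a tree (a simpler structure swapped in for illustration). The names `Stack`, `Node`, and `snapshot` are made up for this example. It assumes a single process with heap pointers; a cross-process version in shared memory would need offsets rather than raw pointers, and real code would need a memory reclamation scheme.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical node type for this sketch.
struct Node {
    value: u64,
    next: *mut Node,
}

// Push-only stack: nodes are never freed, which sidesteps memory
// reclamation (and leaks by design). Illustration only.
pub struct Stack {
    head: AtomicPtr<Node>,
}

impl Stack {
    pub fn new() -> Self {
        Stack { head: AtomicPtr::new(ptr::null_mut()) }
    }

    pub fn push(&self, value: u64) {
        // Step 1: build the node privately. If the thread dies here, the
        // node is orphaned but the stack itself is untouched.
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        let mut head = self.head.load(Ordering::Relaxed);
        loop {
            // The node is still private, so this plain write is safe.
            unsafe { (*node).next = head };
            // Step 2: publish with one compare-and-swap. Observers see
            // either the old head or the fully built new node.
            match self.head.compare_exchange_weak(
                head,
                node,
                Ordering::Release,
                Ordering::Relaxed,
            ) {
                Ok(_) => return,
                Err(current) => head = current, // lost a race, retry
            }
        }
    }

    // Walks the list from the current head, newest element first.
    pub fn snapshot(&self) -> Vec<u64> {
        let mut out = Vec::new();
        let mut cur = self.head.load(Ordering::Acquire);
        while !cur.is_null() {
            // Safe in this sketch because nodes are never freed.
            unsafe {
                out.push((*cur).value);
                cur = (*cur).next;
            }
        }
        out
    }
}
```

The only write other threads can observe is the compare-and-swap on `head`, so a crash at any other point leaves at worst an unreachable node behind, which is the "orphaned unpublished node" case mentioned above.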


I think the point isn't to expose this to normal developers, but instead to enable things like Rust-like poisoned states in locks.



