In my experience shared memory is really hard to implement well and manage:
1. Unless you're using either fixed-size or specially allocated structures, you end up paying for serialization anyhow (zero copy is actually one copy) - see the sketch after this list.
2. There's no way to reference count the shared memory - if a reader crashes, it holds on to the memory it was reading. You can get around this with some form of watchdog process, or with other schemes involving a side channel, but it's not "easy".
3. Similar to 2, if a writer crashes, it will leave behind junk in whatever filesystem you are using to hold the shared memory.
4. There are other, separate questions around how to manage the shared memory segments you are using (one big ring buffer? a segment per message?), and how to communicate between processes that different segments are in use and that new messages are available for subscribers. Doable, but also not simple.
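To make point 1 concrete, here is a minimal Rust sketch (the types `Fixed`, `Variable`, and `write_into_shared` are hypothetical, not from any particular library): a pointer-free `repr(C)` struct can be copied byte-for-byte into a mapped segment, while anything holding heap pointers has to be serialized first because those pointers mean nothing in another address space.

```rust
// `Fixed` can be placed directly into a shared-memory segment; `Variable`
// owns heap memory, so it must be serialized before it can cross processes.

#[repr(C)]
#[derive(Clone, Copy)]
struct Fixed {
    seq: u64,
    len: u32,
    payload: [u8; 1024], // inline, fixed-size payload
}

struct Variable {
    seq: u64,
    payload: Vec<u8>, // heap pointer: invalid in another address space
}

/// Copy a fixed-size, pointer-free message into a raw shared buffer.
/// Sound only because `Fixed` is `repr(C)`, `Copy`, and contains no pointers.
fn write_into_shared(buf: &mut [u8], msg: &Fixed) {
    let bytes = unsafe {
        std::slice::from_raw_parts(
            msg as *const Fixed as *const u8,
            std::mem::size_of::<Fixed>(),
        )
    };
    buf[..bytes.len()].copy_from_slice(bytes);
}
```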
It's a tough pill to swallow - you're taking on a lot of complexity in exchange for that low latency. If you can, it's better to put things in the same process space - you can use smart pointers and a queue and go just as fast, with less complexity. Anything CUDA will want to be single process anyway (ignoring CUDA IPC). The number of places where you need (a) ultra low latency, (b) high bandwidth/message size, (c) can't put everything in the same process, (d) are using data structures suited to shared memory, and finally (e) are okay with taking on a bunch of complexity just isn't that high. (It's totally possible I'm missing a Linux feature that makes things easy, though.)
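For comparison, the in-process "smart pointers and a queue" approach is just this (std only; the `Frame` type is a made-up placeholder): only the `Arc` crosses the queue, so the payload is never copied or serialized.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

// Stand-in for a large payload we never want to copy or serialize.
struct Frame {
    data: Vec<u8>,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Arc<Frame>>();

    let consumer = thread::spawn(move || {
        while let Ok(frame) = rx.recv() {
            // Only the Arc (a pointer plus refcount) crossed the queue.
            println!("got frame of {} bytes", frame.data.len());
        }
    });

    let frame = Arc::new(Frame { data: vec![0u8; 8 * 1024 * 1024] });
    tx.send(Arc::clone(&frame)).unwrap();
    drop(tx); // close the channel so the consumer exits
    consumer.join().unwrap();
}
```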
I plan on integrating iceoryx into a message passing framework I'm working on now (users will ask for SHM), but honestly either "shared pointers and a queue" or "TCP/UDS" are usually better fits.
> In my experience shared memory is really hard to implement well and manage:
I second that. It took us quite some time to arrive at the right architecture. After all, iceoryx2 is the third incarnation of this piece of software, with elfepiff and me working on the last two.
> 1. Unless you're using either fixed sized or specially allocated structures, you end up paying for serialization anyhow (zero copy is actually one copy).
Indeed, we are using fixed-size structures with a bucket allocator. We have ideas on how to enable usage with types that support custom allocators, and even with raw pointers, but that is a crazy idea which might not pan out.
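A toy illustration of the fixed-size bucket idea (not iceoryx2's actual implementation): the allocator hands out indices into a flat arena rather than pointers, so the same values stay meaningful in every process that maps the segment. In a real layout the free list itself would also live in shared memory and be lock-free.

```rust
// Toy fixed-size "bucket" allocator over a flat arena, as one might lay it
// out inside a shared-memory segment.

const BUCKET_SIZE: usize = 256;
const BUCKET_COUNT: usize = 64;

struct BucketAllocator {
    arena: Vec<u8>, // would be the mapped shared segment
    free: Vec<u32>, // stack of free bucket indices (here kept process-local)
}

impl BucketAllocator {
    fn new() -> Self {
        Self {
            arena: vec![0; BUCKET_SIZE * BUCKET_COUNT],
            free: (0..BUCKET_COUNT as u32).rev().collect(),
        }
    }

    /// Hand out a bucket as an index/offset; offsets remain valid in
    /// every process that maps the same segment.
    fn alloc(&mut self) -> Option<u32> {
        self.free.pop()
    }

    fn release(&mut self, idx: u32) {
        self.free.push(idx);
    }

    fn bucket_mut(&mut self, idx: u32) -> &mut [u8] {
        let start = idx as usize * BUCKET_SIZE;
        &mut self.arena[start..start + BUCKET_SIZE]
    }
}
```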
> 2. There's no way to reference count the shared memory - if a reader crashes, it holds on to the memory it was reading. You can get around this with some form of watchdog process, or by other schemes with a side channel, but it's not "easy".
>
> 3. Similar to 2, if a writer crashes, it will leave behind junk in whatever filesystem you are using to hold the shared memory.
Indeed, this is a complicated topic, and support from the OS would be appreciated. We found a few ways to make it feasible, though.
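One generic way to make crash cleanup feasible - an assumption for illustration, not necessarily what iceoryx2 actually does - is to encode the creator's PID into the segment name and let any peer reap segments whose owner is gone. On Linux, POSIX shared memory objects show up as files under /dev/shm, so this can be done with plain filesystem calls:

```rust
use std::fs;
use std::path::Path;

// Owner liveness check via /proc/<pid>.
fn owner_alive(pid: u32) -> bool {
    Path::new(&format!("/proc/{pid}")).exists()
}

/// Remove stale segments named "<prefix>_<pid>" whose owning process died.
fn reap_dead_segments(prefix: &str) -> std::io::Result<()> {
    for entry in fs::read_dir("/dev/shm")? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        if let Some(pid_str) = name.strip_prefix(prefix).and_then(|s| s.strip_prefix('_')) {
            if let Ok(pid) = pid_str.parse::<u32>() {
                if !owner_alive(pid) {
                    fs::remove_file(entry.path())?; // unlink the stale segment
                }
            }
        }
    }
    Ok(())
}
```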
The origins of iceoryx are in automotive, where it is required to split functionality across multiple processes. When one process goes down, the system can still operate in a degraded mode or simply restart the faulty process. For this to work, one needs an efficient, low-latency solution, otherwise the CPU spends more time copying data than doing real work.
Of course there are issues like the producer mutating data after delivery, but there are also solutions for this. They will affect latency, but should still be better than using e.g. Unix domain sockets.
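One classic mitigation for the producer-mutates-after-delivery problem - again a sketch of a general technique, not a claim about what iceoryx implements - is a seqlock-style generation counter: the reader copies the payload and then checks that the sequence number did not move. A real cross-process version would need volatile or atomic accesses for the payload itself to avoid data races.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A slot as it might be laid out in the shared segment.
#[repr(C)]
struct Slot {
    seq: AtomicU64,     // odd = writer in progress, even = stable
    payload: [u8; 128],
}

/// Copy the payload out and report whether the copy is a clean snapshot,
/// i.e. the writer did not touch the slot while we were reading it.
fn read_consistent(slot: &Slot, out: &mut [u8; 128]) -> bool {
    let before = slot.seq.load(Ordering::Acquire);
    if before % 2 != 0 {
        return false; // writer is mid-update, try again later
    }
    out.copy_from_slice(&slot.payload);
    let after = slot.seq.load(Ordering::Acquire);
    before == after // unchanged sequence => the snapshot is consistent
}
```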
Fun fact: for iceoryx1 we supported only 4GB memory chunks, and some time ago someone asked whether we could lift this limitation because he wanted to transfer a 92GB large language model via shared memory.
Thanks for sharing here -- yeah, these are definitely huge issues that make shared memory hard -- the when-things-go-wrong case is quite hairy.
I wonder if it would work well as a sort of opt-in specialization? Start with TCP/UDS/STDIN/whatever, and then maybe graduate, and if anything goes wrong, report errors via the fallback?
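Something like the following shape might work for that opt-in idea - `ShmTx` and `SocketTx` are hypothetical stand-ins, not a real API: the link prefers the shared-memory path, and the first error demotes it to the boring transport while surfacing what went wrong.

```rust
use std::io;

// Hypothetical stand-ins for the two transports.
struct ShmTx;
struct SocketTx;

impl ShmTx {
    fn send(&self, _msg: &[u8]) -> io::Result<()> {
        // Simulate a failure on the fast path.
        Err(io::Error::new(io::ErrorKind::Other, "reader crashed, segment unusable"))
    }
}
impl SocketTx {
    fn send(&self, _msg: &[u8]) -> io::Result<()> {
        Ok(())
    }
}

// Try the fast shared-memory path; on any error, report it and demote the
// link to the plain socket transport for all further sends.
struct Link {
    shm: Option<ShmTx>,
    socket: SocketTx,
}

impl Link {
    fn send(&mut self, msg: &[u8]) -> io::Result<()> {
        if let Some(shm) = &self.shm {
            match shm.send(msg) {
                Ok(()) => return Ok(()),
                Err(err) => {
                    eprintln!("shm path failed ({err}); falling back to socket");
                    self.shm = None;
                }
            }
        }
        self.socket.send(msg)
    }
}
```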
I do agree it's rarely worth it (and same-machine UDS is probably good enough), but with the essentially 10x gain I'm quite surprised.
One thing I've also found that actually performed very well is ipc-channel[0]. I tried it because I wanted to see how something I might actually use would perform, and it was basically 1/10th the perf of shared memory.
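For reference, a minimal ipc-channel sketch looks roughly like this (the message type has to be serde-serializable, which also hints at where a chunk of that 10x goes: every send is serialized and pushed through an OS-level channel rather than handed over in place).

```rust
use ipc_channel::ipc;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Msg {
    seq: u64,
    payload: Vec<u8>,
}

fn main() {
    // Both ends created in one process for brevity; in real use the receiver
    // is typically handed to another process (e.g. via IpcOneShotServer).
    let (tx, rx) = ipc::channel::<Msg>().unwrap();
    tx.send(Msg { seq: 1, payload: vec![0u8; 4096] }).unwrap();
    let got = rx.recv().unwrap();
    println!("received seq {} ({} bytes)", got.seq, got.payload.len());
}
```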
The other thing is that a 10x improvement on basically nothing is quite small. Whatever time it takes for a message to be processed is going to be dominated by actually consuming the message. If you have a great abstraction, cool - use it anyhow, but it's probably not worth developing a shared memory library yourself.
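To put rough, made-up numbers on that: if handling a message costs 200 µs, shaving the transport from 5 µs to 0.5 µs is only about a 2% end-to-end win.

```rust
// Back-of-the-envelope illustration with invented numbers: the transport
// gets 10x faster, but the end-to-end time barely moves.
fn main() {
    let handle_us = 200.0_f64; // time to actually consume the message
    let uds_us = 5.0;          // hypothetical UDS transport cost
    let shm_us = 0.5;          // hypothetical shared-memory transport cost
    let speedup = (handle_us + uds_us) / (handle_us + shm_us);
    println!("end-to-end speedup: {:.3}x", speedup); // ~1.022x
}
```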