
My favorite ring buffer structure is like the one described in https://www.gnuradio.org/blog/2017-01-05-buffers/

> It asks the operating system to give it memory-mappable memory, and map twice, “back to back”. Blocks then get called with pointers to the “earliest” position of a workload within this memory region – guaranteeing that they can access all memory they were offered in a linear fashion, as if they’d actually be dealing with hardware ring buffers.

It imposes limitations on hardware and OS support in a way, but I think it's pretty neat.

This is also used by the kernel BPF ring buffer:

https://www.kernel.org/doc/html/latest/bpf/ringbuf.html

> ...data area is mapped twice contiguously back-to-back in the virtual memory. This allows to not take any special measures for samples that have to wrap around at the end of the circular buffer data area, because the next page after the last data page would be first data page again, and thus the sample will still appear completely contiguous in virtual memory



Unfortunately that means the chunks are only contiguous in virtual memory. So it won't work with the DMA use case mentioned in the article, which requires contiguous physical addresses.

But it's still a nice trick, I like it when people get creative with HW features.


Theoretically, the same trick could be used to double map addresses coming from an external master via an IOMMU.


It is unlikely that cheap microcontrollers where DMA is most helpful will have an IOMMU.


But the hardware only needs to see one copy of the duplicated memory, and you can let it deal with the wraparound itself. The software still gets the convenience of the double mapping.


AKA the Magic Ring Buffer. It's extremely convenient not having to deal with split messages, especially if you support variable-size payloads.


This is a cool trick, and iirc there are a few Rust crates that implement it, including slice-deque.

...but I think there are a few significant downsides, even in userspace:

* The most obvious one: you have to write `unsafe`, non-portable code. The Linux, macOS, and Windows implementations are totally different from each other.

* The setup and teardown of each ring is a bit expensive (a few system calls). With a regular allocation, the malloc implementation typically caches that for you. Here you have to do your own pooling if you might be creating them frequently.

* Using whole pages, and two mappings per ring, is wasteful in terms of not only RAM (often no big deal) but also TLB space (which often turns into significant CPU usage). If you just allocate 32 × 64 KiB rings from the standard allocator, on x86-64 you might be talking about a single 2 MiB huge-page mapping. If you use mirrored rings instead, you're talking about 1024 4 KiB page mappings.


Any real, production-ready ring buffer should be using `unsafe`. I would consider anything that doesn't to be a toy.

In Rust it's basically impossible to do this without `MaybeUninit`. You could use `Option`, but then you're paying a massive cost to avoid a chunk of `unsafe` code that is very easy to write and audit.


I don't think it's useful to consider "uses `unsafe`" as a boolean. A one-line `unsafe` around `MaybeUninit::assume_init` isn't the same as an `unsafe` module per platform wrapping VM operations.

Also, it's not that crazy for a byte-oriented buffer to start with a `vec![0u8; N]` (cheap) and not need `MaybeUninit` at all. Probably doesn't buy you that much though; you still want to be careful to not leak previous bytes.

Also, you might be missing the point of my comment if you're responding to one word of "the most obvious [downside]" and not the other bullets...


It's not an attack on the wording, but on the correctness of your first bullet point. `unsafe` is appropriate for the initialization of a ring buffer in Rust. That's true whether you use `mmap` or stay in "pure" Rust using the allocator API to get the most idiomatic representation (which can't be done in safe or stable Rust). It's not one line. It's also not platform-dependent; the code was the same on macOS, Linux, and Windows the last time I tried it.

The rest of the bullet points are issues with scaling, which, sure, are valid. But if your bottleneck is determined by the frequency at which channels get created, or by how many exist, then I would call the architecture into question. A ring buffer is a heavy hammer for synchronization problems. It's appropriate in many applications, but rarely many times within the same application, in my experience.

This last month I've written a lock-free ring buffer to solve a problem and there's exactly one in an application that spawns millions of concurrent tasks.


> It's not an attack on the wording, but on the correctness of your first bullet point. `unsafe` is appropriate for the initialization of a ring buffer in Rust. That's true whether you use `mmap` or stay in "pure" Rust using the allocator API to get the most idiomatic representation (which can't be done in safe or stable Rust). It's not one line. It's also not platform-dependent; the code was the same on macOS, Linux, and Windows the last time I tried it.

We're not talking about the same thing then.

I'm talking about setting up a mirrored allocation, as in this code here: <https://github.com/gnzlbg/slice_deque/tree/master/src/mirror...> or this code here: <https://github.com/mmaroti/vmcircbuf/blob/743e1f3622641ee281...> or this code here: <https://github.com/kalamay/vmap-rs/blob/b8a5f9c819b4dd41a5b7...> It is absolutely platform-specific, in three different crates implementing the same idea...

Yes, most ring buffer implementations feature a little bit of `unsafe` code. No, it doesn't make sense to say "I have a tiny amount of `unsafe` already, so adding more has no cost."

> But if your bottleneck is determined by the frequency at which channels get created or how many exist then I would call architecture into the question. ... This last month I've written a lock-free ring buffer to solve a problem and there's exactly one in an application that spawns millions of concurrent tasks.

A lot of applications or libraries are written to support many connections, and you don't necessarily know when writing the code (or even when your server accepts an inbound connection) whether those connections will just be cycled very quickly or will be high-throughput, long-lived affairs. Each of those probably has a send buffer and a receive buffer. So while it might make sense for your application to have a single ring buffer for its life, applications which churn through them heavily are common and completely valid.

Sometimes folks do go a bit crazy with this. I question whether this XML parser <https://github.com/tvbeat/rapid-xml/blob/7dbffab5a25487221b2...> really needed a mirrored ring buffer implementation here, and for small documents the cost of its setup more than outweighs the significant effort they put into making this parser fast with SIMD operations. But then again, they probably optimized it for large documents, and naturally it supports both...

> A ringbuffer is a heavy hammer to synchronization problems.

While the implementation in the ferrous-systems.com article is a "high-perf lock-free ring-buffer for cross-thread communication", cross-thread synchronization isn't the only use for ring buffers. They're great for connections' send and receive buffers, as mentioned above. None of the ring buffers in the crates I linked to are `Sync`; they're meant to be used by one thread at a time.


ah, the vm-trick! Have used it at current (and previous) places of work to great effect.


Yup. The other "official" name for it is the Magic Ring Buffer, as a sibling comment mentioned.



