Show HN: Sonyflake – distributed unique ID generator implemented in Rust (github.com/bahlo)
66 points by abahlo on Dec 31, 2020 | 63 comments


After a lot of time spent investigating the different kinds of (U)UIDs, I've come to the conclusion that ULIDs[0] are the best kind of (U)UID you can use for your application today:

* they are sortable so pagination is fast and easy.

* they can be represented as a UUID in the database (at least in Postgres).

* they are roughly sequential, so insertion into indexes performs well, as opposed to completely random UUIDs.

* the 48-bit timestamp gives enough space for roughly the next 9000 years.

[0] https://github.com/ulid/spec
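For illustration, here's a minimal sketch of the UUID interoperability point in Rust, assuming the `ulid` and `uuid` crates (and that `Ulid` exposes its underlying `u128` as a public field):

  use ulid::Ulid;
  use uuid::Uuid;

  fn main() {
      // 48-bit millisecond timestamp + 80 random bits, shown as 26 Crockford base32 chars
      let id = Ulid::new();
      println!("ULID: {}", id);

      // the same 128 bits reinterpreted as a UUID, so it can live in a Postgres uuid column
      let as_uuid = Uuid::from_u128(id.0);
      println!("UUID: {}", as_uuid);
  }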


I also spent some time evaluating them, and came to the conclusion that KSUIDs (https://github.com/segmentio/ksuid) were strictly better than ULIDs.

The differences aren't huge in practice, but the ULID spec has a couple of highly questionable choices. Specifically, according to the spec:

1. The first time you generate a ULID in a given millisecond on a node, you should fill in the random component randomly.

2. But if you generate more than one ULID per millisecond, subsequent ULIDs should use the first ULID's random component and just increment the least significant bit.

3. If that would overflow, then you just can't generate any more ULIDs that millisecond.
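Roughly, the rule in points 1-3 looks like this (a hypothetical sketch, not any particular crate's implementation):

  // State for a spec-compliant monotonic ULID generator (sketch only).
  struct MonotonicState {
      last_ms: u64,
      last_rand: u128, // only the low 80 bits are meaningful
  }

  const RAND_MASK: u128 = (1u128 << 80) - 1;

  impl MonotonicState {
      // Returns (timestamp_ms, random_80_bits), or None once this millisecond is exhausted.
      fn next(&mut self, now_ms: u64, fresh_rand: u128) -> Option<(u64, u128)> {
          if now_ms != self.last_ms {
              // rule 1: the first ULID in a millisecond gets fresh randomness
              self.last_ms = now_ms;
              self.last_rand = fresh_rand & RAND_MASK;
          } else {
              // rule 2: same millisecond, so reuse the previous random component + 1
              let next = self.last_rand + 1;
              if next > RAND_MASK {
                  return None; // rule 3: overflow, give up until the next millisecond
              }
              self.last_rand = next;
          }
          Some((self.last_ms, self.last_rand))
      }
  }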

The goal is that ULIDs will be strictly sortable even when you're generating multiple per millisecond...except for that to work, you'd have to only generate them in a single thread (or at any rate on a single node). If you're just incrementing a number on a single node, then, uh, why not just use an auto-incrementing integer column in a DB? Of course, if you're generating multiple IDs per millisecond, you're probably using multiple nodes, and if you are, then wall-clock skew means the IDs will never be strictly sortable. Except it's actually even worse than that: even if you ARE only using a single node, your code probably can't assume that every ULID it will ever encounter was generated that way, so you still can't rely on the sorting guarantee. Meanwhile, a compliant ULID generator has to keep state and, if it's going to be called from multiple threads, deal with some complex race conditions, which makes it a lot harder to implement correctly.

Meanwhile, you could get 100% of the benefits and none of the drawbacks by just...using random bits for the random component every time.

Anyhow, the idea of slapping a timestamp and a random string of bits together is a good one. I just think the KSUID spec is a bit better thought out.


The problem I had with KSUID is that they have a custom format, so not optimised in Postgres, while ULIDs are fully interoperable with other kinds of UUIDs. So if your requirements change later, it's just a change to the function generating your UIDs.

Also I don't use monotonic generators, because in practice, in my applications, it never makes business sense to sort entities at finer granularity than a millisecond. So even if I generate 10 entities' IDs in a millisecond, I let the random part randomly order the entities generated during that millisecond.

Monotonic generators are, in my opinion, only useful when you require absolute ordering, like with events, and thus are rarely a concern.


> The problem I had with KSUID is that they have a custom format, so not optimised in Postgres

As far as I'm aware, Postgres doesn't really have any "optimisations" for UUIDs. The type just provides some help for normalising and converting them to and from their canonical form.

The only DB I know of with actual UUID optimisations is MS SQL, and that would be a bit of a trap, because its optimisation is to re-order the bits so the timestamp part of a classic UUID is at the front when it's stored...which means trying to store a ULID as a UUID in MS SQL would actually be slower than storing it as plain bytes.


The ordering of bytes within UUID/GUID types is one of the differences between the two, and also causes no end of grief when importing data across systems (or from hex representation to binary).

I wrote about some of the pitfalls and how to properly convert MS guid types to hex for interop with, eg, SQLite here: https://neosmart.net/blog/2018/converting-a-binary-blob-guid...

The difference between big and little endian can cause huge problems in practice beyond the values simply not being identical - as you note, the order of the random part and the timestamp part gets switched around, which causes GUIDs to lose their randomness and cluster around not just the time but the inverted time - the last part of which may often even be null.
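For reference, a hedged sketch of the byte shuffle involved, assuming the usual mixed-endian layout of the first three GUID fields (see the linked post for the full details):

  // Convert a GUID stored in Microsoft's mixed-endian binary layout into
  // RFC 4122 (big-endian) byte order, the form most other systems expect.
  fn ms_guid_to_rfc4122(ms: [u8; 16]) -> [u8; 16] {
      let mut out = ms;
      out[0..4].reverse(); // Data1: a 4-byte field stored little-endian by MS
      out[4..6].reverse(); // Data2: 2-byte field, little-endian
      out[6..8].reverse(); // Data3: 2-byte field, little-endian
      // Data4 (the last 8 bytes) is already stored in network order
      out
  }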


The monotonicity component of ULID is - confusingly - an optional part of the spec (which makes the spec weird). There is discussion about dropping this component from the official spec altogether.


How does ULID compare to Instagram's bigint id_generator for postgres?[0]

ULID is 128 bits vs 64 bits for the Instagram ID, so a ULID is stored in a uuid field while an Instagram ID fits in a bigint field.

There are some advantages to using application code to generate the ID though (avoiding a round trip to the DB if you need to use the ID afterwards), but I'm seduced by just offloading this work to the DB and being able to use a bigint field to store the ID.

As someone who is just about to commit to using ULID or an instagram-like bigint id generator, would you have any thoughts you could share on which may be better?

[0] https://instagram-engineering.com/sharding-ids-at-instagram-...
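For comparison, a rough sketch of the Instagram-style 64-bit layout packed in application code rather than in a PL/pgSQL function, using the bit widths from the linked article (the custom epoch and shard mapping are left to the caller):

  // 41 bits of milliseconds since a custom epoch | 13 bits of shard id | 10 bits of sequence.
  fn instagram_style_id(ms_since_custom_epoch: u64, shard_id: u64, seq: u64) -> i64 {
      let ts = ms_since_custom_epoch & ((1 << 41) - 1);
      let shard = shard_id & ((1 << 13) - 1);
      let seq = seq & ((1 << 10) - 1); // per-shard sequence, mod 1024
      // fits a signed BIGINT; the sign bit stays clear for ~35 years after the custom epoch
      ((ts << 23) | (shard << 10) | seq) as i64
  }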


Are there any reasons to prefer BIGINT over UUID (other than the few percent of performance, which may never be a problem)?

I just find Instagram's ID complex (custom epoch, mandatory use of RETURNING) while ULID is elegant (easy to generate and use).

For me the biggest selling point was the interoperability with other kinds of UIDs, if later your requirements change.

Also, note that (at least in Rust) ULIDs are implemented as a u128, so they are pretty fast.


I'll admit that I'm not actually sure about the performance difference between BIGINT and UUID in Postgres. So in that sense you're right that the question is a tradeoff: x% of performance (which may never be a problem) plus added complexity, vs ULID, which is simple/elegant with significantly less cognitive overhead (easier to maintain) but will cause bigger indexes than bigint. It's a tradeoff... and your post has me thinking again that realistically the maintainability and agility that ULID provides may actually be better than a potentially error-prone but x% more performant Instagram-like bigint generator. I just don't know enough to properly compare the two, so I suspect opting for the lower cognitive load / maintainability / agility is best.


The last time ULID was mentioned in a thread it got some flak for its "weird" generator-local monotonicity property.

But it's actually pretty useful when you're building database indices, because it reduces the amount of entropy your index has, and is therefore a big help for succinct data structures, which occupy space proportional to the information-theoretic lower bound up to a log factor. This can make the difference between a multi-MB and a multi-GB index.

Obscure feature, big gain.


I'm skeptical but can be swayed. A snowflake id can be represented as a BIGINT in postgres. I'm curious how a BIGINT contrasts with a UUID column in terms of storage and performance.

If anyone has used a snowflake id in a production environment and ran into a unique id limit, please shout about it.


Not good if you’re using them in a situation where they are visible to the public (e.g. URLs) and you don’t want to leak the time component.


True, but for many entities the time they were created is basically public information anyway - the time of a tweet, the date a Hacker News account was created, the time a blog post was published. It might leak more granular information than would be publicly available otherwise, but for many use cases it's not a security concern as far as I can tell.


Yes, this only matters if you don’t want to leak the time component, eg to thwart people from estimating how many total active users you have.


If you really need this why not add a public ID...


A benefit of ULIDs is the better index performance. Adding a second id column kinda undermines that, since you’ll need an index there as well.


What column type?


I like MongoDB's ObjectIDs because they secretly maintain a timestamp, therefore... if you sort, it sorts by that value. MongoDB isn't for every use case, but when I do use it I try to take advantage of everything it has.


Discord uses snowflake-based IDs to great success. It's actually been one of the rock-solid parts of our infrastructure, and the fact that timestamps are embedded in the ID is great. Every channel, message, stream, attachment, user, server, etc... gets a snowflake.

We originally wrote this service in Elixir but have since rewritten it in Rust. It's actually very similar to this crate. Maybe I should open source ours too haha.


Did you see the flake crate?


The one that says "wip; use at your own risk" with no updates in the last 5 years? Or is there another?

https://crates.io/crates/flake


sorry, missed the N at the end -- I refer to "flaken": https://crates.io/crates/flaken

snowflake libraries generally aren't updated after they're stabilized


No I didn't, but after looking at it, it seems its implementation falls short.

- it only looks at the system time at startup, then relies on Instant as a monotonic clock source. Unfortunately, for setups that do leap-second smearing (ours does), this would slowly drift out of sync after operating for a while.

- it does not handle sequence overflow within a given millisecond window - this, I think, is very unsafe, and can cause duplicate IDs to be generated if `next_id` is called at too rapid an interval.
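For illustration, a minimal sketch of one way to handle that overflow (roughly the approach Sonyflake-style generators take: block until the next time unit rather than reuse a (time, sequence) pair); this is not the flaken crate's code:

  use std::thread::sleep;
  use std::time::Duration;

  const SEQ_BITS: u64 = 8;
  const SEQ_MASK: u64 = (1 << SEQ_BITS) - 1;

  // `state` is (last_time_unit, sequence); `now_unit` returns elapsed time in 10 ms units.
  fn next_time_and_seq(state: &mut (u64, u64), now_unit: impl Fn() -> u64) -> (u64, u64) {
      let mut unit = now_unit();
      if unit == state.0 {
          state.1 = (state.1 + 1) & SEQ_MASK;
          if state.1 == 0 {
              // sequence exhausted for this 10 ms unit: block until the next one
              while unit <= state.0 {
                  sleep(Duration::from_millis(1));
                  unit = now_unit();
              }
          }
      } else {
          // new time unit: reset the sequence
          state.1 = 0;
      }
      state.0 = unit;
      (unit, state.1) // the caller packs these, plus the machine id, into 64 bits
  }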


Unfortunately, 8 bits of sequence per 1/100th of a second is way too low.

I gather the person who designed this has never seen a high-traffic service. But for some, having a cap of 25.6k unique IDs per second on a node is a deal breaker.


There are very, very few services which actually need to handle more than 25.6k unique IDs per node per second. I'm not saying they don't exist, but it is not a problem most companies have.


Ah, yes... and 640kB of RAM was supposed to be enough, too. Or 4B IP addresses.

If anything, my experience is that whenever somebody sets hard limits that close, somebody else is going to pay for it.

The service I work on creates up to a couple million unique objects per second, on a single node, in a batch process. It tries to do a large batch of work as fast as possible. It fetches something like 2GB of data per second from a MongoDB node, processes it, and creates up to 5 million objects in the span of 2-3 seconds. We also have realtime services that run at about 20-30k TPS per node.

We are currently using MongoDB and it uses plenty of bits for that:

https://docs.mongodb.com/manual/reference/method/ObjectId/
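For reference, an ObjectId packs a 4-byte timestamp, a 5-byte per-process random value, and a 3-byte counter into 12 bytes; a hedged sketch of that layout (the random value and counter sources are left to the caller):

  // Build a 12-byte, ObjectId-style identifier:
  // 4-byte big-endian unix timestamp | 5-byte per-process random value | 3-byte counter.
  fn objectid_like(unix_secs: u32, process_random: [u8; 5], counter: u32) -> [u8; 12] {
      let mut id = [0u8; 12];
      id[0..4].copy_from_slice(&unix_secs.to_be_bytes());
      id[4..9].copy_from_slice(&process_random);
      id[9..12].copy_from_slice(&counter.to_be_bytes()[1..4]); // low 3 bytes of the counter
      id
  }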


I can almost guarantee you that your service could be rewritten to not require millions of unique objects per second per node, and/or split up so that the nodes were on different cores. I'm quite confident you are not anywhere close to optimization boundaries around throughput because MongoDB is far from the best-optimized database for batch processing.


Why the hell has this been downvoted?

Not everyone on HN is a web developer. Some of the audience works on high-throughput systems. Instead of downvoting, perhaps people would prefer to learn? They may save their company lots of money!


Well, for starters, it's simply untrue that every company will (barring bankruptcy, of course) eventually need extreme scale - as is the implicit assumption that the up-front cost of implementing such scalability is necessarily worthwhile. That may be true in Startuplandia, but the industry is a _lot_ bigger than the world of Silicon Valley-style startups.

For example: My current company is a small lifestyle biz. Sure, it's a "tech company", has an event-processing pipeline and the like, but if we were 100x more successful than our wildest ambitions, we still wouldn't need anywhere near 25.6k IDs/sec/server. And, given the objectives of the company, it would make more sense to turn customers away than to grow the team to accommodate that sort of demand.

The simple fact is that in a great many situations extreme scale isn't needed. And, either way, the cost of implementing extreme scalability can inflict its own harm. Highly scalable systems usually come with more operational complexity, and steeper learning curves. When you have a small team, this can impair -- or even cripple -- product development. If your company hasn't found product/market fit yet, this attitude of sacrificing the present for an imagined future can materially reduce the likelihood of that future coming to pass. Of course, the opposite is sometimes true as well. Companies have, in fact, failed because they found product market fit but took so many shortcuts they couldn't adapt and grow into their success. But the point is that determining how much to invest in future-proofing is a complex and nuanced problem not amenable to sweeping generalizations.

Now, this clearly isn't such an extreme case, but frankly Sonyflake (the original, not the Rust implementation) seems to be operationally simpler than Snowflake while offering perfectly reasonable tradeoffs. The Rust implementation might prove entirely useful to any number of organizations _based on their needs_. If they have a Rust codebase with a single process per machine this could easily be a simpler and more robust option than Sonyflake.

The kind of arrogant dismissiveness based on one's own personal (and highly specialized) experience that's shown in the downvoted comment tends to leave a bad taste in peoples' mouths. Thus the downvoting.


> And, either way, the cost of implementing extreme scalability can inflict its own harm. Highly scalable systems usually come with more operational complexity, and steeper learning curves

I think you're overestimating the cost factor of future-proofing an architecture for extreme scale (and 26K/sec/server isn't actually that 'extreme'). And instead of downvoting people who've walked that walk, perhaps people may realise this by engaging instead.

Also, I didn't read any 'arrogant dismissiveness' in the post. Each to their own! Happy New Year!


As I touched on, the cost can vary a lot depending on the particulars of the situation.

For example, one company I came into was using Redshift and fighting a lot of problems with it. The problems stemmed from a combination of not knowing how to use it effectively (e.g. batching writes), it being the wrong tool for the job (they were using it for a combination of OLAP and OLTP(!!) workloads), and so forth. The long and short is that for both workloads, a basic RDS Postgres instance -- or a pair, one for OLAP, one for OLTP -- would've been considerably cheaper (they'd had to scale up a couple notches from the minimum cluster size/type because of performance), avoided correctness problems (e.g. UNIQUE constraints actually doing the thing), been several orders of magnitude more performant at the scale they were at, etc. They simply didn't understand the tool and tried to apply it anyway. They basically thought Redshift was like any RDBMS, but faster "because it stores data in a columnar format."

Had they understood it, designing a data pipeline that could make use of such a tool would have required considerably more engineering person-hours than the pipeline they actually built.

Obviously this is a terribly extreme example, but the learning curve for tools -- including the time/effort needed to discover unknown unknowns -- is a cost that must be factored in.

And, even if your org has the expertise and experience already, more-scalable solutions often have a lot more moving parts than simpler solutions. Another organization I was at wanted to set up a data pipeline. They decided to go with Kafka (self-managed because of quirks/limitations of AWS' offering at the time), and Avro. After 2 months that _team_ (4 sr + 1 jr engineers, IIRC) had accomplished depressingly little. Both in terms of functionality, and performance. Even considering only the _devops_ workload of Terraforming the setup and management of the various pieces (Avro server, Zookeeper cluster, Kafka cluster, IAM objects to control access between things...), it was a vastly more complicated project than the pipeline it was meant to replace (SQS-based). Yes, that's a bit of an apples-to-oranges comparison but the project's goal was to replace SQS with Kafka for both future throughput needs and desired capabilities (replaying old data into other targets / playing incoming data into multiple targets without coupling the receiving code).

By the time I left, that project had: 1. Not shipped. 2. Was not feature-complete. 3. Was still experiencing correctness issues. 4. Had Terraform code the CTO considered to be of unacceptably poor quality. 5. Monitoring and observability was, let's say, a "work in progress." Some of that is for sure the Second System Effect, but it is not at all clear to me that they would have been better off if they'd gone with Kafka from day 1.

Given that we could pretty easily extract another 2 orders of magnitude throughput out of SQS, there's a real discussion to be had about whether or not a better approach might've been to make a Go consumer that consumed data more efficiently and shunted data to multiple destinations -- including S3 to allow for replay. That would've been a 1-2 week project for a single engineer. Kafka is 100% the right tool for the job _beyond a certain scale_ (both of throughput and DAG complexity), but the company was something like 4 years in when I got there, and had been using SQS productively for quite some time.

And no, 26k/sec/server isn't especially huge. I was referring to the fact that the downvoted commenter was making sweeping generalizations. Sweeping generalizations tend to shut discussion down, not prompt more nuanced discussion. Other threads on this post have seen very interesting and productive discussions emerge, but note that the downvoted commenter's post hasn't really drawn anything other than people being sucked into the very discussion we're having now. It's counter-productive.


Hey, it is a Rust library. So probably not many web devs are going to use it.

Which is even more reason to point out limitations like that, because if you choose Rust for your project it very likely means you have some important performance constraints. And those are either tied to a small system with low resources (in which case it is OK) or a large system with huge traffic (in which case it is not).


Yeah, I get pretty bored of constantly reading "You don't need X". Some of us do. You don't have to be Google to deal with a lot of data and to want that to be fast. Lots of domains handle high request volumes and massive amounts of data.


It was designed by Twitter…


Which is proof of what?


I mean if Twitter isn't "high traffic" then I'm not sure what is.


I think you're crossing the streams a bit here.

Twitter designed _Snow_flake. _Sony_flake is a reimplementation that changes the allocation of bits, and the author acknowledges that the maximum ID generation throughput ceiling is lower than that of Snowflake.

Snowflake uses a millisecond-precision timestamp and a 12-bit sequence number, so 4,096,000 IDs per node-second. At that point, the bottleneck won't be the format of the IDs but the performance of the code and the IPC mechanism. Which, in this case, is non-trivial, since Snowflake uses a socket-based approach to communication. Lower-overhead, native IPC mechanisms are certainly possible via JNI but would probably take some doing to implement. For Sonyflake, I don't imagine the socket overhead is all that much of an issue given the low throughput it's capable of with its bit allocations.

Were I to design something like this again[1], I might start with something like Sonyflake (the self-assignment of host ID w/out needing to coordinate via Zookeeper is nice), shave a couple bits from the top of the timestamp, maybe a couple from the top of the host ID, and pack the remainder at the top, leaving a few zero bits at the bottom. That would essentially mean the value returned was the start of a _range_, and anything needing to generate IDs in large quantities can keep its own in-process counter. Only one API call per N IDs generated by a given thread/process. And, of course, unix-domain socket or other lower-overhead approach to communication for when the API calls are needed.

[1] - A decade prior to Snowflake's release, I wound up taking a very similar approach at one of my first startups, albeit much more crude/inelegant and without nice properties like k-ordering.
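A rough sketch of that range idea (the bit widths here are purely illustrative, not any existing library's layout):

  const RANGE_BITS: u64 = 6;  // each allocated range covers 2^6 = 64 IDs
  const SEQ_BITS: u64 = 8;
  const HOST_BITS: u64 = 14;

  // One call to the central generator returns the start of a range:
  // [ time | host | seq | 0..0 ], with the low RANGE_BITS left at zero.
  fn range_start(time_unit: u64, host_id: u64, seq: u64) -> u64 {
      let host = host_id & ((1 << HOST_BITS) - 1);
      let seq = seq & ((1 << SEQ_BITS) - 1);
      let packed = ((time_unit << HOST_BITS) | host) << SEQ_BITS | seq;
      packed << RANGE_BITS
  }

  // Each thread/process then mints IDs locally from its range, with no API call needed.
  struct LocalIds { next: u64, end: u64 }

  impl LocalIds {
      fn from_range(start: u64) -> Self {
          LocalIds { next: start, end: start + (1 << RANGE_BITS) }
      }
      fn next_id(&mut self) -> Option<u64> {
          if self.next < self.end {
              let id = self.next;
              self.next += 1;
              Some(id)
          } else {
              None // range exhausted: ask the generator for a new one
          }
      }
  }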


I can imagine Twitter has a lot of services that don't see a lot of traffic and a lot of people who don't have experience with high traffic services.

Take your calculator and do calculations along with me:

10ms = 1/100th of a second.

8 bits = 256 possible values.

For a given node you have at most 256 possible IDs every 1/100th of a second or 25600 every second.

This is an absolute limitation, which means you can't just use the library as-is without some workarounds to get more traffic through a node. You could of course do things like assigning more hardware addresses to each node (maybe even-numbered messages on one hw address and odd on another), but then you soon find out the workaround is more complex than just writing it from scratch.


Well sure, but that's not what this UID scheme was designed to solve. It's an engineering trade-off. It doesn't mean it's bad, it just means it doesn't work for incredibly high-throughput use cases where things scale vertically instead of horizontally. That's fine.


Before starting this, did you look at the flaken crate? If so, any concerns or notable differences? https://github.com/bfrog/flaken

Although it hasn't been maintained in 4-5 years, it's not the kind of library that requires any maintenance following creation of a stable generator. Maybe someone could bench it.


Needed this functionality in a Rust application I'm writing so I ported the Go code to Rust.


What are the benefits of Snowflake compared to ULID, which already has Rust implementations[1][2]?

[1] https://crates.io/crates/ulid

[2] https://crates.io/crates/rusty_ulid


Sonyflakes are generally more useful in massive scale environments as you won't run into conflicts due to the "machine-id" part (which defaults to the lower 16 bits of the private IP address).

There are more differences (128-bit vs 64-bit, RNG-based vs time-based); I think they have different use cases.


You shouldn't ever have a collision with 128 bits of entropy.


More structure actually increases the chance of collisions, because at that scale the chance of bit errors dominates the equation. More randomness protects against that.


Ulid doesn’t have a “machine id” or “node id” component. I think you are mixing it up with one of the uuid variants.


Not sure if this is also a limitation of Snowflake, but ULIDs are only unique down to a certain timescale (milliseconds, I believe).


Not exactly. They can be sorted, by default, only down to a millisecond, but you can use a monotonic generator to have them fully sorted even if more than one ULID is generated within a millisecond.

Other than that, they have 80 bits of randomness, enough to be unique even if millions are generated per second.

https://github.com/ulid/spec


Nice work!

I'm curious, if you don't mind my asking. What are you building that requires that amount of scale? These are heavy hitting IDs (but of course you know that).


Thanks and I don't mind at all!

To be honest, for this application I didn't want to expose a database serial (otherwise you'd know you're user #100, for example) or use UUIDs, so it's less about scale and more about obscuring IDs. The library is well suited for huge-scale scenarios nonetheless.


Why did you not want to use UUIDs?


I already use UUIDs for various fields and didn't want the ID to be confused with those fields. It's more of a clarity/style decision than a technical one.


Unfortunately this means you get to make the same mistake as originally made with UUID: the static machine ID is a privacy leak.


Is it? I'm not sure private IPs are that sensitive, especially the lower parts.


If it’s unique it’s a privacy leak. I’m not going to debate that, it’s common knowledge.

UUID was changed in the 90s to use a hash of this instead, and later to just be a completely random number: there are so many bits that the likelihood of a single duplicate being generated before the sun has swollen enough to consume the Earth is slim, so you don't actually need these schemes to provide a unique number.


Maybe they need the distribution part and not the scale. But I’m curious now too.


How do you use 64 bits and only get scale to 180 years? For that bit count I’d expect 1000+


A Sonyflake ID is composed of

  39 bits for time in units of 10 msec
  8 bits for a sequence number
  16 bits for a machine id
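A sketch of how fields like these are typically packed into a single u64, in the order listed above (widths from the Sonyflake design; not necessarily this crate's exact code):

  const TIME_BITS: u64 = 39;    // elapsed time in 10 ms units since a custom epoch
  const SEQ_BITS: u64 = 8;
  const MACHINE_BITS: u64 = 16;

  fn compose(elapsed_10ms: u64, sequence: u64, machine_id: u64) -> u64 {
      assert!(elapsed_10ms < (1 << TIME_BITS), "time field exhausted (~174 years)");
      (elapsed_10ms << (SEQ_BITS + MACHINE_BITS))
          | ((sequence & ((1 << SEQ_BITS) - 1)) << MACHINE_BITS)
          | (machine_id & ((1 << MACHINE_BITS) - 1))
  }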


Does this mean that using them in a multi-threaded context may produce duplicate ids?

Edit: To answer my own question. No, although it does seem to limit your application to a single process with at most 256 threads, if I'm reading it correctly.

> Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper (though that’s overridable via a config file). [0]

[0] https://blog.twitter.com/engineering/en_us/a/2010/announcing...


This proves my point. 39 bits for time in 10 msec units works out to roughly 174 years.

ULID runs through 10889 AD


Why do the bits add up to 63 rather than 64?


Just a guess without reading the code, the unused bit is probably to avoid complications from signed integers. If the sign bit is never set, signed and unsigned integers can store the same data.
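For example, any value that fits in 63 bits round-trips cleanly between an unsigned integer and a signed 64-bit column type (a small Rust illustration):

  fn main() {
      let id: u64 = (1u64 << 63) - 1;   // largest 63-bit value: the sign bit is clear
      let as_signed = id as i64;        // safe to store in a signed BIGINT column
      assert!(as_signed >= 0);
      assert_eq!(as_signed as u64, id); // converts back losslessly
  }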


Presumably to make them work with languages that don't recognize unsigned integers? For example Java...



