Range Partitioning: Zero to One (aspiring.dev)
82 points by dangoodmanUT on March 27, 2024 | hide | past | favorite | 8 comments


This is great. Where can I learn more like this?

I am interested in distributed systems and database internals (both traditional and new databases) but find that many database resources tend to be either introductory SQL queries or related to tuning.


Martin Kleppmann's book Designing Data-Intensive Applications is a great starting point if you're not familiar with it.


I personally like to find new distributed systems, and then learn what techniques they use.

For example, learning how serf.io uses Vivaldi, how CockroachDB uses multi-group Raft, or why FoundationDB runs different process roles and what they each do.

I try to write interesting stuff on distributed systems, but there's also a great Discord on software internals created by Phil Eaton that has a lot of great discussions: https://twitter.com/eatonphil


oh also, https://lobste.rs and filter by the `distributed` tag


There is a simple trick to manage hash-based partition growth:

Don't assign a small number of tokens, but a much larger number.

If you have a million tokens, there is no need to re-shard, and you can reassign tokens to different nodes as your cluster grows.
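A minimal sketch of the idea (illustrative only; node names, token count, and the round-robin/steal assignment are my assumptions, not any particular system's scheme): keys hash to a fixed pool of tokens, and growth only reassigns token ownership, never re-hashes keys.

```python
import hashlib

NUM_TOKENS = 65_536  # far more tokens than nodes (the comment suggests ~1M)

def token_for(key: str) -> int:
    # Keys map to tokens permanently, so key placement never rehashes.
    h = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(h[:8], "big") % NUM_TOKENS

# Start with two nodes owning tokens round-robin.
owners = {t: ("node-a" if t % 2 == 0 else "node-b") for t in range(NUM_TOKENS)}

def add_node(owners: dict[int, str], new_node: str, new_count: int) -> None:
    # Hand every new_count-th token to the new node; all other tokens
    # keep their owner, so only ~1/new_count of the data moves.
    for t in range(NUM_TOKENS):
        if t % new_count == new_count - 1:
            owners[t] = new_node

before = dict(owners)
add_node(owners, "node-c", 3)
moved = sum(1 for t in range(NUM_TOKENS) if owners[t] != before[t])
print(f"tokens reassigned: {moved}/{NUM_TOKENS}")  # roughly a third
```

The key property: `token_for` never changes, so growing the cluster is purely a metadata update plus moving the tokens that changed hands.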


There is overhead to a token. If that were the "simple trick", then why don't hash-based systems like Cassandra, Scylla, Temporal, etc. do that by default?

Only somewhat effective if you start at massive scale tbh, and it still doesn't solve hot partitions: a partition can be hot because of a single tenant (e.g. a company ID) that can't be split across hash tokens, but can be split across ranges.


Why can’t hash-based partitioning systems just store the full hash with the key for fast rehashing if the number of buckets needs to change, or else recompute the hash?


A few main reasons come to mind:

1. The hash would be an extra column that can be calculated from existing data: wasted storage

2. You effectively have to rewrite the entire database to redistribute the data, and keeping the DB available during this process is _very_ complicated

3. You're putting an extreme load on the DB for a substantial amount of time. This takes away from your DB performance and makes node downtime even more severe

In a distributed DB, you have to remember that the probability your node _doesn't_ have the data you need increases with the size of the cluster, which creates a negative feedback loop for having to rewrite.
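A quick simulation of point 2 (my own sketch, not from the article): with naive `hash(key) % N` placement, growing from N to N+1 nodes relocates roughly N/(N+1) of all keys, so the fraction of the database you must rewrite approaches 100% as the cluster grows.

```python
import hashlib

def node_for(key: str, node_count: int) -> int:
    # Naive modulo placement: hash(key) mod node_count.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % node_count

keys = [f"key-{i}" for i in range(100_000)]
results = {}
for n in (2, 10, 50):
    # Count keys whose owner changes when one node is added.
    moved = sum(1 for k in keys if node_for(k, n) != node_for(k, n + 1))
    results[n] = moved / len(keys)
    print(f"{n} -> {n + 1} nodes: {results[n]:.0%} of keys move")
```

This is the feedback loop the comment describes: the bigger the cluster, the larger the share of data that has to cross the network during a naive rehash.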



