>Pick a stable, guaranteed-to-exist shard key (composite or atomic properties) and use that.
This is a pretty risky approach since it's almost certainly the case that you won't end up evenly distributing your data across shards using this method.
> This is a pretty risky approach since it's almost certainly the case that you won't end up evenly distributing your data across shards using this method.
How data is distributed across shards is a function of the properties selected for partitioning. So I do not understand how "a stable, guaranteed-to-exist, shard key (composite or atomic properties)" is "a pretty risky approach."
While high volume multi-tenant "customers/users/accounts" systems are common, they are not the only ones which benefit from sharded persistent stores.
For example, consider a system which monitors farm equipment for Caterpillar and John Deere. Let's say each company has 100k devices which send one message per day to the system.
While it is easy to envision sharding device messages based on "DeviceId / Company" in this hypothetical system, there would be no value in sharding by customer alone, since there are only two of them.
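To make the difference concrete, here is a rough sketch of the hypothetical above. The shard count of 8, the `shard_for` helper, and the `"Company/N"` device-id format are all assumptions made up for illustration; any stable hash would do in place of SHA-256.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # hypothetical shard count, not from the thread


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a digest is used to keep the mapping stable across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys) -> Counter:
    """Count how many messages land on each shard for the given keys."""
    return Counter(shard_for(k) for k in keys)


# Two companies, 100k devices each, one message per device per day.
messages = [
    (company, f"{company}/{n}")  # (company, made-up device id)
    for company in ("Caterpillar", "John Deere")
    for n in range(100_000)
]

by_company = distribution(company for company, _ in messages)
by_device = distribution(device_id for _, device_id in messages)

print("by company:", sorted(by_company.items()))
print("by device: ", sorted(by_device.items()))
```

Keyed on company, the 200k daily messages can occupy at most two shards no matter how many shards exist. Keyed on the device id, they spread across all of them in roughly equal shares.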
you're right, uneven shards are an inevitable outcome of this approach
but shard "even-ness" is in direct tension with the concern of the GP, which is execution atomicity
frequently, it's better to have uneven shards (that you can e.g. scale independently when necessary) that give you atomic execution, than even shards that require distributed transactions
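A minimal sketch of that tradeoff, assuming one in-memory SQLite database per shard and a made-up `balances` table keyed by tenant. None of these names come from the thread. Because every row for a tenant lives on one shard, a multi-row update is an ordinary local transaction with no cross-shard coordination.

```python
import sqlite3
import zlib

NUM_SHARDS = 4  # hypothetical shard count

# One independent database per shard (in-memory here, for illustration only).
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    with db:
        db.execute(
            "CREATE TABLE balances ("
            "tenant TEXT, account TEXT, amount INTEGER, "
            "PRIMARY KEY (tenant, account))"
        )


def shard_for(tenant: str) -> sqlite3.Connection:
    """All rows for a tenant go to the same shard, chosen by a stable hash."""
    return shards[zlib.crc32(tenant.encode("utf-8")) % NUM_SHARDS]


def seed(tenant: str, account: str, amount: int) -> None:
    with shard_for(tenant) as db:
        db.execute("INSERT INTO balances VALUES (?, ?, ?)", (tenant, account, amount))


def balance(tenant: str, account: str) -> int:
    row = shard_for(tenant).execute(
        "SELECT amount FROM balances WHERE tenant = ? AND account = ?",
        (tenant, account),
    ).fetchone()
    return row[0]


def transfer(tenant: str, src: str, dst: str, amount: int) -> None:
    """Move funds between two accounts of one tenant in a single local transaction."""
    db = shard_for(tenant)
    with db:  # commits on success, rolls back if an exception escapes
        db.execute(
            "UPDATE balances SET amount = amount - ? WHERE tenant = ? AND account = ?",
            (amount, tenant, src),
        )
        if balance(tenant, src) < 0:
            raise ValueError("insufficient funds")
        db.execute(
            "UPDATE balances SET amount = amount + ? WHERE tenant = ? AND account = ?",
            (amount, tenant, dst),
        )


seed("acme", "a", 100)
seed("acme", "b", 0)
transfer("acme", "a", "b", 30)
print(balance("acme", "a"), balance("acme", "b"))
```

If the two accounts lived on different shards, the same transfer would need something like two-phase commit to stay atomic. The cost of avoiding that is that a large tenant makes its shard large.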