
RethinkDB was good from a distributed systems perspective, but a nightmare to maintain in production. Backups and restores would take 12+ hours on ten gigabytes or so, and slow queries would grind the whole system to a halt, which isn't possible in Erlang-based systems like Riak thanks to the preemptive scheduling of the BEAM.


Riak builds on the theoretical foundation laid out in Amazon's Dynamo paper. The other NoSQL databases didn't have this theoretical underpinning, so they could only offer a "best effort".


Sure, but that said, no other DB Jepsen had tested up to that point necessitated the kind of gymnastics he had to do to get it to fail [0]. It's pretty solid CS, and it's a shame the project ended the way it did.

[0] https://aphyr.com/posts/330-jepsen-rethinkdb-2-2-3-reconfigu...


Dynamo is not exactly a performant or efficient model. It's the equivalent of pulling all the distributed systems guts out and handing them to the user to deal with. And the resulting toll is quantifiable: http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
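
For anyone who hasn't lived with that: in a Dynamo-style store, concurrent writes can leave multiple "siblings" for the same key, and merging them is the application's job. A rough, hypothetical Python sketch of what that client-side burden looks like (the merge function and value shapes are made up for illustration, not Riak's actual API):

    # Hypothetical sketch of client-side sibling resolution in a
    # Dynamo-style store. Nothing here is Riak's real API.
    def merge_cart_siblings(siblings):
        """Each sibling is a set of item ids written by a different
        concurrent writer; the classic Dynamo resolution is a union,
        which can resurrect deleted items."""
        merged = set()
        for sibling in siblings:
            merged |= sibling
        return merged

    # Two replicas accepted writes during a partition:
    sibling_a = {"book", "kettle"}
    sibling_b = {"book", "socks"}   # "kettle" was deleted on this side
    print(merge_cart_siblings([sibling_a, sibling_b]))
    # {'book', 'kettle', 'socks'} -- the delete is lost; that's the kind
    # of decision Dynamo hands back to the application.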


Damien's a very smart guy, but I don't think I agree with him here:

> Within a datacenter, the Mean Time To Failure (MTTF) for a network switch is one to two orders of magnitude higher than servers, depending on the quality of the switch.

Switches are highly unlikely to fail; they seem to be bulletproof. But having worked in a datacenter (on the engineering team of an early AWS competitor), switch _misconfiguration_ was all too common. Maybe a tech accidentally plugs in the wrong Ethernet cable and forms a switching loop. Maybe someone fat-fingers a tag and a broken VLAN gets automatically deployed to 10,000 nodes. Either way, the _switch_ is alive, well, and pushing packets, but they're the _wrong_ packets, and to the end user the result is indistinguishable from hardware failure.

At datacenter scales, these things happen... not infrequently. If you engineer your database to expect that netsplits are rare, you're going to have a bad time.


VLANs were the bane of my existence when I had to figure out how to deal with them. I don't envy anyone whose job is to manage them on switches for a lot of servers.


Good points. Weren't the last couple of AWS outages partly due to misconfigured networking? Depending on your problem, replicated reads make sense given those kinds of outages. Though I'm new-ish to Riak's "core" design: when you get the active preference list (APL) of servers, isn't it feasible to create a design similar to what Damien's proposing, using a preferred master for a given vnode? A rough sketch of the idea is below.
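
Not a Riak expert either, but the idea seems workable in principle. Here's a very rough Python sketch of what "preferred master per key/vnode" might look like on top of a consistent-hash ring; the names and ring mechanics are simplified and hypothetical, not riak_core's actual implementation:

    # Hypothetical sketch: pick a preferred primary from a preference
    # list on a consistent-hash ring. Not riak_core's real API.
    import hashlib
    from bisect import bisect_right

    NODES = ["node1", "node2", "node3", "node4"]
    N_REPLICAS = 3

    def ring_position(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    # One vnode per physical node here, just to keep the sketch small.
    RING = sorted((ring_position(n), n) for n in NODES)

    def preference_list(key, n=N_REPLICAS):
        """The n nodes that follow the key's hash around the ring."""
        start = bisect_right(RING, (ring_position(key), ""))
        return [RING[(start + i) % len(RING)][1] for i in range(n)]

    def preferred_master(key, up_nodes):
        """First reachable node in the preference list acts as the
        'preferred master'; reads/writes fall back to the rest."""
        for node in preference_list(key):
            if node in up_nodes:
                return node
        return None

    print(preference_list("user:42"))
    print(preferred_master("user:42", up_nodes={"node2", "node3"}))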


Cassandra was based on the same thing, no? In my experience Cassandra has been what's beaten Riak.


If I remember correctly, Cassandra is actually an ideological Frankenstein, stitched together from pieces of Bigtable and Dynamo.

EDIT: I don't mean to disparage it, just that it doesn't come from a single lineage as purely as Riak does. It certainly appears to have won.


I prefer to think of it as the mullet of the database world: Bigtable in the front, Dynamo in the back.


Vice versa if the network is the front and the disk is the back (as it is in the code): Dynamo handles the gossip and distribution, Bigtable the on-disk storage.


Are those times for real? You could snapshot the underlying disks and copy them byte for byte orders of magnitude faster than that.
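
The back-of-envelope math makes the gap pretty stark. Assuming ~100 MB/s of sustained sequential throughput (my assumption, not a measured number):

    # Rough back-of-envelope, assuming ~100 MB/s sustained throughput.
    data_gb = 10
    throughput_mb_s = 100                      # assumed, not measured
    copy_seconds = data_gb * 1024 / throughput_mb_s
    print(copy_seconds)                        # ~102 seconds
    backup_seconds = 12 * 3600                 # the reported 12 hours
    print(backup_seconds / copy_seconds)       # ~420x slower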


The problem was replication across the cluster and getting the nodes to coordinate their values. We eventually did what you suggested in our dev environments so that we didn't lose our sanity.



