No, the GP is right. Text CRDTs (including mine) literally generate an operation per inserted character in a collaborative text document. Diamond types solves the data explosion problem by not storing the operations separately. Instead, we internally run-length encode everything.
If you have these two operations:
Insert "h" at position 5, ID (seph, 2)
Insert "i" at position 6, ID (seph, 3)
... They get immediately merged internally to become:
Insert "hi" at position 5-6, ID (seph, 2-3)
This reduces the number of items we need to process by an order of magnitude. In practice, the metadata (positions, IDs, etc) ends up taking up about 0.1 bytes on disk per inserted character, which is awesome.
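To make that concrete, here's a minimal sketch (not diamond-types' actual internals; the struct and field names are made up for illustration) of merging two adjacent inserts into one run:

```rust
// Hypothetical sketch of run-length encoding adjacent insert operations.
// A run can absorb the next op only when the agent matches and both the
// sequence numbers and document positions are contiguous.

#[derive(Debug, PartialEq)]
struct InsertRun {
    agent: String,
    seq_start: u32,   // first sequence number in the run
    pos: usize,       // document position of the first character
    content: String,  // inserted characters
}

impl InsertRun {
    /// Try to append `other` onto the end of this run.
    /// Returns false (leaving self unchanged) if they aren't contiguous.
    fn try_append(&mut self, other: &InsertRun) -> bool {
        let len = self.content.chars().count() as u32;
        if self.agent == other.agent
            && other.seq_start == self.seq_start + len
            && other.pos == self.pos + len as usize
        {
            self.content.push_str(&other.content);
            true
        } else {
            false
        }
    }
}

fn main() {
    // Insert "h" at position 5, ID (seph, 2)
    let mut run = InsertRun {
        agent: "seph".into(), seq_start: 2, pos: 5, content: "h".into(),
    };
    // Insert "i" at position 6, ID (seph, 3)
    let next = InsertRun {
        agent: "seph".into(), seq_start: 3, pos: 6, content: "i".into(),
    };

    assert!(run.try_append(&next));
    // Merged: Insert "hi" at position 5-6, ID (seph, 2-3)
    println!("{:?}", run);
}
```

Typing is overwhelmingly sequential, so most ops merge into the previous run and the stored item count stays tiny relative to the character count.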
This works great for (agent, seq) style IDs because the IDs can also be run-length encoded. But we can't run-length encode hashes. And I don't want my file size to grow by 32x because we store a hash after every keystroke.
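Rough back-of-the-envelope arithmetic for why hashes kill this scheme (the 12-byte record size is an illustrative assumption, not diamond-types' actual encoding):

```rust
fn main() {
    let chars: u64 = 1000; // a run of 1000 consecutive keystrokes

    // (agent, seq) IDs: the whole run collapses to one record, e.g.
    // (agent: u32, seq_start: u32, len: u32) = 12 bytes for all 1000 chars.
    let rle_id_bytes: u64 = 12;

    // Hash IDs: a 32-byte hash per keystroke has no structure to exploit,
    // so every ID must be stored individually.
    let hash_id_bytes = 32 * chars;

    println!("run-length encoded IDs: {rle_id_bytes} bytes");
    println!("per-character hashes:   {hash_id_bytes} bytes"); // 32000 bytes
}
```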
(I didn't invent this idea. Martin Kleppmann first used it for storing CRDTs on disk, and Kevin Jahns applied the same tricks to Yjs's in-memory structures)
I was responding to the diamond-types CRDT author's question in the parent comment. Their GitHub project page [1] mentions text editing a lot:
> This repository contains a high performance rust CRDT for text editing. This is a special data type which supports concurrent editing of lists or strings (text documents) by multiple users in a P2P network without needing a centralized server.
> This version of diamond types only supports plain text editing. Work is underway to add support for other JSON-style data types. See the more_types branch for details.
In any case, I agree with the metaphor, and agree that batching granular operations can always be done.