
Oh hey, excited to see Typesense on the front page! Thank you for sharing OP.

Some quick context: we are a small bootstrapped team that's been working on Typesense since 2015. It started out as a nights-and-weekends project, born out of personal frustration with Elasticsearch's complexity for doing seemingly simple things. So we set out (maybe naively at the time) to see what it would take to build our own search engine, just to scratch our intellectual curiosity. Over the years, we've realized that it takes a LOT of nuanced effort to build a search engine that works well out of the box.

Our goal with Typesense is to democratize search technology on two fronts:

1. Simplify and reduce the amount of developer effort it takes to build a good search experience that works well out of the box. To this end, we pore over API design to make it intuitive and set sane defaults for all parameters.

2. Make good instant-search technology accessible to individuals and teams of all sizes. To this end, we decided to open source our work and make it completely free to self-host. We also optimize for reducing the operational overhead it takes to deploy Typesense to production (e.g. single binary with no runtime dependencies, one-step clustering, etc).

I left my full-time job in 2020, my co-founder left his a month ago, and we're now both working full-time on Typesense.

Happy to answer any questions!



Do you have a document that explains the architecture of the product? I searched a bit on your github and website but didn't find anything. Apologies in advance if I've missed something very obvious :-).


We don't have an architecture document at the moment, but here's a high-level summary from @karterk's comment from another thread:

At the heart of Typesense is a `token => documents` inverted index backed by an Adaptive Radix Tree (https://db.in.tum.de/~leis/papers/ART.pdf), which is a memory-efficient implementation of the Trie data structure. ART allows us to do fast fuzzy searches on a query.

All indices are stored in-memory, while the documents themselves are stored on disk in RocksDB. All underlying data structures were carefully designed, benchmarked and optimized to exploit cache locality and utilize all cores efficiently.
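To make the `token => documents` idea concrete, here is a minimal sketch in plain Python. This is purely illustrative and hypothetical: it uses a naive one-character-per-node trie and prefix matching as a stand-in for the memory-efficient ART and fuzzy traversal described above, not Typesense's actual implementation.

```python
# Illustrative token => documents inverted index backed by a naive trie.
# (Typesense uses a C++ Adaptive Radix Tree; this sketch only shows the shape.)

class TrieNode:
    __slots__ = ("children", "doc_ids")

    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.doc_ids = set()  # documents containing the token ending here


class InvertedIndex:
    def __init__(self):
        self.root = TrieNode()

    def index(self, doc_id, text):
        """Tokenize naively on whitespace and record doc_id under each token."""
        for token in text.lower().split():
            node = self.root
            for ch in token:
                node = node.children.setdefault(ch, TrieNode())
            node.doc_ids.add(doc_id)

    def search_prefix(self, prefix):
        """Return doc ids for all tokens starting with `prefix` — a crude
        stand-in for the fuzzy matching a real ART traversal enables."""
        node = self.root
        for ch in prefix.lower():
            if ch not in node.children:
                return set()
            node = node.children[ch]
        out = set(node.doc_ids)
        stack = list(node.children.values())
        while stack:  # collect doc ids from the whole subtree
            n = stack.pop()
            out |= n.doc_ids
            stack.extend(n.children.values())
        return out


idx = InvertedIndex()
idx.index(1, "fast typo tolerant search")
idx.index(2, "typesense instant search")
print(idx.search_prefix("typ"))  # {1, 2}
```

In a real engine the values would be compressed posting lists rather than Python sets, and documents would be fetched from the on-disk store (RocksDB here) by id.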


Hi,

Do you have any metrics regarding the memory usage of your ART implementation?

I tried to implement one for the database I'm currently working on, however I feel that I am using way too much memory.

Basically, with my current implementation, a dictionary containing about 2,857,086 distinct words requires 341 MB.
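For scale, those figures work out to roughly 125 bytes per distinct word, which is indeed heavy compared to the raw key data:

```python
# Back-of-the-envelope cost per key for the numbers quoted above.
words = 2_857_086
total_bytes = 341 * 1024 * 1024  # 341 MB

per_word = total_bytes / words
print(round(per_word, 1))  # ~125.2 bytes per distinct word
```

A 64-bit pointer per child edge adds up quickly in a trie, which is why node layouts like ART's (4/16/48/256-ary nodes) exist in the first place.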


I benchmarked extensively 4-5 years ago, but I don't have those numbers with me. Tries are quite expensive memory-wise by design, but I found that ART gave the best balance between speed (by exploiting cache locality) and memory. The state of the art might have improved by now.

As far as Typesense goes though, I found that the actual posting lists, document listings, and other faceting/sorting-related indexing data structures are where the bigger overhead is, especially for larger datasets.


Thanks for the feedback. My issue is that I allocate only a few MB to my indexing thread, so I'm looking for a more memory-efficient implementation to avoid having to produce and then merge too many segments from disk.

I'm currently considering using compressed pointers in some parts of the tree to reduce the memory footprint as much as I can. Let's see how it goes...
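One common way to realize the compressed-pointer idea is to allocate all nodes in a flat arena and reference children by small integer indices instead of full 64-bit machine pointers. A hypothetical sketch (node layout and names are my own, not from either project):

```python
# Sketch of pointer compression for a trie: nodes live in one arena and
# children are referenced by 32-bit indices rather than 8-byte pointers,
# roughly halving per-edge reference overhead in a native implementation.

import array


class ArenaTrie:
    def __init__(self):
        # Parallel per-node storage; a "node" is just an index into these.
        self.children = [{}]                   # index -> {char: child_index}
        self.terminal = array.array("b", [0])  # index -> is-end-of-word flag

    def _new_node(self):
        self.children.append({})
        self.terminal.append(0)
        return len(self.children) - 1  # compact index, not a machine pointer

    def insert(self, word):
        node = 0  # root is always index 0
        for ch in word:
            nxt = self.children[node].get(ch)
            if nxt is None:
                nxt = self._new_node()
                self.children[node][ch] = nxt
            node = nxt
        self.terminal[node] = 1

    def contains(self, word):
        node = 0
        for ch in word:
            node = self.children[node].get(ch)
            if node is None:
                return False
        return bool(self.terminal[node])


t = ArenaTrie()
t.insert("search")
print(t.contains("search"), t.contains("sear"))  # True False
```

In C or C++ the same trick uses `uint32_t` child slots into a node pool, with the added benefit that the arena can be serialized or memory-mapped wholesale.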



