
Oh hey, excited to see Typesense on the front page! Thank you for sharing OP.

Some quick context: we are a small bootstrapped team that's been working on Typesense since 2015. It started out as a nights-and-weekends project, born out of personal frustration with Elasticsearch's complexity for doing seemingly simple things. So we set out (maybe naively at the time) to see what it would take to build our own search engine, just to scratch our intellectual curiosity. Over the years, we've realized that it takes a LOT of nuanced effort to build a search engine that works well out of the box.

Our goal with Typesense is to democratize search technology on two fronts:

1. Simplify and reduce the amount of developer effort it takes to build a good search experience that works well out of the box. To this end, we pore over API design to make it intuitive and set sane defaults for all parameters.

2. Make good instant-search technology accessible to individuals and teams of all sizes. To this end, we decided to open source our work and make it completely free to self-host. We also optimize for reducing the operational overhead it takes to deploy Typesense to production (e.g. single binary with no runtime dependencies, one-step clustering, etc).

I left my full-time job in 2020, my co-founder left his a month ago, and we're now both working full-time on Typesense.

Happy to answer any questions!



Do you have a document that explains the architecture of the product? I searched a bit on your github and website but didn't find anything. Apologies in advance if I've missed something very obvious :-).


We don't have an architecture document at the moment, but here's a high-level summary from @karterk's comment from another thread:

At the heart of Typesense is a `token => documents` inverted index backed by an Adaptive Radix Tree (https://db.in.tum.de/~leis/papers/ART.pdf), which is a memory-efficient implementation of the Trie data structure. ART allows us to do fast fuzzy searches on a query.

All indices are stored in-memory, while the documents themselves are stored on disk in RocksDB. All underlying data structures were carefully designed, benchmarked and optimized to exploit cache locality and utilize all cores efficiently.
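To make the `token => documents` idea concrete, here is a minimal sketch in plain Python. This is purely illustrative and hypothetical: it uses a naive one-character-per-node trie and prefix matching as a stand-in for the memory-efficient ART and fuzzy traversal described above, not Typesense's actual implementation.

```python
# Illustrative token => documents inverted index backed by a naive trie.
# (Typesense uses a C++ Adaptive Radix Tree; this sketch only shows the shape.)

class TrieNode:
    __slots__ = ("children", "doc_ids")

    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.doc_ids = set()  # documents containing the token ending here


class InvertedIndex:
    def __init__(self):
        self.root = TrieNode()

    def index(self, doc_id, text):
        """Tokenize naively on whitespace and record doc_id under each token."""
        for token in text.lower().split():
            node = self.root
            for ch in token:
                node = node.children.setdefault(ch, TrieNode())
            node.doc_ids.add(doc_id)

    def search_prefix(self, prefix):
        """Return doc ids for all tokens starting with `prefix` — a crude
        stand-in for the fuzzy matching a real ART traversal enables."""
        node = self.root
        for ch in prefix.lower():
            if ch not in node.children:
                return set()
            node = node.children[ch]
        out = set(node.doc_ids)
        stack = list(node.children.values())
        while stack:  # collect doc ids from the whole subtree
            n = stack.pop()
            out |= n.doc_ids
            stack.extend(n.children.values())
        return out


idx = InvertedIndex()
idx.index(1, "fast typo tolerant search")
idx.index(2, "typesense instant search")
print(idx.search_prefix("typ"))  # {1, 2}
```

In a real engine the values would be compressed posting lists rather than Python sets, and documents would be fetched from the on-disk store (RocksDB here) by id.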


Hi,

Do you have any metrics regarding the memory usage of your ART implementation?

I tried to implement one for the database I'm currently working on, however I feel that I am using way too much memory.

Basically, with my current implementation, a dictionary containing about 2,857,086 distinct words requires 341 MB.
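For scale, those figures work out to roughly 125 bytes per distinct word, which is indeed heavy compared to the raw key data:

```python
# Back-of-the-envelope cost per key for the numbers quoted above.
words = 2_857_086
total_bytes = 341 * 1024 * 1024  # 341 MB

per_word = total_bytes / words
print(round(per_word, 1))  # ~125.2 bytes per distinct word
```

A 64-bit pointer per child edge adds up quickly in a trie, which is why node layouts like ART's (4/16/48/256-ary nodes) exist in the first place.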


I benchmarked extensively 4-5 years ago, but I don't have those numbers with me. Tries are quite expensive memory-wise by design, but I found that ART gave the best balance between speed (by exploiting cache locality) and memory. The state of the art might have improved by now.

As far as Typesense goes though, I found that the actual posting lists, document listings, and other faceting/sorting-related indexing data structures are where the bigger overhead is, especially for larger datasets.


Thanks for the feedback. My issue is that I allocate only a few MB to my indexing thread, so I'm looking for a more memory-efficient implementation to avoid having to produce and then merge too many segments from disk.

I'm currently considering using compressed pointers in some parts of the tree to reduce the memory footprint as much as I can. Let's see how it goes...
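One common way to realize the compressed-pointer idea is to allocate all nodes in a flat arena and reference children by small integer indices instead of full 64-bit machine pointers. A hypothetical sketch (node layout and names are my own, not from either project):

```python
# Sketch of pointer compression for a trie: nodes live in one arena and
# children are referenced by 32-bit indices rather than 8-byte pointers,
# roughly halving per-edge reference overhead in a native implementation.

import array


class ArenaTrie:
    def __init__(self):
        # Parallel per-node storage; a "node" is just an index into these.
        self.children = [{}]                   # index -> {char: child_index}
        self.terminal = array.array("b", [0])  # index -> is-end-of-word flag

    def _new_node(self):
        self.children.append({})
        self.terminal.append(0)
        return len(self.children) - 1  # compact index, not a machine pointer

    def insert(self, word):
        node = 0  # root is always index 0
        for ch in word:
            nxt = self.children[node].get(ch)
            if nxt is None:
                nxt = self._new_node()
                self.children[node][ch] = nxt
            node = nxt
        self.terminal[node] = 1

    def contains(self, word):
        node = 0
        for ch in word:
            node = self.children[node].get(ch)
            if node is None:
                return False
        return bool(self.terminal[node])


t = ArenaTrie()
t.insert("search")
print(t.contains("search"), t.contains("sear"))  # True False
```

In C or C++ the same trick uses `uint32_t` child slots into a node pool, with the added benefit that the arena can be serialized or memory-mapped wholesale.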



