You could also prefix it with a varint: in that case it would not require any extra byte compared to a NUL-terminated string as long as the string is shorter than 128 chars.
I benchmarked extensively 4-5 years ago, but I don't have those numbers with me. Tries are quite expensive memory-wise by design, but I found that ART gave the best balance between speed (by exploiting cache locality) and memory. The state of the art might have improved by now.
As far as Typesense goes though, I found that the actual posting lists, document listings, and other faceting/sorting-related indexing data structures are where the bigger overhead is, especially for larger datasets.
Thanks for the feedback.
My issue is that I allocate only a few MB to my indexing thread, so I'm looking for a more efficient implementation to avoid having to produce and then merge too many segments from disk.
I'm currently considering using compressed pointers on some parts of the tree to reduce the memory footprint as much as I can. Let's see how it goes...
With the Covid crisis we were no longer able to pay 3K per month for a basic search engine, and now we are using a solution based on this in production (> 30M searches/month).
Being able to allocate only a few KB to ingest any JSON is a killer feature.
There is a valid reason to reinvent the wheel: in my case I had to do something similar in C99 for a SaaS federated search engine, to get the lowest memory footprint possible.
I've been working on a lightweight PHP library acting as an agnostic query engine to aggregate content from different kinds of datastores (files, APIs, SQL, document-oriented, ...) and to ease the use of common aggregations: terms, facets, ranges, min, max, average, ...
Last year I built a similar project at work to inject values coming from our database into our Gherkin features.
I thought it might help someone else build more robust tests, so I decided to simplify it and open-source it, as I am not allowed to share the one I made at work.
I'm glad to share "Metis", a fulltext search engine written in ECMAScript 6 with facet capabilities. Results are not yet ranked with a TF/IDF algorithm, but sort callbacks can be provided.
This is still a WIP project and the engine API will change in the coming weeks. The current revision includes a simple fulltext search with result counts and facets.
It follows semantic versioning, and 1.0.0 should be released within two months.
If you want to use it, be sure to pin the current tag (0.3), as the overall project structure will change as explained above. For example, by revision 1.0.0 a "Query" object will probably be introduced, which will be notified when the response is returned by a JS worker running on another thread.
A working example is available; it generates 1000 documents before indexing them.
I intend to create an AVX2-based decoder, but I have had absolutely no time for side projects over the past three months.
You might also want to take a look at this streaming encoder if you want to encode large files with a tiny memory footprint: https://github.com/MKCG/php-qoi/blob/main/src/FFI/lib/qoi.c