If there were only one alert criterion, this would be simple. Our alerts can be configured with arbitrary data filters (e.g. only matching logs with column `level='error'`), so we would have to create a unique MV for each alert's filter condition.
You could have an alert ID be part of the MV primary key?
An MV is really more like a trigger: it translates an insert into table A into an insert into table B, evaluating, filtering, and grouping each batch of A inserts to determine what rows to insert into B. Those inserts can be grouped by an alert ID in order to segregate the state columns by alert. To me this sounds exactly like what you're doing with manual inserts?
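A rough sketch of that idea (all table/column names here are hypothetical, not your actual schema): the MV fires on every insert into the logs table, joins the inserted block against the alert definitions, and writes per-alert counts keyed by alert_id. The caveat is that this only works when the filter can be expressed as data (here, a simple level match), which is the crux of your problem above.

    CREATE TABLE logs (timestamp DateTime, level String, message String)
    ENGINE = MergeTree ORDER BY timestamp;

    CREATE TABLE alerts (alert_id UInt64, level_filter String)
    ENGINE = MergeTree ORDER BY alert_id;

    -- per-alert state table; SummingMergeTree rolls up hits by (alert_id, window)
    CREATE TABLE alert_counts (alert_id UInt64, window DateTime, hits UInt64)
    ENGINE = SummingMergeTree ORDER BY (alert_id, window);

    -- triggered per insert block into `logs`; the join runs against the full alerts table
    CREATE MATERIALIZED VIEW alert_counts_mv TO alert_counts AS
    SELECT a.alert_id, toStartOfMinute(l.timestamp) AS window, count() AS hits
    FROM logs AS l
    INNER JOIN alerts AS a ON l.level = a.level_filter
    GROUP BY a.alert_id, window;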
That said, while MVs are super powerful and convenient, they're a convenience more than a core function. If you have an ingest flow expressed as Go code (as opposed to, say, Vector or Kafka Connect), then you're basically just "lifting" the convenience of MVs into Go code. You don't get the benefit of an MV's ability to efficiently evaluate against a batch of inserts (which gives you access to joins, dictionaries, and so on), but it's functionally very similar.
We insert from our alerts worker because we want the aggregation to happen per alert (with the aggregated data filtered by the particular alert definition). As each alert is evaluated, we run the following [1] INSERT INTO ... SELECT ... statement based on the alert definition. We can't aggregate with an MV since we'd need to create an MV per unique alert that a customer may set up.
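For illustration, a hypothetical version of that statement (not the actual one referenced as [1], and with made-up table/column names) might look like this, with the WHERE clause generated from the alert's filter definition:

    INSERT INTO alert_counts (alert_id, window, hits)
    SELECT
        42 AS alert_id,                            -- the alert being evaluated
        toStartOfMinute(timestamp) AS window,
        count() AS hits
    FROM logs
    WHERE level = 'error'                          -- this alert's filter condition
      AND timestamp >= now() - INTERVAL 1 MINUTE   -- the evaluation window
    GROUP BY window;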
Once we hit >100k inserts per second, async inserts didn't work well for us because we had limited control over the background async insert batching happening on the cluster. The background inserts would be too small, creating many small parts and therefore many merges, which drove up CPU and back-pressure (latency) on async inserts, which in turn showed up as ingestion delay.
Plus, async inserts are only available on ClickHouse Cloud.
Like the sibling comment says - async inserts are part of the OSS version. Batch flushes are tunable in two dimensions - size and latency. I think it's likely the default size was just too low for your use case.
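For reference, these are the settings involved; they can be set per session, per INSERT, or in a user profile, and the values below are illustrative rather than recommendations:

    SET async_insert = 1;                        -- buffer inserts on the server
    SET wait_for_async_insert = 1;               -- ack only after the buffer is flushed
    SET async_insert_max_data_size = 10485760;   -- flush once the buffer reaches ~10 MiB
    SET async_insert_busy_timeout_ms = 1000;     -- ...or after 1s, whichever comes first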
I forget the exact details, but we ran into issues tuning the two settings you're describing on the Cloud instance (these were user-level settings that seemed to get reset by ClickHouse Cloud because they were adjusted based on the cluster size). Perhaps this could change with Cloud, and it isn't an issue with the OSS version.
Exploring bloom filter index merges would be an interesting addition. I do wish it were easier to profile merge performance to break down where most of the CPU time is being spent.
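In the meantime, if part_log is enabled, something like the sketch below gives at least a coarse per-table breakdown of merge time and memory over the last hour, though not where CPU goes inside a merge:

    SELECT
        table,
        count() AS merges,
        sum(duration_ms) AS total_merge_ms,
        sum(rows) AS rows_merged,
        max(peak_memory_usage) AS max_merge_memory
    FROM system.part_log
    WHERE event_type = 'MergeParts'
      AND event_time > now() - INTERVAL 1 HOUR
    GROUP BY table
    ORDER BY total_merge_ms DESC;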
Our browser client would help with tracing the network requests made by said apps, since it captures all network requests. However, Shopify may restrict what our browser client can do in your frontend store (I'm guessing that the 3rd party apps are added to a shop as iframes or are otherwise sandboxed).
> I'm guessing that the 3rd party apps are added to a shop as iframes or are otherwise sandboxed
I build Shopify apps. It depends on what kind of app it is, but if it's on the storefront then it's generally just injected through an async JavaScript script tag. They don't do any iframes or sandboxing on the storefront, although they're definitely moving toward more privacy-conscious APIs.
Hey HN, I'm Vadim, the CTO at highlight.io. We offer session replay to engineering teams and have had a lot of questions around client-side performance.