It looks like Kafka is by far and away the way to handle persistent logs/events ...

RhodesianHunter · on Nov 16, 2021

Pulsar is a much better fit when your architecture absolutely requires many queues, e.g. one queue per customer across hundreds of thousands of customers.

This architecture certainly exists, but it is a lot more burdensome and less common than partitioning by customer id across a Kafka topic.
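For comparison, partitioning by customer id means every customer shares a fixed number of partitions, and because Kafka only guarantees ordering within a partition, hashing the customer id as the key keeps each customer's events in order. A minimal sketch of the idea, assuming a hypothetical 12-partition topic; Kafka's default partitioner hashes the key bytes with murmur2, and `zlib.crc32` is swapped in here only as a stable, dependency-free stand-in:

```python
import zlib
from collections import defaultdict

# Hypothetical partition count for the topic.
NUM_PARTITIONS = 12


def partition_for(customer_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a customer id to a partition deterministically.

    Stand-in for Kafka's murmur2-based default partitioner: any stable hash
    gives the same property that one key always lands on one partition.
    """
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions


def route(events):
    """Group (customer_id, payload) events by the partition they would land in.

    Within each partition, events keep their original arrival order.
    """
    partitions = defaultdict(list)
    for customer_id, payload in events:
        partitions[partition_for(customer_id)].append((customer_id, payload))
    return partitions


if __name__ == "__main__":
    events = [("cust-1", "a"), ("cust-2", "b"), ("cust-1", "c"), ("cust-3", "d")]
    for partition, batch in sorted(route(events).items()):
        print(partition, batch)
```

The per-customer-queue design trades this fixed partition count for one queue per customer, which is where Pulsar's support for very large numbers of topics comes in.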

abraxas · on Nov 16, 2021

Kafka is a wonderful tool. I built a few systems on top of it, and all of them delivered the scale that was promised and more, with surprisingly little hardware.

I'm very hostile to a lot of hipster tech, but Kafka is one of the few genuinely good pieces of tech from the whole "Big Data" craze of the past decade.

bskrobisz · on Nov 16, 2021

It seems weird to hear "a company here in Japan called LINE" -- LINE is big enough in Japan that it sounds kind of equivalent to "a company here in America called Discord".

hardwaresofton · on Nov 16, 2021

I think that works one way but not the other. America has the blessing of being the source of lots of new apps and tech companies with global success or ambitions (e.g. Discord has some penetration among gamers everywhere), but that is less true of Japan.

AFAIK LINE has not had that kind of success. I wouldn't be surprised if most people in the US had never heard of LINE; unless they're avid readers of TechCrunch or something, it just doesn't come up.

Would be interesting to know what % of people on HN know about LINE, though.

saagarjha · on Nov 16, 2021

I’m pretty sure LINE has more than twice the users that Twitter has. Not knowing it is like not knowing about WeChat: it’s because you’re not familiar with things outside of the US, rather than not being up-to-date with the space in general.

zild3d · on Nov 16, 2021

Never heard of LINE, but it looks like they have around 85M monthly active users, while Twitter is somewhere past 330M.

https://www.statista.com/statistics/560545/number-of-monthly...

fomine3 · on Nov 17, 2021

The statistics only count users in Japan. It is also used in other Asian countries.

kortex · on Nov 16, 2021

How well does Kafka handle high-density data (e.g. A/V and images)? I'm scouting out systems for our computer vision pipeline. Kafka would simplify the aggregation/collimation step for marshalling to GPUs, and it would be simplest if I could just send raw frames rather than use some alternate transport.

hardwaresofton · on Nov 16, 2021

I think the important thing there would be the frame size, no? Kafka can clearly handle the throughput side of things, but it doesn't seem to be meant for large messages out of the box[0].

I wouldn't be surprised if it was perfectly fine though -- with compression (and all the video/image-specific tricks) the file sizes should get pretty small...
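To put rough numbers on "frame size", here is a back-of-envelope sketch comparing uncompressed 8-bit RGB frames against a limit of about 1 MB. It assumes the producer's default `max.request.size` of 1048576 bytes; the broker-side `message.max.bytes` default is similar but worth checking for your version. Record overhead and compression are ignored.

```python
# Assumed default: producer max.request.size (1048576 bytes, about 1 MiB).
DEFAULT_LIMIT_BYTES = 1_048_576


def raw_frame_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Size of one uncompressed frame; defaults describe 8-bit RGB."""
    return width * height * channels * bytes_per_channel


def fits_default(size_bytes: int, limit: int = DEFAULT_LIMIT_BYTES) -> bool:
    """True if a payload of this size is within the assumed default limit."""
    return size_bytes <= limit


resolutions = {"640x480": (640, 480), "1280x720": (1280, 720), "1920x1080": (1920, 1080)}
for name, (w, h) in resolutions.items():
    size = raw_frame_bytes(w, h)
    print(f"{name}: {size:,} bytes, fits default limit: {fits_default(size)}")
```

Only the 640x480 frame (921,600 bytes) squeaks under; a raw 1080p frame is 6,220,800 bytes, so it would need either heavy compression or raised size limits.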

[0]: https://stackoverflow.com/questions/21020347/how-can-i-send-...

kortex · on Nov 16, 2021

Thanks. That's kinda what I figured, but wanted to sounding board it out a bit as a sanity check.

The link is a great reference by the way.

> Your API should use cloud storage (for example, AWS S3) and simply push a reference to S3 to Kafka or any other message broker.

This is more or less what I figured. We already archive to S3 anyways so switching to using it as transport would be straightforward.
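The approach quoted above is usually called the "claim check" pattern: the payload goes to object storage and only a small pointer travels through the broker. A minimal sketch, where the in-memory `object_store` dict and `topic` list are hypothetical stand-ins for an S3 bucket and a Kafka topic (no real S3 or Kafka client API is used), and the reference fields are illustrative:

```python
import hashlib
import json

# Hypothetical in-memory stand-ins: object_store plays S3, topic plays Kafka.
object_store: dict[str, bytes] = {}
topic: list[bytes] = []


def publish_frame(frame: bytes, bucket: str, key: str) -> None:
    """Store the payload out of band and publish only a small reference."""
    object_store[f"{bucket}/{key}"] = frame
    reference = {
        "bucket": bucket,
        "key": key,
        "size": len(frame),
        "sha256": hashlib.sha256(frame).hexdigest(),  # lets consumers verify the fetch
    }
    topic.append(json.dumps(reference).encode("utf-8"))


def consume_frame(message: bytes) -> bytes:
    """Resolve a reference message back into the payload and verify it."""
    ref = json.loads(message)
    frame = object_store[f"{ref['bucket']}/{ref['key']}"]
    if hashlib.sha256(frame).hexdigest() != ref["sha256"]:
        raise ValueError("payload does not match its reference")
    return frame
```

The broker only ever sees messages of a couple hundred bytes, regardless of frame size, at the cost of an extra object-store round trip per frame.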

hardwaresofton · on Nov 16, 2021

> Thanks. That's kinda what I figured, but wanted to sounding board it out a bit as a sanity check.

I'm by no means a Kafka expert or a video expert of course, but glad I could serve as a rubber duck. Maybe there are some lessons to be learned from Encore?[0]

> The link is a great reference by the way.

Yeah, the amount of info in there is pretty good. It feels like Kafka could definitely be tuned to do the job, but maybe it's better to just start with something better suited.
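If you do go the tuning route, the catch is that several size settings on different components all have to be raised together. Below is a conservative checklist sketch with a hypothetical 10 MiB target. The property names are real Kafka settings; the values and the `too_small` helper are illustrative. Newer Kafka versions relax some of the fetch-side limits, which can still return a single oversized batch, so treat this as a "keep them consistent" check rather than an exact model of broker behavior.

```python
# Hypothetical target: allow messages up to 10 MiB.
TARGET = 10 * 1024 * 1024

# Size-related settings, by the component that reads them. Values are illustrative.
settings: dict[str, int] = {
    "message.max.bytes": TARGET,          # broker: largest record batch accepted
    "max.message.bytes": TARGET,          # topic-level override of message.max.bytes
    "replica.fetch.max.bytes": TARGET,    # broker: follower replication fetch size
    "max.request.size": TARGET,           # producer: largest request it will send
    "max.partition.fetch.bytes": TARGET,  # consumer: per-partition fetch size
    "fetch.max.bytes": TARGET,            # consumer: per-fetch-request cap
}


def too_small(cfg: dict[str, int], message_bytes: int) -> list[str]:
    """Return the settings that are smaller than the message size (conservative check)."""
    return sorted(name for name, value in cfg.items() if value < message_bytes)
```

For example, `too_small(settings, 5 * 1024 * 1024)` returns an empty list, while leaving the producer at its 1 MiB default would flag `max.request.size`.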

> This is more or less what I figured. We already archive to S3 anyways so switching to using it as transport would be straightforward.

Yeah, I figured this is what you were trying to avoid: the round trips to S3 would be wasteful if the data is small enough to flow along the processing route directly. I guess it really depends on your data. I could have sworn I saw some analysis of how Kafka performs versus the size of the messages it must deliver...
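One way to hedge on "it depends on your data" is a hybrid: inline payloads under a size cutoff and fall back to an object-store reference above it, so small frames skip the S3 round trip. A minimal sketch, where the 512 KiB cutoff, the message layout, and the in-memory `object_store` (standing in for S3) are all hypothetical:

```python
# Hypothetical cutoff; tune it against your own message-size benchmarks.
INLINE_LIMIT_BYTES = 512 * 1024

# Hypothetical in-memory stand-in for S3.
object_store: dict[str, bytes] = {}


def make_message(key: str, payload: bytes, limit: int = INLINE_LIMIT_BYTES) -> dict:
    """Inline small payloads; spill large ones to the object store."""
    if len(payload) <= limit:
        return {"key": key, "inline": payload}
    object_store[key] = payload
    return {"key": key, "ref": key, "size": len(payload)}


def read_payload(message: dict) -> bytes:
    """Return the payload, fetching from the object store only when needed."""
    if "inline" in message:
        return message["inline"]
    return object_store[message["ref"]]
```

Consumers have to handle both shapes, which is the main complexity this adds over always sending references.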

Looks like DZone has some good content[1], LinkedIn of course[2]... Ah, I finally found the one I was looking for, and it's DZone[3]. All those links mention message size.

[0]: https://svt.github.io/encore-doc/

[1]: https://dzone.com/articles/processing-large-messages-with-ap...

[2]: https://engineering.linkedin.com/kafka/benchmarking-apache-k...

[3]: https://dzone.com/articles/benchmarking-nats-streaming-and-a...

pram · on Nov 16, 2021

It can, but this depends on the volume and size of the topic messages. The broker and consumer will need a LOT more memory. I did this at a previous job, and the GC on the broker started getting very shitty and performance was crap. Consumers were constantly running out of memory and needed bigger containers, etc. It was a bad idea, and we just moved the stuff to S3.
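A rough back-of-envelope for why consumers blow up: a single `poll()` can hand back up to `max.poll.records` records (default 500), so payload memory scales with that times the message size. This sketch assumes the application holds one full batch in memory at once and ignores fetch buffers, deserialization copies, and JVM overhead, so real usage would be higher; the 200-byte reference size is hypothetical.

```python
DEFAULT_MAX_POLL_RECORDS = 500  # Kafka consumer default for max.poll.records


def poll_batch_bytes(avg_message_bytes: int, max_poll_records: int = DEFAULT_MAX_POLL_RECORDS) -> int:
    """Payload bytes held if the application keeps one full poll() batch in memory."""
    return avg_message_bytes * max_poll_records


raw_1080p = 1920 * 1080 * 3  # one uncompressed 8-bit RGB frame
s3_reference = 200           # hypothetical size of a small JSON pointer to S3

print(f"raw frames:    {poll_batch_bytes(raw_1080p) / 1e9:.2f} GB per batch")
print(f"S3 references: {poll_batch_bytes(s3_reference) / 1e3:.0f} KB per batch")
```

A full default batch of raw 1080p frames is about 3.1 GB of payload alone, versus about 100 KB of references; lowering `max.poll.records` helps, but moving the payloads out of the broker shrinks the problem entirely.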

wdb · on Nov 17, 2021

Are there any alternatives that don't use the Java runtime?

hardwaresofton · on Nov 18, 2021

NATS is built with Go, which is one of the reasons I like it.

https://github.com/nats-io/nats-server