It can be easy to operate when everything is fine, but what about incidents?
If I understand correctly, there is a metadata database (BTW, is it multi-AZ as well?). But what if there is a data loss incident and some metadata is lost? Is it possible to recover it from S3? If it is, I guess it can't be very simple and would require a lot of time, because S3 is not easy to scan to find all the artefacts needed for recovery.
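To give a rough sense of why scanning S3 for recovery is slow: ListObjectsV2 returns at most 1,000 keys per request (an S3 API limit), so a full-bucket scan is bounded by the number of sequential list calls. The per-request latency below is an illustrative assumption, not a measured figure:

```python
# Back-of-the-envelope estimate of a full-bucket scan for metadata recovery.
# ListObjectsV2 returns at most 1000 keys per request (S3 API limit);
# the 100 ms request latency is an illustrative assumption.

def scan_time_seconds(num_objects: int, keys_per_request: int = 1000,
                      request_latency_s: float = 0.1) -> float:
    requests = -(-num_objects // keys_per_request)  # ceiling division
    return requests * request_latency_s

# e.g. 100 million objects listed sequentially:
print(scan_time_seconds(100_000_000))  # 10000.0 seconds, i.e. ~2.8 hours
```

Listing can be parallelized across key prefixes, but that only helps if the key layout was designed for it, and listing is just the first step before the objects themselves are read.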
Also, this metadata database looks like a bottleneck. All writes and reads have to go through it, so it could be a single point of failure. It's probably distributed, and in that case it has its own complex failure modes and has to be operated somehow.
Also, putting data from different partitions into one object is something I'm not very keen on. It introduces a lot of read amplification, and S3 bills for egress. So if an object/file has data from 10 partitions and I only need 1, I'm paying for 10x more egress than I need. The doc mentions fanout reads from multiple agents to satisfy a fetch request; I guess that's the price to pay for this. It also affects the metadata database. If every object stores data from one partition, the metadata can be easily partitioned. But if an object can contain data from many partitions, it's probably difficult to partition. One reason Kafka/Redpanda/Pulsar scale so well is that their data and metadata can be easily partitioned, and those systems don't have to handle as much metadata as I think WarpStream has to.
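To make the read-amplification arithmetic concrete, here's a back-of-the-envelope sketch. It assumes whole objects must be fetched (byte-range GETs would reduce this), and the egress price is a parameter, not a claim about any particular bill:

```python
# Back-of-the-envelope read amplification when one object mixes partitions.
# All numbers are illustrative assumptions, not measured WarpStream behavior.

def read_amplification(partitions_per_object: int, partitions_needed: int) -> float:
    """If a fetch must read whole objects, bytes read scale with the
    number of partitions packed into each object."""
    return partitions_per_object / partitions_needed

def egress_cost(useful_gb: float, amplification: float, price_per_gb: float) -> float:
    """Egress is billed on total bytes transferred, not useful bytes."""
    return useful_gb * amplification * price_per_gb

amp = read_amplification(partitions_per_object=10, partitions_needed=1)
print(amp)  # 10.0

# e.g. 100 GB of useful data at an assumed $0.09/GB egress price:
print(egress_cost(100, amp, 0.09))   # 90.0
print(egress_cost(100, 1.0, 0.09))   # 9.0 with no amplification
```

Byte-range reads and careful object layout can narrow the gap, but the billing model still charges for every byte that crosses the wire.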
I'm not going to respond to your comment directly (we've already solved all the problems you've mentioned), but I thought I should mention, for the sake of the other readers of this thread, that you work for Redpanda, which is a competitor of ours, and didn't disclose that fact. Not a great look.
I'm not asking anything on behalf of any company, just genuinely curious (and I don't think we're competitors; both systems are designed for totally different niches). I'm working on a tiered-storage implementation, btw. It looks like the approach here is the total opposite of what everyone else is doing. I see some advantages, but also disadvantages. Hence the questions.