> It seems like they are building a complex distributed system to solve a problem that is better solved by tar -cvf
That doesn't work for Parquet or anything else compressed: tar only bundles files into an archive, it doesn't merge them into something you can query efficiently. In real-time analytics you want to load small files quickly into a central location where they can be queried and compacted at the same time, even though those are different workloads. This is hard to do in existing table formats like Iceberg. Granted, not everyone shares this requirement, but it's increasingly important for a wide range of use cases like log management.
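To make the "query and compact at the same time" part concrete, here is a toy sketch in plain Python. It is not how Iceberg or any real table format works: the JSON-lines part files, the `manifest.json` name, and the `ingest`, `query`, and `compact` helpers are all made up for illustration, and it assumes a single writer (concurrent `ingest` and `compact` calls would race on the manifest). The only idea it shows is that readers follow a file list that gets swapped atomically, so compaction can rewrite small files without breaking queries.

```python
import json
import os
import tempfile
import uuid

MANIFEST = "manifest.json"  # hypothetical name for the list of live files


def write_manifest(root, files):
    # Write to a temp file, then rename over the old manifest. os.replace is
    # atomic, so a reader sees either the old list or the new one.
    fd, tmp = tempfile.mkstemp(dir=root)
    with os.fdopen(fd, "w") as f:
        json.dump(files, f)
    os.replace(tmp, os.path.join(root, MANIFEST))


def read_manifest(root):
    path = os.path.join(root, MANIFEST)
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def ingest(root, rows):
    # Each small batch lands as its own file and is visible immediately.
    name = f"part-{uuid.uuid4().hex}.jsonl"
    with open(os.path.join(root, name), "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    write_manifest(root, read_manifest(root) + [name])


def query(root):
    # Scan whatever files the manifest lists right now.
    rows = []
    for name in read_manifest(root):
        with open(os.path.join(root, name)) as f:
            rows.extend(json.loads(line) for line in f)
    return rows


def compact(root):
    # Merge all live parts into one file, then swap the manifest to point at it.
    old = read_manifest(root)
    if len(old) < 2:
        return
    merged = f"compacted-{uuid.uuid4().hex}.jsonl"
    with open(os.path.join(root, merged), "w") as out:
        for name in old:
            with open(os.path.join(root, name)) as f:
                out.write(f.read())
    write_manifest(root, [merged])
    # Old parts stay on disk here; a real system would garbage-collect them
    # once no reader still needs them.
```

Calling `ingest` a few times, then `compact`, leaves `query` returning the same rows from a single file. Everything that makes this hard in practice (multiple writers, columnar encoding, deleting old files safely) is left out.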