For those of you from the AI world, this is the equivalent of the bitter lesson and DeWitt's argument about database machines from the early 80s: if you wait a bit, the exponential pace of Moore's law (or its modern equivalents) means improvements in "general purpose" hardware will obviate DB-specific improvements. The problem is that back in 2012, we had customers who wanted to query terabytes of logs for observability, analyze adtech streams, and so on. So I feel this is a pointless argument. If your data fit on an old MacBook Pro, sure, you should have built for that.


AWS started offering local SSD storage up to 2 TB in 2012 (HI1 instance type) and in late 2013 this went up to 6.4 TB (I2 instance type). While these amounts don't cover all customers, plenty of data fits on these machines. But the software stack to analyze it efficiently was lacking, especially in the open-source space.


AWS also had customers with petabytes of data in Redshift for analysis. The conversation is missing a key point: DuckDB is optimizing for a different class of use cases, namely data science rather than traditional data warehousing, and that difference is masquerading as a question of size. Even at small sizes, there are other considerations: access control, concurrency control, reliability, availability, and so on. The requirements differ across these use cases. Data science tends to be single-user and local, with lower availability requirements than warehouses serving production pipelines, data sharing, and the like. DuckDB can be used for those workloads too, but it is not optimized for them.

Data size is a red herring in the conversation.


>Data size is a red herring in the conversation.

Not really. A Redshift paper just shared data on exactly this:

>...there is a small number of tables in Redshift with trillions of rows, while the majority is much more reasonably sized with only millions of rows. In fact, most tables have less than a million rows and the vast majority (98%) has less than a billion rows.

The argument can be made that 98% of people using Redshift could potentially get by with DuckDB.

https://assets.amazon.science/24/3b/04b31ef64c83acf98fe3fdca...



