> MySQL (technically InnoDB) and SQL Server have a huge problem with random primary keys because of their design as a clustered index – since the tuples are physically located, or clustered, around the PK, the randomness means that a simple query like "SELECT id FROM customer WHERE date_created > NOW() - INTERVAL 1 WEEK" will hit tons of different pages. Even if everything is cached, that's still a lot of extra work.
The Postgres query planner keeps a correlation statistic for how well an index's order matches the order of the table data on disk, and if the correlation is bad and the query is expected to return enough rows, it'll skip the index entirely. Instead it'll do a sequential scan and sort, to work with the disk cache instead of against it.
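You can inspect that statistic yourself in the pg_stats view; a quick sketch (the table and column names here are hypothetical):

```sql
-- How well does the physical row order match this column's order?
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'customer' AND attname = 'date_created';

-- correlation near 1.0 (or -1.0): an index scan touches mostly
-- sequential pages; near 0.0: every index hit lands on a different
-- page, so the planner may prefer a sequential scan + sort instead.
```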
Postgres provides a CLUSTER command to reorder the table data on disk to match a given index, resulting in a high correlation and allowing it to do an index scan without the penalty for random disk access. If you're on an SSD, or know you have enough memory to keep all the data cached, you can also lower the random-access penalty (random_page_cost) in the configuration; it exists mainly for spinning disks.
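A rough sketch of both knobs, assuming a hypothetical customer table with an index on date_created:

```sql
-- Rewrite the table in index order. Note this takes an ACCESS
-- EXCLUSIVE lock for the duration of the rewrite.
CLUSTER customer USING customer_date_created_idx;

-- CLUSTER is one-shot: new rows are not kept in order, so it has to
-- be re-run as correlation degrades. Refresh statistics afterwards.
ANALYZE customer;

-- On SSDs, tell the planner that random page reads are cheap
-- (default is 4.0; values close to seq_page_cost are common for SSDs).
SET random_page_cost = 1.1;
```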
I agree CLUSTER should help, although pg_repack is even better, since it doesn't hold an exclusive lock while it copies the entire table. Even so, pg_repack is still very slow compared to the continuous B-tree rebalancing MySQL does as part of normal writes.
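For reference, a minimal sketch of how pg_repack is typically driven (extension plus command-line tool; the database, table, and column names are hypothetical):

```sql
-- The extension must be installed in the target database first.
CREATE EXTENSION pg_repack;

-- The actual reorganization is run from a shell; it rebuilds the
-- table in the background and only takes a brief exclusive lock to
-- swap the new table in:
--   pg_repack --table=customer --order-by=date_created mydb
```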
The problem isn't fixed by SSDs, RAM, or Postgres settings. Fast random access is great, but if your data isn't localized you're still leaving a roughly 100x optimization on the table.
The original comment I replied to explained the data localization problem as if it were a MySQL problem that Postgres doesn't have. That's like saying that Rust allows unsafe and this can cause memory corruption, so let's use C++. It's hard to overstate how little sense it makes.
Postgres is basically the only DB that doesn't offer a decent solution for data localization. Every other major DB lets you control the physical layout of rows automatically with minimal resource overhead; Postgres's tools are extremely crude in comparison.
How do you fix this problem in Postgres?