Also SELECT should be the last clause (at least for row-oriented DBMS, for colum...

em500 · on Jan 25, 2024

No, it should come after WHERE, GROUP BY but before ORDER BY. From my favorite SQL tutorial[1]:

The lexical ordering is:

    SELECT
    FROM
    WHERE
    GROUP BY
    HAVING
    UNION
    ORDER BY

while the logical order is:

    FROM
    WHERE
    GROUP BY
    HAVING
    SELECT
    UNION
    ORDER BY

[1] https://blog.jooq.org/10-easy-steps-to-a-complete-understand...

nivertech · on Jan 25, 2024

Your example for a row-oriented just adds UNION.

But you need first to order by column to get the rows in the desired order, while SELECT just indicates which column from the rows to return.

source table -> filtered rowset -> grouped rowset -> filtered grouped rowset -> ordered rows -> ordered rows with selected columns only

Source: I was working several years as a Human Query Planner for the columnar DB ;)

chrisjc · on Jan 25, 2024

I'm interested to understand why you think it should differ between OLAP and OLTP?

Are you suggesting that columns closer to the top matters for OLAP bc OLAP is columnar? Ultimately, a query planner is going to figure out what happens first, so I can't imagine your point has anything to do with execution.

nivertech · on Jan 25, 2024

1. Not every OLAP/columnar/timeseries DBMS has a Query Planner. For example in kdb+/q a developer need to think about correct order of WHERE clauses, know which columns have attributes (kind of "secondary indexes"), and lots of other tricks. The advantage is predictability of execution, the disadvantage - it's too complex.

2. I'm not talking about cases where a Query Planner parses the query language DSL, and computes Query Plan out of it. More the case when you kind of have an explicit Query Plan. The best example is FluxQL[1].

--

[1]. https://docs.influxdata.com/influxdb/cloud/reference/syntax/...

chrisjc · on Jan 25, 2024

Now that you mention it, I guess I've experienced *similar* issues (more to do with joins, unions, materializations, windows, etc) even when there is a query planner.

Even though the query planner has an optimizer, the plan produced isn't optimal. In some cases I have had to play around with the SQL (resulting in less than ideal SQL) to get the optimizer to do what it should.

This demonstrates the exact point you made:

> The advantage is predictability of execution

In my example, while I ultimately overcame the issues with the sub-optimal plan, there's no assurances about what the query planner/optimizer will come up with tomorrow.

nivertech · on Jan 25, 2024

Relevant infographics, probably inspired by this HN post:

https://x.com/alexxubyte/status/1750560163101315463?s=20

marcosdumay · on Jan 25, 2024

Nah, in the row one, it should come before the order by.

Alternatively, its semantics could change to reflect this ordering, but it would loss expressiveness.