Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Also SELECT should be the last clause (at least for row-oriented DBMS, for columnar databases it might come before WHERE, GROUP BY, ORDER BY to hint which columns we want to select to query on).

Row-oriented:

  FROM table_name
  WHERE condition
  GROUP BY ...
  HAVING ...
  ORDER BY ...
  SELECT column1, column2, ...;
Columnar:

  FROM table_name
  SELECT column1, column2, ...
  ORDER BY ...
  WHERE condition
  GROUP BY ...
  HAVING ...;


No, it should come after WHERE, GROUP BY but before ORDER BY. From my favorite SQL tutorial[1]:

The lexical ordering is:

    SELECT
    FROM
    WHERE
    GROUP BY
    HAVING
    UNION
    ORDER BY
while the logical order is:

    FROM
    WHERE
    GROUP BY
    HAVING
    SELECT
    UNION
    ORDER BY
[1] https://blog.jooq.org/10-easy-steps-to-a-complete-understand...


Your example for a row-oriented just adds UNION.

But you need first to order by column to get the rows in the desired order, while SELECT just indicates which column from the rows to return.

source table -> filtered rowset -> grouped rowset -> filtered grouped rowset -> ordered rows -> ordered rows with selected columns only

Source: I was working several years as a Human Query Planner for the columnar DB ;)


I'm interested to understand why you think it should differ between OLAP and OLTP?

Are you suggesting that columns closer to the top matters for OLAP bc OLAP is columnar? Ultimately, a query planner is going to figure out what happens first, so I can't imagine your point has anything to do with execution.


1. Not every OLAP/columnar/timeseries DBMS has a Query Planner. For example in kdb+/q a developer need to think about correct order of WHERE clauses, know which columns have attributes (kind of "secondary indexes"), and lots of other tricks. The advantage is predictability of execution, the disadvantage - it's too complex.

2. I'm not talking about cases where a Query Planner parses the query language DSL, and computes Query Plan out of it. More the case when you kind of have an explicit Query Plan. The best example is FluxQL[1].

--

[1]. https://docs.influxdata.com/influxdb/cloud/reference/syntax/...


Now that you mention it, I guess I've experienced *similar* issues (more to do with joins, unions, materializations, windows, etc) even when there is a query planner.

Even though the query planner has an optimizer, the plan produced isn't optimal. In some cases I have had to play around with the SQL (resulting in less than ideal SQL) to get the optimizer to do what it should.

This demonstrates the exact point you made:

> The advantage is predictability of execution

In my example, while I ultimately overcame the issues with the sub-optimal plan, there's no assurances about what the query planner/optimizer will come up with tomorrow.


Relevant infographics, probably inspired by this HN post:

https://x.com/alexxubyte/status/1750560163101315463?s=20


Nah, in the row one, it should come before the order by.

Alternatively, its semantics could change to reflect this ordering, but it would loss expressiveness.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: