FROM is not a command, it's a parameter, and an optional one at that.
This is valid SQL.
SELECT 1;
The SQL commands are SELECT, UPDATE, INSERT, etc. Therefore, those commands should be the first thing in an instruction. If you have a file full of SQL, you probably want all the lines to start with those commands. Gonna be pretty weird to read if you have both SELECT and UPDATE lines that start with FROM. Probably difficult for the parser also. Even hairier when it comes to subqueries.
I do however, strongly agree that the most common scenario for user workflow is to choose tables first, then choose columns from those tables. I don't know if changing the language is really the answer. Intelligent tooling can and does already solve this. What I typically do is start all my queries by selecting *, then I go back and fill in the columns last.
The godfather of relational calculus languages, Alpha, put the relation context first, and then allowed you to send multiple commands that operate within that context. SQL eventually gave in and added WITH to soft of, kind of, try to get to the same place.
But, ultimately, SQL was designed around the idea discrete function calls rather than unit operations (with continued efforts to try and hack on the latter after the fact), so in that sense, the operator going first does make more sense.
"SELECT 1" would still be valid, and you'd still have the commands first, but you could also get the benefits of IDE autocomplete for columns by specifying the table before the columns.
An SQL parser has no problem supporting both forms as there is no parsing ambiguity.
(SQL is full of corner cases e.g. EXTRACT(WEEKDAY FROM field) etc. Putting the FROM first, or supporting multiple chained WHERE clauses or allowing WHERE before JOIN and applying to the preceding projection etc is all possible in an SQL parser that chooses to allow some relaxations. Personally, I am really irritated that I can't have HAVING without GROUP or QUALIFY without window functions etc, as I often construct queries programmatically.)
Nitpick -- SQL was not designed to be used programmatically. Same as shell commands, it's for humans to write manually. Proper API could have way more concise, and much more efficient format to parse. And would avoid a bunch of security issues along the way, same as for shell.
Exactly. I still remember about this "revolutionary" querying language being designed, that would allow for non-technical people to perform complex queries and reports from databases.
People: stop trying to mangle SQL "because it would be better..."; nah, SQL is supposed to be the "better" already. The idiosyncrasies it has is because it was designed to be kinda-sorta conversational, modeled as an english language query.
For non-technical people. It is not supposed to be the "better" for the technical people who end up using SQL in practice. QUEL was the "better" for us, being much closer to Codd's vision. But, alas, Oracle won with the business people and Postgres lost.
This specific issue is a problem for humans, not for APIs.
I have a very smart SQL IDE with great intellisense, but when I type "SELECT", it can't help me because it has no idea what I want. Being able to type "FROM table SELECT" would be way friendlier for humans, because IDEs could offer me immediately columns I'm very likely interested in.
Why is a very smart SQL IDE not able to work around that limitation with smart UI/UX? I can see why baby's first SQL IDE would benefit from the language being amenable to IDEs, but a very smart one should be able to take things further, no?
Given that it is a very smart IDE, the column suggestion UI integrates table selection in a smart way. Once the IDE knows which table(s) you are referencing, it automatically fills in the FROM clause for you.
Or just use a better language. SQL was designed for non-technical people, using statements that resemble the English they already know. It was not designed for tech-minded people who are able to grasp higher level technical concepts.
Given how often `SELECT ` is already the developer "default" especially in debugging and REPL work there's a good case that `SELECT ` can truly just be the implied default case and something like `FROM 1` makes perfect sense as a standalone statement.
The poster's problem is not recognizing that SQL is not procedural. The SQL query is declarative, though some databases are more literal in turning these into query plans than others.
> SELECT COUNT(*) FROM ℤ;
ℵ₀
> SELECT COUNT(*) FROM ℝ;
POWER(2,ℵ₀)
> EXEC sp_configure 'continuum hypothesis','1';
> RECONFIGURE;
> SELECT COUNT(*) FROM ℝ;
ℵ₁
While non-termination is probably best left implementation-defined (to allow caller-terminated streams of infinite results where they make sense), ORDER BY would clearly be an error where no well-defined "first" or "next" result exists.
How fun it would be to receive a Cantor result in which as many elements from the original set as there were have been removed, leaving a set drawn from what you started with which still contains as many elements!
that may very well be the case but perhaps there could be an evolution of the language because for newcomers the idea that the from clause is evaluated/executed/defined first is a pain point.
Newbies I help sometimes wonder why things they reference in the SELECT part aren't visible/available in the FROM clause and that's because one is selecting the result of FROM ... JOIN ... WHERE etc anyway
> he idea that the from clause is evaluated/executed/defined first is a pain point
Maybe my brain is broken by years of SQL and from learning English as a second language. But isn’t this supposed to follow a fairly mundane English sentence structure? “Select socks and pants from drawer”.
If you started saying “from drawer select socks and pants”, wouldn’t that feel like a weird sentence structure to most people?
Slightly weird, though grammatically acceptably, as far as I know.
But the debate is actually coming from something else your example shows nicely. When you select something from a drawer, you select entities. Usually when we select from a relational table, we select properties. In SQL, the drawer is not a thing.
A different way to keep the English-style would be to add a BUT-JUST-[THEIR] clause.
SELECT
FROM socks
WHERE size = 45
BUT JUST [THEIR] color
All fairly tongue in cheek of course. SQL isn't going to change, so there's really no point in worrying too much about it :-)
> Usually when we select from a relational table, we select properties. In SQL, the drawer is not a thing
Ah but that’s by convention! If we want to be super pedantic, they’re all just relations. The drawer table describes a relation between values. And SELECT just defines a new relation! You’re creating an ad-hoc “table”.
That’s why “SELECT socks FROM (SELECT socks, pants FROM drawer)” works just fine.
Ok yes my brain has definitely been broken by years of SQL. I’m not the right person to understand newbies anymore.
Things are records / rows. Collections of the same things are entities (rubber band/ folders). Groups of entities are tables (Drawers).
A star schema usually contains one table to one entity mapping.
Dynamic tables for custom field on the other hand are usually multi entity tables. In this case, it would be a large junk drawer with various unrelated folders stuffed inside with everything from report cards, to keys to toys, to take out menus. It would have the toys entities, take out menu entities, keys entities, etc inside a mixed table.
Trouble is that it falls apart when you have more than one operation. "select socks and pants from drawer, and delete money from drawer" is the weird sentence structure. Whereas if 'from' comes first: "from drawer, select socks and pants, and delete money". Which, granted, is still weird, but less so, and fixable with better word choice. Perhaps: "open drawer, select socks and pants, and remove money.
Codd's original vision would have seen that expressed as something like:
RANGE DRAWER D
GET W (D#,SOCKS,PANTS)
DELETE D:(D.MONEY=1)
The only thing really missing in that from an English sentence structure perspective is generally we want just one more article in it and maybe a comma: from the drawer, select socks and pants.
But similarly I've written a lot of C# LINQ statements at this point and its from-first pattern seems to flow naturally enough that I don't see anything wrong with it (and a lot right, C# definitely has a great autocomplete experience under LINQ).
Linq does this [0], maybe partially because it's SQL-like but built for working with enumerables instead so they could do what they wanted. One other quirk of SQL ordering that always gets me is that SET comes before WHERE in UPDATE. I always get terrified that I'll run without the WHERE or without selecting the WHERE. Thankfully a good database tool like DataGrip will yell at you if you try to do a modifying operation without a WHERE.
SQL could also have done what they wanted, for example allowing both
FROM foo
WHERE baz = quux
SELECT bar
and even (the equivalent linq wont compile because ‘baz’ isn’t there anymore when the ‘where’ runs)
FROM foo
SELECT bar
WHERE baz = quux
However, SQL predates smart code completion being an expected language feature, so they went for the feature “looks like normal American English” (“from the kitchen, can you get me the scales?” is less common than “can you get me the scales from the kitchen?” or “can you get me the scales? They’re in the kitchen”) instead of “make a grammar where smart code completion works well”.
I can't remember where I read it, but I believe one of the main reasons why Linq moved the FROM clause to the front was for better intellisense. If you start writing SELECT x in an IDE, there really is no way of providing intellisense for x.
However if you write FROM table SELECT x, the IDE can first provide intellisense for table names while you write your FROM clause. Then it can provide intellisense for the SELECT clause based on columns in the tables/views/etc you listed in your FROM. So it was basically done for better UX.
I believe LINQ maps to a chains of function calls on an enumerable, with the functions usually taking anonymous functions as input (and then the series of functions is rewritten into a SQL statement by EF to hit the DB). I don’t think you’d be able to support type analysis if you didn’t specify the enumerable upfront
Eg
Mylist.select(x => …) can determine the type of x, because it has the type of mylist —> List<T>
Select(x => …).from(mylist) and you simply can’t determine x, because C# type analysis can’t go backwards.
So I think it’s less a UX question and more of an absolute requirement for the API to work at all. The alternative would be a SelectFrom(x => …, mylist) function so it can be submitted in one shot, essentially what SQL is doing, but that’s disgusting — who doesn’t love function chaining?
No, you would be able to support type analysis just as easily because desugaring
select x => ... from mylist;
into
mylist.SelectMany(x => ...);
is straightforward: C# compiler is not expected to be single-pass, it has an AST to operate upon. You'll have a LinqSelectNode with Projection (a lambda expression), Filter, and Source fields which you replace with a new ExtensionMethodCall{ Lhs = origNode.Source, MethodName = "SelectMany", Args = new []Node{ origNode.Projection }), easy.
In terms of actually writing queries live, like if you're in the CLI client, the order of UPDATE is definitely terrifying.
The workarounds of writing it out of order or as a SELECT first are fine... I'd almost like to see a mode the interactive client sets that just rejects any UPDATE without a WHERE, and you'd have to do WHERE 1 or similar to get an "UPDATE everything."
If you are doing an update the first word is actually BEGIN.
/* use your PPE */
BEGIN;
/* make the change */
UPDATE foo SET bar = ‘baz’;
/* sanity check */
SELECT * FROM foo WHERE bar != ‘baz’;
/* oops, let’s pretend this never happened */
ROLLBACK;
This terrifies me as well. The workaround is to write the WHERE clause before the SET clause. If you inadvertently submit the query partway, it will be invalid and it's not a big deal.
If you have to use TSQL like me, you can use BEGIN/ROLLBACK TRAN with an OUTPUT clause to easily confirm, then just change it to COMMIT. I actually like this workflow quite a lot
I have always appreciated how Linq's syntax sets the data context first, then the filtering/grouping, and FINALLY selecting what columns should be in the result set. Not only does this make IDE hinting much easier but it's also more logical: "From the set of all left-handed NBA players who are from Europe and shorter than 201 cm, give me their month of birth."
I am surprised that SQL Server hasn't offered Linq syntax as an option for writing queries and stored procedures; could always start such queries with `Using Linq:` prefix so the query engine knows it's not using T-SQL...
It would be really nice if PostgreSQL and sqlite implemented this natively as it really feels how SQL should be (re)designed today.
It also makes writing complex queries easier as it would be possible to split out actions in multiple steps where you can see what columns and types you have created (and found) so far.
Complex SQL is like complex C++: works if you get it right, but no help whatsoever for figuring out what mistake you made when messing up. Step-by-step is such a great help.
On one hand SQL has some oddities, on the other I'd rather there not be yet more different-but-equivalent ways to do things in SQL. I think the current state is good enough.
Though I can see a strict subset of SQL being useful. First I'd axe the timestamp type in favor of always using timestamptz, cause those two things don't mean what most people think.
I think that Perl taught us that there should only be one correct way to do something. Otherwise you wind up with a write-only language that nobody else can ever maintain.
Also SELECT should be the last clause (at least for row-oriented DBMS, for columnar databases it might come before WHERE, GROUP BY, ORDER BY to hint which columns we want to select to query on).
Row-oriented:
FROM table_name
WHERE condition
GROUP BY ...
HAVING ...
ORDER BY ...
SELECT column1, column2, ...;
Columnar:
FROM table_name
SELECT column1, column2, ...
ORDER BY ...
WHERE condition
GROUP BY ...
HAVING ...;
I'm interested to understand why you think it should differ between OLAP and OLTP?
Are you suggesting that columns closer to the top matters for OLAP bc OLAP is columnar? Ultimately, a query planner is going to figure out what happens first, so I can't imagine your point has anything to do with execution.
1. Not every OLAP/columnar/timeseries DBMS has a Query Planner. For example in kdb+/q a developer need to think about correct order of WHERE clauses, know which columns have attributes (kind of "secondary indexes"), and lots of other tricks. The advantage is predictability of execution, the disadvantage - it's too complex.
2. I'm not talking about cases where a Query Planner parses the query language DSL, and computes Query Plan out of it. More the case when you kind of have an explicit Query Plan. The best example is FluxQL[1].
Now that you mention it, I guess I've experienced *similar* issues (more to do with joins, unions, materializations, windows, etc) even when there is a query planner.
Even though the query planner has an optimizer, the plan produced isn't optimal. In some cases I have had to play around with the SQL (resulting in less than ideal SQL) to get the optimizer to do what it should.
This demonstrates the exact point you made:
> The advantage is predictability of execution
In my example, while I ultimately overcame the issues with the sub-optimal plan, there's no assurances about what the query planner/optimizer will come up with tomorrow.
On page 4 of "A Critique of Modern SQL And A Proposal Towards A Simple and
Expressive Query Language" [0] (recently published at CIDR 2024) there's a great diagram covering this and many other "semantical ordering" confusions with SQL. It's an interesting paper in general, by a couple of the biggest names in modern database research (...though admittedly perhaps not language design).
Something like 'SaneQL' (which the paper introduces) deserves to succeed outside of the lab. Source is here [1].
Speak for yourself, I've always loved Python's list comprehension syntax, it makes ALL the sense. For ME.
It's a question of ergonomics, just like my favorite chair is not going to be your favorite chair for whatever reason, programming language constructs that bug you doesn't mean it'll bug anyone else.
Your foreach() example bugs the hell out of me, for example. But I'm not campaigning to have it erased from existence, I just don't use languages that model their syntax like that.
You have a good point, but the problem that not knowing the table to select beforehand does seem valid.
What if the FROM clause is called SELECT and the SELECT clause is called PROJECT? (So it will read something like SELECT <table> PROJECT <columns>?)
The Lil[0] scripting language, like most APL-derived languages, has uniform operator precedence; expressions evaluate right-to-left unless you introduce parens:
3 * 2 + 1
9
(3 * 2) + 1
7
Lil includes an integrated query syntax which loosely resembles SQL. Queries begin with a command (select, update, extract), contain intermediate clauses in any order (where, orderby, by), and conclude with "from":
select key value orderby value desc from x
The order of evaluation of clauses matches precedence of other operators: "from" executes, then every intermediate clause, right-to-left, then finally the command and any column expressions or aggregations. Since queries are expressions, rather than statements (like all control structures in lil), having "from" last means that you can chain "subqueries" without nesting them:
extract key orderby value desc from
select key:first value value:count value by value from a
I think this approach is nicely internally-consistent; the only real downside is that, as with SQL, this syntax is not ideal for IDE-driven auto-completion.
WTF is right-to-left evaluation's purpose? It breaks the most fundamental rules of arithmetic. As a human that has been taught those rules, I'd never come up with 9 as an answer. Why did humans create a programming language that would?
Traditional mathematical notation has precedence rules optimized for expressing polynomials. Programming languages- especially in the APL family- tend to have a richer collection of primitive verbs, and an "ideal" tower of binding precedence for all of them would be extremely complex, difficult to remember, and often unhelpful.
Unform evaluation order is much easier to remember and extend (no special cases!) and becomes even more natural than PEMDAS with a little practice. The APL family is hardly unique in this approach to precedence: Smalltalk has uniform infix precedence, Forths are uniformly postfix (unless you get goofy with parsing words), Lisps are uniformly prefix.
IMO, infix operators in math are a historical relic of function notation being developed after the basic operators became widespread. I’ve always thought uniform prefix notation is ideal, whether it’s lisp-style (+ 1 x y) or the more common +(1, x, y)
I remember having this question when learning J, and they pointed out that it makes the = operator behave correctly. After:
a = 1 + 2
One expects `a` to equal 3. If you have a uniform left to right evaluation order, it will equal 1 and then the whole expression will be set to 3, so you’d need to write this to get the intuitive result:
SELECT ... FROM ... is more natural for smaller SQL.
When you get 1000 line SQL files or 100s of SQL files, code management pain makes you yearn for FROM ... SELECT ... (three cheers for data build tool!)
Absolutely! I'm just generally saying that SQL could be more succinct and composable - it's all a combination of project, filter and other primitive operations from relational algebra. SQL is rather verbose, and the question of whether FROM or SELECT should come first is just paint on the object.
and boy is it an awkward syntax. Circa 2008 I was getting interested in the "semantic web" and wasn't so happy with RDFS and OWL and thought Datalog would be a useful approach and it was an obscure topic then. 10 years later people struggling w/ SQL and other query languages revived it because it seems so much conceptually clean than alternatives.
Similarly there is something that looks terribly half-baked about triggers, stored procedures, etc. in SQL and I've long thought something based on production rules could be cleaner but the world just hasn't cared.
I've been on the lookout for new languages as well (though haven't created anything myself - unlike you).
I'd say, getting the data (1) and triggers/procedures (2) are totally different domains with regarding to syntax. For select (i.e, 1), I have my ideas, but for (2) I've no clue. For (2), now that I think about it, I would say there are two additional levels of syntax that need to be solved: first, in addition to "getting data" you need also to modify it, so there need be syntax for that part (i.e the UPDATE part of SQL); and second, how to you connect these modifiers to events that happen, this is yet another domain of syntax IMO. (This latter feels like a general purpose programming language already, so maybe build it in to any of them which have great syntax already?)
FROM Table AS t WHERE t.Condition SELECT t.col1, t.col2, ...
might be more natural than the traditional
SELECT t.col1, t.col2, ... FROM Table AS t WHERE t.Condition
If we compare it with how loop are described in programming languages:
Loop -> SELECT-FROM-WHERE
Table -> Collection
AS t -> Loop instance variable
WHERE -> condition on instances
In Java and many other PLs, we write loops as follows:
foreach x in Collection
if x.field == value:
continue
// Do something with x, for example, return in a result set
So we first define the collection (table) we want to process elements from. Then we think about the condition they have satisfy by using the instance variable. And finally in the loop body we do whatever we want, for example, return elements which satisfy the condition.
In Python, loops also specify the collection first:
for x in Collection:
Python list comprehension however uses the traditional order:
[(x.col1, x.col2) for x in Collection if x.field2 == value]
Here we first specify what we want to return, then collection with condition.
I'm actually assuming most of the time you would leave off the "from" entirely in spoken language. It is implied or completely obviated by lack of choices, no?
Until you encounter a more complex example. Now it is "Select green apples from the refrigerator, insert more milk into the refrigerator, and remove any expired items from the refrigerator" which is less natural.
In contrast, the father of SQL, Alpha, allowed: "From the refrigerator, select the green apples, insert more milk, and remove any expired items."
What SQL shares with English (and most other natural languages) is that the rules are sometimes confusing/inconsistent/illogical, but evolution is very slow so most of us are probably stuck with the status quo during our lifetimes (career time for SQL) due to the overwhelming network effects.
The problem with English is that it isn't really a single language but a mashup of Latin/French/German/Scandinavian due to the number of invasions and each invader not really covering the whole country.
A simple way to demonstrate this is how we ended up with cities that have massively different pronunciation even though the spelling is similar.
English readability is only true for the simplest of queries. That goes overboard quickly as complexity increases. So I'd happily give that up for the from-select order if it means better code completion.
This is the obvious answer assuming the designer of SQL was a native english speaker.
Most of these arguments that it should work some way logical to the DB make too many assumptions about how the internals of the database work and don't think about how the query optimizer/planner might work very differently than the way a query is organized.
SQL is really old now, 50+ years, assumptions about how it worked or work are probably not that relevant across its entire history.
I first learned it almost 30 years ago now, it was definitely taught back in the day that SQL was one of those odd languages that was designed assuming you "wouldn't need an engineer" to write it. Laughable but would explain why engineers might not find the syntax logical.
There is a hammer. Some people are skilled and proficient when using the hammer, some new people sometimes hit their fingers with the hammer. Do not try to replace all hammers in the world with something better, make a better hammer and leave old hammer alone, then if it's good enough, proficient people will adopt it and use it and new people will save their fingers.
I still largely reject the code completion complaint. Specifically, it is very common to know what columns you want before you know what tables have said columns. This is especially true on normalized schemas where you are almost certainly having to do some joins to get all that you want.
So, does it somewhat complicate the logic? Of course. It is by no means impossible, though, and unless you are using super generic column names everywhere, the search space for what tables to suggest will be helped by knowing what columns you are looking for.
I'm not sure. I think it's fairly common for something like "Now was the column name on this table description_primary or primary_description? I can't remember... Oh well, I'll just SELECT * and figure it out later" to happen. Starting with FROM would at least eliminate that backtracking.
I can't say that you are wrong. I still find it hard to take as a complete argument. For one, you can almost certainly autocomplete all column names with an indicator for the table they come from. For two, you should be far more consistent in how you prefix things like that. :)
Oh for sure. This is definitely one of those "inherited some weird & inconsistent legacy system" situations.
Honestly, it didn't even register as a potential issue in my mind until I had a chance to use LINQ query syntax in C#, and thought it was kind of nice to have the `from` up front. It's a minor annoyance at most, at any rate.
It's always frustrating to me that the order matters at all. I understand SQL can be complex to parse, but surely we are smart enough to come up with a parser that doesn't care whether FROM is before or after SELECT?
HoneySQL lets us define queries with maps, like {:select [:col1 :col2] :from :table}, and turns that into SQL. In a better world, SQL would be structured data like HoneySQL, and the strange SQL syntax we know and love would be a layer on top of that, or wouldn't exist.
well, FROM should not be needed unless you are aliasing table names
So ideally it would be
SELECT Employee.Name, Address.Street
WHERE ...
GROUP BY
HAVING
ORDER BY
or
FROM Employee AS EMP , Address as A
SELECT EMP.Name, A.Street
WHERE ...
GROUP BY
HAVING
ORDER BY
We should always use fully qualified names is the select
And optionally ommit them in further down clauses such as WHERE or ORDER BY
if the names are unambigious
Is the purpose of the syntax to maximize human readability or to make parsing efficient?
SELECT/DELETE/INSERT etc. are commands, it makes sense to me that if I were writing a parser I would start with the imperative that will determine the rest of the path through the parser. I'm speculating, but I think I'm right, given this is late 70's/early 80's technology and resources were much less abundant.
If you ever need column name autocomplete, writing an UPDATE statement and changing it after is an ok workaround.
It would probably be a tough change to push through for the standard committee... a great one for users though, even if it takes 10+ years until you can use it in production.
You're probably getting downvoted just because of Azure, but you are right. ADX/Kusto and KQL is an extremely powerful query language that answers exactly what OP questions about SQL.
Here's an example of a KQL query that I have in my browser...
Things
| where DeviceTags has "Installed"
| order by LastHeardFromTimeStamp desc
| take 10
KQL also takes ideas from R's Tidyverse and magrittr package. It takes datasets and pipes them into a new function. Like this...
car_data <-
mtcars %>%
subset(hp > 100)
From the Microsoft auto downvoters out there, all Azure dashboards and infrastructure analytics run on KQL (think Graphana, but on Azure). There are billions of KQL queries executing continuously, so it is absolutely a good example of a non-SQL query language that is active and mature.
SQL is a mess. It's actually a fairly well-thought-out and straightforwards language. However, in the quest to make it 'human-readable', it's very unintuitive to compose.
Then let's stop using it for the wrong purpose. Any DBMS can have two interfaces -- SQL for human interaction, and proper API for machines to talk. More secure and efficient.
I completely agree. I've written several DSLs that are much better for machine use, and they've gained traction, but to really shift the industry would require involvement from vendors, who seem uninterested.
My mental model was always that the order of keywords mirrored the order the database engine applied them. So WHERE is processed first and SELECT last.
Right but that’s not how SQl executes unfortunately. You go to Starbucks then ask for what you need. SQL does a Cartesian product of all the tables you need, then filters with the ON clause. Only then can you “order” what you want in SELECT.
In standard SQL syntax you’re essentially asking for what you want at your house then driving to a Starbucks.
More readable but doesn’t follow the execution order at all.
This is valid SQL.
The SQL commands are SELECT, UPDATE, INSERT, etc. Therefore, those commands should be the first thing in an instruction. If you have a file full of SQL, you probably want all the lines to start with those commands. Gonna be pretty weird to read if you have both SELECT and UPDATE lines that start with FROM. Probably difficult for the parser also. Even hairier when it comes to subqueries.I do however, strongly agree that the most common scenario for user workflow is to choose tables first, then choose columns from those tables. I don't know if changing the language is really the answer. Intelligent tooling can and does already solve this. What I typically do is start all my queries by selecting *, then I go back and fill in the columns last.