SQL SERVER – Logical Query Processing Phases – Order of Statement Execution

April 6, 2009

Of late, I penned down an article – SQL SERVER – Interesting Observation of ON Clause on LEFT JOIN – How ON Clause Effects Resultset in LEFT JOIN – which received a very intriguing comment from one of my regular blog readers Craig. According to him this phenomenon happens due to Logical Query Processing. His comment instigated a question in my mind. I have put forth this question to all my readers at the end of the article. Let me first give you an introduction to Logical Query Processing Phase.

What actually sets SQL Server apart from other programming languages is the way SQL Server processes its code. Generally, most programming languages process statement from top to bottom. By contrast, SQL Server processes them in a unique order which is known as Logical Query Processing Phase. These phases generate a series of virtual tables with each virtual table feeding into the next phase (virtual tables not viewable). These phases and their orders are given as follows:

1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10 ORDER BY
11. TOP

As OUTER join is applied subsequent to ON clause, all rows eliminated by the ON clause will still be included by the OUTER join as described in the article SQL SERVER – Interesting Observation of ON Clause on LEFT JOIN – How ON Clause Effects Resultset in LEFT JOIN.

However, I am perplexed about the last two, ORDER BY and TOP. According to some people TOP comes first in logical query processing phase while others suggest that ORDER BY comes first. Now, here I’ve laid down my questions for you all to think about:

1) What is the correct answer for order query processing phase – ORDER BY or TOP?
2) How can we create an example to verify query processing phase for ORDER BY and TOP?

I will soon publish the answers I receive to the above questions on this blog, with due credit given to my readers.

Reference : Pinal Dave (https://blog.sqlauthority.com)

Database, Software Development, SQL Joins

SQL SERVER – 2008 – Management Studio New Features

SQL SERVER – Server Type and File Extention

38 Comments. Leave new

Vinit
April 6, 2009 11:04 am
Hi Pinal,
1) In My knowledge ORDER BY clause comes first and then TOP.
2) Example:
/* Create a Table to Play with */
CREATE TABLE MyTable(ID INT)
/* Insert Some random Data */
INSERT INTO MyTable(ID) VALUES(8)
INSERT INTO MyTable(ID) VALUES(9)
INSERT INTO MyTable(ID) VALUES(2)
INSERT INTO MyTable(ID) VALUES(1)
INSERT INTO MyTable(ID) VALUES(5)
INSERT INTO MyTable(ID) VALUES(3)
INSERT INTO MyTable(ID) VALUES(4)
INSERT INTO MyTable(ID) VALUES(10)
INSERT INTO MyTable(ID) VALUES(6)
INSERT INTO MyTable(ID) VALUES(7)
/* See the records */
SELECT * FROM MyTable
/*
If the TOP is applied first and then ORDER BY clause then we must see top 5 records in acsending order, something like 1,2,5,8,9
Otherwise, all records first gets into order and then top 5 will be displayed from that ordered list, like 1,2,3,4,5
Lets verify….
*/
SELECT TOP 5 * FROM MyTable ORDER BY 1
/* Clean Up */
DROP TABLE MyTable
-Vinit
Reply
- NA NA
  June 1, 2016 3:42 pm
  In order to process the sql statement first it sort the records and then take top records from of it. for Example –
  CREATE TABLE TEST(ID INT);
  INSERT INTO TEST VALUES(1);
  INSERT INTO TEST VALUES(2);
  INSERT INTO TEST VALUES(5);
  INSERT INTO TEST VALUES(6);
  INSERT INTO TEST VALUES(3);
  INSERT INTO TEST VALUES(4);
  SELECT TOP 5 * FROM TEST
  ID
  1
  2
  5
  6
  3
  SELECT TOP 5 * FROM TEST ORDER BY ID DESC
  ID
  6
  5
  4
  3
  2
  SELECT TOP 5 * FROM TEST ORDER BY ID
  ID
  1
  2
  3
  4
  5
  Reply
fly
April 6, 2009 3:03 pm
Integer- table containing digits 1- 1 000 000
select top 10 * from integer order by 1 desc returns
100000
99999
99998
99997
99996
99995
99994
99993
99992
99991
this proves order by is the first in simple words.
Also, if you take a look at query plan, table\index scan is ordered first. Then top is applied.
…..
…..
Reply
fly
April 6, 2009 3:05 pm
ooops, sorry, XML tags are not shown in my prev post
Top
|—IndexScan Ordered=”true”
Reply
Brian Tkatch
April 6, 2009 7:13 pm
The help file for TOP states “If the query includes an ORDER BY clause, the first expression rows, or expression percent of rows, ordered by the ORDER BY clause are returned. If the query has no ORDER BY clause, the order of the rows is arbitrary.”
Example:
First for an example, we want to get the query to read the TABLE in a different order than the ORDER BY. To do that, the data should be stored opposite the ORDER BY. That can be accomplished via a CLUSTERED INDEX:
CREATE TABLE #Data
(
Id_Clustered TINYINT,
Id_Ordered TINYINT,
CONSTRAINT Data_Id_PK PRIMARY KEY CLUSTERED (Id_Clustered)
);
INSERT INTO #Data(Id_Clustered, Id_Ordered)
SELECT 1, 2 UNION ALL
SELECT 2, 1;
Id_Clustered is the COLUMN the TABLE will be CLUSTERED on, whereas Id_Ordered will be in the ORDER BY of the query. The values in the two records are opposites. This will force a query without an ORDER BY and with an ORDER BY to be different:
SELECT
Id_Clustered,
Id_Ordered
FROM
#Data;
Yeilds:
1,2
2,1
Because that is how it is CLUSTERED. But, if we add an ORDER BY:
SELECT
Id_Clustered,
Id_Ordered
FROM
#Data
ORDER BY
Id_Ordered;
The result is reversed to:
2,1
1,2
because it is following the ORDER BY statement.
Add a TOP 1 for the test to see which one gets pulled:
SELECT TOP 1
Id_Clustered,
Id_Ordered
FROM
#Data
ORDER BY
Id_Ordered;
Which shows:
2,1
This is the same as the ORDER BY results.
The Query plan shows the Clustered Index Scan first and then a Sort (Top N Sort). It specifically reordered the results to follow the ORDER BY.
This example shows that ORDER BY is handled first, and then TOP.
Reply
fan of SQLAuthority
April 6, 2009 8:37 pm
Brian,
You are answering like Imran, who used to answer really nice.
Reply
Mike Walsh
April 7, 2009 7:26 am
Hi Pinal –
Those examples definitely help show that order by happens first. The query plan will definitely show it as well.
As for a simple way to think about it, just think about what happens with each operator.
The ORDER BY is ordering your results, it is forcing them to come back in a certain way (contrary to relational theory :) ). The TOP operator is telling SQL Server to restrict the rows returned to a certain number of rows (either explicitly or by percentage). If TOP happened first that would mean you would just be ordering by the number of rows specified in the top. If you were to query a table and ask for the TOP 1 ORDERED BY a certain value that would be silly for a couple reasons:
1.) Silly to order by one row
2.) Let’s say you were querying a column that was not the clustered key, the TOP isn’t going to guarantee you which row you get (as described above) so not doing the ORDER BY first means you are not going to be getting meaningful results.
Good post and always helpful to remind folks the order of logical query processing (physical processing doesn’t always follow that order but it helps people think of ways to better limit their results earlier and improve performance.
Itzik’s book INSIDE SQL Server 2005 T-SQL Querying describes each of these steps in detail (though I believe that is actually Lubar’s chapter if memory serves correct)
Reply
MarlonRibunal
April 8, 2009 6:29 am
Here, take a look at Itzik Ben-Gan’s flow chart of Logical Query Processing (PDF download):
Reply
vijay ahire
April 25, 2009 12:49 pm
“the breakpoint will not currently be hit.unable to bind sql breakpoint at this time”.
Can anybody help me to solve this error while try to debug
SQL CLR procedure in VS2005.
Reply
Mike Walsh
April 27, 2009 4:58 pm
So looking at what it does, how it works and looking at Itzik’s poster in the link above and his other books that logical processing is described in. I would say the answer is “Yes”
It would appear as though to do the top it has to know the Order By clause and it needs to understand there is an Order By, and then filter only the top number/percent of rows according to that.
The actual ordering for display or passing on to the next query happens after the TOP filter is applied (even though it looked at the order by).
One important thing to remember is that we are talking about the logical query processing here. The physical order of events is not necessarily (not, actually) the same always.
Reply
Jones
April 28, 2009 3:53 am
Nice post…
Reply
Yousuf Khan
August 6, 2009 12:06 am
Left Join is Quite Simple !
but turns out to be very dangerous
if used incorrectly that is the query may take hours to execute and the final result contains lakhs of unwanted rows.
Simple way to implement join
on tables all having one field in common
say ID,i.e. ID field links all tables to each other
Sample simple query could be used for n number of tables and n number of joins
select
a.ID,
a.field2,
a.field3,
b.field1,
b.field2,
c.field1,
c.field2,
d.field1,
d.field2,
d.field3,
d.field4
from
(
(select * from table1) as a
left join
(select * from table2) as b
on a.ID=b.ID
left join
(select * from table3) as c
on a.ID=c.ID
left join
(select * from table4) as d
on a.ID=d.ID
)
I suggest never use left join !
If have to , create index on field like ID in above sample
on all tables to reduce query time
Reply
Bob
September 24, 2009 6:01 am
The logical processing order that you repeat above from the March blog comments isn’t correct in all cases, despite its being repeated in books, classes, and blogs. It is useful for explaining to those new to SQL how one might envision the processing being accomplished, however, it is neither specified by any ANSI SQL standard, nor is it guaranteed by Microsoft, nor is it accurate with respect to Microsoft’s current or prior implementations. No ANSI SQL standard requires a specific processing order, but instead leaves that to the implementation. The implementation’s are therefore free to use that flexibility to optimize certain scenarios as they see fit and as long as the syntax and results conform to the standard. Even the ORDER BY isn’t as cut and dry as one might suspect, since the implementation can use the information it knows to be true about the data to best effect. As a result, it would be unwise to declare statements in such a way as to depend on a specific processing order.
In the case of ORDER BY and TOP, Brian Tkatch gave you the only answer that you can hang your hat on; that is, that in the absence of an ORDER BY clause, TOP will return arbitrary rows. Therefore, you can consider that ORDER BY precedes TOP when ORDER BY is specified, however, that may not be what actually happens procedurally, just what apparently happens when the results come back. I can conceive of times where an optimizer could know enough about the data to select the TOP rows before even making a JOIN. Like gravity, Microsoft’s implementation is a black box to us: we all can make use of gravity everyday without having to know how it works. In this case, we are better off not knowing how it works, because we don’t want to be implementation-specific when we write our SQL. I can produce sample code that shows these differences if you’re interested.
P.S. It would be easy for someone to read Brian Tkatch’s comment and think that an ORDER BY is not required in the case where you pull results from a single table with a clustered index and the cluster key order is the order you desire. The fact is that result ordering is never guaranteed in the absence of an ORDER BY clause. On a related note, rows are not physically order by cluster key. The clustered index is a structure that is built from the data pages.
Reply
Brian Tkatch
September 24, 2009 7:07 pm
@Bob
Can you give an example where there is a CLUSTERED INDEX, no ORDER BY, and the query uses the CLUSTERED INDEX but the ordering is otherwise?
Reply
Bob
September 26, 2009 3:07 am
Brian,
>> Can you give an example …uses the CLUSTERED INDEX but the ordering is otherwise?
Something like this:
CREATE TABLE A (Col varchar(900) NOT NULL, Data varchar(4) NULL
, CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED ([Col] ASC)
)
GO
INSERT INTO A (Col, Data) VALUES (‘2kakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnl1akfvbsaib83y34nlnmhsojhg’, ‘B’)
INSERT INTO A (Col, Data) VALUES (‘1kakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnmhsojhg’, ‘A’)
GO
CREATE NONCLUSTERED INDEX [NCIX_Data_DESC] ON [dbo].[A] ([Data] DESC)
GO
SELECT TOP 2 * FROM A
The point is that when the BOL says that TOP returns arbitrary rows in the absence of ORDER BY, they are describing the expected behavior and not trying to give us insight into the implementation. The implementation, confronted with a clustered index without an ORDER BY, may typically return an ordered set; but we are not supposed to perceive it as an ordered set because there may be at least one codepath through the optimizer that doesn’t order as we would expect.
pinaldave seems to have a knack for finding those code paths :)
I can also give an example where step #2 ON is evaluated after step #8 SELECT or STEP #4 WHERE.
Don’t rely on implementation whenever and wherever you can get away with it.
Reply
- Anjum Niaz
  November 12, 2013 5:32 pm
  Bob,
  Its too late, I just read your conversation, but it might help some other reader. In your example, if you see, you created a non-clustered index. And the priority is, IF THE ONLY SELECTABLE COLUMN(s) IS AVAILABLE IN AN INDEX, and Order by is required, then instead of selecting from table, Sql Server prefers reading from Index. As it generates less I/O.
  Create a table with two columns, A and B, Add a non-clustered Index on A, Now select
  SELECT A
  WHERE A BETWEEN (2500 AND 3000)
  Now Create the same table with a Clustered Index and select
  SELECT A
  WHERE A BETWEEN (2500 AND 3000)
  Compare the I/O of two queries. Query two will have much higher I/O than Query 1. Now change the query to
  SELECT A, B
  WHERE A BETWEEN (2500 AND 3000)
  This time I/O will remain same for clustered Index, but much higher for non-clustered Index.
  This depicts that Sql Server uses non-clustered Index if operations do not require any other column.
  Since you created only one column. Add one more column, and select both of them and then tell the difference.
  I hope this will clear the understanding (though after 4 years)
  Reply
Bob
September 26, 2009 3:12 am
Brian,
I just reread the challenge. To have the query use the clustered index and not return cluster key ordered rows?
No, I don’t think I’ll attempt that.
But the query above shows that just having a clustered key present doesn’t mean that the TOP results will use that key. In the case above the Non-clustered index is used even w/o an ORDER BY.
Reply
Brian Tkatch
September 30, 2009 7:09 pm
@Bob
“just having a clustered key present doesn’t mean that the TOP results will use that key.”
That’s something the help file states.
But if the CLUSTERED INDEX is used, that is “probably” the order. According to Denny Cherry “If you don’t have an ORDER BY, then the SQL Server will probably sort by the clustered index and use it for filtering. If there is no clustered index it would then fall back to using the table itself and return the data in the order it comes across it.” From:
So, technically it doesn’t have to use it. And sure, you probably shouldn’t rely on it. But the fact remains, it is used. And blaming bugs in the code on “techncially”s is silly. Unless it really is the case, but my gut feeling is otherwise.
Reply
Zafar Iqbal
February 28, 2011 6:14 pm
Hi Pinal , In which sequence Where clause is executed ?
Reply
lakhan pariyani
July 27, 2012 3:08 pm
thanks for this best post… can you please tell me which comes first in this two things
the increment(Identity) is incremented first
or the constraints are checked first
As I know the answer that increments comes first but I need a reason for it..
because if the increment is done first and then constraints are checked if constraints are not matched then still the identity is increased which is wrong according to me..
but why these happens please reply as soon as possible
Reply
Emilian
July 31, 2012 8:58 pm
declare @x table (i int)
insert into @x(i) values(1),(5),(10),(2)
select top(2) i as Z from @x
SELECT * FROM (select top(2) i as Z from @x) y order by y.Z
Why it applies order by before the top?:)
Reply
Neeraj
May 14, 2013 11:15 am
How to use on ,out, CUBE and ROLLUP?
please give me one example
Reply