SQL SERVER – Logical Query Processing Phases – Order of Statement Execution

You can download the poster from Itzik Ben-Gan’s T-SQL Querying page over here.

Of late, I penned down an article – SQL SERVER – Interesting Observation of ON Clause on LEFT JOIN – How ON Clause Effects Resultset in LEFT JOIN – which received a very intriguing comment from one of my regular blog readers Craig. According to him this phenomenon happens due to Logical Query Processing. His comment instigated a question in my mind. I have put forth this question to all my readers at the end of the article. Let me first give you an introduction to Logical Query Processing Phase.

What actually sets SQL Server apart from other programming languages is the way SQL Server processes its code. Generally, most programming languages process statement from top to bottom. By contrast, SQL Server processes them in a unique order which is known as Logical Query Processing Phase. These phases generate a series of virtual tables with each virtual table feeding into the next phase (virtual tables not viewable). These phases and their orders are given as follows:

1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10 ORDER BY
11. TOP

As OUTER join is applied subsequent to ON clause, all rows eliminated by the ON clause will still be included by the OUTER join as described in the article SQL SERVER – Interesting Observation of ON Clause on LEFT JOIN – How ON Clause Effects Resultset in LEFT JOIN.

However, I am perplexed about the last two, ORDER BY and TOP. According to some people TOP comes first in logical query processing phase while others suggest that ORDER BY comes first. Now, here I’ve laid down my questions for you all to think about:

1) What is the correct answer for order query processing phase – ORDER BY or TOP?
2) How can we create an example to verify query processing phase for ORDER BY and TOP?

I will soon publish the answers I receive to the above questions on this blog, with due credit given to my readers.

Reference : Pinal Dave (http://blog.SQLAuthority.com)

31 thoughts on “SQL SERVER – Logical Query Processing Phases – Order of Statement Execution

  1. Hi Pinal,
    1) In My knowledge ORDER BY clause comes first and then TOP.
    2) Example:

    /* Create a Table to Play with */
    CREATE TABLE MyTable(ID INT)
    /* Insert Some random Data */
    INSERT INTO MyTable(ID) VALUES(8)
    INSERT INTO MyTable(ID) VALUES(9)
    INSERT INTO MyTable(ID) VALUES(2)
    INSERT INTO MyTable(ID) VALUES(1)
    INSERT INTO MyTable(ID) VALUES(5)
    INSERT INTO MyTable(ID) VALUES(3)
    INSERT INTO MyTable(ID) VALUES(4)
    INSERT INTO MyTable(ID) VALUES(10)
    INSERT INTO MyTable(ID) VALUES(6)
    INSERT INTO MyTable(ID) VALUES(7)
    /* See the records */
    SELECT * FROM MyTable
    /*
    If the TOP is applied first and then ORDER BY clause then we must see top 5 records in acsending order, something like 1,2,5,8,9
    Otherwise, all records first gets into order and then top 5 will be displayed from that ordered list, like 1,2,3,4,5
    Lets verify….
    */
    SELECT TOP 5 * FROM MyTable ORDER BY 1
    /* Clean Up */
    DROP TABLE MyTable

    -Vinit

    Like

  2. Integer- table containing digits 1- 1 000 000

    select top 10 * from integer order by 1 desc returns
    100000
    99999
    99998
    99997
    99996
    99995
    99994
    99993
    99992
    99991
    this proves order by is the first in simple words.

    Also, if you take a look at query plan, table\index scan is ordered first. Then top is applied.

    …..

    …..

    Like

  3. The help file for TOP states “If the query includes an ORDER BY clause, the first expression rows, or expression percent of rows, ordered by the ORDER BY clause are returned. If the query has no ORDER BY clause, the order of the rows is arbitrary.”

    Example:

    First for an example, we want to get the query to read the TABLE in a different order than the ORDER BY. To do that, the data should be stored opposite the ORDER BY. That can be accomplished via a CLUSTERED INDEX:

    CREATE TABLE #Data
    (
    Id_Clustered TINYINT,
    Id_Ordered TINYINT,
    CONSTRAINT Data_Id_PK PRIMARY KEY CLUSTERED (Id_Clustered)
    );
    INSERT INTO #Data(Id_Clustered, Id_Ordered)
    SELECT 1, 2 UNION ALL
    SELECT 2, 1;

    Id_Clustered is the COLUMN the TABLE will be CLUSTERED on, whereas Id_Ordered will be in the ORDER BY of the query. The values in the two records are opposites. This will force a query without an ORDER BY and with an ORDER BY to be different:

    SELECT
    Id_Clustered,
    Id_Ordered
    FROM
    #Data;
    Yeilds:
    1,2
    2,1

    Because that is how it is CLUSTERED. But, if we add an ORDER BY:

    SELECT
    Id_Clustered,
    Id_Ordered
    FROM
    #Data
    ORDER BY
    Id_Ordered;

    The result is reversed to:
    2,1
    1,2

    because it is following the ORDER BY statement.

    Add a TOP 1 for the test to see which one gets pulled:

    SELECT TOP 1
    Id_Clustered,
    Id_Ordered
    FROM
    #Data
    ORDER BY
    Id_Ordered;

    Which shows:

    2,1

    This is the same as the ORDER BY results.

    The Query plan shows the Clustered Index Scan first and then a Sort (Top N Sort). It specifically reordered the results to follow the ORDER BY.

    This example shows that ORDER BY is handled first, and then TOP.

    Like

  4. Hi Pinal –

    Those examples definitely help show that order by happens first. The query plan will definitely show it as well.

    As for a simple way to think about it, just think about what happens with each operator.

    The ORDER BY is ordering your results, it is forcing them to come back in a certain way (contrary to relational theory :) ). The TOP operator is telling SQL Server to restrict the rows returned to a certain number of rows (either explicitly or by percentage). If TOP happened first that would mean you would just be ordering by the number of rows specified in the top. If you were to query a table and ask for the TOP 1 ORDERED BY a certain value that would be silly for a couple reasons:

    1.) Silly to order by one row
    2.) Let’s say you were querying a column that was not the clustered key, the TOP isn’t going to guarantee you which row you get (as described above) so not doing the ORDER BY first means you are not going to be getting meaningful results.

    Good post and always helpful to remind folks the order of logical query processing (physical processing doesn’t always follow that order but it helps people think of ways to better limit their results earlier and improve performance.

    Itzik’s book INSIDE SQL Server 2005 T-SQL Querying describes each of these steps in detail (though I believe that is actually Lubar’s chapter if memory serves correct)

    Like

  5. “the breakpoint will not currently be hit.unable to bind sql breakpoint at this time”.
    Can anybody help me to solve this error while try to debug
    SQL CLR procedure in VS2005.

    Like

  6. So looking at what it does, how it works and looking at Itzik’s poster in the link above and his other books that logical processing is described in. I would say the answer is “Yes”

    It would appear as though to do the top it has to know the Order By clause and it needs to understand there is an Order By, and then filter only the top number/percent of rows according to that.

    The actual ordering for display or passing on to the next query happens after the TOP filter is applied (even though it looked at the order by).

    One important thing to remember is that we are talking about the logical query processing here. The physical order of events is not necessarily (not, actually) the same always.

    Like

  7. Pingback: SQLAuthority News - Gandhinagar SQL Server User Group Meeting April 24, 2009 Journey to SQL Authority with Pinal Dave

  8. Left Join is Quite Simple !

    but turns out to be very dangerous
    if used incorrectly that is the query may take hours to execute and the final result contains lakhs of unwanted rows.

    Simple way to implement join
    on tables all having one field in common
    say ID,i.e. ID field links all tables to each other

    Sample simple query could be used for n number of tables and n number of joins

    select

    a.ID,
    a.field2,
    a.field3,
    b.field1,
    b.field2,
    c.field1,
    c.field2,
    d.field1,
    d.field2,
    d.field3,
    d.field4

    from

    (

    (select * from table1) as a

    left join

    (select * from table2) as b

    on a.ID=b.ID

    left join

    (select * from table3) as c

    on a.ID=c.ID

    left join

    (select * from table4) as d

    on a.ID=d.ID

    )

    I suggest never use left join !

    If have to , create index on field like ID in above sample
    on all tables to reduce query time

    Like

  9. The logical processing order that you repeat above from the March blog comments isn’t correct in all cases, despite its being repeated in books, classes, and blogs. It is useful for explaining to those new to SQL how one might envision the processing being accomplished, however, it is neither specified by any ANSI SQL standard, nor is it guaranteed by Microsoft, nor is it accurate with respect to Microsoft’s current or prior implementations. No ANSI SQL standard requires a specific processing order, but instead leaves that to the implementation. The implementation’s are therefore free to use that flexibility to optimize certain scenarios as they see fit and as long as the syntax and results conform to the standard. Even the ORDER BY isn’t as cut and dry as one might suspect, since the implementation can use the information it knows to be true about the data to best effect. As a result, it would be unwise to declare statements in such a way as to depend on a specific processing order.

    In the case of ORDER BY and TOP, Brian Tkatch gave you the only answer that you can hang your hat on; that is, that in the absence of an ORDER BY clause, TOP will return arbitrary rows. Therefore, you can consider that ORDER BY precedes TOP when ORDER BY is specified, however, that may not be what actually happens procedurally, just what apparently happens when the results come back. I can conceive of times where an optimizer could know enough about the data to select the TOP rows before even making a JOIN. Like gravity, Microsoft’s implementation is a black box to us: we all can make use of gravity everyday without having to know how it works. In this case, we are better off not knowing how it works, because we don’t want to be implementation-specific when we write our SQL. I can produce sample code that shows these differences if you’re interested.

    P.S. It would be easy for someone to read Brian Tkatch’s comment and think that an ORDER BY is not required in the case where you pull results from a single table with a clustered index and the cluster key order is the order you desire. The fact is that result ordering is never guaranteed in the absence of an ORDER BY clause. On a related note, rows are not physically order by cluster key. The clustered index is a structure that is built from the data pages.

    Like

  10. Brian,

    >> Can you give an example …uses the CLUSTERED INDEX but the ordering is otherwise?

    Something like this:
    CREATE TABLE A (Col varchar(900) NOT NULL, Data varchar(4) NULL
    , CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED ([Col] ASC)
    )
    GO
    INSERT INTO A (Col, Data) VALUES (‘2kakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnl1akfvbsaib83y34nlnmhsojhg’, ‘B’)
    INSERT INTO A (Col, Data) VALUES (‘1kakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnlkakfvbsaib83y34nlnmhsojhg’, ‘A’)
    GO
    CREATE NONCLUSTERED INDEX [NCIX_Data_DESC] ON [dbo].[A] ([Data] DESC)
    GO
    SELECT TOP 2 * FROM A

    The point is that when the BOL says that TOP returns arbitrary rows in the absence of ORDER BY, they are describing the expected behavior and not trying to give us insight into the implementation. The implementation, confronted with a clustered index without an ORDER BY, may typically return an ordered set; but we are not supposed to perceive it as an ordered set because there may be at least one codepath through the optimizer that doesn’t order as we would expect.

    pinaldave seems to have a knack for finding those code paths :)

    I can also give an example where step #2 ON is evaluated after step #8 SELECT or STEP #4 WHERE.

    Don’t rely on implementation whenever and wherever you can get away with it.

    Like

    • Bob,
      Its too late, I just read your conversation, but it might help some other reader. In your example, if you see, you created a non-clustered index. And the priority is, IF THE ONLY SELECTABLE COLUMN(s) IS AVAILABLE IN AN INDEX, and Order by is required, then instead of selecting from table, Sql Server prefers reading from Index. As it generates less I/O.

      Create a table with two columns, A and B, Add a non-clustered Index on A, Now select
      SELECT A
      WHERE A BETWEEN (2500 AND 3000)

      Now Create the same table with a Clustered Index and select
      SELECT A
      WHERE A BETWEEN (2500 AND 3000)

      Compare the I/O of two queries. Query two will have much higher I/O than Query 1. Now change the query to

      SELECT A, B
      WHERE A BETWEEN (2500 AND 3000)

      This time I/O will remain same for clustered Index, but much higher for non-clustered Index.

      This depicts that Sql Server uses non-clustered Index if operations do not require any other column.

      Since you created only one column. Add one more column, and select both of them and then tell the difference.

      I hope this will clear the understanding (though after 4 years)

      Like

  11. Brian,

    I just reread the challenge. To have the query use the clustered index and not return cluster key ordered rows?

    No, I don’t think I’ll attempt that.

    But the query above shows that just having a clustered key present doesn’t mean that the TOP results will use that key. In the case above the Non-clustered index is used even w/o an ORDER BY.

    Like

  12. @Bob

    “just having a clustered key present doesn’t mean that the TOP results will use that key.”

    That’s something the help file states.

    But if the CLUSTERED INDEX is used, that is “probably” the order. According to Denny Cherry “If you don’t have an ORDER BY, then the SQL Server will probably sort by the clustered index and use it for filtering. If there is no clustered index it would then fall back to using the table itself and return the data in the order it comes across it.” From: http://itknowledgeexchange.techtarget.com/sql-server/back-to-basics-whats-the-difference-between-a-scan-and-a-seek/

    So, technically it doesn’t have to use it. And sure, you probably shouldn’t rely on it. But the fact remains, it is used. And blaming bugs in the code on “techncially”s is silly. Unless it really is the case, but my gut feeling is otherwise.

    Like

  13. Pingback: SQL SERVER – Download Logical Query Processing Poster Journey to SQL Authority with Pinal Dave

  14. I just attended Itzik’s session on this subject.

    One of the best!

    He also discussed that when TOP is there ORDER BY is executed along with it. If it does not exist, ORDER BY is executed separate.

    More detail I will post soon.

    Great session.

    Kind Regards,
    Pinal

    Like

  15. Pingback: SQL SERVER – An Important Part of Most SELECT statement – WHERE clause – Quiz – Puzzle – 4 of 31 « SQL Server Journey with SQL Authority

  16. thanks for this best post… can you please tell me which comes first in this two things

    the increment(Identity) is incremented first
    or the constraints are checked first
    As I know the answer that increments comes first but I need a reason for it..
    because if the increment is done first and then constraints are checked if constraints are not matched then still the identity is increased which is wrong according to me..
    but why these happens please reply as soon as possible

    Like

  17. declare @x table (i int)
    insert into @x(i) values(1),(5),(10),(2)
    select top(2) i as Z from @x
    SELECT * FROM (select top(2) i as Z from @x) y order by y.Z

    Why it applies order by before the top?:)

    Like

  18. Pingback: SQL SERVER – UNION ALL and ORDER BY – How to Order Table Separately While Using UNION ALL « SQL Server Journey with SQL Authority

  19. Hi Pinal,

    I would like to know what the sub query order. Please help me out about this.

    Example Query:

    Select * from Dept where DeptNo in (Select DeptNo from Emp)

    I would like know how the above query will execute.

    As of my knowledge, The above query will execute in the following Order:
    1. Main Query From Clause
    2. main Query Where Clause
    3. Sub Query From Clause
    4. Sub Query Where Clause
    5. Sub Query Select
    6. Main Query Select

    Please Let me know my thought process is correct or not.

    Thanks in advance.

    Regards,
    Sekhar Reddy

    Like

  20. CREATE TABLE #Temp (
    ID INT IDENTITY(1, 1)
    ,Ename VARCHAR(1000)
    ,Sal VARCHAR(100)
    ,DepAvgSal VARCHAR(100)
    )

    Insert into #Temp(Ename,Sal,DepAvgSal)
    select ‘Nikhil’,10,20 union
    select ‘Siva’,20,20 union
    select ‘Sandeep’,’NA’,20

    ALTER TABLE #Temp
    ADD Emsg Varchar(2000)

    –Process1:

    Update #Temp
    set Emsg=’Error’
    where isnumeric(Sal)1

    select * from #Temp
    where
    cast(Sal as float)<cast(DepAvgSal as float)
    and isnull(Emsg,'')=''

    –Process2:

    select * from #Temp
    where
    cast(Sal as float)<cast(DepAvgSal as float)
    and Emsg is null

    OR

    select * from #Temp
    where
    cast(Sal as float)<cast(DepAvgSal as float)
    and isnumeric(Sal)=1

    Drop Table #Temp

    –My Question: Why we are getting "Error converting data type varchar to float." Error in Process1 why we are not getting in process2.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s