SQL SERVER – Simple Puzzle with UNION – Part 2

Yesterday we had very easy kind of Back to Basics Puzzle with UNION and I have received tremendous response to the simple puzzle. Even though there is no giveaway due to sheer interest in the subject, I have received many replies. Due to all the request, here is another back to the basic question with UNION again.

Let us execute following three query one by one. Please make sure to enable Execution Plan in SQL Server Management Studio (SSMS).

Query 1

SELECT 1
UNION ALL
SELECT 2

The query above will return following result

The query above will return following execution plan

Query 2

SELECT 1
UNION ALL
SELECT 2
ORDER BY 1

The query above will return following result

The query above will return following execution plan

Query 3

SELECT DISTINCT 1
UNION ALL
SELECT DISTINCT 2
ORDER BY 1

The query above will return following result

The query above will return following execution plan

Now let us look at all the execution plans together.

When you look at closely at all the resultset – they all returns the same result. When we see their execution plan they are very different from each other. Now here is the question back to you.

Question: When we add DISTINCT in Query 3 it is technically more work for SQL Server to do than Query 2. However, the execution plan demonstrates that Query 3 is using much lesser resources than Query 2. WHY?

Please leave your answer in the comment section. I will publish all the valid answer in the blog next week with due credit.

Do not miss to checkout the part 1 of this puzzle.

Reference: Pinal Dave (http://blog.sqlauthority.com)

About these ads

32 thoughts on “SQL SERVER – Simple Puzzle with UNION – Part 2

  1. Hi Pinal,

    Query 2 : In this case all the select statements are combined to form a table and a constant scan goes for entire table,followed by sorting.

    Query 3: while in this case each select statement itself goes for constant scan and then combined to form a table .So in this case entire constant scan is avoided.

    So the query 3 is using lesser resource than Query 2.

    • Sanjay,
      When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.

      I got this info. from James Curran.

  2. Using your execution plan figure we can say that no need of sorting in query 3 thats why its using less resources than Query 2.

  3. Query 2 lists all available records including duplicates which could increase the output list size and the sorting time while the presence of distinct in query 3 decreases the length of output list and helps to minimize sorting time. I think distinct is taking less time than sort because it has to work on less data while sort has to arrange the output of both part of the query.

  4. The distinct causes the two sets to be sorted as a side-effect. Since the sets are one row each in this case, that sorting is really fast. Then taking two sorted lists (even sets with one than one row), and making one big sorted list out of them requires just merging them — going through them linearly, and taking whichever is the next lowest.

  5. Hi, If you execute all the three queries together in a single session, then 100%,67%,33% cost comes. If you execute the 3rd query separately, 100% cost comes. That means when executing in the same session, the query uses the previously executed plan for the current execution. Correct if my answer is wrong.

    • No. When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.

  6. Hey Pinal,

    Query 3 uses the Merge Join operator while Query 2 uses the sort operator.
    A merge join operator is used on the two tables whose output columns have been presorted.
    The sort operator uses the concept of looping to get its output which requires an extra thread to perform what we call ‘Rebinding’. This increases the I/O Cost drastically. Hence the difference.

    Regards,
    Kunal Rawal

  7. Hey Pinal,

    Here, query 3 taking lower cost than 2 as there is only 1 row for selection. Sorting individual rows in q3 is not taking any cost. Now, q3 uses merge and q2 uses sorting for ordering the result and merging is faster than sorting as merge is being applied on presorted lists.

    If we select from a un-indexed table with 10 rows. then results are 13/30/57 (my sample), which are consistent. Since, q3 includes sorting individual results first so q3 cost is high.

  8. Pingback: SQL SERVER – Simple Puzzle with UNION – Part 3 | Journey to SQL Authority with Pinal Dave

  9. Pingback: SQL SERVER – Simple Puzzle with UNION – Part 4 | Journey to SQL Authority with Pinal Dave

  10. Pingback: SQL SERVER – Simple Puzzle with UNION – Part 5 | Journey to SQL Authority with Pinal Dave

  11. Query (3) , using Distinct so its already sorting the records, thats why its showing the lowest query cost

    I have checked one more scenario, in this both query is showing the same cost, ie 50% each

    —inserting records into a temporary table
    SELECT 1 ID INTO #T1
    UNION ALL
    SELECT 2

    SELECT ID FROM #T1 ORDER BY ID
    SELECT DISTINCT ID FROM #T1 ORDER BY ID

  12. I believe it is because query 2 uses a sort while query 3 uses the merge join operator “exploiting their sort order”.

    I found it a little surprising that this uses two merge joins and not a sort ([1], [2]) and a merge join ([1,2], [3]).

    SELECT 1
    UNION ALL
    SELECT 2
    UNION ALL
    SELECT distinct 3
    ORDER BY 1

  13. I have to guess that there is a big difference between the Sort operator and the Merge Join operator. Looking at the properties of both I see an IO cost and a rebind associated with the sort operator that does not exist for the Merge Join. I have to guess that the Sort is internally doing some kind of loop that requires IO.

  14. I want consolidate of numbers in SQL …For an example I having number like
    1,2,3,4,.6.7.8.9,12,13,14,16,18,20
    then I want the result as
    1-4,6-9,12-14,16,18,20
    How to achieve it please help me

    • –>–First of all create one table valued function & use it in code block…
      create FUNCTION [dbo].[Split] (
      @string AS VARCHAR(8000),
      @splitAt AS VARCHAR(8000)
      )
      RETURNS @strings TABLE (Strings VARCHAR(8000))
      AS
      BEGIN
      DECLARE @splitLen AS INT
      DECLARE @index AS INT

      SET @splitLen = LEN(@splitAt)

      — SET @string = LTRIM(RTRIM(@string))
      SET @splitAt = ‘%’ + @splitAt + ‘%’
      SET @index = PATINDEX(@splitAt, @string)

      WHILE(@index > 0)
      BEGIN
      INSERT INTO @strings
      VALUES(SUBSTRING(@string, 1, @index-1))

      SET @string = SUBSTRING(@string, @index + 1, LEN(@string))
      SET @index = PATINDEX(@splitAt, @string)
      END

      IF LEN(@string) > 0
      INSERT INTO @strings VALUES(@string)

      RETURN
      END
      –=======================================================================
      DECLARE @string nvarchar(max)
      DECLARE @Input nvarchar(max)
      DECLARE @old int

      SET @Input = ‘1,2,3,4,6,7,8,9,12,13,14,16,18,20’
      SET @string = ”

      SELECT
      @string = @string + case when Strings = @old + 1 THEN ”
      when @string = ” then Strings
      ELSE ‘-‘ + cast(@old as nvarchar) + ‘,’ + Strings
      end,
      @old = Strings
      FROM dbo.split(@Input,’,’)

      select @string as result

  15. Pingback: SQL SERVER – Five Puzzles around UNION – Participate in All Five | Journey to SQL Authority with Pinal Dave

  16. HI PINAL,
    IN the SECOND query,”SORT” operation takes place and we get estimated I/O cost is “0.0112613”
    IN the THIRD query,”MERGE JOIN” operation takes place i.e.., it takes two already sorted input tables exploitting the sort order.and we get estimated I/O cost is “0”

    in other word ,in the second query both physical and logical operations are “SORTING”
    whereas,IN the third query physical operation=MERGE JOIN(takes already sorted input table) and logical operation=CONCATINATION

  17. Hi Pinal,
    Because of explicit SORT operation on final result set causes second query more expensive … Whereas 3rd query doesn’t require any SORT operation externally…
    Note that “Merge join itself is very fast, but it can be an expensive choice if sort operations are required.” means if there is no sort operation required then MERGE JOIN is very fast…
    The above two are the reasons to have less cost for 2nd query

  18. If I look at the plan carefully, the Query 2 uses the Sort operation and the Query 3 uses the Merge Join.
    There is an extra cost on performing Sort operation as it does sorting at both Physical and Logical operations.
    The Merger Join operator matches the rows from two sorted input tables thus at physical level its merging and logical level its concatenating the operation thus exploiting the sort order.

  19. MERGE JOIN internally use SORTING technique while merging the input data. Also DISTINCT itself sorts the data internally, so SORT operation time is reduced here.

  20. HI PINAL,
    the SECOND query,”SORT” operation takes place,
    hence the sorting took place in tempdb and we got the
    estimated I/O cost which is greater than 0 this make
    the query use little more resource than other query

    In the THIRD query the select statement is already sorted with distinct operator
    and then the sorted result set is merged using merge JOIN concatenation,
    since there is no any writes or reads required here so this query take little
    amount of resource

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s