SQL SERVER – Simple Puzzle with UNION – Part 2

Yesterday we had very easy kind of Back to Basics Puzzle with UNION and I have received tremendous response to the simple puzzle. Even though there is no giveaway due to sheer interest in the subject, I have received many replies. Due to all the request, here is another back to the basic question with UNION again.

Let us execute following three query one by one. Please make sure to enable Execution Plan in SQL Server Management Studio (SSMS).

Query 1

SELECT 1
UNION ALL
SELECT 2

The query above will return following result

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-1

The query above will return following execution plan

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-2

Query 2

SELECT 1
UNION ALL
SELECT 2
ORDER BY 1

The query above will return following result

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-1

The query above will return following execution plan

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-3

Query 3

SELECT DISTINCT 1
UNION ALL
SELECT DISTINCT 2
ORDER BY 1

The query above will return following result

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-1

The query above will return following execution plan

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-4

Now let us look at all the execution plans together.

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-5

When you look at closely at all the resultset – they all returns the same result. When we see their execution plan they are very different from each other. Now here is the question back to you.

Question: When we add DISTINCT in Query 3 it is technically more work for SQL Server to do than Query 2. However, the execution plan demonstrates that Query 3 is using much lesser resources than Query 2. WHY?

Please leave your answer in the comment section. I will publish all the valid answer in the blog next week with due credit.

Do not miss to checkout the part 1 of this puzzle.

Reference: Pinal Dave (https://blog.sqlauthority.com)

,
Previous Post
SQL SERVER – Simple Puzzle with UNION
Next Post
SQL SERVER – Weekly Series – Memory Lane – #045

Related Posts

28 Comments. Leave new

  • pal_neeraj2003@rediffmail.com
    September 6, 2013 9:58 am

    Filter Condition is increased query 3 rather than 2. So relatively cost will decrease.

    Reply
  • Filter Condition is increased query 3 rather than 2. So cost is low.

    Reply
  • Hi Pinal,

    Query 2 : In this case all the select statements are combined to form a table and a constant scan goes for entire table,followed by sorting.

    Query 3: while in this case each select statement itself goes for constant scan and then combined to form a table .So in this case entire constant scan is avoided.

    So the query 3 is using lesser resource than Query 2.

    Reply
  • Pinal, I am getting execution plan for query 3 same as query 2 and 50% query cost in both, why?

    Reply
    • Sanjay,
      When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.

      I got this info. from James Curran.

      Reply
      • Yes Lakshman
        But I am getting 0/50/50 for all 3 queries together

      • Justin Whitehead
        September 14, 2013 12:00 am

        I am also getting 0/50/50. The displayed execution plan in the article doesn’t make sense. How can there not be a Sort for query 3? How would it know the order in which to display the results given the request of “ORDER BY 1”?

  • Using your execution plan figure we can say that no need of sorting in query 3 thats why its using less resources than Query 2.

    Reply
  • Query 2 lists all available records including duplicates which could increase the output list size and the sorting time while the presence of distinct in query 3 decreases the length of output list and helps to minimize sorting time. I think distinct is taking less time than sort because it has to work on less data while sort has to arrange the output of both part of the query.

    Reply
  • The distinct causes the two sets to be sorted as a side-effect. Since the sets are one row each in this case, that sorting is really fast. Then taking two sorted lists (even sets with one than one row), and making one big sorted list out of them requires just merging them — going through them linearly, and taking whichever is the next lowest.

    Reply
  • Hi, If you execute all the three queries together in a single session, then 100%,67%,33% cost comes. If you execute the 3rd query separately, 100% cost comes. That means when executing in the same session, the query uses the previously executed plan for the current execution. Correct if my answer is wrong.

    Reply
    • No. When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.

      Reply
  • Hey Pinal,

    Query 3 uses the Merge Join operator while Query 2 uses the sort operator.
    A merge join operator is used on the two tables whose output columns have been presorted.
    The sort operator uses the concept of looping to get its output which requires an extra thread to perform what we call ‘Rebinding’. This increases the I/O Cost drastically. Hence the difference.

    Regards,
    Kunal Rawal

    Reply
  • Deepak Khandelwal
    September 7, 2013 10:19 am

    Hey Pinal,

    Here, query 3 taking lower cost than 2 as there is only 1 row for selection. Sorting individual rows in q3 is not taking any cost. Now, q3 uses merge and q2 uses sorting for ordering the result and merging is faster than sorting as merge is being applied on presorted lists.

    If we select from a un-indexed table with 10 rows. then results are 13/30/57 (my sample), which are consistent. Since, q3 includes sorting individual results first so q3 cost is high.

    Reply
  • Due to Filter condition cost increase in query2…………

    Reply
  • Query (3) , using Distinct so its already sorting the records, thats why its showing the lowest query cost

    I have checked one more scenario, in this both query is showing the same cost, ie 50% each

    —inserting records into a temporary table
    SELECT 1 ID INTO #T1
    UNION ALL
    SELECT 2

    SELECT ID FROM #T1 ORDER BY ID
    SELECT DISTINCT ID FROM #T1 ORDER BY ID

    Reply
  • I believe it is because query 2 uses a sort while query 3 uses the merge join operator “exploiting their sort order”.

    I found it a little surprising that this uses two merge joins and not a sort ([1], [2]) and a merge join ([1,2], [3]).

    SELECT 1
    UNION ALL
    SELECT 2
    UNION ALL
    SELECT distinct 3
    ORDER BY 1

    Reply
  • I have to guess that there is a big difference between the Sort operator and the Merge Join operator. Looking at the properties of both I see an IO cost and a rebind associated with the sort operator that does not exist for the Merge Join. I have to guess that the Sort is internally doing some kind of loop that requires IO.

    Reply
  • I want consolidate of numbers in SQL …For an example I having number like
    1,2,3,4,.6.7.8.9,12,13,14,16,18,20
    then I want the result as
    1-4,6-9,12-14,16,18,20
    How to achieve it please help me

    Reply
    • –>–First of all create one table valued function & use it in code block…
      create FUNCTION [dbo].[Split] (
      @string AS VARCHAR(8000),
      @splitAt AS VARCHAR(8000)
      )
      RETURNS @strings TABLE (Strings VARCHAR(8000))
      AS
      BEGIN
      DECLARE @splitLen AS INT
      DECLARE @index AS INT

      SET @splitLen = LEN(@splitAt)

      — SET @string = LTRIM(RTRIM(@string))
      SET @splitAt = ‘%’ + @splitAt + ‘%’
      SET @index = PATINDEX(@splitAt, @string)

      WHILE(@index > 0)
      BEGIN
      INSERT INTO @strings
      VALUES(SUBSTRING(@string, 1, @index-1))

      SET @string = SUBSTRING(@string, @index + 1, LEN(@string))
      SET @index = PATINDEX(@splitAt, @string)
      END

      IF LEN(@string) > 0
      INSERT INTO @strings VALUES(@string)

      RETURN
      END
      –=======================================================================
      DECLARE @string nvarchar(max)
      DECLARE @Input nvarchar(max)
      DECLARE @old int

      SET @Input = ‘1,2,3,4,6,7,8,9,12,13,14,16,18,20’
      SET @string = ”

      SELECT
      @string = @string + case when Strings = @old + 1 THEN ”
      when @string = ” then Strings
      ELSE ‘-‘ + cast(@old as nvarchar) + ‘,’ + Strings
      end,
      @old = Strings
      FROM dbo.split(@Input,’,’)

      select @string as result

      Reply
  • ADITHYA RAPARTHI
    September 18, 2013 11:30 am

    HI PINAL,
    IN the SECOND query,”SORT” operation takes place and we get estimated I/O cost is “0.0112613”
    IN the THIRD query,”MERGE JOIN” operation takes place i.e.., it takes two already sorted input tables exploitting the sort order.and we get estimated I/O cost is “0”

    in other word ,in the second query both physical and logical operations are “SORTING”
    whereas,IN the third query physical operation=MERGE JOIN(takes already sorted input table) and logical operation=CONCATINATION

    Reply
  • Hi Pinal,
    Because of explicit SORT operation on final result set causes second query more expensive … Whereas 3rd query doesn’t require any SORT operation externally…
    Note that “Merge join itself is very fast, but it can be an expensive choice if sort operations are required.” means if there is no sort operation required then MERGE JOIN is very fast…
    The above two are the reasons to have less cost for 2nd query

    Reply
  • Hi Pinal

    Query 2 : it forms 2 different tables and scans these 2 tables to combine result set.
    Query 3 : it forms 1 table and scans 1 table to form result set.

    Reply
  • If I look at the plan carefully, the Query 2 uses the Sort operation and the Query 3 uses the Merge Join.
    There is an extra cost on performing Sort operation as it does sorting at both Physical and Logical operations.
    The Merger Join operator matches the rows from two sorted input tables thus at physical level its merging and logical level its concatenating the operation thus exploiting the sort order.

    Reply
  • Hi Pinal,
    The Query Cost is 100% at my end for both the queries.

    Reply

Leave a Reply