SQL SERVER – Simple Puzzle with UNION – Part 2

September 6, 2013

SQL, SQL Puzzle, SQL Server, SQL Tips and Tricks

Yesterday we had very easy kind of Back to Basics Puzzle with UNION and I have received tremendous response to the simple puzzle. Even though there is no giveaway due to sheer interest in the subject, I have received many replies. Due to all the request, here is another back to the basic question with UNION again.

Let us execute following three query one by one. Please make sure to enable Execution Plan in SQL Server Management Studio (SSMS).

Query 1

SELECT 1
UNION ALL
SELECT 2

The query above will return following result

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-1

The query above will return following execution plan

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-2

Query 2

SELECT 1
UNION ALL
SELECT 2
ORDER BY 1

The query above will return following result

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-1

The query above will return following execution plan

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-3

Query 3

SELECT DISTINCT 1
UNION ALL
SELECT DISTINCT 2
ORDER BY 1

The query above will return following result

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-1

The query above will return following execution plan

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-4

Now let us look at all the execution plans together.

SQL SERVER - Simple Puzzle with UNION - Part 2 puzzunion-5

When you look at closely at all the resultset – they all returns the same result. When we see their execution plan they are very different from each other. Now here is the question back to you.

Question: When we add DISTINCT in Query 3 it is technically more work for SQL Server to do than Query 2. However, the execution plan demonstrates that Query 3 is using much lesser resources than Query 2. WHY?

Please leave your answer in the comment section. I will publish all the valid answer in the blog next week with due credit.

Do not miss to checkout the part 1 of this puzzle.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Union clause, Union

SQL SERVER – Simple Puzzle with UNION

SQL SERVER – Weekly Series – Memory Lane – #045

28 Comments. Leave new

pal_neeraj2003@rediffmail.com
September 6, 2013 9:58 am
Filter Condition is increased query 3 rather than 2. So relatively cost will decrease.
Reply
neeraj pal
September 6, 2013 10:00 am
Filter Condition is increased query 3 rather than 2. So cost is low.
Reply
Shan
September 6, 2013 10:52 am
Hi Pinal,
Query 2 : In this case all the select statements are combined to form a table and a constant scan goes for entire table,followed by sorting.
Query 3: while in this case each select statement itself goes for constant scan and then combined to form a table .So in this case entire constant scan is avoided.
So the query 3 is using lesser resource than Query 2.
Reply
Sanjay Monpara
September 6, 2013 11:23 am
Pinal, I am getting execution plan for query 3 same as query 2 and 50% query cost in both, why?
Reply
- Lakshman
  September 6, 2013 2:53 pm
  Sanjay,
  When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.
  I got this info. from James Curran.
  Reply
  - Sanjay Monpara
    September 6, 2013 5:14 pm
    Yes Lakshman
    But I am getting 0/50/50 for all 3 queries together
  - Justin Whitehead
    September 14, 2013 12:00 am
    I am also getting 0/50/50. The displayed execution plan in the article doesn’t make sense. How can there not be a Sort for query 3? How would it know the order in which to display the results given the request of “ORDER BY 1”?
Sanjay Monpara
September 6, 2013 11:25 am
Using your execution plan figure we can say that no need of sorting in query 3 thats why its using less resources than Query 2.
Reply
ub04
September 6, 2013 11:57 am
Query 2 lists all available records including duplicates which could increase the output list size and the sorting time while the presence of distinct in query 3 decreases the length of output list and helps to minimize sorting time. I think distinct is taking less time than sort because it has to work on less data while sort has to arrange the output of both part of the query.
Reply
James Curran
September 6, 2013 11:58 am
The distinct causes the two sets to be sorted as a side-effect. Since the sets are one row each in this case, that sorting is really fast. Then taking two sorted lists (even sets with one than one row), and making one big sorted list out of them requires just merging them — going through them linearly, and taking whichever is the next lowest.
Reply
Lakshman
September 6, 2013 12:07 pm
Hi, If you execute all the three queries together in a single session, then 100%,67%,33% cost comes. If you execute the 3rd query separately, 100% cost comes. That means when executing in the same session, the query uses the previously executed plan for the current execution. Correct if my answer is wrong.
Reply
- James Curran
  September 6, 2013 12:59 pm
  No. When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.
  Reply
  - Lakshman
    September 6, 2013 2:51 pm
    Thank you James for correcting my answer.
kunal rawal
September 6, 2013 12:10 pm
Hey Pinal,
Query 3 uses the Merge Join operator while Query 2 uses the sort operator.
A merge join operator is used on the two tables whose output columns have been presorted.
The sort operator uses the concept of looping to get its output which requires an extra thread to perform what we call ‘Rebinding’. This increases the I/O Cost drastically. Hence the difference.
Regards,
Kunal Rawal
Reply
Deepak Khandelwal
September 7, 2013 10:19 am
Hey Pinal,
Here, query 3 taking lower cost than 2 as there is only 1 row for selection. Sorting individual rows in q3 is not taking any cost. Now, q3 uses merge and q2 uses sorting for ordering the result and merging is faster than sorting as merge is being applied on presorted lists.
If we select from a un-indexed table with 10 rows. then results are 13/30/57 (my sample), which are consistent. Since, q3 includes sorting individual results first so q3 cost is high.
Reply
Alok Kumar
September 11, 2013 12:22 pm
Due to Filter condition cost increase in query2…………
Reply
Suraj
September 11, 2013 6:21 pm
Query (3) , using Distinct so its already sorting the records, thats why its showing the lowest query cost
I have checked one more scenario, in this both query is showing the same cost, ie 50% each
—inserting records into a temporary table
SELECT 1 ID INTO #T1
UNION ALL
SELECT 2
SELECT ID FROM #T1 ORDER BY ID
SELECT DISTINCT ID FROM #T1 ORDER BY ID
Reply
Robert Horvick
September 11, 2013 7:05 pm
I believe it is because query 2 uses a sort while query 3 uses the merge join operator “exploiting their sort order”.
I found it a little surprising that this uses two merge joins and not a sort ([1], [2]) and a merge join ([1,2], [3]).
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT distinct 3
ORDER BY 1
Reply
AM
September 12, 2013 2:16 am
I have to guess that there is a big difference between the Sort operator and the Merge Join operator. Looking at the properties of both I see an IO cost and a rebind associated with the sort operator that does not exist for the Merge Join. I have to guess that the Sort is internally doing some kind of loop that requires IO.
Reply
arsath
September 13, 2013 4:26 pm
I want consolidate of numbers in SQL …For an example I having number like
1,2,3,4,.6.7.8.9,12,13,14,16,18,20
then I want the result as
1-4,6-9,12-14,16,18,20
How to achieve it please help me
Reply
- Sanjay Monpara
  September 16, 2013 11:21 am
  –>–First of all create one table valued function & use it in code block…
  create FUNCTION [dbo].[Split] (
  @string AS VARCHAR(8000),
  @splitAt AS VARCHAR(8000)
  )
  RETURNS @strings TABLE (Strings VARCHAR(8000))
  AS
  BEGIN
  DECLARE @splitLen AS INT
  DECLARE @index AS INT
  SET @splitLen = LEN(@splitAt)
  — SET @string = LTRIM(RTRIM(@string))
  SET @splitAt = ‘%’ + @splitAt + ‘%’
  SET @index = PATINDEX(@splitAt, @string)
  WHILE(@index > 0)
  BEGIN
  INSERT INTO @strings
  VALUES(SUBSTRING(@string, 1, @index-1))
  SET @string = SUBSTRING(@string, @index + 1, LEN(@string))
  SET @index = PATINDEX(@splitAt, @string)
  END
  IF LEN(@string) > 0
  INSERT INTO @strings VALUES(@string)
  RETURN
  END
  –=======================================================================
  DECLARE @string nvarchar(max)
  DECLARE @Input nvarchar(max)
  DECLARE @old int
  SET @Input = ‘1,2,3,4,6,7,8,9,12,13,14,16,18,20’
  SET @string = ”
  SELECT
  @string = @string + case when Strings = @old + 1 THEN ”
  when @string = ” then Strings
  ELSE ‘-‘ + cast(@old as nvarchar) + ‘,’ + Strings
  end,
  @old = Strings
  FROM dbo.split(@Input,’,’)
  select @string as result
  Reply
ADITHYA RAPARTHI
September 18, 2013 11:30 am
HI PINAL,
IN the SECOND query,”SORT” operation takes place and we get estimated I/O cost is “0.0112613”
IN the THIRD query,”MERGE JOIN” operation takes place i.e.., it takes two already sorted input tables exploitting the sort order.and we get estimated I/O cost is “0”
in other word ,in the second query both physical and logical operations are “SORTING”
whereas,IN the third query physical operation=MERGE JOIN(takes already sorted input table) and logical operation=CONCATINATION
Reply
Chandrika
September 18, 2013 2:57 pm
Hi Pinal,
Because of explicit SORT operation on final result set causes second query more expensive … Whereas 3rd query doesn’t require any SORT operation externally…
Note that “Merge join itself is very fast, but it can be an expensive choice if sort operations are required.” means if there is no sort operation required then MERGE JOIN is very fast…
The above two are the reasons to have less cost for 2nd query
Reply
Manoj Bhole
September 19, 2013 11:54 am
Hi Pinal
Query 2 : it forms 2 different tables and scans these 2 tables to combine result set.
Query 3 : it forms 1 table and scans 1 table to form result set.
Reply
Hitesh Shah
September 21, 2013 5:28 pm
If I look at the plan carefully, the Query 2 uses the Sort operation and the Query 3 uses the Merge Join.
There is an extra cost on performing Sort operation as it does sorting at both Physical and Logical operations.
The Merger Join operator matches the rows from two sorted input tables thus at physical level its merging and logical level its concatenating the operation thus exploiting the sort order.
Reply
Pooja
September 24, 2013 1:57 pm
Hi Pinal,
The Query Cost is 100% at my end for both the queries.
Reply