Yesterday we had very easy kind of Back to Basics Puzzle with UNION and I have received tremendous response to the simple puzzle. Even though there is no giveaway due to sheer interest in the subject, I have received many replies. Due to all the request, here is another back to the basic question with UNION again.
Let us execute following three query one by one. Please make sure to enable Execution Plan in SQL Server Management Studio (SSMS).
Query 1
SELECT 1
UNION ALL
SELECT 2
The query above will return following result
The query above will return following execution plan
Query 2
SELECT 1
UNION ALL
SELECT 2
ORDER BY 1
The query above will return following result
The query above will return following execution plan
Query 3
SELECT DISTINCT 1
UNION ALL
SELECT DISTINCT 2
ORDER BY 1
The query above will return following result
The query above will return following execution plan
Now let us look at all the execution plans together.
When you look at closely at all the resultset – they all returns the same result. When we see their execution plan they are very different from each other. Now here is the question back to you.
Question: When we add DISTINCT in Query 3 it is technically more work for SQL Server to do than Query 2. However, the execution plan demonstrates that Query 3 is using much lesser resources than Query 2. WHY?
Please leave your answer in the comment section. I will publish all the valid answer in the blog next week with due credit.
Do not miss to checkout the part 1 of this puzzle.
Reference:Â Pinal Dave (https://blog.sqlauthority.com)
28 Comments. Leave new
Filter Condition is increased query 3 rather than 2. So relatively cost will decrease.
Filter Condition is increased query 3 rather than 2. So cost is low.
Hi Pinal,
Query 2 : In this case all the select statements are combined to form a table and a constant scan goes for entire table,followed by sorting.
Query 3: while in this case each select statement itself goes for constant scan and then combined to form a table .So in this case entire constant scan is avoided.
So the query 3 is using lesser resource than Query 2.
Pinal, I am getting execution plan for query 3 same as query 2 and 50% query cost in both, why?
Sanjay,
When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.
I got this info. from James Curran.
Yes Lakshman
But I am getting 0/50/50 for all 3 queries together
I am also getting 0/50/50. The displayed execution plan in the article doesn’t make sense. How can there not be a Sort for query 3? How would it know the order in which to display the results given the request of “ORDER BY 1”?
Using your execution plan figure we can say that no need of sorting in query 3 thats why its using less resources than Query 2.
Query 2 lists all available records including duplicates which could increase the output list size and the sorting time while the presence of distinct in query 3 decreases the length of output list and helps to minimize sorting time. I think distinct is taking less time than sort because it has to work on less data while sort has to arrange the output of both part of the query.
The distinct causes the two sets to be sorted as a side-effect. Since the sets are one row each in this case, that sorting is really fast. Then taking two sorted lists (even sets with one than one row), and making one big sorted list out of them requires just merging them — going through them linearly, and taking whichever is the next lowest.
Hi, If you execute all the three queries together in a single session, then 100%,67%,33% cost comes. If you execute the 3rd query separately, 100% cost comes. That means when executing in the same session, the query uses the previously executed plan for the current execution. Correct if my answer is wrong.
No. When executing all three together, the breakdown is 0/67/33 — the numbers will always total 100%, regardless of the number of queries. The query cost percentage can only show the time in relationship to the other queries in the batch. Hence, when a query is run by itself, it’s query cost will always be 100%.
Thank you James for correcting my answer.
Hey Pinal,
Query 3 uses the Merge Join operator while Query 2 uses the sort operator.
A merge join operator is used on the two tables whose output columns have been presorted.
The sort operator uses the concept of looping to get its output which requires an extra thread to perform what we call ‘Rebinding’. This increases the I/O Cost drastically. Hence the difference.
Regards,
Kunal Rawal
Hey Pinal,
Here, query 3 taking lower cost than 2 as there is only 1 row for selection. Sorting individual rows in q3 is not taking any cost. Now, q3 uses merge and q2 uses sorting for ordering the result and merging is faster than sorting as merge is being applied on presorted lists.
If we select from a un-indexed table with 10 rows. then results are 13/30/57 (my sample), which are consistent. Since, q3 includes sorting individual results first so q3 cost is high.
Due to Filter condition cost increase in query2…………
Query (3) , using Distinct so its already sorting the records, thats why its showing the lowest query cost
I have checked one more scenario, in this both query is showing the same cost, ie 50% each
—inserting records into a temporary table
SELECT 1 ID INTO #T1
UNION ALL
SELECT 2
SELECT ID FROM #T1 ORDER BY ID
SELECT DISTINCT ID FROM #T1 ORDER BY ID
I believe it is because query 2 uses a sort while query 3 uses the merge join operator “exploiting their sort order”.
I found it a little surprising that this uses two merge joins and not a sort ([1], [2]) and a merge join ([1,2], [3]).
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT distinct 3
ORDER BY 1
I have to guess that there is a big difference between the Sort operator and the Merge Join operator. Looking at the properties of both I see an IO cost and a rebind associated with the sort operator that does not exist for the Merge Join. I have to guess that the Sort is internally doing some kind of loop that requires IO.
I want consolidate of numbers in SQL …For an example I having number like
1,2,3,4,.6.7.8.9,12,13,14,16,18,20
then I want the result as
1-4,6-9,12-14,16,18,20
How to achieve it please help me
–>–First of all create one table valued function & use it in code block…
create FUNCTION [dbo].[Split] (
@string AS VARCHAR(8000),
@splitAt AS VARCHAR(8000)
)
RETURNS @strings TABLE (Strings VARCHAR(8000))
AS
BEGIN
DECLARE @splitLen AS INT
DECLARE @index AS INT
SET @splitLen = LEN(@splitAt)
— SET @string = LTRIM(RTRIM(@string))
SET @splitAt = ‘%’ + @splitAt + ‘%’
SET @index = PATINDEX(@splitAt, @string)
WHILE(@index > 0)
BEGIN
INSERT INTO @strings
VALUES(SUBSTRING(@string, 1, @index-1))
SET @string = SUBSTRING(@string, @index + 1, LEN(@string))
SET @index = PATINDEX(@splitAt, @string)
END
IF LEN(@string) > 0
INSERT INTO @strings VALUES(@string)
RETURN
END
–=======================================================================
DECLARE @string nvarchar(max)
DECLARE @Input nvarchar(max)
DECLARE @old int
SET @Input = ‘1,2,3,4,6,7,8,9,12,13,14,16,18,20’
SET @string = ”
SELECT
@string = @string + case when Strings = @old + 1 THEN ”
when @string = ” then Strings
ELSE ‘-‘ + cast(@old as nvarchar) + ‘,’ + Strings
end,
@old = Strings
FROM dbo.split(@Input,’,’)
select @string as result
HI PINAL,
IN the SECOND query,”SORT” operation takes place and we get estimated I/O cost is “0.0112613”
IN the THIRD query,”MERGE JOIN” operation takes place i.e.., it takes two already sorted input tables exploitting the sort order.and we get estimated I/O cost is “0”
in other word ,in the second query both physical and logical operations are “SORTING”
whereas,IN the third query physical operation=MERGE JOIN(takes already sorted input table) and logical operation=CONCATINATION
Hi Pinal,
Because of explicit SORT operation on final result set causes second query more expensive … Whereas 3rd query doesn’t require any SORT operation externally…
Note that “Merge join itself is very fast, but it can be an expensive choice if sort operations are required.” means if there is no sort operation required then MERGE JOIN is very fast…
The above two are the reasons to have less cost for 2nd query
Hi Pinal
Query 2 : it forms 2 different tables and scans these 2 tables to combine result set.
Query 3 : it forms 1 table and scans 1 table to form result set.
If I look at the plan carefully, the Query 2 uses the Sort operation and the Query 3 uses the Merge Join.
There is an extra cost on performing Sort operation as it does sorting at both Physical and Logical operations.
The Merger Join operator matches the rows from two sorted input tables thus at physical level its merging and logical level its concatenating the operation thus exploiting the sort order.
Hi Pinal,
The Query Cost is 100% at my end for both the queries.