In my earlier post on SQL SERVER – Delete Duplicate Rows, I showed you a method of removing duplicate rows with the help of ROW_NUMBER() function and COMMON TABLE EXPRESSION. In this post, I am going to show you a tricky method of removing duplicate rows using traditional UNION operator.
As you know UNION is used to combine the multiple result sets into a single result set by removing duplicates.
Let us create the same dataset created in the earlier post.
USE TEMPDB GO CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT) INSERT INTO DuplicateRcordTable SELECT 1, 1 UNION ALL SELECT 1, 1 --duplicate UNION ALL SELECT 1, 1 --duplicate UNION ALL SELECT 1, 2 UNION ALL SELECT 1, 2 --duplicate UNION ALL SELECT 1, 3 UNION ALL SELECT 1, 4 GO
The following query will return all seven rows from the table
SELECT col1,col2 FROM DuplicateRcordTable
Now to remove the duplicate rows, use the same table and UNION it with an empty result set returned by the same table as shown below
SELECT col1,col2 FROM DuplicateRcordTable UNION SELECT col1,col2 FROM DuplicateRcordTable WHERE 1=0
The result is shown below
The second part of the UNION query uses the dummy WHERE condition 1=0 to return the empty result set. This empty result set is combined with the first part of the UNION part. The UNION operator simply returns the resultset by removing the duplicates.
Let me know if you have any other trick to remove duplicate rows. I will be happy to publish it on the blog with due credit to you. Please leave a comment.
Reference: Pinal Dave (https://blog.sqlauthority.com)
16 Comments. Leave new
Hi Pinal
Please change Union all to Union.Becoz Union all will not eliminate duplicate rows. UNION statement effectively does a SELECT DISTINCT on the results set.
Isn’t a distinct or a group by more obvious? E.g.
SELECT col1, col2
FROM DuplicateRcordTable
GROUP BY col1, col2
or
SELECT DISTINCT col1, col2
FROM DuplicateRcordTable
I think the last script is:
SELECT col1,col2
FROM DuplicateRcordTable
UNION /* ALL */
SELECT col1,col2
FROM DuplicateRcordTable WHERE 1=0
hi,
IMHO, you should omit ALL from UNION in the 2nd sql
SELECT col1,col2
FROM DuplicateRcordTable
UNION ALL <-
SELECT col1,col2
FROM DuplicateRcordTable WHERE 1=0
thx
jim
Hi All, I fixed the typo where I had incorrectly had UNION ALL instead of UNION… thanks for bringing it to my attention. Thanks so much!
Do the different ways of deduplicating (Distinct, Group By, Union) result in different execution plans (performance)? If not, DISTINCT is the “shortest” script to write -> and to maintain
Hi Pinal,
The query used in the example only retrieves the unique records, but not delete the duplicate rows from the table. Your previous blog on removing duplicate rows using ROW_NUMBER() with CTE actually removes the duplicates from the table. I am expecting the same type of behavior when I saw the blog but it is not the case.
Thanks,
Srinivas
Exactly. I too thought you would be removing duplicates, as that is the title of the blog.
Hi Pinal,
We can use UNION and top 0 to achieve the same result.
SELECT col1,col2
FROM DuplicateRcordTable
UNION
SELECT top 0 col1,col2
FROM DuplicateRcordTable
I have applied this from one of your recent blogs on “Which is better in terms of performance WHERE 1 = 2 or top 0 ” :)
Although the same can be achieved multiple ways as suggested by other blog readers.
Thanks,
Srini
I use ctid along with where condition..
Hi Babu KM,
Can you please share your query? I am not clear what you are using in your WHERE clause.
Thanks,
Srini
Hi Pinal,
Thanks for this post.
Intersect has also same property to remove duplicate records.
As it is retrieving common records between Table A and Table B, but it will be unique.
Thanks
Vaibhav
Select * from DuplicateRcordTable
Union
Select * from DuplicateRcordTable
Why the condition 1=0??
The condition is to prevent any rows from being returned from the second query in the union. WHERE 1=0 always means false, so no rows are returned.
will the result change if i remove where 1=0?
I do not understand why do we need to union an empty set? How does a UNION operator work.
Does it remove the duplicates from the resulting combined table or does it pick a row from table A compares it with all rows in Table B and if it is not the duplicate (same as in table b) populates it in the result set.
UNION will combine the result sets and find out DISTINCT data. So you do not need anything from the same table from the second select statement and that’s why WHERE 1=0 is used. It is faster than bringing out all rows from the same table