SQL SERVER – Remove Duplicate Rows Using UNION Operator

September 19, 2018

In my earlier post on SQL SERVER – Delete Duplicate Rows, I showed you a method of removing duplicate rows with the help of ROW_NUMBER() function and COMMON TABLE EXPRESSION. In this post, I am going to show you a tricky method of removing duplicate rows using traditional UNION operator.

As you know UNION is used to combine the multiple result sets into a single result set by removing duplicates.

Let us create the same dataset created in the earlier post.

USE TEMPDB
GO
CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 --duplicate
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO

The following query will return all seven rows from the table

SELECT col1,col2
FROM DuplicateRcordTable

SQL SERVER - Remove Duplicate Rows Using UNION Operator dup_result1

Now to remove the duplicate rows, use the same table and UNION it with an empty result set returned by the same table as shown below

SELECT col1,col2
FROM DuplicateRcordTable
UNION 
SELECT col1,col2
FROM DuplicateRcordTable WHERE 1=0

The result is shown below

SQL SERVER - Remove Duplicate Rows Using UNION Operator dup_result2

The second part of the UNION query uses the dummy WHERE condition 1=0 to return the empty result set. This empty result set is combined with the first part of the UNION part. The UNION operator simply returns the resultset by removing the duplicates.

Let me know if you have any other trick to remove duplicate rows. I will be happy to publish it on the blog with due credit to you. Please leave a comment.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Scripts, SQL Server, SQL Union clause

SQL SERVER – How to Fix log_reuse_wait_desc – AVAILABILITY_REPLICA?

SQL SERVER – Simple Method to Find FIRST and LAST Day of Current Date

16 Comments. Leave new

Nitika Jakhal
September 19, 2018 7:53 am
Hi Pinal
Please change Union all to Union.Becoz Union all will not eliminate duplicate rows. UNION statement effectively does a SELECT DISTINCT on the results set.
Reply
Greg
September 19, 2018 9:30 am
Isn’t a distinct or a group by more obvious? E.g.
SELECT col1, col2
FROM DuplicateRcordTable
GROUP BY col1, col2
or
SELECT DISTINCT col1, col2
FROM DuplicateRcordTable
Reply
Robin Li 李瑞彬 (@robinliks)
September 19, 2018 9:53 am
I think the last script is:
SELECT col1,col2
FROM DuplicateRcordTable
UNION /* ALL */
SELECT col1,col2
FROM DuplicateRcordTable WHERE 1=0
Reply
jim
September 19, 2018 1:04 pm
hi,
IMHO, you should omit ALL from UNION in the 2nd sql
SELECT col1,col2
FROM DuplicateRcordTable
UNION ALL <-
SELECT col1,col2
FROM DuplicateRcordTable WHERE 1=0
thx
jim
Reply
Pinal Dave
September 19, 2018 1:19 pm
Hi All, I fixed the typo where I had incorrectly had UNION ALL instead of UNION… thanks for bringing it to my attention. Thanks so much!
Reply
RobertoMirelman
September 19, 2018 3:37 pm
Do the different ways of deduplicating (Distinct, Group By, Union) result in different execution plans (performance)? If not, DISTINCT is the “shortest” script to write -> and to maintain
Reply
Srini
September 19, 2018 6:06 pm
Hi Pinal,
The query used in the example only retrieves the unique records, but not delete the duplicate rows from the table. Your previous blog on removing duplicate rows using ROW_NUMBER() with CTE actually removes the duplicates from the table. I am expecting the same type of behavior when I saw the blog but it is not the case.
Thanks,
Srinivas
Reply
- Webwilliam
  October 16, 2018 7:48 am
  Exactly. I too thought you would be removing duplicates, as that is the title of the blog.
  Reply
Srini
September 19, 2018 8:42 pm
Hi Pinal,
We can use UNION and top 0 to achieve the same result.
SELECT col1,col2
FROM DuplicateRcordTable
UNION
SELECT top 0 col1,col2
FROM DuplicateRcordTable
I have applied this from one of your recent blogs on “Which is better in terms of performance WHERE 1 = 2 or top 0 ” :)
Although the same can be achieved multiple ways as suggested by other blog readers.
Thanks,
Srini
Reply
Babu KM
September 19, 2018 10:26 pm
I use ctid along with where condition..
Reply
- Srini
  September 19, 2018 11:15 pm
  Hi Babu KM,
  Can you please share your query? I am not clear what you are using in your WHERE clause.
  Thanks,
  Srini
  Reply
Vaibhav Shukla
September 20, 2018 11:56 am
Hi Pinal,
Thanks for this post.
Intersect has also same property to remove duplicate records.
As it is retrieving common records between Table A and Table B, but it will be unique.
Thanks
Vaibhav
Reply
Harsha
September 22, 2018 1:34 pm
Select * from DuplicateRcordTable
Union
Select * from DuplicateRcordTable
Why the condition 1=0??
Reply
- Piotr
  September 24, 2018 2:12 am
  The condition is to prevent any rows from being returned from the second query in the union. WHERE 1=0 always means false, so no rows are returned.
  Reply
Parul
March 12, 2019 2:59 am
will the result change if i remove where 1=0?
I do not understand why do we need to union an empty set? How does a UNION operator work.
Does it remove the duplicates from the resulting combined table or does it pick a row from table A compares it with all rows in Table B and if it is not the duplicate (same as in table b) populates it in the result set.
Reply
- madhivanan
  June 25, 2019 2:38 pm
  UNION will combine the result sets and find out DISTINCT data. So you do not need anything from the same table from the second select statement and that’s why WHERE 1=0 is used. It is faster than bringing out all rows from the same table
  Reply