SQL SERVER – Remove Duplicate Rows Using UNION Operator

In my earlier post on SQL SERVER – Delete Duplicate Rows, I showed you a method of removing duplicate rows with the help of ROW_NUMBER() function and COMMON TABLE EXPRESSION. In this post, I am going to show you a tricky method of removing duplicate rows using traditional UNION operator.

SQL SERVER - Remove Duplicate Rows Using UNION Operator removedup

As you know UNION is used to combine the multiple result sets into a single result set by removing duplicates.

Let us create the same dataset created in the earlier post.

Solarwinds
USE TEMPDB
GO
CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 --duplicate
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO

The following query will return all seven rows from the table

SELECT col1,col2
FROM DuplicateRcordTable

SQL SERVER - Remove Duplicate Rows Using UNION Operator dup_result1

Now to remove the duplicate rows, use the same table and UNION it with an empty result set returned by the same table as shown below

SELECT col1,col2
FROM DuplicateRcordTable
UNION 
SELECT col1,col2
FROM DuplicateRcordTable WHERE 1=0

The result is shown below

SQL SERVER - Remove Duplicate Rows Using UNION Operator dup_result2

The second part of the UNION query uses the dummy WHERE condition 1=0 to return the empty result set. This empty result set is combined with the first part of the UNION part. The UNION operator simply returns the resultset by removing the duplicates.

Let me know if you have any other trick to remove duplicate rows. I will be happy to publish it on the blog with due credit to you. Please leave a comment.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Solarwinds
, ,
Previous Post
SQL SERVER – How to Fix log_reuse_wait_desc – AVAILABILITY_REPLICA?
Next Post
SQL SERVER – Simple Method to Find FIRST and LAST Day of Current Date

Related Posts

16 Comments. Leave new

  • Hi Pinal
    Please change Union all to Union.Becoz Union all will not eliminate duplicate rows. UNION statement effectively does a SELECT DISTINCT on the results set.

    Reply
  • Isn’t a distinct or a group by more obvious? E.g.
    SELECT col1, col2
    FROM DuplicateRcordTable
    GROUP BY col1, col2

    or
    SELECT DISTINCT col1, col2
    FROM DuplicateRcordTable

    Reply
  • Robin Li 李瑞彬 (@robinliks)
    September 19, 2018 9:53 am

    I think the last script is:
    SELECT col1,col2
    FROM DuplicateRcordTable
    UNION /* ALL */
    SELECT col1,col2
    FROM DuplicateRcordTable WHERE 1=0

    Reply
  • hi,
    IMHO, you should omit ALL from UNION in the 2nd sql

    SELECT col1,col2
    FROM DuplicateRcordTable
    UNION ALL <-
    SELECT col1,col2
    FROM DuplicateRcordTable WHERE 1=0

    thx
    jim

    Reply
  • Hi All, I fixed the typo where I had incorrectly had UNION ALL instead of UNION… thanks for bringing it to my attention. Thanks so much!

    Reply
  • Do the different ways of deduplicating (Distinct, Group By, Union) result in different execution plans (performance)? If not, DISTINCT is the “shortest” script to write -> and to maintain

    Reply
  • Hi Pinal,

    The query used in the example only retrieves the unique records, but not delete the duplicate rows from the table. Your previous blog on removing duplicate rows using ROW_NUMBER() with CTE actually removes the duplicates from the table. I am expecting the same type of behavior when I saw the blog but it is not the case.

    Thanks,
    Srinivas

    Reply
  • Hi Pinal,

    We can use UNION and top 0 to achieve the same result.

    SELECT col1,col2
    FROM DuplicateRcordTable
    UNION
    SELECT top 0 col1,col2
    FROM DuplicateRcordTable

    I have applied this from one of your recent blogs on “Which is better in terms of performance WHERE 1 = 2 or top 0 ” :)

    Although the same can be achieved multiple ways as suggested by other blog readers.

    Thanks,
    Srini

    Reply
  • I use ctid along with where condition..

    Reply
  • Hi Pinal,

    Thanks for this post.

    Intersect has also same property to remove duplicate records.
    As it is retrieving common records between Table A and Table B, but it will be unique.

    Thanks
    Vaibhav

    Reply
  • Select * from DuplicateRcordTable
    Union
    Select * from DuplicateRcordTable

    Why the condition 1=0??

    Reply
    • The condition is to prevent any rows from being returned from the second query in the union. WHERE 1=0 always means false, so no rows are returned.

      Reply
  • will the result change if i remove where 1=0?
    I do not understand why do we need to union an empty set? How does a UNION operator work.
    Does it remove the duplicates from the resulting combined table or does it pick a row from table A compares it with all rows in Table B and if it is not the duplicate (same as in table b) populates it in the result set.

    Reply
    • UNION will combine the result sets and find out DISTINCT data. So you do not need anything from the same table from the second select statement and that’s why WHERE 1=0 is used. It is faster than bringing out all rows from the same table

      Reply

Leave a Reply

Menu