SQL SERVER – Delete Duplicate Rows

I had previously penned down two popular snippets regarding deleting duplicate rows and counting duplicate rows. Today, we will examine another very quick code snippet where we will delete duplicate rows using CTE and ROW_NUMBER() feature of SQL Server 2005 and SQL Server 2008.

This method is improved over the earlier method as it not only uses CTE and ROW_NUMBER, but also demonstrates the power of CTE with DELETE statement. We will have a comprehensive discussion about it later in this article. For now, let us first create a sample table from which we will delete records.

/*Create Table with 7 entries - 3 are duplicate entries */
CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 --duplicate
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO

The above table has total 7 records, out of which 3 are duplicate records. Once the duplicates are removed we will have only 4 records left.

/* It should give you 7 rows */
SELECT *
FROM DuplicateRcordTable
GO

SQL SERVER - Delete Duplicate Rows dup1

The most interesting part of this is yet to come. We will use CTE that will re-generate the same table with additional column, which is row number. In our case, we have Col1 and Col2 and both the columns qualify as duplicate rows. It may be a different set of rows for each different query like this. Another point to note here is that once CTE is created DELETE statement can be run on it. We will put a condition here – when we receive more than one rows of record, we will remove the row which is not the first one. When DELETE command is executed over CTE it in fact deletes from the base table used in CTE.

/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO

It is apparent that after delete command has been run, we will have only 4 records, which is almost the same result which we would have got with DISTINCT, with this resultset. If we had more than 2 columns and we had to run unique on only two columns, our distinct might have not worked here . In this case, we would have to use above the mentioned method.

/* It should give you Distinct 4 records */
SELECT *
FROM DuplicateRcordTable
GO

SQL SERVER - Delete Duplicate Rows dup2

This method is a breeze and we can use this for SQL Server version 2005 and the later versions.

Watch the view to see the above concept in action:

Reference : Pinal Dave (https://blog.sqlauthority.com)

, , ,
Previous Post
SQL SERVER – Interesting Observation – Using sqlcmd From SSMS Query Editor
Next Post
SQL SERVER – Connecting to Server Using Windows Authentication by SQLCMD

Related Posts

145 Comments. Leave new

  • NIce post.. helped me a lot…

    Reply
  • Hello Pinal Sir ,
    I am little bit confuse in one thing that if we have thousands of records in a table then how can we find the duplicate records and delete it by CTE. In your example , u mention only 7 rows data…Please guide

    Reply
  • Pinal,

    This works great!!! In order to keep the records where that were changed most recently, I changed the order by so that the most recently changed row was the first row – order by lastname, firstname, changedate desc.

    As always I appreciate your blogs!

    Reply
  • hi pinal,

    I usually select duplicate rows by the other way:

    select x,x,sum(1)
    from TBL
    order by x,x

    Reply
  • GRACIAS ME AHORRASTE UN TEDIOSO TRABAJO!!!.EXCELENTE APORTE.

    Reply
  • i used this valuable query but its throw an error like “semi colon missing in previous line of query”. if i put semi colon, is there any problem that terminate the stored procedure. after this i using some queries whether it will execute or not ?

    Reply
  • Awesome and neat way to do it than putting in a staging table.

    Reply
  • very clear demonstration, easy to understand.

    Reply
  • thanks Pinal!!!!

    Reply
  • Thanks Pinal Dave.

    Reply
  • Worked Beautifully!!! Thanks Pinal Dave!

    Reply
  • It worked !! thanks a lot

    Reply
  • Great!!! Thanks a lot!!!

    Reply
  • Great technique. Thanks for the informative post.

    Reply
  • Hi Pinal,

    I have below script and want to delete duplicated but when i run the query then i got an error msg like
    ”Msg 4405, Level 16, State 1, Line 2
    View or function ’employeeDetails’ is not updatable because the modification affects multiple base tables.”

    with employeeDetails
    (SNO,ID,Title,FName,Lname,Contact,Payrate)
    as
    (select ROW_NUMBER()over(PARTITIOn by a.employeeID order by c.PayFrequency)SNO, a.EmployeeID,a.Title,b.FirstName,b.LastName,b.Phone,c.PayFrequency from HumanResources.Employee a inner join Person.Contact b on a.ContactID=b.ContactID
    inner join HumanResources.EmployeePayHistory c on a.EmployeeID=c.EmployeeID)
    delete from employeeDetails where sno>1
    Please help

    Thanks
    Jaya sharma

    Reply
  • perfect trick, thank you so much… u’ve help me….

    Reply
  • Dinesh Panchal
    April 6, 2017 7:09 pm

    Thank Very much this query helped me lot

    Reply
  • Hi, Pinal, is this applicable/advisable to use in a million records/rows?
    Will it affect the performance?

    thanks,

    Reply
  • deleting rows in cte, but how does it deletes rows from original table – what is the inner algorithm behind it. I checked many blogs but no answer is satisfactory yet.

    Reply

Leave a Reply

Menu