SQL SERVER – Delete Duplicate Rows

June 23, 2009

I had previously penned down two popular snippets regarding deleting duplicate rows and counting duplicate rows. Today, we will examine another very quick code snippet where we will delete duplicate rows using CTE and ROW_NUMBER() feature of SQL Server 2005 and SQL Server 2008.

This method is improved over the earlier method as it not only uses CTE and ROW_NUMBER, but also demonstrates the power of CTE with DELETE statement. We will have a comprehensive discussion about it later in this article. For now, let us first create a sample table from which we will delete records.

/*Create Table with 7 entries - 3 are duplicate entries */
CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 --duplicate
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO

The above table has total 7 records, out of which 3 are duplicate records. Once the duplicates are removed we will have only 4 records left.

/* It should give you 7 rows */
SELECT *
FROM DuplicateRcordTable
GO

SQL SERVER - Delete Duplicate Rows dup1

The most interesting part of this is yet to come. We will use CTE that will re-generate the same table with additional column, which is row number. In our case, we have Col1 and Col2 and both the columns qualify as duplicate rows. It may be a different set of rows for each different query like this. Another point to note here is that once CTE is created DELETE statement can be run on it. We will put a condition here – when we receive more than one rows of record, we will remove the row which is not the first one. When DELETE command is executed over CTE it in fact deletes from the base table used in CTE.

/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO

It is apparent that after delete command has been run, we will have only 4 records, which is almost the same result which we would have got with DISTINCT, with this resultset. If we had more than 2 columns and we had to run unique on only two columns, our distinct might have not worked here . In this case, we would have to use above the mentioned method.

/* It should give you Distinct 4 records */
SELECT *
FROM DuplicateRcordTable
GO

SQL SERVER - Delete Duplicate Rows dup2

This method is a breeze and we can use this for SQL Server version 2005 and the later versions.

Watch the view to see the above concept in action:

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Function, SQL Scripts, SQL Server

SQL SERVER – Interesting Observation – Using sqlcmd From SSMS Query Editor

SQL SERVER – Connecting to Server Using Windows Authentication by SQLCMD

SQL SERVER – SSMS: Top Object and Batch Execution Statistics Reports

July 30, 2014

SQL SERVER – Finding Last Backup Time for All Database

November 4, 2010

SQL SERVER – 2005 – Create Script to Copy Database Schema and All The Objects – Stored Procedure, Functions, Triggers, Tables, Views, Constraints and All Other Database Objects

August 21, 2007

146 Comments. Leave new

Chathura
July 5, 2013 3:41 pm
NIce post.. helped me a lot…
Reply
Prateek Jain
August 16, 2013 8:21 am
Hello Pinal Sir ,
I am little bit confuse in one thing that if we have thousands of records in a table then how can we find the duplicate records and delete it by CTE. In your example , u mention only 7 rows data…Please guide
Reply
Janette Andrews
September 24, 2013 12:00 am
Pinal,
This works great!!! In order to keep the records where that were changed most recently, I changed the order by so that the most recently changed row was the first row – order by lastname, firstname, changedate desc.
As always I appreciate your blogs!
Reply
karthikvanama
September 29, 2013 10:52 pm
hi pinal,
I usually select duplicate rows by the other way:
select x,x,sum(1)
from TBL
order by x,x
Reply
Sebas77ian
October 6, 2013 7:05 pm
GRACIAS ME AHORRASTE UN TEDIOSO TRABAJO!!!.EXCELENTE APORTE.
Reply
Alex S
October 11, 2013 11:07 am
i used this valuable query but its throw an error like “semi colon missing in previous line of query”. if i put semi colon, is there any problem that terminate the stored procedure. after this i using some queries whether it will execute or not ?
Reply
sam
March 19, 2014 7:10 pm
Awesome and neat way to do it than putting in a staging table.
Reply
rekha
April 16, 2014 11:27 am
very clear demonstration, easy to understand.
Reply
Daryl
June 4, 2014 9:16 am
thanks Pinal!!!!
Reply
Cu Lao Pho
July 10, 2014 10:36 am
Thanks Pinal Dave.
Reply
Marty
September 3, 2014 10:56 pm
Worked Beautifully!!! Thanks Pinal Dave!
Reply
KC
October 30, 2014 10:48 pm
It worked !! thanks a lot
Reply
Andres Garza
December 18, 2014 5:29 am
Great!!! Thanks a lot!!!
Reply
Aashika
July 15, 2015 12:53 pm
Great technique. Thanks for the informative post.
Reply
- Pinal Dave
  July 16, 2015 5:27 am
  thanks Aashika. I am glad you liked it.
  Reply
Jaya Sharma
August 19, 2015 1:15 am
Hi Pinal,
I have below script and want to delete duplicated but when i run the query then i got an error msg like
”Msg 4405, Level 16, State 1, Line 2
View or function ’employeeDetails’ is not updatable because the modification affects multiple base tables.”
with employeeDetails
(SNO,ID,Title,FName,Lname,Contact,Payrate)
as
(select ROW_NUMBER()over(PARTITIOn by a.employeeID order by c.PayFrequency)SNO, a.EmployeeID,a.Title,b.FirstName,b.LastName,b.Phone,c.PayFrequency from HumanResources.Employee a inner join Person.Contact b on a.ContactID=b.ContactID
inner join HumanResources.EmployeePayHistory c on a.EmployeeID=c.EmployeeID)
delete from employeeDetails where sno>1
Please help
Thanks
Jaya sharma
Reply
firman
September 29, 2016 1:04 pm
perfect trick, thank you so much… u’ve help me….
Reply
Dinesh Panchal
April 6, 2017 7:09 pm
Thank Very much this query helped me lot
Reply
- Pinal Dave
  April 7, 2017 6:49 am
  Thanks Dinesh. I am glad that this helped you.
  Reply
Josh
March 10, 2019 11:04 am
Hi, Pinal, is this applicable/advisable to use in a million records/rows?
Will it affect the performance?
thanks,
Reply
Soumyajit
October 9, 2019 6:18 pm
deleting rows in cte, but how does it deletes rows from original table – what is the inner algorithm behind it. I checked many blogs but no answer is satisfactory yet.
Reply
Prashant Agrawal
July 2, 2023 5:42 am
Your youtube video demonstrates an ENTIRELY DIFFERENT method than that which this page discusses. Very confusing!
Reply