I had previously penned down two popular snippets regarding deleting duplicate rows and counting duplicate rows. Today, we will examine another very quick code snippet where we will delete duplicate rows using CTE and ROW_NUMBER() feature of SQL Server 2005 and SQL Server 2008.
This method is improved over the earlier method as it not only uses CTE and ROW_NUMBER, but also demonstrates the power of CTE with DELETE statement. We will have a comprehensive discussion about it later in this article. For now, let us first create a sample table from which we will delete records.
/*Create Table with 7 entries - 3 are duplicate entries */ CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT) INSERT INTO DuplicateRcordTable SELECT 1, 1 UNION ALL SELECT 1, 1 --duplicate UNION ALL SELECT 1, 1 --duplicate UNION ALL SELECT 1, 2 UNION ALL SELECT 1, 2 --duplicate UNION ALL SELECT 1, 3 UNION ALL SELECT 1, 4 GO
The above table has total 7 records, out of which 3 are duplicate records. Once the duplicates are removed we will have only 4 records left.
/* It should give you 7 rows */ SELECT * FROM DuplicateRcordTable GO
The most interesting part of this is yet to come. We will use CTE that will re-generate the same table with additional column, which is row number. In our case, we have Col1 and Col2 and both the columns qualify as duplicate rows. It may be a different set of rows for each different query like this. Another point to note here is that once CTE is created DELETE statement can be run on it. We will put a condition here – when we receive more than one rows of record, we will remove the row which is not the first one. When DELETE command is executed over CTE it in fact deletes from the base table used in CTE.
/* Delete Duplicate records */ WITH CTE (COl1,Col2, DuplicateCount) AS ( SELECT COl1,Col2, ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount FROM DuplicateRcordTable ) DELETE FROM CTE WHERE DuplicateCount > 1 GO
It is apparent that after delete command has been run, we will have only 4 records, which is almost the same result which we would have got with DISTINCT, with this resultset. If we had more than 2 columns and we had to run unique on only two columns, our distinct might have not worked here . In this case, we would have to use above the mentioned method.
/* It should give you Distinct 4 records */ SELECT * FROM DuplicateRcordTable GO
This method is a breeze and we can use this for SQL Server version 2005 and the later versions.
Watch the view to see the above concept in action:
Reference : Pinal Dave (https://blog.sqlauthority.com)
NIce post.. helped me a lot…
Hello Pinal Sir ,
I am little bit confuse in one thing that if we have thousands of records in a table then how can we find the duplicate records and delete it by CTE. In your example , u mention only 7 rows data…Please guide
This works great!!! In order to keep the records where that were changed most recently, I changed the order by so that the most recently changed row was the first row – order by lastname, firstname, changedate desc.
As always I appreciate your blogs!
I usually select duplicate rows by the other way:
order by x,x
GRACIAS ME AHORRASTE UN TEDIOSO TRABAJO!!!.EXCELENTE APORTE.
i used this valuable query but its throw an error like “semi colon missing in previous line of query”. if i put semi colon, is there any problem that terminate the stored procedure. after this i using some queries whether it will execute or not ?
Awesome and neat way to do it than putting in a staging table.
very clear demonstration, easy to understand.
Thanks Pinal Dave.
Worked Beautifully!!! Thanks Pinal Dave!
It worked !! thanks a lot
Great!!! Thanks a lot!!!
Great technique. Thanks for the informative post.
thanks Aashika. I am glad you liked it.
I have below script and want to delete duplicated but when i run the query then i got an error msg like
”Msg 4405, Level 16, State 1, Line 2
View or function ’employeeDetails’ is not updatable because the modification affects multiple base tables.”
(select ROW_NUMBER()over(PARTITIOn by a.employeeID order by c.PayFrequency)SNO, a.EmployeeID,a.Title,b.FirstName,b.LastName,b.Phone,c.PayFrequency from HumanResources.Employee a inner join Person.Contact b on a.ContactID=b.ContactID
inner join HumanResources.EmployeePayHistory c on a.EmployeeID=c.EmployeeID)
delete from employeeDetails where sno>1
perfect trick, thank you so much… u’ve help me….
Thank Very much this query helped me lot
Thanks Dinesh. I am glad that this helped you.
Hi, Pinal, is this applicable/advisable to use in a million records/rows?
Will it affect the performance?
deleting rows in cte, but how does it deletes rows from original table – what is the inner algorithm behind it. I checked many blogs but no answer is satisfactory yet.