I had previously penned down two popular snippets regarding deleting duplicate rows and counting duplicate rows. Today, we will examine another very quick code snippet where we will delete duplicate rows using CTE and ROW_NUMBER() feature of SQL Server 2005 and SQL Server 2008.
This method is improved over the earlier method as it not only uses CTE and ROW_NUMBER, but also demonstrates the power of CTE with DELETE statement. We will have a comprehensive discussion about it later in this article. For now, let us first create a sample table from which we will delete records.
/*Create Table with 7 entries - 3 are duplicate entries */ CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT) INSERT INTO DuplicateRcordTable SELECT 1, 1 UNION ALL SELECT 1, 1 --duplicate UNION ALL SELECT 1, 1 --duplicate UNION ALL SELECT 1, 2 UNION ALL SELECT 1, 2 --duplicate UNION ALL SELECT 1, 3 UNION ALL SELECT 1, 4 GO
The above table has total 7 records, out of which 3 are duplicate records. Once the duplicates are removed we will have only 4 records left.
/* It should give you 7 rows */ SELECT * FROM DuplicateRcordTable GO
The most interesting part of this is yet to come. We will use CTE that will re-generate the same table with additional column, which is row number. In our case, we have Col1 and Col2 and both the columns qualify as duplicate rows. It may be a different set of rows for each different query like this. Another point to note here is that once CTE is created DELETE statement can be run on it. We will put a condition here – when we receive more than one rows of record, we will remove the row which is not the first one. When DELETE command is executed over CTE it in fact deletes from the base table used in CTE.
/* Delete Duplicate records */ WITH CTE (COl1,Col2, DuplicateCount) AS ( SELECT COl1,Col2, ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount FROM DuplicateRcordTable ) DELETE FROM CTE WHERE DuplicateCount > 1 GO
It is apparent that after delete command has been run, we will have only 4 records, which is almost the same result which we would have got with DISTINCT, with this resultset. If we had more than 2 columns and we had to run unique on only two columns, our distinct might have not worked here . In this case, we would have to use above the mentioned method.
/* It should give you Distinct 4 records */ SELECT * FROM DuplicateRcordTable GO
This method is a breeze and we can use this for SQL Server version 2005 and the later versions.
Watch the view to see the above concept in action:
Reference : Pinal Dave (https://blog.sqlauthority.com)
Thanks, Saved tons of time for large number records
Thanks for this great post. I was able to solve the same problem.
Superb one to delete duplicate records.Thank you Pinal sir
Very useful… But can u post to delete duplicates from approx. 9000000 records. I am using your solution but it is taking to much time in large data..
HI i m saravanan
do u have 70 433 dumps pls reply me..
before delete duplicates, how to kept into Temporary Table those duplicates(to be deleted records).
Thanks so much
Thanks, i got it…..was looking for ROWID as we do in Oracle and in SQLSERVER there’s no ROWID….so we can do it with CTE and ROW_NUMBER() function…:)
Thanks for the great post!!!!!!
i want to delete multiple records selected by users … em working on c# and connected sql server rite now em using a stored procedure which deletes one record at a time (delete form abc where email=”etc” and group=”etc”) … can you tell me how can i deleted selected records …?
You need to post some sample data with expected result
not working if hav 3 columns same records data
hi i had one problem with removing duplicates pls help me out.
i have 3 columns named as a col1,col2,col3 with different different values and some what unique values,now i want remove that data.
col1 col2 col3
1 1 1
1 1 1
2 1 1
3 2 1
now i want remove second repeated row from above ex..
Refer point 6 at
delete from a
(select col1, col2,col3
,ROW_NUMBER() over (partition by col1, col2,col3
order by col1, col2,col3) RowNumber
from your_table ) a
where a.RowNumber >1
how to drop function pls give me solutions
cannot drop function because it is being referenced by object….
if we create a table with primary key and we use it as a foreign key then how we delete data and TRUNCAT.plz suggest me.
Very good article..Thank you
Any suggestions for a huge table with hundreds million rows and 150 columns?