SQL SERVER – 2005 – 2008 – Delete Duplicate Rows

I had previously penned down two popular snippets regarding deleting duplicate rows and counting duplicate rows. Today, we will examine another very quick code snippet where we will delete duplicate rows using CTE and ROW_NUMBER() feature of SQL Server 2005 and SQL Server 2008.

This method is improved over the earlier method as it not only uses CTE and ROW_NUMBER, but also demonstrates the power of CTE with DELETE statement. We will have a comprehensive discussion about it later in this article. For now, let us first create a sample table from which we will delete records.

/* Create Table with 7 entries - 3 are duplicate entries */
CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 --duplicate
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO

The above table has total 7 records, out of which 3 are duplicate records. Once the duplicates are removed we will have only 4 records left.

/* It should give you 7 rows */
SELECT *
FROM DuplicateRcordTable
GO


The most interesting part of this is yet to come. We will use CTE that will re-generate the same table with additional column, which is row number. In our case, we have Col1 and Col2 and both the columns qualify as duplicate rows. It may be a different set of rows for each different query like this. Another point to note here is that once CTE is created  DELETE statement can be run on it. We will put a condition here – when we receive more than one rows of record, we will remove the row which is not the first one. When DELETE command is executed over CTE it in fact deletes from the base table used in CTE.

/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM
CTE
WHERE DuplicateCount > 1
GO

It is apparent that after delete command has been run, we will have only 4 records, which is almost the same result which we would have got with DISTINCT, with this resultset. If we had more than 2 columns and we had to run unique on only two columns, our distinct might have not worked here . In this case, we would have to use above the mentioned method.

/* It should give you Distinct 4 records */
SELECT *
FROM DuplicateRcordTable
GO


This method is a breeze and we can use this for SQL Server version 2005 and the later versions.

Click to Download Scripts

Watch the view to see the above concept in action:

Reference : Pinal Dave (http://blog.SQLAuthority.com)

About these ads

143 thoughts on “SQL SERVER – 2005 – 2008 – Delete Duplicate Rows

  1. This new feature is very good.

    I use this key word very often.

    The other good one is Simple OVER PARTITION BY with out ROW_NUMBER () and RANK.

    We extensively use GROUP BY and we all know problem with GROUP BY (We should have all selected column from select clause in group by clause, other wise we cannot use group by). But by using OVER PARTITION BY we can group by few of many selected column in select clause.

    I believe this feature was first introduced in ORACLE. And then in SQL Server.

    But this is really a cool feature to use.

    Thanks.
    Imran.

    Like

  2. Pretty nice post. I just stumbled upon your site and wanted to say
    that I have really liked browsing your blog posts. In any case
    I’ll be subscribing to your blog and I hope you write again soon!

    Like

  3. This is a neat mehod, Initially i thought deleting from CTE wouldn’t delete the dup records from the base table, surely it does, thanks a bunch.

    Like

  4. Nice method to avoid using temp/derived and group by operation.
    Can we perform update operation on cte? will it reflect the base table also?.
    Thanks in advance.

    Like

  5. this ERROR OCCURS after executing the above sql

    Msg 306, Level 16, State 2, Line 1
    The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.

    name regno
    kkk 1111
    kkk 1111
    kkk 1111
    ppp 1110
    ppp 1110

    how to delete these duplicatess

    Like

    • use CTE(name,regno,duplicate count)
      as
      (
      select name,regno,
      row number() over(partition by name,regno order by
      name)
      from [table name]
      )
      delete from cte where duplicate count>1
      go

      Like

  6. Hey Pinal,

    loved ur blog..

    am a newbie to sql server 2005…nd would really appreciate your help..

    i have more columns than 2 that you have(around 6) in which i have the data…am a little confused as to how to write the RowNumber() and Over() methods..

    thnx in advance

    Like

  7. Yes, I have understood the answer for the above question.

    But, I have been trying to import data from an excel table to sql server 2005 with avoiding the duplicate rows. So, how should I be able to implement the above code to a C# program.

    Thank You very much

    Like

  8. Hey Pinal very very very nice postings… really helpful
    I dont know how to remove rows from excel using sql server query… please post about this…
    Thanx in advance :)

    Like

  9. Hey Pinal,

    Superb Blog yar !!!! I really loved u r blog. Its wonderful and huge collections…

    I am new one.. now onwards i will share my knowledge through u r blog….

    Keep it up…

    Like

  10. This blog spot cleared majority of my SQL clarification. CTE is used in the above query to delete duplicate rows in such a cute way.

    Great Job….

    Like

  11. greate job!

    i was trying each and every query from 20-30 web pages for doing the same thing but none of them worked.

    this is really awesome.
    thanks. :)

    Like

  12. dear,sir
    if we create a table with primary key and we use it as a foreign key then how we delete data and TRUNCAT.plz suggest me.
    thank’s

    Like

  13. I have Question
    how can I update/delete row from tables which has a composite primary key by using with CTE().
    when I try to update the tble which has the composite primary key but it won’t allow me
    it shows error message like
    Msg 2627, Level 14, State 1, Procedure T_UPD_Membership_Question, Line 14
    Violation of PRIMARY KEY constraint ‘PK_Membership_Question’. Cannot insert duplicate key in object ‘dbo.Membership_Question’.
    The statement has been terminated

    could u please help me regarding this.

    Like

  14. Hello
    I have a question
    I have a table named membership_question
    I want to delete the dups from this table
    i write the code like follows
    it is working but it is deleting all the dup’s
    but I want to keep one row in the table and i want to delete all the dups regarding that
    ————————————————————————–
    WITH CTE1 ( PersonID,QuestionID,duplicateCount)AS
    (
    SELECT PersonID,QuestionID,
    ROW_NUMBER() OVER(PARTITION BY PersonID,QuestionID ORDER BY PersonID) AS duplicateCount
    From Membership_Question
    )
    select PersonID,QuestionID into #TempDupedeleteMembership_Question from
    CTE1 Where duplicateCount>1

    select * from #TempDupedeleteMembership_Question

    DELETE Membership_Question
    FROM Membership_Question
    INNER JOIN #TempDupedeleteMembership_Question
    ON Membership_Question.PersonID = #TempDupedeleteMembership_Question.PersonID AND
    Membership_Question.QuestionID = #TempDupedeleteMembership_Question.QuestionID

    could any can help regarding this
    Is first() function can work with CTE

    Thank you

    Like

  15. I have a question
    I have a view with 4 fields: PersonId, TracKID,CurriculumID,Totaltime
    I can group the person ID And display the data like below

    PersonID CurricuID TotalTIME TrackID
    10053 2 90 2
    10053 2 90 3
    10053 2 90 1

    But I want to add the totaltime and dispaly in minutes respetive to the group of personId’s without rounding

    Could any one can help regarding this
    Thank you

    Like

    • hi if i Join three tables and if i use this code its not working for me

      And my code is as follows
      ;WITH CTE(PartnerProfileId,FieldName, DuplicateCount)
      AS
      (
      select P.PartnerProfileId ,F.FieldName,
      ROW_NUMBER() OVER(PARTITION BY P.PartnerProfileId,F.FieldName ORDER BY P.PartnerProfileId) AS DuplicateCount
      from tblPartnerProfile P
      JOIN tblPartnerLanguage L ON P.PartnerProfileId = L.PartnerProfileId
      JOIN tblPartnerFields F ON P.PartnerProfileId = F.ParentProfileId
      JOIN tblParnerClients C ON P.PartnerProfileId = C.PartnerProfileId
      — JOIN tblPartnerRegionofExperience R ON P.PartnerProfileId = R.PartnerProfileId
      — JOIN tblPartnerOrgStatus S ON P.PartnerProfileId = S.PartnerProfileId

      )
      DELETE
      FROM CTE
      WHERE DuplicateCount > 1

      And i got this error while i run a particular script
      “View or function ‘CTE’ is not updatable because the modification affects multiple base tables.

      Like

  16. Hello Mr.Dev
    thank u for the response
    here I am giving example input and output

    Input:
    PersonID CurricuID TotalTIME TrackID
    10053 2 90 2
    10053 2 90 3
    10053 2 90 1
    10133 2 270 2
    10133 2 270 3
    10133 2 270 1

    output
    PersonID CurricuID TotalTIME TrackID
    10053 2 4.5 2
    10053 2 4.5 3
    10053 2 4.5 1
    10133 2 13.5 2
    10133 2 13.5 3
    10133 2 13.5 1

    Thank you
    I apprciate your help

    Like

  17. Thank you much Mr.Dev
    It works for me
    but
    here the total time is rounding
    like below
    10053 4 2
    I want Like without rounding off like below
    10053 4.5 2
    Could u help me like What I need to do getting output like that.

    Like

  18. how to delete 2 same records in table in sqlserver2005/2008

    EX:

    empid ename sal

    101 sai 1000
    102 ramya 2500
    102 ramya 2500
    103 suresh 1500
    103 suresh 1500

    plese help me……….

    Like

    • Thank you much Mr.Dev
      It works for me
      but
      here the total time is rounding
      like below
      10053 4 2
      I want Like without rounding off like below
      10053 4.5 2
      Could u help me like What I

      Like

  19. try this:

    create table dupdata(id int,name varchar(30),sal money)
    insert into dupdata values(1,’veeru’,2100)
    insert into dupdata values(1,’veeru’,2100)
    insert into dupdata values(2,’sud’,2100)
    insert into dupdata values(2,’sud’,2100)

    alter table dupdata add rowid int identity(1,1)

    delete from dupdata where rowid in(select max(rowid) from dupdata group by id,name,sal

    having count(*)>1)

    alter table dupdata drop column rowid

    select * from dupdata

    Like

  20. This CTE trick is the best thing since sliced bread!
    Unfortunately, I had invested (Wasted) several hours trying to delete duplicate rows by using inner join and by first inserting all the duplicates into a Temp table. Whew!

    This may have saved my sanity :)

    THANK YOU Dave!

    Like

  21. Can any body guide how to use CTE if table has several columns. I am using following query but getting error.

    WITH CTE ([MSC-S KPI],*,DuplicateCount)
    AS
    (
    SELECT [MSC-S KPI],*
    ROW_NUMBER() OVER(PARTITION BY [MSC-S KPI],* ORDER BY [MSC-S KPI]) AS DuplicateCount
    FROM dbo.KPI_pl$
    )
    DELETE
    FROM CTE
    WHERE DuplicateCount > 1
    Go

    Msg 102, Level 15, State 1, Line 1
    Incorrect syntax near ‘*’.
    Msg 102, Level 15, State 1, Line 5
    Incorrect syntax near ‘ROW_NUMBER’.

    Is it necessary to enlist all the columns in CTE expression.?

    Like

  22. How can I add mins on current time and display
    time incrementing by added mins
    For eg:

    13:25 + 15 mins = 13:14
    or
    13:45 + 15 mins = 14:00
    or
    13:50 + 15 mins = 14:05

    Thank you in advance

    Sam

    Like

  23. hello sir

    your duplicate rows query is very best

    sir i have 6 column in table (employee)
    column name
    ip_code
    11–12345656
    11-23657896
    11-34784523
    11-09864584
    11-54830765
    11-67094587
    means total column 200 only

    i want a that only 5 interger value come in output
    means
    11-123
    11-236
    11-347
    11-098
    11-548
    plz sir answer this question becoz this query is very important for me
    thankyou
    pinal dave (sir)

    Like

  24. I have a result set from aTable Colors like follows which having coloms Name and Color

    Name color

    A Green
    A RED
    A Yellow
    Davis Bloon
    Davis Red

    I want to replace that the result set as follows.
    Name color
    A Green
    RED
    Yellow
    Davis Bloon
    Red

    Like

    • WITH cte (clr,ROwNum)as
      (SELECT clr,row_Number() OVER(PARTITION BY LEFT(clr,CHARINDEX(SPACE(1),clr)-1) ORDER BY LEFT(clr,CHARINDEX(SPACE(1),clr)-1)) FROM color)
      –select * from cte where ROwNum>1
      –select clr,CHARINDEX(SPACE(1),clr),
      –LEFT(clr,CHARINDEX(SPACE(1),clr)-1),
      UPDATE cte SET clr=Right(clr,len(clr)-CHARINDEX(SPACE(1),clr))–AS ToUpdate from cte
      WHERE ROwNum>1

      Like

  25. Pingback: SQL SERVER – Common Table Expression (CTE) and Few Observation Journey to SQLAuthority

  26. Hi Panel but i still confuse about ur query
    i try at home and get the result
    but i don’ t understand why
    we write

    DELETE FROM CTS WHERE DC>1

    it means the delete the records where dc is greater than 1
    but in ur row number column no records is greater than 1
    then how ur query work

    Please help me out
    Short Knowledge is Bigger hurdle in Success.

    Like

  27. I got an error: View or function ‘repAuthPatStaff’ is not updatable because the modification affects multiple base tables.
    when I tried to apply the CTE delete to multiple tables

    Like

  28. Pingback: SQL SERVER – Database Worst Practices – New Town and New Job and New Disasters Journey to SQLAuthority

  29. Hi Pinal,

    Could you give me help on the following?
    1. How many parameters that CTE can accepts?
    2. Shall we able to move duplicate data into another table and How to delete duplicate records simultaneously using CTE.? Let me explain my actual scenario. I have a table called Test.

    Test has the following data
    Col1 Col2 Col3
    —— —— ——
    101 abc xyz
    102 aaa bbb
    101 abc xyz
    101 abc xyz
    103 xxx yyy
    101 abc xyz
    101 abc xyz
    101 abc xyz
    101 abc xyz
    101 abc xyz
    :::::::::::::::::::::::::

    I need to store the duplicate records in the table Test_duplicate. Could you tell me how to achieve it with CTE.

    My require output is

    Test:

    Col1 Col2 Col3
    —— —– —–
    101 abc xyz
    102 aaa bbb
    103 xxx yyy

    Test_duplicate:
    Col1 Col2 Col3
    —– —– —–
    101 abc xyz
    101 abc xyz
    101 abc xyz
    101 abc xyz
    101 abc xyz
    101 abc xyz
    101 abc xyz
    ::::::::::::::::::::::::

    Thanks in advance.

    Like

  30. If you’re using 2008, why not use the “undocumented” feature of physloc? Substitute your table for tablename and substitute the column which contains the duplicate values for columnwithdups. Works rather nicely if I do say so myself.

    delete
    from dbo.tablename
    where tablename.%%physloc%%
    NOT IN (Select min(b.%%physloc%%)
    from dbo.tablename b
    group by b.columnwithdups);

    Like

  31. hi, Thanks for the article, it was so useful.
    but i have another problem, similar to this:
    My table has 10 rows, let say R1 to R10
    R1 is the primary key .
    i had some data while is filled in R1 to R5, and some else to R10.

    The problem is: the system by mistake inserted duplicate data in R2-R5 (R1 is PK assuem autonumber) and in one of these duplicates R6 and R7 are also filled.
    I want to remove the duplicates, which their R6,R7 are filled.
    Consider that the R6,R7,… may be filled for other data, and they are ok!
    by this method i could find them and remove the duplicates, but i cannot decide to remove which one! I mean I can not force to remove the duplicates which has R6,R7.
    Please advice.

    Like

  32. Ok, here Iam.
    these are the scripts (SQL Server 2005):
    Table creation:
    USE [MyDB]
    GO
    /****** Object: Table [dbo].[tBLtEST] Script Date: 10/03/2011 21:48:04 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    SET ANSI_PADDING ON
    GO
    CREATE TABLE [dbo].[tBLtEST](
    [R1] [varchar](16) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [R2] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [R3] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [R4] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [R5] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [R6] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [R7] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [R8] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [R9] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [R10] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    CONSTRAINT [PK_tBLtEST] PRIMARY KEY CLUSTERED
    (
    [R1] ASC
    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]

    GO
    SET ANSI_PADDING OFF
    ———————————————————————————

    Data Insertion section:

    INSERT INTO dbo.tBLtEST VALUES
    (‘1′,’R2′,’R3′,’R4′,’R5′,NULL,NULL,NULL,NULL,NULL)–NORMAL RECORD
    INSERT INTO dbo.tBLtEST VALUES
    (‘A’,’RR’,’RR3′,’RG4′,’RG5′,’RH6′,’RH7′,’RTJ’,NULL,NULL)–NORMAL RECORD
    INSERT INTO dbo.tBLtEST VALUES
    (‘B’,’RB2′,’RB3′,’RB4′,’RB5′,NULL,’KJL’,’LKJU’,’BBNJ’,’TEST’)–NORMAL RECORD
    INSERT INTO dbo.tBLtEST VALUES
    (‘C’,’R2′,’R3′,’R4′,’R5′,’R6′,’R7′,NULL,NULL,NULL)–DUPLCATE RECORD WITH THE FIRST RECORD
    INSERT INTO dbo.tBLtEST VALUES
    (‘D’,’DR2′,’DR3′,’DR4′,’DR5′,NULL,NULL,NULL,NULL,NULL)
    INSERT INTO dbo.tBLtEST VALUES
    (‘E’,’DR2′,’DR3′,’DR4′,’DR5′,’KK’,’NN’,NULL,NULL,NULL)–DUPLICATE WITH ABOVE RECORD (pk=D)
    INSERT INTO dbo.tBLtEST VALUES
    (‘F’,’RF2′,’RF3′,’FR4′,’FR5′,’FFFF’,’OOO’,NULL,’MMM’,’KKL’)–NORMAL RECORD
    INSERT INTO dbo.tBLtEST VALUES
    (‘G’,’RGG2′,’GGGR3′,’RGGG4′,’GGGR5′,NULL,’GGG’,’GHH’,NULL,NULL)–NORMAL RECORD

    ———————————————————-

    what i need is to remove the duplicates, but always remove the ones which the
    R6 and R7 field are filled.
    here in this sample i want to be sure that the Records C and E are removed.
    hope it is clear now and you can guid me for that.
    PS: please consider that I have Huge amount of data in the DB.
    BRs,

    Like

  33. I think I found it.
    WITH CTE (R2,R3,R4,R5, DuplicateCount)
    AS
    (
    SELECT R2,R3,R4,R5,
    ROW_NUMBER() OVER(PARTITION BY R2,R3,R4,R5 ORDER BY R6,R7) AS DuplicateCount
    FROM tBLtEST
    )
    DELETE
    FROM CTE
    WHERE DuplicateCount > 1
    GO

    Can you just check this and make me sure it is correct?

    Like

  34. This is a nifty use of a CTE. My question is, how does using the CTE compare in terms of performance to using a SELECT DISTINCT?

    The tables I am looking at text import tables. They are all very wide with upwards of 50 columns, all of the columns are varchar(255).

    ( I didn’t create the table but I have to maintain it! )

    Duplicates would only exists if all 50 of the columns were identical.

    The current query is a SELECT DISTINCT, but I wonder whether or not a CTE would be more efficient.

    Maybe if I added a computed column that’s a hash of all the columns, and looked for duplicates based on that column?

    Like

  35. Pingback: SQL SERVER – Common Gotcha’s Associated with Common Table Expressions (CTE) – Quiz – Puzzle – 26 of 31 « SQL Server Journey with SQL Authority

  36. This is for finding the duplicate rows:

    duplicate rows=case 1-case2

    SELECT Personentityid,COUNT(NAMESNO) AS DUPLICATE INTO DBA_DB.dbo.duplicatevalues FROM DBA_DB.dbo.SpellingVariation
    GROUP BY Personentityid

    case 1:
    select sum(duplicate) from DBA_DB.dbo.duplicatevalues where duplicate>1

    Case 2:
    select Personentityid from DBA_DB.dbo.duplicatevalues where duplicate>1

    Like

  37. We have to insert 30,000 rows every 10 minutes…we do that for about an hour. What is the best way to prevent duplicates?

    a) Use a batch approach like the methods described in this article to remove duplicates
    b) Use a real-time approach by adding a unique index to prevent duplicates when you insert?

    Thanks,
    David

    Like

  38. Pingback: SQL SERVER – Convert Subquery to CTE – SQL in Sixty Seconds #001 – Video « SQL Server Journey with SQL Authority

  39. Dude I have to thank you.. have been using your posts very often.. felt compelled to leave a thank you note.. u r doing a gud service to ur fellow professionals….

    Like

  40. hi to all,

    Here i am having a case, Can we avoid duplicates at time while insertion done at two tables and if the query is re-run then also it should not insert the same data

    Like

  41. Very useful… But can u post to delete duplicates from approx. 9000000 records. I am using your solution but it is taking to much time in large data..

    Like

  42. Tremendous Solution.
    before delete duplicates, how to kept into Temporary Table those duplicates(to be deleted records).

    Like

  43. i want to delete multiple records selected by users … em working on c# and connected sql server rite now em using a stored procedure which deletes one record at a time (delete form abc where email=”etc” and group=”etc”) … can you tell me how can i deleted selected records …?

    Like

  44. hi i had one problem with removing duplicates pls help me out.
    i have 3 columns named as a col1,col2,col3 with different different values and some what unique values,now i want remove that data.
    ex…
    col1 col2 col3
    1 1 1
    1 1 1
    2 1 1
    3 2 1

    now i want remove second repeated row from above ex..

    Like

  45. Pingback: SQL SERVER – Select and Delete Duplicate Records – SQL in Sixty Seconds #036 – Video « SQL Server Journey with SQL Authority

  46. Pingback: SQL SERVER – Weekly Series – Memory Lane – #035 | Journey to SQL Authority with Pinal Dave

  47. Hello Pinal Sir ,
    I am little bit confuse in one thing that if we have thousands of records in a table then how can we find the duplicate records and delete it by CTE. In your example , u mention only 7 rows data…Please guide

    Like

  48. Pinal,

    This works great!!! In order to keep the records where that were changed most recently, I changed the order by so that the most recently changed row was the first row – order by lastname, firstname, changedate desc.

    As always I appreciate your blogs!

    Like

  49. i used this valuable query but its throw an error like “semi colon missing in previous line of query”. if i put semi colon, is there any problem that terminate the stored procedure. after this i using some queries whether it will execute or not ?

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s