SQL SERVER – Delete Duplicate Records – Rows

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts
Previous Post
SQL SERVER – T-SQL Script to find the CD key from Registry
Next Post
SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

Related Posts

450 Comments. Leave new

  • Hi Guys

    The best method to delete duplicates is CTE’s

    code as follows:-

    select * from my table

    ;With my table2
    ( select row1,
    row_number over() ( partition by row1, order by row1 ) as a

    delete from table2 where a > 1

    Reply
  • hi , can anyone explain the sql statement to me
    since i am noobie and i not really understand the coding
    thank alot :)

    Reply
    • If there are duplicates in columns DuplicateColumn1, DuplicateColumn2, DuplicateColumn3 then the code will delete all related rows but keep the row with maximum id

      Reply
  • Assume i have a temp table with some columns (as in the code )and i loaded that temp table with 1 lac record.

    Table’s strucre is like below

    Below table have 1 lac records.

    Now there is one coloumn “PSGRES” also called as Sleep_stage_value, that having value btw 0 to 7 (0,1,2,3,4,5,6,7).

    Now what i want is i want to delete the record that are repeating again.

    Means lets assume First 10 records have “PSGRES” value as 7 and 11th to 14th record having value 1. so i want to delete the record from 2 to 10 as it having the same value 7 and after 12 to 14 as it have same value 2.

    So after deleting those record in temp table , remaing records will be like below.

    Result : (for all 1 lac record )

    1st record with PSGRES value = 7

    11 th record with PSGRES valie = 1

    15 th record with PSGRES value = 2

    and as on till end of the temp table.

    CREATE TABLE #CDISK_MNewFinalAfterForCompare (
    ROWID NUMERIC (8) IDENTITY (1, 1) NOT NULL,
    TRANS_SOURCE_ID VARCHAR(100),
    TRANS_DATE VARCHAR(100),
    STUDYID VARCHAR(50), – Protocol Short Name —
    TRANS_TYPE VARCHAR(50),
    SITE_ID VARCHAR(50), – SITE_NUMBER
    SCREEN_ID VARCHAR(50), – SCREEN_NUMBER
    ALLOCATIONNUMBER VARCHAR(50), – ALLOCATIONNUMBER
    PSG_VISIT_LABLE varchar(50), – VISIT_NUMBER
    VISIT_TYPE VARCHAR(50), – MDB FILE , CUSTOMIZE 4
    ACCESSIONID VARCHAR(100), – MDB FILE , Case_Number
    ACCTUAL_COLL_DATATIME VARCHAR(100),
    PSGTSCD VARCHAR(50), – PSGTSCD
    VEN_LAB_TEST_NAME VARCHAR(50),
    MER_LAB_TEST_ID VARCHAR(50),
    MER_LAB_TEST_NAME VARCHAR(50),
    TEST_STATUS VARCHAR(50),
    COMMENT varchar(255), – COMMENT
    PSGRES VARCHAR(50), – PSGRES
    TEXT_RESULT VARCHAR(50),
    REPORT_UNIT VARCHAR(50),
    RESULT_TYPE VARCHAR(50),
    TRANSFER_TYPE VARCHAR(50),
    DATA_TYPE VARCHAR(50)
    )

    So for that i am using loop and cursor to get this output but it is very slow.

    So can you guys help me out achieve this in query with out using loops and cursor.

    thanks.

    Reply
  • Hi Pinal,

    I have a 280 records, it contains repeated 40 records for 7 times. I need to have only one set of 40 records. How can I do this. The total contains nearly 60 Lakhs rows. Please suggest me a good option to do this. Thanks in advance.

    Regards,
    Prabhu

    Reply
  • Remove Duplicate Row without PK
    [ Just Copy & Paste ]

    declare @TBL table(EmpID int)

    insert into @TBL(EmpID)
    values(90020),(90020),(90021),(90022),(90023),(90022),(90025),(90022)

    select * from @TBL order by EmpID

    ;with TBLCTE(EmpID,Ranking)
    AS
    (
    select
    EmpID,
    Ranking = DENSE_RANK() over (PARTITION BY EmpID order by newID())
    from @TBL
    )
    delete from TBLCTE where Ranking > 1

    select * from @TBL order by EmpID

    Reply
  • Hi guys,

    Having trouble trying to figure out how I can get the ID column of my duplicate records.. ID is unique and required for delete.

    Table is like so:

    ID | Firstname | Surname
    1 Fred Smith
    2 Fred Jones
    3 Alan Thompson
    4 Mary Phillips
    5 Fred Smith
    6 David Jones

    Expected result shows:

    1 Fred Smith
    5 Fred Smith

    Currently I have the code below, works great, but no ID

    SELECT firstname,surname,count(*)
    FROM customer
    group by firstname,surname
    having count(*) > 1

    Is this possible?

    Appreciate any assistance..

    Reply
    • select t1.* from customer as t1 inner join
      (
      SELECT firstname,surname,count(*)
      FROM customer
      group by firstname,surname
      having count(*) > 1
      ) as t2 on t1.firstname=t2.firstname and t1.surname=t2.surname

      Reply
  • Hi all programmers…

    I am glad know this site,

    Reply
  • Hi,

    Let us assume that we have duplicate records in emp table.

    delete from emp
    where empid not in (select distinct(empid) from emp).

    Think this query works out in deleting the duplicate rows in a table.

    Reply
  • Shut up. This is correct.

    Reply
  • Brajesh Goswami
    July 4, 2011 6:25 pm

    DECLARE temp CURSOR FOR SELECT id FROM A GROUP BY A.ID
    DECLARE @id int
    OPEN temp
    FETCH next FROM temp
    INTO @id

    WHILE @@FETCH_STATUS=0
    BEGIN
    IF EXISTS (SELECT id FROM GROUP BY id HAVING count(*)>1 AND id=@id)
    DELETE TOP(SELECT count(*)-1 FROM WHERE id=@id )FROM WHERE id=@id

    FETCH next FROM temp
    INTO @id
    END

    CLOSE temp
    DEALLOCATE temp

    Reply
  • Hi,pinal,

    when we handling a large database like 10GB size of records.

    how can i find duplicates and delete that duplicates.

    finding duplicate is not a problem but how to remove those duplicates ?

    Please suggest me.

    Thank u!

    Reply
  • Hi,

    Find another method using loop.

    Consider the table #temp as below,
    Id Value
    ——————
    1 100
    2 200
    2 200
    3 300
    3 300
    3 300

    DECLARE @Cnt INT
    SET @Cnt = 1

    WHILE @Cnt IS NOT NULL
    BEGIN
    SELECT TOP 1 @Cnt = COUNT(Id) FROM #Test
    GROUP BY Id,Value
    HAVING COUNT(Id) > 1
    ORDER BY Id,Value

    IF @Cnt IS NULL OR @Cnt = 1 RETURN

    SET @Cnt = @Cnt-1
    SET ROWCOUNT @Cnt

    DELETE t1
    FROM #Test t1
    JOIN
    (SELECT TOP 1 Id,Value,COUNT(Id) Cnt FROM #Test
    GROUP BY Id,Value
    HAVING COUNT(Id) > 1
    ORDER BY Id,Value) t2 ON t2.Id = t1.Id AND t2.Value = t1.Value

    SET ROWCOUNT 0
    END

    Reply
  • CREATE TABLE [dbo].[BankMaster](
    [BankID] [int] NOT NULL,
    [Bank Name] [nvarchar](50) NOT NULL,
    [Address] [nvarchar](max) NULL,
    CONSTRAINT [PK_BankMaster] PRIMARY KEY CLUSTERED
    (
    [BankID] ASC
    )
    )

    INSERT INTO BankMaster
    VALUES
    (1,’ING Vysya’,’ING Vysya Eranamkulam’),
    (2,’SBT’,’SBT Eranamkulam’),
    (3,’Federal’,’Federal Bank Eranamkulam’),
    (4,’Canara’,’Canara Eranamkulam’),
    (5,’ING Vysya’,’ING Vysya Eranamkulam’),
    (6,’SIB’,’SIB Eranamkulam’),
    (7,’SBT’,’SBT Eranamkulam’),
    (8,’Canara’,’Canara Eranamkulam’),
    (9,’ING Vysya’,’ING Vysya Eranamkulam’),
    (10,’SBI’,’SBI Eranamkulam’)

    DECLARE @BankName NVARCHAR(50),@Cnt INT

    DECLARE Cursor_Bank CURSOR LOCAL FAST_FORWARD READ_ONLY FOR
    SELECT BankName,COUNT(BankName) – 1 from BankMaster
    GROUP BY BankName HAVING COUNT(BankName) > 1
    OPEN Cursor_Bank
    FETCH NEXT FROM Cursor_Bank INTO @bankname,@Cnt
    WHILE (@@FETCH_STATUS = 0)
    BEGIN
    SET ROWCOUNT @Cnt
    DELETE FROM BankMaster WHERE BankName like @bankname
    SET ROWCOUNT 0
    FETCH NEXT FROM Cursor_Bank INTO @bankname,@Cnt
    END
    CLOSE Cursor_Bank
    DEALLOCATE Cursor_Bank

    Reply
  • Rakesh Kumar Rautray
    September 6, 2011 12:51 pm

    delete from [BankMaster] where [BankID] in
    (
    select [BankID] from (select [BankID], row_number() over(partition by [Bank Name] order by [BankID])RNK from [BankMaster]) BankMaster
    where RNK > 1
    )

    Reply
  • josekutty varghese
    September 17, 2011 2:59 pm

    Dear Pinal,

    Select Distinct Row_Number() Over (Order By CandidateM.CanID Asc)As SlNo, CanCode,CanName from CandidateM Where CandidateM.CanCode like ‘09%’
    Using this query how can I take rownumber of a distinct record….?
    regards,

    josekutty varghese

    Reply
    • Try using rank function

      Select Rank() Over (Order By CandidateM.CanID Asc)As SlNo, CanCode,CanName from CandidateM Where CandidateM.CanCode like ‘09%’

      Reply
  • Riketa Salariya
    October 3, 2011 5:28 pm

    Thanks

    Riketa Salariya & Dhiresh jawale

    Reply
  • I have a quite different request :

    Below is the table with sample data that I have

    Date Name Product Quantity
    10/11/2011 Aname Abc1 20
    10/10/2011 Aname Abc1 20
    10/9/2011 Aname Abc1 20
    10/8/2011 AName Xyz1 30
    10/7/2011 AName Xyz1 20
    10/6/2011 AName Abc1 20
    10/5/2011 AName Abc1 20
    10/4/2011 BName Xyz1 30
    10/11/2011 Bname Abc2 20
    10/10/2011 Bname Abc2 20
    10/9/2011 Bname Abc2 20
    10/8/2011 BName Xyz2 30
    10/7/2011 BName Xyz2 20
    10/6/2011 BName Abc2 20
    10/5/2011 BName Abc2 20
    10/4/2011 BName Xyz2 30

    Output should be :

    Id Start Date EndDate Name Product Quantity
    1 10/9/2011 10/11/2011 Aname Abc1 20
    2 10/8/2011 10/9/2011 Aname Xyz1 30
    3 10/7/2011 10/8/2011 Aname Xyz1 20
    4 10/5/2011 10/7/2011 Aname Abc1 20
    5 10/4/2011 10/5/2011 Aname Xyz1 30
    6 10/9/2011 10/11/2011 Bname Abc2 20
    7 10/8/2011 10/9/2011 Bname Xyz2 30
    8 10/7/2011 10/8/2011 Bname Xyz2 20
    9 10/5/2011 10/7/2011 Bname Abc2 20
    10 10/4/2011 10/5/2011 Bname Xyz2 30

    I do not have primary key on my Table.

    If you see the number of rows have been reduced.

    I have more than 500 million records and by filtering this I would significantly the number of records in the table.

    Is there any efficient query or sproc to do this, if so could you please help me.

    I have tried using group by , partitions, ranking but nothing worked .

    Thanks & Regards,

    VPandit.

    Reply
  • The columns are date, name, product and quantity.

    Reply
  • Hi Sir

    I am in a situation where I have the same CustomerID’s but different Customer names. They belong to the same Subsidiary Entity so the phone numbers and address must remain the same..

    Please help

    Reply
  • …This is how it looks

    CustomerID Name Address …
    2101 ABC 123 blaa
    2101 WQH 123 blaa

    The Name and Address should remain the same but the CustomerID must be changed

    Thank you

    Reply

Leave a Reply