SQL SERVER – Delete Duplicate Records – Rows

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts
Previous Post
SQL SERVER – T-SQL Script to find the CD key from Registry
Next Post
SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

Related Posts

450 Comments. Leave new

  • Awesome man. I have used you delete duplicate records query. And it works great.

    Thanks

    Reply
  • Unfortunately, this query doesn’t seem to work when attempting to delete a row that has duplicate id’s but different data.

    Can’t believe how hard it is to find something simple that will work.

    Reply
    • Post some sample data with expected result so that you will get the proper query that would work in your case

      Reply
  • Why don’e we do a self union to the table.

    select * from mytable union select * from mytable,

    thsi query returns the exact records, and i guess we can use this data to build a new table or replacing the previous one.

    Thoughts on this please

    Reply
  • I am trying to design an query which functions like ICETOOL of mainframe it also requires this kind of duplicate deletions…

    ICETOOL in my application (sorts, merges and separates) duplicates and uniques from 4 input file into two output files(one containing unique ids and other containing its duplicate ids with different data) and discards the duplicates which occurs more than twice…

    i have the 4 files as tables now i need to get two output tables

    help will be much appreciated………

    Reply
  • Hi
    Sir can u please let me how to pass aliase name in Stored
    procedure when I am using table as Parameter in A SP

    Reply
  • How would your solution work if I had a single table called Customers that looks like this:

    ID LastName FirstName
    1 Smith John Adam
    2 Smith John A.
    3 Jones Jane Mary
    4 Jones Jane M.

    No other fields are in this table. Just what you see. The ID field is unique for each record. In this example, both records for Smith are the same person. The same goes for Jones. I want to view in a table the resulting unique records (i.e. only one record for John Smith, and one record for Jane Jones).

    I am a complete newbie and would appreciate guidance.

    Thanks!

    Reply
    • You can use any one of the methods specified here depending on the version you are using

      Reply
  • If you have Key already, you have seen many solutions in this post before, you need to group by on the repeating/columns that can be duplicated and delete them. The answered solution is if you dont have a unique key in the table.

    well to add more to that, once you do self union and insert into temp table, you can delete every thing form main table and insert every thing from temp table in to the original/main table. I think this should help..

    -Aditya

    Reply
  • SELECT id,name,mNumber,sNumber,partOfSpeech,type FROM keyword where id IN (Select max(id) from keyword GROUP BY name)

    The above is if you want to just select a column without duplicates in it.

    Reply
  • create table T1 (C1 int, C2 varchar(3))

    –Fill it with duplicate Records

    declare @as Table (c1 int,c2 varchar(3))
    insert into @as select Distinct * from T1
    Truncate table T1
    Alter Table T1 Disable Trigger All
    insert into T1 select * from @As
    Alter Table T1 Enable Trigger All

    Reply
  • Thanks Mr. Dave for the codes.
    I’ve searched inside out in the internet but still couldn’t find a solution for my problem.
    Your code manage to solved it, your truly a master in SQL. Thanks again

    Reply
  • Can you show how to make this a select Into, so that one can save the deleted records in another table

    DELETE
    FROM MyTable
    WHERE ID NOT IN
    (SELECT MAX(ID)
    FROM MyTable
    GROUP BY DuplicatevalueColumn1, DuplicateValueColumn2, DuplicateValueColumn2

    Reply
    • select * into newtable
      FROM MyTable
      WHERE ID NOT IN
      (SELECT MAX(ID)
      FROM MyTable
      GROUP BY DuplicatevalueColumn1, DuplicateValueColumn2, DuplicateValueColumn3

      Reply
  • I thing this one alos workout
    ====================
    set rowcount 1
    delete test
    from test a
    where(select count(*) from test b where b.age = a.age and b.name = a.name)
    > 1
    while @@rowcount>0
    delete test
    from test a
    where(select count(*) from test b where b.age = a.age and b.name = a.name)
    >1
    set rowcount 0

    Reply
    • Note that for large number of datasets it will take long time to finish the execution. You should stay away from using this method

      Reply
  • Dilip Mohapatro
    November 13, 2007 6:58 pm

    //Here n is maximum allowable number of a record
    create or replace procedure t_dili1_2
    (n in int,q in out sys_refcursor)
    as

    begin

    open q for

    select a,b
    from dil1 where (a,b) in(
    select a,b from (select a,b,count(*) from t group by a,b having count(*)<n));

    end t_dili1_2;
    /

    variable z refcursor

    exec t_dili1_2(4,:z)

    print z

    Reply
  • Dilip Kumar Mohapatro
    November 13, 2007 7:05 pm

    Plz Reffer to my (above) comment (PL/SQL Block)..
    Here dil1 is name of TABLE.
    a,b are columns with repeated records.
    I did is Using REFCURSOR.

    Reply
  • Dilip Kumar Mohapatro
    November 13, 2007 7:09 pm

    Q: Display all the records of a table t (where there are several duplicated & reduplicated records present) with restricted number (n) of distinct records. (Use REFCURSOR)

    Solution:
    create or replace procedure t_dili1_2
    (n in int,q in out sys_refcursor)
    as

    begin

    open q for

    select a,b
    from t where (a,b) in(
    select a,b from (select a,b,count(*) from t group by a,b having count(*)<n));

    end t_dilip1_2;
    /

    variable z refcursor

    exec t_dili1_2(4,:z)

    print z

    Reply
  • Pinal,

    You are truly a SQL guru and I love your website and your articles.

    The statement above that deletes duplicate records worked perfectly for me.

    Thanks.

    Reply
  • Hello All,
    I have a situation where I have two tables A & B. A has the actual data, table B maintains a sequence of data that is in table A. I have a process that polls table A every 5 seconds. Whenever a record is updated in table A with a TRANSID n, table B is updated with the same TRANSID. This said, what I would like to achieve is, when I poll table A, I should get all the records that were updated in the past 5 seconds. The catch is, same record can be updated more than once and end up updated with a greater TRANSID, table B is updated with the same TRANSID, but if same record is updated, one of their columns REPORT_ID, would be the same in table A(REPORT_ID not in table B). I should be able to get the most recent of the duplicate records and other records that were updated. Any help is greatly appreciated since writing SQL queries is not one of my fortes.

    Reply
  • YOu rock man!

    Reply
  • how to find second highest salary of the employee table
    please explain

    Reply
  • dear Pinaldave
    u r genius

    u have solved my problems on deleting duplicate records

    Reply

Leave a Reply