SQL SERVER – Delete Duplicate Records – Rows

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts
Previous Post
SQL SERVER – T-SQL Script to find the CD key from Registry
Next Post
SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

Related Posts

450 Comments. Leave new

  • Hi,
    I need to Get a primary key in a particular table in MS-SQL SERVER 2000. How Can i get. Is there any query is available???

    Thanks in advance
    Mohan.V

    Reply
  • Pinal, great information on your site!

    I think Senthilnathan in comment #16 has the right solution that can be used for deleting duplicates and for finding the nth row in a group. The key is ROW_NUMBER() function with a Partition.

    — EXAMPLE TO GET THE 5th highest salary of the “Manager” group.

    — CREATE Common Table Expression CTE
    With MyCTE AS (
    SELECT
    ROW_NUMBER() OVER (PARTITION EmployeeType
    ORDER BY EmployeeType, Salary) AS OrderedGroupId,
    EmployeeId,
    EmployeeType,
    Salary
    FROM SalaryHistory)
    SELECT EmployeeId, — Query results of CTE
    Salary
    FROM MyCTE
    WHERE EmployeeType = “Manager”
    AND OrderedGroupId = 5

    This could also be a DELETE statement deleting duplicate items in a PARTITION (DELETE FROM MyCTE WHERE OrderedGroupId > 1)

    Reply
  • Imran Mohammed
    August 22, 2008 7:29 am

    @Mohan,

    1. If you want to see the primary on a specific table then Execute this stored procedure,

    Sp_help table_name — you have to give table name

    It will display all the table properties, like columns, keys, indexes…
    In constraints section it will show you all the keys details, on which column primary key is made.

    2. If you want to see all the primary keys on all tables, then use this query,

    SELECT A.CONSTRAINT_NAME ‘CONSTRAINT NAME’ , A.COLUMN_NAME ‘COLUMN NAME’, A.TABLE_NAME’TABLE NAME’, A.ORDINAL_POSITION ‘POSITION OF COLUMN’
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS B, INFORMATION_SCHEMA.KEY_COLUMN_USAGE A
    WHERE B.CONSTRAINT_NAME =A.CONSTRAINT_NAME AND CONSTRAINT_TYPE = ‘PRIMARY KEY’
    ORDER BY A.TABLE_NAME, ORDINAL_POSITION
    — Run this script as is don’t change anything

    This will give you primary key name, on which column it is made and which table it is made and also position of the column.

    Sometimes, we create composite primary keys, meaning primary key on more than one column on one table in that case when you run the above query you will see table name more than once, dont get confuse, see the “position of the column” ( in the output) and then you will come to know that this table has a composite primary key.

    ** A table can have only one Primary key.

    Reply
  • Rahul ask how if there is no key in the table to delete the duplicate data,

    Please use this sql to do it, @ID, @Lname and @fname is depend on our field in the table:

    DECLARE @ID varchar(150), @lname varchar(150), @fname varchar(150), @MyCount int, @Counting Int

    DECLARE authors_cursor CURSOR FOR
    SELECT id, FNAME, LNAME, COUNT(*) AS myCount
    FROM dbo.Table1
    GROUP BY id, FNAME, LNAME HAVING (COUNT(*) > 1)

    open authors_cursor

    FETCH NEXT FROM authors_cursor
    INTO @ID, @fname, @LName, @MYCount

    WHILE @@FETCH_STATUS = 0 BEGIN
    DECLARE aut2 CURSOR local scroll dynamic OPTIMISTIC FOR
    SELECT *
    FROM table1
    WHERE id = @id AND lname = @lname AND fname = @fname
    OPEN aut2
    SET @counting = 1

    FETCH next FROM aut2
    begin

    WHILE @Counting < @MyCount BEGIN
    print @counting
    DELETE FROM table1 WHERE CURRENT OF aut2
    SET @Counting = @Counting + 1
    FETCH next FROM aut2
    END
    –end

    end
    CLOSE aut2
    DEALLOCATE aut2
    FETCH NEXT FROM authors_cursor
    INTO @ID, @Lname, @FName, @MYCount

    END
    CLOSE authors_cursor
    DEALLOCATE authors_cursor

    Reply
  • Answer for Rasmi, please study this one:

    declare @MyStr varchar(150), @pjg int

    DECLARE authors_cursor CURSOR FOR
    SELECT LNAME
    FROM dbo.Table1

    open authors_cursor
    FETCH NEXT FROM authors_cursor
    INTO @mystr

    WHILE @@FETCH_STATUS = 0 BEGIN
    set @pjg=len(@mystr)-1
    while @pjg>=0
    begin
    print substring(@mystr,len(@mystr)-@pjg,1)
    if isnumeric(substring(@mystr,len(@mystr)-@pjg,1))=0

    set @mystr=replace (@mystr,substring(@mystr,len(@mystr)-@pjg,1),”)

    set @pjg=@pjg-1
    end
    print @myStr
    update table1 set lname=@mystr where current of authors_cursor
    FETCH NEXT FROM authors_cursor
    INTO @mystr
    end
    print @myStr

    CLOSE authors_cursor
    DEALLOCATE authors_cursor

    Reply
  • In most cases, two duplicate records in a table may not be the same. For example, a customer may enter his name or address, or phone number in different ways, but only from his email, we know this is the same person.

    My question is, if a table contains duplicate records like this, and I only want to keep latest record of the dups. I mean, based on customer’s email, and the date of registration, can we have a way to delete the duplicate records? for example:

    table_customers contains “email”, “name”, and “date”:

    ab@c.com John Smith 24/8/2006
    ab@c.com John S. 12/9/2007

    I only wants to keep the latest record if two records have the same email, no matter if the names are the same or not.

    Thank you for your help.

    Mike

    Reply
  • For Mike may be you can use this Query:

    DELETE table_customers
    FROM (SELECT Email, MAX(Date) AS Date
    FROM table_customers
    GROUP BY Email
    HAVING Email = ‘ab@c.com’)) DERIVEDTBL
    WHERE table_customers.Email = DERIVEDTBL.Email AND table_customers.Date DERIVEDTBL.Date

    Reply
  • Hi

    I have a question. Below is the Employee table
    EmpID EmpName Salary
    ———– —————————– ———-
    1 Aashish 15000
    1 Aashish 15000
    3 Gunjan 25000
    3 Gunjan 25000
    5 Atul 35000
    6 Animesh 20000

    Now I have to keep only one record either of the two and remove the duplicate record. Condition is There should not be any use of temp tables, no primary keys, no identity columns, no cursors. Everything has to be handeled in a query . And Database is MS SQL Server 2000

    Reply
  • For Aashish Mangal, You Can Use Query Analyzer Like This:

    ALTER TABLE [Employee] ADD [MySpecialIdx] [int] IDENTITY (1, 1) NOT NULL

    Go

    DELETE [Employee]
    FROM (SELECT EmpID, EmpName, Salary, MAX(MySpecialIdx) AS mySpecialIdx
    FROM [Employee]
    GROUP BY EmpID, EmpName, Salary)) DERIVEDTBL
    WHERE table_customers.EmpID = DERIVEDTBL.EmpID AND [Employee].EmpName = DERIVEDTBL.EmpName AND AND
    [Employee].EmpName = DERIVEDTBL.EmpName AND Employee.myspecialIdx DerivedTbl.myspecialidx

    Go

    ALTER TABLE [Employee] DROP COLUMN MySpecialIdx

    Go

    Or You Can Use In View and Execute Step By Step

    Reply
  • The Next for Aashish Mangal, we can use this query to delete the duplicate record:

    WHILE (SELECT TOP 1 COUNT(id) AS Expr1
    FROM dbo.Table1
    GROUP BY id, FNAME, LNAME
    HAVING (COUNT(id) > 1)) > 1
    BEGIN
    set rowcount 1
    DELETE table1
    FROM (SELECT TOP 1 id, FNAME, LNAME
    FROM dbo.Table1
    GROUP BY id, FNAME, LNAME
    HAVING (COUNT(id) > 1)) DERIVEDTBL
    WHERE table1.id = derivedtbl.id AND table1.fname = derivedtbl.fname AND table1.lname = derivedtbl.lname
    END

    ==
    If this query will be saved, the first time you must add any table to query than remove the query text and replaced with the above query text, then you can saved this query with your desired name.

    Reply
  • Sorry, in the fact, that query only can be executed and can not be saved.

    Reply
  • Hi Pinal,

    I had the same query i tried using Rank() function.

    It worked for me.

    Attaching code here:

    WITH DeleteDuplicates
    AS
    (
    SELECT RANK() OVER (PARTITION BY DUPFIELD ORDER BY PKFIELD ) AS RANK, * FROM TABLENAME
    )
    DELETE FROM DeleteDuplicates WHERE RANK > 1

    Wish add more value to the topic.

    Thanks,

    Mit_2807

    Reply
  • hi sir i did like this to delete duplicate rows…

    sir i accomplished this using rank functions… and CTE

    1)create table emp (id int,name varchar(20))

    2) i inserted 1,’rakesh ‘ 3 times
    and 2,’sagar’ 2 times

    deleting duplicate rows…

    with cte
    as
    (
    select id,name,rank() over(order by id) r,row_number() over(order by id) rn from emp
    )
    delete from cte
    where r rn

    will this query degrade the performance or not ……

    Reply
  • hi sir

    could you please give script for sending SMTP mail

    USING SQLSERVER….

    Reply
  • Dear All,

    i have a table called Player_details in that i have a data’ like below without any key i want to delete one record and another in data how to do this.

    100 chennaiRoyals 9841998470
    101 DareDevils 9984725487
    101 DareDevils 9984725487

    Both are same as like.

    can u pls help me

    Reply
  • To Find the 2nd Highest Salary:

    2nd highest salary

    1)SELECT max(salary) FROM Employee
    WHERE salary < (SELECT max(salary) FROM employee)

    Purushot

    Reply
  • your query works great !!! finally I found an elegant way to solve the problem without exploiting new sql commands which in SQL 2005 don’t exist yet.

    Reply
  • Dear All,

    i have a table called Player_details in that i have a data’ like below without any key i want to delete one record and another in data how to do this.

    100 chennaiRoyals 9841998470
    101 DareDevils 9984725487
    101 DareDevils 9984725487

    Both are same as like.

    can u pls help me

    Reply
  • Dear Purushot

    Suppose that table Player_details contains fields idx, Name and ID, and the table contains records as follows:
    100 chennaiRoyals 9841998470
    101 DareDevils 9984725487
    101 DareDevils 9984725487
    103 DareDevilx 9984725487
    103 DareDevilx 9984725487
    104 DareDevily 9984725487
    104 DareDevily 9984725487
    104 DareDevily 9984725487

    The we want delete the duplicate record, those are the records with idx=101 and idx=103 and idx=104, so that just 1 record will be kept, we can follow these step:

    1. Please create this query and save as PlayerDetails_ForDeletingDuplicate
    SELECT *
    FROM dbo.Player_details a
    WHERE ((SELECT COUNT(*)
    FROM dbo.Player_details b
    WHERE a.idx = b.idx AND a.name = b.name AND a.id = b.id) > 1)

    2. Please run this query, if we found any record viewed, then go to next step

    3. Edit this query, change the line:
    select *
    with
    delete dbo.Player_details

    4. In the top this query add this line
    set rowcount 1

    5. Execute this query several times until no more records affected

    6. close this query without save

    Reply
  • Dear All,
    I have a table which can accept duplicate records, but it should not accept duplicate within 20 seconds.I want to delete duplicate row which has been created back to back in 20 seconds. Is there any way to delete these records? Please let me know if you need more info about this.

    Thanks in advance.
    Cheers,
    Saravanan

    Reply

Leave a Reply