SQL SERVER – Select and Delete Duplicate Records – SQL in Sixty Seconds #036 – Video

Developers often find that a column contains duplicate records and want to delete them. A good developer never deletes data without first inspecting it and confirming that it is safe to delete. Before deleting duplicate data, one should select it and verify that it really is duplicate.

In this video we demonstrate two scripts: 1) one that selects duplicate records and 2) one that deletes them.

We are assuming that the table has a unique incremental id. We are also assuming that, among duplicate records, we would like to keep the latest one, that is, the row with the highest id. If there is a genuine business need to keep the values unique, one should consider creating a unique index on the column. A unique index prevents users from entering duplicate data into the table in the first place, which makes it the best solution. However, deleting duplicate data is also a very valid request. If users realize that they need to keep only unique records in the column and are willing to create a unique constraint, the very first requirement is to delete the existing duplicate records, because the constraint cannot be created while duplicates exist.
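
As a rough illustration of how a unique index blocks duplicates from the start, here is a minimal sketch. The temp table #UniqueDemo and the index name UX_UniqueDemo_NameCol are made up for this example and are not part of the video script.

```sql
-- Sketch only: #UniqueDemo and UX_UniqueDemo_NameCol are hypothetical names.
CREATE TABLE #UniqueDemo (ID INT IDENTITY(1,1), NameCol VARCHAR(100));
GO
-- The unique index is created while the table has no duplicates.
CREATE UNIQUE INDEX UX_UniqueDemo_NameCol ON #UniqueDemo (NameCol);
GO
INSERT INTO #UniqueDemo (NameCol) VALUES ('First');
GO
-- Inserting the same value again fails with a duplicate key error.
INSERT INTO #UniqueDemo (NameCol) VALUES ('First');
GO
-- Only the first row was stored.
SELECT ID, NameCol FROM #UniqueDemo;
GO
DROP TABLE #UniqueDemo;
GO
```

If the table already held duplicate values, the CREATE UNIQUE INDEX statement itself would fail, which is why the duplicates have to be removed first.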

Let us see how to select and delete duplicate records in Sixty Seconds:

Here is the script which is used in the video.

USE tempdb
GO
CREATE TABLE TestTable (ID INT, NameCol VARCHAR(100))
GO
INSERT INTO TestTable (ID, NameCol)
SELECT 1, 'First'
UNION ALL
SELECT 2, 'Second'
UNION ALL
SELECT 3, 'Second'
UNION ALL
SELECT 4, 'Second'
UNION ALL
SELECT 5, 'Second'
UNION ALL
SELECT 6, 'Third'
GO
-- Selecting Data
SELECT *
FROM TestTable
GO
-- Detecting Duplicate
SELECT NameCol, COUNT(*) TotalCount
FROM TestTable
GROUP BY NameCol
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
GO
-- Deleting Duplicate
DELETE
FROM
TestTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM TestTable
GROUP BY NameCol)
GO
-- Selecting Data
SELECT *
FROM TestTable
GO
DROP TABLE TestTable
GO
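
In the spirit of looking before deleting, one possible variation is to preview the exact rows the DELETE would remove and to run the DELETE inside a transaction. This is only a sketch: the temp table #PreviewDemo is a hypothetical copy of TestTable, and the transaction wrapper is an addition that the video script does not use.

```sql
-- Sketch only: #PreviewDemo is a hypothetical temp table mirroring TestTable.
CREATE TABLE #PreviewDemo (ID INT, NameCol VARCHAR(100));
INSERT INTO #PreviewDemo (ID, NameCol)
SELECT 1, 'First'
UNION ALL SELECT 2, 'Second'
UNION ALL SELECT 3, 'Second'
UNION ALL SELECT 4, 'Second'
UNION ALL SELECT 5, 'Second'
UNION ALL SELECT 6, 'Third';
GO
-- Preview: same WHERE clause as the DELETE, so these are the rows that would be removed.
-- With this data it returns IDs 2, 3 and 4.
SELECT ID, NameCol
FROM #PreviewDemo
WHERE ID NOT IN (SELECT MAX(ID) FROM #PreviewDemo GROUP BY NameCol)
ORDER BY ID;
GO
BEGIN TRANSACTION;
DELETE FROM #PreviewDemo
WHERE ID NOT IN (SELECT MAX(ID) FROM #PreviewDemo GROUP BY NameCol);
-- Inspect what is left (IDs 1, 5 and 6) before making the change permanent.
SELECT ID, NameCol FROM #PreviewDemo ORDER BY ID;
-- Use COMMIT TRANSACTION once the result looks right.
-- The sketch rolls back so that it leaves the data untouched.
ROLLBACK TRANSACTION;
GO
DROP TABLE #PreviewDemo;
GO
```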

What would you like to see in the next SQL in Sixty Seconds video?

Reference: Pinal Dave (http://blog.sqlauthority.com)

18 thoughts on “SQL SERVER – Select and Delete Duplicate Records – SQL in Sixty Seconds #036 – Video”

  1. In Oracle we can use the ROWID pseudocolumn.
    It helps to delete records even if the table does not contain an incremental unique id, and without using another temp table.

    Is there any method in SQL Server to delete records in such a situation without using another (temp) table?

  2. Suppose a single-column table contains records like below

    NameCol
    -------
    First
    Second
    Second
    Second
    Second
    Third

    Then how is it possible to delete duplicate records using the row_number() function?
    Even SSMS does not allow deleting a single record from a duplicate set and shows an error.

    • Sanjay, I think you are overcomplicating this. If you have a single column table, do a SELECT DISTINCT into a temp table, truncate your single column table and INSERT the distinct values back from temp table.
      You may want to go one step further by adding a PK constraint for future proofing.
      Hope this helps.

      Pinal, nice post mate.
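
      A minimal sketch of the steps described in this reply, assuming a single-column table. The temp tables #SingleCol and #DistinctValues are hypothetical names, and wrapping the truncate and re-insert in a transaction is an added precaution, not something the reply asks for.

      ```sql
      -- Sketch only: #SingleCol and #DistinctValues are hypothetical names.
      CREATE TABLE #SingleCol (NameCol VARCHAR(100));
      INSERT INTO #SingleCol (NameCol)
      SELECT 'First'
      UNION ALL SELECT 'Second'
      UNION ALL SELECT 'Second'
      UNION ALL SELECT 'Second'
      UNION ALL SELECT 'Second'
      UNION ALL SELECT 'Third';
      GO
      -- Step 1: copy the distinct values into a temp table.
      SELECT DISTINCT NameCol
      INTO #DistinctValues
      FROM #SingleCol;
      GO
      -- Steps 2 and 3: empty the original table and load the distinct values back.
      BEGIN TRANSACTION;
      TRUNCATE TABLE #SingleCol;
      INSERT INTO #SingleCol (NameCol)
      SELECT NameCol FROM #DistinctValues;
      COMMIT TRANSACTION;
      GO
      -- One row each for First, Second and Third remains.
      SELECT NameCol FROM #SingleCol;
      GO
      DROP TABLE #DistinctValues;
      DROP TABLE #SingleCol;
      GO
      ```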

  3. You will need to append a new column to your table, update it with row_number() over (partition by namecol order by namecol), and then execute a delete command: delete from table where row_num > 1

    • One doubt:
      we have to pass a where condition to update records, so isn’t there a possibility of updating the same row_num for duplicate records?

  4. You don’t need to append a new column in the table, that is incorrect. You can utilize the row_number() and delete the duplicate records without creating a new column on the table.
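
    A minimal sketch of what this comment describes, assuming a single-column table with no id at all. The temp table #NoKeyDemo and the CTE name Numbered are hypothetical. Because the duplicate rows are identical, it does not matter which one of each set survives, so ORDER BY (SELECT NULL) is used to leave the numbering order unspecified.

    ```sql
    -- Sketch only: #NoKeyDemo and Numbered are hypothetical names.
    CREATE TABLE #NoKeyDemo (NameCol VARCHAR(100));
    INSERT INTO #NoKeyDemo (NameCol)
    SELECT 'First'
    UNION ALL SELECT 'Second'
    UNION ALL SELECT 'Second'
    UNION ALL SELECT 'Second'
    UNION ALL SELECT 'Second'
    UNION ALL SELECT 'Third';
    GO
    -- Number the rows within each NameCol group, then delete every row after the first.
    WITH Numbered AS
    (
        SELECT NameCol,
               ROW_NUMBER() OVER (PARTITION BY NameCol ORDER BY (SELECT NULL)) AS RowNum
        FROM #NoKeyDemo
    )
    DELETE FROM Numbered
    WHERE RowNum > 1;
    GO
    -- One row each for First, Second and Third remains.
    SELECT NameCol FROM #NoKeyDemo;
    GO
    DROP TABLE #NoKeyDemo;
    GO
    ```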

  5. You can also use newID():

    WITH cte(NameCol, RankField)
    AS (SELECT NameCol
             , RankField = DENSE_RANK()
                           OVER (
                               PARTITION BY NameCol
                               ORDER BY newID())
        FROM TestTable)
    DELETE FROM cte
    WHERE RankField > 1

  6. Pingback: SQL SERVER – Fix: Error: 1505 The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name and the index name « SQL Server Journey with SQL Authority

  7. Hi Pinal

    I appreciate the info in this article.

    Removing duplicate records can be a tedious job, especially when you have hundreds of millions of records in one table. When removing duplicate records you may want to combine different fields’ values into one good record. Say you have a phone number in one record and a DOB in another, but they are duplicates based on First, Last and Address.

    Here is an article that discusses the above issues and how to resolve the problem while integrating all required fields into one good record. http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx
    Any feedback is appreciated.

    Regards,

    Doron

  8. DELETE
    FROM TestTable
    WHERE ID NOT IN
    (
    SELECT MAX(ID)
    FROM TestTable
    GROUP BY NameCol)
    this is not working ……………

  9. I found an easy way to remove duplicate records from tables.
    First take a script of all the functions, constraints, triggers etc.
    1. select distinct col,col2,col3,col4.. into dbname.dbo.TablewithoutDuplicates from dbname.dbo.TablewithDuplicates
    2. Create the functions, keys and triggers on the new table “TablewithoutDuplicates”
    3. Delete the old table “TablewithDuplicates”
    4. Rename the new table
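
    A rough sketch of these steps, assuming it runs in tempdb and that no tables with these names exist there already. The table and column names are hypothetical, and step 2 (recreating keys, triggers and so on) is left as a comment because it depends entirely on the real table.

    ```sql
    -- Sketch only: table and column names are hypothetical.
    USE tempdb;
    GO
    CREATE TABLE dbo.TableWithDuplicates (Col1 INT, Col2 VARCHAR(100));
    INSERT INTO dbo.TableWithDuplicates (Col1, Col2)
    SELECT 1, 'First'
    UNION ALL SELECT 2, 'Second'
    UNION ALL SELECT 2, 'Second'
    UNION ALL SELECT 3, 'Third';
    GO
    -- Step 1: copy the distinct rows into a new table.
    SELECT DISTINCT Col1, Col2
    INTO dbo.TableWithoutDuplicates
    FROM dbo.TableWithDuplicates;
    GO
    -- Step 2: recreate keys, constraints and triggers on the new table here.
    -- Step 3: drop the old table.
    DROP TABLE dbo.TableWithDuplicates;
    GO
    -- Step 4: give the new table the old name.
    EXEC sp_rename 'dbo.TableWithoutDuplicates', 'TableWithDuplicates';
    GO
    -- Three distinct rows remain under the original name.
    SELECT Col1, Col2 FROM dbo.TableWithDuplicates;
    GO
    DROP TABLE dbo.TableWithDuplicates;
    GO
    ```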
