SQL SERVER – Select and Delete Duplicate Records – SQL in Sixty Seconds #036 – Video

December 19, 2012

SQL, SQL Server, SQL Tips and Tricks, SQL Video

SQL SERVER - Select and Delete Duplicate Records - SQL in Sixty Seconds #036 - Video 60x60 Developers often face situations when they find their column have duplicate records and they want to delete it. A good developer will never delete any data without observing it and making sure that what is being deleted is the absolutely fine to delete. Before deleting duplicate data, one should select it and see if the data is really duplicate.

In this video we are demonstrating two scripts – 1) selects duplicate records 2) deletes duplicate records.

We are assuming that the table has a unique incremental id. Additionally, we are assuming that in the case of the duplicate records we would like to keep the latest record. If there is really a business need to keep unique records, one should consider to create a unique index on the column. Unique index will prevent users entering duplicate data into the table from the beginning. This should be the best solution. However, deleting duplicate data is also a very valid request. If user realizes that they need to keep only unique records in the column and if they are willing to create unique constraint, the very first requirement of creating a unique constraint is to delete the duplicate records.

Let us see how to connect the values in Sixty Seconds:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Here is the script which is used in the video.

USE tempdb GO CREATE TABLE TestTable (ID INT, NameCol VARCHAR(100)) GO INSERT INTO TestTable (ID, NameCol) SELECT 1, 'First' UNION ALL SELECT 2, 'Second' UNION ALL SELECT 3, 'Second' UNION ALL SELECT 4, 'Second' UNION ALL SELECT 5, 'Second' UNION ALL SELECT 6, 'Third' GO -- Selecting Data SELECT * FROM TestTable GO -- Detecting Duplicate SELECT NameCol, COUNT(*) TotalCount FROM TestTable GROUP BY NameCol HAVING COUNT(*) > 1 ORDER BY COUNT(*) DESC GO -- Deleting Duplicate DELETE FROM TestTable WHERE ID NOT IN ( SELECT MAX(ID) FROM TestTable GROUP BY NameCol) GO -- Selecting Data SELECT * FROM TestTable GO DROP TABLE TestTable GO

Related Tips in SQL in Sixty Seconds:

What would you like to see in the next SQL in Sixty Seconds video?

Reference: Pinal Dave (https://blog.sqlauthority.com)

Database, SQL in Sixty Seconds, SQL Scripts, SQL Server Management Studio

SQL SERVER – Select the Most Optimal Backup Methods for Server

SQL SERVER – Difference Between CURRENT_TIMESTAMP and GETDATE() – CURRENT_TIMESTAMP Equivalent in SQL Server

25 Comments. Leave new

Sanjay Monpara
December 19, 2012 12:16 pm
In oracle we can use ROWID Pseudocolumn,
It helps to delete records even if table not contains incremental unique id & without using another temp table.
Is there any method in sql server to delete records in such situation without using another(temp) table?
Reply
Naren
December 19, 2012 4:14 pm
@Sanjay
You can use ROW_NUMBER() function in this case
Reply
Sanjay Monpara
December 19, 2012 6:22 pm
suppose single column table cantains records like below
NameCol
——–
First
Second
Second
Second
Second
Third
Then how its possible to delete duplicate records using row_number() function.
even if SSMS not allows to delete single record from duplicate set & showing error.
Reply
- Mandar Shindagi
  March 7, 2013 6:01 pm
  Sanjay, I think you are overcomplicating this. If you have a single column table, do a SELECT DISTINCT into a temp table, truncate your single column table and INSERT the distinct values back from temp table.
  You may want to go one step further by adding a PK constraint for future proofing.
  Hope this helps.
  Pinal, nice post mate.
  Reply
Kaz
December 19, 2012 9:27 pm
You will need to append a new column in your table, and then update it with row_number over (partition by namecol order by namecol), and then execute a delete command, delete from tabel where row_num > 1
Reply
- Sanjay Monpara
  December 20, 2012 10:57 am
  One doubt,
  we have to pass where condition to update records, then here possibility to update same row_num for duplicate records, isn’t it?
  Reply
Garrett
December 20, 2012 8:33 am
You don’t need to append a new column in the table, that is incorrect. You can utilize the row_number() and delete the duplicate records without creating a new column on the table.
Reply
- Ali Kolahdoozan
  December 20, 2012 9:50 am
  How could I user Row_Number in Delete Statement ?.If we did not have ID column , we are not able to delete duplicated record.
  Reply
Scott McPhaed
December 21, 2012 12:43 am
You can also use newID():
WITH cte(NameCol, RankField)
AS (SELECT NameCol
, RankField = DENSE_RANK()
OVER (
PARTITION BY NameCol
ORDER BY newID())
FROM
TestTable)
DELETE FROM cte
WHERE RankField > 1
Reply
- Sanjay Monpara
  December 21, 2012 11:04 am
  Very good Scott
  Its perfect!!!
  Reply
DoronF
March 7, 2013 4:06 am
Hi Pinal
I appreciate the info in this article.
Removing duplicate records could be a tedious job especially when you have hundreds of millions of records in one table. When removing duplicate records you may want to combine different fields’ values in one good record. Say you have a Phone Number in one records and DOB in another record but they are duplicate based on First, Last and Address.
Here is an article that will discuss the above issues and how to resolve the problem while integrating all required fields in one good record.
Any feedback is appreciated.
Regards,
Doron
Reply
Anuj Grewal
April 12, 2013 4:11 pm
DELETE
FROM TestTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM TestTable
GROUP BY NameCol)
this is not working ……………
Reply
Ramesh
September 23, 2013 3:30 pm
If the ID 6 has also duplicated records, I am unable to delete this record,Can anyone please say how to delete it.
Reply
avp
October 29, 2013 3:22 am
I found a easy way to remove duplicate records from tables
take script of all the Functions, constraints, triggers etc.
1.select distinct col,col2,col3,col4.. into dbname.dbo.TablewithoutDuplicates from dbname.dbo.TablewithDuplicates
2.create function,keys,triggers on new table “tablewithoutDuplicates”
3.Delete old table “TableWithDuplicates”
4. Rename the new table
Reply
Mr. Mann
November 7, 2013 3:58 am
Thanks. It worked fine.
Reply
João Paulo
January 22, 2014 4:57 pm
for 1 column is easy. for 2 columns?
Reply
Mathew
February 1, 2014 7:04 pm
Thank for this nice post. I got a great solution here that will be helpful to me for delete duplicate records in sql.
Reply
Aashika
July 15, 2015 12:57 pm
Thanks for the informative solution. Does it will work for huge data, I mean would it take long time if records are in millions. I am searching a technique to delete duplicates with very few time. Thanks
Reply
- Pinal Dave
  July 16, 2015 5:25 am
  Aashika – I have not tested it. If you can do that, please share the results.
  Reply
narmu
July 21, 2015 9:11 am
CREATE PROCEDURE [dbo].[Proc_deletedups] (@tablename varchar(30))
As — Add the parameters for the stored procedure here
BEGIN
SET NOCOUNT ON;
declare @deletequery varchar(max), @insertquery varchar(max)
— SET NOCOUNT ON added to prevent extra result sets from
— interfering with SELECT statements.
— delete dups from dbo.billtoparty table
Select @deletequery = ‘
DELETE FROM dbo.’+@tablename + ‘ WHERE ID NOT IN
(
SELECT MAX(ID)
FROM dbo.’ + @tablename +
‘ GROUP BY QuoteNumber, Quotedate, Quoteversion)’;
exec (@deletequery);
–insert unique rows into tar.billtoparty table
select @insertquery = ‘
insert into tar.’ + @tablename + ‘ (QuoteNumber, QuoteDate, QuoteVersion, AccountNumber, AccountName, SuperUserFirstName, SuperUserLastName, SuperUserEmail, ProcurementUserFirstName, ProcurementUserLastName, ProcurementUserEmail, CountryDialing, AreaDialing, LandNumber, Extension, Mobile, AddressLine1, AddressLine2, AddressLine3, Province, Zipcode, City, State, County, Country)
(select QuoteNumber, QuoteDate, QuoteVersion, AccountNumber, AccountName, SuperUserFirstName, SuperUserLastName, SuperUserEmail, ProcurementUserFirstName, ProcurementUserLastName, ProcurementUserEmail, CountryDialing, AreaDialing, LandNumber, Extension, Mobile, AddressLine1, AddressLine2, AddressLine3, Province, Zipcode, City, State, County, Country from dbo.’ + @tablename + ‘)’;
exec (@insertquery);
END
GO
i tried the above stored procedure. where i have to do it for 7 similar tables. so i created one and in a ssis task i will write the command with exec sp with table name and pass the parameters. but i guess it will affect the performance. is there any way i can do it better?
Reply
Roshan Maharjan
December 24, 2015 4:38 pm
you can’t specify target table ‘student’ for update in FROM clause
Reply
hari
September 30, 2016 2:04 pm
How to select the duplicate records , without using > symbol ?
Reply
Ragavendhran N
January 13, 2017 4:33 pm
As for in the above statement it would delete the only one instance of duplicate. What if there are many?
May I suggest for this.
WITH T1 AS
(
SELECT LASTNAME FROM TestTable GROUP BY NameCol HAVING COUNT(1) > 1
)
DELTE FROM TestTable T INNER JOIN T1 ON T1.NameCol = E.NameCol
DELETE, SELECT we can do whatever we wish
Reply
- Pinal Dave
  February 7, 2017 5:43 am
  Ragavendhran – Very interesting.
  Reply
111@hotmail.com
September 20, 2018 11:31 pm
awesome. worked great. cheers, brother
Reply