SQL SERVER – Delete Duplicate Records – Rows

March 1, 2007

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE FROM MyTable WHERE ID NOT IN ( SELECT MAX(ID) FROM MyTable GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts

SQL SERVER – T-SQL Script to find the CD key from Registry

SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

SQL SERVER – FIX: Msg 15281 – SQL Server Blocked Access to STATEMENT ‘OpenRowset/OpenDatasource’ of Component ‘Ad Hoc Distributed Queries’

December 8, 2018

SQL SERVER – Difference Between Database Mail and SQLMail

July 5, 2008

SQL SERVER – Difference TempTable and Table Variable – Table Variable in Memory a Myth

December 15, 2009

450 Comments. Leave new

Nilanka
December 23, 2010 8:35 pm
Hi Guys
The best method to delete duplicates is CTE’s
code as follows:-
select * from my table
;With my table2
( select row1,
row_number over() ( partition by row1, order by row1 ) as a
delete from table2 where a > 1
Reply
koo
December 28, 2010 3:54 am
hi , can anyone explain the sql statement to me
since i am noobie and i not really understand the coding
thank alot :)
Reply
- madhivanan
  August 18, 2011 12:33 pm
  If there are duplicates in columns DuplicateColumn1, DuplicateColumn2, DuplicateColumn3 then the code will delete all related rows but keep the row with maximum id
  Reply
Chintan Dave
January 26, 2011 9:22 pm
Assume i have a temp table with some columns (as in the code )and i loaded that temp table with 1 lac record.
Table’s strucre is like below
Below table have 1 lac records.
Now there is one coloumn “PSGRES” also called as Sleep_stage_value, that having value btw 0 to 7 (0,1,2,3,4,5,6,7).
Now what i want is i want to delete the record that are repeating again.
Means lets assume First 10 records have “PSGRES” value as 7 and 11th to 14th record having value 1. so i want to delete the record from 2 to 10 as it having the same value 7 and after 12 to 14 as it have same value 2.
So after deleting those record in temp table , remaing records will be like below.
Result : (for all 1 lac record )
1st record with PSGRES value = 7
11 th record with PSGRES valie = 1
15 th record with PSGRES value = 2
and as on till end of the temp table.
CREATE TABLE #CDISK_MNewFinalAfterForCompare (
ROWID NUMERIC (8) IDENTITY (1, 1) NOT NULL,
TRANS_SOURCE_ID VARCHAR(100),
TRANS_DATE VARCHAR(100),
STUDYID VARCHAR(50), – Protocol Short Name —
TRANS_TYPE VARCHAR(50),
SITE_ID VARCHAR(50), – SITE_NUMBER
SCREEN_ID VARCHAR(50), – SCREEN_NUMBER
ALLOCATIONNUMBER VARCHAR(50), – ALLOCATIONNUMBER
PSG_VISIT_LABLE varchar(50), – VISIT_NUMBER
VISIT_TYPE VARCHAR(50), – MDB FILE , CUSTOMIZE 4
ACCESSIONID VARCHAR(100), – MDB FILE , Case_Number
ACCTUAL_COLL_DATATIME VARCHAR(100),
PSGTSCD VARCHAR(50), – PSGTSCD
VEN_LAB_TEST_NAME VARCHAR(50),
MER_LAB_TEST_ID VARCHAR(50),
MER_LAB_TEST_NAME VARCHAR(50),
TEST_STATUS VARCHAR(50),
COMMENT varchar(255), – COMMENT
PSGRES VARCHAR(50), – PSGRES
TEXT_RESULT VARCHAR(50),
REPORT_UNIT VARCHAR(50),
RESULT_TYPE VARCHAR(50),
TRANSFER_TYPE VARCHAR(50),
DATA_TYPE VARCHAR(50)
)
So for that i am using loop and cursor to get this output but it is very slow.
So can you guys help me out achieve this in query with out using loops and cursor.
thanks.
Reply
Prabhu
February 21, 2011 12:20 pm
Hi Pinal,
I have a 280 records, it contains repeated 40 records for 7 times. I need to have only one set of 40 records. How can I do this. The total contains nearly 60 Lakhs rows. Please suggest me a good option to do this. Thanks in advance.
Regards,
Prabhu
Reply
- madhivanan
  February 22, 2011 8:18 pm
  If all columns are repeated use distinct in the SELECT clause
  Reply
Manish
February 22, 2011 11:32 am
Remove Duplicate Row without PK
[ Just Copy & Paste ]
declare @TBL table(EmpID int)
insert into @TBL(EmpID)
values(90020),(90020),(90021),(90022),(90023),(90022),(90025),(90022)
select * from @TBL order by EmpID
;with TBLCTE(EmpID,Ranking)
AS
(
select
EmpID,
Ranking = DENSE_RANK() over (PARTITION BY EmpID order by newID())
from @TBL
)
delete from TBLCTE where Ranking > 1
select * from @TBL order by EmpID
Reply
Mark
April 8, 2011 11:39 am
Hi guys,
Having trouble trying to figure out how I can get the ID column of my duplicate records.. ID is unique and required for delete.
Table is like so:
ID | Firstname | Surname
1 Fred Smith
2 Fred Jones
3 Alan Thompson
4 Mary Phillips
5 Fred Smith
6 David Jones
Expected result shows:
1 Fred Smith
5 Fred Smith
Currently I have the code below, works great, but no ID
SELECT firstname,surname,count(*)
FROM customer
group by firstname,surname
having count(*) > 1
Is this possible?
Appreciate any assistance..
Reply
- madhivanan
  August 18, 2011 12:31 pm
  select t1.* from customer as t1 inner join
  (
  SELECT firstname,surname,count(*)
  FROM customer
  group by firstname,surname
  having count(*) > 1
  ) as t2 on t1.firstname=t2.firstname and t1.surname=t2.surname
  Reply
Syelina
April 27, 2011 7:59 am
Hi all programmers…
I am glad know this site,
Reply
NandaGanesh
April 28, 2011 12:22 pm
Hi,
Let us assume that we have duplicate records in emp table.
delete from emp
where empid not in (select distinct(empid) from emp).
Think this query works out in deleting the duplicate rows in a table.
Reply
- madhivanan
  April 28, 2011 7:35 pm
  This is not correct. It will not delete any row in the table
  Reply
Sanket
May 23, 2011 2:35 pm
Shut up. This is correct.
Reply
Brajesh Goswami
July 4, 2011 6:25 pm
DECLARE temp CURSOR FOR SELECT id FROM A GROUP BY A.ID
DECLARE @id int
OPEN temp
FETCH next FROM temp
INTO @id
WHILE @@FETCH_STATUS=0
BEGIN
IF EXISTS (SELECT id FROM GROUP BY id HAVING count(*)>1 AND id=@id)
DELETE TOP(SELECT count(*)-1 FROM WHERE id=@id )FROM WHERE id=@id
FETCH next FROM temp
INTO @id
END
CLOSE temp
DEALLOCATE temp
Reply
- madhivanan
  July 5, 2011 8:00 pm
  You dont need to use a cursor for this. Refer point 6 at
  Reply
Rajesh
August 11, 2011 8:48 pm
Hi,pinal,
when we handling a large database like 10GB size of records.
how can i find duplicates and delete that duplicates.
finding duplicate is not a problem but how to remove those duplicates ?
Please suggest me.
Thank u!
Reply
- madhivanan
  August 18, 2011 12:28 pm
  Refer method 6
  Reply
Rojo
August 18, 2011 11:36 am
Hi,
Find another method using loop.
Consider the table #temp as below,
Id Value
——————
1 100
2 200
2 200
3 300
3 300
3 300
DECLARE @Cnt INT
SET @Cnt = 1
WHILE @Cnt IS NOT NULL
BEGIN
SELECT TOP 1 @Cnt = COUNT(Id) FROM #Test
GROUP BY Id,Value
HAVING COUNT(Id) > 1
ORDER BY Id,Value
IF @Cnt IS NULL OR @Cnt = 1 RETURN
SET @Cnt = @Cnt-1
SET ROWCOUNT @Cnt
DELETE t1
FROM #Test t1
JOIN
(SELECT TOP 1 Id,Value,COUNT(Id) Cnt FROM #Test
GROUP BY Id,Value
HAVING COUNT(Id) > 1
ORDER BY Id,Value) t2 ON t2.Id = t1.Id AND t2.Value = t1.Value
SET ROWCOUNT 0
END
Reply
- madhivanan
  August 18, 2011 12:27 pm
  This will be very ineffecient for large table. You can use other methods like point 6 here
  Reply
Nikhildas
August 26, 2011 11:42 am
CREATE TABLE [dbo].[BankMaster](
[BankID] [int] NOT NULL,
[Bank Name] [nvarchar](50) NOT NULL,
[Address] [nvarchar](max) NULL,
CONSTRAINT [PK_BankMaster] PRIMARY KEY CLUSTERED
(
[BankID] ASC
)
)
INSERT INTO BankMaster
VALUES
(1,’ING Vysya’,’ING Vysya Eranamkulam’),
(2,’SBT’,’SBT Eranamkulam’),
(3,’Federal’,’Federal Bank Eranamkulam’),
(4,’Canara’,’Canara Eranamkulam’),
(5,’ING Vysya’,’ING Vysya Eranamkulam’),
(6,’SIB’,’SIB Eranamkulam’),
(7,’SBT’,’SBT Eranamkulam’),
(8,’Canara’,’Canara Eranamkulam’),
(9,’ING Vysya’,’ING Vysya Eranamkulam’),
(10,’SBI’,’SBI Eranamkulam’)
DECLARE @BankName NVARCHAR(50),@Cnt INT
DECLARE Cursor_Bank CURSOR LOCAL FAST_FORWARD READ_ONLY FOR
SELECT BankName,COUNT(BankName) – 1 from BankMaster
GROUP BY BankName HAVING COUNT(BankName) > 1
OPEN Cursor_Bank
FETCH NEXT FROM Cursor_Bank INTO @bankname,@Cnt
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET ROWCOUNT @Cnt
DELETE FROM BankMaster WHERE BankName like @bankname
SET ROWCOUNT 0
FETCH NEXT FROM Cursor_Bank INTO @bankname,@Cnt
END
CLOSE Cursor_Bank
DEALLOCATE Cursor_Bank
Reply
- madhivanan
  August 26, 2011 3:20 pm
  This will be slower for large table. Use other set based methods like point 6 specified here
  Reply
Rakesh Kumar Rautray
September 6, 2011 12:51 pm
delete from [BankMaster] where [BankID] in
(
select [BankID] from (select [BankID], row_number() over(partition by [Bank Name] order by [BankID])RNK from [BankMaster]) BankMaster
where RNK > 1
)
Reply
josekutty varghese
September 17, 2011 2:59 pm
Dear Pinal,
Select Distinct Row_Number() Over (Order By CandidateM.CanID Asc)As SlNo, CanCode,CanName from CandidateM Where CandidateM.CanCode like ‘09%’
Using this query how can I take rownumber of a distinct record….?
regards,
josekutty varghese
Reply
- madhivanan
  April 9, 2018 4:53 pm
  Try using rank function
  Select Rank() Over (Order By CandidateM.CanID Asc)As SlNo, CanCode,CanName from CandidateM Where CandidateM.CanCode like ‘09%’
  Reply
Riketa Salariya
October 3, 2011 5:28 pm
Thanks
Riketa Salariya & Dhiresh jawale
Reply
Varun
October 13, 2011 5:08 pm
I have a quite different request :
Below is the table with sample data that I have
Date Name Product Quantity
10/11/2011 Aname Abc1 20
10/10/2011 Aname Abc1 20
10/9/2011 Aname Abc1 20
10/8/2011 AName Xyz1 30
10/7/2011 AName Xyz1 20
10/6/2011 AName Abc1 20
10/5/2011 AName Abc1 20
10/4/2011 BName Xyz1 30
10/11/2011 Bname Abc2 20
10/10/2011 Bname Abc2 20
10/9/2011 Bname Abc2 20
10/8/2011 BName Xyz2 30
10/7/2011 BName Xyz2 20
10/6/2011 BName Abc2 20
10/5/2011 BName Abc2 20
10/4/2011 BName Xyz2 30
Output should be :
Id Start Date EndDate Name Product Quantity
1 10/9/2011 10/11/2011 Aname Abc1 20
2 10/8/2011 10/9/2011 Aname Xyz1 30
3 10/7/2011 10/8/2011 Aname Xyz1 20
4 10/5/2011 10/7/2011 Aname Abc1 20
5 10/4/2011 10/5/2011 Aname Xyz1 30
6 10/9/2011 10/11/2011 Bname Abc2 20
7 10/8/2011 10/9/2011 Bname Xyz2 30
8 10/7/2011 10/8/2011 Bname Xyz2 20
9 10/5/2011 10/7/2011 Bname Abc2 20
10 10/4/2011 10/5/2011 Bname Xyz2 30
I do not have primary key on my Table.
If you see the number of rows have been reduced.
I have more than 500 million records and by filtering this I would significantly the number of records in the table.
Is there any efficient query or sproc to do this, if so could you please help me.
I have tried using group by , partitions, ranking but nothing worked .
Thanks & Regards,
VPandit.
Reply
Varun
October 13, 2011 5:13 pm
The columns are date, name, product and quantity.
Reply
stuff
November 24, 2011 2:12 pm
Hi Sir
I am in a situation where I have the same CustomerID’s but different Customer names. They belong to the same Subsidiary Entity so the phone numbers and address must remain the same..
Please help
Reply
stuff
November 24, 2011 2:45 pm
…This is how it looks
CustomerID Name Address …
2101 ABC 123 blaa
2101 WQH 123 blaa
The Name and Address should remain the same but the CustomerID must be changed
Thank you
Reply