Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.
DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
Watch the view to see the above concept in action:
[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]
Reference : Pinal Dave (https://blog.sqlauthority.com)





450 Comments. Leave new
Hi Guys
The best method to delete duplicates is CTE’s
code as follows:-
select * from my table
;With my table2
( select row1,
row_number over() ( partition by row1, order by row1 ) as a
delete from table2 where a > 1
hi , can anyone explain the sql statement to me
since i am noobie and i not really understand the coding
thank alot :)
If there are duplicates in columns DuplicateColumn1, DuplicateColumn2, DuplicateColumn3 then the code will delete all related rows but keep the row with maximum id
Assume i have a temp table with some columns (as in the code )and i loaded that temp table with 1 lac record.
Table’s strucre is like below
Below table have 1 lac records.
Now there is one coloumn “PSGRES” also called as Sleep_stage_value, that having value btw 0 to 7 (0,1,2,3,4,5,6,7).
Now what i want is i want to delete the record that are repeating again.
Means lets assume First 10 records have “PSGRES” value as 7 and 11th to 14th record having value 1. so i want to delete the record from 2 to 10 as it having the same value 7 and after 12 to 14 as it have same value 2.
So after deleting those record in temp table , remaing records will be like below.
Result : (for all 1 lac record )
1st record with PSGRES value = 7
11 th record with PSGRES valie = 1
15 th record with PSGRES value = 2
and as on till end of the temp table.
CREATE TABLE #CDISK_MNewFinalAfterForCompare (
ROWID NUMERIC (8) IDENTITY (1, 1) NOT NULL,
TRANS_SOURCE_ID VARCHAR(100),
TRANS_DATE VARCHAR(100),
STUDYID VARCHAR(50), – Protocol Short Name —
TRANS_TYPE VARCHAR(50),
SITE_ID VARCHAR(50), – SITE_NUMBER
SCREEN_ID VARCHAR(50), – SCREEN_NUMBER
ALLOCATIONNUMBER VARCHAR(50), – ALLOCATIONNUMBER
PSG_VISIT_LABLE varchar(50), – VISIT_NUMBER
VISIT_TYPE VARCHAR(50), – MDB FILE , CUSTOMIZE 4
ACCESSIONID VARCHAR(100), – MDB FILE , Case_Number
ACCTUAL_COLL_DATATIME VARCHAR(100),
PSGTSCD VARCHAR(50), – PSGTSCD
VEN_LAB_TEST_NAME VARCHAR(50),
MER_LAB_TEST_ID VARCHAR(50),
MER_LAB_TEST_NAME VARCHAR(50),
TEST_STATUS VARCHAR(50),
COMMENT varchar(255), – COMMENT
PSGRES VARCHAR(50), – PSGRES
TEXT_RESULT VARCHAR(50),
REPORT_UNIT VARCHAR(50),
RESULT_TYPE VARCHAR(50),
TRANSFER_TYPE VARCHAR(50),
DATA_TYPE VARCHAR(50)
)
So for that i am using loop and cursor to get this output but it is very slow.
So can you guys help me out achieve this in query with out using loops and cursor.
thanks.
Hi Pinal,
I have a 280 records, it contains repeated 40 records for 7 times. I need to have only one set of 40 records. How can I do this. The total contains nearly 60 Lakhs rows. Please suggest me a good option to do this. Thanks in advance.
Regards,
Prabhu
If all columns are repeated use distinct in the SELECT clause
Remove Duplicate Row without PK
[ Just Copy & Paste ]
declare @TBL table(EmpID int)
insert into @TBL(EmpID)
values(90020),(90020),(90021),(90022),(90023),(90022),(90025),(90022)
select * from @TBL order by EmpID
;with TBLCTE(EmpID,Ranking)
AS
(
select
EmpID,
Ranking = DENSE_RANK() over (PARTITION BY EmpID order by newID())
from @TBL
)
delete from TBLCTE where Ranking > 1
select * from @TBL order by EmpID
Hi guys,
Having trouble trying to figure out how I can get the ID column of my duplicate records.. ID is unique and required for delete.
Table is like so:
ID | Firstname | Surname
1 Fred Smith
2 Fred Jones
3 Alan Thompson
4 Mary Phillips
5 Fred Smith
6 David Jones
Expected result shows:
1 Fred Smith
5 Fred Smith
Currently I have the code below, works great, but no ID
SELECT firstname,surname,count(*)
FROM customer
group by firstname,surname
having count(*) > 1
Is this possible?
Appreciate any assistance..
select t1.* from customer as t1 inner join
(
SELECT firstname,surname,count(*)
FROM customer
group by firstname,surname
having count(*) > 1
) as t2 on t1.firstname=t2.firstname and t1.surname=t2.surname
Hi all programmers…
I am glad know this site,
Hi,
Let us assume that we have duplicate records in emp table.
delete from emp
where empid not in (select distinct(empid) from emp).
Think this query works out in deleting the duplicate rows in a table.
This is not correct. It will not delete any row in the table
Shut up. This is correct.
DECLARE temp CURSOR FOR SELECT id FROM A GROUP BY A.ID
DECLARE @id int
OPEN temp
FETCH next FROM temp
INTO @id
WHILE @@FETCH_STATUS=0
BEGIN
IF EXISTS (SELECT id FROM GROUP BY id HAVING count(*)>1 AND id=@id)
DELETE TOP(SELECT count(*)-1 FROM WHERE id=@id )FROM WHERE id=@id
FETCH next FROM temp
INTO @id
END
CLOSE temp
DEALLOCATE temp
You dont need to use a cursor for this. Refer point 6 at
Hi,pinal,
when we handling a large database like 10GB size of records.
how can i find duplicates and delete that duplicates.
finding duplicate is not a problem but how to remove those duplicates ?
Please suggest me.
Thank u!
Refer method 6
Hi,
Find another method using loop.
Consider the table #temp as below,
Id Value
——————
1 100
2 200
2 200
3 300
3 300
3 300
DECLARE @Cnt INT
SET @Cnt = 1
WHILE @Cnt IS NOT NULL
BEGIN
SELECT TOP 1 @Cnt = COUNT(Id) FROM #Test
GROUP BY Id,Value
HAVING COUNT(Id) > 1
ORDER BY Id,Value
IF @Cnt IS NULL OR @Cnt = 1 RETURN
SET @Cnt = @Cnt-1
SET ROWCOUNT @Cnt
DELETE t1
FROM #Test t1
JOIN
(SELECT TOP 1 Id,Value,COUNT(Id) Cnt FROM #Test
GROUP BY Id,Value
HAVING COUNT(Id) > 1
ORDER BY Id,Value) t2 ON t2.Id = t1.Id AND t2.Value = t1.Value
SET ROWCOUNT 0
END
This will be very ineffecient for large table. You can use other methods like point 6 here
CREATE TABLE [dbo].[BankMaster](
[BankID] [int] NOT NULL,
[Bank Name] [nvarchar](50) NOT NULL,
[Address] [nvarchar](max) NULL,
CONSTRAINT [PK_BankMaster] PRIMARY KEY CLUSTERED
(
[BankID] ASC
)
)
INSERT INTO BankMaster
VALUES
(1,’ING Vysya’,’ING Vysya Eranamkulam’),
(2,’SBT’,’SBT Eranamkulam’),
(3,’Federal’,’Federal Bank Eranamkulam’),
(4,’Canara’,’Canara Eranamkulam’),
(5,’ING Vysya’,’ING Vysya Eranamkulam’),
(6,’SIB’,’SIB Eranamkulam’),
(7,’SBT’,’SBT Eranamkulam’),
(8,’Canara’,’Canara Eranamkulam’),
(9,’ING Vysya’,’ING Vysya Eranamkulam’),
(10,’SBI’,’SBI Eranamkulam’)
DECLARE @BankName NVARCHAR(50),@Cnt INT
DECLARE Cursor_Bank CURSOR LOCAL FAST_FORWARD READ_ONLY FOR
SELECT BankName,COUNT(BankName) – 1 from BankMaster
GROUP BY BankName HAVING COUNT(BankName) > 1
OPEN Cursor_Bank
FETCH NEXT FROM Cursor_Bank INTO @bankname,@Cnt
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET ROWCOUNT @Cnt
DELETE FROM BankMaster WHERE BankName like @bankname
SET ROWCOUNT 0
FETCH NEXT FROM Cursor_Bank INTO @bankname,@Cnt
END
CLOSE Cursor_Bank
DEALLOCATE Cursor_Bank
This will be slower for large table. Use other set based methods like point 6 specified here
delete from [BankMaster] where [BankID] in
(
select [BankID] from (select [BankID], row_number() over(partition by [Bank Name] order by [BankID])RNK from [BankMaster]) BankMaster
where RNK > 1
)
Dear Pinal,
Select Distinct Row_Number() Over (Order By CandidateM.CanID Asc)As SlNo, CanCode,CanName from CandidateM Where CandidateM.CanCode like ‘09%’
Using this query how can I take rownumber of a distinct record….?
regards,
josekutty varghese
Try using rank function
Select Rank() Over (Order By CandidateM.CanID Asc)As SlNo, CanCode,CanName from CandidateM Where CandidateM.CanCode like ‘09%’
Thanks
Riketa Salariya & Dhiresh jawale
I have a quite different request :
Below is the table with sample data that I have
Date Name Product Quantity
10/11/2011 Aname Abc1 20
10/10/2011 Aname Abc1 20
10/9/2011 Aname Abc1 20
10/8/2011 AName Xyz1 30
10/7/2011 AName Xyz1 20
10/6/2011 AName Abc1 20
10/5/2011 AName Abc1 20
10/4/2011 BName Xyz1 30
10/11/2011 Bname Abc2 20
10/10/2011 Bname Abc2 20
10/9/2011 Bname Abc2 20
10/8/2011 BName Xyz2 30
10/7/2011 BName Xyz2 20
10/6/2011 BName Abc2 20
10/5/2011 BName Abc2 20
10/4/2011 BName Xyz2 30
Output should be :
Id Start Date EndDate Name Product Quantity
1 10/9/2011 10/11/2011 Aname Abc1 20
2 10/8/2011 10/9/2011 Aname Xyz1 30
3 10/7/2011 10/8/2011 Aname Xyz1 20
4 10/5/2011 10/7/2011 Aname Abc1 20
5 10/4/2011 10/5/2011 Aname Xyz1 30
6 10/9/2011 10/11/2011 Bname Abc2 20
7 10/8/2011 10/9/2011 Bname Xyz2 30
8 10/7/2011 10/8/2011 Bname Xyz2 20
9 10/5/2011 10/7/2011 Bname Abc2 20
10 10/4/2011 10/5/2011 Bname Xyz2 30
I do not have primary key on my Table.
If you see the number of rows have been reduced.
I have more than 500 million records and by filtering this I would significantly the number of records in the table.
Is there any efficient query or sproc to do this, if so could you please help me.
I have tried using group by , partitions, ranking but nothing worked .
Thanks & Regards,
VPandit.
The columns are date, name, product and quantity.
Hi Sir
I am in a situation where I have the same CustomerID’s but different Customer names. They belong to the same Subsidiary Entity so the phone numbers and address must remain the same..
Please help
…This is how it looks
CustomerID Name Address …
2101 ABC 123 blaa
2101 WQH 123 blaa
The Name and Address should remain the same but the CustomerID must be changed
Thank you