SQL SERVER – Delete Duplicate Records – Rows

March 1, 2007

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE FROM MyTable WHERE ID NOT IN ( SELECT MAX(ID) FROM MyTable GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts

SQL SERVER – T-SQL Script to find the CD key from Registry

SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

SQL SERVER – Practical SQL Server XML: Part One – Query Plan Cache and Cost of Operations in the Cache

March 17, 2009

SQL SERVER – SUM(1) vs COUNT(*) – Performance Observation

June 25, 2021

SQL SERVER – Return Specific Row to at the Bottom of the Resultset – T-SQL Script – Part 2

April 25, 2013

450 Comments. Leave new

Jaanoo
September 7, 2007 8:24 am
Awesome man. I have used you delete duplicate records query. And it works great.
Thanks
Reply
Scollege
September 14, 2007 11:30 am
Unfortunately, this query doesn’t seem to work when attempting to delete a row that has duplicate id’s but different data.
Can’t believe how hard it is to find something simple that will work.
Reply
- Madhivanan
  May 17, 2010 1:26 pm
  Post some sample data with expected result so that you will get the proper query that would work in your case
  Reply
Aditya
September 23, 2007 2:07 am
Why don’e we do a self union to the table.
select * from mytable union select * from mytable,
thsi query returns the exact records, and i guess we can use this data to build a new table or replacing the previous one.
Thoughts on this please
Reply
- Madhivanan
  May 12, 2010 4:51 pm
  Note that this is equivalent to using
  select * from mytable
  Reply
bingo
September 25, 2007 10:38 am
I am trying to design an query which functions like ICETOOL of mainframe it also requires this kind of duplicate deletions…
ICETOOL in my application (sorts, merges and separates) duplicates and uniques from 4 input file into two output files(one containing unique ids and other containing its duplicate ids with different data) and discards the duplicates which occurs more than twice…
i have the 4 files as tables now i need to get two output tables
help will be much appreciated………
Reply
Santosh
September 26, 2007 7:05 am
Hi
Sir can u please let me how to pass aliase name in Stored
procedure when I am using table as Parameter in A SP
Reply
- Madhivanan
  May 17, 2010 1:30 pm
  This is very bad design
  You need to make use of dynamic SQL
  Refer this
  http://www.sommarskog.se/dynamic_sql.html
  Reply
Emil
September 26, 2007 11:51 am
How would your solution work if I had a single table called Customers that looks like this:
ID LastName FirstName
1 Smith John Adam
2 Smith John A.
3 Jones Jane Mary
4 Jones Jane M.
No other fields are in this table. Just what you see. The ID field is unique for each record. In this example, both records for Smith are the same person. The same goes for Jones. I want to view in a table the resulting unique records (i.e. only one record for John Smith, and one record for Jane Jones).
I am a complete newbie and would appreciate guidance.
Thanks!
Reply
- Madhivanan
  May 17, 2010 1:44 pm
  You can use any one of the methods specified here depending on the version you are using
  Reply
Aditya
September 27, 2007 5:48 am
If you have Key already, you have seen many solutions in this post before, you need to group by on the repeating/columns that can be duplicated and delete them. The answered solution is if you dont have a unique key in the table.
well to add more to that, once you do self union and insert into temp table, you can delete every thing form main table and insert every thing from temp table in to the original/main table. I think this should help..
-Aditya
Reply
Joe
October 8, 2007 4:38 am
SELECT id,name,mNumber,sNumber,partOfSpeech,type FROM keyword where id IN (Select max(id) from keyword GROUP BY name)
The above is if you want to just select a column without duplicates in it.
Reply
Ravish Bharti
October 11, 2007 4:15 am
create table T1 (C1 int, C2 varchar(3))
–Fill it with duplicate Records
declare @as Table (c1 int,c2 varchar(3))
insert into @as select Distinct * from T1
Truncate table T1
Alter Table T1 Disable Trigger All
insert into T1 select * from @As
Alter Table T1 Enable Trigger All
Reply
Firdaus Juzup
October 19, 2007 10:13 pm
Thanks Mr. Dave for the codes.
I’ve searched inside out in the internet but still couldn’t find a solution for my problem.
Your code manage to solved it, your truly a master in SQL. Thanks again
Reply
Mhlove
November 12, 2007 12:18 am
Can you show how to make this a select Into, so that one can save the deleted records in another table
DELETE
FROM MyTable
WHERE ID NOT IN
(SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicatevalueColumn1, DuplicateValueColumn2, DuplicateValueColumn2
Reply
- Madhivanan
  May 17, 2010 1:46 pm
  select * into newtable
  FROM MyTable
  WHERE ID NOT IN
  (SELECT MAX(ID)
  FROM MyTable
  GROUP BY DuplicatevalueColumn1, DuplicateValueColumn2, DuplicateValueColumn3
  Reply
raja
November 13, 2007 5:46 pm
I thing this one alos workout
====================
set rowcount 1
delete test
from test a
where(select count(*) from test b where b.age = a.age and b.name = a.name)
> 1
while @@rowcount>0
delete test
from test a
where(select count(*) from test b where b.age = a.age and b.name = a.name)
>1
set rowcount 0
Reply
- Madhivanan
  May 17, 2010 1:47 pm
  Note that for large number of datasets it will take long time to finish the execution. You should stay away from using this method
  Reply
Dilip Mohapatro
November 13, 2007 6:58 pm
//Here n is maximum allowable number of a record
create or replace procedure t_dili1_2
(n in int,q in out sys_refcursor)
as
begin
open q for
select a,b
from dil1 where (a,b) in(
select a,b from (select a,b,count(*) from t group by a,b having count(*)<n));
end t_dili1_2;
/
variable z refcursor
exec t_dili1_2(4,:z)
print z
Reply
Dilip Kumar Mohapatro
November 13, 2007 7:05 pm
Plz Reffer to my (above) comment (PL/SQL Block)..
Here dil1 is name of TABLE.
a,b are columns with repeated records.
I did is Using REFCURSOR.
Reply
- Madhivanan
  May 17, 2010 1:48 pm
  Note that this site is for MS SQL Server
  For ORACLE, post at http://www.orafaq.com
  Reply
Dilip Kumar Mohapatro
November 13, 2007 7:09 pm
Q: Display all the records of a table t (where there are several duplicated & reduplicated records present) with restricted number (n) of distinct records. (Use REFCURSOR)
Solution:
create or replace procedure t_dili1_2
(n in int,q in out sys_refcursor)
as
begin
open q for
select a,b
from t where (a,b) in(
select a,b from (select a,b,count(*) from t group by a,b having count(*)<n));
end t_dilip1_2;
/
variable z refcursor
exec t_dili1_2(4,:z)
print z
Reply
SQL Novice
November 13, 2007 8:13 pm
Pinal,
You are truly a SQL guru and I love your website and your articles.
The statement above that deletes duplicate records worked perfectly for me.
Thanks.
Reply
maybemedic
November 14, 2007 9:11 am
Hello All,
I have a situation where I have two tables A & B. A has the actual data, table B maintains a sequence of data that is in table A. I have a process that polls table A every 5 seconds. Whenever a record is updated in table A with a TRANSID n, table B is updated with the same TRANSID. This said, what I would like to achieve is, when I poll table A, I should get all the records that were updated in the past 5 seconds. The catch is, same record can be updated more than once and end up updated with a greater TRANSID, table B is updated with the same TRANSID, but if same record is updated, one of their columns REPORT_ID, would be the same in table A(REPORT_ID not in table B). I should be able to get the most recent of the duplicate records and other records that were updated. Any help is greatly appreciated since writing SQL queries is not one of my fortes.
Reply
Mark
November 17, 2007 6:23 am
YOu rock man!
Reply
haneef
November 19, 2007 6:47 pm
how to find second highest salary of the employee table
please explain
Reply
- Madhivanan
  May 17, 2010 1:50 pm
  There are many number of methods available
  Choose the effecient method
  Refer this pos
  Reply
Atul Saxena
November 20, 2007 7:38 pm
dear Pinaldave
u r genius
u have solved my problems on deleting duplicate records
Reply