SQL SERVER – Delete Duplicate Records – Rows

March 1, 2007

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE FROM MyTable WHERE ID NOT IN ( SELECT MAX(ID) FROM MyTable GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts

SQL SERVER – T-SQL Script to find the CD key from Registry

SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

SQL SERVER – 2005 – List Tables in Database Without Primary Key

August 7, 2007

SQL SERVER – Difference Between Database Mail and SQLMail

July 5, 2008

SQL SERVER – 2005 – Display Fragmentation Information of Data and Indexes of Database Table

January 10, 2008

450 Comments. Leave new

Rahul
April 16, 2007 5:28 am
if there is no key in the table then what ?
I have on table named [Duplicate] {ID int,FNAME varchar(10),MNAME varchar(10)}
Here there is no key and here are duplicate rows. so hoca can i delete this duplicate rows.
Check Data
ID FNAME LNAME
1 AAA CCC
2 BBB DDD
1 AAA CCC
2 BBB DDD
1 AAA CCC
2 BBB DDD
3 BCB DGD
Remove duplicate rows and keep the data in to the table like this using single query.
ID FNAME LNAME
1 AAA CCC
2 BBB DDD
3 BCB DGD
PLease if possible help because i faced this question in many interviews.
Reply me
Reply
- krishna
  June 25, 2010 5:40 pm
  select distinct * into #temp from urtbl truncate table urtbl insert urtbl select * from #temp drop table #temp
  Reply
  - Swathi
    September 2, 2010 5:22 pm
    If we do as per u suggestion,can performance decreases
  - Madhivanan
    September 2, 2010 7:48 pm
    Yes. It will be a performance hit query
  - Senthil
    October 27, 2010 5:04 pm
    rno sname
    —————-
    1 Frieda
    1 Frieda
    1 Frieda
    4 Nathan
    6 senthil
    6 senthil
    6 senthil
    2 Senthil
    6 senthil
    1 Shanoj
    2 Shanoj
    4 Varun
    Step 1: Create duplicate table and move the duplicate value records of the original table to a duplicate table.
    SELECT DISTINCT * INTO DuplicateTb FROM dublicatetest GROUP BY sname,rno HAVING COUNT(rno) > 1
    After executed the above Query if you look the table the below result come.
    select * from DuplicateTb
    1 Frieda
    6 senthil
    select * from dublicatetest
    2 Shanoj
    1 Frieda
    2 Senthil
    4 Varun
    6 senthil
    1 Shanoj
    6 senthil
    4 Nathan
    6 senthil
    6 senthil
    1 Frieda
    1 Frieda
    Step 2: Delete all rows from the original table that also reside in the duplicate table.
    DELETE dublicatetest WHERE sname IN (SELECT sname FROM DuplicateTb)
    After executed the above Query if you look the table the below result come.
    select * from DuplicateTb
    1 Frieda
    6 senthil
    select * from dublicatetest
    2 Shanoj
    4 Varun
    1 Shanoj
    4 Nathan
    Step 3 : Move back the rows from duplicate table to original table.
    INSERT dublicatetest SELECT * FROM DuplicateTb
    After executed the above Query if you look the table the below result come.
    select * from DuplicateTb
    1 Frieda
    6 senthil
    select * from dublicatetest
    2 Shanoj
    1 Frieda
    6 senthil
    4 Varun
    1 Shanoj
    4 Nathan
    Step4: Drop the duplicate table.
    DROP TABLE DuplicateTb
    Dear friends now i hope u all are understood and got cleared.
    Good Luck
    Senthilkumar.T
    Be happy and try to make others happy
  - ivan mohapatra
    March 28, 2011 4:41 pm
    dear friends,
    here with i am sending a sample of code which might help u out in a easy manner
    select distinct * into newtable from oldtable
    truncate table oldtable
    insert into oldtable select * from newtable
    here we can have the distinct records or rows.
  - laxmi
    April 20, 2011 4:30 pm
    Thank for Ur code. Because it help to my interview.
  - Ajay
    September 28, 2013 2:52 pm
    but if Is Identity is set, in that condition, this code fails……
  - Rajeshkumar
    August 3, 2011 3:48 pm
    Nice code.I learn much more from your code
  - mahesh
    December 28, 2011 1:00 pm
    hi,
    how can we improve query performence. with same query dont use indexes view . for example select * from mytable query execute 2 min but the same query i want execute with in 1 min. hw can we do that one , please give me reply
  - gvnreddy
    November 1, 2010 7:22 pm
    dont worry man;
    make a call to me [phone number removed]
    select * from t1 where rowid=(select min(rowid) from
    t1 group by t1.c1)
  - madhivanan
    November 1, 2010 7:53 pm
    This will throw error. You need to relate the common key from the outer table
    select * from t1 as t where rowid=(select min(rowid) from
    t1 where c1=t.c1)
  - Noor Ul Islam
    January 5, 2013 5:45 pm
    delete from duplicate
    where fname in
    (
    select d1.fname
    from duplicate d1
    inner join duplicate d2
    on d1.fname=d2.fname and d1.Lname=d2.Lname
    group by d1.fname, d1.Lname
    having count (d1.fname)>1 and count(d1.Lname)>1
    )
  - madhivanan
    January 8, 2013 7:10 pm
    It can be easily done with row_number() function. Refer point 6 at
  - Jeff Rivera
    March 25, 2013 1:01 am
    If we assume two duplicates doesn’t this delete both records? How does your delete statement only remove one duplicate or is your goal to delete all duplicates? You can do a delete top but you must know how many duplicates
- Manish
  March 3, 2011 3:08 pm
  [ Just Copy & Paste ]
  declare @TBL table(EmpID int)
  insert into @TBL(EmpID)
  values(90020),(90020),(90021),(90022),(90023),(90022),(90025),(90022)
  select * from @TBL order by EmpID
  ;with TBLCTE(EmpID,Ranking)
  AS
  (
  select
  EmpID,
  Ranking = DENSE_RANK() over (PARTITION BY EmpID order by newID())
  from @TBL
  )
  delete from TBLCTE where Ranking > 1
  select * from @TBL order by EmpID
  ————-
  Reply
- Ivan Mohapatra
  August 18, 2011 12:36 pm
  Dear pinal,
  If i am wrong then correct me.
  As per identity column in a table u will never have a duplicate rows in a table
  because it grows as 1,2,3,4,5
  so,we can’nt have duplicate rows in a identity column.
  but,we can have duplicate data in other column of a identity tables.
  Reply
  - madhivanan
    August 18, 2011 4:43 pm
    That is the point of this post. That is why it is used to keep only one row for each combination of columns
- Mahesh
  August 17, 2012 3:44 pm
  Value1 | Value2
  1 | ‘Mahesh’
  1 | ‘Mahesh’
  2 | ‘Rahul’
  2 | ‘Rahul’
  With NewData
  as
  (
  select s.*,DENSE_RANK() OVER
  (PARTITION BY s.Value1 order by newID() ) r from SampleData s
  )
  DELETE FROM NewData where r=2
  Reply
Dan W
April 20, 2007 2:11 pm
Wouldn’t it be a little more simple to use Group By or Distinct to get the unique records in that situation?
SELECT ID, FNAME, LNAME
FROM Duplicate
GROUP BY ID, FNAME, LNAME
would give you the unique rows of Duplicate
Rough Steps of one possibility to fix the table:
1) Save the results of the group by query to a temp table
2) Delete rows from the original table
3) Fill original table with rows of temp table
That you way you don’t have to use a cursor or change the design of table Duplicate. However, in the real world, you’d want to add a unique constraint to the ID column of table Duplicate to make sure it doesn’t get any more duplicates by ID.
Reply
- Madhivanan
  June 25, 2010 4:26 pm
  If the version is 2005 or avoe, you can simply do
  delete t from
  (
  select row_number() over (partition by ID, FNAME, LNAME
  order by ID) , * from table
  ) as t
  where sno>1
  Reply
  - Naushad
    October 7, 2010 5:06 pm
    Hi All,
    I have used table expression to remove duplicate records.
    Here is the solution
    Create Table emp
    (
    id int identity,
    empname varchar(20),
    empdept varchar(10)
    )
    insert into emp values(‘A’,’X’)
    insert into emp values(‘A’,’X’)
    insert into emp values(‘A’,’X’)
    insert into emp values(‘B’,’X’)
    insert into emp values(‘B’,’X’)
    insert into emp values(‘C’,’X’)
    select * from emp
    WITH Dublicates_Emp(empname, empdept,id)
    AS
    (
    SELECT empname,empdept,max(id)
    FROM EMP
    GROUP BY empname,empdept
    HAVING COUNT(empname)>1
    )
    delete from emp where id in
    (
    select emp.id
    from emp inner join Dublicates_Emp as d
    on emp.empname = d.empname
    and emp.empdept = d.empdept
    and emp.id d.id
    )
Dan W
April 20, 2007 2:31 pm
Whoops, I almost forgot…
Thanks for the post about deleting duplicates! Your code was easy find with a Google search. It’s exactly what I was needing. It’s simple and elegant! I sometimes find it difficult to think of how to do things like that, so it was nice to find your post about it!
Reply
Ben M
May 1, 2007 8:19 pm
While searching for this solution I found many others which use cursors and complicated SQL statements but in the end they do the same thing as your simple & powerful solution. Thank you for posting it online!
Reply
Jelle
May 24, 2007 4:23 am
Seriously thanks,
Keep plopping my head why i didnt come up with simplicity.
— note
Plopping is a merely profound and uncatastrophic way of activating the tiny brain members that are usually on vacation.
— note 2
nevermind the note
Reply
jaort2007
May 24, 2007 9:29 am
Pinal,
Once again you are the man! I looked at other sites to answer this question and I found this SQL statement to be the best and easiest. Now I know to just come straight here.
Reply
Ravi
May 31, 2007 5:01 am
If you are looking for deleting multiple records (duplicate with more than 2 records like 3,4,5 duplicate records, etc) and without key column then the following method can be adopted.
Lets say your table is tab1.
SELECT DISTINCT * INTO tab2 FROM tab1 — distinct of all columns
DROP TABLE tab1
EXEC sp_rename ‘tab2′,’tab1’
Note:
Take a back up of your original table before using dropping for your reference. You may need to grant permission of tab2 as given for tab1
Reply
- Madhivanan
  May 12, 2010 4:43 pm
  I would say” dont drop the table”
  You can truncate it
  Reply
  - Virendra Yaduvanshi
    November 2, 2010 7:10 am
    Yes, truncate is good idea.
- Rahul
  June 9, 2012 2:21 am
  hey guys u shld try it—————————–
  ————delete duplicate record————-if we have table Male wd duplicate record just
  step 1– transfer distinct recoed of male table to female
  insert into Female(Name)
  select distinct Name from Male
  step 2–drop table Male
  step 3– Rename table female as Male
  Reply
Ravi
May 31, 2007 4:56 am
If you are looking for deleting duplicates of more than 2 records, then the following code can be used…
set nocount on
drop table tbl1
create table tbl1
(
col1 int
)
insert into tbl1 values(1)
insert into tbl1 values(1)
insert into tbl1 values(1)
insert into tbl1 values(2)
insert into tbl1 values(2)
insert into tbl1 values(2)
insert into tbl1 values(2)
insert into tbl1 values(3)
insert into tbl1 values(3)
set rowcount 1
select ‘start’
while @@rowcount > 0 delete a from tbl1 a where (select count(*) from tbl1 b where a.col1 = b.col1)>1
set rowcount 0
select * from tbl1
set nocount off
Reply
- Madhivanan
  May 12, 2010 4:41 pm
  This will be a time-consuming method
  Alternates are move distinct data to temp table or use row_number() function
  Reply
Izzak
May 31, 2007 2:51 pm
Simple is good! You Rock!
(I found so many answers elsewhere and it’s incredible how 6 lines of code and one push of a button can do with your code)
Reply
Amit Jain
June 14, 2007 12:23 am
Ravi,
your solution seems to be excelent but what about triggers and indexes defined on that table tab1.
will they remain intact??
Please repsond ….
Amit
Reply
Vamshi
June 26, 2007 3:36 pm
Well to answer Rahul’s question and modify Ravi’s and Pinal Dave query…
Here is a single query which will answer all your questions but this still needs Primary key or identity column to be added to the table though.
DELETE FROM MyTable
WHERE EXISTS (
SELECT * FROM MyTable AS b
WHERE
b.[col1] = MyTable.[col1]
AND b.[col2] = MyTable.[col2]
AND b.[col3] = MyTable.[col3]
GROUP BY
b.[col1], b.[col2], b.[col3]
HAVING
MyTable.[ID] > MIN(b.[ID])
)
If you want most recent records to be present in your DB and delete multiple Old duplicate records, You can make use of
DELETE FROM MyTable
WHERE EXISTS (
SELECT * FROM MyTable AS b
WHERE
b.[col1] = MyTable.[col1]
AND b.[col2] = MyTable.[col2]
AND b.[col3] = MyTable.[col3]
GROUP BY
b.[col1], b.[col2], b.[col3]
HAVING
MyTable.[ID] MIN(b.[ID])
)
Hope this answers all your questions.
If you dont have Primary key or Identity Column then you can always create it and run this query and delete the Primary Key or identity column.
“there are always better things to do to make lives easier.”
enjoy your day
Vamshi
Reply
Senthilnathan
July 3, 2007 1:49 am
IN SQL SERVER 2005, This can be easily achieved without crating unique identifier by using CTE and ROW_NUMBER (), the modified query for sql server 2005 goes here
***********************************************
WITH T1 AS (SELECT ROW_NUMBER ( ) OVER ( PARTITION BY ID, FNAME, LNAME ORDER BY ID ) AS RNUM FROM DUPLICATE )
DELETE FROM T1 WHERE RNUM > 1
***********************************************
To get a detail grasp on ROW_NUMBER () OVER () … Refer MSDN https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017 for.
Reply
- Madhivanan
  May 12, 2010 4:46 pm
  Also see how you can effectively make use of row_number() function for various purposes
  Reply
Ankur
July 9, 2007 5:02 am
I have by mistake duplicated all my records.
Through export/import wizard I selected my database (e.g pension) and copied tables to the desination, later I came to know that source and destination where the same.
Therefore, in all the table all rows are duplicated.
Can someone solve the problem.
Reply
- Madhivanan
  June 15, 2010 6:20 pm
  Do you have a primary or unique key in the table?
  Which version SQL Server are you using?
  Reply
satish ramahnujan
July 13, 2007 4:28 pm
Thanks Ravi, your information helped. I modified the query in the following manner and it worked fine:
Select distinct * into temp from tab1;
Delete tab1;
Insert tab1 Select * from temp;
drop table temp;
Reply
Dipak
July 24, 2007 10:23 am
how can we delete duplicate rows if table doesn’t have any identity column and requirement is you don’t have to insert one column as identity and you don’t have to use cursor and you don’t have to use temp table. so query should be a single
Please reply
Reply
- Madhivanan
  May 12, 2010 4:47 pm
  Which version of SQL Server are you using?
  If it is later than 2000, use row_number() function
  Reply
Dipak
July 24, 2007 10:33 am
this question i have asked in one of my interview, and requirement is like delete record should be in single query its ok if you can write subquery, but without use of temp table, without use cursor, and without use of insert identity column. still i am waiting for the answer,
Reply
- Madhivanan
  June 15, 2010 6:24 pm
  One option is to use row_number() function if the version is 2005 or above
  Refer point 6
  Reply
Tayeb
July 24, 2007 12:05 pm
I’m trying to keep only the recent SaleDate in the following table. How can I do that?
ID SaleDate SaleAmount
40 6/23/2003 242
40 12/28/2001 212
40 6/13/1994 111
41 11/30/2001 233
41 10/25/1996 15
41 4/21/1994 132
42 6/17/2005 2765
42 5/14/1994 147
43 5/9/1994 145
Thanks much
Reply
- Madhivanan
  May 17, 2010 2:43 pm
  select t1.* from table as t1 inner join
  (
  select id,max(saledate) as saledate from table
  group by id
  ) as t2
  on t1.id=t2.id and t1.saledate=t2.saledate
  Reply
Vamshi
July 31, 2007 11:42 am
Tayeb below query should work for you
DELETE FROM #tempTab WHERE
EXISTS
(SELECT ID FROM #tempTab AS b WHERE #tempTab.ID = b.ID
GROUP BY b.ID
HAVING #tempTab.SaleDate < MAX(b.SaleDate))
Reply
Vamshi
August 3, 2007 10:00 am
Depak, to answer your question here is a solution in SQL Server 2005
CREATE TABLE #Table1 (col1 int, col2 int)
INSERT INTO #Table1 VALUES (1, 1000)
INSERT INTO #Table1 VALUES (2, 2000)
INSERT INTO #Table1 VALUES (2, 2000)
INSERT INTO #Table1 VALUES (3, 3000)
INSERT INTO #Table1 VALUES (3, 3000)
INSERT INTO #Table1 VALUES (4, 4000)
SELECT * FROM #Table1;
WITH T1 AS (SELECT (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1)) AS RNum FROM #Table1)
DELETE FROM T1 WHERE RNum IN (SELECT a.RNum FROM T1 AS a, T1 AS b WHERE a.RNum > b.RNum GROUP BY a.RNum)
SELECT * FROM #Table1
DROP TABLE #Table1
ROW_NUMBER() will solve your purpose.
Reply
- Vanthiya
  September 6, 2010 2:50 pm
  Hi Vamshi,
  As a beginner to SQL Server, it would be more useful for my career. Thank You.
  Reply
- Suvasish
  September 22, 2011 8:56 am
  Hi Vamshi,
  Simply you rock..
  Reply
Geetha
August 10, 2007 8:12 am
Hi
I had problem with deleting dupilicate rows. i have used your code. work fine. thanks a lot.
Geetha
Reply