Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.
DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
Watch the view to see the above concept in action:
[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]
Reference : Pinal Dave (https://blog.sqlauthority.com)
450 Comments. Leave new
if there is no key in the table then what ?
I have on table named [Duplicate] {ID int,FNAME varchar(10),MNAME varchar(10)}
Here there is no key and here are duplicate rows. so hoca can i delete this duplicate rows.
Check Data
ID FNAME LNAME
1 AAA CCC
2 BBB DDD
1 AAA CCC
2 BBB DDD
1 AAA CCC
2 BBB DDD
3 BCB DGD
Remove duplicate rows and keep the data in to the table like this using single query.
ID FNAME LNAME
1 AAA CCC
2 BBB DDD
3 BCB DGD
PLease if possible help because i faced this question in many interviews.
Reply me
select distinct * into #temp from urtbl truncate table urtbl insert urtbl select * from #temp drop table #temp
If we do as per u suggestion,can performance decreases
Yes. It will be a performance hit query
rno sname
—————-
1 Frieda
1 Frieda
1 Frieda
4 Nathan
6 senthil
6 senthil
6 senthil
2 Senthil
6 senthil
1 Shanoj
2 Shanoj
4 Varun
Step 1: Create duplicate table and move the duplicate value records of the original table to a duplicate table.
SELECT DISTINCT * INTO DuplicateTb FROM dublicatetest GROUP BY sname,rno HAVING COUNT(rno) > 1
After executed the above Query if you look the table the below result come.
select * from DuplicateTb
1 Frieda
6 senthil
select * from dublicatetest
2 Shanoj
1 Frieda
2 Senthil
4 Varun
6 senthil
1 Shanoj
6 senthil
4 Nathan
6 senthil
6 senthil
1 Frieda
1 Frieda
Step 2: Delete all rows from the original table that also reside in the duplicate table.
DELETE dublicatetest WHERE sname IN (SELECT sname FROM DuplicateTb)
After executed the above Query if you look the table the below result come.
select * from DuplicateTb
1 Frieda
6 senthil
select * from dublicatetest
2 Shanoj
4 Varun
1 Shanoj
4 Nathan
Step 3 : Move back the rows from duplicate table to original table.
INSERT dublicatetest SELECT * FROM DuplicateTb
After executed the above Query if you look the table the below result come.
select * from DuplicateTb
1 Frieda
6 senthil
select * from dublicatetest
2 Shanoj
1 Frieda
6 senthil
4 Varun
1 Shanoj
4 Nathan
Step4: Drop the duplicate table.
DROP TABLE DuplicateTb
Dear friends now i hope u all are understood and got cleared.
Good Luck
Senthilkumar.T
Be happy and try to make others happy
dear friends,
here with i am sending a sample of code which might help u out in a easy manner
select distinct * into newtable from oldtable
truncate table oldtable
insert into oldtable select * from newtable
here we can have the distinct records or rows.
Thank for Ur code. Because it help to my interview.
but if Is Identity is set, in that condition, this code fails……
Nice code.I learn much more from your code
hi,
how can we improve query performence. with same query dont use indexes view . for example select * from mytable query execute 2 min but the same query i want execute with in 1 min. hw can we do that one , please give me reply
dont worry man;
make a call to me [phone number removed]
select * from t1 where rowid=(select min(rowid) from
t1 group by t1.c1)
This will throw error. You need to relate the common key from the outer table
select * from t1 as t where rowid=(select min(rowid) from
t1 where c1=t.c1)
delete from duplicate
where fname in
(
select d1.fname
from duplicate d1
inner join duplicate d2
on d1.fname=d2.fname and d1.Lname=d2.Lname
group by d1.fname, d1.Lname
having count (d1.fname)>1 and count(d1.Lname)>1
)
It can be easily done with row_number() function. Refer point 6 at
If we assume two duplicates doesn’t this delete both records? How does your delete statement only remove one duplicate or is your goal to delete all duplicates? You can do a delete top but you must know how many duplicates
[ Just Copy & Paste ]
declare @TBL table(EmpID int)
insert into @TBL(EmpID)
values(90020),(90020),(90021),(90022),(90023),(90022),(90025),(90022)
select * from @TBL order by EmpID
;with TBLCTE(EmpID,Ranking)
AS
(
select
EmpID,
Ranking = DENSE_RANK() over (PARTITION BY EmpID order by newID())
from @TBL
)
delete from TBLCTE where Ranking > 1
select * from @TBL order by EmpID
————-
Dear pinal,
If i am wrong then correct me.
As per identity column in a table u will never have a duplicate rows in a table
because it grows as 1,2,3,4,5
so,we can’nt have duplicate rows in a identity column.
but,we can have duplicate data in other column of a identity tables.
That is the point of this post. That is why it is used to keep only one row for each combination of columns
Value1 | Value2
1 | ‘Mahesh’
1 | ‘Mahesh’
2 | ‘Rahul’
2 | ‘Rahul’
With NewData
as
(
select s.*,DENSE_RANK() OVER
(PARTITION BY s.Value1 order by newID() ) r from SampleData s
)
DELETE FROM NewData where r=2
Wouldn’t it be a little more simple to use Group By or Distinct to get the unique records in that situation?
SELECT ID, FNAME, LNAME
FROM Duplicate
GROUP BY ID, FNAME, LNAME
would give you the unique rows of Duplicate
Rough Steps of one possibility to fix the table:
1) Save the results of the group by query to a temp table
2) Delete rows from the original table
3) Fill original table with rows of temp table
That you way you don’t have to use a cursor or change the design of table Duplicate. However, in the real world, you’d want to add a unique constraint to the ID column of table Duplicate to make sure it doesn’t get any more duplicates by ID.
If the version is 2005 or avoe, you can simply do
delete t from
(
select row_number() over (partition by ID, FNAME, LNAME
order by ID) , * from table
) as t
where sno>1
Hi All,
I have used table expression to remove duplicate records.
Here is the solution
Create Table emp
(
id int identity,
empname varchar(20),
empdept varchar(10)
)
insert into emp values(‘A’,’X’)
insert into emp values(‘A’,’X’)
insert into emp values(‘A’,’X’)
insert into emp values(‘B’,’X’)
insert into emp values(‘B’,’X’)
insert into emp values(‘C’,’X’)
select * from emp
WITH Dublicates_Emp(empname, empdept,id)
AS
(
SELECT empname,empdept,max(id)
FROM EMP
GROUP BY empname,empdept
HAVING COUNT(empname)>1
)
delete from emp where id in
(
select emp.id
from emp inner join Dublicates_Emp as d
on emp.empname = d.empname
and emp.empdept = d.empdept
and emp.id d.id
)
Whoops, I almost forgot…
Thanks for the post about deleting duplicates! Your code was easy find with a Google search. It’s exactly what I was needing. It’s simple and elegant! I sometimes find it difficult to think of how to do things like that, so it was nice to find your post about it!
While searching for this solution I found many others which use cursors and complicated SQL statements but in the end they do the same thing as your simple & powerful solution. Thank you for posting it online!
Seriously thanks,
Keep plopping my head why i didnt come up with simplicity.
— note
Plopping is a merely profound and uncatastrophic way of activating the tiny brain members that are usually on vacation.
— note 2
nevermind the note
Pinal,
Once again you are the man! I looked at other sites to answer this question and I found this SQL statement to be the best and easiest. Now I know to just come straight here.
If you are looking for deleting multiple records (duplicate with more than 2 records like 3,4,5 duplicate records, etc) and without key column then the following method can be adopted.
Lets say your table is tab1.
SELECT DISTINCT * INTO tab2 FROM tab1 — distinct of all columns
DROP TABLE tab1
EXEC sp_rename ‘tab2′,’tab1’
Note:
Take a back up of your original table before using dropping for your reference. You may need to grant permission of tab2 as given for tab1
I would say” dont drop the table”
You can truncate it
Yes, truncate is good idea.
hey guys u shld try it—————————–
————delete duplicate record————-if we have table Male wd duplicate record just
step 1– transfer distinct recoed of male table to female
insert into Female(Name)
select distinct Name from Male
step 2–drop table Male
step 3– Rename table female as Male
If you are looking for deleting duplicates of more than 2 records, then the following code can be used…
set nocount on
drop table tbl1
create table tbl1
(
col1 int
)
insert into tbl1 values(1)
insert into tbl1 values(1)
insert into tbl1 values(1)
insert into tbl1 values(2)
insert into tbl1 values(2)
insert into tbl1 values(2)
insert into tbl1 values(2)
insert into tbl1 values(3)
insert into tbl1 values(3)
set rowcount 1
select ‘start’
while @@rowcount > 0 delete a from tbl1 a where (select count(*) from tbl1 b where a.col1 = b.col1)>1
set rowcount 0
select * from tbl1
set nocount off
This will be a time-consuming method
Alternates are move distinct data to temp table or use row_number() function
Simple is good! You Rock!
(I found so many answers elsewhere and it’s incredible how 6 lines of code and one push of a button can do with your code)
Ravi,
your solution seems to be excelent but what about triggers and indexes defined on that table tab1.
will they remain intact??
Please repsond ….
Amit
Well to answer Rahul’s question and modify Ravi’s and Pinal Dave query…
Here is a single query which will answer all your questions but this still needs Primary key or identity column to be added to the table though.
DELETE FROM MyTable
WHERE EXISTS (
SELECT * FROM MyTable AS b
WHERE
b.[col1] = MyTable.[col1]
AND b.[col2] = MyTable.[col2]
AND b.[col3] = MyTable.[col3]
GROUP BY
b.[col1], b.[col2], b.[col3]
HAVING
MyTable.[ID] > MIN(b.[ID])
)
If you want most recent records to be present in your DB and delete multiple Old duplicate records, You can make use of
DELETE FROM MyTable
WHERE EXISTS (
SELECT * FROM MyTable AS b
WHERE
b.[col1] = MyTable.[col1]
AND b.[col2] = MyTable.[col2]
AND b.[col3] = MyTable.[col3]
GROUP BY
b.[col1], b.[col2], b.[col3]
HAVING
MyTable.[ID] MIN(b.[ID])
)
Hope this answers all your questions.
If you dont have Primary key or Identity Column then you can always create it and run this query and delete the Primary Key or identity column.
“there are always better things to do to make lives easier.”
enjoy your day
Vamshi
IN SQL SERVER 2005, This can be easily achieved without crating unique identifier by using CTE and ROW_NUMBER (), the modified query for sql server 2005 goes here
***********************************************
WITH T1 AS (SELECT ROW_NUMBER ( ) OVER ( PARTITION BY ID, FNAME, LNAME ORDER BY ID ) AS RNUM FROM DUPLICATE )
DELETE FROM T1 WHERE RNUM > 1
***********************************************
To get a detail grasp on ROW_NUMBER () OVER () … Refer MSDN https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017 for.
Also see how you can effectively make use of row_number() function for various purposes
I have by mistake duplicated all my records.
Through export/import wizard I selected my database (e.g pension) and copied tables to the desination, later I came to know that source and destination where the same.
Therefore, in all the table all rows are duplicated.
Can someone solve the problem.
Do you have a primary or unique key in the table?
Which version SQL Server are you using?
Thanks Ravi, your information helped. I modified the query in the following manner and it worked fine:
Select distinct * into temp from tab1;
Delete tab1;
Insert tab1 Select * from temp;
drop table temp;
how can we delete duplicate rows if table doesn’t have any identity column and requirement is you don’t have to insert one column as identity and you don’t have to use cursor and you don’t have to use temp table. so query should be a single
Please reply
Which version of SQL Server are you using?
If it is later than 2000, use row_number() function
this question i have asked in one of my interview, and requirement is like delete record should be in single query its ok if you can write subquery, but without use of temp table, without use cursor, and without use of insert identity column. still i am waiting for the answer,
One option is to use row_number() function if the version is 2005 or above
Refer point 6
I’m trying to keep only the recent SaleDate in the following table. How can I do that?
ID SaleDate SaleAmount
40 6/23/2003 242
40 12/28/2001 212
40 6/13/1994 111
41 11/30/2001 233
41 10/25/1996 15
41 4/21/1994 132
42 6/17/2005 2765
42 5/14/1994 147
43 5/9/1994 145
Thanks much
select t1.* from table as t1 inner join
(
select id,max(saledate) as saledate from table
group by id
) as t2
on t1.id=t2.id and t1.saledate=t2.saledate
Tayeb below query should work for you
DELETE FROM #tempTab WHERE
EXISTS
(SELECT ID FROM #tempTab AS b WHERE #tempTab.ID = b.ID
GROUP BY b.ID
HAVING #tempTab.SaleDate < MAX(b.SaleDate))
Depak, to answer your question here is a solution in SQL Server 2005
CREATE TABLE #Table1 (col1 int, col2 int)
INSERT INTO #Table1 VALUES (1, 1000)
INSERT INTO #Table1 VALUES (2, 2000)
INSERT INTO #Table1 VALUES (2, 2000)
INSERT INTO #Table1 VALUES (3, 3000)
INSERT INTO #Table1 VALUES (3, 3000)
INSERT INTO #Table1 VALUES (4, 4000)
SELECT * FROM #Table1;
WITH T1 AS (SELECT (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1)) AS RNum FROM #Table1)
DELETE FROM T1 WHERE RNum IN (SELECT a.RNum FROM T1 AS a, T1 AS b WHERE a.RNum > b.RNum GROUP BY a.RNum)
SELECT * FROM #Table1
DROP TABLE #Table1
ROW_NUMBER() will solve your purpose.
Hi Vamshi,
As a beginner to SQL Server, it would be more useful for my career. Thank You.
Hi Vamshi,
Simply you rock..
Hi
I had problem with deleting dupilicate rows. i have used your code. work fine. thanks a lot.
Geetha