Here is another very common question I often received, which candidates face in the interview.
Question: How to delete duplicate rows in SQL Server (considering there is an identity column in table)?
Answer: Well, there is no better answer than following a script. If you ask me this question in the interview, I will even have to look up online and find following script:
DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
Here is the quick video I have recorded where I demonstrate the same functionality in simple words.
Reference:Â Pinal Dave (https://blog.sqlauthority.com)
8 Comments. Leave new
Hi sir,
I believe that this script is good where there is primary key in table. When there is no primary key in table the approach should be changed.
Please provide clarification and best approach.
Create a serial number using row_number() function and delete the duplicates
delete t from (select *,row_number() over (partition by keycol order by keycol) as sno from table) as t where sno>1
Hi,
I have a table where there is no any identity column or any unique column and i want delete duplicate rows. Please help. This table contains about 30 millions data.
Create a serial number using row_number() function and delete the duplicates
delete t from (select *,row_number() over (partition by keycol order by keycol) as sno from table) as t where sno>1
Hi pinal,
I tried above script to delete my duplicate data,but I could not delete my data.
we can delete duplicate data using cte expression thats easy and fast.
Hi pinal,
I tried above script to delete my duplicate data,but I could not delete my data.
we can delete duplicate data using cte expression thats easy and fast.
WITH CTE AS(
SELECT id,name,ROW_NUMBER()OVER(PARTITION BY id ORDER BY id)as rn
FROM dupdata
)
DELETE FROM CTE WHERE rn > 1
Sure mekalanaresh0404