SQL SERVER – Delete Duplicate Records – Rows

March 1, 2007

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE FROM MyTable WHERE ID NOT IN ( SELECT MAX(ID) FROM MyTable GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts

SQL SERVER – T-SQL Script to find the CD key from Registry

SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

How to Find Median in SQL Server? – Interview Question of the Week #116

April 2, 2017

SQL SERVER – Fix: Error 130: Cannot perform an aggregate function on an expression containing an aggregate or a subquery

May 16, 2007

SQL SERVER – Finding User Who Dropped Database Table

November 23, 2018

450 Comments. Leave new

Mohan
August 21, 2008 6:29 pm
Hi,
I need to Get a primary key in a particular table in MS-SQL SERVER 2000. How Can i get. Is there any query is available???
Thanks in advance
Mohan.V
Reply
Dave
August 21, 2008 10:09 pm
Pinal, great information on your site!
I think Senthilnathan in comment #16 has the right solution that can be used for deleting duplicates and for finding the nth row in a group. The key is ROW_NUMBER() function with a Partition.
— EXAMPLE TO GET THE 5th highest salary of the “Manager” group.
— CREATE Common Table Expression CTE
With MyCTE AS (
SELECT
ROW_NUMBER() OVER (PARTITION EmployeeType
ORDER BY EmployeeType, Salary) AS OrderedGroupId,
EmployeeId,
EmployeeType,
Salary
FROM SalaryHistory)
SELECT EmployeeId, — Query results of CTE
Salary
FROM MyCTE
WHERE EmployeeType = “Manager”
AND OrderedGroupId = 5
This could also be a DELETE statement deleting duplicate items in a PARTITION (DELETE FROM MyCTE WHERE OrderedGroupId > 1)
Reply
Imran Mohammed
August 22, 2008 7:29 am
@Mohan,
1. If you want to see the primary on a specific table then Execute this stored procedure,
Sp_help table_name — you have to give table name
It will display all the table properties, like columns, keys, indexes…
In constraints section it will show you all the keys details, on which column primary key is made.
2. If you want to see all the primary keys on all tables, then use this query,
SELECT A.CONSTRAINT_NAME ‘CONSTRAINT NAME’ , A.COLUMN_NAME ‘COLUMN NAME’, A.TABLE_NAME’TABLE NAME’, A.ORDINAL_POSITION ‘POSITION OF COLUMN’
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS B, INFORMATION_SCHEMA.KEY_COLUMN_USAGE A
WHERE B.CONSTRAINT_NAME =A.CONSTRAINT_NAME AND CONSTRAINT_TYPE = ‘PRIMARY KEY’
ORDER BY A.TABLE_NAME, ORDINAL_POSITION
— Run this script as is don’t change anything
This will give you primary key name, on which column it is made and which table it is made and also position of the column.
Sometimes, we create composite primary keys, meaning primary key on more than one column on one table in that case when you run the above query you will see table name more than once, dont get confuse, see the “position of the column” ( in the output) and then you will come to know that this table has a composite primary key.
** A table can have only one Primary key.
Reply
Jamil
September 3, 2008 12:32 pm
Rahul ask how if there is no key in the table to delete the duplicate data,
Please use this sql to do it, @ID, @Lname and @fname is depend on our field in the table:
DECLARE @ID varchar(150), @lname varchar(150), @fname varchar(150), @MyCount int, @Counting Int
DECLARE authors_cursor CURSOR FOR
SELECT id, FNAME, LNAME, COUNT(*) AS myCount
FROM dbo.Table1
GROUP BY id, FNAME, LNAME HAVING (COUNT(*) > 1)
open authors_cursor
FETCH NEXT FROM authors_cursor
INTO @ID, @fname, @LName, @MYCount
WHILE @@FETCH_STATUS = 0 BEGIN
DECLARE aut2 CURSOR local scroll dynamic OPTIMISTIC FOR
SELECT *
FROM table1
WHERE id = @id AND lname = @lname AND fname = @fname
OPEN aut2
SET @counting = 1
FETCH next FROM aut2
begin
WHILE @Counting < @MyCount BEGIN
print @counting
DELETE FROM table1 WHERE CURRENT OF aut2
SET @Counting = @Counting + 1
FETCH next FROM aut2
END
–end
end
CLOSE aut2
DEALLOCATE aut2
FETCH NEXT FROM authors_cursor
INTO @ID, @Lname, @FName, @MYCount
END
CLOSE authors_cursor
DEALLOCATE authors_cursor
Reply
Jamil
September 3, 2008 1:25 pm
Answer for Rasmi, please study this one:
declare @MyStr varchar(150), @pjg int
DECLARE authors_cursor CURSOR FOR
SELECT LNAME
FROM dbo.Table1
open authors_cursor
FETCH NEXT FROM authors_cursor
INTO @mystr
WHILE @@FETCH_STATUS = 0 BEGIN
set @pjg=len(@mystr)-1
while @pjg>=0
begin
print substring(@mystr,len(@mystr)-@pjg,1)
if isnumeric(substring(@mystr,len(@mystr)-@pjg,1))=0
set @mystr=replace (@mystr,substring(@mystr,len(@mystr)-@pjg,1),”)
set @pjg=@pjg-1
end
print @myStr
update table1 set lname=@mystr where current of authors_cursor
FETCH NEXT FROM authors_cursor
INTO @mystr
end
print @myStr
CLOSE authors_cursor
DEALLOCATE authors_cursor
Reply
Mike
September 12, 2008 8:09 am
In most cases, two duplicate records in a table may not be the same. For example, a customer may enter his name or address, or phone number in different ways, but only from his email, we know this is the same person.
My question is, if a table contains duplicate records like this, and I only want to keep latest record of the dups. I mean, based on customer’s email, and the date of registration, can we have a way to delete the duplicate records? for example:
table_customers contains “email”, “name”, and “date”:
ab@c.com John Smith 24/8/2006
ab@c.com John S. 12/9/2007
…
I only wants to keep the latest record if two records have the same email, no matter if the names are the same or not.
Thank you for your help.
Mike
Reply
Jamil
September 18, 2008 2:29 pm
For Mike may be you can use this Query:
DELETE table_customers
FROM (SELECT Email, MAX(Date) AS Date
FROM table_customers
GROUP BY Email
HAVING Email = ‘ab@c.com’)) DERIVEDTBL
WHERE table_customers.Email = DERIVEDTBL.Email AND table_customers.Date DERIVEDTBL.Date
Reply
Aashish Mangal
September 18, 2008 3:47 pm
Hi
I have a question. Below is the Employee table
EmpID EmpName Salary
———– —————————– ———-
1 Aashish 15000
1 Aashish 15000
3 Gunjan 25000
3 Gunjan 25000
5 Atul 35000
6 Animesh 20000
Now I have to keep only one record either of the two and remove the duplicate record. Condition is There should not be any use of temp tables, no primary keys, no identity columns, no cursors. Everything has to be handeled in a query . And Database is MS SQL Server 2000
Reply
Jamil
September 19, 2008 6:17 am
For Aashish Mangal, You Can Use Query Analyzer Like This:
ALTER TABLE [Employee] ADD [MySpecialIdx] [int] IDENTITY (1, 1) NOT NULL
Go
DELETE [Employee]
FROM (SELECT EmpID, EmpName, Salary, MAX(MySpecialIdx) AS mySpecialIdx
FROM [Employee]
GROUP BY EmpID, EmpName, Salary)) DERIVEDTBL
WHERE table_customers.EmpID = DERIVEDTBL.EmpID AND [Employee].EmpName = DERIVEDTBL.EmpName AND AND
[Employee].EmpName = DERIVEDTBL.EmpName AND Employee.myspecialIdx DerivedTbl.myspecialidx
Go
ALTER TABLE [Employee] DROP COLUMN MySpecialIdx
Go
Or You Can Use In View and Execute Step By Step
Reply
jamil
September 24, 2008 9:32 am
The Next for Aashish Mangal, we can use this query to delete the duplicate record:
WHILE (SELECT TOP 1 COUNT(id) AS Expr1
FROM dbo.Table1
GROUP BY id, FNAME, LNAME
HAVING (COUNT(id) > 1)) > 1
BEGIN
set rowcount 1
DELETE table1
FROM (SELECT TOP 1 id, FNAME, LNAME
FROM dbo.Table1
GROUP BY id, FNAME, LNAME
HAVING (COUNT(id) > 1)) DERIVEDTBL
WHERE table1.id = derivedtbl.id AND table1.fname = derivedtbl.fname AND table1.lname = derivedtbl.lname
END
==
If this query will be saved, the first time you must add any table to query than remove the query text and replaced with the above query text, then you can saved this query with your desired name.
Reply
jamil
September 24, 2008 9:36 am
Sorry, in the fact, that query only can be executed and can not be saved.
Reply
Mit_2807
October 1, 2008 3:10 pm
Hi Pinal,
I had the same query i tried using Rank() function.
It worked for me.
Attaching code here:
WITH DeleteDuplicates
AS
(
SELECT RANK() OVER (PARTITION BY DUPFIELD ORDER BY PKFIELD ) AS RANK, * FROM TABLENAME
)
DELETE FROM DeleteDuplicates WHERE RANK > 1
Wish add more value to the topic.
Thanks,
Mit_2807
Reply
rakesh
October 2, 2008 7:40 pm
hi sir i did like this to delete duplicate rows…
sir i accomplished this using rank functions… and CTE
1)create table emp (id int,name varchar(20))
2) i inserted 1,’rakesh ‘ 3 times
and 2,’sagar’ 2 times
deleting duplicate rows…
with cte
as
(
select id,name,rank() over(order by id) r,row_number() over(order by id) rn from emp
)
delete from cte
where r rn
will this query degrade the performance or not ……
Reply
rakesh
October 3, 2008 11:43 am
hi sir
could you please give script for sending SMTP mail
USING SQLSERVER….
Reply
Purushot
October 3, 2008 5:29 pm
Dear All,
i have a table called Player_details in that i have a data’ like below without any key i want to delete one record and another in data how to do this.
100 chennaiRoyals 9841998470
101 DareDevils 9984725487
101 DareDevils 9984725487
Both are same as like.
can u pls help me
Reply
Purushot
October 3, 2008 5:57 pm
To Find the 2nd Highest Salary:
2nd highest salary
1)SELECT max(salary) FROM Employee
WHERE salary < (SELECT max(salary) FROM employee)
Purushot
Reply
stefano
October 5, 2008 4:45 pm
your query works great !!! finally I found an elegant way to solve the problem without exploiting new sql commands which in SQL 2005 don’t exist yet.
Reply
Purushot
October 6, 2008 12:56 pm
Dear All,
i have a table called Player_details in that i have a data’ like below without any key i want to delete one record and another in data how to do this.
100 chennaiRoyals 9841998470
101 DareDevils 9984725487
101 DareDevils 9984725487
Both are same as like.
can u pls help me
Reply
jamil
October 7, 2008 2:31 pm
Dear Purushot
Suppose that table Player_details contains fields idx, Name and ID, and the table contains records as follows:
100 chennaiRoyals 9841998470
101 DareDevils 9984725487
101 DareDevils 9984725487
103 DareDevilx 9984725487
103 DareDevilx 9984725487
104 DareDevily 9984725487
104 DareDevily 9984725487
104 DareDevily 9984725487
The we want delete the duplicate record, those are the records with idx=101 and idx=103 and idx=104, so that just 1 record will be kept, we can follow these step:
1. Please create this query and save as PlayerDetails_ForDeletingDuplicate
SELECT *
FROM dbo.Player_details a
WHERE ((SELECT COUNT(*)
FROM dbo.Player_details b
WHERE a.idx = b.idx AND a.name = b.name AND a.id = b.id) > 1)
2. Please run this query, if we found any record viewed, then go to next step
3. Edit this query, change the line:
select *
with
delete dbo.Player_details
4. In the top this query add this line
set rowcount 1
5. Execute this query several times until no more records affected
6. close this query without save
Reply
Saravanan
October 10, 2008 7:36 pm
Dear All,
I have a table which can accept duplicate records, but it should not accept duplicate within 20 seconds.I want to delete duplicate row which has been created back to back in 20 seconds. Is there any way to delete these records? Please let me know if you need more info about this.
Thanks in advance.
Cheers,
Saravanan
Reply