SQL SERVER – Delete Duplicate Records – Rows

March 1, 2007

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE FROM MyTable WHERE ID NOT IN ( SELECT MAX(ID) FROM MyTable GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts

SQL SERVER – T-SQL Script to find the CD key from Registry

SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

SQL SERVER – Execution Plan Ignores Tabs, Spaces and Comments

July 14, 2018

SQL SERVER – 2008 – Policy-Based Management – Create, Evaluate and Fix Policies

June 30, 2009

SQL SERVER – Find All The User Defined Functions (UDF) in a Database

February 2, 2008

450 Comments. Leave new

mgwalk
April 1, 2008 11:01 pm
Wow this is nice and simple!
Reply
Pravin Phatangare
April 4, 2008 10:26 pm
Hi All,
Those who have requested for
How to get 1st, 2nd, 3rd, 4th, nth topmost salary from an Employee table
The following solution is for getting 6th highest salary from Employee table ,
SELECT TOP 1 salary FROM (SELECT DISTINCT TOP 6 salary FROM employee
ORDER BY salary DESC) a ORDER BY salary
You can change and use it for getting nth highest salary from Employee table as follows
SELECT TOP 1 salary FROM (SELECT DISTINCT TOP n salary FROM employee
ORDER BY salary DESC) a ORDER BY salary
where n > 1 (n is always greater than one)
Reply
Jay
April 13, 2008 6:24 pm
I need to be able to select only the record with the max(datetime) of the same guid. Is this possible? I only want the record with the date of 6:53:02 and not 5:07:32 and I want to keep the record with 5:07:19 (different guid)
b16569a1-e3dc-4f03-b2a0-bf3e2677c466 rptProviders_Provider2Location 4/13/2008 5:07:19 AM 4/11/2008 6:46:29 PM
36b6419a-a3a5-49ad-b84d-942c89c4754c rptProviders_Provider2Location 4/13/2008 5:07:32 AM 4/11/2008 6:46:29 PM
36b6419a-a3a5-49ad-b84d-942c89c4754c rptProviders_Provider2Location 4/13/2008 6:53:02 AM 4/11/2008 6:46:29 PM
Reply
Jay
April 13, 2008 6:26 pm
Here is the query I have so far:
SELECT P2L.ProviderId, dbo.AUDIT.TableName, dbo.AUDIT.OccurredAt, P2L.xLastUpdated
FROM dbo.rptProviders_Provider2Location AS P2L WITH (NOLOCK) INNER JOIN
dbo.rptProviders_Provider2LocationCategory AS P2LC WITH (NOLOCK) ON P2L.xId = P2LC.Provider2LocationId INNER JOIN
dbo.AUDIT ON P2L.xId = dbo.AUDIT.RowId
Reply
Dave
April 23, 2008 4:16 am
I’ve got a slightly different problem. I’m creating a stored procedure which will retrieve several different columns from a hospital table (month, year, patient name, intervention hours, and date changed) Those aren’t the real column names; the idea is that records are kept when a patient’s behavior requires staff intervention. The records are entered monthly, but frequently–in a given time range–the reports show patients’ names more than once, because people update the intervention time numbers (stuff is loaded in wrong, then corrected, or loaded in as 0 to meet a deadline, then updated, etc). I need to get rid of the duplications, meaning that when names are shown more than once for a given month and year, I only want one of the duplicated intervention records to show up. I want to filter it by “date changed,” meaning the most recent entry for each patient.
So when I run an “interventions” report for specified time frame at a given hospital, I want to see the intervention hours broken down by patient name (but I only want to see the most recently entered record for each patient who had intervention activity during the time parameters of the report).
How can I modify my SELECT statement to do this?
Reply
saravanan.p
April 23, 2008 10:57 am
how to solve this problem?
ID(identity column) name
——————————————
1 david
2 john
3 ram
4 paul
5 aish
6 raj
7 mani
8 madu
9 cotter
now i deleted the 4′th row
4 paul
ok
now how can i display the remaining names as given below
ID(identity column) name
——————————————
1 david
2 john
3 ram
4 aish
5 raj
6 mani
7 madu
8 cotter
any body can help me sir?
saranpselvam@gmail.com
Reply
Mayank Mishra
April 23, 2008 3:11 pm
Is it possible to delete a single column from a table in SQL 2000.
Thanks
Reply
Ray McCoy
May 13, 2008 1:45 am
This works in 2000. WARNING: According to MS, SET ROWCOUNT will not work with INSERT, DELETE, and UPDATE in later versions.
— Create table with a number of values between zero and nine
select a+b+c as val
into dbo.rmtemp
from (select 0 a union all select 1 union all select 2 union all select 3) a
, (select 0 b union all select 1 union all select 2 union all select 3) b
, (select 0 c union all select 1 union all select 2 union all select 3) c
— Show what you’ve got
select val,count(*) row_count from dbo.rmtemp group by val
— Limit processing to a single row
set rowcount 1
— While you’ve got duplicates, delete a row
while (select top 1 val from dbo.rmtemp group by val having count(*) > 1) is not null
delete from dbo.rmtemp where val in (select top 1 val from dbo.rmtemp group by val having count(*) > 1);
— Remove single row processing limit
set rowcount 0
— Confirm that only uniques remain
select val,count(*) row_count from dbo.rmtemp group by val
— Clean up
drop table dbo.rmtemp
— Comment
I knew there had to be a better way.
Reply
micheal
June 10, 2008 5:50 pm
hi pinal dave ,
i have a question for u , i want to retrive the rows from a table by avoiding duplicate rows with all coloumns .i.e; duplicate row sholud be retrieved only once . any help reg…
Reply
Smitha
June 17, 2008 4:14 pm
Its really a nice
Reply
EstebanD
June 18, 2008 8:51 pm
Hi Pinal,
Thanks for your tips.
I have a question. We have a table that have many contacts that are duplicated more than once. And we need a query that excludes all duplicated people who has already registered is there a more efficient way of doing it than this?
SELECT CD.email
, CD.contactid
FROM contact CD
WHERE (CD.opt_out = 0)
AND (CD.email ‘NoEmail’)
AND (CD.email IS NOT NULL)
AND (CD.email ”)
AND (CD.registered = 0)
AND Lower (CD.email) + ‘||’ + Lower (cd.firstname) + ‘||’ + Lower (CD.lastname) NOT IN
(SELECT Lower (CD1.email) + ‘||’ + Lower (CD1.firstname) + ‘||’ + Lower (CD1.lastname)
FROM contact CD1
WHERE (CD.registered = 1)
)
GROUP BY CD.contactid
, CD.email
Thank you,
EstebanD
Reply
preethi
June 19, 2008 7:04 pm
Dear Sir,
i a have a table with id as one column,name as another column , in name there r diffrent names is there is possible to display all the alphabets in the name column.it is very urgent.where i can see mu solution if else post it to my id.
Reply
Jegan
June 20, 2008 12:39 pm
I am having a query inside the view which is returning more than one value. I need all the values to be returned in a separate row while running the view. Can ypu provide me a solution?
Reply
Asit
June 23, 2008 5:10 pm
Dear sir
I am faceing a problem in sql query.
I have two table.
1st table is maste table, In master table total 180 records are stored.
our requirement total 180 rows – 20 rows are stored in 2nd table, then display 160 mark as ‘N’ bcos stored in master table and other records are mark as ‘Y’ bcos its stored in 2nd table.
Table 1st
AppID FunCode SubFuncode FunName SubFunName Mark
Dis 1 30 Main 0 N
Depos 2 30 &Main 0 N
Lox 3 30 &Main 0 N
LCredit 4 30 &Main 0 N
In 2nd table same records but only 20 records are stored.
AppID FunCode SubFuncode FunName SubFunName Mark
Dis 1 30 &Main 0 Y
Depos 2 30 &Main 0 Y
In query we try to fetch all Y marked row and N marked with Union query.
SELECT TBLAPPLICATIONS.Name, TBLGROUPFUNCTIONS.AppID, TBLGROUPFUNCTIONS.FunctionCode, TBLALLFUNCTIONS.DefaultSubFunctionName,
TBLGROUPFUNCTIONS.SubFunctionCode, ‘Y’ AS AssignedFunction
FROM TBLGROUPFUNCTIONS INNER JOIN
TBLALLFUNCTIONS ON TBLGROUPFUNCTIONS.AppID = TBLALLFUNCTIONS.AppID AND
TBLGROUPFUNCTIONS.FunctionCode = TBLALLFUNCTIONS.FunctionCode AND
TBLGROUPFUNCTIONS.SubFunctionCode = TBLALLFUNCTIONS.SubFunctionCode INNER JOIN
TBLAPPLICATIONS ON TBLGROUPFUNCTIONS.AppID = TBLAPPLICATIONS.AppID
WHERE TBLGROUPFUNCTIONS.GroupID = 12
UNION ALL
SELECT TBLAPPLICATIONS.Name, TBLALLFUNCTIONS.AppID, TBLALLFUNCTIONS.FunctionCode,
TBLALLFUNCTIONS.DefaultSubFunctionName, TBLALLFUNCTIONS.SubFunctionCode, ‘N’ AS AssignedFunction
FROM TBLALLFUNCTIONS INNER JOIN TBLAPPLICATIONS ON TBLALLFUNCTIONS.AppID = TBLAPPLICATIONS.AppID
WHERE
(TBLALLFUNCTIONS.AppID NOT IN (SELECT AppID FROM TBLGROUPFUNCTIONS WHERE GroupID = 12) OR
TBLALLFUNCTIONS.FunctionCode NOT IN (SELECT FunctionCode FROM TBLGROUPFUNCTIONS WHERE GroupID = 12) OR
TBLALLFUNCTIONS.SubFunctionCode NOT IN (SELECT SubFunctionCode FROM TBLGROUPFUNCTIONS WHERE GroupID = 12))
Order BY aPPID,FunctionCode,sUBfUNCTIONcODE
From this query record is not show correctley.
Please please immediate reply me.
I am waiting your reply please suggest me.
Asit Sinha
Reply
maxtrebor
June 28, 2008 5:47 am
Thanx Mhlove:
Your solution was succesfull for me, for delete duplicate data in secondary column but the same ID.
Regards.
Reply
bogdan
July 28, 2008 3:21 pm
Hi,
I have an issue with and sql query i’m trying to get right and I seem not to find the answer.
I have a table with 4 colums: id1,id2,value1,value2.
I need to select all rows with a unique pair of (id1,id2) and use a 3rd column in the select for something like : sum(value1*value2) for duplicate rows.
So basicaly, show 1 row for each pair and next get a sum of all the values for that pair (sum the duplicate rows on value1,value 2).
I only been able to select the unique pairs until now using a group by id1,id2 but i’m having troble with the sums of the duplicate rows for each pair.
I’d appreciate your help
Reply
Vamshi
August 16, 2008 1:35 am
Hi Bogdon,
It would have been easy to provide solution if you would have provided proper Input and desired output. Anyways see it the below works for you based on my understanding
CREATE TABLE #T1 (ID1 INT, ID2 INT, Value1 INT, Value2 INT)
INSERT INTO #T1 VALUES (1,1,5,10)
INSERT INTO #T1 VALUES (1,1,6,12)
INSERT INTO #T1 VALUES (1,1,50,100)
INSERT INTO #T1 VALUES (1,2,7,14)
INSERT INTO #T1 VALUES (1,2,8,16)
INSERT INTO #T1 VALUES (2,1,2,5)
INSERT INTO #T1 VALUES (2,1,3,6)
INSERT INTO #T1 VALUES (2,3,10,20)
INSERT INTO #T1 VALUES (2,3,25,50)
SELECT * FROM #T1;
SELECT ID1, ID2, SUM(Value1*Value2) FROM #T1 GROUP BY ID1, ID2;
WITH T1 AS (SELECT (ROW_NUMBER() OVER (PARTITION BY ID1 ORDER BY ID1)) AS RNum, * FROM #T1 )
DELETE FROM T1 WHERE Rnum IN (SELECT a.RNum AS S FROM T1 AS a, T1 AS b WHERE a.RNum > b.RNum GROUP BY a.RNum)
SELECT * FROM #T1
DROP TABLE #T1
Post the input and desired output if the above is not what you want.
Vamshi
Reply
Vamshi
August 16, 2008 2:42 am
Sorry WITH statement is not yours…
CREATE TABLE #T1 (ID1 INT, ID2 INT, Value1 INT, Value2 INT)
INSERT INTO #T1 VALUES (1,1,5,10)
INSERT INTO #T1 VALUES (1,1,6,12)
INSERT INTO #T1 VALUES (1,1,50,100)
INSERT INTO #T1 VALUES (1,2,7,14)
INSERT INTO #T1 VALUES (1,2,8,16)
INSERT INTO #T1 VALUES (2,1,2,5)
INSERT INTO #T1 VALUES (2,1,3,6)
INSERT INTO #T1 VALUES (2,3,10,20)
INSERT INTO #T1 VALUES (2,3,25,50)
SELECT * FROM #T1;
SELECT ID1, ID2, SUM(Value1*Value2) FROM #T1 GROUP BY ID1, ID2;
I dont know why you said GROUP BY didnt worked for you if the above is what you are expecting
Reply
Vamshi
August 16, 2008 3:20 am
Hi Haneef and Preveen
Hope this would help you in getting all those employees with Nth Highest salary
DECLARE @n INT
SELECT @n = 2 — Change N value to get Nth highest value
CREATE TABLE #T1 (ID1 INT, [Name] NVARCHAR(50), Salary INT)
INSERT INTO #T1 VALUES (1, ‘Vamshi’, 1000)
INSERT INTO #T1 VALUES (2, ‘xxxxx’, 2000)
INSERT INTO #T1 VALUES (3, ‘yyyyy’, 3000)
INSERT INTO #T1 VALUES (4, ‘zzzzz’, 4000)
INSERT INTO #T1 VALUES (5, ‘sssss’, 5000)
INSERT INTO #T1 VALUES (6, ‘ccccc’, 6000)
INSERT INTO #T1 VALUES (7, ‘ppppp’, 2000)
INSERT INTO #T1 VALUES (8, ‘aaaaa’, 4000)
INSERT INTO #T1 VALUES (9, ‘bbbbb’, 5000)
SELECT a.ID1, a.[Name], a.Salary
FROM #T1 AS a
WHERE (@n-1) = (
SELECT COUNT(DISTINCT(b.Salary))
FROM #T1 AS b
WHERE b.Salary > a.Salary)
DROP TABLE #T1
Vamshi
Reply
Ahmed
August 18, 2008 8:36 pm
This is amazing Pinal.
It resolved my problem.
Cheers.
Reply