SQL SERVER – Delete Duplicate Records – Rows

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]

Reference : Pinal Dave (https://blog.sqlauthority.com)

Duplicate Records, SQL Scripts
Previous Post
SQL SERVER – T-SQL Script to find the CD key from Registry
Next Post
SQL SERVER – QUOTED_IDENTIFIER ON/OFF and ANSI_NULL ON/OFF Explanation

Related Posts

450 Comments. Leave new

  • Wow this is nice and simple!

    Reply
  • Pravin Phatangare
    April 4, 2008 10:26 pm

    Hi All,

    Those who have requested for
    How to get 1st, 2nd, 3rd, 4th, nth topmost salary from an Employee table

    The following solution is for getting 6th highest salary from Employee table ,

    SELECT TOP 1 salary FROM (SELECT DISTINCT TOP 6 salary FROM employee
    ORDER BY salary DESC) a ORDER BY salary

    You can change and use it for getting nth highest salary from Employee table as follows

    SELECT TOP 1 salary FROM (SELECT DISTINCT TOP n salary FROM employee
    ORDER BY salary DESC) a ORDER BY salary

    where n > 1 (n is always greater than one)

    Reply
  • I need to be able to select only the record with the max(datetime) of the same guid. Is this possible? I only want the record with the date of 6:53:02 and not 5:07:32 and I want to keep the record with 5:07:19 (different guid)

    b16569a1-e3dc-4f03-b2a0-bf3e2677c466 rptProviders_Provider2Location 4/13/2008 5:07:19 AM 4/11/2008 6:46:29 PM

    36b6419a-a3a5-49ad-b84d-942c89c4754c rptProviders_Provider2Location 4/13/2008 5:07:32 AM 4/11/2008 6:46:29 PM

    36b6419a-a3a5-49ad-b84d-942c89c4754c rptProviders_Provider2Location 4/13/2008 6:53:02 AM 4/11/2008 6:46:29 PM

    Reply
  • Here is the query I have so far:

    SELECT P2L.ProviderId, dbo.AUDIT.TableName, dbo.AUDIT.OccurredAt, P2L.xLastUpdated
    FROM dbo.rptProviders_Provider2Location AS P2L WITH (NOLOCK) INNER JOIN
    dbo.rptProviders_Provider2LocationCategory AS P2LC WITH (NOLOCK) ON P2L.xId = P2LC.Provider2LocationId INNER JOIN
    dbo.AUDIT ON P2L.xId = dbo.AUDIT.RowId

    Reply
  • I’ve got a slightly different problem. I’m creating a stored procedure which will retrieve several different columns from a hospital table (month, year, patient name, intervention hours, and date changed) Those aren’t the real column names; the idea is that records are kept when a patient’s behavior requires staff intervention. The records are entered monthly, but frequently–in a given time range–the reports show patients’ names more than once, because people update the intervention time numbers (stuff is loaded in wrong, then corrected, or loaded in as 0 to meet a deadline, then updated, etc). I need to get rid of the duplications, meaning that when names are shown more than once for a given month and year, I only want one of the duplicated intervention records to show up. I want to filter it by “date changed,” meaning the most recent entry for each patient.

    So when I run an “interventions” report for specified time frame at a given hospital, I want to see the intervention hours broken down by patient name (but I only want to see the most recently entered record for each patient who had intervention activity during the time parameters of the report).

    How can I modify my SELECT statement to do this?

    Reply
  • how to solve this problem?

    ID(identity column) name
    ——————————————

    1 david
    2 john
    3 ram
    4 paul
    5 aish
    6 raj
    7 mani
    8 madu
    9 cotter

    now i deleted the 4′th row

    4 paul

    ok

    now how can i display the remaining names as given below

    ID(identity column) name
    ——————————————

    1 david
    2 john
    3 ram
    4 aish
    5 raj
    6 mani
    7 madu
    8 cotter

    any body can help me sir?

    saranpselvam@gmail.com

    Reply
  • Mayank Mishra
    April 23, 2008 3:11 pm

    Is it possible to delete a single column from a table in SQL 2000.

    Thanks

    Reply
  • This works in 2000. WARNING: According to MS, SET ROWCOUNT will not work with INSERT, DELETE, and UPDATE in later versions.

    — Create table with a number of values between zero and nine
    select a+b+c as val
    into dbo.rmtemp
    from (select 0 a union all select 1 union all select 2 union all select 3) a
    , (select 0 b union all select 1 union all select 2 union all select 3) b
    , (select 0 c union all select 1 union all select 2 union all select 3) c

    — Show what you’ve got
    select val,count(*) row_count from dbo.rmtemp group by val

    — Limit processing to a single row
    set rowcount 1
    — While you’ve got duplicates, delete a row
    while (select top 1 val from dbo.rmtemp group by val having count(*) > 1) is not null
    delete from dbo.rmtemp where val in (select top 1 val from dbo.rmtemp group by val having count(*) > 1);
    — Remove single row processing limit
    set rowcount 0

    — Confirm that only uniques remain
    select val,count(*) row_count from dbo.rmtemp group by val

    — Clean up
    drop table dbo.rmtemp

    — Comment
    I knew there had to be a better way.

    Reply
  • hi pinal dave ,

    i have a question for u , i want to retrive the rows from a table by avoiding duplicate rows with all coloumns .i.e; duplicate row sholud be retrieved only once . any help reg…

    Reply
  • Its really a nice

    Reply
  • Hi Pinal,

    Thanks for your tips.

    I have a question. We have a table that have many contacts that are duplicated more than once. And we need a query that excludes all duplicated people who has already registered is there a more efficient way of doing it than this?

    SELECT CD.email
    , CD.contactid
    FROM contact CD
    WHERE (CD.opt_out = 0)
    AND (CD.email ‘NoEmail’)
    AND (CD.email IS NOT NULL)
    AND (CD.email ”)
    AND (CD.registered = 0)
    AND Lower (CD.email) + ‘||’ + Lower (cd.firstname) + ‘||’ + Lower (CD.lastname) NOT IN
    (SELECT Lower (CD1.email) + ‘||’ + Lower (CD1.firstname) + ‘||’ + Lower (CD1.lastname)
    FROM contact CD1
    WHERE (CD.registered = 1)
    )
    GROUP BY CD.contactid
    , CD.email

    Thank you,

    EstebanD

    Reply
  • Dear Sir,
    i a have a table with id as one column,name as another column , in name there r diffrent names is there is possible to display all the alphabets in the name column.it is very urgent.where i can see mu solution if else post it to my id.

    Reply
  • I am having a query inside the view which is returning more than one value. I need all the values to be returned in a separate row while running the view. Can ypu provide me a solution?

    Reply
  • Dear sir
    I am faceing a problem in sql query.
    I have two table.
    1st table is maste table, In master table total 180 records are stored.
    our requirement total 180 rows – 20 rows are stored in 2nd table, then display 160 mark as ‘N’ bcos stored in master table and other records are mark as ‘Y’ bcos its stored in 2nd table.

    Table 1st
    AppID FunCode SubFuncode FunName SubFunName Mark
    Dis 1 30 Main 0 N
    Depos 2 30 &Main 0 N
    Lox 3 30 &Main 0 N
    LCredit 4 30 &Main 0 N

    In 2nd table same records but only 20 records are stored.
    AppID FunCode SubFuncode FunName SubFunName Mark
    Dis 1 30 &Main 0 Y
    Depos 2 30 &Main 0 Y

    In query we try to fetch all Y marked row and N marked with Union query.

    SELECT TBLAPPLICATIONS.Name, TBLGROUPFUNCTIONS.AppID, TBLGROUPFUNCTIONS.FunctionCode, TBLALLFUNCTIONS.DefaultSubFunctionName,
    TBLGROUPFUNCTIONS.SubFunctionCode, ‘Y’ AS AssignedFunction
    FROM TBLGROUPFUNCTIONS INNER JOIN
    TBLALLFUNCTIONS ON TBLGROUPFUNCTIONS.AppID = TBLALLFUNCTIONS.AppID AND
    TBLGROUPFUNCTIONS.FunctionCode = TBLALLFUNCTIONS.FunctionCode AND
    TBLGROUPFUNCTIONS.SubFunctionCode = TBLALLFUNCTIONS.SubFunctionCode INNER JOIN
    TBLAPPLICATIONS ON TBLGROUPFUNCTIONS.AppID = TBLAPPLICATIONS.AppID
    WHERE TBLGROUPFUNCTIONS.GroupID = 12
    UNION ALL
    SELECT TBLAPPLICATIONS.Name, TBLALLFUNCTIONS.AppID, TBLALLFUNCTIONS.FunctionCode,
    TBLALLFUNCTIONS.DefaultSubFunctionName, TBLALLFUNCTIONS.SubFunctionCode, ‘N’ AS AssignedFunction
    FROM TBLALLFUNCTIONS INNER JOIN TBLAPPLICATIONS ON TBLALLFUNCTIONS.AppID = TBLAPPLICATIONS.AppID
    WHERE
    (TBLALLFUNCTIONS.AppID NOT IN (SELECT AppID FROM TBLGROUPFUNCTIONS WHERE GroupID = 12) OR
    TBLALLFUNCTIONS.FunctionCode NOT IN (SELECT FunctionCode FROM TBLGROUPFUNCTIONS WHERE GroupID = 12) OR
    TBLALLFUNCTIONS.SubFunctionCode NOT IN (SELECT SubFunctionCode FROM TBLGROUPFUNCTIONS WHERE GroupID = 12))
    Order BY aPPID,FunctionCode,sUBfUNCTIONcODE

    From this query record is not show correctley.
    Please please immediate reply me.
    I am waiting your reply please suggest me.
    Asit Sinha

    Reply
  • Thanx Mhlove:

    Your solution was succesfull for me, for delete duplicate data in secondary column but the same ID.

    Regards.

    Reply
  • Hi,

    I have an issue with and sql query i’m trying to get right and I seem not to find the answer.

    I have a table with 4 colums: id1,id2,value1,value2.
    I need to select all rows with a unique pair of (id1,id2) and use a 3rd column in the select for something like : sum(value1*value2) for duplicate rows.

    So basicaly, show 1 row for each pair and next get a sum of all the values for that pair (sum the duplicate rows on value1,value 2).

    I only been able to select the unique pairs until now using a group by id1,id2 but i’m having troble with the sums of the duplicate rows for each pair.

    I’d appreciate your help

    Reply
  • Hi Bogdon,

    It would have been easy to provide solution if you would have provided proper Input and desired output. Anyways see it the below works for you based on my understanding

    CREATE TABLE #T1 (ID1 INT, ID2 INT, Value1 INT, Value2 INT)
    INSERT INTO #T1 VALUES (1,1,5,10)
    INSERT INTO #T1 VALUES (1,1,6,12)
    INSERT INTO #T1 VALUES (1,1,50,100)
    INSERT INTO #T1 VALUES (1,2,7,14)
    INSERT INTO #T1 VALUES (1,2,8,16)
    INSERT INTO #T1 VALUES (2,1,2,5)
    INSERT INTO #T1 VALUES (2,1,3,6)
    INSERT INTO #T1 VALUES (2,3,10,20)
    INSERT INTO #T1 VALUES (2,3,25,50)

    SELECT * FROM #T1;

    SELECT ID1, ID2, SUM(Value1*Value2) FROM #T1 GROUP BY ID1, ID2;

    WITH T1 AS (SELECT (ROW_NUMBER() OVER (PARTITION BY ID1 ORDER BY ID1)) AS RNum, * FROM #T1 )
    DELETE FROM T1 WHERE Rnum IN (SELECT a.RNum AS S FROM T1 AS a, T1 AS b WHERE a.RNum > b.RNum GROUP BY a.RNum)

    SELECT * FROM #T1

    DROP TABLE #T1

    Post the input and desired output if the above is not what you want.

    Vamshi

    Reply
  • Sorry WITH statement is not yours…

    CREATE TABLE #T1 (ID1 INT, ID2 INT, Value1 INT, Value2 INT)
    INSERT INTO #T1 VALUES (1,1,5,10)
    INSERT INTO #T1 VALUES (1,1,6,12)
    INSERT INTO #T1 VALUES (1,1,50,100)
    INSERT INTO #T1 VALUES (1,2,7,14)
    INSERT INTO #T1 VALUES (1,2,8,16)
    INSERT INTO #T1 VALUES (2,1,2,5)
    INSERT INTO #T1 VALUES (2,1,3,6)
    INSERT INTO #T1 VALUES (2,3,10,20)
    INSERT INTO #T1 VALUES (2,3,25,50)

    SELECT * FROM #T1;

    SELECT ID1, ID2, SUM(Value1*Value2) FROM #T1 GROUP BY ID1, ID2;

    I dont know why you said GROUP BY didnt worked for you if the above is what you are expecting

    Reply
  • Hi Haneef and Preveen

    Hope this would help you in getting all those employees with Nth Highest salary

    DECLARE @n INT

    SELECT @n = 2 — Change N value to get Nth highest value

    CREATE TABLE #T1 (ID1 INT, [Name] NVARCHAR(50), Salary INT)
    INSERT INTO #T1 VALUES (1, ‘Vamshi’, 1000)
    INSERT INTO #T1 VALUES (2, ‘xxxxx’, 2000)
    INSERT INTO #T1 VALUES (3, ‘yyyyy’, 3000)
    INSERT INTO #T1 VALUES (4, ‘zzzzz’, 4000)
    INSERT INTO #T1 VALUES (5, ‘sssss’, 5000)
    INSERT INTO #T1 VALUES (6, ‘ccccc’, 6000)
    INSERT INTO #T1 VALUES (7, ‘ppppp’, 2000)
    INSERT INTO #T1 VALUES (8, ‘aaaaa’, 4000)
    INSERT INTO #T1 VALUES (9, ‘bbbbb’, 5000)

    SELECT a.ID1, a.[Name], a.Salary
    FROM #T1 AS a
    WHERE (@n-1) = (
    SELECT COUNT(DISTINCT(b.Salary))
    FROM #T1 AS b
    WHERE b.Salary > a.Salary)

    DROP TABLE #T1

    Vamshi

    Reply
  • This is amazing Pinal.

    It resolved my problem.

    Cheers.

    Reply

Leave a Reply