SQL SERVER – 2005 – 2008 – Delete Duplicate Rows

I had previously penned down two popular snippets regarding deleting duplicate rows and counting duplicate rows. Today, we will examine another very quick code snippet where we will delete duplicate rows using CTE and ROW_NUMBER() feature of SQL Server 2005 and SQL Server 2008.

This method is improved over the earlier method as it not only uses CTE and ROW_NUMBER, but also demonstrates the power of CTE with DELETE statement. We will have a comprehensive discussion about it later in this article. For now, let us first create a sample table from which we will delete records.

/* Create Table with 7 entries - 3 are duplicate entries */
CREATE TABLE DuplicateRcordTable (Col1 INT, Col2 INT)
INSERT INTO DuplicateRcordTable
SELECT 1, 1
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 1 --duplicate
UNION ALL
SELECT 1, 2
UNION ALL
SELECT 1, 2 --duplicate
UNION ALL
SELECT 1, 3
UNION ALL
SELECT 1, 4
GO

The above table has total 7 records, out of which 3 are duplicate records. Once the duplicates are removed we will have only 4 records left.

/* It should give you 7 rows */
SELECT *
FROM DuplicateRcordTable
GO


The most interesting part of this is yet to come. We will use CTE that will re-generate the same table with additional column, which is row number. In our case, we have Col1 and Col2 and both the columns qualify as duplicate rows. It may be a different set of rows for each different query like this. Another point to note here is that once CTE is created  DELETE statement can be run on it. We will put a condition here – when we receive more than one rows of record, we will remove the row which is not the first one. When DELETE command is executed over CTE it in fact deletes from the base table used in CTE.

/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM
CTE
WHERE DuplicateCount > 1
GO

It is apparent that after delete command has been run, we will have only 4 records, which is almost the same result which we would have got with DISTINCT, with this resultset. If we had more than 2 columns and we had to run unique on only two columns, our distinct might have not worked here . In this case, we would have to use above the mentioned method.

/* It should give you Distinct 4 records */
SELECT *
FROM DuplicateRcordTable
GO


This method is a breeze and we can use this for SQL Server version 2005 and the later versions.

Click to Download Scripts

Watch the view to see the above concept in action:

Reference : Pinal Dave (http://blog.SQLAuthority.com)

About these ads

SQL SERVER – UDF – Remove Duplicate Chars From String

Few days ago, I received following wonderful UDF from one of this blog reader. This UDF is written for specific purpose of removing duplicate chars string from one large string. Virendra Chauhan, author of this UDF is working as DBA in Lutheran Health Network.

CREATE FUNCTION dbo.REMOVE_DUPLICATE_INSTR
(@datalen_tocheck INT,@string VARCHAR(255))
RETURNS VARCHAR(255)
AS
BEGIN
DECLARE
@str VARCHAR(255)
DECLARE @count INT
DECLARE
@start INT
DECLARE
@result VARCHAR(255)
DECLARE @end INT
SET
@start=1
SET @end=@datalen_tocheck
SET @count=@datalen_tocheck
SET @str = @string
WHILE (@count <=255)
BEGIN
IF
(@result IS NULL)
BEGIN
SET
@result=''
END
SET
@result=@result+SUBSTRING(@str,@start,@end)
SET @str=REPLACE(@str,SUBSTRING(@str,@start,@end),'')
SET @count=@count+@datalen_tocheck
END
RETURN
@result
END
GO>, 1)

Usage:
SET CONCAT_NULL_YIELDS_NULL OFF

SELECT dbo.Remove_duplicate_instr(<CHARacter length OF a

duplicate SUBSTRING >,<string contain duplicate>)
Example:
To keep char set in a string unique and remove duplicate 3 char long string run this UDF as inline function.

SET CONCAT_NULL_YIELDS_NULL OFF

SELECT dbo.Remove_duplicate_instr(3,f123456789123456456
Resultset:
123456789

Reference : Pinal Dave (http://blog.SQLAuthority.com), Virendra Chauhan (DBA)

SQL SERVER – Count Duplicate Records – Rows

In my previous article SQL SERVER – Delete Duplicate Records – Rows, we have seen how we can delete all the duplicate records in one simple query. In this article we will see how to find count of all the duplicate records in the table. Following query demonstrates usage of GROUP BY, HAVING, ORDER BY in one query and returns the results with duplicate column and its count in descending order.

SELECT YourColumn, COUNT(*) TotalCount
FROM YourTable
GROUP BY YourColumn
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

Watch the view to see the above concept in action:

Reference : Pinal Dave (http://blog.SQLAuthority.com) , SQL SERVER – Delete Duplicate Records – Rows

SQL SERVER – Delete Duplicate Records – Rows

Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.

DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Watch the view to see the above concept in action:

Reference : Pinal Dave (http://blog.SQLAuthority.com)