Earlier, I had written on SQL SERVER – Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON, and I received a comment regarding when this option can be useful. On the same day, I met Jacob Sebastian—my close friend and SQL Server MVP, I discussed this question with him. During our discussion, we came up with following example.
When we have situation where we are dealing with INSERT and TRANSACTION, we can see this feature in action. Let us consider an example where we have two tables. One table has all the data and the second table has partial data. If you want to insert all the data from the first table to the second table and insert all nonduplicate values, you can use this feature.
Let us first create a table that has a duplicate record based on the FirstName.
Option 1: With IGNORE_DUP_KEY set OFF
Once a table is created, create another table and a nonclustered index on it with the option IGNORE_DUP_KEY set OFF. Try to insert data from another table into this table by using the INSERT… SELECT statement.
As there are duplicate values, the insert will fail with error and not insert any single row in the table.
Option 2: With IGNORE_DUP_KEY set ON
Now, once again create new table and create nonclustered index on it with the option GNORE_DUP_KEY set ON. Try to insert data from another table into this table using INSERT… SELECT statement.
Even though there are duplicate values in the table, the rows in the table will be inserted; however, the duplicate values will not be inserted and ignored. A warning message will be generated with the script, but the process will not be stopped.
I hope that this example clarifies the use of this feature. Again, as I said earlier, this is a business logic related issue, and it should be used based on your need.
The complete script from the example is given here.
USE tempdb
GO
CREATE TABLE DupData (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
INSERT INTO DupData (ID,FirstName,LastName,City)
VALUES(1,'John','Doe','LA');
INSERT INTO DupData (ID,FirstName,LastName,City)
VALUES(2,'Joe','Doe','LA');
INSERT INTO DupData (ID,FirstName,LastName,City)
VALUES(3,'John','Doe','LA');
INSERT INTO DupData (ID,FirstName,LastName,City)
VALUES(4,'Joseph','Doe','LA');
GO
-- Check data from Test Table.
SELECT *
FROM DupData
GO
-- Option 1 - With IGNORE_DUP_KEY set OFF
CREATE TABLE DupIndex (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
--Create Unique Index on Table
CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName]
ON [dbo].[DupIndex]
(
[FirstName] ASC
) ON [PRIMARY]
GO
-- Insert will throw an error
-- Error will rollback transaction
INSERT INTO DupIndex (ID,FirstName,LastName,City)
SELECT ID,FirstName,LastName,City
FROM DupData
ORDER BY ID
GO
-- Table with contain NO row
SELECT *
FROM DupIndex
GO
DROP TABLE DupIndex
GO
-- Option 2 - With IGNORE_DUP_KEY set ON
USE tempdb
GO
CREATE TABLE DupIndex (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
--Create Unique Index on Table with IGNORE_DUP_KEY set ON
CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName]
ON [dbo].[DupIndex]
(
[FirstName] ASC
) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
GO
-- Insert will throw warning
-- Warning will not insert duplicate row
INSERT INTO DupIndex (ID,FirstName,LastName,City)
SELECT ID,FirstName,LastName,City
FROM DupData
ORDER BY ID
GO
-- Table with contain three rows
SELECT *
FROM DupIndex
GO
-- Clean up
DROP TABLE DupIndex
DROP TABLE DupData
GO
Let me know what you think of this option and the type of situation in which you will use this.
Reference: Pinal Dave (https://blog.sqlauthority.com)
20 Comments. Leave new
Hi Pinal,
Nice article.
I have one question,
— Insert will throw warning
— Warning will not insert duplicate row
INSERT INTO DupIndex (ID,FirstName,LastName,City)
SELECT ID,FirstName,LastName,City
FROM DupData
ORDER BY ID
GO
From above query, WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY] key, why ID =1 is ignored from table DupData while insertion, why not ignore ID = 3?
Hi Pinal,
I tried tweaking the same query using a clustered index and it gives a different result altogether. I am not sure but why is it that the SQL server engine works differently for different indexes. I’m not too much in sql server so I’m not sure whether I am asking the right question as clustered index on First Name sounds wierd.
Hi Paresh,
The data specific query can be answered only if you provide table design and existing data in source and destination tables.
Regards,
Pinal Dave
Hi Pinal,
Interesting use for this option. I always wondered how it might come in handy. The same result could be accomplished by using SELECT DISTINCT on the first table, but that would require sorting whereas the IGNORE_DUP_KEY option does not.
Aaron
You mean like this?
SELECT MIN(ID) AS ID, FirstName, LastName, City
FROM DupData
GROUP BY FirstName, LastName, City
ORDER BY ID
It works but it’s not very generic (or efficient) way to do it, I would say.
Oh, now I see where this might come handy :)
But I think this is really limited situation where you’d want to copy data from one table to another. Good to know that this kind of possibility exists though.
Hi,
Shall i consider this as a new way for removing duplicate records from a table.
Not directly but with temporary table (ordinary will do also if you clean it up afterwards) it is possible. Create temporary table with unique index with IGNORE_DUP_KEY = ON. Select all the rows from original table to temporary table. Truncate original table and select all the rows from temporary table to (now clean) original table.
— There’s duplicate rows now
SELECT * FROM DupData ORDER BY ID
GO
— Create temp table
CREATE TABLE #DupIndex (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
— And index
CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName]
ON [dbo].[#DupIndex]
(
[FirstName] ASC
) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
GO
— Select rows into temp table (duplicates are ignored)
INSERT INTO #DupIndex (ID,FirstName,LastName,City)
SELECT ID,FirstName,LastName,City
FROM DupData
ORDER BY ID
GO
— Truncate table
TRUNCATE TABLE DupData;
GO
— Select distinct rows back to table
INSERT INTO DupData (ID,FirstName,LastName,City)
SELECT ID,FirstName,LastName,City
FROM #DupIndex
ORDER BY ID
GO
— Now there’s now duplicate data
SELECT * FROM DupData ORDER BY ID
GO
— Drop temp table
drop table #DupIndex;
GO
I actually found use for this!
I have a situation where I need to select data joining many tables, filter them using various rules and calculate some values out of them. I select data into table valued variable. After that I must update other tables depending what’s in that temp table.
Now I don’t want to have duplicate entries in that temp table. It wouldn’t break anything but there could be A LOT of rows so I would have to do a lot of unnecessary updates and cause major havoc in the db with write locks.
I figured out that I could extract distinct rows out of the temp table but that would cause some extra work elsewhere in my procedure. So instead of that I just added “with (IGNORE_DUP_KEY = ON)” to the tables primary key and it seems to work like a charm :)
awesome.. helpful.
Hi,
I have a table as Test (VouDate Date, Narration Varchar(50)) where both are PK
Data
——–
insert into test values (‘2012-10-01′,’First-1’);
insert into test values (‘2012-10-01′,’First-2’);
insert into test values (‘2012-11-01′,’First-1’);
insert into test values (‘2012-11-01′,’First-2’);
When I select (select * from test order by 1) it returns
2012-10-01 First-1
2012-10-01 First-2
2012-11-01 First-1
2012-11-01 First-2
And when I again insert
insert into test values (‘2012-10-01′,’First-3’);
insert into test values (‘2012-10-01′,’First-0’);
and select (select * from test order by 1), it returns
2012-10-01 First-0
2012-10-01 First-1
2012-10-01 First-2
2012-10-01 First-3
2012-11-01 First-1
2012-11-01 First-2
I want to get the data as
2012-10-01 First-1
2012-10-01 First-2
2012-10-01 First-3
2012-10-01 First-0
2012-11-01 First-1
2012-11-01 First-2
Please let me know if this is possible to sort the data by first column and second column get sorted by the insert sequence.
Thanks,
Yunus
I have tried this without creating the Primary Key in the table, It works fine.
Hi.. this is my point.. i have 200 tables with million of rows each one, we want one unique table with unique email addresses, on that unique mail table i create 2columns pk_id (incremental), and email address , then i created an unique index with ignore_dup_key = ON , everything is fine, but when that table grow.. the insert of the remaining table is extremely slow… any suggestions? of how to accelerate the insert on that huge unique column table???
Hello Pinal,
I am using threading in my .net application and which is causing duplicates in my database, so when two threads are creating duplicates at the same time, in this situation will this IGNORE_DUP_KEY = ON work successfully?
I believe I’ve got the answer to the question posed in the first 1/6/10 reply in the thread… “WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY] key, why ID =1 is ignored from table DupData while insertion, why not ignore ID = 3?”. I believe/guess/intuit that it is because SQL Server does NOT “ignore” subsequent duplicates, rather it “writes over” the prior record with the most recent duplicate record’s data. By changing the ORDER in which the records are inserted using ORDER BY ID “DESC” (rather than the default BY ID “ASC”), the ID=1 is kept rather than ID=3 — the former being the more recent insertion of the two. If, indeed, this can be generalized, then it might be a GREAT way to eliminate duplicate keyed entries while keeping the most recent one — just sort the to-be-inserted data in such a way that the most recent duplicate is inserted last.
Thank you. You answered my question. Do you think, this is an efficient method( in terms of time) than getting the most recent record using the Rank functions into a temp table and inserting into the final table?
If there is a business need with same logic then Yes.
Sir,could you please expain what is the difference between unique non clustered index and clusterd index.?