SQL SERVER – Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON – A Transactional Behavior

January 6, 2010

Earlier, I had written on SQL SERVER – Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON, and I received a comment regarding when this option can be useful. On the same day, I met Jacob Sebastian—my close friend and SQL Server MVP, I discussed this question with him. During our discussion, we came up with following example.

When we have situation where we are dealing with INSERT and TRANSACTION, we can see this feature in action. Let us consider an example where we have two tables. One table has all the data and the second table has partial data. If you want to insert all the data from the first table to the second table and insert all nonduplicate values, you can use this feature.

Let us first create a table that has a duplicate record based on the FirstName.

SQL SERVER - Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON - A Transactional Behavior duptr1

Option 1: With IGNORE_DUP_KEY set OFF

Once a table is created, create another table and a nonclustered index on it with the option IGNORE_DUP_KEY set OFF. Try to insert data from another table into this table by using the INSERT… SELECT statement.

SQL SERVER - Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON - A Transactional Behavior duptr2

As there are duplicate values, the insert will fail with error and not insert any single row in the table.

Option 2: With IGNORE_DUP_KEY set ON

Now, once again create new table and create nonclustered index on it with the option GNORE_DUP_KEY set ON. Try to insert data from another table into this table using INSERT… SELECT statement.

SQL SERVER - Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON - A Transactional Behavior duptr3

Even though there are duplicate values in the table, the rows in the table will be inserted; however, the duplicate values will not be inserted and ignored. A warning message will be generated with the script, but the process will not be stopped.

I hope that this example clarifies the use of this feature. Again, as I said earlier, this is a business logic related issue, and it should be used based on your need.

The complete script from the example is given here.

USE tempdb GO CREATE TABLE DupData (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO INSERT INTO DupData (ID,FirstName,LastName,City) VALUES(1,'John','Doe','LA'); INSERT INTO DupData (ID,FirstName,LastName,City) VALUES(2,'Joe','Doe','LA'); INSERT INTO DupData (ID,FirstName,LastName,City) VALUES(3,'John','Doe','LA'); INSERT INTO DupData (ID,FirstName,LastName,City) VALUES(4,'Joseph','Doe','LA'); GO -- Check data from Test Table. SELECT * FROM DupData GO -- Option 1 - With IGNORE_DUP_KEY set OFF CREATE TABLE DupIndex (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO --Create Unique Index on Table CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName] ON [dbo].[DupIndex] ( [FirstName] ASC ) ON [PRIMARY] GO -- Insert will throw an error -- Error will rollback transaction INSERT INTO DupIndex (ID,FirstName,LastName,City) SELECT ID,FirstName,LastName,City FROM DupData ORDER BY ID GO -- Table with contain NO row SELECT * FROM DupIndex GO DROP TABLE DupIndex GO -- Option 2 - With IGNORE_DUP_KEY set ON USE tempdb GO CREATE TABLE DupIndex (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO --Create Unique Index on Table with IGNORE_DUP_KEY set ON CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName] ON [dbo].[DupIndex] ( [FirstName] ASC ) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY] GO -- Insert will throw warning -- Warning will not insert duplicate row INSERT INTO DupIndex (ID,FirstName,LastName,City) SELECT ID,FirstName,LastName,City FROM DupData ORDER BY ID GO -- Table with contain three rows SELECT * FROM DupIndex GO -- Clean up DROP TABLE DupIndex DROP TABLE DupData GO

Let me know what you think of this option and the type of situation in which you will use this.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Index, SQL Scripts, SQL Transactions

SQL SERVER – SQL Server RDL Specification

SQLAuthority News – Webcasts – Resources for IT Managers and their Teams

20 Comments. Leave new

Paresh Prajapati
January 6, 2010 11:55 am
Hi Pinal,
Nice article.
I have one question,
— Insert will throw warning
— Warning will not insert duplicate row
INSERT INTO DupIndex (ID,FirstName,LastName,City)
SELECT ID,FirstName,LastName,City
FROM DupData
ORDER BY ID
GO
From above query, WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY] key, why ID =1 is ignored from table DupData while insertion, why not ignore ID = 3?
Reply
- Ivan
  November 3, 2012 9:54 am
  Hi Pinal,
  I tried tweaking the same query using a clustered index and it gives a different result altogether. I am not sure but why is it that the SQL server engine works differently for different indexes. I’m not too much in sql server so I’m not sure whether I am asking the right question as clustered index on First Name sounds wierd.
  Reply
Pinal Dave
January 6, 2010 12:37 pm
Hi Paresh,
The data specific query can be answered only if you provide table design and existing data in source and destination tables.
Regards,
Pinal Dave
Reply
Aaron Sentell
January 6, 2010 7:03 pm
Hi Pinal,
Interesting use for this option. I always wondered how it might come in handy. The same result could be accomplished by using SELECT DISTINCT on the first table, but that would require sorting whereas the IGNORE_DUP_KEY option does not.
Aaron
Reply
- Marko Parkkola
  January 7, 2010 2:03 am
  You mean like this?
  SELECT MIN(ID) AS ID, FirstName, LastName, City
  FROM DupData
  GROUP BY FirstName, LastName, City
  ORDER BY ID
  It works but it’s not very generic (or efficient) way to do it, I would say.
  Reply
Marko Parkkola
January 7, 2010 1:55 am
Oh, now I see where this might come handy :)
But I think this is really limited situation where you’d want to copy data from one table to another. Good to know that this kind of possibility exists though.
Reply
vijayan
January 8, 2010 1:59 pm
Hi,
Shall i consider this as a new way for removing duplicate records from a table.
Reply
- Marko Parkkola
  January 8, 2010 4:26 pm
  Not directly but with temporary table (ordinary will do also if you clean it up afterwards) it is possible. Create temporary table with unique index with IGNORE_DUP_KEY = ON. Select all the rows from original table to temporary table. Truncate original table and select all the rows from temporary table to (now clean) original table.
  — There’s duplicate rows now
  SELECT * FROM DupData ORDER BY ID
  GO
  — Create temp table
  CREATE TABLE #DupIndex (ID INT,
  FirstName VARCHAR(100),
  LastName VARCHAR(100),
  City VARCHAR(100))
  GO
  — And index
  CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName]
  ON [dbo].[#DupIndex]
  (
  [FirstName] ASC
  ) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
  GO
  — Select rows into temp table (duplicates are ignored)
  INSERT INTO #DupIndex (ID,FirstName,LastName,City)
  SELECT ID,FirstName,LastName,City
  FROM DupData
  ORDER BY ID
  GO
  — Truncate table
  TRUNCATE TABLE DupData;
  GO
  — Select distinct rows back to table
  INSERT INTO DupData (ID,FirstName,LastName,City)
  SELECT ID,FirstName,LastName,City
  FROM #DupIndex
  ORDER BY ID
  GO
  — Now there’s now duplicate data
  SELECT * FROM DupData ORDER BY ID
  GO
  — Drop temp table
  drop table #DupIndex;
  GO
  Reply
Marko Parkkola
January 22, 2010 11:58 am
I actually found use for this!
I have a situation where I need to select data joining many tables, filter them using various rules and calculate some values out of them. I select data into table valued variable. After that I must update other tables depending what’s in that temp table.
Now I don’t want to have duplicate entries in that temp table. It wouldn’t break anything but there could be A LOT of rows so I would have to do a lot of unnecessary updates and cause major havoc in the db with write locks.
I figured out that I could extract distinct rows out of the temp table but that would cause some extra work elsewhere in my procedure. So instead of that I just added “with (IGNORE_DUP_KEY = ON)” to the tables primary key and it seems to work like a charm :)
Reply
deepak mariswamy
September 16, 2010 10:18 am
awesome.. helpful.
Reply
Yunus
January 12, 2012 1:14 pm
Hi,
I have a table as Test (VouDate Date, Narration Varchar(50)) where both are PK
Data
——–
insert into test values (‘2012-10-01′,’First-1’);
insert into test values (‘2012-10-01′,’First-2’);
insert into test values (‘2012-11-01′,’First-1’);
insert into test values (‘2012-11-01′,’First-2’);
When I select (select * from test order by 1) it returns
2012-10-01 First-1
2012-10-01 First-2
2012-11-01 First-1
2012-11-01 First-2
And when I again insert
insert into test values (‘2012-10-01′,’First-3’);
insert into test values (‘2012-10-01′,’First-0’);
and select (select * from test order by 1), it returns
2012-10-01 First-0
2012-10-01 First-1
2012-10-01 First-2
2012-10-01 First-3
2012-11-01 First-1
2012-11-01 First-2
I want to get the data as
2012-10-01 First-1
2012-10-01 First-2
2012-10-01 First-3
2012-10-01 First-0
2012-11-01 First-1
2012-11-01 First-2
Please let me know if this is possible to sort the data by first column and second column get sorted by the insert sequence.
Thanks,
Yunus
Reply
Yunus
January 12, 2012 1:43 pm
I have tried this without creating the Primary Key in the table, It works fine.
Reply
Brian P. Hernandez
September 14, 2012 11:55 pm
Hi.. this is my point.. i have 200 tables with million of rows each one, we want one unique table with unique email addresses, on that unique mail table i create 2columns pk_id (incremental), and email address , then i created an unique index with ignore_dup_key = ON , everything is fine, but when that table grow.. the insert of the remaining table is extremely slow… any suggestions? of how to accelerate the insert on that huge unique column table???
Reply
Anita
January 17, 2013 8:02 pm
Hello Pinal,
I am using threading in my .net application and which is causing duplicates in my database, so when two threads are creating duplicates at the same time, in this situation will this IGNORE_DUP_KEY = ON work successfully?
Reply
Dick Borden
May 8, 2013 12:43 am
I believe I’ve got the answer to the question posed in the first 1/6/10 reply in the thread… “WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY] key, why ID =1 is ignored from table DupData while insertion, why not ignore ID = 3?”. I believe/guess/intuit that it is because SQL Server does NOT “ignore” subsequent duplicates, rather it “writes over” the prior record with the most recent duplicate record’s data. By changing the ORDER in which the records are inserted using ORDER BY ID “DESC” (rather than the default BY ID “ASC”), the ID=1 is kept rather than ID=3 — the former being the more recent insertion of the two. If, indeed, this can be generalized, then it might be a GREAT way to eliminate duplicate keyed entries while keeping the most recent one — just sort the to-be-inserted data in such a way that the most recent duplicate is inserted last.
Reply
- Ashok Maganti
  April 4, 2017 9:46 pm
  Thank you. You answered my question. Do you think, this is an efficient method( in terms of time) than getting the most recent record using the Rank functions into a temp table and inserting into the final table?
  Reply
  - Pinal Dave
    April 5, 2017 4:14 am
    If there is a business need with same logic then Yes.
meera
January 22, 2015 11:34 am
Sir,could you please expain what is the difference between unique non clustered index and clusterd index.?
Reply