SQL SERVER – Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON – A Transactional Behavior

Earlier, I wrote about SQL SERVER – Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON, and I received a comment asking when this option can be useful. On the same day, I met Jacob Sebastian, my close friend and a SQL Server MVP, and discussed this question with him. During our discussion, we came up with the following example.

When we are dealing with INSERT statements inside a TRANSACTION, we can see this feature in action. Consider an example with two tables: one table has all the data, and the second table has only partial data. If you want to copy all the nonduplicate rows from the first table into the second table, you can use this feature.

Let us first create a table that has duplicate records based on the FirstName column.

Option 1: With IGNORE_DUP_KEY set OFF

Once the first table is created, create another table and a unique nonclustered index on it with the option IGNORE_DUP_KEY set to OFF. Try to insert data from the first table into this table by using the INSERT… SELECT statement.

Because there are duplicate values, the insert will fail with an error, and not a single row will be inserted into the table.

Option 2: With IGNORE_DUP_KEY set ON

Now, once again create a new table and create a unique nonclustered index on it with the option IGNORE_DUP_KEY set to ON. Try to insert data from the first table into this table using the INSERT… SELECT statement.

Even though there are duplicate values in the source table, the nonduplicate rows will be inserted; the duplicate values will be ignored rather than inserted. A warning message ("Duplicate key was ignored.") will be generated, but the statement will not be stopped.

I hope this example clarifies the use of this feature. Again, as I said earlier, this is a business-logic-related decision, and it should be used based on your needs.

The complete script from the example is given here.

USE tempdb
GO
CREATE TABLE DupData (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
INSERT INTO DupData (ID,FirstName,LastName,City)
VALUES(1,'John','Doe','LA');
INSERT INTO DupData (ID,FirstName,LastName,City)
VALUES(2,'Joe','Doe','LA');
INSERT INTO DupData (ID,FirstName,LastName,City)
VALUES(3,'John','Doe','LA');
INSERT INTO DupData (ID,FirstName,LastName,City)
VALUES(4,'Joseph','Doe','LA');
GO
-- Check data from Test Table.
SELECT *
FROM DupData
GO
-- Option 1 - With IGNORE_DUP_KEY set OFF
CREATE TABLE DupIndex (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
--Create Unique Index on Table
CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName]
ON [dbo].[DupIndex]
(
[FirstName] ASC
) ON [PRIMARY]
GO
-- Insert will throw an error
-- Error will rollback transaction
INSERT INTO DupIndex (ID,FirstName,LastName,City)
SELECT ID,FirstName,LastName,City
FROM DupData
ORDER BY ID
GO
-- Table will contain NO rows
SELECT *
FROM DupIndex
GO
DROP TABLE DupIndex
GO
-- Option 2 - With IGNORE_DUP_KEY set ON
USE tempdb
GO
CREATE TABLE DupIndex (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
--Create Unique Index on Table with IGNORE_DUP_KEY set ON
CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName]
ON [dbo].[DupIndex]
(
[FirstName] ASC
) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
GO
-- Insert will raise a warning
-- Duplicate rows will not be inserted
INSERT INTO DupIndex (ID,FirstName,LastName,City)
SELECT ID,FirstName,LastName,City
FROM DupData
ORDER BY ID
GO
-- Table will contain three rows
SELECT *
FROM DupIndex
GO
-- Clean up
DROP TABLE DupIndex
DROP TABLE DupData
GO
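
For comparison, here is a sketch of achieving the same nonduplicate copy without IGNORE_DUP_KEY, by deduplicating in the SELECT itself. It assumes the same DupData and DupIndex tables from the script above (run it before the cleanup step); unlike IGNORE_DUP_KEY, this approach lets you choose deterministically which duplicate survives.

```sql
-- Sketch: deduplicate in the SELECT instead of relying on IGNORE_DUP_KEY.
-- ROW_NUMBER keeps exactly one row per FirstName (here, the lowest ID).
INSERT INTO DupIndex (ID, FirstName, LastName, City)
SELECT ID, FirstName, LastName, City
FROM (
    SELECT ID, FirstName, LastName, City,
           ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY ID) AS rn
    FROM DupData
) AS d
WHERE rn = 1;
GO
```

The trade-off is an extra window computation in the query plan, whereas IGNORE_DUP_KEY pushes the duplicate handling into the index insert itself.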

Let me know what you think of this option and the types of situations in which you would use it.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

18 thoughts on “SQL SERVER – Unique Nonclustered Index Creation with IGNORE_DUP_KEY = ON – A Transactional Behavior”

  1. Hi Pinal,

    Nice article.

    I have one question,

    — Insert will throw warning
    — Warning will not insert duplicate row
    INSERT INTO DupIndex (ID,FirstName,LastName,City)
    SELECT ID,FirstName,LastName,City
    FROM DupData
    ORDER BY ID
    GO

    From the above query, WITH (IGNORE_DUP_KEY = ON), why is ID = 1 ignored from table DupData during insertion, and not ID = 3?


    • Hi Pinal,
      I tried tweaking the same query using a clustered index, and it gives a different result altogether. I am not sure why the SQL Server engine works differently for different indexes. I'm not very deep into SQL Server, so I'm not sure whether I am asking the right question, as a clustered index on FirstName sounds weird.


  2. Hi Pinal,
    Interesting use for this option. I always wondered how it might come in handy. The same result could be accomplished by using SELECT DISTINCT on the first table, but that would require sorting whereas the IGNORE_DUP_KEY option does not.

    Aaron


    • You mean like this?

      SELECT MIN(ID) AS ID, FirstName, LastName, City
      FROM DupData
      GROUP BY FirstName, LastName, City
      ORDER BY ID

      It works, but it's not a very generic (or efficient) way to do it, I would say.


  3. Oh, now I see where this might come handy :)

    But I think this is useful only in the limited situation where you want to copy data from one table to another. Good to know that this kind of possibility exists, though.


    • Not directly, but with a temporary table (an ordinary table will do as well if you clean it up afterwards) it is possible: create a temporary table with a unique index with IGNORE_DUP_KEY = ON, select all the rows from the original table into the temporary table, truncate the original table, and select all the rows from the temporary table back into the (now clean) original table.

      -- There are duplicate rows now
      SELECT * FROM DupData ORDER BY ID
      GO
      -- Create temp table
      CREATE TABLE #DupIndex (ID INT,
      FirstName VARCHAR(100),
      LastName VARCHAR(100),
      City VARCHAR(100))
      GO
      -- And index
      CREATE UNIQUE NONCLUSTERED INDEX [IX_DupIndex_FirstName]
      ON [dbo].[#DupIndex]
      (
      [FirstName] ASC
      ) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
      GO
      -- Select rows into temp table (duplicates are ignored)
      INSERT INTO #DupIndex (ID,FirstName,LastName,City)
      SELECT ID,FirstName,LastName,City
      FROM DupData
      ORDER BY ID
      GO
      -- Truncate table
      TRUNCATE TABLE DupData;
      GO
      -- Select distinct rows back to table
      INSERT INTO DupData (ID,FirstName,LastName,City)
      SELECT ID,FirstName,LastName,City
      FROM #DupIndex
      ORDER BY ID
      GO
      -- Now there's no duplicate data
      SELECT * FROM DupData ORDER BY ID
      GO
      -- Drop temp table
      DROP TABLE #DupIndex;
      GO


  4. I actually found use for this!

    I have a situation where I need to select data by joining many tables, filter it using various rules, and calculate some values from it. I select the data into a table-valued variable. After that I must update other tables depending on what's in that temp table.

    Now I don't want to have duplicate entries in that temp table. It wouldn't break anything, but there could be A LOT of rows, so I would have to do a lot of unnecessary updates and cause major havoc in the db with write locks.

    I figured I could extract distinct rows from the temp table, but that would cause some extra work elsewhere in my procedure. So instead I just added "WITH (IGNORE_DUP_KEY = ON)" to the table's primary key, and it seems to work like a charm :)
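
A minimal sketch of the approach this comment describes, assuming a SQL Server version that accepts IGNORE_DUP_KEY on a table variable's primary key; the @Pending variable and its columns are made up for illustration:

```sql
-- Sketch: table variable whose primary key silently drops duplicate keys.
-- @Pending and its columns are hypothetical, for illustration only.
DECLARE @Pending TABLE (
    CustomerID INT NOT NULL
        PRIMARY KEY WITH (IGNORE_DUP_KEY = ON),
    Amount MONEY NOT NULL
);

-- Duplicate CustomerID values are ignored with a warning, not an error.
INSERT INTO @Pending (CustomerID, Amount) VALUES (1, 10.00);
INSERT INTO @Pending (CustomerID, Amount) VALUES (1, 25.00); -- ignored
INSERT INTO @Pending (CustomerID, Amount) VALUES (2, 5.00);

SELECT * FROM @Pending;
```

This keeps the dedup logic on the variable itself, so the rest of the procedure can insert freely without a DISTINCT step.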


  5. Hi,
    I have a table Test (VouDate Date, Narration Varchar(50)) where both columns together form the PK.

    Data
    ——–
    insert into test values ('2012-10-01','First-1');
    insert into test values ('2012-10-01','First-2');

    insert into test values ('2012-11-01','First-1');
    insert into test values ('2012-11-01','First-2');

    When I select (select * from test order by 1) it returns
    2012-10-01 First-1
    2012-10-01 First-2
    2012-11-01 First-1
    2012-11-01 First-2

    And when I again insert
    insert into test values ('2012-10-01','First-3');
    insert into test values ('2012-10-01','First-0');

    and select (select * from test order by 1), it returns
    2012-10-01 First-0
    2012-10-01 First-1
    2012-10-01 First-2
    2012-10-01 First-3
    2012-11-01 First-1
    2012-11-01 First-2

    I want to get the data as
    2012-10-01 First-1
    2012-10-01 First-2
    2012-10-01 First-3
    2012-10-01 First-0

    2012-11-01 First-1
    2012-11-01 First-2

    Please let me know if it is possible to sort the data by the first column while the second column stays in insert sequence.

    Thanks,
    Yunus


  6. Hi, this is my situation: I have 200 tables with millions of rows each, and we want one table with unique email addresses. On that table I created two columns, pk_id (incremental) and email address, and then created a unique index with IGNORE_DUP_KEY = ON. Everything is fine, but as that table grows, inserts from the remaining tables become extremely slow. Any suggestions on how to accelerate inserts into that huge unique-column table?


  7. Pingback: SQL SERVER – Select and Delete Duplicate Records – SQL in Sixty Seconds #036 – Video « SQL Server Journey with SQL Authority

  8. Pingback: SQL SERVER – Weekly Series – Memory Lane – #011 « SQL Server Journey with SQL Authority

  9. Hello Pinal,

    I am using threading in my .NET application, which is causing duplicates in my database. When two threads create duplicates at the same time, will IGNORE_DUP_KEY = ON work successfully in this situation?


  10. I believe I’ve got the answer to the question posed in the first 1/6/10 reply in the thread… “WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY] key, why ID =1 is ignored from table DupData while insertion, why not ignore ID = 3?”. I believe/guess/intuit that it is because SQL Server does NOT “ignore” subsequent duplicates, rather it “writes over” the prior record with the most recent duplicate record’s data. By changing the ORDER in which the records are inserted using ORDER BY ID “DESC” (rather than the default BY ID “ASC”), the ID=1 is kept rather than ID=3 — the former being the more recent insertion of the two. If, indeed, this can be generalized, then it might be a GREAT way to eliminate duplicate keyed entries while keeping the most recent one — just sort the to-be-inserted data in such a way that the most recent duplicate is inserted last.
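
The claim above can be tested directly. Below is a sketch of one such experiment, reusing the article's DupData table; the DupIndex2 name is made up to avoid clashing with the article's script. Note that which duplicate survives can depend on the execution plan, so the result should be verified rather than assumed:

```sql
-- Sketch: repeat the article's insert in descending ID order and see
-- which of the duplicate FirstName rows (ID 1 or ID 3) survives.
CREATE TABLE DupIndex2 (ID INT, FirstName VARCHAR(100),
                        LastName VARCHAR(100), City VARCHAR(100));
CREATE UNIQUE NONCLUSTERED INDEX IX_DupIndex2_FirstName
    ON DupIndex2 (FirstName ASC)
    WITH (IGNORE_DUP_KEY = ON);

INSERT INTO DupIndex2 (ID, FirstName, LastName, City)
SELECT ID, FirstName, LastName, City
FROM DupData
ORDER BY ID DESC;

-- Inspect which ID the duplicated 'John' row kept
SELECT * FROM DupIndex2 WHERE FirstName = 'John';

DROP TABLE DupIndex2;
```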

