I have been writing a weekly round up from my blog where I go over last six years of blog posts and pick the best posts from the pasts. While I do this, there are two major place where I focus 1) If there are change in features – I re-blog about it with additional details or 2) If I have not provided complete information six years ago, I try to fill up the gap now. Well, just like everything my knowledge and writing skills have evolved. Before continuing please read my latest memory lane blog post where in 2007 I wrote scripts for Primary Key and Unique Key.
November 2006 was when I started to learn more about SQL and have been only 4 months in the product, I was still exploring various subjects. I wrote a blog post describing about how Primary Key and Unique Key are different. Everything which I wrote there is correct, however, there are a few more details one should learn when it is about Primary Key and Clustered Index.
Here is the common misconception prevailing in the industry.
Primary Key has to be Clustered Index.
In reality the statement should be corrected as follows:
Primary Key can be Clustered or Non-clustered but it is a common best practice to create a Primary Key as Clustered Index.
Well, now we have corrected the statement let us understand a bit more in detail. Primary Key should be uniquely identifying column of the table and it should be NOT NULL. A good (most of the time) candidate of the clustered index key also uniquely identifies column and NOT NULL (most of the time). Well, that means it is a good idea to create a Primary Key Clustered so it solve both the problems together. Keeping these facts in mind SQL Server automatically creates Clustered Index on the Primary Key when the table is created. Developers often do not specify which column should have clustered index so by default Primary Key becomes Clustered Index. This practice is now extremely common and lots of people have forgotten that Primary Key and Clustered Index is two different things. They can be same column but they do not have to be.
Well here are four examples we will see where we will learn the behavior of SQL Server when it is about Primary Key and Clustered Index.
- Scenario 1 : Primary Key will default to Clustered Index
- Scenario 2: Primary Key is defined as a Non-clustered Index
- Scenario 3: Primary Key defaults to Non-Clustered Index with another column defined as a Clustered Index
- Scenario 4: Primary Key defaults to Clustered Index with other index defaults to Non-clustered index
Now let us see each of the scenarios in detail.
Scenario 1 : Primary Key will default to Clustered Index
In this case we will create only Primary Key and when we check the kind of index created on the table we will notice that it has created clustered index automatically over it.
-- Case 1 Primary Key Defaults to Clustered Index
USE TempDB
GO
-- Create table
CREATE TABLE TestTable
(ID INT NOT NULL PRIMARY KEY,
Col1 INT NOT NULL)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc] FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTable'
GO
-- Clean up
DROP TABLE TestTable
GO
Scenario 2: Primary Key is defined as a Non-clustered Index
In this case we will explicitly defined Primary Key as a non-clustered index and it will create it as a non-clustered index. It proves that Primary Key can be non-clustered index.
-- Case 2 Primary Key Non-clustered Index
USE TempDB
GO
-- Create table
CREATE TABLE TestTable
(ID INT NOT NULL PRIMARY KEY NONCLUSTERED,
Col1 INT NOT NULL)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc] FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTable'
GO
-- Clean up
DROP TABLE TestTable
GO
Scenario 3: Primary Key defaults to Non-Clustered Index with another column defined as a Clustered Index
In this case we will create clustered index on another column, SQL Server will automatically create a Primary Key as a non-clustered index as clustered index is specified on another column.
-- Case 3 Primary Key Defaults to Non-clustered Index
USE TempDB
GO
-- Create table
CREATE TABLE TestTable
(ID INT NOT NULL PRIMARY KEY,
Col1 INT NOT NULL UNIQUE CLUSTERED)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc] FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTable'
GO
-- Clean up
DROP TABLE TestTable
GO
Scenario 4: Primary Key defaults to Clustered Index with other index defaults to Non-clustered index
In this case we will create two indexes on the both the tables but we will not specify the type of the index on the columns. When we check the results we will notice that Primary Key is automatically defaulted to Clustered Index and another column as a Non-clustered index.
-- Case 4 Primary Key and Defaults
USE TempDB
GO
-- Create table
CREATE TABLE TestTable
(ID INT NOT NULL PRIMARY KEY,
Col1 INT NOT NULL UNIQUE)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc] FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTable'
GO
-- Clean up
DROP TABLE TestTable
GO
I think above examples clarifies if there are any confused related to Primary and Clustered Index.
Now here is the question I often get asked what can be the reason for creating Primary Key and Clustered Index Key on different columns. Well, there are many scenarios when this can be true. It is possible that you have column SSN which you want to create as a Primary Key but do not want make it as a clustered index key because you have uniquely increasing identity column which best suites your need for that table (again this is just an example – you can argue exactly the opposite way as well). You are welcome to continue the discussion on this subject in the comments field or a dedicated blog post I wrote about it years ago over here. There are few really good comments there – I think that blog post is now a gold mine to understand this concept.
Reference : Pinal Dave (https://blog.sqlauthority.com)
43 Comments. Leave new
Hi Pinal,
Let me add some points.
The clustered index will store the value in the leaf node and its corresponding data columns in the same data page.
Ex:
My Employee id is 0087 and it is Clustered index column. Whenever I search the table with employee id, it will easily get all my information like name,Date of joining etc…. because they all stored in the same row of the employee id in the table as so the data page since it is clustered.
NON-Clustered Index.
If you have assigned a column as Non-clustered index. The column values were randomly split in the data pages and thus searching will be little slower compare to the clustered index which has every column in assigned to it in a same data page.
Ex:
If we assign the column ‘Code’ as non-clustered index in a dummy table. Consider we have more than 5000 values with code ‘500’. They will split them into different data pages. So if you search using ‘where code=’500”. The search will finds them in all of the data pages and its remaining column results, because the page reads will be for every data pages.
For the above example say if the column ‘Code’ is clustered index in a dummy table. The values will stored in the data pages in same order and it will store all the ‘code’ column values within few data pages, so the page reads will be minimal and thus the data retrieval will be quicker as comparing to the non-clustered index.
So it is not a must to add Primary key as clustered.
Thanks,
AR
I have a doubt. please clarify. what you meant data pages here?
I think ‘tables’.
thanks
Kavitha
The data in the tables will be in the Data pages.
Another common misconception most people have is “Creating a primary key on a table makes reads fast”
I want to clarify, Primary key is a constraint, it has nothing to do with fast reads. Like Pinal explained, “Primary Key” behind the scenes creates a clustered Index which makes the reads fast. Optionally we can also create non clustered for primary key which contributes to faster reads.
Finally, to compare indexes and Primary Key. You can create an index without a primary key but you cannot create a primary key without an index.
~ IM.
Hi Pinal and frnds,
that was a nice Explanation thank you, i have a question i always face this the project i am running now major FMCG Project. what problem i face every month, i have created index on my sales table, where sale has to be entered in bulk at end of the month, so we can generate Reports and all, but when i try to insert bulk data in table, it get slows because i have indexs on table, what is the best practice to solve this problem.
I do it by disabling index for the current time is it right?
Thank you,
Rajmendra
when have to make clustered with primary key and non cluster & vice versa…
One additional example of why you would want to have a PK and clustered index on different columns is if you have an insert heavy table with a GUID as your identifying column. GUIDs are non-sequential so having a clustered index on a GUID column can cause some significant performance hindrances due to the clustered index insert. This can also cause index fragmentation. In one particular system at my company we made the GUID the PK and created a clustered index on InsertDate DESC. This allowed us to quickly query the latest records due to the clustered sorting and removed the performance issue of the GUID. Even if the InsertDate is not unique, the clustered index will add a “uniquifier” in the index.
Hie frnz,
Indeed a good post…I jst have a query..what if my table doesnt have any primary key,
then will my non clustered index work ??? plz let me know as m a new learner..
Thanks in advance,,
hi frndz
I have 3.2 lak data with repeats how I identify duplicate no’s. but one condition using primary key through. I already try ed.
1. I was created empty table & i set the Primary key ( column name “Mobile_Number)
2. I insert the data using Insert into statement ( Insert into file2 select * from file1) it was successfully uploaded after that I was raised again Insert statement ( insert into file2 select * from file3 but file not appending the data. I checked the second data they have 2 mobile numbers are duplicated.
What I’m saying If duplicated the mobile numbers in skip the mobile number and remaining data how to append the data
can u give me help………
pls send me [email removed]
thanks
raj
what is the procedure to insert rowws into a billion rows table
Dropt the Pk of target table
drop the clustered index
drop all non clustered index
insert the rows from source to target
Recreate Pk
Recreate clustered and non clustered index
Please let me know whether I am correct
Please correct me if I am wrong
HI,
I have defined column1 as primary key and it defaulted it to clustered index.column2 as non clustered index.
when i do a select * from table by default it is using the primary key index,i want the query to use the index of column2. How do i do it?And why does it uses primary key index by default.Please help me.
Hi
Query optimizer should use the appropriate index, however, if you want to force it, you can use the following syntax..
SELECT Column lst..
FROM Table1
WITH (INDEX(idx_Table1_2))
Thx.
hi i like your blog
Hi Pinal,
I have one question :- I know that a table can have only one Primary Key. I used to think the reason behind that is with a Primary Key a clustered index is always there and since clustered index can have only one physical sorting, there fore only one Primary Key.
But not if PK can have nonclustered index then why only one PK?
Thanks
Vasu
Hi Vasu,
I would like to discuss on this,as we know the primary key is belong to physical data storage of table ,that mean in which manner the data should be stored.
and sql server allow us to create only one primary key for a table,because whenever we create primary key on table ,it create clustered index internally that mean in simple language we can say that primary key is nothing but clustered index on table that we are creating while creating table or altering table(simple way to create clustered index) and as we know the clustered index arrange data physically in memory.
we can arrange data only once physically.
and as per theory we can arrange data physically only once that mean we can create only one clustered index on single table and because of this we have only one primary key on a table.
As per me that’s a logic behind one primary key on table.
if you have other logic than this please let me know.
Many Thanks….
Hi Pinal,
One more question :- In Case 2 Primary Key Non-clustered Index, what is role of Heap?
Thanks
Vasu
Hi Friends,
Apologies for this type of questions but just I want to make clear about exact physical data storing structure about index.
As we have multiple type of index like clustered index, nonclustered index, unique clustered index, unique nonclustered index,index with include option(2008), filter index(2005), index view and column stored index(2012).
as we know the whenever we create column stored index on table by considering columns, sql server engine re-manage the data as per column per page.
As we know the clustered index is physical index that mean it arrange the data physically as per index key by keeping index data with key(data pages) and we can create only one clustered index and multiple non clustered index in database on single table.
I am agree with this but what happened when we will create columns stored index on table having clustered index. then how it arrange the data as per column in single page and how it keep the data values(data pages with index key).
Please suggest me on this.
Many Thanks…………….
How is the data stored and sorted in BTree for Scenario 2
Yeah i can understand that how it works.But i got a question how to write this query in sql
Create Table: CREATE TABLE `purchases_index` (
`transactionid` bigint(20) NOT NULL AUTO_INCREMENT,
`dateandtime` datetime DEFAULT NULL,
`cashregisterid` int(11) NOT NULL,
`customerid` int(11) NOT NULL,
`productid` int(11) NOT NULL,
`price` float NOT NULL
) ENGINE=TOKUDB AUTO_INCREMENT=40000001 DEFAULT CHARSET=latin1
which is in mysql
I never seen a explanation like this… I really thank you to give a blog like this which make us to understand in an easier way… I don’t know anything about index before reading this but in the first read itself i got the complete idea and knowledge about it.. Hats off to you knowledge to give a notes in this way..
Hi Pinal,
As we know whenever we create any clustered index on any of the column , it arrange the complete data of the table physically in sorted order by index key.
And clustered index create as a separate physical object of database.
As data stored in database as 8KB of pages whenever we create any table.my confusion is that, how the clustered index create and how it map with this pages. can clustered index create new pages or use old pages that was created while creating table if clustered index use pages to sort the data as per index key.
If clustered index create new pages then its size should be equal to table size but that is not a thing.
If clustered index do not crate new pages and use old pages then its size should be negligible but that is also not the thing. And if it use old pages then what happened if multiple rows use single page while creating table(how index arrange the data in sorted order as per index kay).
How clustered index get created and how It uses the pages while sorting the data.
Please suggest me on the same.
Hi,
This is chowdaiah nice explanation recently i’am attend one interview they will ask this question but i don’t know now i have complete idea about that , so thanks for your explanation
Thanks& Regards
Seela Chowdaiah.
I am glad it helped you @Chowdaiah