SQL SERVER – Difference Between Candidate Keys and Primary Key

Introduction

Not long ago, I had an interesting and extended debate with one of my friends regarding which column should be primary key in a table. The debate instigated an in-depth discussion about candidate keys and primary keys. My present article revolves around the two types of keys.

Let us first try to grasp the definition of the two keys.

Candidate Key – A Candidate Key can be any column or a combination of columns that can qualify as unique key in database. There can be multiple Candidate Keys in one table. Each Candidate Key can qualify as Primary Key.

Primary Key – A Primary Key is a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be Primary Key.

One needs to be very careful in selecting the Primary Key as an incorrect selection can adversely impact the database architect and future normalization. For a Candidate Key to qualify as a Primary Key, it should be Non-NULL and unique in any domain. I have observed quite often that Primary Keys are seldom changed. I would like to have your feedback on not changing a Primary Key.

An Example to Understand Keys

Let us look at an example where we have multiple Candidate Keys, from which we will select an appropriate Primary Key.

Given below is an example of a table having three columns that can qualify as single column Candidate Key, and on combining more than one column the number of possible Candidate Keys touches seven. A point to remember here is that only one column can be selected as Primary Key. The decision of Primary Key selection from possible combinations of Candidate Key is often very perplexing but very imperative!

On running the following script it will always give 504 rows in all the options. This proves that they are all unique in database and meet the criteria of a Primary Key.

Run the following script to verify if all the tables have unique values or not.

USE AdventureWorks
GO
SELECT *
FROM Production.Product
GO
SELECT DISTINCT ProductID
FROM Production.Product
GO
SELECT DISTINCT Name
FROM Production.Product
GO
SELECT DISTINCT ProductNumber
FROM Production.Product
GO

All of the above queries will return the same number of records; hence, they all qualify as Candidate Keys. In other words, they are the candidates for Primary Key. There are few points to consider while turning any Candidate Key into a Primary Key.

Select a key that does not contain NULL

It may be possible that there are Candidate Keys that presently do not contain value (not null) but technically they can contain null. In this case, they will not qualify for Primary Key. In the following table structure, we can see that even though column [name] does not have any NULL value it does not qualify as it has the potential to contain NULL value in future.

CREATE TABLE [Production].[Product](
[ProductID] [int] IDENTITY(1,1) NOT NULL,
[Name] [dbo].[Name] NULL,
[ProductNumber] [nvarchar](25) NOT NULL,
[Manufacturer] [nvarchar](25) NOT NULL
)

Select a key that is unique and does not repeat

It may be possible that Candidate Keys that are unique at this moment may contain duplicate value. These kinds of Candidate Keys do not qualify for Primary Key. Let us understand this scenario by looking into the example given above. It is absolutely possible that two Manufacturers can create products with the same name; the resulting name will be a duplicate and only the name of the Manufacturer will differ in the table. This disqualifies Name in the table to be a Primary Key.

Make sure that Primary Key does not keep changing

This is not a hard and fast rule but rather a general recommendation: Primary Key values should not keep changing. It is quite convenient for a database if Primary Key is static. Primary Keys are referenced in numerous places in the database, from Index to Foreign Keys.  If they keep changing then they can adversely affect database integrity, data statistics as well as internal of Indexes.

Selection of Primary Key

Let us examine our case by applying the above three rules to the table and decide on the appropriate candidate for Primary Key. Name can contain NULL so it disqualifies as per Rule 1 and Rule 2. Product Number can be duplicated for different Manufacturers so it disqualifies as per Rule 2. ProductID is Identity and Identity column cannot be modified. So, in this case ProductID qualifies as Primary Key.

Please note that many database experts suggest that it is not a good practice to make Identity Column as Primary Key. The reason behind this suggestion is that many times Identity Column that has been assigned as Primary Key does not play any role in database. There is no use of this Primary Key in both application and in T-SQL. Besides, this Primary Key may not be used in Joins. It is a known fact that when there is JOIN on Primary Key or when Primary Key is used in the WHERE condition it usually gives better performance than non primary key columns. This argument is absolutely valid and one must make sure not to use such Identity Column. However, our example presents a different case. Here, although ProductID is Identity Column it uniquely defines the row and the same column will be used as foreign key in other tables. If a key is used in any other table as foreign key it is likely that it will be used in joins.

Quick Note on Other Kinds of Keys

The above paragraph evokes another question – what is a foreign key? A foreign key in a database table is a key from another table that refers to the primary key in the table being used. A primary key can be referred by multiple foreign keys from other tables. It is not required for a primary key to be the reference of any foreign keys. The interesting part is that a foreign key can refer back to the same table but to a different column. This kind of foreign key is known as “self-referencing foreign key”.

Summary

A table can have multiple Candidate Keys that are unique as single column or combined multiple columns to the table. They are all candidates for Primary Key. Candidate keys that follow all the three rules – 1) Not Null, 2) Unique Value in Table and 3) Static – are the best candidates for Primary Key. If there are multiple candidate keys that are satisfying the criteria for Primary Key, the decision should be made by experienced DBAs who should keep performance in mind.

Reference: Pinal Dave (http://blog.sqlauthority.com), DNS

About these ads

41 thoughts on “SQL SERVER – Difference Between Candidate Keys and Primary Key

  1. 2 things:

    First, I’m pretty sure a primary key may indeed consist of >1 column. Is there a reason not to allow this?

    Take for example a table which joins 2 other tables together, the 2 foreign keys may themselves join together to form a primary key. Also, sql allows this.

    Second, in my experience it’s common practice to have an identity field be the primary key. It may be that it’s role is purely as a substitute key in lieu of anything else that’s around, but it’s guaranteed unique and non-changing, which makes it ideal for both primary keys and referencing with foreign keys. In my experience, there’s usually very little in the way of business data that qualifies strongly enough as a primary key. Even social security numbers can change. Any generated data can only have a weak guarantee of uniqueness.

    I agree the primary keys should rarely, if ever, change. This is again is due to stored references, which may not be under the database control. Also, primary keys should not be re-used for that same reason.

    Like

  2. A point, perhaps: After marking a candidate as the PK, revisit the other candidates to determine if they are actually still required.

    Such as, if a SKU is used as the PK, the IDENTITY is not required anymore. It serves no purpose. It was there to identify, but the SKU takes care of that now. Regardless, the Name is still required, sothat COLUMN has to stay.

    Like

  3. @JimT : an identity is not guaranteed to be unique in and of itself. A unique constraint or a primary key constraint may enforce that, but the identity property on its own does not guarantee it.

    Like

  4. Database Design is very important to the success of the database systems, understanding and picking the right primary key is one of the key factors in DB Design.

    Like

  5. The 3 + 1 golden rules :
    1) Not Null,
    2) Unique Value in Table
    3) Static – are the best candidates for Primary Key.
    Added by JIM
    4) Primary key may indeed consist column > 1.

    #Primary key accepts only unique values and does not allow null values.
    #Unique key accepts unique values and allows null values.
    #Not null constraint allows duplicate values and doesnt allow null values.

    Impact on Not changing a Primary Key.
    – When a PK constraint or unique key constraint is
    created by default index is created on the columns with
    these constraints so by changing PK will impact the index.
    – If it is referenced by a FOREIGN KEY constraint in another table.
    Article was simply classic , very clear concept..grt work Pinalbhai

    Like

  6. As mentioned, you can indeed concatenate multiple columns as a primary key. I generally discourage this behavior with my projects mainly because you then have to carry all columns through to other tables when using that concatenated key as a Foreign Key, and also usually end up having to tolerate changes to one or more of those concatenated columns.

    Most often, I have seen concatenated keys used as a lazy way to enforce uniqueness amongst certain columns. In those situations, I usually demand that a unique key be set up (such as an Identity), and then create Unique Constraints on the columns that need to be unique (a unique constraint in SQL Server can be multi-column).

    As to updating keys, I really again try to discourage using any field that may need to be updated as a primary key. You end up having all kinds of relational issues if you have the need to update a key value that has enforced relationships with other tables. You also never know if the business requirements will change, with the result that what used to be a unique primary key now must have multiple records in the table with that value (Product Name or SKU for example). By breaking the key out to a value that is unique and generally unrelated to the item itself in the first place, you can avoid later issues when the requirements of the app change. :)

    Like

  7. @Garry, if you need multiple records for the PK, then the solution is to create a new table and join as needed. That’s the relational solution to the problem you are describing.

    If the app requirements change so drastically that one needs to change the PK, then change the structure of the db. Might as well make a wholesale change at that time, but until then, benefit from the use of logical keys useful for the problem domain instead of creating artificial, synthetic columns as PK.

    Like

  8. @Vaibhav
    >> why it is not a good practice to make Identity Column as Primary Key.

    Actually, it can be a good practice. There are plenty of DBA’s who would say “NEVER!!” and others who say “ALWAYS!” to identify columns as the primary key. Reality is that if you have a situation where the values in the primary key can change or if you have a table with sequentially increasing records, then an identity can make an excellent primary key.

    Two examples come to mind: First any time you’re dealing with people, the values that might make up the primary key are always mutable. Names not only are not unique but can change, state-issued identity numbers (Social Security in the US or ID cards elsewhere) can change or simply be entered wrong. In this case a separate unique identifier is a good thing.

    Second example is a history table. While a timestamp is the first, natural, option for a primary key, it may not be unique depending on the rate at which records are entered. In this case, an identity column as a primary key could be the right answer.

    Basically, the real answer is that the right primary key depends on your data and your usage, not on a rule that is simply applied without thinking.

    Steve G.

    Like

  9. Very good explanations above.

    I have one more query relating primary key with performance.

    How will performance get affected if I change-

    1. Primary key column to some other column which is a candidate key.

    2. Content of a primary key. i.e. changing only 1 value in primary key column.

    Like

  10. We are using for our table a dedicated column as primary key, with data type NUMERIC(15, 10). The decimal part used for uniquely identify the record the increment by 0.0000000001 and numeric part to identify the Various Client / Support DB.
    Is it a good practice to follow such logic. Which data type shall be ideal for Primary key.

    Like

  11. Hi,

    I am not able to get the concept regarding the following point that you have specified:

    Please note that many database experts suggest that it is not a good practice to make Identity Column as Primary Key. The reason behind this suggestion is that many times Identity Column that has been assigned as Primary Key does not play any role in database. There is no use of this Primary Key in both application and in T-SQL. Besides, this Primary Key may not be used in Joins. It is a known fact that when there is JOIN on Primary Key or when Primary Key is used in the WHERE condition it usually gives better performance than non primary key columns. This argument is absolutely valid and one must make sure not to use such Identity Column.

    Please explain it more clearly regarding the above point.

    Thanks,
    Santosh

    Like

    • Hi,

      I have to gracefully disagree and exactly for the points you make against using IDENTITY as PK.

      > There is no use of this Primary Key in both application and in T-SQL

      That is exactly why I always prefer adding “separate” IDENTITY column as PK. I don’t want to ever show PK values outside the DB. If I need to expose PK outside I make sure in the application code that it is only seen in the data acces layer.

      > Besides, this Primary Key may not be used in Joins.

      But does it matter? You can use other column in joins and build unique index to those. You just have to make sure you don’t kill performance with additional indexes, of course. If I use PK column in joins or in where-clause, I usually add non-clustered index to it also.

      Like

  12. @ Marko,

    I don’t think it is a great idea to create a clustered as well as nonclustered indexes on the same column. as the clustered index will be having a reference to all the columns in the table but the nonclustered index will only have a reference to the columns specified.

    The key point here is that whenever You update or delete a row (or column)

    all the index will be updated.
    It is redundant to have two indexes on a column.

    In Your case the indexes will take nearly twice time to update, so more time on locks on table.

    So a big performance hit.

    Like

    • I think it depends but I must say I’m not 100% sure about this.

      If both indexes, clustered and non-clustered are ordered in the same way then there’s probably no benefit from non-clustered index.

      But I’ve heard that in some cases having two indexes ordered in opposite ways will increase performance for select queries.

      Also if I use multiple fields in the join/where then I’ll add composite indexes to the table that may overlap with existing ones.

      But anyway, always measure your performance after you’ve done changes. For instance, I did some changes to a DB last week and I’m planning to spend part of next week to observe how those changes are affecting the whole software, not just that table.

      Like

    • @ Marko ,

      As it is being said “Besides, this Primary Key may not be used in Joins.”

      It is quite possible that other tables having foreign key referring the primarykey(let it be non Identity column which you have created separately and declared it as pk)

      So PK mostly be used as foreign key which can appear in joins also as well as in where clause to filter the records.

      so whats the point to create separate column as pk and not to use identity columns as pk?

      -Anil

      Like

  13. Hi
    i am regular reader of your blog i am currently working as BI
    and found one query form my client they want a employee id
    start from ‘00001’ and increment by 1 like 00001,00002,…….00011 and so on. i tried to generate a employee id with above pattern. but still have no luck please help me out of this problem

    Umesh

    Like

  14. hi sir
    i have created some tables using primary key and foreign key .when i was deleting records for that table which id can i taken . either primary key or foreign key id . which one is best one ? please tell me with soome examples

    Like

  15. Thank you for your insights into deciding on what best to use as a tables PK. Here are my 2 cents on the topic for whatever they’re worth. While I do understand the argument for using candidate keys as opposed to surrogate / generated keys, somewhere during my fairly lengthy time of designing databases I have become a die hard advocate of the integer based generated or INCREMENT approach. There are several reasons for this, number one on the list being that internally generated keys are wholly owned and therefore 100% in the control of the DBMS. Candidates are almost always generated by an outside source and can be changed by them as radically and as often as they wish. A prime example of this occured as a biproduct of the merger of 2 large manufacturers. The decision was made to create all new product ID’s to accomodate the newly formed company, and a wholesaler client of mine that kept a large inventory of one of the company’s products was forced to deal with obsolete PK’s on many millions of rows. Ultimately they decided to treat the outdated IDs as internal IDs and create a lookup table to cross-reference the new IDs, and wishing they had generated their own IDs from the begining. A couple of modified views and problem solved. I also find that IDs often have alpha content that holds some attachment or meaning to whoever chose them. Integers rarely have any meaning other than to uniquely identify something and so in my experience are far less likely to change than alpha-numerics. There are also no issues with single or double byte character sets, etc. as far as PKs and FKs are concerned. Of course if everyone used generated IDs there would never have been a debate over which candidate to use on your table and this discussion would never have taken place ;)

    Like

    • A table can have only one and only one primary key. however the primary key in a table can be composite with multiple attributes.
      The table can have as many as candidate keys, all other candidate keys except the primary key is called alternate keys.

      ur qstn is unambigious

      Like

  16. Hi,

    For a given table, if I count the keys (other then the primary key) which are having NOT NULL and UNIQUE, both the constraints, will that give me count of secondary keys for that table?

    If not, how to get count of secondary keys for a table by using SQL query?

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s