SQL SERVER – Difference Between Candidate Keys and Primary Key

Introduction

Not long ago, I had an interesting and extended debate with one of my friends regarding which column should be primary key in a table. The debate instigated an in-depth discussion about candidate keys and primary keys. My present article revolves around the two types of keys.

Let us first try to grasp the definition of the two keys.

Candidate Key – A Candidate Key can be any column or a combination of columns that can qualify as unique key in database. There can be multiple Candidate Keys in one table. Each Candidate Key can qualify as Primary Key.

Primary Key – A Primary Key is a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be Primary Key.

One needs to be very careful in selecting the Primary Key as an incorrect selection can adversely impact the database architect and future normalization. For a Candidate Key to qualify as a Primary Key, it should be Non-NULL and unique in any domain. I have observed quite often that Primary Keys are seldom changed. I would like to have your feedback on not changing a Primary Key.

An Example to Understand Keys

Let us look at an example where we have multiple Candidate Keys, from which we will select an appropriate Primary Key.

Given below is an example of a table having three columns that can qualify as single column Candidate Key, and on combining more than one column the number of possible Candidate Keys touches seven. A point to remember here is that only one column can be selected as Primary Key. The decision of Primary Key selection from possible combinations of Candidate Key is often very perplexing but very imperative!

On running the following script it will always give 504 rows in all the options. This proves that they are all unique in database and meet the criteria of a Primary Key.

Run the following script to verify if all the tables have unique values or not.

USE AdventureWorks
GO
SELECT *
FROM Production.Product
GO
SELECT DISTINCT ProductID
FROM Production.Product
GO
SELECT DISTINCT Name
FROM Production.Product
GO
SELECT DISTINCT ProductNumber
FROM Production.Product
GO

All of the above queries will return the same number of records; hence, they all qualify as Candidate Keys. In other words, they are the candidates for Primary Key. There are few points to consider while turning any Candidate Key into a Primary Key.

Select a key that does not contain NULL

It may be possible that there are Candidate Keys that presently do not contain value (not null) but technically they can contain null. In this case, they will not qualify for Primary Key. In the following table structure, we can see that even though column [name] does not have any NULL value it does not qualify as it has the potential to contain NULL value in future.

CREATE TABLE [Production].[Product](
[ProductID] [int] IDENTITY(1,1) NOT NULL,
[Name] [dbo].[Name] NULL,
[ProductNumber] [nvarchar](25) NOT NULL,
[Manufacturer] [nvarchar](25) NOT NULL
)

Select a key that is unique and does not repeat

It may be possible that Candidate Keys that are unique at this moment may contain duplicate value. These kinds of Candidate Keys do not qualify for Primary Key. Let us understand this scenario by looking into the example given above. It is absolutely possible that two Manufacturers can create products with the same name; the resulting name will be a duplicate and only the name of the Manufacturer will differ in the table. This disqualifies Name in the table to be a Primary Key.

Make sure that Primary Key does not keep changing

This is not a hard and fast rule but rather a general recommendation: Primary Key values should not keep changing. It is quite convenient for a database if Primary Key is static. Primary Keys are referenced in numerous places in the database, from Index to Foreign Keys.  If they keep changing then they can adversely affect database integrity, data statistics as well as internal of Indexes.

Selection of Primary Key

Let us examine our case by applying the above three rules to the table and decide on the appropriate candidate for Primary Key. Name can contain NULL so it disqualifies as per Rule 1 and Rule 2. Product Number can be duplicated for different Manufacturers so it disqualifies as per Rule 2. ProductID is Identity and Identity column cannot be modified. So, in this case ProductID qualifies as Primary Key.

Please note that many database experts suggest that it is not a good practice to make Identity Column as Primary Key. The reason behind this suggestion is that many times Identity Column that has been assigned as Primary Key does not play any role in database. There is no use of this Primary Key in both application and in T-SQL. Besides, this Primary Key may not be used in Joins. It is a known fact that when there is JOIN on Primary Key or when Primary Key is used in the WHERE condition it usually gives better performance than non primary key columns. This argument is absolutely valid and one must make sure not to use such Identity Column. However, our example presents a different case. Here, although ProductID is Identity Column it uniquely defines the row and the same column will be used as foreign key in other tables. If a key is used in any other table as foreign key it is likely that it will be used in joins.

Quick Note on Other Kinds of Keys

The above paragraph evokes another question – what is a foreign key? A foreign key in a database table is a key from another table that refers to the primary key in the table being used. A primary key can be referred by multiple foreign keys from other tables. It is not required for a primary key to be the reference of any foreign keys. The interesting part is that a foreign key can refer back to the same table but to a different column. This kind of foreign key is known as “self-referencing foreign key”.

Summary

A table can have multiple Candidate Keys that are unique as single column or combined multiple columns to the table. They are all candidates for Primary Key. Candidate keys that follow all the three rules – 1) Not Null, 2) Unique Value in Table and 3) Static – are the best candidates for Primary Key. If there are multiple candidate keys that are satisfying the criteria for Primary Key, the decision should be made by experienced DBAs who should keep performance in mind.

Reference: Pinal Dave (http://blog.sqlauthority.com), DNS

About these ads

SQL SERVER – Discussion – Effect of Missing Identity on System – Real World Scenario

About a week ago, SQL Server Expert, Imran Mohammed, provided a script, which will list all the missing identity values of a table in a database. In this post, I asked my readers if any could write a similar or better script. The results were interesting. While no one provided a new script, my question sparked a very active discussion that is still ongoing.

When providing the script, Imran asked me if I knew of any specific circumstances in which this kind of query could be useful, as he could not think of an instance where it would be necessary to find a missing identity. I was unable to think of a single reason for listing missing identities in a table. I posted Imran’s script on the assumption that someone would come up with an improved script, but as mentioned earlier, nobody did. Instead, we have been able to follow a very interesting discussion on subject of the need, if any, for listing Missing Identity values.

So, the question is this: “Do you know a real-world scenario where a Missing Identity value in any table can create problems?

I have already received some extremely interesting comments from many experts, and all have posed the above question in one form or another. At this moment, I am still trying to think of an example from my own experience, but have yet to find one. Imran has since come up with one good example. Here is what he and other experts have suggested so far.

Jacob Sebastian – IDENTITY values are not expected to be sequential and there are all chances of having missing identity values, the most common cause is transaction rollbacks.

Simon Worth – The identity column is basically just a random number – even though they come sequentially. A developer making an assumption that the next record inserted will have an identity that is 1 more than the last inserted record. And if this is the case – then there are flaws in the logic of the developer.

Jacob Sebastian – What if the value in my table is 1, 2, 3, 5, 6 etc where “4″ is missing from the sequence. So what is the importance of knowing whether a table has missing identity values or not?

Imran Mohammed – If you use Identity property as your most unique column and Transaction Identifier, then definitely you would want to know why few transaction did not completely, Is there any specific fashion these transaction fails (Can be found out looking at missing values of identity)… Could be helpful to debug.

Now it is your turn. Let us have your thoughts.

Reference : Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Puzzle – Write Script to Generate Primary Key and Foreign Key

In one of my recent projects, a large database migration project, I confronted a peculiar situation. SQL Server tables were already moved from Database_Old to Database_New. However, all the Primary Key and Foreign Keys were yet to be moved from the old server to the new server.

Please note that this puzzle is to be solved for SQL Server 2005 or SQL Server 2008. As noted by Kuldip it is possible to do this in SQL Server 2000.

In SQL Server Management Studio (SSMS), there is no option to script all the keys. If one is required to script keys they will have to manually script each key one at a time. If database has many tables, generating one key at a time can be a very intricate task. I want to throw a question to all of you if any of you have script for the same purpose.

As per my opinion, I think the challenge is to get orders of the column included in Primary Key as well on the filegroup they exist.

Please note here that I am not looking for names of Primary Key or Foreign Key of database. I have already written an article for the same here : SQL SERVER – Two Methods to Retrieve List of Primary Keys and Foreign Keys of Database . I am looking for T-SQL script that generates Primary Key from the existing table for all tables in database. There are already a couple of answers on my post here from two SQL Experts; you can refer to those for example here.

Following is an example of T-SQL that we need to generate. I am looking for script that will generate T-SQL script for all the Primary Key and Foreign Key for the entire database.

ALTER TABLE [Person].[Address] ADD  CONSTRAINT [PK_Address_AddressID] PRIMARY KEY CLUSTERED
(
[AddressID] ASC
) ON [PRIMARY]
GO

If you have a solution for the same, please post here or email me at pinal ‘at’ sqlauthority.com and I will post on this blog with due credit. Again, please spread the word and help community become stronger by your active participation.

Reference : Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Two Methods to Retrieve List of Primary Keys and Foreign Keys of Database

There are two different methods of retrieving the list of Primary Keys and Foreign Keys from database.

Method 1: INFORMATION_SCHEMA

SELECT
DISTINCT
Constraint_Name AS [Constraint],
Table_Schema AS [Schema],
Table_Name AS [TableName]
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
GO

Method 2: sys.objects

SELECT OBJECT_NAME(OBJECT_ID) AS NameofConstraint,
SCHEMA_NAME(schema_id) AS SchemaName,
OBJECT_NAME(parent_object_id) AS TableName,
type_desc AS ConstraintType
FROM sys.objects
WHERE type_desc IN ('FOREIGN_KEY_CONSTRAINT','PRIMARY_KEY_CONSTRAINT')
GO

I am often asked about my preferred method of retrieving list of Primary Keys and Foreign Keys from database. I have a standard answer. I prefer method 3, which is querying sys database. The reason is very simple. sys. schema always provides more information and all the data can be retrieved in our preferred fashion with the preferred filter.

Let us look at the example we have on our hand. When Information Schema is used, we will not be able to discern between primary key and foreign key; we will have both the keys together. In the case of sys schema, we can query the data in our preferred way and can join this table to another table, which can retrieve additional data from the same.

Let us play a small puzzle here. Try to modify both the scripts in such a way that we are able to see the original definition of the key, that is, create a statement for this primary key and foreign key.

If I get an appropriate answer from my readers, I will publish the solution on this blog with due credit.

Reference : Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Clustered Index on Separate Drive From Table Location

How to improve performance of SQL Server Queries is a common topic of discussion among many of us. Much has been said, much has been discussed. Few days back, I had an interesting discussion with one of the Junior developers regarding performance improvement of SQL Server Queries. We discussed on how by using a separate hard drive for several database objects can right away improve performance. I suggested him that non clustered index and tempdb can be created on a separate disk to improve performance.

No sooner had I given my suggestion than I received a question – What will happen if we can create clustered index on a separate drive from the table on which it is built.

My answer is : No! It is not possible at all.

Let us first be clear about the difference between a clustered and a non clustered index.

Clustered Index

  • Only 1 allowed per table
  • Physically rearranges data in the table to conform to the index constraints
  • For use on columns that are frequently searched for ranges of data
  • For use on columns with low selectivity

Non-Clustered Index

  • Up to 249 (for SQL Server 2005) and 999 (for SQL Server 2008) allowed per table
  • Creates a separate list of key values with pointers to the location of the data in the data pages
  • For use on columns that are searched for single values
  • For use on columns with high selectivity

A table devoid of primary key index is called heap, and here data is not arranged in a particular order, which gives rise to issues that adversely affect performance. Data must be stored in some kind of order. If we put clustered index on it then the order will be forced by that index and the data will be stored in that particular order.

Reference : Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Difference Between Candidate Keys and Primary Key

Let us first try to grasp the definition of the two keys.

Candidate Key – A Candidate Key can be any column or a combination of columns that can qualify as unique key in database. There can be multiple Candidate Keys in one table. Each Candidate Key can qualify as Primary Key.

Primary Key – A Primary Key is a column or a combination of columns that uniquely identify a record. Only one Candidate Key can be Primary Key.

One needs to be very careful in selecting the Primary Key as an incorrect selection can adversely impact the database architect and future normalization. For a Candidate Key to qualify as a Primary Key, it should be Non-NULL and unique in any domain. I have observed quite often that Primary Keys are seldom changed. I would like to have your feedback on not changing a Primary Key.

I have illustrates the difference between a candidate key and a primary key in SQL Server.

1 Introduction
2 An Example to Understand Keys
2.1 Select a key that does not contain NULL
2.2 Select a key that is unique and does not repeat
2.3 Make sure that Primary Key does not keep changing
3 Selection of Primary Key
4 Quick Note on Other Kinds of Keys
5 Summary

Read Complete Article Here

Reference : Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – How to Drop Primary Key Contraint

One area that always, unfailingly pulls my interest is SQL Server Errors and their solution. I enjoy the challenging task of passing through the maze of error to find a way out with a perfect solution. However, when I received the following error from one of my regular readers, I was a little stumped at first! After some online probing, I figured out that it was actually syntax from MySql and not SQL Server. The reader encountered error when he ran the following query.

ALTER TABLE Table1
DROP PRIMARY KEY
GO

Msg 156, Level 15, State 1, Line 3
Incorrect syntax near the keyword ‘PRIMARY’.

As mentioned earlier, this syntax is for MySql, not SQL Server. If you want to drop primary key constraint in SQL Server, run the following query.

ALTER TABLE Table1
DROP CONSTRAINT PK_Table1_Col1
GO

Let us now pursue the complete example. First, we will create a table that has primary key. Next, we will drop the primary key successfully using the correct syntax of SQL Server.

CREATE TABLE Table1(
Col1 INT NOT NULL,
Col2 VARCHAR(100)
CONSTRAINT PK_Table1_Col1 PRIMARY KEY CLUSTERED (
Col1 ASC)
)
GO

/* For SQL Server/Oracle/MS ACCESS */
ALTER TABLE Table1
DROP CONSTRAINT PK_Table1_Col1
GO

/* For MySql */
ALTER TABLE Table1
DROP PRIMARY KEY
GO

I hope this example lucidly explains how to drop primary key. This, no doubt, is a very simple and basic explanation, but when I chanced upon the error message it aroused curiosity in me.  As you all know by now I love sharing new issues and ideas with my readers. So I have included this interesting error in my blog.

Let me have your feedback on this post and also, do feel free to share with me your ideas as well!

Reference : Pinal Dave (http://blog.SQLAuthority.com)