SQL SERVER – Primary Key and NonClustered Index in Simple Words

I have been writing a weekly round-up on my blog where I go over the last six years of blog posts and pick the best posts from the past. While I do this, there are two major places where I focus: 1) if a feature has changed, I re-blog about it with additional details, or 2) if I did not provide complete information six years ago, I try to fill the gap now. Well, just like everything else, my knowledge and writing skills have evolved. Before continuing, please read my latest memory lane blog post, where in 2007 I wrote scripts for Primary Key and Unique Key.

In November 2006 I was only four months into SQL Server and still exploring various subjects, when I wrote a blog post describing how Primary Key and Unique Key are different. Everything I wrote there is correct; however, there are a few more details one should learn about Primary Key and Clustered Index.

Here is the common misconception prevailing in the industry.

A Primary Key has to be a Clustered Index. 

In reality the statement should be corrected as follows:

A Primary Key can be Clustered or Non-clustered, but it is a common best practice to create a Primary Key as a Clustered Index. 

Well, now that we have corrected the statement, let us understand it in a bit more detail. A Primary Key should uniquely identify each row of the table and must be NOT NULL. A good candidate for the clustered index key also (most of the time) uniquely identifies rows and is NOT NULL. That means it is usually a good idea to create the Primary Key as a Clustered Index, since it solves both problems together. Keeping these facts in mind, SQL Server automatically creates a Clustered Index on the Primary Key when the table is created. Developers often do not specify which column should have the clustered index, so by default the Primary Key becomes the Clustered Index. This practice is now so common that lots of people have forgotten that Primary Key and Clustered Index are two different things. They can be on the same column, but they do not have to be.

Here are four scenarios through which we will learn the behavior of SQL Server regarding Primary Key and Clustered Index.

  • Scenario 1: Primary Key will default to Clustered Index
  • Scenario 2: Primary Key is defined as a Non-clustered Index
  • Scenario 3: Primary Key defaults to Non-Clustered Index with another column defined as a Clustered Index
  • Scenario 4: Primary Key defaults to Clustered Index with other index defaults to Non-clustered index

Now let us see each of the scenarios in detail.

Scenario 1: Primary Key will default to Clustered Index

In this case we will create only a Primary Key, and when we check what kind of index was created on the table, we will notice that SQL Server automatically created a clustered index on it.

-- Case 1 Primary Key Defaults to Clustered Index
USE TempDB
GO
-- Create table
CREATE TABLE TestTable
(ID INT NOT NULL PRIMARY KEY,
Col1 INT NOT NULL)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc]
FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTable'
GO
-- Clean up
DROP TABLE TestTable
GO

Scenario 2: Primary Key is defined as a Non-clustered Index

In this case we will explicitly define the Primary Key as a non-clustered index, and SQL Server will create it as a non-clustered index. This proves that a Primary Key can be a non-clustered index.

-- Case 2 Primary Key Non-clustered Index
USE TempDB
GO
-- Create table
CREATE TABLE TestTable
(ID INT NOT NULL PRIMARY KEY NONCLUSTERED,
Col1 INT NOT NULL)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc]
FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTable'
GO
-- Clean up
DROP TABLE TestTable
GO

Scenario 3: Primary Key defaults to Non-Clustered Index with another column defined as a Clustered Index

In this case we will create the clustered index on another column; SQL Server will automatically create the Primary Key as a non-clustered index, because the clustered index is already specified on another column.

-- Case 3 Primary Key Defaults to Non-clustered Index
USE TempDB
GO
-- Create table
CREATE TABLE TestTable
(ID INT NOT NULL PRIMARY KEY,
Col1 INT NOT NULL UNIQUE CLUSTERED)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc]
FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTable'
GO
-- Clean up
DROP TABLE TestTable
GO

Scenario 4: Primary Key defaults to Clustered Index with other index defaults to Non-clustered index

In this case we will create two indexes on the same table, but we will not specify the type of index on either column. When we check the results, we will notice that the Primary Key automatically defaulted to a Clustered Index and the other (unique) column to a Non-clustered index.

-- Case 4 Primary Key and Defaults
USE TempDB
GO
-- Create table
CREATE TABLE TestTable
(ID INT NOT NULL PRIMARY KEY,
Col1 INT NOT NULL UNIQUE)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc]
FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTable'
GO
-- Clean up
DROP TABLE TestTable
GO

I think the above examples clear up any confusion related to Primary Key and Clustered Index.

Now here is the question I often get asked: what can be the reason for creating the Primary Key and the Clustered Index on different columns? Well, there are many scenarios where this makes sense. It is possible that you have a column such as SSN which you want as the Primary Key, but you do not want to make it the clustered index key because you have an ever-increasing identity column which better suits the clustered index for that table (again, this is just an example – you can argue exactly the opposite way as well). You are welcome to continue the discussion on this subject in the comments field or on a dedicated blog post I wrote about it years ago over here. There are a few really good comments there – I think that blog post is now a gold mine for understanding this concept.
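
To make this concrete, here is a minimal sketch of such a design (the table, column and index names are illustrative, not from the original discussion): the SSN column carries a non-clustered Primary Key, while the ever-increasing identity column carries the clustered index.

-- Illustrative: Primary Key non-clustered on SSN, Clustered Index on identity column
USE TempDB
GO
CREATE TABLE TestTableSSN
(ID INT NOT NULL IDENTITY(1,1),
SSN CHAR(11) NOT NULL PRIMARY KEY NONCLUSTERED)
GO
-- The ever-increasing identity column gets the clustered index
CREATE UNIQUE CLUSTERED INDEX IX_TestTableSSN_ID
ON TestTableSSN (ID)
GO
-- Check Indexes
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
[name] IndexName,
[Type_Desc]
FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'TestTableSSN'
GO
-- Clean up
DROP TABLE TestTableSSN
GO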

Reference : Pinal Dave (http://blog.SQLAuthority.com)


SQL SERVER – Shard No More – An Innovative Look at Distributed Peer-to-peer SQL Database

There is no doubt that SQL databases play an important role in modern applications. In an ideal world, a single database can handle hundreds of incoming connections from multiple clients and scale to accommodate the related transactions. However, the world is not ideal, and databases are often the cause of major headaches when applications need to scale to accommodate more connections, transactions, or both.

In order to overcome scaling issues, application developers often resort to administrative acrobatics, also known as database sharding. Sharding helps to improve application performance and throughput by splitting the database into two or more shards. Unfortunately, this practice also requires application developers to code transactional consistency into their applications. Getting transactional consistency across multiple SQL database shards can prove to be very difficult.

Sharding requires developers to think about things like rollbacks, constraints, and referential integrity across tables within their applications when these types of concerns are best handled by the database. It also makes other common operations such as joins, searches, and memory management very difficult. In short, the very solution implemented to overcome throughput issues becomes a bottleneck in and of itself.

What if database sharding was no longer required to scale your application? Let me explain. For the past several months I have been following and writing about NuoDB, a hot new SQL database technology out of Cambridge, MA. NuoDB is officially out of beta and they have recently released their first release candidate so I decided to dig into the database in a little more detail. Their architecture is very interesting and exciting because it completely eliminates the need to shard a database to achieve higher throughput.

Each NuoDB database consists of at least three processes that enable a single database to run across multiple hosts: a Broker, a Transaction Engine and a Storage Manager. Brokers are responsible for connecting client applications to Transaction Engines, and they maintain a global view of the network to keep track of the multiple Transaction Engines available at any time. Transaction Engines are in-memory processes that client applications connect to for processing SQL transactions. Storage Managers are responsible for persisting data to disk and for serving records up to the Transaction Engines when they don't exist in memory.

The secret to NuoDB’s approach to solving the sharding problem is that it is a truly distributed, peer-to-peer, SQL database. Each of its processes can be deployed across multiple hosts. When client applications need to connect to a Transaction Engine, the Broker will automatically route the request to the most available process. Since multiple Transaction Engines and Storage Managers running across multiple host machines represent a single logical database, you never have to resort to sharding to get the throughput your application requires.

NuoDB is a new pioneer in the SQL database world. They are making database scalability simple by eliminating the need for acrobatics such as sharding, and they are making general administration of the database simpler as well. To you as a user, their distributed database appears as a single logical SQL database. With their RC1 release they have also provided a web-based administrative console that they call NuoConsole. This tool makes it extremely easy to deploy and manage NuoDB processes across one or multiple hosts with the click of a mouse button. See for yourself by downloading NuoDB here.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

 

SQL SERVER – Sends backups to a Network Folder, FTP Server, Dropbox, Google Drive or Amazon S3

Let me tell you about one of the most useful SQL tools that every DBA should use – it is SQLBackupAndFTP. I have been using this tool since 2009 – and it is the first program I install on a SQL Server. Download the free version, spend a minute on configuration, and your daily backups are safe in the cloud.
In summary, SQLBackupAndFTP

  • Creates SQL Server database and file backups on schedule
  • Compresses and encrypts the backups
  • Sends backups to a network folder, FTP Server, Dropbox, Google Drive or Amazon S3
  • Sends email notifications of job’s success or failure

SQLBackupAndFTP comes in Free and Paid versions (starting from $29) – see the version comparison. The Free version is fully functional for unlimited ad hoc backups or for scheduled backups of up to two databases – it will be sufficient for many small customers.

What has impressed me from the beginning is that I could understand how it works and configure a job from a single form (see Image 1 – Main form above):

  1. Connect to your SQL Server and select the databases to be backed up
  2. Click “Add backup destination” to configure where the backups should go (network folder, FTP Server, Dropbox, Google Drive or Amazon S3)
  3. Enter your email to receive email confirmations
  4. Set the time to start daily full backups (or go to Settings if you need Differential or Transaction Log backups on a flexible schedule)
  5. Press the “Run Now” button to test

You can get to this form by clicking the “Settings” button in the “Schedule” section. Select what types of backups you want and how often you want to run them, and you will see the scheduled backups in the “Estimated backup plan” list.

A detailed tutorial is available on the developer’s website.

The SQLBackupAndFTP setup also gives you the option to install “One-Click SQL Restore” (you can install it stand-alone too) – a basic tool for restoring Full backups only.

However basic, it is handy: you can drag and drop onto it the zip file created by SQLBackupAndFTP; it unzips the BAK file if necessary, connects to the SQL Server, selects the right database, and is even smart enough to restart the server to drop open connections if necessary – very handy for developers who need to restore databases often.
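
For context, the heart of what this tool automates is a plain full restore; a minimal T-SQL sketch (the database name and file path are illustrative) would look like this:

-- Illustrative: a basic full restore from a BAK file
RESTORE DATABASE AdventureWorks2012
FROM DISK = 'D:\Backup\AdventureWorks2012.bak'
WITH REPLACE, STATS = 10
GO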

You may ask why this tool is better than the maintenance tasks available in SQL Server. While maintenance tasks are easy to set up, SQLBackupAndFTP is still way easier and integrates compression, encryption, FTP, cloud storage and email into one solution, which makes it superior to maintenance tasks in every aspect.

On the flip side, SQLBackupAndFTP is not the fanciest tool for managing backups or checking their health. It only works reliably on local SQL Server instances; in other words, it has to be installed on the SQL Server itself. For remote servers it uses scripting, which is less reliable. This limitation is actually inherent in SQL Server itself, as the BACKUP DATABASE command creates the backup not on the client, but on the server itself.
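
This server-side behavior is easy to see in plain T-SQL. In the sketch below (database name and path are illustrative), the DISK path is resolved on the machine running SQL Server, not on the machine issuing the command:

-- Illustrative: the file path is interpreted on the server, not on the client
BACKUP DATABASE AdventureWorks2012
TO DISK = 'D:\Backup\AdventureWorks2012.bak'
WITH INIT, STATS = 10
GO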

This tool is compatible with almost all the known SQL Server versions. It works with SQL Server 2008 (all editions) and many of the previous versions. It is especially useful for SQL Server 2005 Express and SQL Server 2008 Express, as they lack a built-in scheduler (SQL Server Agent) for automated backups.

I strongly recommend this tool to all DBAs – absolutely try it, as it is free and does exactly what it promises. You can download your free copy of the tool from here.

Please share your experience about using this tool. I am eager to receive your feedback regarding this article.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Find Column Used in Stored Procedure – Search Stored Procedure for Column Name

Place: Any Developer Shop

Scenario: A developer wants to drop a column from a table

Time: Any Day – usually right before the developer wants to go home

The developer rushes to the manager and following conversation begins:

Developer: I want to drop a column from one of the tables.

Manager: Sure, just document all the places where it is used in our application and come back to me.

Developer: We only use stored procedures.

Manager: Sure, then document how many stored procedures are using your column and justify the modification. I will approve it once I see the documentation.

The developer goes back to his desk, looks at the hundreds of stored procedures in SSMS, and wonders how to find which stored procedures may be using his column. Suddenly he remembers a bookmark he saved earlier which had a T-SQL script to do exactly that. He quickly opened it and ran the code.

SELECT obj.Name SPName, sc.TEXT SPText
FROM sys.syscomments sc
INNER JOIN sys.objects obj ON sc.Id = obj.OBJECT_ID
WHERE sc.TEXT LIKE '%' + 'Name Your Column Here' + '%'
AND TYPE = 'P'

The above T-SQL script searches the stored procedure text and returns the name of every stored procedure containing the value specified in the WHERE condition. He was happy with his discovery and immediately created the list of stored procedures and next action items the manager had asked for. He sent the list to the manager just 10 minutes after their discussion. He then rushed to the manager's office to point out his promptness, only to realize that the manager had left for the day just a few moments before.

Moral of the story: work-life balance can be maintained if we work smart!


Let us see the above T-SQL script in action. Assume that in the AdventureWorks2012 database we want to find the BusinessEntityID column in all the stored procedures. We can run the following T-SQL code in the SSMS Query Editor and find the names of all such stored procedures.

USE AdventureWorks2012
GO
SELECT obj.Name SPName, sc.TEXT SPText
FROM sys.syscomments sc
INNER JOIN sys.objects obj ON sc.Id = obj.OBJECT_ID
WHERE sc.TEXT LIKE '%' + 'BusinessEntityID' + '%'
AND TYPE = 'P'
GO

The above T-SQL script returns the name of each matching stored procedure along with its text. Note that the LIKE '%...%' pattern is a plain substring match, so it will also match comments and longer identifiers that merely contain the search string.
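
As a side note, on SQL Server 2005 and later the same search can also be written against sys.sql_modules, the non-deprecated counterpart of sys.syscomments. Its definition column holds the complete module text rather than 4000-character chunks, so a column name spanning a chunk boundary cannot be missed. A minimal sketch:

USE AdventureWorks2012
GO
-- Search complete module definitions for the column name
SELECT OBJECT_NAME(sm.OBJECT_ID) SPName, sm.definition SPText
FROM sys.sql_modules sm
INNER JOIN sys.objects obj ON sm.OBJECT_ID = obj.OBJECT_ID
WHERE sm.definition LIKE '%' + 'BusinessEntityID' + '%'
AND obj.TYPE = 'P'
GO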

While we are discussing this subject, here are a couple of other related blog posts which may be interesting.

A question to you: Is there a better way to find the columns used in a stored procedure? Please leave a comment with your solution. I will post it on this blog with due credit.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – A Puzzle – Swap Value of Column Without Case Statement

For the last few weeks, I have been doing Friday Puzzles and I am really loving it. Yesterday I received a very interesting question from Navneet Chaurasia on my Facebook page. He was asked this question in a job interview. Please read the original thread for a complete idea of the conversation. I am presenting the same question here.

Puzzle

Let us assume there is a column in the table called Gender. The challenge is to write a single UPDATE statement which will flip or swap the values in the column: if the value in the Gender column is ‘male’, swap it to ‘female’, and if the value is ‘female’, swap it to ‘male’.

Here is the quick setup script for the puzzle.

USE tempdb
GO
CREATE TABLE SimpleTable (ID INT, Gender VARCHAR(10))
GO
INSERT INTO SimpleTable (ID, Gender)
SELECT 1, 'female'
UNION ALL
SELECT 2, 'male'
UNION ALL
SELECT 3, 'male'
GO
SELECT *
FROM SimpleTable
GO

The above query will return the following result set:
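
ID          Gender
----------- ----------
1           female
2           male
3           male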

The puzzle was to write a single UPDATE statement which will generate the following result set:
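
ID          Gender
----------- ----------
1           male
2           female
3           female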

There are multiple answers to this simple puzzle. Let me show you three different ways. I am assuming that the column will have only the values ‘male’ or ‘female’.

Method 1: Using CASE Statement

I believe this is going to be the most popular solution, as we are all familiar with the CASE statement.

UPDATE SimpleTable
SET Gender = CASE Gender WHEN 'male' THEN 'female' ELSE 'male' END
GO
SELECT *
FROM SimpleTable
GO

Method 2: Using REPLACE Function

I totally understand this is not the cleanest solution, but it will certainly work in the given situation. The trick: for ‘male’, the expression ‘fe’ + Gender produces ‘female’, which contains no ‘fefe’ to replace; for ‘female’, it produces ‘fefemale’, and stripping the ‘fefe’ leaves ‘male’.

UPDATE SimpleTable
SET Gender = REPLACE(('fe'+Gender),'fefe','')
GO
SELECT *
FROM SimpleTable
GO

Method 3: Using IIF in SQL Server 2012

If you are using SQL Server 2012, you can use IIF (which is internally just shorthand for CASE) and get the same effect as the CASE statement.

UPDATE SimpleTable
SET Gender = IIF(Gender = 'male', 'female', 'male')
GO
SELECT *
FROM SimpleTable
GO

You can read my article series on the various SQL Server 2012 functions over here.

Let us clean up.

DROP TABLE SimpleTable
GO

Question to you:

I came up with three simple tricks, each a single UPDATE statement which swaps the values in the column. Do you know any other simple trick? If yes, please post it here in the comments. I will pick two random winners from all the valid answers. Winners will get 1) a print copy of SQL Server Interview Questions and Answers and 2) a free learning code for the online video courses.

I will announce the winners on the coming Monday.

Reference:  Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – SmallDateTime and Precision – A Continuous Confusion

Some kinds of confusion never go away. Here is one of the ancient confusing things in SQL. The precision of SmallDateTime is one concept that confuses a lot of people, proven by the many messages I receive every day relating to this subject.

Let me start with the question: What is the precision of the SMALLDATETIME datatype?

What is your answer? Write it down on your notepad.

Now if you do not want to continue reading the blog post, head to my previous blog post over here: SQL SERVER – Precision of SMALLDATETIME.

A Social Media Question

With the rise of social media conversations, I have noticed that the number of comments I receive on this blog is a bit staggering. I receive lots of questions on Facebook, Twitter, or Google+. One of the very interesting questions yesterday was asked on Facebook by Raghavendra. I am re-organizing his script and presenting all of the questions he asked me. Let us see if we can help him with his question:

CREATE TABLE #temp
(name VARCHAR(100),registered smalldatetime)
GO
DECLARE @test smalldatetime
SET @test=GETDATE()
INSERT INTO #temp
VALUES ('Value1',@test)
INSERT INTO #temp
VALUES ('Value2',@test)
GO
SELECT *
FROM #temp
ORDER BY registered DESC
GO
DROP TABLE #temp
GO

Now when the above script is run, we will get the following result:

Well, the expectation was a different result: the row which was inserted last was expected to be returned as the first row of the result set, since the ORDER BY is descending.

Side note: because the requirement is to get the latest data, we cannot use any column other than the smalldatetime column in the ORDER BY. If we use the name column in the ORDER BY, we will get an incorrect result, as the names can be anything.

My Initial Reaction

My initial reaction was as follows:

1) Datatype DATETIME2: If finer precision is expected from a column which stores date and time, it should not be smalldatetime. The precision of smalldatetime is one minute (Read Here); for finer precision use the DATETIME or DATETIME2 data type.

Here is the code which includes above suggestion:

CREATE TABLE #temp
(name VARCHAR(100), registered datetime2)
GO
DECLARE @test datetime2
SET @test=GETDATE()
INSERT INTO #temp
VALUES ('Value1',@test)
INSERT INTO #temp
VALUES ('Value2',@test)
GO
SELECT *
FROM #temp
ORDER BY registered DESC
GO
DROP TABLE #temp
GO
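
To see the one-minute precision for yourself, here is a minimal sketch (the dates are arbitrary) relying on the documented rounding rule: values of 29.998 seconds or lower round down to the nearest minute, and values of 29.999 seconds or higher round up.

-- SMALLDATETIME rounds to the nearest minute
DECLARE @early DATETIME, @late DATETIME
SET @early = '2012-06-01 10:30:29.998'
SET @late = '2012-06-01 10:30:29.999'
SELECT CAST(@early AS SMALLDATETIME) RoundsDown, -- 2012-06-01 10:30:00
CAST(@late AS SMALLDATETIME) RoundsUp -- 2012-06-01 10:31:00
GO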

2) Tie Breaker Identity: There is always a possibility that two rows are inserted at the same time. In that case, you may need a tie breaker. If you have an increasing identity column, you can use it as the tie breaker.


CREATE TABLE #temp
(ID INT IDENTITY(1,1), name VARCHAR(100),registered datetime2)
GO
DECLARE @test datetime2
SET @test=GETDATE()
INSERT INTO #temp
VALUES ('Value1',@test)
INSERT INTO #temp
VALUES ('Value2',@test)
GO
SELECT *
FROM #temp
ORDER BY ID DESC
GO
DROP TABLE #temp
GO

Those two were the quick suggestions I provided. It is not necessary to follow both pieces of advice: one could use only the DATETIME datatype, give the identity column the BIGINT datatype, or use another tie breaker altogether.

An Alternate NO Solution

In the Facebook thread, this was also discussed as one of the solutions:

CREATE TABLE #temp
(name VARCHAR(100),registered smalldatetime)
GO
DECLARE @test smalldatetime
SET @test=GETDATE()
INSERT INTO #temp
VALUES ('Value1',@test)
INSERT INTO #temp
VALUES ('Value2',@test)
GO
SELECT name, registered,
ROW_NUMBER() OVER(ORDER BY registered DESC) AS "Row Number"
FROM #temp ORDER BY 3 DESC
GO
DROP TABLE #temp
GO

However, I believe it is not a real solution and can be misleading if used on a production server. Here is an example of why it is not a good solution:

CREATE TABLE #temp
(name VARCHAR(100) NOT NULL,registered smalldatetime)
GO
DECLARE @test smalldatetime
SET @test=GETDATE()
INSERT INTO #temp
VALUES ('Value1',@test)
INSERT INTO #temp
VALUES ('Value2',@test)
GO
-- Before Index
SELECT name, registered,
ROW_NUMBER() OVER(ORDER BY registered DESC) AS "Row Number"
FROM #temp ORDER BY 3 DESC
GO
-- Create Index
ALTER TABLE #temp
ADD CONSTRAINT [PK_#temp] PRIMARY KEY CLUSTERED
(name DESC)
GO
-- After Index
SELECT name, registered,
ROW_NUMBER() OVER(ORDER BY registered DESC) AS "Row Number"
FROM #temp ORDER BY 3 DESC
GO
DROP TABLE #temp
GO

Now let us examine the resultset. You will notice that creating an index on the base table (which is, indeed, a schema change) can affect the resultset: because the two rows are tied on the registered value, their relative order is not guaranteed, and the index changes it. So this method is not a reliable way to get the latest inserted row.

No Schema Change Requirement

After giving these two suggestions, I waited for feedback from the asker. However, his requirement was that there cannot be any schema change, because the database is used by many other applications. I validated again, and indeed, the requirement is no schema change at all: no adding columns and no changing the datatypes of any existing columns. There is no further help available either.

This is indeed an interesting question. I personally can’t think of any solution which I could provide him given the requirement of no schema change. Can you think of any other solution to this?

Need of Database Designer

This question once again brings up another ancient question: “Do we need a database designer?” I often come across databases which are facing major performance problems or carry redundant data. Normalization is often ignored when a database is built fast under a very tight deadline, and the result is tables with unnecessary columns and the performance problems they bring. While working as a Developer Lead in my earlier jobs, I saw developers adding columns to tables without anybody’s consent and retrieving them with SELECT *. There is a lot to discuss on this subject in detail, but for now, let’s discuss the question first. Do you have any suggestions for the above question?

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQLAuthority News – Select the Best SQL in Sixty Seconds Episode – Help us Improve

It has been more than 3 months since we started experimenting with a new concept: SQL in Sixty Seconds. Every Wednesday, we put out a fresh, interesting concept via video. Rick Morelan, Vinod Kumar and myself – the three of us decided to do something new and exciting. We decided to create a short video format which respects the attention span of the viewer, keeps them focused, and helps them learn something new through our teaching.

We all liked the idea of SQL in Sixty Seconds. As the name suggests, you will not watch content for more than a minute. We did our best to keep each video down to a single minute – well, in most cases. There are cases where we could not fit the content into sixty seconds, so the video had to run a few seconds longer. In the end, we try to come up with an extremely short video which still teaches something about SQL Server. So far we have 10 videos out in the SQL in Sixty Seconds series.

SQL in Sixty Seconds

Here are the details of our SQL in Sixty Seconds series. It has a total of 10 videos, with a cumulative running time of around 15 minutes.

Link to watch all the Episodes

Episode 01 – Subquery to CTE
Episode 02 – T-SQL Errors and Reactions
Episode 03 – T-SQL Constructs – Declaration and Initialization
Episode 04 – Function: Is Leap Year
Episode 05 – T-SQL Errors and Reactions Demo
Episode 06 – SQL Server Performance: Indexing Basics
Episode 07 – Table Variables and Transactions
Episode 08 – Using MAXDOP 1 for Single Processor Query
Episode 09 – T-SQL Constructs – += and *=
Episode 10 – Installing AdventureWorks Sample Database

We need your help

We seriously need your help rating our videos. All you have to do is answer these three simple questions:

1. Which is your favorite SQL in Sixty Seconds episode, and why?
2. Which is your least favorite SQL in Sixty Seconds episode, and why?
3. What would you like to see next in SQL in Sixty Seconds?

Just leave your answers in the comments area.

Selection of Winner

On May 1st, we will go over the complete list. Rick, Vinod and I have each picked our favorite and least favorite episodes. You may call it a coincidence, but our choices were exactly the same: all three of us selected the same videos as best and least favorite. We will select the winner whose selection exactly matches ours. In case of a tie, we will look at the third question as well as your answers to the “why” follow-up in the first and second questions.

Prizes

We have decided to have more than one prize. This time, we’re giving away three prizes for the winners:

If you live in a country where we cannot ship these books, we will send you the equivalent Pluralsight FREE learning code.

Reference: Pinal Dave (http://blog.sqlauthority.com)