SQL SERVER – An Interesting Case of Redundant Indexes – Index on Col1, Col2 and Index on Col1, Col2, Col3 – Part 1

Index never stops amazing me, there are so much to learn about Index that I never feel that there is enough knowledge out about this subject. If you are interested you can watch my Indexing Course on Pluralsight for further learning on this subject. On my production server I personally use embarcadero DB Optimizer for all performance tuning and routine health check up. I will be interested to know what is your feedback about the product.

Instead of going on the theory overload – let us start with this blog post as a conversation between two individuals – Jon and Mike. These are just random names. Jon is senior and experienced SQL Server Expert and Mike is beginner with SQL Server.

Mike – What is Redundant Index?

Jon – Indexes are redundant when they have similar columns as a part of a definition. Additionally, the indexes are considered redundant when their first few columns are in the same position with same order by direction are also considered as a redundant.

Mike – Would you please explain it with examples?

Jon – Sure, Let us assume we have two indexes:

Index 1: Col1, Col2, Col3
Index 2: Col1, Col2, Col3

Now if you look at them – they have the same columns as a part of their definition, so they are indeed redundant indexes. However, look at the following scenario:

Index 3: Col1, Col2
Index 4: Col1, Col2, Col3

In this case they are also considered as a redundant because the position of the Col1, Col2 are same in both of the index. It is quite commonly considered that if the initial positions of the columns are the same, they are redundant.

However, there is one more concept here to be looked at as well before we make certain about their redundancy. Look at the following indexes:

Index 5: Col1 ASC, Col2 DESC
Index 6: Col1 DESC, Col2 DESC, Col3 ASC

In this case if the initial positions are the same, they are not redundant as the order of the column is not the same.

There are lot more to discuss but this is just to give you an initial idea. There is one more concept we should consider before calling any index redundant is Included Index. Here is the simple scenario for it.

Index 7: Col1 ASC Included (col2)
Index 8: Col1 ASC Included (col3)

You can notice they have same initial column but the Included columns are totally different.

Mike – Thanks, I got it. It seems that Redundant Indexes are not good and they should be dropped correct.

In case of Index 1 and Index 2 I think we should drop either of the one.

In case of Index 3 and Index 4 I believe Index 4 has more columns and covering, so we should keep it and drop the other one.

In case of Index 5 and Index 6, they are both different indexes so we should keep both.

In case of Index 7 and Index 8, again they are both different index in this case. They can be redundant if the included columns are overlapping to each other.

Am I correct to say this?

Jon – Very good analysis. You are very close to the understanding. Generally, redundant indexes are not good and they should be absolutely addressed. In most cases, redundant cases should be dropped. 

Mike – Ahha, so in the most cases indexes should be dropped. Ok, so is there any script or guidance to detect redundant indexes for the most cases.

Jon – Sure, here is the script which does that – however, this query just addresses the scenario of the Index 3 and Index 4. It does not talk about Included Columns or Index Order (ASC or DESC). Just use that for a start but do your analysis on this subject before you drop your indexes. You still have to check for order of the index and included columns as well.

Mike – Perfect, I understand that the script is just for a quick start and not the complete solution. Now you mentioned “In Most Cases” – what are the special cases. What are the cases when an Index which absolutely qualify for the redundant index but should not be dropped. Would you please explain the special cases?

Jon – Absolutely – there are always special cases. For example the width of the column matters.

Mike – Okey I would love to learn more about this – would you please explain.

Jon – Absolutely – I have a working example of it – Checkout Monday’s blog post. I will explain you in detail.

Reference: Pinal Dave (http://blog.sqlauthority.com)

About these ads

SQL SERVER – Identify Numbers of Non Clustered Index on Tables for Entire Database

Here is the script which will give you numbers of non clustered indexes on any table in entire database.

SELECT COUNT(i.TYPE) NoOfIndex,
[schema_name] = s.name, table_name = o.name
FROM sys.indexes i
INNER JOIN sys.objects o ON i.[object_id] = o.[object_id]
INNER JOIN sys.schemas s ON o.[schema_id] = s.[schema_id]
WHERE o.TYPE IN ('U')
AND
i.TYPE = 2
GROUP BY s.name, o.name
ORDER BY schema_name, table_name

Here is the small story behind why this script was needed.

I recently went to meet my friend in his office and he introduced me to his colleague in office as someone who is an expert in SQL Server Indexing. I politely said I am yet learning about Indexing and have a long way to go. My friend’s colleague right away said – he had a suggestion for me with related to Index. According to him he was looking for a script which will count all the non clustered on all the tables in the database and he was not able to find that on SQLAuthority.com.

I was a bit surprised as I really do not remember all the details about what I have written so far. I quickly pull up my phone and tried to look for the script on my custom search engine and he was correct. I never wrote a script which will count all the non clustered indexes on tables in the whole database. Excessive indexing is not recommended in general. If you have too many indexes it will definitely negatively affect your performance. The above query will quickly give you details of numbers of indexes on tables on your entire database. You can quickly glance and use the numbers as reference.

Please note that the number of the index is not a indication of bad indexes. There is a lot of wisdom I can write here but that is not the scope of this blog post. There are many different rules with Indexes and many different scenarios. For example – a table which is heap (no clustered index) is often not recommended on OLTP workload (here is the blog post to identify them), drop unused indexes with careful observation (here is the script for it), identify missing indexes and after careful testing add them (here is the script for it). Even though I have given few links here it is just the tip of the iceberg. If you follow only above four advices your ship may still sink. Those who wants to learn the subject in depth can watch the videos here after logging in.

Note: You change where condition type to 6 for nonclustered column store index.

Click to Download Scripts

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Guest Post by Sandip Pani – SQL Server Statistics Name and Index Creation

Sometimes something very small or a common error which we observe in daily life teaches us new things. SQL Server Expert Sandip Pani (winner of Joes 2 Pros Contests) has come across similar experience. Sandip has written a guest post on an error he faced in his daily work. Sandip is working for QSI Healthcare as an Associate Technical Specialist and have more than 5 years of total experience. He blogs at SQLcommitted.com and contribute in various forums. His social media hands are LinkedIn, Facebook and Twitter.


Once I faced following error when I was working on performance tuning project and attempt to create an Index.

Mug 1913, Level 16, State 1, Line 1
The operation failed because an index or statistics with name ‘Ix_Table1_1′ already exists on table ‘Table1′.

The immediate reaction to the error was that I might have created that index earlier and when I researched it further I found the same as the index was indeed created two times. This totally makes sense. This can happen due to many reasons for example if the user is careless and executes the same code two times as well, when he attempts to create index without checking if there was index already on the object. However when I paid attention to the details of the error, I realize that error message also talks about statistics along with the index. I got curious if the same would happen if I attempt to create indexes with the same name as statistics already created. There are a few other questions also prompted in my mind. I decided to do a small demonstration of the subject and build following demonstration script.

The goal of my experiment is to find out the relation between statistics and the index. Statistics is one of the important input parameter for the optimizer during query optimization process. If the query is nontrivial then only optimizer uses statistics to perform a cost based optimization to select a plan. For accuracy and further learning I suggest to read MSDN.

Now let’s find out the relationship between index and statistics. We will do the experiment in two parts. i) Creating Index ii) Creating Statistics

We will be using the following T-SQL script for our example.

IF (OBJECT_ID('Table1') IS NOT NULL)
DROP TABLE Table1
GO
CREATE TABLE Table1
(Col1 INT NOT NULL,
Col2 VARCHAR(20) NOT NULL)
GO

We will be using following two queries to check if there are any index or statistics on our sample table Table1.

-- Details of Index
SELECT OBJECT_NAME(OBJECT_ID) AS TableName, Name AS IndexName, type_desc
FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'table1'
GO
-- Details of Statistics
SELECT OBJECT_NAME(OBJECT_ID) TableName, Name AS StatisticsName
FROM sys.stats
WHERE OBJECT_NAME(OBJECT_ID) = 'table1'
GO

When I ran above two scripts on the table right after it was created it did not give us any result which was expected.

Now let us begin our test.

1) Create an index on the table

Create following index on the table.

CREATE NONCLUSTERED INDEX Ix_Table1_1 ON Table1(Col1)
GO

Now let us use above two scripts and see their results.

We can see that when we created index at the same time it created statistics also with the same name.

Before continuing to next set of demo – drop the table using following script and re-create the table using a script provided at the beginning of the table.

DROP TABLE table1
GO

2) Create a statistic on the table

Create following statistics on the table.

CREATE STATISTICS Ix_table1_1 ON Table1 (Col1)
GO

Now let us use above two scripts and see their results.

We can see that when we created statistics Index is not created. The behavior of this experiment is different from the earlier experiment.

Clean up the table setup using the following script:

DROP TABLE table1
GO

Above two experiments teach us very valuable lesson that when we create indexes, SQL Server generates the index and statistics (with the same name as the index name) together. Now due to the reason if we have already had statistics with the same name but not the index, it is quite possible that we will face the error to create the index even though there is no index with the same name.

A Quick Check

To validate that if we create statistics first and then index after that with the same name, it will throw an error let us run following script in SSMS. Make sure to drop the table and clean up our sample table at the end of the experiment.

-- Create sample table
CREATE TABLE TestTable
(Col1 INT NOT NULL,
Col2 VARCHAR(20) NOT NULL)
GO
-- Create Statistics
CREATE STATISTICS IX_TestTable_1 ON TestTable (Col1)
GO
-- Create Index
CREATE NONCLUSTERED INDEX IX_TestTable_1 ON TestTable(Col1)
GO
-- Check error
/*Msg 1913, Level 16, State 1, Line 2
The operation failed because an index or statistics with name 'IX_TestTable_1' already exists on table 'TestTable'.
*/
-- Clean up
DROP TABLE TestTable
GO

While creating index it will throw the following error as statistics with the same name is already created.

In simple words – when we create index the name of the index should be different from any of the existing indexes and statistics.

Click to Download Scripts

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Observation of Top with Index and Order of Resultset

SQL Server has lots of things to learn and share. It is amazing to see how people evaluate and understand different techniques and styles differently when implementing. There are three different instances where I have come across a situation where I felt that proper understanding is important and something which looks evil may not be evil as suggested. The real reason may be absolutely different but we may blame something totally different for the incorrect results.

Scenario 1: Database Tuning Advisor and Incorrect Results

One of my friends called me he was stressed out. He just ran a few of Database Tuning Advisor on his system and followed up a few of the suggestions by DTA and suddenly his applications and reporting server started to show incorrect results. He got stressed out and reverted every single change he had done so far. Though he was getting better performance his results were not as per expectation. Finally after quick research we figured out that it was not DTA’s advice but something else which was creating this scenario.

Scenario 2: No Index for Correct Results

One of the place where I went for consultation for performance tuning. I realize that there was absolutely no indexes created. When I discussed this with them, they told me creating indexes is giving incorrect results to them. This was very funny to me. I have heard everything before but this was kind of first time for me. However, with carefully investigating I got the idea what they were suggesting and later on solution was reached.

Scenario 3: Top and Changing Results

This one is my own experience when I was beginning my career. Just like every beginner I was not sure about every single command of SQL Server as well I had no idea how the engine behaves internally. I was under the misconception that TOP changes how the results are displayed. I was very much scared of to use TOP in my queries but after a while I understood that it was not TOP which was misbehaving but my knowledge is incomplete. I eventually figured out how to use TOP clause and get expected results.

The Real Issue

The real issue in the above three cases is not using ORDER BY clause for the query. While writing query I have faced situations where one result can be achieved by multiple ways. It is our responsibility that we select the result which is deterministic and consistent. We should opt for the way which maintains database integrity as well, gives optimal performance.

In all the above three scenario the issue was developer was not writing script which was consistent with his understanding. The SQL Server Engine was behaving how it should behave but lack of understanding was building different expectation from the query.

Reproduction of Scenarios

Following T-SQL script is doing the same. First let us create a table with several records.

USE TempDB
GO
CREATE TABLE TestTable (ID INT, Name VARCHAR(10), Title VARCHAR(20))
GO
INSERT INTO TestTable (ID, Name, Title)
SELECT 1, 'Mark', 'Manager'
UNION ALL
SELECT 2, 'John', 'Sales Manager'
UNION ALL
SELECT 3, 'Matt', 'CEO'
UNION ALL
SELECT 4, 'Roger', 'COO'
UNION ALL
SELECT 5, 'Mike', 'Finance Manager'
UNION ALL
SELECT 6, 'Jeff', 'Manager'
UNION ALL
SELECT 7, 'Mason', 'HR Executive'
GO

Let us select records and observe the resultset.

SELECT TOP 3 Name
FROM TestTable
GO

Now let us create index over one of the columns and retrieve records. You will find the result is inconsistent than before or different than what we have received. (Note that I did not use word INCORRECT as it is not incorrect result, but different for sure)

CREATE NONCLUSTERED INDEX [IX_Name]
ON TestTable
(Name ASC)
GO
SELECT TOP 3 Name
FROM TestTable
GO

Now let us use ORDER BY clause and you will see that result is as per expected.

SELECT TOP 3 Name
FROM TestTable
ORDER BY ID
GO

Let us Clean up.

DROP TABLE TestTable
GO

In our sample reproduction I have selected one single column and single column index. In above scenario we had much difficult scenarios and complex cases. The major point of this blog post is to make sure that developers understand the importance of the ORDER BY clause.

Summary

When we are beginning our career it is quite easy get confused and not understanding the impact of code which we are writing. Quite often we build prejudice towards part of technology which should not be the case as we progress in our career. In this example we are trying to drive home the point that when you use TOP and if you are expecting your results in certain order all the time uses ORDER BY along with it.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – ColumnStore Index – Batch Mode vs Row Mode

What do you do when you are in a hurry and hear someone say things which you do not agree or is wrong? Well, let me tell you what I do or what I recently did.

I was walking by and heard someone mentioning “Columnstore Index are really great as they are using Batch Mode which makes them seriously fast.” While I was passing by and I heard this statement my first reaction was I thought Columnstore Index can use both – Batch Mode and Row Mode. I stopped by even though I was in a hurry and asked the person if he meant that Columnstore indexes are seriously fast because they use Batch Mode all the time or Batch Mode is one of the reasons for Columnstore Index to be faster. He responded that Columnstore Indexes can run only in Batch Mode.

However, I do not like to confront anybody without hearing their complete story. Honestly, I like to do information sharing and avoid confronting as much as possible. There are always ways to communicate the same positively. Well, this is what I did, I quickly pull up my earlier article on Columnstore Index and copied the script to SQL Server Management Studio. I created two versions of the script. 1) Very Large Table 2) Reasonably Small Table. I a query which uses columnstore index on both of the versions. I found very interesting result of the my tests.

I saved my tests and sent it to the person who mentioned about that Columnstore Indexes are using Batch Mode only. He immediately acknowledged that indeed he was incorrect in saying that Columnstore Index uses only Batch Mode. What really caught my attention is that he also thanked me for sending him detail email instead of just having argument where he and I both were standing in the corridor and neither have no way to prove any theory.

Here is the screenshots of the both the scenarios.

1) Columnstore Index using Batch Mode

2) Columnstore Index using Row Mode

Here is the logic behind when Columnstore Index uses Batch Mode and when it uses Row Mode. A batch typically represents about 1000 rows of data. Batch mode processing also uses algorithms that are optimized for the multicore CPUs and increased memory throughput.  Batch mode processing spreads metadata access costs and overhead over all the rows in a batch.  Batch mode processing operates on compressed data when possible leading superior performance.

Here is one last point – Columnstore Index can use Batch Mode or Row Mode but Batch Mode processing is only available in Columnstore Index. I hope this statement truly sums up the whole concept.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Finding Size of a Columnstore Index Using DMVs

Columnstore Index is one of my favorite enhancement in SQL Server 2012. A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. In case of the row store indexes multiple pages will contain multiple rows of the columns spanning across multiple pages. Whereas in case of column store indexes multiple pages will contain (multiple) single columns.  Columnstore Indexes are compressed by default and occupies much lesser space than regular row store index by default.

One of the very common question I often see is need of the list of columnstore index along with their size and corresponding table name. I quickly re-wrote a script using DMVs sys.indexes and sys.dm_db_partition_stats. This script gives the size of the columnstore index on disk only. I am sure there will be advanced script to retrieve details related to components associated with the columnstore index. However, I believe following script is sufficient to start getting an idea of columnstore index size. 

SELECT OBJECT_SCHEMA_NAME(i.OBJECT_ID) SchemaName,
OBJECT_NAME(i.OBJECT_ID ) TableName,
i.name IndexName,
SUM(s.used_page_count) / 128.0 IndexSizeinMB
FROM sys.indexes AS i
INNER JOIN sys.dm_db_partition_stats AS S
ON i.OBJECT_ID = S.OBJECT_ID AND I.index_id = S.index_id
WHERE  i.type_desc = 'NONCLUSTERED COLUMNSTORE'
GROUP BY i.OBJECT_ID, i.name

Here is my introductory article written on SQL Server Fundamentals of Columnstore Index. Create a sample columnstore index based on the script described in the earlier article. It will give the following results.

Please feel free to suggest improvement to script so I can further modify it to accommodate enhancements.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Renaming Index – Index Naming Conventions

If you are regular reader of this blog, you must be aware of that there are two kinds of blog posts 1) I share what I learn recently 2) I share what I learn and request your participation. Today’s blog post is where I need your opinion to make this blog post a good reference for future.

Background Story

Recently I came across system where users have changed the name of the few of the table to match their new standard naming convention. The name of the table should be self explanatory and they should have explain their purpose without either opening it or reading documentations. Well, not every time this is possible but again this should be the goal of any database modeler. Well, I no way encourage the name of the tables to be too long like ‘ContainsDetailsofNewInvoices’. May be the name of the table should be ‘Invoices’ and table should contain a column with New/Processed bit filed to indicate if the invoice is processed or not (if necessary). Coming back to original story, the database had several tables of which the name were changed.

Story Continues…

To continue the story let me take simple example. There was a table with the name  ‘ReceivedInvoices’, it was changed to new name as ‘TblInvoices’. As per their new naming standard they had to prefix every talbe with the words ‘Tbl’ and prefix every view with the letters ‘Vw’. Personally I do not see any need of the prefix but again, that issue is not here to discuss.  Now after changing the name of the table they faced very interesting situation. They had few indexes on the table which had name of the table. Let us take an example.

Old Name of Table: ReceivedInvoice
Old Name of Index: Index_ReceivedInvoice1

Here is the new names

New Name of Table: TblInvoices
New Name of Index: ???

Well, their dilemma was what should be the new naming convention of the Indexes. Here is a quick proposal of the Index naming convention. Do let me know your opinion.

If Index is Primary Clustered Index: PK_TableName
If Index is  Non-clustered Index: IX_TableName_ColumnName1_ColumnName2…
If Index is Unique Non-clustered Index: UX_TableName_ColumnName1_ColumnName2…
If Index is Columnstore Non-clustered Index: CL_TableName

Here ColumnName is the column on which index is created. As there can be only one Primary Key Index and Columnstore Index per table, they do not require ColumnName in the name of the index. The purpose of this new naming convention is to increase readability. When any user come across this index, without opening their properties or definition, user can will know the details of the index.

T-SQL script to Rename Indexes

Here is quick T-SQL script to rename Indexes

EXEC sp_rename N'SchemaName.TableName.IndexName', N'New_IndexName', N'INDEX';
GO

Your Contribute Please

Well, the organization has already defined above four guidelines, personally I follow very similar guidelines too. I have seen many variations like adding prefixes CL for Clustered Index and NCL for Non-clustered Index. I have often seen many not using UX prefix for Unique Index but rather use generic IX prefix only.

Now do you think if they have missed anything in the coding standard. Is NCI and CI prefixed required to additionally describe the index names. I have once received suggestion to even add fill factor in the index name – which I do not recommend at all.

What do you think should be ideal name of the index, so it explains all the most important properties? Additionally, you are welcome to vote if you believe changing the name of index is just waste of time and energy. 

Note: The purpose of the blog post is to encourage all to participate with their ideas. I will write follow up blog posts in future compiling all the suggestions.

Reference: Pinal Dave (http://blog.sqlauthority.com)