SQL SERVER – An Interesting Case of Redundant Indexes – Index on Col1, Col2 and Index on Col1, Col2, Col3 – Part 4

This is the fourth part of the series regarding Redundant Indexes. If have not read earlier part – there is quite a good chance that you will miss the context of this part. I quickly suggest you to read earlier three parts. On my production server I personally use embarcadero DB Optimizer for all performance tuning and routine health check up. I will be interested to know what is your feedback about the product.

Part 1: What is Redundant Indexes?
Conversation between Mike and Jon – where they discuss about the fundamentals of Redundant Indexes.

Part 2: Demo – What kind of Redundant Indexes are Useful with a smaller result set
Here Jon demonstrates a script where even though Redundant Indexes exist they are extremely useful and should not be dropped. The result set contains the very small result set.

Part 3: Demo – What kind of Redundant Indexes are Useful with a larger result set
Here Jon demonstrates a script where even though Redundant Indexes exist they are extremely useful and should not be dropped. The result set contains the very large result set.

Once again if you have not read the earlier blog post, there are good chances you will miss the core point of this blog post. I suggest you at least read the conversation between Mike and Jon to get familiar with their personalities.

Jon and Mike continue their conversation from last week.

Mike – Jon, In very first day we discussed that we should also pay attention to the Order By clause of the columns used in the Indexes. Do you have any working demonstration example where we can understand that concept in depth.

Jon – Sure, Mike. I think you are referring to following example which we have discussed in the conversation.

Index 5: Col1 ASC, Col2 DESC
Index 6: Col1 DESC, Col2 DESC, Col3 ASC

Am I correct?

Mike – Yes, would you please explain how ASC and DESC works with the case of Redundant Indexes.

In normal case, we will not see the direction of the Columns used and consider Indexes as a redudant. For example when I see Index 5 and Index 6 I would see them as following.

Index 5: Col1 ASC, Col2 DESC
Index 6: Col1 DESC, Col2 DESC, Col3 ASC

I will just see that Index 5 has two columns (Col1 and Col2), whereas Index 6 have three columns (Col1, Col2, Col3) and they are redundant (read the very first blog post for why they are considered as a redundant).

However, from our conversation I do understand that the columns used to build indexes have a direction and we should consider them as an important factor before we make these indexes redundant.

Jon – Great Point Mike! You are absolutely correct. Let us understand the same using example.

We will create a single table with two different Indexes. Later we will run two different but similar looking queries on the table. We will observe how their index usage pattern is.

First create a table called SampleTable1.

USE tempdb
GO
-- Create table
CREATE TABLE SampleTable1 (ID INT, Col1 INT, Col2 INT, Col3 INT)
GO

Now let us create indexes on Sample Table.

Clustered Indexes are just created for reference – the results of this test will not be affected by its presence or absence of it. I have just created to rule out few doubts I anticipate by their absence. The important part is non clustered indexes. One of the non-clustered index is created on Col1 & Col2 DESC and another one is created on Col1 DESC, Col2 DESC, and Col3.

Table1: SampleTable1 
Index on SampleTable1: IX_ST_Col1_Col2
Index on SampleTable1: IX_ST_Col1_Col2_Col3

-- Create Indexes on Sample Table1
CREATE CLUSTERED INDEX [CX_ST]
ON SampleTable1 (ID)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2]
ON SampleTable1 (Col1 ASC, Col2 DESC)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2_Col3]
ON SampleTable1 (Col1 DESC, Col2 DESC, Col3 ASC)
GO

Now let us populate both the tables. Both the tables have absolutely same data.

-- Populate tables
INSERT INTO SampleTable1 (ID, Col1, Col2, Col3)
SELECT RAND()*10000, RAND()*1000, RAND()*100, RAND()*10
GO 100000

Now is the most interesting part. For this first enable the execution plan in SSMS (shortcut key – CTRL + M).

In this test we will run two different queries on both the tables.

Let us first run two scripts for SampleTable1

-- Select from SampleTable1
SELECT Col1, Col2
FROM SampleTable1 st
WHERE st.Col1 > 10 AND st.Col1 < 20
ORDER BY st.Col1 DESC, st.Col2 DESC
GO
-- Select from SampleTable1
SELECT Col1, Col2
FROM SampleTable1 st
WHERE st.Col1 > 10 AND st.Col1 < 20
ORDER BY st.Col1 ASC, st.Col2 DESC
GO

Let us check the execution plan:

Now when you see the execution plan, you will notice that both the queries are taking an equal amount of the resources to execute both the queries. They each take 50% of the time compared to the each other. However, we need to pay extra attention to the queries and see which indexes each of the query is using.

Let us examine very first query individually.

We can see from the execution plan – it is using the index – IX_ST_Col1_Col2_Col3.

Now let us examine second query individually.

We can see from the execution plan – it is using the index – IX_ST_Col1_Col2.

Now let us put a everything together quickly.

When WHERE Conditions are st.Col1 DESC, st.Col2 DESC they use the Index with Columns Col1 DESC, Col2 DESC, Col3 ASC.
When WHERE Conditions are st.Col1 ASC, st.Col2 DESC they use the Index with ColumnsCol1 ASC, Col2 DESC.

I think by for you must have guessed that there is a relationship between the order of the columns with the ORDER by Clause used in the WHERE condition.

Even though, Index IX_ST_Col1_Col2 looks like a Redundant Index, it is not a Redundant Index of all. They are matter of the fact very different Indexes. They should be both kept if both are used in the queries.

There are a lot more things we can discuss over here – but this is just a starter. Any question so far.

Mike – Thanks Jon, I understood so far. Here is the question in my mind – If I am following two Indexes they are considered as a different Index am I correct?

Index AA: Col1 DESC, Col2 ASC
Index BB: Col1 ASC, Col2 ASC

Am I correct?

Jon – Yes, they are different Indexes for sure.

Mike – Oh okey I perfectly got it. Let me quickly summarize it.

Even though Indexes look redundant there may be some queries which may find them useful. ORDER of the columns used in the Indexes is very important when considering the redundancy of the indexes. If the column order is different the Indexes are also considered as a different indexes. Before dropping the indexes one should properly validate the usage patterns of the indexes and query workloads. One should properly test everything before taking any actions. 

Jon – Good summary. Let us do a quick clean up.

-- Clean Up
DROP TABLE SampleTable1
GO

Mike – Perfect, I think I know a lot now.

Jon – Do you really think you know everything?

Mike – Well, sort of. Right?

Jon – If you think so answer me following question -

Are following two indexes should be considered as a redundant indexes or are they different indexes?

Index AA: Col1 DESC, Col2 DESC
Index BB: Col1 ASC, Col2 ASC

Mike – From what I understood today – they are Not Redundant Index.

Jon – Glad I asked this question. You are WRONG!

Mike – What? How come?

Jon – Let us talk about this future blog posts – I think I need to teach you a lot of things!

Mike – Thanks, I am indeed looking forward to learn more. So what I understand is that it is quite possible that the indexes may look redundant but they may not be redundant indexes. What I understand is that before dropping any indexes I need to identify if the indexes are used or not. Is there any professional tool which can help me?

Jon – Absolutely, I suggest that you try embarcadero DB Optimizer. It helps finding users the precise details related to index usage. It also indicates if Indexes are useful or useless. This tool is very easy to use and help in various aspects of the performance tuning. Here is detailed articles on this subject Part 1 and Part 2. This tool can help you with writing proper queries, do indexes maintenance and optimize overall health of the server. 

Mike – Thanks Jon!

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – An Interesting Case of Redundant Indexes – Index on Col1, Col2 and Index on Col1, Col2, Col3 – Part 3

Before you start reading this blog post, I strongly suggest you to read the part 1 of this series.It talks about What is Redundant Index. The story is a conversation between two individuals – Jon and Mike. They are different but have single goal learn and explore SQL Server. Their initial conversation sets the ground for this blog post. They earlier discussed what is a Redundant Index as well, discussed what are the special cases for the same. It is a general assumption (or common best practices) is to drop Redundant Indexes. On my production server I personally use embarcadero DB Optimizer for all performance tuning and routine health check up. I will be interested to know what is your feedback about the product.

Later Mike asks for special case where even though the index is clearly a Redundant Index, why it should not be removed. Jon promises to explain with a demo where a Redundant Index is useful and should not be dropped. In part 2 of this series – we talked about in detail about Kind of Redundant Indexes are Useful. We demonstrated that a column width makes a big difference in the performance of the index with a query. The query used in the demonstration was returning a very small result set.

Mike had a question for Jon that if the query is returning a larger result set will it still have the same conclusion which they had arrived in an earlier blog post. Let us continue the conversation here.

Mike – Today is Friday. You promised me a demo today where a Redundant Index is useful Index, even though the query were returning a larger result set.

Jon - Absolutely. Our demo is absolutely same as the earlier demo – the only difference today we will see is that our where condition will be different from the earlier query. This time we will have a much larger resultset returning by our query. We will observe the output and come to the conclusion if our finding is still the same as we discovered earlier.

We will create two tables and will create absolutely same table. We will notice that on the first table Redundant Index will be useless and needs to be dropped whereas on the second table Redundant Index will useful and should not be dropped for performance. One with all the column INT and second with a wider column CHAR (800).

USE tempdb
GO
-- Create table
CREATE TABLE SampleTable1 (ID INT, Col1 INT, Col2 INT, Col3 INT)
GO
CREATE TABLE SampleTable2 (ID INT, Col1 INT, Col2 INT, Col3 CHAR(800))
GO

Now let us create indexes on both the tables. We will make sure that created indexes are same on both the table.

Clustered Indexes are just created for reference – the results of this test will not be affected by its presence or absence of it. I have just created to rule out few doubts I anticipate by their absence. The important part is non clustered indexes. One of the non-clustered index is created on Col1 & Col2 and another one is created on Col1, Col2, and Col3.

Table1: SampleTable1 
Index on SampleTable1: IX_ST_Col1_Col2
Index on SampleTable1: IX_ST_Col1_Col2_Col3

Table2: SampleTable2
Index on SampleTable2: IX_ST_Col1_Col2
Index on SampleTable2: IX_ST_Col1_Col2_Col3

-- Create Indexes on Sample Table1
CREATE CLUSTERED INDEX [CX_ST]
ON SampleTable1 (ID)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2]
ON SampleTable1 (Col1, Col2)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2_Col3]
ON SampleTable1 (Col1, Col2, Col3)
GO
-- Create Indexes on Sample Table2
CREATE CLUSTERED INDEX [CX_ST]
ON SampleTable2 (ID)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2]
ON SampleTable2 (Col1, Col2)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2_Col3]
ON SampleTable2 (Col1, Col2, Col3)
GO

Now let us populate both the tables. Both the tables have absolutely same data.

-- Populate tables
INSERT INTO SampleTable1 (ID, Col1, Col2, Col3)
SELECT RAND()*10000, RAND()*1000, RAND()*100, RAND()*10
GO 100000
INSERT INTO SampleTable2 (ID, Col1, Col2, Col3)
SELECT *
FROM SampleTable1
GO

Now is the most interesting part. For this first enable the execution plan in SSMS (shortcut key – CTRL + M).

We will be doing two tests. Let describe our first test.

Test 2: Select a larger set of the data

In this test we will run a same query on both the tables. Both the times we will apply our path of the index on the each table. As there are 2 tables and each have 2 indexes we will have a total of 4 indexes.

Table1: SampleTable1 
Index on SampleTable1: IX_ST_Col1_Col2
Index on SampleTable1: IX_ST_Col1_Col2_Col3

Table2: SampleTable2
Index on SampleTable2: IX_ST_Col1_Col2
Index on SampleTable2: IX_ST_Col1_Col2_Col3

Now let us run following script with keeping the Actual Execution Plan on (shortcut key CTRM: + M)

Let us first run two scripts for SampleTable1

-- Select from table
SELECT Col1, Col2
FROM SampleTable1 st WITH(INDEX([IX_ST_Col1_Col2]))
WHERE st.Col1 > 10 AND st.Col1 < 20
GO
-- Select from table
SELECT Col1, Col2
FROM SampleTable1 st WITH(INDEX(IX_ST_Col1_Col2_Col3))
WHERE st.Col1 > 10 AND st.Col1 < 20
GO

Let us check the execution plan:

You can notice from the execution plan that in the case of the SampleTable1 it does not matter if we use either of the index the performance of the both the query is same. Both the queries are using the same amount of the resources. In this case, Col3 is an integer and for SQL Server the width of the column does not make much difference. I can clearly say in this case Indexes are redundant as they are giving the same performance.

Now looking at both the indexes indexes IX_ST_Col1_Col2 is subset of IX_ST_Col1_Col2_Col3 and as mentioned in an earlier note if there is an additional column (col3) is in the SELECT statement that index will be more suitable. We can easily remove IX_ST_Col1_Col2 index in this particular special case (note this does not apply all the time).

Now let us run similar script for SampleTable2

-- Select from table
SELECT Col1, Col2
FROM SampleTable2 st WITH(INDEX([IX_ST_Col1_Col2]))
WHERE st.Col1 > 10 AND st.Col1 < 20
GO
-- Select from table
SELECT Col1, Col2
FROM SampleTable2 st WITH(INDEX([IX_ST_Col1_Col2_Col3]))
WHERE st.Col1 > 10 AND st.Col1 < 20
GO

Let us check the execution plan:

You can notice from the execution plan that in the case of the SampleTable2 it matters a lot about which index is used for the query as the performance differences between those queries is huge.  One of the query is using very little resources and another one is taking a huge amount of the resources. In this case, Col3 is a CHAR (800) datatype which is fixed length string datatype. In this case, for SQL Server the width of the column does make a big difference. We can clearly say that here in our SELECT statement IX_ST_Col1_Col2 is the most optimal indexes.

If that is the case, what is the use of IX_ST_Col1_Col2_Col3. Well, the answer of this question is also interesting. If you change the SELECT statement to also include Col3, it will become totally different scenario as it will require to do a key lookup for IX_ST_Col1_Col2. If your SELECT statement has Col1, Col2, Col3 – the optimal index here is IX_ST_Col1_Col2_Col3. The need of the both the indexes is different and they achieve a specific task. If you think IX_ST_Col1_Col2 is redundant as a more inclusive index IX_ST_Col1_Col2_Col3 exists, it will be not the optimal thinking. Even though, IX_ST_Col1_Col2_Col3 includes all the columns, when SQL Server only needs Col1 and Col2 it finds IX_ST_Col1_Col2 more suitable for performance.

I hope this is now clear to you that how to identify if the redundant index is useful or useless now.

Even this second test, proved our earlier findings correct that it does not matter if resultset had more rows or less, the Redundent Index can be helpful and before dropping them we should evaluate them properly.

One more thing to notice is that when we ran the query for SampleTable2, the resource utilization is even larger than what we have seen in the earlier tests done when we had smaller resultset. With the larger resultset in this example the redudent index which has less number of column have greater (optimal) performance.

Mike - Ahha – I guess in this case the summary should stay the same. Let me quickly summarize it.

Even though Indexes look redundant there may be some queries which may find them useful. This usually happens when the data type of the of any column is much wider than other columns. Before dropping the indexes one should properly validate the usage patterns of the indexes and query workloads. One should properly test everything before taking any actions. 

Jon – Good summary. It is always a good idea to test various combinations before making a decision. One should not drop an index because it was marked as a Redundant Index.

Mike – This is interesting! Jon. Do you have any more goodies related to this subject.

Jon – I do have many more goodies. However, let us do a quick clean up.

-- Clean Up
DROP TABLE SampleTable1
GO
DROP TABLE SampleTable2
GO

Mike – What about some more insight on this subject please?

Jon – Wait and watch – we will have something new next week!

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – An Interesting Case of Redundant Indexes – Index on Col1, Col2 and Index on Col1, Col2, Col3 – Part 2

Before you start reading this blog post, I strongly suggest you to read the part 1 of this series. It talks about What is Redundant Index. The story is a conversation between two individuals – Jon and Mike. They are different but have single goal learn and explore SQL Server. Their initial conversation sets the ground for this blog post. They earlier discussed what is a Redundant Index as well, discussed what are the special cases for the same. It is a general assumption (or common best practices) is to drop Redundant Indexes. Later Mike asks for special case where even though the index is clearly a Redundant Index, why it should not be removed. Jon promises to explain with a demo where a Redundant Index is useful and should not be dropped. Here is their conversation continued from earlier. On my production server I personally use embarcadero DB Optimizer for all performance tuning and routine health check up. I will be interested to know what is your feedback about the product.

Mike – Today is Monday. You promised me a demo today where a Redundant Index is useful Index.

Jon - Absolutely. We will create two tables and will create absolutely same table. We will notice that on the first table Redundant Index will be useless and needs to be dropped whereas on the second table Redundant Index will useful and should not be dropped for performance.

Let us start with a demo.

Let us create two tables. One with all the column INT and second with a wider column CHAR (800).

USE tempdb
GO
-- Create table
CREATE TABLE SampleTable1 (ID INT, Col1 INT, Col2 INT, Col3 INT)
GO
CREATE TABLE SampleTable2 (ID INT, Col1 INT, Col2 INT, Col3 CHAR(800))
GO

Now let us create indexes on both the tables. We will make sure that created indexes are same on both the table.

Clustered Indexes are just created for reference – the results of this test will not be affected by its presence or absence of it. I have just created to rule out few doubts I anticipate by their absence. The important part is non clustered indexes. One of the non-clustered index is created on Col1 & Col2 and another one is created on Col1, Col2, and Col3.

Table1: SampleTable1 
Index on SampleTable1: IX_ST_Col1_Col2
Index on SampleTable1: IX_ST_Col1_Col2_Col3

Table2: SampleTable2
Index on SampleTable2: IX_ST_Col1_Col2
Index on SampleTable2: IX_ST_Col1_Col2_Col3

-- Create Indexes on Sample Table1
CREATE CLUSTERED INDEX [CX_ST]
ON SampleTable1 (ID)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2]
ON SampleTable1 (Col1, Col2)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2_Col3]
ON SampleTable1 (Col1, Col2, Col3)
GO
-- Create Indexes on Sample Table2
CREATE CLUSTERED INDEX [CX_ST]
ON SampleTable2 (ID)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2]
ON SampleTable2 (Col1, Col2)
GO
CREATE NONCLUSTERED INDEX [IX_ST_Col1_Col2_Col3]
ON SampleTable2 (Col1, Col2, Col3)
GO

Now let us populate both the tables. Both the tables have absolutely same data.

-- Populate tables
INSERT INTO SampleTable1 (ID, Col1, Col2, Col3)
SELECT RAND()*10000, RAND()*1000, RAND()*100, RAND()*10
GO 100000
INSERT INTO SampleTable2 (ID, Col1, Col2, Col3)
SELECT *
FROM SampleTable1
GO

Now is the most interesting part. For this first enable the execution plan in SSMS (shortcut key – CTRL + M).

We will be doing two tests. Let describe our first test.

Test 1: Select a smaller set of the data

In this test we will run a same query on both the tables. Both the times we will apply our path of the index on the each table. As there are 2 tables and each have 2 indexes we will have a total of 4 indexes.

Table1: SampleTable1 
Index on SampleTable1: IX_ST_Col1_Col2
Index on SampleTable1: IX_ST_Col1_Col2_Col3

Table2: SampleTable2
Index on SampleTable2: IX_ST_Col1_Col2
Index on SampleTable2: IX_ST_Col1_Col2_Col3

Now let us run following script with keeping the Actual Execution Plan on (shortcut key CTRM: + M)

Let us first run two scripts for SampleTable1

-- Select from SampleTable1
SELECT Col1, Col2
FROM SampleTable1 st WITH(INDEX([IX_ST_Col1_Col2]))
WHERE st.Col1 = 10
GO
-- Select from SampleTable1
SELECT Col1, Col2
FROM SampleTable1 st WITH(INDEX(IX_ST_Col1_Col2_Col3))
WHERE st.Col1 = 10
GO

Let us check the execution plan:

You can notice from the execution plan that in the case of the SampleTable1 it does not matter if we use either of the index the performance of the both the query is same. Both the queries are using the same amount of the resources. In this case, Col3 is an integer and for SQL Server the width of the column does not make much difference. I can clearly say in this case Indexes are redundant as they are giving the same performance.

(Note: If you are going to ask me to change the SELECT statement to also include Col3, it will become totally different scenario as it will require to do a key lookup for IX_ST_Col1_Col2. If your SELECT statement has Col1, Col2, Col3 – the optimal index here is IX_ST_Col1_Col2_Col3, there is no further discussion in that case).

(A Quick Tip: When we compare execution plan – the higher cost compared to the batch explains higher usage of the resources and expensive query.)

Now looking at both the indexes indexes IX_ST_Col1_Col2 is subset of IX_ST_Col1_Col2_Col3 and as mentioned in an earlier note if there is an additional column (col3) is in the SELECT statement that index will be more suitable. We can easily remove IX_ST_Col1_Col2 index in this particular special case (note this does not apply all the time).

Now let us run similar script for SampleTable2

-- Select from SampleTable2
SELECT Col1, Col2
FROM SampleTable2 st WITH(INDEX([IX_ST_Col1_Col2]))
WHERE st.Col1 = 10
GO
-- Select from SampleTable2
SELECT Col1, Col2
FROM SampleTable2 st WITH(INDEX([IX_ST_Col1_Col2_Col3]))
WHERE st.Col1 = 10
GO

Let us check the execution plan:

You can notice from the execution plan that in the case of the SampleTable2 it matters a lot about which index is used for the query as the performance differences between those queries is huge.  One of the query is using very little resources and another one is taking a huge amount of the resources. In this case, Col3 is a CHAR (800) datatype which is fixed length string datatype. In this case, for SQL Server the width of the column does make a big difference. We can clearly say that here in our SELECT statement IX_ST_Col1_Col2 is the most optimal indexes.

If that is the case, what is the use of IX_ST_Col1_Col2_Col3. Well, the answer of this question is also interesting. If you change the SELECT statement to also include Col3, it will become totally different scenario as it will require to do a key lookup for IX_ST_Col1_Col2. If your SELECT statement has Col1, Col2, Col3 – the optimal index here is IX_ST_Col1_Col2_Col3. The need of the both the indexes is different and they achieve a specific task. If you think IX_ST_Col1_Col2 is redundant as a more inclusive index IX_ST_Col1_Col2_Col3 exists, it will be not the optimal thinking. Even though, IX_ST_Col1_Col2_Col3 includes all the columns, when SQL Server only needs Col1 and Col2 it finds IX_ST_Col1_Col2 more suitable for performance.

I hope this is now clear to you that how to identify if the redundant index is useful or useless now.

Mike - Thanks Jon, this is a great explanation. Let me quickly summarize it.

Even though Indexes look redundant there may be some queries which may find them useful. This usually happens when the data type of the of any column is much wider than other columns. Before dropping the indexes one should properly validate the usage patterns of the indexes and query workloads. One should properly test everything before taking any actions. 

Jon – Good summary. Test before you Act! However, you should notice that this is not the only case when redundant indexes are useful. There may be other cases too!

Mike – I understand. However, before you continue further, I see that you called this as a Test 1. Is there any test 2 with the larger dataset? Does it also validate my earlier summary.

Jon – Another good question – Let us see the Test 2 in Friday’s blog post. You can clean up your database by dropping your test tables.

-- Clean Up
DROP TABLE SampleTable1
GO
DROP TABLE SampleTable2
GO

Stay tuned for part 3 of this series.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – An Interesting Case of Redundant Indexes – Index on Col1, Col2 and Index on Col1, Col2, Col3 – Part 1

Index never stops amazing me, there are so much to learn about Index that I never feel that there is enough knowledge out about this subject. If you are interested you can watch my Indexing Course on Pluralsight for further learning on this subject. On my production server I personally use embarcadero DB Optimizer for all performance tuning and routine health check up. I will be interested to know what is your feedback about the product.

Instead of going on the theory overload – let us start with this blog post as a conversation between two individuals – Jon and Mike. These are just random names. Jon is senior and experienced SQL Server Expert and Mike is beginner with SQL Server.

Mike – What is Redundant Index?

Jon – Indexes are redundant when they have similar columns as a part of a definition. Additionally, the indexes are considered redundant when their first few columns are in the same position with same order by direction are also considered as a redundant.

Mike – Would you please explain it with examples?

Jon – Sure, Let us assume we have two indexes:

Index 1: Col1, Col2, Col3
Index 2: Col1, Col2, Col3

Now if you look at them – they have the same columns as a part of their definition, so they are indeed redundant indexes. However, look at the following scenario:

Index 3: Col1, Col2
Index 4: Col1, Col2, Col3

In this case they are also considered as a redundant because the position of the Col1, Col2 are same in both of the index. It is quite commonly considered that if the initial positions of the columns are the same, they are redundant.

However, there is one more concept here to be looked at as well before we make certain about their redundancy. Look at the following indexes:

Index 5: Col1 ASC, Col2 DESC
Index 6: Col1 DESC, Col2 DESC, Col3 ASC

In this case if the initial positions are the same, they are not redundant as the order of the column is not the same.

There are lot more to discuss but this is just to give you an initial idea. There is one more concept we should consider before calling any index redundant is Included Index. Here is the simple scenario for it.

Index 7: Col1 ASC Included (col2)
Index 8: Col1 ASC Included (col3)

You can notice they have same initial column but the Included columns are totally different.

Mike – Thanks, I got it. It seems that Redundant Indexes are not good and they should be dropped correct.

In case of Index 1 and Index 2 I think we should drop either of the one.

In case of Index 3 and Index 4 I believe Index 4 has more columns and covering, so we should keep it and drop the other one.

In case of Index 5 and Index 6, they are both different indexes so we should keep both.

In case of Index 7 and Index 8, again they are both different index in this case. They can be redundant if the included columns are overlapping to each other.

Am I correct to say this?

Jon – Very good analysis. You are very close to the understanding. Generally, redundant indexes are not good and they should be absolutely addressed. In most cases, redundant cases should be dropped. 

Mike – Ahha, so in the most cases indexes should be dropped. Ok, so is there any script or guidance to detect redundant indexes for the most cases.

Jon – Sure, here is the script which does that – however, this query just addresses the scenario of the Index 3 and Index 4. It does not talk about Included Columns or Index Order (ASC or DESC). Just use that for a start but do your analysis on this subject before you drop your indexes. You still have to check for order of the index and included columns as well.

Mike – Perfect, I understand that the script is just for a quick start and not the complete solution. Now you mentioned “In Most Cases” – what are the special cases. What are the cases when an Index which absolutely qualify for the redundant index but should not be dropped. Would you please explain the special cases?

Jon – Absolutely – there are always special cases. For example the width of the column matters.

Mike – Okey I would love to learn more about this – would you please explain.

Jon – Absolutely – I have a working example of it – Checkout Monday’s blog post. I will explain you in detail.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Identify Numbers of Non Clustered Index on Tables for Entire Database

Here is the script which will give you numbers of non clustered indexes on any table in entire database.

SELECT COUNT(i.TYPE) NoOfIndex,
[schema_name] = s.name, table_name = o.name
FROM sys.indexes i
INNER JOIN sys.objects o ON i.[object_id] = o.[object_id]
INNER JOIN sys.schemas s ON o.[schema_id] = s.[schema_id]
WHERE o.TYPE IN ('U')
AND
i.TYPE = 2
GROUP BY s.name, o.name
ORDER BY schema_name, table_name

Here is the small story behind why this script was needed.

I recently went to meet my friend in his office and he introduced me to his colleague in office as someone who is an expert in SQL Server Indexing. I politely said I am yet learning about Indexing and have a long way to go. My friend’s colleague right away said – he had a suggestion for me with related to Index. According to him he was looking for a script which will count all the non clustered on all the tables in the database and he was not able to find that on SQLAuthority.com.

I was a bit surprised as I really do not remember all the details about what I have written so far. I quickly pull up my phone and tried to look for the script on my custom search engine and he was correct. I never wrote a script which will count all the non clustered indexes on tables in the whole database. Excessive indexing is not recommended in general. If you have too many indexes it will definitely negatively affect your performance. The above query will quickly give you details of numbers of indexes on tables on your entire database. You can quickly glance and use the numbers as reference.

Please note that the number of the index is not a indication of bad indexes. There is a lot of wisdom I can write here but that is not the scope of this blog post. There are many different rules with Indexes and many different scenarios. For example – a table which is heap (no clustered index) is often not recommended on OLTP workload (here is the blog post to identify them), drop unused indexes with careful observation (here is the script for it), identify missing indexes and after careful testing add them (here is the script for it). Even though I have given few links here it is just the tip of the iceberg. If you follow only above four advices your ship may still sink. Those who wants to learn the subject in depth can watch the videos here after logging in.

Note: You change where condition type to 6 for nonclustered column store index.

Click to Download Scripts

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Guest Post by Sandip Pani – SQL Server Statistics Name and Index Creation

Sometimes something very small or a common error which we observe in daily life teaches us new things. SQL Server Expert Sandip Pani (winner of Joes 2 Pros Contests) has come across similar experience. Sandip has written a guest post on an error he faced in his daily work. Sandip is working for QSI Healthcare as an Associate Technical Specialist and have more than 5 years of total experience. He blogs at SQLcommitted.com and contribute in various forums. His social media hands are LinkedIn, Facebook and Twitter.


Once I faced following error when I was working on performance tuning project and attempt to create an Index.

Mug 1913, Level 16, State 1, Line 1
The operation failed because an index or statistics with name ‘Ix_Table1_1′ already exists on table ‘Table1′.

The immediate reaction to the error was that I might have created that index earlier and when I researched it further I found the same as the index was indeed created two times. This totally makes sense. This can happen due to many reasons for example if the user is careless and executes the same code two times as well, when he attempts to create index without checking if there was index already on the object. However when I paid attention to the details of the error, I realize that error message also talks about statistics along with the index. I got curious if the same would happen if I attempt to create indexes with the same name as statistics already created. There are a few other questions also prompted in my mind. I decided to do a small demonstration of the subject and build following demonstration script.

The goal of my experiment is to find out the relation between statistics and the index. Statistics is one of the important input parameter for the optimizer during query optimization process. If the query is nontrivial then only optimizer uses statistics to perform a cost based optimization to select a plan. For accuracy and further learning I suggest to read MSDN.

Now let’s find out the relationship between index and statistics. We will do the experiment in two parts. i) Creating Index ii) Creating Statistics

We will be using the following T-SQL script for our example.

IF (OBJECT_ID('Table1') IS NOT NULL)
DROP TABLE Table1
GO
CREATE TABLE Table1
(Col1 INT NOT NULL,
Col2 VARCHAR(20) NOT NULL)
GO

We will be using following two queries to check if there are any index or statistics on our sample table Table1.

-- Details of Index
SELECT OBJECT_NAME(OBJECT_ID) AS TableName, Name AS IndexName, type_desc
FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'table1'
GO
-- Details of Statistics
SELECT OBJECT_NAME(OBJECT_ID) TableName, Name AS StatisticsName
FROM sys.stats
WHERE OBJECT_NAME(OBJECT_ID) = 'table1'
GO

When I ran above two scripts on the table right after it was created it did not give us any result which was expected.

Now let us begin our test.

1) Create an index on the table

Create following index on the table.

CREATE NONCLUSTERED INDEX Ix_Table1_1 ON Table1(Col1)
GO

Now let us use above two scripts and see their results.

We can see that when we created index at the same time it created statistics also with the same name.

Before continuing to next set of demo – drop the table using following script and re-create the table using a script provided at the beginning of the table.

DROP TABLE table1
GO

2) Create a statistic on the table

Create following statistics on the table.

CREATE STATISTICS Ix_table1_1 ON Table1 (Col1)
GO

Now let us use above two scripts and see their results.

We can see that when we created statistics Index is not created. The behavior of this experiment is different from the earlier experiment.

Clean up the table setup using the following script:

DROP TABLE table1
GO

Above two experiments teach us very valuable lesson that when we create indexes, SQL Server generates the index and statistics (with the same name as the index name) together. Now due to the reason if we have already had statistics with the same name but not the index, it is quite possible that we will face the error to create the index even though there is no index with the same name.

A Quick Check

To validate that if we create statistics first and then index after that with the same name, it will throw an error let us run following script in SSMS. Make sure to drop the table and clean up our sample table at the end of the experiment.

-- Create sample table
CREATE TABLE TestTable
(Col1 INT NOT NULL,
Col2 VARCHAR(20) NOT NULL)
GO
-- Create Statistics
CREATE STATISTICS IX_TestTable_1 ON TestTable (Col1)
GO
-- Create Index
CREATE NONCLUSTERED INDEX IX_TestTable_1 ON TestTable(Col1)
GO
-- Check error
/*Msg 1913, Level 16, State 1, Line 2
The operation failed because an index or statistics with name 'IX_TestTable_1' already exists on table 'TestTable'.
*/
-- Clean up
DROP TABLE TestTable
GO

While creating index it will throw the following error as statistics with the same name is already created.

In simple words – when we create index the name of the index should be different from any of the existing indexes and statistics.

Click to Download Scripts

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Observation of Top with Index and Order of Resultset

SQL Server has lots of things to learn and share. It is amazing to see how people evaluate and understand different techniques and styles differently when implementing. There are three different instances where I have come across a situation where I felt that proper understanding is important and something which looks evil may not be evil as suggested. The real reason may be absolutely different but we may blame something totally different for the incorrect results.

Scenario 1: Database Tuning Advisor and Incorrect Results

One of my friends called me he was stressed out. He just ran a few of Database Tuning Advisor on his system and followed up a few of the suggestions by DTA and suddenly his applications and reporting server started to show incorrect results. He got stressed out and reverted every single change he had done so far. Though he was getting better performance his results were not as per expectation. Finally after quick research we figured out that it was not DTA’s advice but something else which was creating this scenario.

Scenario 2: No Index for Correct Results

One of the place where I went for consultation for performance tuning. I realize that there was absolutely no indexes created. When I discussed this with them, they told me creating indexes is giving incorrect results to them. This was very funny to me. I have heard everything before but this was kind of first time for me. However, with carefully investigating I got the idea what they were suggesting and later on solution was reached.

Scenario 3: Top and Changing Results

This one is my own experience when I was beginning my career. Just like every beginner I was not sure about every single command of SQL Server as well I had no idea how the engine behaves internally. I was under the misconception that TOP changes how the results are displayed. I was very much scared of to use TOP in my queries but after a while I understood that it was not TOP which was misbehaving but my knowledge is incomplete. I eventually figured out how to use TOP clause and get expected results.

The Real Issue

The real issue in the above three cases is not using ORDER BY clause for the query. While writing query I have faced situations where one result can be achieved by multiple ways. It is our responsibility that we select the result which is deterministic and consistent. We should opt for the way which maintains database integrity as well, gives optimal performance.

In all the above three scenario the issue was developer was not writing script which was consistent with his understanding. The SQL Server Engine was behaving how it should behave but lack of understanding was building different expectation from the query.

Reproduction of Scenarios

Following T-SQL script is doing the same. First let us create a table with several records.

USE TempDB
GO
CREATE TABLE TestTable (ID INT, Name VARCHAR(10), Title VARCHAR(20))
GO
INSERT INTO TestTable (ID, Name, Title)
SELECT 1, 'Mark', 'Manager'
UNION ALL
SELECT 2, 'John', 'Sales Manager'
UNION ALL
SELECT 3, 'Matt', 'CEO'
UNION ALL
SELECT 4, 'Roger', 'COO'
UNION ALL
SELECT 5, 'Mike', 'Finance Manager'
UNION ALL
SELECT 6, 'Jeff', 'Manager'
UNION ALL
SELECT 7, 'Mason', 'HR Executive'
GO

Let us select records and observe the resultset.

SELECT TOP 3 Name
FROM TestTable
GO

Now let us create index over one of the columns and retrieve records. You will find the result is inconsistent than before or different than what we have received. (Note that I did not use word INCORRECT as it is not incorrect result, but different for sure)

CREATE NONCLUSTERED INDEX [IX_Name]
ON TestTable
(Name ASC)
GO
SELECT TOP 3 Name
FROM TestTable
GO

Now let us use ORDER BY clause and you will see that result is as per expected.

SELECT TOP 3 Name
FROM TestTable
ORDER BY ID
GO

Let us Clean up.

DROP TABLE TestTable
GO

In our sample reproduction I have selected one single column and single column index. In above scenario we had much difficult scenarios and complex cases. The major point of this blog post is to make sure that developers understand the importance of the ORDER BY clause.

Summary

When we are beginning our career it is quite easy get confused and not understanding the impact of code which we are writing. Quite often we build prejudice towards part of technology which should not be the case as we progress in our career. In this example we are trying to drive home the point that when you use TOP and if you are expecting your results in certain order all the time uses ORDER BY along with it.

Reference: Pinal Dave (http://blog.sqlauthority.com)