SQL SERVER – Mistake to Avoid: Duplicate and Overlapping Indexes – Notes from the Field #073

[Note from Pinal]: In this episode of the Notes from the Field series database expert Kathi Kellenberger explains about indexes and its impact. We often believe that indexes will improve the performance of the query, but it is not true always. There are cases when indexes can reduce the performance as well. Read the experience of  Kathi in her own words.

Having the correct indexes in place can improve the performance of your queries immensely. Knowing just what the correct indexes are, however, is not an easy task. SQL Server provides tools to help you create indexes based on a specific query or the workload that has run since the last restart of the instance. While these are handy tools, you must take the guidance they give with the time-honored grain of salt.

Having done my share of index tuning over the past few years, I have seen the consequences of blindly creating indexes based on the suggestions of the Database Engine Tuning Advisor, the missing index DMV, or the missing index recommendation found in the execution plan. It’s easy to copy a slowly running query into the Query window of SSMS and view the execution plan only to find that an index can be created to drastically improve the query performance. The problem is that this recommendation does not consider indexes that are already in place. By creating every index that these tools suggest, you will end up with too many indexes and many that overlap. At first you may not think this is a problem: more is better, right?

Having too many nonclustered indexes can cause numerous problems. First, unneeded indexes take up space. This impacts storage costs, backup and recovery times, and index maintenance times. Indexes must be kept up to date whenever data changes. The performance of inserts, updates, and deletes is impacted by nonclustered indexes. Have you ever heard of a SELECT query that runs more slowly because there are too many indexes on a table? I have seen it happen. When the optimizer comes up with a plan for a query, it must consider the available indexes, three types of joining, order of joins, etc. The number of plan choices increases exponentially. The optimizer won’t take long to come up with a plan, however, and will sometimes stop with a “good enough plan”. It’s possible that the optimizer didn’t have enough time to figure out the best index because there were too many to consider.

To avoid creating unnecessary indexes, always take a look at the existing indexes on the table when you think you should add a new one. Instead of creating a brand new index, maybe you can just add a key or included columns to an existing index. Following this practice will help keep the number of indexes from spiraling out of control. Another thing to watch out for is the cluster key. The cluster key is included in every nonclustered index. It’s there to locate rows in the clustered index, but you won’t see it in the nonclustered index definition. The index tools will often tell you to add the cluster key as included column, but that is not necessary.

By following all of the index recommendations from the tools without considering other indexes, you will end up with tables that resemble the following:

CREATE TABLE TestIndexes(Col1 INT, col2 VARCHAR(10), Col3 DATE, Col4 BIT);
CREATE INDEX ix_TestIndexes_Col1 ON dbo.TestIndexes (col1);
CREATE INDEX ix_TestIndexes_Col1_Col2 ON dbo.TestIndexes (col1, col2);
CREATE INDEX ix_TestIndexes_Col1_Col2_Col3 ON dbo.TestIndexes (col1, col2, Col3);
CREATE INDEX ix_TestIndexes_Col2 ON dbo.TestIndexes (col2);
CREATE INDEX ix_TestIndexes_Col1_includes1 ON dbo.TestIndexes (col1) INCLUDE(Col4);
CREATE INDEX ix_TestIndexes_Col1_includes2 ON dbo.TestIndexes (col1) INCLUDE(Col2);

You may think that this is a contrived example, but I see this pattern all the time. How do you find the overlapping indexes that need to be cleaned up? There are many scripts available, but here is a simple script that just looks at the first two index keys for duplicates:

WITH IndexColumns AS (
SELECT '[' + s.Name + '].[' + T.Name + ']' AS TableName,
i.name AS IndexName,  C.name AS ColumnName, i.index_id,ic.index_column_id,
FROM sys.schemas AS s
JOIN sys.tables AS t ON t.schema_id = s.schema_id
JOIN sys.index_columns AS IC ON  IC.OBJECT_ID = I.OBJECT_ID
AND IC.index_id = I.index_id
AND C.column_id = IC.column_id
WHERE IC.is_included_column = 0
SELECT DISTINCT a.TableName, a.IndexName AS Index1, b.IndexName AS Index2
FROM IndexColumns AS a
JOIN IndexColumns AS b ON b.TableName = a.TableName
AND b.IndexName <> a.IndexName
AND b.index_column_id = a.index_column_id
AND  b.ColumnName = a.ColumnName
AND a.index_column_id < 3
AND a.index_id < b.index_id
AND a.ColCount <= B.ColCount
ORDER BY a.TableName, a.IndexName;

Once you find indexes that are subsets or even exact duplicates of other indexes, you should manually review each match and figure out what can be deleted or consolidated. Just be sure to test your changes before implementing them in production. After reviewing the information, I came up with the following changes:

DROP INDEX ix_TestIndexes_Col1 ON TestIndexes;
DROP INDEX ix_TestIndexes_Col1_Col2 ON TestIndexes;
DROP INDEX ix_TestIndexes_Col1_Col2_Col3 ON TestIndexes;
DROP INDEX ix_TestIndexes_Col1_includes1 ON TestIndexes;
DROP INDEX ix_TestIndexes_Col1_includes2 ON TestIndexes;
CREATE INDEX ix_TestIndexes_Col1_Col2_Col3_includes ON TestIndexes
(Col1, Col2, Col3) INCLUDE (Col4);

Now there are two indexes, none of them overlap, and I haven’t lost anything that was in place before.

I always say that index tuning is both a science and an art. Now when you use the science of missing index information you will know how to apply the art of avoiding duplicates.

If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)


In earlier blogs on “Fundamentals of Columnstore Index”, I took a simple approach on taking a tour of how to create a simple Columnstore index. Having said that, SQL Server 2014 supports columnstore and columnstore archival compression for columnstore tables and indexes. Columnstore tables and indexes are always stored with columnstore compression. Most data warehouses have some data that is frequently accessed and some that is accessed more infrequently.

You might be wondering the use case scenario for these commands. For example, in reality the data may be partitioned by date and the most recent data is accessed much more frequently than older data. In such cases the older data can benefit from additional compression at the cost of slower query performance. I talk to a lot of customers about this special feature:

  • To enable this scenario SQL Server added support for archival compression of SQL Server column stores from SQL Server 2014.
  • The archival compression option is enabled on a per-table or partition (object) basis; a column store can contain objects with archival compression applied and objects without. To allow for easy extensibility of existing on-disk structures, archival compression is implemented as an extra stream compression layer on top of Columnstore index implementation.
  • The Columnstore compression implementation transparently compresses the bytes being written to disk during the column store serialization process and transparently decompresses them during the deserialization process. Stream decompression is always applied when data is read from disk. Data is not cached in memory with stream compression.

The further reduction obtained by archival compression is substantial, ranging anywhere from 25% to 60%+ depending on the data. These are rough numbers based on what I have seen at customer interactions in the past.

  • Use COLUMNSTORE_ARCHIVE data compression to compress columnstore data with archival compression.

Let us look at how this can be achieved using TSQL commands.

To Apply Compression:

The simple TSQL commands to play with ColumnStore and columnStore Archive bits in single ALTER script:

-- Adding Archive bit for single Partition
ALTER TABLE tbl_myColumnStore
-- Adding Archive bit for all Partition
ALTER TABLE tbl_myColumnStore
-- Adding Columnstore for all parition and Archive for few Partition
ALTER TABLE tbl_myColumnStore

Use COLUMNSTORE data compression to decompress archival compression. This resulting data will continue to be compressed with columnstore compression.

To Remove the Archive Compression

Similar to above commands, we can also use to remove our archival compression on specific partitions or on objects. A representative command looks like:

-- Turning Archive bit off to Columnstore for all Partitions
ALTER TABLE tbl_myColumnStore

As I wrap up this blog, I would like to know how many of you out there are using the archival columnstore in your environments. How much compression are you getting and what is the space savings because of this? I would love to hear from your experience as we learn together.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What is Hypothetical Indexes?

If you ever thought this is some sort of trick to bring you to the blog, well you are wrong. This infact is something worth a look and interesting thing to know. Before I start to explain you the finer details, let me tell you that this is NOT a new feature for performance improvement of the SQL Server Engine.

During profiling one of the databases, one of my DBA friends asked a question about why there is a difference in the indexes shown in SQL Server Management Studio vs. T-SQL query. Though it didn’t make sense to me, I asked for details and wanted him to get me the details. This leads to the learning which I thought was worth a share. Here is what I mean:

Trust me, there is absolutely no Photoshop trick in the above image. Sys.Indexes catalog view shows 7 entries for a table but object explorer shown only one index on the table which is the ONLY object inside the database.

Before we talk about the reason, you can play around with me by creating sample table using the below script.

IF DB_ID('HypotheticalIndex') IS NOT NULL
USE HypotheticalIndex
[SSN]         [INT] IDENTITY(1, 1) NOT NULL,
[StudentName] [CHAR](200) NULL,
[GradDate]    [DATETIME] NULL
VALUES      (RAND() * 1000,
DATEADD(dd, RAND() * 100, GETDATE()))
GO 60000 -- should take around 50 seconds

Above script would create a database and populate rows. Let’s create an index on the table.

CREATE INDEX Idx_Alumni_GradDate
ON Alumni(GradDate)

To create hypothetical index on the same column, we can run the below command:

CREATE INDEX Alumni_hyp_1
ON Alumni(GradDate)

Here the keyword is undocumented extension WITH STATISTICS_ONLY which is available with CREATE INDEX command.
Let us next look at sys.indexes output:

SELECT name,
FROM   sys.indexes
type_desc <> 'HEAP'

Notice the highlighted column values for hypothetical index. Data_space_id is zero because there is no physical storage for this index. It is not stored in any filegroup or file.

Let’s look at the statistics.


We can see that rows are sampled and statistics object is generated for hypothetical index.

This means that the hypothetical index is an index that has the metadata (in sys.indexes with is_hypothetical = 1) and a statistics associated to it (in sys.stats), but does not have physical storage. This is used only for costing evaluation of query plan in conjunction with “DBCC AUTOPILOT” and “SET AUTOPILOT ON” commands. These settings are for a future discussion and blog, let us move along.

Since SQL Server Management Studio is filtering out the hypothetical indexes in Object Explorer – this is the reason my friend saw a difference in T-SQL and SSMS.

STATISTICS_ONLY option is undocumented but I was playing around with the option and found something interesting. Based on the value passed to this parameter the statistics, sampling would be changed.

CREATE INDEX Alumni_hyp_0
ON Alumni(GradDate)

If we pass a value as zero the statistics isnot generated.

If I pass the highest value (2^31-1) = 2147483647 then sampling is done with Full Scan.

CREATE INDEX [Alumni_hyp_2147483647]
ON Alumni(GradDate)
DBCC SHOW_STATISTICS (Alumni, 'Alumni_hyp_2147483647') WITH STAT_HEADER

To find a hypothetical index in the database, we can run below query.

FROM    sys.indexes
WHERE   is_hypothetical = 1

Generally the index name would be _dta_index because the DTA uses them behind the scene to evaluate the cost of an index by creating them hypothetically. If DTA exits gracefully than it does clean up these indexes. But if someone is losing patience while DTA is running and kills it using task manager, then those indexes would be left behind. It is safe to drop hypothetical indexes. They can be dropped using the normal DROP INDEX command. I must point out that if you apply recommendation provided by the DTA while tuning a query and don’t change the suggested name, they would have _dta_ in their name. The name doesn’t mean they are hypothetical – we need to use the is_hypothetical column to filter them.

How did I learn this? Someone asked how the Database Engine Tuning Advisor (a.k.a. DTA) works. How does it create and evaluate indexes on huge tables? So I captured profiler while running DTA and found many interesting facts and under the cover working of the tool. This blog is an idea after seeing profiler. Learning never stops if you are working with SQL Server!

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – GIDS – Two Technical Sessions Today – Resources and Whereabouts – Part 2

Today I will be presenting two sessions at Great Indian Developer Summit. Here is the details about where I will be during today.

If you are not going to attend this event, sign up for the newsletter over here: http://bit.ly/sqllearn. I will be sharing my code, demo, slides and all the relevant information in the newsletter.

MySQL Performance Tuning – Unexplored Territory
April 25, 2014 – Time:10:45-11:30

Performance is one of the most essential aspects of any application. Everyone wants their server to perform optimally and at the best efficiency. However, not many people talk about MySQL and Performance Tuning as it is an extremely unexplored territory. In this session, we will talk about how we can tune MySQL Performance. We will also try and cover other performance related tips and tricks. At the end of this session, attendees will not only have a clear idea, but also carry home action items regarding what to do when facing any of the above resource intensive scenarios. Developers will walk out with scripts and knowledge that can be applied to their servers, immediately post the session. To master the art of performance tuning one has to understand the fundamentals of performance, tuning and the best practices associated with the same. You will also witness some impressive performance tuning demos in this session.

Hidden Secrets and Gems of SQL Server We Bet You Never Knew
April 25, 2014 – Time:15:00-17:45

It really amazes us every time when someone says SQL Server is an easy tool to handle and work with. Microsoft has done an amazing work in making working with complex relational database a breeze for developers and administrators alike. Though it looks like child’s play for some, the realities are far away from this notion. The basics and fundamentals though are simple and uniform across databases, the behavior and understanding the nuts and bolts of SQL Server is something we need to master over a period of time.

With a collective experience of more than 30+ years amongst the speakers on databases, we will try to take a unique tour of various aspects of SQL Server and bring to you life lessons learnt from working with SQL Server. We will share some of the trade secrets of performance, configuration, new features, tuning, behaviors, T-SQL practices, common pitfalls, productivity tips on tools and more.

This is a highly demo filled session for practical use if you are a SQL Server developer or an Administrator. The speakers will be able to stump you and give you answers on almost everything inside the Relational database called SQL Server.

Introduction to Microsoft Power BI ~ Vinod Kumar
April 25, 2014 – Time:14:05-14:50

Lazy Commit Like NoSQL with SQL Server ~ Balmukund Lakhani
April 25, 2014 – Time:14:05-14:50

If you are looking for downloading a script for finding missing Indexes and Redundant indexes. You can find them here.

During my session I will be asked three random questions, if you get that right, you will get the copy of my book.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – GIDS – Two Technical Sessions Today – Resources and Whereabouts

Today I will be presenting two sessions at Great Indian Developer Summit. Here is the details about where I will be during today.

If you are not going to attend this event, sign up for the newsletter over here: http://bit.ly/sqllearn. I will be sharing my code, demo, slides and all the relevant information in the newsletter.

Indexes, the Unsung Hero
April 22, 2014 – Time:13:35-14:35

Slow Running Queries are the most common problem that developers face while working with SQL Server. While it is easy to blame SQL Server for unsatisfactory performance, the issue often persists with the way queries have been written, and how Indexes has been set up. The session will focus on the ways of identifying problems that slow down SQL Server, and Indexing tricks to fix them. Developers will walk out with scripts and knowledge that can be applied to their servers, immediately post the session. Indexes are the most crucial objects of the database. They are the first stop for any DBA and Developer when it is about performance tuning. There is a good side as well evil side to indexes. To master the art of performance tuning one has to understand the fundamentals of indexes and the best practices associated with the same. We will cover various aspects of Indexing such as Duplicate Index, Redundant Index, Missing Index as well as best practices around Indexes.

SQL Server Performance Troubleshooting: Ancient Problems and Modern Solutions
April 22, 2014 – Time:15:55-16:55

Many believe Performance Tuning and Troubleshooting is an art which has been lost in time. However, truth is that art has evolved with time and there are more tools and techniques to overcome ancient troublesome scenarios. There are three major resources that when bottlenecked creates performance problems: CPU, IO, and Memory. In this session we will focus on High CPU scenarios detection and their resolutions. If time permits we will cover other performance related tips and tricks. At the end of this session, attendees will have a clear idea as well as action items regarding what to do when facing any of the above resource intensive scenarios. Developers will walk out with scripts and knowledge that can be applied to their servers, immediately post the session. To master the art of performance tuning one has to understand the fundamentals of performance, tuning and the best practices associated with the same. We will discuss about performance tuning in this session with the help of Demos.

Architecting SQL Server HA and DR Solutions on Windows Azure ~ Vinod Kumar
April 22, 2014 – Time:11:45-12:45

Architecting SQL Server HA and DR Solutions on Windows Azure  ~ Vinod Kumar
April 22, 2014 – Time:11:45-12:45

Understanding Windows Better Using SysInternals ~ Vinod Kumar
April 22, 2014 – Time:14:45-15:45

If you are looking for downloading a script for finding missing Indexes and Redundant indexes. You can find them here.

During my session I will be asked three random questions, if you get that right, you will get the copy of my book.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Index Fragmentation Next Steps – Notes from the Field #020

[Notes from Pinal]: Indexes are considered as a silver bullet to performance tuning and do amazing job when they are properly created. Just like any other engine they need maintenance and tuning. As we get more data indexes start to get fragmented and performance starts to degrade. I asked direct questions to my friend Tim about fragmentation and he gave me a very interesting answer.

Linchpin People are database coaches and wellness experts for a data driven world. In this 20th episode of the Notes from the Fields series database expert Tim Radney (partner at Linchpin People) shares very interesting conversation with me. I asked:

Pinal: “Tim, what would you do if your indexes gets fragmented?”

When analyzing systems I often find that little maintenance is being performed. Although proper backups are my number one priority when checking out a system, index maintenance is also very important.  I have a script that I run that check for index fragmentation and very often I find extremely high fragmentation levels for most of the tables in the database.

Finding high fragmentation is very common and the fix is quite easy. I simply have to apply a maintenance script to clean up the fragmentation.  The path I take can vary depending on the size of the database and the edition of SQL being ran.  Rebuilding indexes on standard edition will have a vastly different impact, then an online index rebuild on enterprise edition.  I usually take a surgical approach on a large database with very high levels of fragmentation until I get the fragmentation under control. Once the indexes are at a more manageable level of fragmentation I allow the script to keep things in check.

There are numerous products on the market that can help and there are free products as well.  Microsoft even includes an option within Database Maintenance plans; however, I personally use Ola Hallengren’s http://ola.hallengren.com/scripts for Index maintenance and for updating statistics.  Ola also includes a comprehensive backup solution, but unless the client needs a complete overhaul I typically don’t make changes there.

The great thing about this solution is the ability to rebuild when needed and reorganize when needed. You get to choose which option based on the level of fragmentation.   This solution gives you a great deal of flexibility with how it is run.  When running index maintenance on a system that is highly fragmented for the first time, even with the online options, be careful to monitor your transaction log size.  You will also need to have extra space in your data file for index rebuilds as well.

Regardless of which solution you choose for your index maintenance, the important thing is that you have a proper solution in place to deal with fragmentation.  What causes fragmentation in the first place, good for starters inserts, updates, and deletes.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Redundant Indexes Good or Bad – Final Wrap – Part 7

This is the seventh part of a series regarding Redundant Indexes. If you have not read the earlier parts – there is quite a good chance that you will miss the context of this part. I  suggest  that you should quickly read the earlier four parts. On my production server I personally use Embarcadero DB Optimizer for all performance tuning and routine health check ups. I will be interested to know your feedback about the product.

Part 1: What are Redundant Indexes?
Conversation between Mike and Jon – where they discuss about the fundamentals of Redundant Indexes.

Part 2: Demo – What kind of Redundant Indexes are Useful with a smaller result set
Here Jon demonstrates a script where even though Redundant Indexes exist they are extremely useful and should not be dropped. The result set contains a very small result set.

Part 3: Demo – What kind of Redundant Indexes are Useful with a larger result set
Here Jon demonstrates a script where even though Redundant Indexes exist they are extremely useful and should not be dropped. The result set contains a very large result set.

Part 4: An Interesting Case of Redundant Indexes – Index on Col1, Col2 and Index on Col1, Col2, Col3
In this part Jon demonstrates a script where the direction or order clause of the index is different, making the index behave very differently. To quality any index redundant, the order of the index should also be considered.

Part 5:  An Interesting Case of Redundant Indexes – Index on Col1 and Included Columns Col2 and Col3
In this part Jon demonstrates how included columns also take part in the decision making  if the column is redundant and duplicate or not.

Part 6: An Interesting Case of Redundant Indexes – Index on Col1, Col2 and Index on Col1, Col2, Col3
In this part Jon and Mike discuss how Included Columns play  a role with Redundant Indexes. It is important to know if the index is redundant or not.

Once again if you have not read the earlier blog posts, there is a  good chance you will miss the core point of this blog post. I suggest you at least read the conversation between Mike and Jon to get familiar with their personalities.

Jon and Mike continue their conversation from an earlier blog post.

Mike – This is great, I learned a lot about duplicate indexes in the last few series of  questions and answers. I even tried out all the scripts which we have discussed so far. Here is one final question to you Jon. Are you ready?

Jon – Mike, I am always ready. What is your question?

Mike – Jon, this time question is very simple – Are redundant indexes are good or bad? Please answer this question with “Good” or “Bad.” I do not want your “It Depends” answers over here.

Jon – Well, actually It De…., hm… You want an answer in one word. So here is the answer in one word: Bad. Now I gave you my answer, can I elaborate a bit more?

Mike - Sure you can.

Jon – Well, if you are beginner in the field of performance tuning and have no idea what to do when you encounter redundant indexes or duplicate indexes, just go ahead and drop the duplicate indexes, as in  most of the cases, duplicate indexes are bad. They have several disadvantages but I would like to highlight two of the biggest disadvantages  here -

1) All the duplicate indexes are updated when data in one of the index changes and this wastes CPU, memory and IO resources.

2) If you have redundant indexes, when any query has to use the indexes they will have to use one of those indexes. It is not possible to use all the duplicate indexes. When SQL Server engine is going to use only one indexes at the end, why have them from the beginning?

Additionally, do not forget that duplicate indexes also confuse users.

Mike – Thanks Jon, here is the follow up question: Is there any scenario where an exact duplicate index with the exact order of the columns and exact order of included indexes can be helpful? What I mean is that if there are two indexes which are exactly same to each other – are they ever useful?

Jon – No.

Mike – I got it. Well, so now we know about Duplicate Indexes or Redundant indexes, besides this what are the other challenges related to Indexes?

Jon – The real challenge with Indexes is to know what kind of index is to be created which will be the most beneficial to your workload. It is easy to create an index for one query or two queries but the key is to create one or a few indexes which can generically improve the performance of most of the queries running in your server.

Mike – Oh, I see. You are correct, we need to come up with indexes which are most helpful to your workload and not just fewer queries. If we create one index per query, we will end up with hundreds of  indexes, which is a pretty bad idea. Jon, I think it is a pretty big challenge to come up with indexes for workload. I believe we need to automate this process. Do you know how we accomplish this easily?

Jon – I knew you would ask that. We have discussed this earlier. The database optimization tool Embarcadero DB Optimizer is your best friend.  It helps  users find the precise details related to index usage. It also indicates if Indexes are useful or useless. This tool is very easy to use and helps in various aspects of  performance tuning. This tool can help you with writing proper queries, do index maintenance and optimize the overall health of the server.

Mike – Thanks Jon!

Reference: Pinal Dave (http://blog.sqlauthority.com)