SQL SERVER – 2016 – Creating Additional Indexes with Clustered ColumnStore Indexes

Every version of SQL Server brings new capabilities that enhance the features introduced in previous versions. I am sure the team works from the feedback customers give as they start using a new capability. As I was scanning through some of the enhancements that made it into this release, I thought some good work had gone into the usage of ColumnStore Indexes.

To dissect this enhancement: one of the core limitations of ColumnStore Indexes before SQL Server 2016 was that we couldn't create any additional indexes once a Clustered ColumnStore Index was present. This meant we needed to drop any existing indexes before creating a Clustered ColumnStore Index. With SQL Server 2016, I saw that we can create additional B-Tree indexes on top of Clustered ColumnStore Indexes.

As soon as I saw this, I thought of creating a script to test the claim on my SQL Server 2016 instance. Let us see how the script flows below:

USE AdventureWorks2016
GO
-- Create New Table
CREATE TABLE [dbo].[MySalesOrderDetail](
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL,
[LineTotal] [numeric](38, 6) NOT NULL,
[rowguid] [uniqueidentifier] NOT NULL,
[ModifiedDate] [datetime] NOT NULL
)
ON [PRIMARY]
GO

Now that we have our base table, let us create the Clustered ColumnStore Index next and add some rows into the table.

-- Create Clustered ColumnStore Index
CREATE CLUSTERED COLUMNSTORE INDEX [CL_MySalesOrderDetail] ON [dbo].[MySalesOrderDetail]
GO
-- Insert Sample Data into the Table
-- WARNING: This query may run up to 2 minutes based on your system's resources
INSERT INTO [dbo].[MySalesOrderDetail]
SELECT S1.*
FROM Sales.SalesOrderDetail S1
GO

Now the moment of truth. Let us create an additional non-clustered Index on top of this table.

CREATE NONCLUSTERED INDEX [IX_MySalesOrderDetail_NormalBTree]
ON [MySalesOrderDetail]
(UnitPrice, OrderQty, ProductID)
GO

As you can see, this creates the index without any error and it is interesting to see this enhancement come through. I got curious and wanted to create an additional Non-Clustered ColumnStore Index as a test.

-- Error
CREATE NONCLUSTERED COLUMNSTORE INDEX [IX2_MySalesOrderDetail]
ON [MySalesOrderDetail]
(CarrierTrackingNumber, UnitPriceDiscount)
GO

As expected, this resulted in an error and it was pretty clear too. The error said:

Msg 35339, Level 16, State 1, Line 3
Multiple columnstore indexes are not supported.
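
For completeness, here is a small verification sketch (not part of the original script) to confirm which indexes ended up on the table after the steps above:

-- Verify the Clustered ColumnStore and the B-Tree index coexist on the table
SELECT name, type_desc
FROM sys.indexes
WHERE OBJECT_ID = OBJECT_ID('dbo.MySalesOrderDetail')
GO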

With SQL Server 2016, we get an interesting enhancement: we can mix and match a Clustered ColumnStore Index with B-Tree indexes based on the workload the application throws at us. As I wrap up, let us clean up the table structure we just created.
-- Clean up
DROP TABLE IF EXISTS [dbo].[MySalesOrderDetail]
GO

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Basics ColumnStore FAQ – Part II

I wrote the first part of this FAQ here. This is not a typical blog post but more of a process-oriented topic for working with ColumnStore Indexes. With the advancements made in SQL Server 2016, I know it is going to be even more critical to understand some of the nuances of working with ColumnStore indexes inside SQL Server. So what will be the target of this post? Here are a few questions I asked, and the answers I got back form the flow for this post.

ColumnStore index resulted in a performance regression

When I first heard this from one of the testers, I was baffled that this was the case. How can this happen? I went the route of asking some fundamental questions to get to the root cause. Here they are for quick reference, along with the reasoning:

  • Did the optimizer use the ColumnStore index in the first place?

A user can determine whether a ColumnStore index was used in the query plan by looking at the Storage property in SHOWPLAN. Read about batch mode and row mode here.

  • Did the query start to execute using the columnstore index with batch processing and then fall back to row-based processing?

A user can determine that the query started to execute in batch mode and then fell back to row mode by looking at EstimatedExecutionMode and ActualExecutionMode in Showplan. In most cases, even in row mode the columnstore index will provide better performance than the rowstore. If a performance regression due to the columnstore is suspected, a query hint that forces use of a different index can be used to prevent the query optimizer from selecting the columnstore. We discussed some of the query hints that can be used in the previous post.

  • Did you insert a FORCE ORDER hint on a join?

If a FORCE ORDER hint is used, the optimizer will obey it. If the optimizer would otherwise have used the starjoin optimization and created bitmaps, the loss of those starjoin bitmaps could cause performance to be worse than without the FORCE ORDER hint.

Question: In queries processed in row mode, some filters are evaluated in the storage engine. Can the columnstore do that too for optimization reasons?

Yes, very much. Some filters that are pushed down into the storage engine are listed below (a small illustration follows the list):

  • comparisons <column, comparison, constant or parameter>
  • IN lists <constant1 or parameter1, constant2 or parameter2 … constantN or parameterN>
  • IS (NOT) NULL
  • BETWEEN
  • filters on strings are NOT pushed down
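
To make this concrete, here is a small illustrative pair of queries against a hypothetical columnstore-indexed fact table (the table dbo.FactSales and its columns are made up for this example): the numeric BETWEEN filter is a candidate for pushdown into the storage engine, while the string comparison is not.

-- Numeric range filter: eligible for pushdown into the storage engine
SELECT SUM(SalesAmount)
FROM dbo.FactSales
WHERE OrderDateKey BETWEEN 20140101 AND 20141231;
-- String filter: not pushed down, evaluated after the scan
SELECT SUM(SalesAmount)
FROM dbo.FactSales
WHERE PromotionCode = 'SUMMER14';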

Question: In queries processed in row mode, partitions can (sometimes) be eliminated from the scan. Can the columnstore do that too?

Yes, and no. Partition elimination per se does not occur, however segments (actually row groups – i.e. the corresponding segments for each column) can be eliminated from the scan. Each nonempty partition has one or more segments for each column. Eliminating each segment in a partition is equivalent to partition elimination. Eliminating a subset of segments in a partition for which some values in the partition qualify can be better than partition elimination (because that partition would not be eliminated). Each segment has metadata that includes the min and max values in the segment. Similar min/max metadata associated with bitmaps and filters can be used to determine that none of the values in a segment will qualify without scanning the segment. If a segment is eliminated from the scan, the other segments in the same rowgroup are also eliminated. Column_store_segment_eliminate extended event can be used to see if segments are being eliminated.
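
Since the post mentions the column_store_segment_eliminate extended event, here is a minimal sketch of an event session that captures it; the session name is arbitrary, and you should confirm the event exists on your build before relying on it:

-- Minimal event session to observe columnstore segment elimination
CREATE EVENT SESSION [CS_SegmentElimination] ON SERVER
ADD EVENT sqlserver.column_store_segment_eliminate
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [CS_SegmentElimination] ON SERVER STATE = START;
GO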

Question: Does partitioning work the same way for the columnstore index as for row store indexes?

Mostly, yes. A nonclustered columnstore index can be built on a partitioned table. An empty partition can be split. Two empty partitions can be merged. An empty partition can be merged with a nonempty partition. A partition can be switched between partitioned tables if both tables have a columnstore index and all the other requirements for switching partitions are met. There are some restrictions associated with columnstore indexes:

  • If a table has a columnstore index, two non-empty partitions cannot be merged
  • A nonclustered columnstore index must always be partition-aligned with the base table. As a corollary, the partition function used to partition the nonclustered columnstore index and the base table must be identical, not just equivalent. The reason for this requirement is that otherwise the partition function on the base table could be changed, resulting in non-alignment.

As I conclude this post, I will try to create supporting posts with scripts that show how the above restrictions apply and how one can spot them when implementing ColumnStore Indexes. Out of curiosity, I would like to know how many of you actually use ColumnStore Indexes in your environments. What has been your experience using them?

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – ColumnStore Frequently Asked Queries

This blog talks about troubleshooting scenarios, in FAQ format, when dealing with ColumnStore Indexes inside SQL Server. Use a combination of trace flags, query hints, performance counters, and extended events discussed in various blogs to gather relevant data and troubleshoot the problem. Sometimes I have seen people use ColumnStore as a silver bullet to tackle performance problems in their upgraded SQL Server. The common complaint was: creating the ColumnStore index does not result in as much performance improvement as expected. What went wrong? Where do we start? If you ask me, the diagnostic effort may include asking some of the following questions:

Did the optimizer use the ColumnStore index? If not, why not?

A user can determine whether a ColumnStore index was used in the query plan by looking at SHOWPLAN. A new property Storage can have one of two values, either rowstore or ColumnStore. Storage = ColumnStore indicates that the ColumnStore was used in the query plan.

The query optimizer makes a cost based decision regarding whether to use the ColumnStore as an access method for a given query. There are certain limitations on ColumnStore use to be aware of:

  • Use of the ColumnStore index can be forced (or avoided) by using a table hint (for example, FROM t WITH (INDEX(myColumnStoreindex))); a sketch follows this list.
  • The ColumnStore index does not support SEEK. If the query is expected to return a small fraction of the rows, the optimizer is *unlikely* to choose the ColumnStore index.
  • If the table hint FORCESEEK is used, the optimizer will not consider the ColumnStore index.
  • If a hint to use the ColumnStore index is combined with FORCESEEK, the optimizer will not be able to generate a plan and the query will fail and return an error message.
  • If SET FORCEPLAN or the FORCE ORDER hint is used, the optimizer will honor the hint. If the optimizer would otherwise have used the starjoin optimization with the starjoin bitmaps, using the hints can cause performance to be worse than without the hints. Batch processing may be used, but is not guaranteed.
  • A query hint IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX will prevent use of any ColumnStore index on a per-query basis.
  • If TF 9453 is turned on, the query optimizer will not choose a ColumnStore index (and will not use batch mode processing) for any query.
  • If TF 9462 is turned on, the query optimizer will not choose a ColumnStore index unless a query hint explicitly specifies a ColumnStore index.
  • If TF 9453 and TF 9462 are both on, the query optimizer will not choose a ColumnStore index. If a query hint is added to specify a ColumnStore index, the query will fail with error message 8622, which states that the query processor could not produce a query plan because of the hints defined in the query.
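
As a small illustration of the hint behavior described in the list above, the sketch below assumes a table dbo.FactSales with a nonclustered columnstore index named NCCI_FactSales; both names are made up for the example:

-- Force the columnstore index via a table hint
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales WITH (INDEX(NCCI_FactSales))
GROUP BY ProductKey;
-- Prevent any nonclustered columnstore index for this query only
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);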


The next question to ask would be, was the query executed using batch processing?

A user can determine whether batch processing was used for a particular operator by looking at SHOWPLAN. EstimatedExecutionMode can have one of two values, row or batch.

EstimatedExecutionMode = batch indicates that batch processing was chosen in the query plan. ActualExecutionMode also can have one of two values, either row or batch. If EstimatedExecutionMode = batch and ActualExecutionMode = row, then at runtime the query could not be executed in batch mode, indicating that not enough memory was available for batch processing (or there were not enough threads available for parallel execution). The combination EstimatedExecutionMode = row and ActualExecutionMode = batch will never occur. For a complex query, some parts may execute in batch mode and some parts in row mode.

Next analysis would be, did the ColumnStore index provide good compression for my data?

The degree to which the data can be compressed is highly data-dependent. A column with a large number of unique strings cannot be compressed as much as a column with many duplicate string values. Similarly, numeric data that cover a large range of values cannot be as efficiently encoded as can numeric data that cover a more restricted range of values. For a given set of data, including fewer columns in the ColumnStore index will usually result in better compression of the columns in the index as well as resulting in a smaller index by virtue of having less data (fewer columns). However, it is generally recommended that all the columns in a table be included in the ColumnStore index to ensure that all queries will benefit from the use of the index.
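
One rough way to get a feel for the compression achieved is to total the on-disk size of the columnstore segments per partition using the sys.column_store_segments catalog view; this is a sketch that assumes a table named dbo.MyFactTable and the hobt_id/on_disk_size columns exposed by SQL Server 2012 and 2014:

-- Approximate on-disk size of the columnstore, per partition
SELECT p.partition_number,
       SUM(s.on_disk_size) / 1024.0 / 1024.0 AS ColumnstoreSizeMB
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.MyFactTable')
GROUP BY p.partition_number
ORDER BY p.partition_number;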

The last question to ask here would be, did the query optimizer find a good plan?

The query plan is displayed in Showplan as usual. Use of the batch/ColumnStore plan for star join queries can be determined by looking at the access method (was a ColumnStore index used for the fact table?), the presence of one or more Batch Hash Table Build operators, and ActualExecutionMode = Batch.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #040 – Difference Between Unique Index vs Unique Constraint

Question: What is the difference between unique index and unique constraint?

Answer:

Add Unique Constraint
ALTER TABLE dbo.<tablename> ADD CONSTRAINT
<namingconventionconstraint> UNIQUE NONCLUSTERED
(
<columnname>
)
ON [PRIMARY]

Add Unique Index
CREATE UNIQUE NONCLUSTERED INDEX
<namingconventionconstraint> ON dbo.<tablename>
(
<columnname>
)
ON [PRIMARY]

There is no difference between a Unique Index and a Unique Constraint. Even though the syntax is different, the effect is the same: a Unique Constraint creates a Unique Index under the covers to enforce the constraint and prevent duplicate keys. A Unique Index (or Primary Key Index) is a physical structure that maintains uniqueness over some combination of columns across all rows of a table, and it is a convenient way to enforce a Unique Constraint in SQL Server.
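
A quick way to see that both approaches end up as unique indexes under the covers is to create a small throwaway table with one of each and then inspect sys.indexes; the table and object names below are made up for the illustration:

-- Hypothetical test table with one unique constraint and one unique index
CREATE TABLE dbo.UniqueTest (ColA INT NOT NULL, ColB INT NOT NULL);
ALTER TABLE dbo.UniqueTest ADD CONSTRAINT UQ_UniqueTest_ColA UNIQUE NONCLUSTERED (ColA);
CREATE UNIQUE NONCLUSTERED INDEX UX_UniqueTest_ColB ON dbo.UniqueTest (ColB);
-- Both rows show is_unique = 1; only the constraint-backed index has is_unique_constraint = 1
SELECT name, is_unique, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.UniqueTest');
DROP TABLE dbo.UniqueTest;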

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #039 – What is Included Column Index in SQL Server 2005 and Onwards?

You may find it interesting that many people still do not know what an Included Column Index in SQL Server is. I have seen this personally among people around me.

Question: What is Included Column Index?

Answer: 

In SQL Server 2005, the functionality of non-clustered indexes was extended by allowing non-key columns to be added to the leaf level of the non-clustered index. Non-key columns help create covering indexes. By including non-key columns, you can create non-clustered indexes that cover more queries. The Database Engine does not consider non-key columns when calculating the number of index key columns or the index key size, so non-key columns can be included in a non-clustered index to avoid exceeding the index limits of a maximum of 16 key columns and a maximum index key size of 900 bytes. Another advantage is that non-key columns can use data types that are generally not allowed as index key columns.

In the following example, the column FileName is varchar(400), which would push the index key beyond its allowed size. If we still want it in our covering index to gain performance, we can do so by using the keyword INCLUDE.

USE AdventureWorks
GO
CREATE INDEX IX_Document_Title
ON Production.Document (Title, Revision)
INCLUDE (FileName)

Non-key columns can be included only in non-clustered indexes. Columns can't appear in both the key column list and the INCLUDE list, and column names can't be repeated in the INCLUDE list. Non-key columns can be dropped from a table only after the index that includes them is dropped first. For an Included Column Index to exist there must be at least one key column defined, with a maximum of 16 key columns and 1023 included columns.

Avoid adding unnecessary columns. Adding too many index columns, key or non-key, negatively affects performance: fewer index rows fit on a page, which can increase I/O and reduce cache efficiency; more disk space is required to store the index; and index maintenance may increase the time it takes to perform modifications (inserts, updates, or deletes) against the underlying table or indexed view.

Another example to test:
Create the following index on the AdventureWorks database in SQL Server 2005.

USE AdventureWorks
GO
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID)
GO

Test the performance of the following query before and after creating the index. The performance improvement is significant.

SELECT AddressLine1, AddressLine2, City, StateProvinceID, PostalCode
FROM Person.Address
WHERE PostalCode BETWEEN '98000'
AND '99999';
GO

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #021 – Difference Between Index Seek and Index Scan (Table Scan)

Question: What is the difference between Index Seek and Index Scan?

Answer: Index Scan retrieves all the rows from the table. Index Seek retrieves selective rows from the table.

Index Scan:
Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table. Thus, a scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate.

Index Seek:
Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.

Index Scan is nothing but scanning the data pages from the first page to the last page. If there is an index on a table and the query touches a large portion of the data (say, more than 50 to 90 percent of the rows), the optimizer will simply scan all the data pages to retrieve the rows. If there is no index at all, you might see a Table Scan in the execution plan.
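
To see both operators side by side, assuming the AdventureWorks sample database, compare the actual execution plans of a highly selective query against the clustered primary key with one that reads the whole table:

-- Index Seek: a single row located through the clustered primary key
SELECT AddressID, AddressLine1, City
FROM Person.Address
WHERE AddressID = 100;
-- Index Scan: every row is read because there is no filtering predicate
SELECT AddressID, AddressLine1, City
FROM Person.Address;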


Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Mistake to Avoid: Duplicate and Overlapping Indexes – Notes from the Field #073

[Note from Pinal]: In this episode of the Notes from the Field series, database expert Kathi Kellenberger explains indexes and their impact. We often believe that indexes will improve the performance of a query, but that is not always true. There are cases when indexes can reduce performance as well. Read about Kathi's experience in her own words.


Having the correct indexes in place can improve the performance of your queries immensely. Knowing just what the correct indexes are, however, is not an easy task. SQL Server provides tools to help you create indexes based on a specific query or the workload that has run since the last restart of the instance. While these are handy tools, you must take the guidance they give with the time-honored grain of salt.

Having done my share of index tuning over the past few years, I have seen the consequences of blindly creating indexes based on the suggestions of the Database Engine Tuning Advisor, the missing index DMV, or the missing index recommendation found in the execution plan. It’s easy to copy a slowly running query into the Query window of SSMS and view the execution plan only to find that an index can be created to drastically improve the query performance. The problem is that this recommendation does not consider indexes that are already in place. By creating every index that these tools suggest, you will end up with too many indexes and many that overlap. At first you may not think this is a problem: more is better, right?

Having too many nonclustered indexes can cause numerous problems. First, unneeded indexes take up space. This impacts storage costs, backup and recovery times, and index maintenance times. Indexes must be kept up to date whenever data changes, so the performance of inserts, updates, and deletes is impacted by nonclustered indexes. Have you ever heard of a SELECT query that runs more slowly because there are too many indexes on a table? I have seen it happen. When the optimizer comes up with a plan for a query, it must consider the available indexes, the join types, the join order, and so on; the number of plan choices increases exponentially. The optimizer won't take long to come up with a plan, however, and will sometimes stop with a "good enough" plan. It's possible that the optimizer didn't have enough time to figure out the best index because there were too many to consider.

To avoid creating unnecessary indexes, always take a look at the existing indexes on the table when you think you should add a new one. Instead of creating a brand new index, maybe you can just add a key or included column to an existing index. Following this practice will help keep the number of indexes from spiraling out of control. Another thing to watch out for is the cluster key. The cluster key is included in every nonclustered index; it's there to locate rows in the clustered index, but you won't see it in the nonclustered index definition. The index tools will often tell you to add the cluster key as an included column, but that is not necessary.
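
Before adding whatever a tool suggests, it only takes a moment to list what is already defined on the table; a minimal sketch (using the example table created below) is:

-- Review existing indexes and their key columns before creating a new one
EXEC sp_helpindex 'dbo.TestIndexes';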

By following all of the index recommendations from the tools without considering other indexes, you will end up with tables that resemble the following:

CREATE TABLE TestIndexes(Col1 INT, col2 VARCHAR(10), Col3 DATE, Col4 BIT);
CREATE INDEX ix_TestIndexes_Col1 ON dbo.TestIndexes (col1);
CREATE INDEX ix_TestIndexes_Col1_Col2 ON dbo.TestIndexes (col1, col2);
CREATE INDEX ix_TestIndexes_Col1_Col2_Col3 ON dbo.TestIndexes (col1, col2, Col3);
CREATE INDEX ix_TestIndexes_Col2 ON dbo.TestIndexes (col2);
CREATE INDEX ix_TestIndexes_Col1_includes1 ON dbo.TestIndexes (col1) INCLUDE(Col4);
CREATE INDEX ix_TestIndexes_Col1_includes2 ON dbo.TestIndexes (col1) INCLUDE(Col2);

You may think that this is a contrived example, but I see this pattern all the time. How do you find the overlapping indexes that need to be cleaned up? There are many scripts available, but here is a simple script that just looks at the first two index keys for duplicates:

WITH IndexColumns AS (
SELECT '[' + s.Name + '].[' + T.Name + ']' AS TableName,
i.name AS IndexName,  C.name AS ColumnName, i.index_id,ic.index_column_id,
COUNT(*) OVER(PARTITION BY t.OBJECT_ID, i.index_id) AS ColCount
FROM sys.schemas AS s
JOIN sys.tables AS t ON t.schema_id = s.schema_id
JOIN sys.indexes AS i ON I.OBJECT_ID = T.OBJECT_ID
JOIN sys.index_columns AS IC ON  IC.OBJECT_ID = I.OBJECT_ID
AND IC.index_id = I.index_id
JOIN sys.columns AS C ON  C.OBJECT_ID = IC.OBJECT_ID
AND C.column_id = IC.column_id
WHERE IC.is_included_column = 0
)
SELECT DISTINCT a.TableName, a.IndexName AS Index1, b.IndexName AS Index2
FROM IndexColumns AS a
JOIN IndexColumns AS b ON b.TableName = a.TableName
AND b.IndexName <> a.IndexName
AND b.index_column_id = a.index_column_id
AND  b.ColumnName = a.ColumnName
AND a.index_column_id < 3
AND a.index_id < b.index_id
AND a.ColCount <= B.ColCount
ORDER BY a.TableName, a.IndexName;


Once you find indexes that are subsets or even exact duplicates of other indexes, you should manually review each match and figure out what can be deleted or consolidated. Just be sure to test your changes before implementing them in production. After reviewing the information, I came up with the following changes:

DROP INDEX ix_TestIndexes_Col1 ON TestIndexes;
DROP INDEX ix_TestIndexes_Col1_Col2 ON TestIndexes;
DROP INDEX ix_TestIndexes_Col1_Col2_Col3 ON TestIndexes;
DROP INDEX ix_TestIndexes_Col1_includes1 ON TestIndexes;
DROP INDEX ix_TestIndexes_Col1_includes2 ON TestIndexes;
CREATE INDEX ix_TestIndexes_Col1_Col2_Col3_includes ON TestIndexes
(Col1, Col2, Col3) INCLUDE (Col4);

Now there are two indexes, neither of which overlaps the other, and I haven't lost anything that was in place before.

I always say that index tuning is both a science and an art. Now when you use the science of missing index information you will know how to apply the art of avoiding duplicates.

If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – COLUMNSTORE and COLUMNSTORE_ARCHIVE data compression

In my earlier blogs on the "Fundamentals of Columnstore Index", I took a simple tour of how to create a basic Columnstore index. Building on that, SQL Server 2014 supports columnstore and columnstore archival compression for columnstore tables and indexes. Columnstore tables and indexes are always stored with columnstore compression. Most data warehouses have some data that is frequently accessed and some that is accessed more infrequently.

You might be wondering about the use case for these commands. In reality, the data may be partitioned by date, and the most recent data is accessed much more frequently than older data. In such cases the older data can benefit from additional compression at the cost of slower query performance. I talk to a lot of customers about this special feature:

  • To enable this scenario, SQL Server added support for archival compression of column stores starting with SQL Server 2014.
  • The archival compression option is enabled on a per-table or partition (object) basis; a column store can contain objects with archival compression applied and objects without. To allow for easy extensibility of existing on-disk structures, archival compression is implemented as an extra stream compression layer on top of Columnstore index implementation.
  • The Columnstore compression implementation transparently compresses the bytes being written to disk during the column store serialization process and transparently decompresses them during the deserialization process. Stream decompression is always applied when data is read from disk. Data is not cached in memory with stream compression.

The further reduction obtained by archival compression is substantial, ranging anywhere from 25% to 60%+ depending on the data. These are rough numbers based on what I have seen at customer interactions in the past.

  • Use COLUMNSTORE_ARCHIVE data compression to compress columnstore data with archival compression.

Let us look at how this can be achieved using TSQL commands.

To Apply Compression:

Here are simple T-SQL commands to play with the ColumnStore and ColumnStore Archive settings in a single ALTER script:

-- Adding Archive bit for a single partition
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
-- Adding Archive bit for all partitions
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
-- Adding Columnstore for all partitions and Archive for a few partitions
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE ON PARTITIONS (1,3));

Use COLUMNSTORE data compression to remove archival compression; the resulting data will continue to be compressed with regular columnstore compression.

To Remove the Archive Compression

Similar to the above commands, we can remove archival compression on specific partitions or objects. A representative command looks like this:

-- Turning Archive bit off to Columnstore for all Partitions
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE);
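
To confirm which compression setting each partition ended up with after these rebuilds, the data_compression_desc column in sys.partitions can be checked; a small sketch against the same example table:

-- Check the current compression level per partition
SELECT index_id, partition_number, data_compression_desc
FROM sys.partitions
WHERE object_id = OBJECT_ID('tbl_myColumnStore')
ORDER BY index_id, partition_number;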

As I wrap up this blog, I would like to know how many of you out there are using archival columnstore compression in your environments. How much compression are you getting, and what are the space savings because of it? I would love to hear about your experience as we learn together.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What are Hypothetical Indexes?

If you ever thought this is some sort of trick to bring you to the blog, well, you are wrong. This in fact is something worth a look and an interesting thing to know. Before I explain the finer details, let me tell you that this is NOT a new feature for performance improvement of the SQL Server Engine.

While profiling one of his databases, one of my DBA friends asked why there is a difference between the indexes shown in SQL Server Management Studio and those returned by a T-SQL query. Since it didn't make sense to me, I asked him to send me the details. This led to the learning I thought was worth sharing. Here is what I mean:

(Image: sys.indexes output and Object Explorer for the same table)

Trust me, there is absolutely no Photoshop trick in the above image. The sys.indexes catalog view shows 7 entries for a table, but Object Explorer shows only one index on the table, which is the ONLY object inside the database.

Before we talk about the reason, you can play along by creating a sample table using the script below.

USE MASTER
GO
IF DB_ID('HypotheticalIndex') IS NOT NULL
BEGIN
ALTER DATABASE HypotheticalIndex SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE HypotheticalIndex
END
CREATE DATABASE HypotheticalIndex
GO
USE HypotheticalIndex
GO
CREATE TABLE [Alumni]
(
[SSN]         [INT] IDENTITY(1, 1) NOT NULL,
[StudentName] [CHAR](200) NULL,
[GradDate]    [DATETIME] NULL
)
GO
SET NOCOUNT ON
GO
INSERT INTO Alumni (StudentName, GradDate)
VALUES (RAND() * 1000, DATEADD(dd, RAND() * 100, GETDATE()))
GO 60000 -- should take around 50 seconds
SET NOCOUNT OFF
GO

The above script creates a database and populates rows. Let's create an index on the table.

CREATE INDEX Idx_Alumni_GradDate
ON Alumni(GradDate)

To create a hypothetical index on the same column, we can run the below command:

CREATE INDEX Alumni_hyp_1
ON Alumni(GradDate)
WITH STATISTICS_ONLY=1
GO

Here the key is the undocumented extension WITH STATISTICS_ONLY, which is available with the CREATE INDEX command.
Let us next look at the sys.indexes output:

SELECT name,
index_id,
type_desc,
data_space_id,
is_hypothetical
FROM   sys.indexes
WHERE  OBJECT_ID = OBJECT_ID('Alumni')
AND    type_desc <> 'HEAP'

(Image: sys.indexes output showing is_hypothetical = 1 and data_space_id = 0 for the hypothetical index)

Notice the highlighted column values for the hypothetical index. data_space_id is zero because there is no physical storage for this index; it is not stored in any filegroup or file.

Let’s look at the statistics.

DBCC SHOW_STATISTICS (Alumni, Alumni_hyp_1) WITH STAT_HEADER
GO

(Image: DBCC SHOW_STATISTICS header output for Alumni_hyp_1)

We can see that rows are sampled and a statistics object is generated for the hypothetical index.

This means that a hypothetical index is an index that has metadata (in sys.indexes, with is_hypothetical = 1) and a statistics object associated with it (in sys.stats), but no physical storage. It is used only for costing evaluation of query plans in conjunction with the "DBCC AUTOPILOT" and "SET AUTOPILOT ON" commands. Those settings are for a future discussion and blog, so let us move along.

SQL Server Management Studio filters out hypothetical indexes in Object Explorer, which is the reason my friend saw a difference between T-SQL and SSMS.

The STATISTICS_ONLY option is undocumented, but while playing around with it I found something interesting: the statistics sampling changes based on the value passed to this parameter.


CREATE INDEX Alumni_hyp_0
ON Alumni(GradDate)
WITH STATISTICS_ONLY=0
GO
DBCC SHOW_STATISTICS (Alumni, Alumni_hyp_0) WITH STAT_HEADER
GO

If we pass a value of zero, the statistics are not generated.

(Image: DBCC SHOW_STATISTICS output for Alumni_hyp_0, with no statistics generated)

If I pass the highest value (2^31 - 1 = 2147483647), then sampling is done with a full scan.

CREATE INDEX [Alumni_hyp_2147483647]
ON Alumni(GradDate)
WITH STATISTICS_ONLY=2147483647
GO
DBCC SHOW_STATISTICS (Alumni, 'Alumni_hyp_2147483647') WITH STAT_HEADER
GO

(Image: DBCC SHOW_STATISTICS output for Alumni_hyp_2147483647 showing a full-scan sample)

To find hypothetical indexes in the database, we can run the below query.

SELECT  *
FROM    sys.indexes
WHERE   is_hypothetical = 1

Generally the index name would start with _dta_index because the DTA creates hypothetical indexes behind the scenes to evaluate the cost of an index. If DTA exits gracefully, it cleans up these indexes; but if someone loses patience while DTA is running and kills it using Task Manager, those indexes are left behind. It is safe to drop hypothetical indexes, and they can be dropped using the normal DROP INDEX command. I must point out that if you apply a recommendation provided by the DTA while tuning a query and don't change the suggested name, the index will have _dta_ in its name. The name alone doesn't mean it is hypothetical; we need to use the is_hypothetical column to identify them.
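
If you do find leftover hypothetical indexes and want to remove them, one option is to generate the DROP INDEX statements from sys.indexes; this is only a sketch, so review the generated commands before running them:

-- Generate DROP INDEX statements for all hypothetical indexes in the current database
SELECT 'DROP INDEX ' + QUOTENAME(i.name) + ' ON '
       + QUOTENAME(SCHEMA_NAME(o.schema_id)) + '.' + QUOTENAME(o.name) AS DropCommand
FROM sys.indexes AS i
JOIN sys.objects AS o ON o.object_id = i.object_id
WHERE i.is_hypothetical = 1;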

How did I learn this? Someone asked how the Database Engine Tuning Advisor (a.k.a. DTA) works, and how it creates and evaluates indexes on huge tables. So I captured a Profiler trace while running DTA and found many interesting facts about the under-the-cover workings of the tool. This blog is the result of what I saw in Profiler. Learning never stops if you are working with SQL Server!

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – GIDS – Two Technical Sessions Today – Resources and Whereabouts – Part 2

Today I will be presenting two sessions at the Great Indian Developer Summit. Here are the details about where I will be today.

If you are not going to attend this event, sign up for the newsletter over here: http://bit.ly/sqllearn. I will be sharing my code, demo, slides and all the relevant information in the newsletter.

MySQL Performance Tuning – Unexplored Territory
April 25, 2014 – Time:10:45-11:30

Performance is one of the most essential aspects of any application. Everyone wants their server to perform optimally and at the best efficiency. However, not many people talk about MySQL and performance tuning, as it is a largely unexplored territory. In this session, we will talk about how we can tune MySQL performance. We will also try to cover other performance-related tips and tricks. At the end of this session, attendees will not only have a clear idea but will also carry home action items on what to do when facing resource-intensive scenarios. Developers will walk out with scripts and knowledge that can be applied to their servers immediately after the session. To master the art of performance tuning one has to understand the fundamentals of performance, tuning, and the best practices associated with them. You will also witness some impressive performance tuning demos in this session.

Hidden Secrets and Gems of SQL Server We Bet You Never Knew
April 25, 2014 – Time:15:00-17:45

It really amazes us every time someone says SQL Server is an easy tool to handle and work with. Microsoft has done amazing work in making a complex relational database a breeze for developers and administrators alike. Though it looks like child's play to some, the reality is far from this notion. The basics and fundamentals are simple and uniform across databases, but the behavior and the nuts and bolts of SQL Server are something we need to master over a period of time.

With a collective experience of more than 30 years with databases amongst the speakers, we will take a unique tour of various aspects of SQL Server and bring to you life lessons learnt from working with it. We will share some of the trade secrets of performance, configuration, new features, tuning, behaviors, T-SQL practices, common pitfalls, and productivity tips on tools and more.

This is a highly demo-filled session with practical takeaways if you are a SQL Server developer or an administrator. The speakers will be able to stump you and also give you answers on almost everything inside the relational database called SQL Server.

Introduction to Microsoft Power BI ~ Vinod Kumar
April 25, 2014 – Time:14:05-14:50

Lazy Commit Like NoSQL with SQL Server ~ Balmukund Lakhani
April 25, 2014 – Time:14:05-14:50

If you are looking to download a script for finding missing indexes and redundant indexes, you can find it here.

During my session I will ask three random questions; if you get them right, you will get a copy of my book.

Reference: Pinal Dave (http://blog.SQLAuthority.com)