SQL SERVER – ColumnStore Frequently Asked Queries

This blog talks about troubleshooting scenarios in FAQ format when dealing with ColumnStore Indexes inside SQL Server. Use a combination of trace flags, query hints, performance counters, extended events discussed in various blogs to gather relevant data and troubleshoot the problem. Sometimes I have seen people use ColumnStore as a silver bullet to tackle performance problems in their upgraded SQL Server. The common question was: creating the ColumnStore index does not result in as much performance improvement as they expected. What went wrong? Where do we start? If you ask me, then the diagnostic efforts may include asking some of the following questions:

Did the optimizer use the ColumnStore index? If not, why not?

A user can determine whether a ColumnStore index was used in the query plan by looking at SHOWPLAN. A new property Storage can have one of two values, either rowstore or ColumnStore. Storage = ColumnStore indicates that the ColumnStore was used in the query plan.

The query optimizer makes a cost based decision regarding whether to use the ColumnStore as an access method for a given query. There are certain limitations on ColumnStore use to be aware of:

  • Use of the ColumnStore index can be forced (or avoided) by using a table hint (for example, FROM t WITH (INDEX(myColumnStoreindex)).
  • The ColumnStore index does not support SEEK. If the query is expected to return a small fraction of the rows, the optimizer is *unlikely* to choose the ColumnStore index.
  • If the table hint FORCESEEK is used, the optimizer will not consider the ColumnStore index.
  • If a hint to use the ColumnStore index is combined with FORCESEEK, the optimizer will not be able to generate a plan and the query will fail and return an error message.
  • If SET FORCEPLAN or the FORCE ORDER hint is used, the optimizer will honor the hint. If the optimizer would otherwise have used the starjoin optimization with the starjoin bitmaps, using the hints can cause performance to be worse than without the hints. Batch processing may be used, but is not guaranteed.
  • A query hint IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX will prevent use of any ColumnStore index on a per-query basis.
  • If TF 9453 is turned on, the query optimizer will not choose a ColumnStore index (and will not use batch mode processing) for any query.
  • If TF 9462 is turned on, the query optimizer will not choose a ColumnStore index unless a query hint explicitly specifies a ColumnStore index.
  • If TF 9453 and TF 9462 are both on, the query optimizer will not choose a ColumnStore index. If a query hint is added to specify a ColumnStore index, the query will fail with error message 8622, which states that the query processor could not produce a query plan because of the hints defined in the query.

The next question to ask would be, was the query executed using batch processing?

A user can determine whether batch processing was used for a particular operator by looking at SHOWPLAN. EstimatedExecutionMode can have one of two values, row or batch.

EstimatedExecutionMode = batch indicates that batch processing was chosen in the query plan. ActualExecutionMode also can have one of two values, either row or batch. If EstimatedExecutionMode = batch and ActualExecutionMode = row, then at runtime the query could not be executed in batch mode, indicating that not enough memory was available for batch processing (or there were not enough threads available for parallel execution). The combination EstimatedExecutionMode = row and ActualExecutionMode = batch will never occur. For a complex query, some parts may execute in batch mode and some parts in row mode.

Next analysis would be, did the ColumnStore index provide good compression for my data?

The degree to which the data can be compressed is highly data-dependent. A column with a large number of unique strings cannot be compressed as much as a column with many duplicate string values. Similarly, numeric data that cover a large range of values cannot be as efficiently encoded as can numeric data that cover a more restricted range of values. For a given set of data, including fewer columns in the ColumnStore index will usually result in better compression of the columns in the index as well as resulting in a smaller index by virtue of having less data (fewer columns). However, it is generally recommended that all the columns in a table be included in the ColumnStore index to ensure that all queries will benefit from the use of the index.

The last question to ask here would be, did the query optimizer find a good plan?

The query plan is displayed in Showplan as usual. Use of the batch/ColumnStore plan for star join queries can be determined by looking at the access method (was a ColumnStore index used for the fact table?), the presence of one or more Batch Hash Table Build operators, and ActualExecutionMode = Batch.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Exit mobile version