SQL SERVER – Inserting into ColumnSet with SPARSE Columns

I have written a number of blogs about using SPARSE columns. Here are a few of them for reference; they can be used as a great learning resource.

SQL SERVER – 2008 – Introduction to SPARSE Columns

SQL SERVER – 2008 – Introduction to SPARSE Columns – Part 2

SQL SERVER – Errors and Limitations of Working with SPARSE Columns in SQL Server

Though such capabilities exist inside the product, I rarely see people using this feature effectively. Recently, a friend from a product-based company called me to check if I had written on this topic. I explained that there is plenty of documentation and that it is great to see the feature being used. His team had a table with close to a thousand columns, used as a generic store, and he wanted to know if there is an easy way to insert into or update this SPARSE table directly through the ColumnSet.

The short answer is YES: it is quite possible to insert using the ColumnSet column directly. I sent him an example and thought it would be of great use to folks here too.

Let us create our table for the experiment:

USE tempdb;
GO
IF EXISTS (SELECT * FROM sys.objects WHERE name = 'DocStore_With_ColumnSet')
    DROP TABLE DocStore_With_ColumnSet;
GO
-- note the use of the SPARSE keyword and the XML COLUMN_SET
CREATE TABLE DocStore_With_ColumnSet
(
DocID INT PRIMARY KEY,
Title VARCHAR(200) NOT NULL,
ProductionSpecification VARCHAR(20) SPARSE NULL,
ProductionLocation SMALLINT SPARSE NULL,
MarketingSurveyGroup VARCHAR(20) SPARSE NULL,
MarketingProgramID INT SPARSE NULL,
Special_XML_Column XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

We can insert values into different columns as shown below:

INSERT DocStore_With_ColumnSet (DocID, Title, ProductionSpecification, ProductionLocation)
VALUES (1, 'Tire Specs 1', 'ABC321', 27)
INSERT DocStore_With_ColumnSet (DocID, Title, MarketingSurveyGroup)
VALUES (2, 'Survey 1234', 'Female 25 - 35')
-- Note that the Special_XML_Column column is correctly populated based on the columns that are populated
SELECT * FROM DocStore_With_ColumnSet

Can we add to the ColumnSet directly? Here is what we can do:

-- you can add data directly by specifying the xml
INSERT DocStore_With_ColumnSet (DocID, Title, Special_XML_Column)
VALUES (3, 'Specification 2', '<ProductionSpecification>AXZ7R242</ProductionSpecification><ProductionLocation>18</ProductionLocation>')

We can see that the insert statement succeeds. Next, let us check whether the values were inserted into our ColumnSet:

SELECT * FROM DocStore_With_ColumnSet
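One behavior worth noting here: once a table has a column set, SELECT * returns the column set XML in place of the individual sparse columns. The sparse columns are still there and can be queried explicitly by name, as this small sketch shows:

```sql
-- SELECT * returns the XML column set instead of the individual
-- sparse columns; the sparse columns can still be named explicitly:
SELECT DocID, Title, ProductionSpecification, ProductionLocation
FROM DocStore_With_ColumnSet
WHERE DocID = 3;
```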

If that was not enough, we can also use this technique to update the ColumnSet values. A typical example is shown below:

-- now let us update some of the data
UPDATE DocStore_With_ColumnSet
SET Special_XML_Column = '<ProductionSpecification>ZZ456Z</ProductionSpecification><ProductionLocation>18</ProductionLocation>'
WHERE DocID = 3;

You can see again by querying the table that the values have been updated successfully. This was a simple example of how we can use column sets to insert data into a sparse table rather than qualifying each column individually. Let us now clean up the table:

-- Cleanup
DROP TABLE DocStore_With_ColumnSet

As I conclude, I would like to know how many of you have used SPARSE columns in your applications, and what scenarios you were able to lighten up using this feature.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Basics ColumnStore FAQ – Part II

I wrote the first part of the FAQ here. This is not a typical blog post but more of a process-oriented topic on working with ColumnStore indexes. With the advancements made in SQL Server 2016, I know it is going to be even more critical to understand some of the nuances of working with ColumnStore indexes inside SQL Server. So what will we target in this post? Here are a few questions I asked, and the answers I got back, which form the flow of this post.

ColumnStore index resulted in a performance regression

When I first heard this from one of the testers, I was baffled that this was the case. How can this happen? I went the route of asking some fundamental questions to get to the root cause. Here they are for quick reference, along with the reasoning:

  • Did the optimizer use the ColumnStore index in the first place?

A user can determine whether a ColumnStore index was used in the query plan by looking at the Storage property in SHOWPLAN. Read about batch mode and row mode here.

  • Did the query start to execute using the columnstore index with batch processing and then fall back to row-based processing?

A user can determine that the query started to execute in batch mode and then fell back to row mode by looking at EstimatedExecutionMode and ActualExecutionMode in Showplan. In most cases, even in row mode the columnstore index will provide better performance than the rowstore. If a performance regression due to the columnstore is suspected, a query hint that forces use of a different index can be used to prevent the query optimizer from selecting the columnstore. We discussed some of the query hints that can be used in the previous post.
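As a quick sketch of such a hint (the table, column, and index names below are illustrative, not from the original post), SQL Server 2012 and later support IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX to keep the optimizer away from the columnstore while you investigate:

```sql
-- Hypothetical fact table; the hint tells the optimizer to ignore any
-- nonclustered columnstore index when compiling this query
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
```

Once the comparison is done, removing the hint restores normal optimizer behavior; the hint is a diagnostic tool, not a permanent fix.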

  • Did you insert a FORCE ORDER hint on a join?

If a FORCE ORDER hint is used, the optimizer will obey it. If the optimizer would otherwise have used the star-join optimization and created bitmaps, the loss of those star-join bitmaps can cause performance to be worse than without the FORCE ORDER hint.

Question: In queries processed in row mode, some filters are evaluated in the storage engine. Can the columnstore do that too for optimization reasons?

Yes, very much. Filters that are pushed down into the storage engine include:

  • comparisons <column, comparison, constant or parameter>
  • IN lists <constant1 or parameter1, constant2 or parameter2 … constantN or parameterN>

Note, however, that filters on strings are NOT pushed down.

Question: In queries processed in row mode, partitions can (sometimes) be eliminated from the scan. Can the columnstore do that too?

Yes, and no. Partition elimination per se does not occur; however, segments (actually row groups, i.e. the corresponding segments for each column) can be eliminated from the scan. Each nonempty partition has one or more segments for each column. Eliminating every segment in a partition is equivalent to partition elimination. Eliminating a subset of segments in a partition, where some values in the partition qualify, can be even better than partition elimination (because that partition would not have been eliminated). Each segment has metadata that includes the min and max values in the segment. Similar min/max metadata associated with bitmaps and filters can be used to determine that none of the values in a segment will qualify without scanning the segment. If a segment is eliminated from the scan, the other segments in the same row group are also eliminated. The column_store_segment_eliminate extended event can be used to see whether segments are being eliminated.
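The segment metadata and the extended event mentioned above can both be inspected directly. A rough sketch (the table name dbo.FactSales and the session name are illustrative):

```sql
-- Segment-level metadata, including the encoded min/max values the
-- engine uses for segment elimination
SELECT s.segment_id, s.row_count, s.min_data_id, s.max_data_id
FROM sys.column_store_segments AS s
INNER JOIN sys.partitions AS p
    ON s.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales');

-- Extended Events session to observe segment elimination at query time
CREATE EVENT SESSION [SegmentElimination] ON SERVER
ADD EVENT sqlserver.column_store_segment_eliminate
ADD TARGET package0.ring_buffer;
ALTER EVENT SESSION [SegmentElimination] ON SERVER STATE = START;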

Question: Does partitioning work the same way for the columnstore index as for row store indexes?

Mostly, yes. A nonclustered columnstore index can be built on a partitioned table. An empty partition can be split. Two empty partitions can be merged. An empty partition can be merged with a nonempty partition. A partition can be switched between partitioned tables if both tables have a columnstore index and all the other requirements for switching partitions are met. There are some restrictions associated with columnstore indexes:

  • If a table has a columnstore index, two non-empty partitions cannot be merged
  • A nonclustered columnstore index must always be partition-aligned with the base table. As a corollary, the partition function used to partition the nonclustered columnstore index and the base table must be identical, not just equivalent. The reason for this requirement is that otherwise the partition function on the base table could be changed, resulting in non-alignment.

As I conclude this post, I will try to create supporting posts with scripts showing how the above restrictions apply and how one can work through them when implementing ColumnStore indexes. Out of curiosity, I would like to know how many of you actually use ColumnStore indexes in your environments. What has been your experience with them?

Reference: Pinal Dave (http://blog.sqlauthority.com)


In earlier blogs on “Fundamentals of Columnstore Index”, I took a simple approach, walking through how to create a basic Columnstore index. Going a step further, SQL Server 2014 supports both columnstore and columnstore archival compression for columnstore tables and indexes. Columnstore tables and indexes are always stored with columnstore compression. Most data warehouses have some data that is accessed frequently and some that is accessed only infrequently.

You might be wondering about the use cases for these commands. In reality, the data may be partitioned by date, with the most recent data accessed much more frequently than older data. In such cases the older data can benefit from additional compression at the cost of slower query performance. I talk to a lot of customers about this feature:

  • To enable this scenario SQL Server added support for archival compression of SQL Server column stores from SQL Server 2014.
  • The archival compression option is enabled on a per-table or partition (object) basis; a column store can contain objects with archival compression applied and objects without. To allow for easy extensibility of existing on-disk structures, archival compression is implemented as an extra stream compression layer on top of Columnstore index implementation.
  • The Columnstore compression implementation transparently compresses the bytes being written to disk during the column store serialization process and transparently decompresses them during the deserialization process. Stream decompression is always applied when data is read from disk. Data is not cached in memory with stream compression.

The further reduction obtained by archival compression is substantial, ranging anywhere from 25% to 60%+ depending on the data. These are rough numbers based on what I have seen at customer interactions in the past.

  • Use COLUMNSTORE_ARCHIVE data compression to compress columnstore data with archival compression.

Let us look at how this can be achieved using TSQL commands.

To Apply Compression:

Here are simple TSQL commands to play with the ColumnStore and ColumnStore Archive bits in a single ALTER script:

-- Adding Archive bit for a single partition
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
-- Adding Archive bit for all partitions
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
-- Adding Columnstore for all partitions and Archive for a few partitions
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION = ALL WITH (
    DATA_COMPRESSION = COLUMNSTORE ON PARTITIONS (1 TO 3),
    DATA_COMPRESSION = COLUMNSTORE_ARCHIVE ON PARTITIONS (4));

Use COLUMNSTORE data compression to remove archival compression. The resulting data will continue to be compressed with regular columnstore compression.
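To confirm which compression setting is in effect after a rebuild, the per-partition state can be checked in sys.partitions. A small sketch, reusing the illustrative table name tbl_myColumnStore from above:

```sql
-- data_compression_desc reports COLUMNSTORE or COLUMNSTORE_ARCHIVE
-- for each partition of the table
SELECT p.partition_number, p.rows, p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('tbl_myColumnStore');
```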

To Remove the Archive Compression

Similar to the commands above, we can also remove archival compression on specific partitions or on whole objects. A representative command looks like:

-- Turning Archive bit off to Columnstore for all Partitions
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE);

As I wrap up this blog, I would like to know how many of you out there are using archival columnstore compression in your environments. How much compression are you getting, and what are the space savings? I would love to hear about your experience as we learn together.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Finding Size of a Columnstore Index Using DMVs

Columnstore Index is one of my favorite enhancements in SQL Server 2012. A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data has traditionally been stored. In the case of row store indexes, pages contain multiple rows with all their columns, spanning across multiple pages; in the case of columnstore indexes, each set of pages contains data from a single column. Columnstore indexes are compressed by default and occupy much less space than regular row store indexes.

One of the most common questions I see is the need for a list of columnstore indexes along with their sizes and corresponding table names. I quickly wrote a script using the DMVs sys.indexes and sys.dm_db_partition_stats. This script gives the size of the columnstore index on disk only. I am sure there are more advanced scripts that retrieve details about the components associated with a columnstore index; however, I believe the following script is sufficient to start getting an idea of columnstore index size.

SELECT OBJECT_NAME(i.OBJECT_ID) TableName,
i.name IndexName,
SUM(s.used_page_count) / 128.0 IndexSizeinMB
FROM sys.indexes AS i
INNER JOIN sys.dm_db_partition_stats AS S
ON i.OBJECT_ID = S.OBJECT_ID AND i.index_id = S.index_id
WHERE i.type = 6 -- nonclustered columnstore index
GROUP BY i.OBJECT_ID, i.name

Here is my introductory article on SQL Server Fundamentals of Columnstore Index. Create a sample columnstore index based on the script described in that article; the query above will then give the following results.

(Screenshot: result set showing the columnstore index name and its size in MB.)

Please feel free to suggest improvements to the script so I can modify it further to accommodate enhancements.
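As one possible extension along those lines, the on-disk size can also be totaled from segment metadata rather than page counts. A hedged sketch using sys.column_store_segments (it reports every columnstore index in the current database, whatever its name):

```sql
-- Total on-disk size of columnstore segments per table, in MB;
-- on_disk_size is reported in bytes
SELECT OBJECT_NAME(p.object_id) AS TableName,
       SUM(s.on_disk_size) / 1024.0 / 1024.0 AS SegmentSizeInMB
FROM sys.column_store_segments AS s
INNER JOIN sys.partitions AS p
    ON s.partition_id = p.partition_id
GROUP BY p.object_id;
```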

Reference: Pinal Dave (http://blog.sqlauthority.com)