SQL SERVER – COLUMNSTORE and COLUMNSTORE_ARCHIVE data compression

In earlier blogs on “Fundamentals of Columnstore Index”, I took a simple approach on taking a tour of how to create a simple Columnstore index. Having said that, SQL Server 2014 supports columnstore and columnstore archival compression for columnstore tables and indexes. Columnstore tables and indexes are always stored with columnstore compression. Most data warehouses have some data that is frequently accessed and some that is accessed more infrequently.

You might be wondering the use case scenario for these commands. For example, in reality the data may be partitioned by date and the most recent data is accessed much more frequently than older data. In such cases the older data can benefit from additional compression at the cost of slower query performance. I talk to a lot of customers about this special feature:

  • To enable this scenario SQL Server added support for archival compression of SQL Server column stores from SQL Server 2014.
  • The archival compression option is enabled on a per-table or partition (object) basis; a column store can contain objects with archival compression applied and objects without. To allow for easy extensibility of existing on-disk structures, archival compression is implemented as an extra stream compression layer on top of Columnstore index implementation.
  • The Columnstore compression implementation transparently compresses the bytes being written to disk during the column store serialization process and transparently decompresses them during the deserialization process. Stream decompression is always applied when data is read from disk. Data is not cached in memory with stream compression.

The further reduction obtained by archival compression is substantial, ranging anywhere from 25% to 60%+ depending on the data. These are rough numbers based on what I have seen at customer interactions in the past.

  • Use COLUMNSTORE_ARCHIVE data compression to compress columnstore data with archival compression.

Let us look at how this can be achieved using TSQL commands.

To Apply Compression:

The simple TSQL commands to play with ColumnStore and columnStore Archive bits in single ALTER script:

-- Adding Archive bit for single Partition
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION
= 1 WITH (DATA_COMPRESSION =  COLUMNSTORE_ARCHIVE) ;
-- Adding Archive bit for all Partition
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION
= ALL WITH (DATA_COMPRESSION =  COLUMNSTORE_ARCHIVE) ;
-- Adding Columnstore for all parition and Archive for few Partition
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION
= ALL WITH (DATA_COMPRESSION =  COLUMNSTORE_ARCHIVE ON PARTITIONS (1,3)) ;

Use COLUMNSTORE data compression to decompress archival compression. This resulting data will continue to be compressed with columnstore compression.

To Remove the Archive Compression

Similar to above commands, we can also use to remove our archival compression on specific partitions or on objects. A representative command looks like:

-- Turning Archive bit off to Columnstore for all Partitions
ALTER TABLE tbl_myColumnStore
REBUILD PARTITION
= ALL WITH (DATA_COMPRESSION =  COLUMNSTORE) ;

As I wrap up this blog, I would like to know how many of you out there are using the archival columnstore in your environments. How much compression are you getting and what is the space savings because of this? I would love to hear from your experience as we learn together.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Finding Size of a Columnstore Index Using DMVs

Columnstore Index is one of my favorite enhancement in SQL Server 2012. A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. In case of the row store indexes multiple pages will contain multiple rows of the columns spanning across multiple pages. Whereas in case of column store indexes multiple pages will contain (multiple) single columns.  Columnstore Indexes are compressed by default and occupies much lesser space than regular row store index by default.

One of the very common question I often see is need of the list of columnstore index along with their size and corresponding table name. I quickly re-wrote a script using DMVs sys.indexes and sys.dm_db_partition_stats. This script gives the size of the columnstore index on disk only. I am sure there will be advanced script to retrieve details related to components associated with the columnstore index. However, I believe following script is sufficient to start getting an idea of columnstore index size. 

SELECT OBJECT_SCHEMA_NAME(i.OBJECT_ID) SchemaName,
OBJECT_NAME(i.OBJECT_ID ) TableName,
i.name IndexName,
SUM(s.used_page_count) / 128.0 IndexSizeinMB
FROM sys.indexes AS i
INNER JOIN sys.dm_db_partition_stats AS S
ON i.OBJECT_ID = S.OBJECT_ID AND I.index_id = S.index_id
WHERE  i.type_desc = 'NONCLUSTERED COLUMNSTORE'
GROUP BY i.OBJECT_ID, i.name

Here is my introductory article written on SQL Server Fundamentals of Columnstore Index. Create a sample columnstore index based on the script described in the earlier article. It will give the following results.

Please feel free to suggest improvement to script so I can further modify it to accommodate enhancements.

Reference: Pinal Dave (http://blog.sqlauthority.com)