SQL SERVER – Fundamentals of Columnstore Index

There are two kind of storage in database. Row Store and Column Store. Row store does exactly as the name suggests – stores rows of data on a page – and column store stores all the data in a column on the same page. These columns are much easier to search – instead of a query searching all the data in an entire row whether the data is relevant or not, column store queries need only to search much lesser number of the columns. This means major increases in search speed and hard drive use. Additionally, the column store indexes are heavily compressed, which translates to even greater memory and faster searches. I am sure this looks very exciting and it does not mean that you convert every single index from row store to column store index. One has to understand the proper places where to use row store or column store indexes. Let us understand in this article what is the difference in Columnstore type of index.

Column store indexes are run by Microsoft’s VertiPaq technology. However, all you really need to know is that this method of storing data is columns on a single page is much faster and more efficient. Creating a column store index is very easy, and you don’t have to learn new syntax to create them. You just need to specify the keyword “COLUMNSTORE” and enter the data as you normally would. Keep in mind that once you add a column store to a table, though, you cannot delete, insert or update the data – it is READ ONLY. However, since column store will be mainly used for data warehousing, this should not be a big problem. You can always use partitioning to avoid rebuilding the index.

A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. The difference between column store and row store approaches is illustrated below:

In case of the row store indexes multiple pages will contain multiple rows of the columns spanning across multiple pages. In case of column store indexes multiple pages will contain multiple single columns. This will lead only the columns needed to solve a query will be fetched from disk. Additionally there is good chance that there will be redundant data in a single column which will further help to compress the data, this will have positive effect on buffer hit rate as most of the data will be in memory and due to same it will not need to be retrieved.

Let us see small example of how columnstore index improves the performance of the query on a large table.

As a first step let us create databaseset which is large enough to show performance impact of columnstore index. The time taken to create sample database may vary on different computer based on the resources.

USE AdventureWorks
GO
-- Create New Table
CREATE TABLE [dbo].[MySalesOrderDetail](
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL,
[LineTotal] [numeric](38, 6) NOT NULL,
[rowguid] [uniqueidentifier] NOT NULL,
[ModifiedDate] [datetime] NOT NULL
)
ON [PRIMARY]
GO
-- Create clustered index
CREATE CLUSTERED INDEX [CL_MySalesOrderDetail] ON [dbo].[MySalesOrderDetail]
( [SalesOrderDetailID])
GO
-- Create Sample Data Table
-- WARNING: This Query may run upto 2-10 minutes based on your systems resources
INSERT INTO [dbo].[MySalesOrderDetail]
SELECT S1.*
FROM Sales.SalesOrderDetail S1
GO 100

Now let us do quick performance test. I have kept STATISTICS IO ON for measuring how much IO following queries take. In my test first I will run query which will use regular index. We will note the IO usage of the query. After that we will create columnstore index and will measure the IO of the same.

-- Performance Test
-- Comparing Regular Index with ColumnStore Index
USE AdventureWorks
GO
SET STATISTICS IO ON
GO
-- Select Table with regular Index
SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice,
SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty
FROM [dbo].[MySalesOrderDetail]
GROUP BY ProductID
ORDER BY ProductID
GO
-- Table 'MySalesOrderDetail'. Scan count 1, logical reads 342261, physical reads 0, read-ahead reads 0.
-- Create ColumnStore Index
CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_MySalesOrderDetail_ColumnStore]
ON [MySalesOrderDetail]
(UnitPrice, OrderQty, ProductID)
GO
-- Select Table with Columnstore Index
SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice,
SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty
FROM [dbo].[MySalesOrderDetail]
GROUP BY ProductID
ORDER BY ProductID
GO

It is very clear from the results that query is performance extremely fast after creating ColumnStore Index. The amount of the pages it has to read to run query is drastically reduced as the column which are needed in the query are stored in the same page and query does not have to go through every single page to read those columns.

If we enable execution plan and compare we can see that column store index performance way better than regular index in this case.

Let us clean up the database.

-- Cleanup
DROP INDEX [IX_MySalesOrderDetail_ColumnStore] ON [dbo].[MySalesOrderDetail]
GO
TRUNCATE TABLE dbo.MySalesOrderDetail
GO
DROP TABLE dbo.MySalesOrderDetail
GO

In future posts we will see cases where Columnstore index is not appropriate solution as well few other tricks and tips of the columnstore index.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

About these ads

36 thoughts on “SQL SERVER – Fundamentals of Columnstore Index

  1. Why not just create a normal index?

    CREATE NONCLUSTERED INDEX [IX_MySalesOrderDetail_ColumnStore]
    ON [MySalesOrderDetail]
    (ProductID) INCLUDE (UnitPrice, OrderQty)

    What are the benefits of an columnstore index over an simple index (in this example)?

  2. Hi pinal sir,
    Thank you for such a useful article.

    From this article it is very much clear that CloumnStore index are better solution for non clustered index. So can we avoid using non clustered and start using column store index?

    Thank$
    Chirag Satasiya

  3. In which version of SQL Server?
    I have never read before in the books about columnstore index?
    It is in Denali?

    Please advice, Virul

  4. Pingback: SQL SERVER – Updating Data in A Columnstore Index « Journey to SQLAuthority

  5. Pingback: SQL SERVER – Video – Performance Improvement in Columnstore Index « Journey to SQLAuthority

  6. Can i know if there s a view created on the base table which has a column store index, why does the execution plan not reflect “Batch” mode. When i query the table directly the execution mode is “Batch”. through the view it is “Row”.

  7. sir i have a problem that i created a index but when i select this index for display records then it shown that invalid object name .i ckecked all things as database name but i was understanding what is the reason.please give me the solution of this problem

  8. How is the data stored on the page in column store index.Is it physically stored or logically stored because Column Store index doesnt support Clustered index.

  9. Pingback: SQL SERVER – Identify Columnstore Index Usage from Execution Plan « SQL Server Journey with SQL Authority

  10. Pingback: SQL Server 2012: ColumnStore Characteristics | Vinod Kumar (Blog home)

  11. Pinal, tried your script during the SQL Server Launch Demo and its works perfectly. The simplicity of the script helps everyone to connect to it. Thanks a lot.

  12. Pingback: SQL SERVER – Finding Size of a Columnstore Index Using DMVs « SQL Server Journey with SQL Authority

  13. Nice Article Sir….thank you so much..
    I am using column store index on a table having large number of records.
    I just fired select query to fetch millions of record on two different tables on having normal index and another having column store index.Both query takes almost same time. When i see the execution plan of both the query nearly both are same…i am doing something wrong??

  14. Sir i am following the same process as explained by u above on a table having 200000 records.i am running the following query

    Set statistics IO On
    Select PortfolioID,StrategyID,SignalID,SecID,NoTrds=count(*),SUM(PL) from
    (
    SELECT [TradeID]
    ,[PortfolioID]
    ,[StrategyID]
    ,[SignalID]
    ,[SecID]
    ,PL=case when LongStatus=1 then (EntryPrice-ExitPrice)* EntryQty else (EntryPrice-ExitPrice)* EntryQty*-1 END

    FROM [PST_BANK]
    ) As A group by PortfolioID,StrategyID,SignalID,SecID

    It gives the following message

    Table ‘PST_BANK’. Scan count 18, logical reads 22580, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Now i create the columnstore index as follows:

    create nonclustered columnstore index[Neeaj_Index]
    ON [pst_Bank] (portfolioID,strategyId,SignalID,SecID)

    Set statistics IO On
    Select PortfolioID,StrategyID,SignalID,SecID,NoTrds=count(*),SUM(PL) from
    (
    SELECT [TradeID]
    ,[PortfolioID]
    ,[StrategyID]
    ,[SignalID]
    ,[SecID]
    ,PL=case when LongStatus=1 then (EntryPrice-ExitPrice)* EntryQty else (EntryPrice-ExitPrice)* EntryQty*-1 END

    FROM [PST_BANK] with (INDEX (Neeaj_Index))
    ) As A group by PortfolioID,StrategyID,SignalID,SecID

    it gives following message

    Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘PST_BANK’. Scan count 6, logical reads 4461739, physical reads 0, read-ahead reads 1515, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    The number of logical read increases….why??

  15. Pingback: SQL SERVER – ColumnStore Index – Batch Mode vs Row Mode « SQL Server Journey with SQL Authority

  16. Hi,
    I found some weird problem today with column store index table.

    Not able to truncate data from a table with column store index even after disabling the index. We can delete the data but not truncate. If you ever aware of this, can you please suggest any work around on it other than dropping and recreating column-store index.

  17. Pingback: SQL SERVER – Beginning New Weekly Series – Memory Lane – #002 « SQL Server Journey with SQL Authority

  18. I am using SQL Express 2012, but it is now allowing me to create column store index and fired a messgage :

    CREATE INDEX statement failed because a columnstore index cannot be created in this edition of SQL server.

    Pl. help me.

  19. Hello Mr. Dave,
    I took a script from web to demonstrate column store index. It creates a table and inserts test data.

    use adventureworks2012;
    go
    drop table sales2;
    create table sales2 (
    [id] int not null identity (1,1),
    [date] date not null,
    itemid smallint not null,
    price money not null,
    quantity numeric(18,4) not null)
    ;
    go
    create unique clustered index cdx_sales2_date_id on sales2 ([date], [id]);
    go
    set nocount on;
    go
    –insert sales2
    declare @i int = 0;
    declare @date datetime2;
    begin transaction;
    while @i 0
    order by t.name, au.type
    go
    set statistics io on;
    go
    select [date], sum(price*quantity) from sales2 with (index(cs_sales2_price)) where date = ’20110713′ group by [date]
    go
    set statistics io off;
    go

    ok so far… then I checked allocated Pages:

    select t.name, au.* from sys.system_internals_allocation_units au
    join sys.system_internals_partitions p
    join sys.tables t on p.object_id = t.object_id
    on p.partition_id = au.container_id
    where (
    p.object_id = object_id(‘sales2′))
    and total_pages >0
    order by t.name, au.type
    go

    result: 1002 pages for the columnstore index

    Then I executed

    set statistics io on;
    go
    select [date], sum(price*quantity) from sales2 with (index(cs_sales2_price)) where date = ’20110713′ group by [date]
    go
    set statistics io off;
    go

    Result:
    Table ‘sales2′. Scan count 1, logical reads 1355, physical reads 0, read-ahead reads 4706, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    WHAT?? 1355 logical reads? Why are there 353 reads more than allocated pages?

  20. Hello, I need help in understanding the concept..my ques is
    How does SQL Server redefine a row when compression occurs for a particular column ?
    Difference between creating column store index only on some of the columns and all the columns of a fact table ?

  21. Hi Pinal Sir,
    However, columnstore indexes do have some restrictions. The most limiting
    restriction at present is that they cannot be updated, and as such any updates
    applied to a table will fail. If you do try to perform an insert, update or delete,
    you will see the following error message, even if the query affects zero rows:
    INSERT statement failed because data cannot be updated in a table with
    a columnstore index.
    However, the remainder of the error message comes to our rescue:
    Consider disabling the columnstore index before issuing the INSERT
    statement, then re-building the columnstore index after INSERT is
    complete.
    Is it true?

  22. Dave, you have to reply to user queries. you cannot just do a blind code and post and not answer user queries..

  23. Hi Pinal,

    I am searching for some information regarding difference between column store database and column store index.

    Please share your ideas regarding this.Can we include sql server 2012 among column store databases.

    Thanks
    Hareesh

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s