There are two kind of storage in database. Row Store and Column Store. Row store does exactly as the name suggests – stores rows of data on a page – and column store stores all the data in a column on the same page. These columns are much easier to search – instead of a query searching all the data in an entire row whether the data is relevant or not, column store queries need only to search much lesser number of the columns. This means major increases in search speed and hard drive use. Additionally, the column store indexes are heavily compressed, which translates to even greater memory and faster searches. I am sure this looks very exciting and it does not mean that you convert every single index from row store to columnstore index. One has to understand the proper places where to use row store or column store indexes. Let us understand in this article what is the difference in Columnstore type of index.
Column store indexes are run by Microsoft’s VertiPaq technology. However, all you really need to know is that this method of storing data is columns on a single page is much faster and more efficient. Creating a column store index is very easy, and you don’t have to learn a new syntax to create them. You just need to specify the keyword “COLUMNSTORE” and enter the data as you normally would. Keep in mind that once you add a column store to a table, though, you cannot delete, insert or update the data – it is READ ONLY. However, since column store will be mainly used for data warehousing, this should not be a big problem. You can always use partitioning to avoid rebuilding the index.
A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. The difference between column store and row store approaches is illustrated below:
In case of the row store indexes multiple pages will contain multiple rows of the columns spanning across multiple pages. In case of column store indexes multiple pages will contain multiple single columns. This will lead only the columns needed to solve a query will be fetched from disk. Additionally there is good chance that there will be redundant data in a single column which will further help to compress the data, this will have positive effect on buffer hit rate as most of the data will be in memory and due to same it will not need to be retrieved.
Let us see small example of how columnstore index improves the performance of the query on a large table.
As a first step let us create databaseset which is large enough to show performance impact of columnstore index. The time taken to create sample database may vary on different computer based on the resources.
USE AdventureWorks GO -- Create New Table CREATE TABLE [dbo].[MySalesOrderDetail]( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] NOT NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [LineTotal] [numeric](38, 6) NOT NULL, [rowguid] [uniqueidentifier] NOT NULL, [ModifiedDate] [datetime] NOT NULL ) ON [PRIMARY] GO -- Create clustered index CREATE CLUSTERED INDEX [CL_MySalesOrderDetail] ON [dbo].[MySalesOrderDetail] ( [SalesOrderDetailID]) GO -- Create Sample Data Table -- WARNING: This Query may run upto 2-10 minutes based on your systems resources INSERT INTO [dbo].[MySalesOrderDetail] SELECT S1.* FROM Sales.SalesOrderDetail S1 GO 100
Now let us do a quick performance test. I have kept STATISTICS IO ON for measuring how much IO following queries take. In my test first I will run query which will use regular index. We will note the IO usage of the query. After that we will create columnstore index and will measure the IO of the same.
-- Performance Test -- Comparing Regular Index with ColumnStore Index USE AdventureWorks GO SET STATISTICS IO ON GO -- Select Table with regular Index SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM [dbo].[MySalesOrderDetail] GROUP BY ProductID ORDER BY ProductID GO -- Table 'MySalesOrderDetail'. Scan count 1, logical reads 342261, physical reads 0, read-ahead reads 0. -- Create ColumnStore Index CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_MySalesOrderDetail_ColumnStore] ON [MySalesOrderDetail] (UnitPrice, OrderQty, ProductID) GO -- Select Table with Columnstore Index SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM [dbo].[MySalesOrderDetail] GROUP BY ProductID ORDER BY ProductID GO
It is very clear from the results that query is performance extremely fast after creating ColumnStore Index. The amount of the pages it has to read to run query is drastically reduced as the column which are needed in the query are stored in the same page and query does not have to go through every single page to read those columns.
If we enable execution plan and compare we can see that column store index performance way better than regular index in this case.
Let us clean up the database.
-- Cleanup DROP INDEX [IX_MySalesOrderDetail_ColumnStore] ON [dbo].[MySalesOrderDetail] GO TRUNCATE TABLE dbo.MySalesOrderDetail GO DROP TABLE dbo.MySalesOrderDetail GO
In future posts we will see cases where Columnstore index is not an appropriate solution as well few other tricks and tips of the columnstore index.
Reference: Pinal Dave (https://blog.sqlauthority.com)
38 Comments. Leave new
I am using SQL Express 2012, but it is now allowing me to create column store index and fired a messgage :
CREATE INDEX statement failed because a columnstore index cannot be created in this edition of SQL server.
Pl. help me.
Hello Mr. Dave,
I took a script from web to demonstrate column store index. It creates a table and inserts test data.
use adventureworks2012;
go
drop table sales2;
create table sales2 (
[id] int not null identity (1,1),
[date] date not null,
itemid smallint not null,
price money not null,
quantity numeric(18,4) not null)
;
go
create unique clustered index cdx_sales2_date_id on sales2 ([date], [id]);
go
set nocount on;
go
–insert sales2
declare @i int = 0;
declare @date datetime2;
begin transaction;
while @i 0
order by t.name, au.type
go
set statistics io on;
go
select [date], sum(price*quantity) from sales2 with (index(cs_sales2_price)) where date = ‘20110713’ group by [date]
go
set statistics io off;
go
ok so far… then I checked allocated Pages:
select t.name, au.* from sys.system_internals_allocation_units au
join sys.system_internals_partitions p
join sys.tables t on p.object_id = t.object_id
on p.partition_id = au.container_id
where (
p.object_id = object_id(‘sales2’))
and total_pages >0
order by t.name, au.type
go
result: 1002 pages for the columnstore index
Then I executed
set statistics io on;
go
select [date], sum(price*quantity) from sales2 with (index(cs_sales2_price)) where date = ‘20110713’ group by [date]
go
set statistics io off;
go
Result:
Table ‘sales2’. Scan count 1, logical reads 1355, physical reads 0, read-ahead reads 4706, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
WHAT?? 1355 logical reads? Why are there 353 reads more than allocated pages?
Hello, I need help in understanding the concept..my ques is
How does SQL Server redefine a row when compression occurs for a particular column ?
Difference between creating column store index only on some of the columns and all the columns of a fact table ?
Hi Pinal Sir,
However, columnstore indexes do have some restrictions. The most limiting
restriction at present is that they cannot be updated, and as such any updates
applied to a table will fail. If you do try to perform an insert, update or delete,
you will see the following error message, even if the query affects zero rows:
INSERT statement failed because data cannot be updated in a table with
a columnstore index.
However, the remainder of the error message comes to our rescue:
Consider disabling the columnstore index before issuing the INSERT
statement, then re-building the columnstore index after INSERT is
complete.
Is it true?
Dave, you have to reply to user queries. you cannot just do a blind code and post and not answer user queries..
Hi Pinal,
I am searching for some information regarding difference between column store database and column store index.
Please share your ideas regarding this.Can we include sql server 2012 among column store databases.
Thanks
Hareesh
Than your so much for useful article.
Hi,
I found some weird problem today with column store index table.
Not able to truncate data from a table with column store index even after disabling the index. We can delete the data but not truncate. If you ever aware of this, can you please suggest any work around on it other than dropping and recreating column-store index.
thanks,
and column store stores all the data in a column on the same page – Is this correct statement, all the data in a column in same page, it should be stores column data together in page/pages I mean it may not be always possible to store all the column data in a single page.
This is good stuff Dave. However I have a dumb question regarding this. In case of row-based index if it has “multiple rows of the columns spanning across multiple pages”, then why do we have bookmark look up for row based indexes? the selectivity of the columns should matter, if we include the columns or not to the index, isn’t it so? I must be missing some thing for the lack of this clarity, for sure :-)
I meant to say the selectivity of the columns shouldn’t matter
Simply superb…thanku
Your all post are very helpful …
You said.. “This means major increases in search speed and hard drive use”
I think you meant to say “This means major increases in search speed and much less hard drive use”
It is my experience that columnstore indexes don’t do logical reads but actually do lob reads. If you look further down your output string from the statistics io command, you can see this. However, there is the possibility that this may have changed over versions.
Very helpful article. The example dose a great job. Thank you!