SQL SERVER – Fundamentals of Columnstore Index

There are two kinds of storage in a database: row store and column store. A row store does exactly what the name suggests – it stores rows of data on a page – while a column store keeps all the data of a single column together on the same pages. Columns stored this way are much easier to search: instead of reading every column of every row whether the data is relevant or not, a column store query reads only the far smaller set of columns it actually needs. This means a major increase in search speed and a major reduction in disk I/O. Additionally, columnstore indexes are heavily compressed, which translates to an even smaller memory footprint and faster searches. I am sure this looks very exciting, but it does not mean you should convert every single index from row store to column store. One has to understand the proper places to use row store or column store indexes. In this article, let us understand what makes the columnstore type of index different.

Columnstore indexes are powered by Microsoft's VertiPaq technology. However, all you really need to know is that this method of storing a column's data together on its own pages is much faster and more efficient. Creating a columnstore index is very easy, and you don't have to learn new syntax: you just specify the keyword COLUMNSTORE in an otherwise ordinary CREATE INDEX statement. Keep in mind, though, that once you add a columnstore index to a table, you cannot delete, insert, or update the data – the table becomes READ ONLY. However, since columnstore indexes will mainly be used in data warehousing, this should not be a big problem, and you can always use partitioning to avoid rebuilding the index.
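For illustration, here is a minimal sketch of the syntax, with a hypothetical table and columns, along with one common workaround for the READ ONLY restriction (disable the index, load the data, then rebuild):

-- A minimal sketch of the syntax (table and column names are hypothetical)
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_ColumnStore
ON dbo.FactSales (ProductID, OrderQty, UnitPrice)
GO
-- Workaround for the READ ONLY restriction: disable, load, rebuild
ALTER INDEX IX_FactSales_ColumnStore ON dbo.FactSales DISABLE
GO
-- ... INSERT/UPDATE/DELETE the data here ...
ALTER INDEX IX_FactSales_ColumnStore ON dbo.FactSales REBUILD
GO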

A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. The difference between the column store and row store approaches is described below:

With row store indexes, each page contains multiple rows, and a single row's columns may span multiple pages. With columnstore indexes, each page contains data from a single column, so only the columns needed to answer a query are fetched from disk. Additionally, there is a good chance of redundant data within a single column, which helps compress the data further; this has a positive effect on the buffer hit rate, since more of the data fits in memory and does not need to be retrieved from disk.
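If you want to verify the compression effect yourself, here is a hedged sketch that compares the pages used by each index on a table; it assumes the demo table dbo.MySalesOrderDetail created below, after both indexes exist:

-- Compare pages used by the row store and columnstore indexes
SELECT i.name AS IndexName, SUM(ps.used_page_count) AS UsedPages
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.indexes i
ON i.[object_id] = ps.[object_id] AND i.index_id = ps.index_id
WHERE ps.[object_id] = OBJECT_ID('dbo.MySalesOrderDetail')
GROUP BY i.name
GO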

Let us see a small example of how a columnstore index improves the performance of a query on a large table.

As a first step, let us create a dataset large enough to show the performance impact of a columnstore index. The time taken to create the sample data will vary from computer to computer based on available resources.

USE AdventureWorks
GO
-- Create New Table
CREATE TABLE [dbo].[MySalesOrderDetail](
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL,
[LineTotal] [numeric](38, 6) NOT NULL,
[rowguid] [uniqueidentifier] NOT NULL,
[ModifiedDate] [datetime] NOT NULL
)
ON [PRIMARY]
GO
-- Create clustered index
CREATE CLUSTERED INDEX [CL_MySalesOrderDetail] ON [dbo].[MySalesOrderDetail]
( [SalesOrderDetailID])
GO
-- Create Sample Data Table
-- WARNING: This query may run up to 2-10 minutes based on your system's resources
INSERT INTO [dbo].[MySalesOrderDetail]
SELECT S1.*
FROM Sales.SalesOrderDetail S1
GO 100

Now let us do a quick performance test. I have turned STATISTICS IO ON to measure how much IO the following queries take. First I will run the query using the regular index and note its IO usage. After that we will create the columnstore index and measure the IO of the same query.

-- Performance Test
-- Comparing Regular Index with ColumnStore Index
USE AdventureWorks
GO
SET STATISTICS IO ON
GO
-- Select Table with regular Index
SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice,
SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty
FROM [dbo].[MySalesOrderDetail]
GROUP BY ProductID
ORDER BY ProductID
GO
-- Table 'MySalesOrderDetail'. Scan count 1, logical reads 342261, physical reads 0, read-ahead reads 0.
-- Create ColumnStore Index
CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_MySalesOrderDetail_ColumnStore]
ON [MySalesOrderDetail]
(UnitPrice, OrderQty, ProductID)
GO
-- Select Table with Columnstore Index
SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice,
SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty
FROM [dbo].[MySalesOrderDetail]
GROUP BY ProductID
ORDER BY ProductID
GO

It is very clear from the results that the query performs extremely fast after creating the columnstore index. The number of pages it has to read to run the query is drastically reduced, because the columns needed by the query are stored together on the same pages and the query does not have to go through every single page of the table to read them.

If we enable the execution plan and compare, we can see that the columnstore index performs far better than the regular index in this case.
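If you want to compare both plans side by side without dropping the columnstore index, SQL Server 2012 offers a query hint for exactly this; a quick sketch (assuming SQL Server 2012):

-- Re-run the same aggregation while ignoring the columnstore index
SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice,
SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty
FROM [dbo].[MySalesOrderDetail]
GROUP BY ProductID
ORDER BY ProductID
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
GO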

Let us clean up the database.

-- Cleanup
DROP INDEX [IX_MySalesOrderDetail_ColumnStore] ON [dbo].[MySalesOrderDetail]
GO
TRUNCATE TABLE dbo.MySalesOrderDetail
GO
DROP TABLE dbo.MySalesOrderDetail
GO

In future posts we will see cases where a columnstore index is not the appropriate solution, as well as a few other tips and tricks for columnstore indexes.

Reference: Pinal Dave (http://blog.SQLAuthority.com)


SQL SERVER – TRACEWRITE – Wait Type – Wait Related to Buffer and Resolution

Earlier this year I wrote for a whole month on SQL Server Wait Stats, and the series was one of the best reviewed I have ever written. The same series has been enhanced and compiled into a book, SQL Server Wait Stats [Amazon] | [Flipkart] | [Kindle]. The best part of this book is that it is an evolving book; I am planning to expand it at regular intervals.

Yesterday I came across a very interesting system where the topmost wait type was TRACEWRITE. The DBA of the system reached out to me asking what this wait type means and how it can be resolved. As I had not yet written about this in the book, this blog post is dedicated to his question, and it will definitely be included in future versions of the book. For the moment let us go over it quickly and see what we can learn about TRACEWRITE.

Personally, I consider TRACEWRITE a harmless and innocent wait type, and I recommend that you ignore it too. However, the DBA who ran the script mentioned in my very first blog post still found this wait type at the top because I did not include it in the exclusion list. I think it should be there, and I will include it in future versions of the script.

From Book On-Line:

TRACEWRITE
Occurs when the SQL Trace rowset trace provider waits for either a free buffer or a buffer with events to process.

TRACEWRITE Explanation:

SQL Trace is a SQL Server database engine technology which monitors specific events generated when various actions occur in the database engine. When any event is fired, it goes through various stages as well as various routes. One of the routes is the Trace I/O provider, which sends data to its final destination as either a file or a rowset. The rowset provider does not guarantee delivery of the data: it stores events in its internal buffers, and if the data is not consumed quickly enough (within 20 seconds), it starts dropping events to free the buffers for future events. When this happens, it sends a special error message to the Profiler client. The TRACEWRITE wait type accumulates while threads wait for free buffers; the higher this wait, the more threads are waiting for free buffers, which degrades performance on most systems.
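To see how much TRACEWRITE wait your own system has accumulated since the last restart, you can query the wait stats DMV directly:

-- Check accumulated TRACEWRITE waits
SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'TRACEWRITE'
GO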

Reducing TRACEWRITE wait:

It is not necessary to consider this wait type bad or evil. In most systems it is a perfectly normal wait type and you can just ignore it. If you are convinced, you should stop reading this blog post here. However, if you still want to reduce this wait type, here is something you can experiment with on your development server (never on a production server).

Run the following query and see if it returns any rows. This query lists all the traces running on your system.

SELECT *
FROM sys.fn_trace_getinfo(0)
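If the query reveals a trace other than the default trace (for example, one left running by Profiler), here is a hedged sketch for stopping and closing it; the trace id used below is hypothetical, so replace it with the traceid value returned by the query above:

-- Stop and close a specific trace (trace id is hypothetical)
DECLARE @TraceID INT
SET @TraceID = 2 -- replace with the traceid returned above
EXEC sp_trace_setstatus @TraceID, 0 -- status 0 stops the trace
EXEC sp_trace_setstatus @TraceID, 2 -- status 2 closes and deletes its definition
GO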

On most of the systems I have come across, I have noticed that the default trace is enabled. I personally like to keep it on, as it helps the DBA diagnose problems the first time they occur. Additionally, it helps Microsoft diagnose your system if you request their support (one more reason to ignore this wait type and not disable the default trace). However, you can manually disable the default trace with the following script.

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'default trace enabled', 0;
GO
RECONFIGURE;
GO
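If you later change your mind, re-enabling the default trace is just as easy:

sp_configure 'default trace enabled', 1;
GO
RECONFIGURE;
GO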

Well, as this is not a harmful wait type, I had not included it in my book or the initial series. Anyway, now that we have learned about this wait type, I will include it in future versions.

Read all the posts in the Wait Types and Queues series.

Note: The information presented here is from my experience, and I make no claim that it is completely accurate. I suggest reading Book On-Line for further clarification. All the discussion of Wait Stats in this blog is generic and varies from system to system. It is recommended that you test this on a development server before implementing it on a production server.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQLAuthority News – SafePeak Releases a Major Update: SafePeak version 2.1 for SQL Server Performance Acceleration

A couple of months ago I had the opportunity to share my first look at SafePeak, a new and unique software solution that improves SQL Server performance, solves bottlenecks, accelerates data access, and cuts the CPU and IO load of your SQL Server. SafePeak's unique approach not only tells you about the problems but actually resolves them automatically, dramatically improving the performance of SQL Server and of the applications that use it.

It looks like the SafePeak team gave very serious thought to making things even easier during integration and fine-tuning, thus improving the short-term benefits, and also to providing valuable information about how applications use the database, an important long-term benefit. Before I get into my key findings on the new features and improvements, let me give a short reminder of what SafePeak is.

SafePeak in a brief:


SafePeak is a dynamic caching proxy that resides between the application and the database. It is unique in that it requires neither code changes nor database changes. The only change required is routing the application through SafePeak instead of directly to the database (by changing a connection string, a DNS record, or simply adding a reroute line in the c:\windows\system32\drivers\etc\hosts file).
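For illustration, a hosts-file reroute might look like the single line below; the IP address and server name are hypothetical placeholders (the IP is the SafePeak machine, the name is the SQL Server name the application already uses):

# Reroute the application's SQL Server name to the SafePeak machine
10.0.0.5    sqlprod01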

Here is a short video about SafePeak (thanks to the ApplicationPerformance.com team for the benchmark and the video):

SafePeak analyzes the database schema and the SQL traffic, prepares SQL patterns for its decision-making process, and applies those patterns as caching rules. When data is served from the cache, the result is immediate – tests show 100-400 microseconds per query, that is, 0.0001 to 0.0004 of a second.

The concept of "dynamic caching" is actually very interesting and unique: SafePeak caches the results of queries and stored procedures in RAM, but makes sure that data integrity is enforced in real time, both in SafePeak and in SQL Server.

The benefits of such a solution are obvious: quick integration, immediate ROI, no complex projects and most importantly, minimal downtime for production systems.

To learn more about the concept and how SafePeak works see my previous post here.

New features in the SafePeak version 2.1

As this is my first semi-deep look at the new version, I’ll cover the top features that I found useful.

1. Improvement of the cache setup and fine-tuning.

The #1 obstacle in previous versions was understanding what actually needs to be configured to improve cache efficiency. While most of the setup is automated, some applications still demand an important fine-tuning process. You see, SafePeak takes data integrity as its primary concern, and that can significantly reduce cache effectiveness at the beginning. Two factors to consider from the start: A) "dynamic objects"; B) manual (or non-deterministic) patterns to be enabled for caching. A few of the comments on my previous review asked exactly about this.

The "dynamic objects" are usually stored procedures in your application that dynamically create SQL and then execute it (CLR objects and encrypted objects are also considered "dynamic"). SafePeak flags such procs as "dynamic objects" and by default takes the safest course of action to ensure data correctness: on each execution of such a proc, the ENTIRE cache is cleared, and SafePeak's mode during the execution is "Disabled (DDL)" (visible on the dashboard screen).

Consider that you have 2-3 procedures with dynamic code that do not change any data and are executed every 5 seconds. This means that instead of caching them, SafePeak clears the entire instance cache every 5 seconds. That doesn't really give you a chance for improvement, right?

While this was configurable in previous versions, the new version makes both the identification of such events and procedures and their configuration much easier: immediately after such an execution, the Dashboard shows you the name of the procedure that needs to be configured.


Clicking on the line takes you to the Dynamic Objects configuration screen (Caching management → Instance → Database → Dynamic Objects tab), where a filter is applied to show only this procedure plus all of its dependent dynamic procedures. You will need to configure all of these procedures manually.

Configuration became easier too, eliminating the previous requirement of running "sp_helptext" on your production SQL Server each time. The "Available Objects" popup (under the "Object Dependencies" tab) shows us the text of the procedure. All we are left to do is identify the tables that are read from (selects, joins), the tables that are updated (DMLs), and/or which other stored procedures it calls. Focus only on the SQL code that is dynamic (the regular code has already been learned by SafePeak automatically).


At the end, configure the checkboxes under "Object Information": whether the object is deterministic (which makes the activation of caching automatic), whether it has transactions inside, whether it changes schema (DDL), and whether it should clear the entire SafePeak cache (best not to do that).

The new version makes it simple to know what needs to be configured (from the dashboard), and the configuration itself is simpler and no longer requires opening your database server in parallel. This is also nice because it can all be done remotely.

2. Database templates

What happens if I have multiple similarly structured databases (like content databases in SharePoint)? Or several servers with identically structured databases?

Well, this is the next improvement in SafePeak: the creation of database templates.

After you configure one database, you can save it as a database template (in the Caching Management area). Then, if you have additional similar databases, SafePeak will recognize their pattern and apply that configuration to them too. You also have control over this under the much-enhanced Settings → Instance Configuration section, and you can export these settings to a file and then import it on another SafePeak installation. A very nice feature.


3. Controlling caching per database was also enhanced. In addition to controlling SQL patterns, in the Settings → Instance Configuration section you can enable and disable the cache for each database. Because SafePeak's licensing model is based on cached databases, this makes it very simple to control.

4. SafePeak adds near real-time analytics of SQL Server usage


SafePeak version 2.1 gives the DBA new abilities to see and analyze all the SQL queries that were executed at any time. While previously we could see only cacheable "read" SQL patterns, the new SafePeak allows investigation of all executed SQL patterns, including DMLs, DDLs, DCLs, etc. It also makes it possible to check exactly what happened within a chosen timeframe. The new version creates analytical statistics every 15 minutes (reduced from 1 hour), which allows better investigation of suspicious events. The SafePeak team also changed the column list and added a "DB" column, so you immediately see the queries' response time as they run on SQL Server, their performance from cache, and the change in average performance with SafePeak.


Actually, this is interesting: just a couple of weeks ago one of my customers had a sudden overload on the SQL Server during evening hours. It is not really clear when it started, when it ended, what happened and why – it all remains a mystery to this day. SafePeak's new analytical abilities could definitely have unveiled the mystery and helped the DBAs save their precious time and get back to normal work.

Summary

I was impressed by SafePeak's first version, but the new features and components make SafePeak a very advanced and comprehensive solution for SQL Server performance, acceleration, and analytics. This post covered only a small part of the overall changes and improvements. I would very much like to hear your findings and suggestions.

SafePeak is a great acceleration solution for users who want immediate results on sites facing performance, scalability, and peak-load challenges.

SafePeak can significantly improve response times by reducing network round-trips to the database, decreasing CPU resource usage, and eliminating I/O and storage access.

For all those SQL Server shops – it is definitely worth giving SafePeak a try. You can download a free, fully functional trial at www.safepeak.com/download.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – CE – 3 Links to Performance Tuning Compact Edition

Today, I am going to do an online webcast on how to improve performance for SQL CE. Here are the three articles on which I am going to base my session.

Database Design and Performance (SQL Server Compact Edition)

  • Use Database Denormalization
  • Decide Between Variable and Fixed-length Columns
  • Create Smaller Row Lengths
  • Use Smaller Key Lengths
  • Publication Article Types and Options

Query Performance Tuning (SQL Server Compact Edition)

  • Improve Indexes
  • Choose What to Index
  • Use the Query Optimizer
  • Understand Response Time vs. Total Time
  • Rewrite Subqueries to Use JOIN (see the sketch after this list)
  • Use Parameterized Queries
  • Query Only When You Must
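As a quick illustration of the "Rewrite Subqueries to Use JOIN" item above, here is a hedged sketch with hypothetical tables:

-- Subquery form
SELECT o.OrderID
FROM Orders o
WHERE o.CustomerID IN (SELECT c.CustomerID FROM Customers c WHERE c.Region = 'West')
-- Equivalent JOIN form, which SQL CE's optimizer often handles better
SELECT o.OrderID
FROM Orders o
INNER JOIN Customers c ON c.CustomerID = o.CustomerID
WHERE c.Region = 'West'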

Optimizing Connectivity (SQL Server Compact Edition)

  • Synchronization Time-Out Properties
  • Time-out Optimization

I found the above articles very interesting and useful, and I am looking forward to your opinions on them.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – TempDB in RAM for Performance

Performance tuning is always the most interesting subject when we talk about software applications. While I was recently discussing performance tuning with my friend, we started to talk about best practices for TempDB. I also pointed my friend to the excellent blog post written by Cindy Gross on the subject: Compilation of SQL Server TempDB IO Best Practices. One of the discussion points was that we should put TempDB on the drive that gives the best performance.

But my friend suddenly asked, "what if we put TempDB in RAM, as RAM is the fastest drive?"

Good question! This was supported in earlier versions of SQL Server (I think in 6.5). In later versions of SQL Server, the whole algorithm was rewritten more efficiently, and it is no longer much of a requirement. However, I still prefer putting TempDB on the drive with the least IO pressure.
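For reference, relocating TempDB to a less loaded drive is a simple one-time operation; a hedged sketch follows (the path is hypothetical, and the change takes effect only after SQL Server restarts):

-- Move TempDB data and log files (hypothetical path; requires a restart)
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf')
GO
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf')
GO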

Afterwards, he introduced me to a tool which can create a drive from RAM. Well, that was an interesting thought, but then again, I would not go for this solution, as it is not natively provided with SQL Server. For me, the SQL Server engine knows the right thing to do and how to maximize the usage of RAM; taking RAM away from the OS and from other applications may not be a good idea. There are better optimization tricks for TempDB than this option.

I would like to ask my readers: who among you uses this method in a production environment? What is your experience?

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQLAuthority News – Win Surprise Gift at TechED 2011 Sessions – Wait Types and Extended Events

A quick note for all – if you are attending my TechEd sessions today, here are a few notes for you.

Session Time

Sessions Date: March 25, 2011

Understanding SQL Server Behavioral Pattern – SQL Server Extended Events
Date and Time: March 25, 2011 12:00 PM to 01:00 PM

SQL Server Waits and Queues – Your Gateway to Perf. Troubleshooting
Date and Time: March 25, 2011 04:15 PM to 05:15 PM

Surprise Gifts

If you are attending the sessions – rest assured – a few of you are going to get a very interesting surprise gift. A good quality one! To win, you just have to walk into the sessions.

Who Should Attend?

YOU! This session is for everybody who wants to learn something new in this area. It does not matter whether you work with SQL or not: if you are using an application with a database back-end, this session is for you.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Tomorrow 2 Sessions on Performance Tuning at TechEd India 2011 – March 25, 2011

Tomorrow is the third day of TechED India 2011 at Bangalore. I will be speaking in two very interesting sessions. If you are a developer or a database administrator, or just want to learn something new and interesting, I suggest you attend my two sessions tomorrow. Here are the details of the sessions.

Sessions Date: March 25, 2011

Here are the abstracts of the sessions:

Understanding SQL Server Behavioral Pattern – SQL Server Extended Events
Date and Time: March 25, 2011 12:00 PM to 01:00 PM

History repeats itself! SQL Server 2008 introduced a very powerful, yet little-used feature called Extended Events. This advanced session will teach experienced administrators capabilities that were not possible before. From T-SQL errors to CPU bottlenecks, login errors to deadlocks – Extended Events can detect it for you. Understanding the pattern of events can prevent future mistakes.

SQL Server Waits and Queues – Your Gateway to Perf. Troubleshooting
Date and Time: March 25, 2011 04:15 PM to 05:15 PM

Just like a horoscope, SQL Server Waits and Queues can reveal your past, explain your present and predict your future. SQL Server Performance Tuning uses the Waits and Queues as a proven method to identify the best opportunities to improve performance. A glance at Wait Types can tell where there is a bottleneck. Learn how to identify bottlenecks and potential resolutions in this fast paced, advanced performance tuning session.

Here is the video which my wife shot while I was preparing for the sessions.

Reference: Pinal Dave (http://blog.SQLAuthority.com)