SQL SERVER – In-Memory Databases and Caching Wars

The In-Memory Database and In-Memory Caching market is heating up and we (the users) are the real winners. Exciting!!!

  • SAP, the giant application vendor but a database newcomer, is pushing its SAP HANA to the market: a standalone in-memory database delivered in an appliance box.
  • Last year IBM released a set of acceleration technologies, code-named BLU (a development code name that stood for “big data, lightning fast, ultra-easy”), adding in-memory capabilities to DB2 and better integration of the solidDB caching engine with DB2.
  • This April (2014) Microsoft released SQL Server 2014 with the new In-Memory OLTP engine embedded inside. Adding an in-memory data engine right inside the most widely used database engine, SQL Server, opens up many great possibilities for us, the engineers building new applications and software solutions.
  • This month Oracle CEO Larry Ellison unveiled the Oracle Database In-Memory engine, to be available next month. This is a direct challenge to IBM, Microsoft and SAP in the burgeoning in-memory database market, with technologies that can deliver dramatic performance boosts for high-scale online transactions and complex data processing.

The new in-memory database engines enable users to build new apps with a new scale of speed and processing capabilities. Hurrah!!

But the new in-memory database engines come at a price: a long list of data limitations, syntax limitations, use-case limitations, migration complications and so on.

Another fast-growing approach is adding a Smart Data Caching layer, where SafePeak (www.SafePeak.com) appears to be the visionary leader. SafePeak’s automated caching solution for SQL Server based applications enables easy deployment of a smart caching layer between the application and the database. SafePeak automatically caches the results of SQL queries and stored procedures in memory, immediately covering all of the application’s queries. The key features that make it an interesting solution to explore when performance is discussed are:

1) Automation and ease of use;
2) No code changes;
3) Cache awareness for any transactions and data changes;
4) Wide range of caching options and cache eviction strategies.

The SQL Analytics engine embedded in SafePeak is an excellent bonus, working all the time and collecting information about every query. It’s a bit like the well-known SQL Profiler, but without the performance drop, and it converts the raw, hard-to-read data into advanced analytics.

When it comes to applicability to existing production applications, SafePeak’s strategy is to be easy to use for already-in-production operational read-write (but read-intensive) applications, and to be optimized for virtual and cloud environments. For in-production systems, this gives SafePeak an interesting advantage over the in-memory technologies of database giants.

Any way you look at it, these in-memory developments are exciting for us.
Comparing the main in-memory options for SQL Server applications, a question I think is worth exploring is this:

When would SQL Server 2014 In-Memory OLTP fit your needs better than SafePeak’s Caching, and vice versa?

The answer depends, of course, on the user’s needs and on the state of the application and database (greenfield, in production, or a commercial off-the-shelf application).

SafePeak caching focuses on “READs”, especially REPETITIVE READS, so it fits classic multi-user applications well.

If your need is mostly about speeding up WRITES or long-running COMPLEX PROCESSING queries, then having a full table (or tables) in memory (as in SQL Server 2014 In-Memory OLTP) sounds like the better option.

However, how applicability depends on the state of the application/database is very interesting too. I think it splits into 3 use cases:

1) In “greenfield” applications still in the development cycle
SQL Server 2014 In-Memory OLTP is a great option to explore. Knowing how the 2014 in-memory tables and natively compiled stored procedures work, and what their various limitations are, gives development managers the tools to plan and develop the new application around the available features. In such cases a few important tables (and the procedures accessing them) can become in-memory, while the rest remain regular disk-based tables; a sketch of what that could look like follows below.
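
For illustration, here is a minimal sketch of what such an in-memory table could look like in SQL Server 2014 (the table and column names are hypothetical, and it assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup):

-- Hypothetical memory-optimized table for a small, very hot data set
CREATE TABLE dbo.SessionCache
(
[SessionID] INT NOT NULL
PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
[UserName] NVARCHAR(100) NOT NULL,
[LastSeen] DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO

Stored procedures that touch only such tables can additionally be natively compiled for further gains.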

2) In live production homegrown applications that need to speed up performance without major redesign
SQL Server 2014 In-Memory OLTP may be a complex option to apply, due to its many current limitations and the requirement to redesign and reprogram parts of your application. And then there is the migration process… For this reason, SafePeak Cache may be a better choice.

3) In commercial 3rd-party / off-the-shelf applications
Unfortunately, SQL Server 2014 In-Memory OLTP requires new code and design changes, so in most cases the user cannot apply the new in-memory feature. SafePeak works without any need to touch the application, so SafePeak wins here.

Here’s a quick comparison of the advantages and disadvantages of Microsoft’s In-Memory OLTP option versus SafePeak’s caching:

1) SQL Server 2014 works very well for write-intensive processes and large data processing. SafePeak’s main gain is on repetitive reads (queries and procedures).

2) SQL Server 2014 has your full table in memory, so every query will be fast, even the first time it is called.
SafePeak, for its part, has a module called Cache Warm-Up, which enables you to “pre-fetch data into cache”.

3) SQL 2014 In-Memory OLTP is fully embedded inside the SQL Server, so no extra layer is needed.

SafePeak’s advantages:

1) SafePeak’s #1 advantage is time to deployment: no code changes, and it takes only a few hours to deploy and configure;

2) SafePeak immediately covers 100% of the application, not only the tables designed to be in-memory;

3) No SQL syntax limitations: all tables, all existing code, any SQL syntax and any stored procedure can be cached, most of it automatically and the rest with easy configuration.

Bottom line: only the customer can say which technology will best fit his or her situation. But the in-memory competition will definitely continue to benefit the industry, and mostly us: the DBAs, programmers and application managers!

Interested in trying SQL Server 2014 In-Memory OLTP?

Download SQL Server 2014 from the Microsoft site.
And see my post “Introduction and Quick-start guide to SQL Server 2014 In-Memory OLTP”.

Interested in trying SafePeak?

SafePeak is available for free trial: www.SafePeak.com/Download
The guys there love helping users with free training and setup assistance, allowing users to get results quicker and better – and allowing the company to learn real customers’ needs and challenges (if you ask me, the right way to build a product and a company).

What In-Memory category fits you? Let us know. 

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Keeping MSDB System Database Lean and Fit – Notes from the Field #033

[Notes from Pinal]: I have been working with SQL Server for almost 10 years. I have seen that the most ignored databases on any database server are the system databases. We hardly pay attention to them. Just for the sake of it, try to run a query against your master database and see how many user objects have already been created there. I am sure that unless you have strict policies in place, there will be a handful of user tables there. Similarly, ask a DBA what the exact importance of the MSDB database is, and you will find that many DBAs cannot answer. I often talk about this with my friend Tim. I asked him if he could help us out with tricks to keep the MSDB system database at its optimal settings.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 33rd episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains in very simple words how to keep the MSDB system database at its optimal settings.

When doing a review of a client’s database server, one of the checks I perform is to see how much backup history is being kept within MSDB. SQL Server does a fantastic job logging all the backups and restores we perform, and depending on how your routine is configured, you are probably only keeping enough backup files to meet your SLAs. What most DBAs or admins don’t think about is all the backup and restore history being retained within MSDB.

The tables within msdb that hold this history include:

  • restorefile
  • restorefilegroup
  • restorehistory
  • backupfile
  • backupfilegroup
  • backupmediafamily
  • backupmediaset
  • backupset

Microsoft SQL Server has a system stored procedure that helps us maintain this history; however, it is not scheduled to run by default. We have to create a process to run the stored procedure on a routine basis. You will need to know how much history is enough to keep for your organization. I have seen this range from 90 days to 14 days.
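
To get a feel for how much history your msdb is currently holding, and how far back it goes, a quick check like the following can help (a minimal sketch that looks only at the backupset table):

-- Rough check of how much backup history msdb is holding
SELECT COUNT(*) AS BackupSetRows,
MIN(backup_start_date) AS OldestBackup,
MAX(backup_start_date) AS NewestBackup
FROM msdb.dbo.backupset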

The stored procedure is sp_delete_backuphistory, within the MSDB database.

This stored procedure takes a single date parameter: the cutoff date for the history to keep. Any history older than the date provided will be deleted.

Over the years I have found MSDB ranging from several megabytes to nearly 20 GB. I recall one client who had a database server with hundreds of databases that were being log shipped to a secondary with 15-minute log backups, and the system had been in place for many years. MSDB had grown to almost 20 GB. The client complained that if they ever tried to restore a database within the GUI, it would lock up SSMS.

A simple script you can execute to purge history older than 30 days is below.

USE msdb
GO
DECLARE @CutOffDate DATETIME
SET @CutOffDate = CONVERT(VARCHAR(10), DATEADD(dd, -30, GETDATE()), 101)
EXEC sp_delete_backuphistory @CutOffDate
GO
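
If you want to run this purge on a routine basis, one option is a SQL Server Agent job. The sketch below is a hypothetical example (the job name, schedule and 30-day retention are made up for illustration), not a prescribed setup:

USE msdb
GO
-- Create a job with a single T-SQL step that purges history older than 30 days
EXEC dbo.sp_add_job @job_name = N'Purge Backup History'
EXEC dbo.sp_add_jobstep @job_name = N'Purge Backup History',
@step_name = N'Delete old backup history',
@subsystem = N'TSQL',
@command = N'DECLARE @CutOff DATETIME = DATEADD(dd, -30, GETDATE()); EXEC msdb.dbo.sp_delete_backuphistory @CutOff;'
-- Run it daily at 01:00
EXEC dbo.sp_add_jobschedule @job_name = N'Purge Backup History',
@name = N'Nightly', @freq_type = 4, @freq_interval = 1, @active_start_time = 010000
-- Target the local server
EXEC dbo.sp_add_jobserver @job_name = N'Purge Backup History'
GO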

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SQL Server High Availability Options – Notes from the Field #032

[Notes from Pinal]: When it comes to High Availability or Disaster Recovery, I often see people getting confused. There are so many options available that when a user has to select the most optimal solution for their organization, they are often unsure. Most people know the salient features of the various options, but when they have to pick one single option to use, they are often not sure which one to choose. I like to ask my dear friend Tim all these kinds of complicated questions. He has a skill for making a complex subject very simple and easy to understand.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 32nd episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains in very simple words how to choose the best High Availability option for your SQL Server.

A common challenge we face when working with SQL Server is providing the maximum uptime possible. To meet these demands we have to design a solution that provides High Availability (HA). Depending on your edition, Microsoft SQL Server provides you with several options: database mirroring, log shipping, failover clustering, availability groups or replication.

Each possible solution comes with pros and cons. No single solution fits all scenarios, so understanding which solution meets which need is important. As with anything IT related, you need to fully understand your requirements before trying to solve the problem. When it comes to building an HA solution, you need to understand which risk your organization needs to mitigate the most.

I have found that most are concerned about hardware failure and OS failures. Other common concerns are data corruption or storage issues.  For data corruption or storage issues you can mitigate those concerns by having a second copy of the databases. That can be accomplished with database mirroring, log shipping, replication or availability groups with a secondary replica.  Failover clustering and virtualization with shared storage do not provide redundancy of the data.

I recently created a chart outlining some pros and cons of each of the technologies that I posted on my blog.

[Chart: pros and cons of the SQL Server High Availability options]

I like to use this chart to help illustrate how each technology provides a certain set of benefits. Each of these solutions carries with it some level of cost and complexity. As database professionals we should all be familiar with these technologies so we can make the best possible choice for our organization.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Iridium I/O – SQL Server Deduplication that Shrinks Databases and Improves Performance

Database performance is a common problem for SQL Server DBAs. It seems like we spend more time on performance than on just about anything else. In many cases, we use scripts or tools that point out performance bottlenecks, but we don’t have any way to fix them. For example, what do you do when you need to speed up a query that is already tuned as well as possible? Or what do you do when you aren’t allowed to make changes to a database supporting a purchased application?

Iridium I/O for SQL Server was originally built at Confio Software (makers of Ignite) because DBAs kept asking for a way to actually fix performance instead of just pointing out performance problems. The technology is certified by Microsoft and was so promising that it was spun out into a separate company that is now run by the Confio founder/CEO and technology management team.

Iridium uses deduplication technology to both shrink databases and boost I/O performance. It is intriguing to see it work. It will deduplicate a live database as it is running transactions. You can watch the database get smaller while user queries are running.

Iridium is a simple tool to use. After installing the software, you click an “Analyze” button which will spend a minute or two on each database and estimate both your storage and performance savings.  Next, you click an “Activate” button to turn on Iridium I/O for your selected databases.  You don’t need to reboot the operating system or restart the database during any part of the process.

As part of my test, I also wanted to see if there would be an impact on my databases when Iridium was removed.  The ‘revert’ process (bringing the files back to their SQL Server native format) was executed by a simple click of a button, and completed while the databases were available for normal processing.

I was impressed and enjoyed playing with the software, and I encourage all of you to try it out. Here is the link to the website to download Iridium for free.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

Update: This blog post is based on SafePeak, which is available for free download.

Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. SafePeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications.

I’d like to look more closely at some of these options, as some of these capabilities could help you address lagging database performance on your systems.

To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”.

Basic Caching

Basic Caching (or the stale and static cache) is the ability to put the results of a query into cache for a certain period of time. It is based on TTL, or Time-to-Live, and the cached results are designed to stay in cache no matter what happens to the data. For example, although the actual data can be modified by DML commands (update/insert/delete), the cache will still hold the same obsolete query results. In other words, Basic Caching is really a static/stale cache. As you can tell, this approach has its limitations.

Dynamic Caching

Dynamic Caching (or the non-stale cache) is the ability to put the results of a query into cache while maintaining transaction awareness in the cache, watching for possible data modifications. The modifications can come as a result of:

  • DML commands (update/insert/delete),
  • indirect modifications due to triggers on other tables,
  • executions of stored procedures with internal DML commands
  • complex cases of stored procedures with multiple levels of internal stored procedures logic.

When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) becomes the actual data update commands.

Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper.

SafePeak: A comprehensive and versatile caching platform

SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and data managers to achieve excellent performance acceleration and application scalability for a wide range of business cases and applications.

  • Automated caching of SQL Queries: Fully or semi-automated caching of all “read” SQL queries, containing any type of data, including BLOBs, XML and text as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, and identifies directly and indirectly accessed tables, views, functions and stored procedures;
  • Automated caching of Stored Procedures: Fully or semi-automated caching of all “read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s Metadata-Learning process and their SQL schemas are parsed, resulting in a full understanding of the underlying code and object dependencies (tables, views, functions, sub-procedures), enabling automated or semi-automated (manually reviewed and activated by a mouse click) cache activation, with full understanding of the transaction logic for real-time cache invalidation;
  • Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs);
  • Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases;
  • Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching;
  • Semi-Automated Caching: SQL Patterns categorized as “Read and Non-deterministic” are patterns of SQL queries and stored procedures that contain references to non-deterministic functions, like getdate() (see the sketch after this list). Such SQL Patterns are reviewed by the SafePeak administrator, and usually most of them are activated manually for caching (point-and-click activation);
  • Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting;
  • Semi Dynamic Caching: A manual cache configuration option enabling you to reduce the sensitivity of specific SQL Patterns to “write” commands on certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archived order details), or when the application’s sensitivity to fresh data is not critical and the data can be stale for a short period of time (gaining better performance and reduced load);
  • Scheduled Cache Eviction: A manual cache configuration option enabling you to schedule SQL Pattern cache eviction at certain time(s) during the day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: “select customers whose birthday is today”, a query using the getdate() function, which can and should be cached, but whose data stays relevant only until 00:00 (midnight);
  • Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to overly complex dynamic SQL or unfamiliar syntax) are flagged as “Dynamic Objects” with the highest transaction safety settings (such as full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that matter for cache effectiveness and provides an easy configuration interface, allowing you to improve cache hits and reduce global cache evictions. Usually this is the first configuration step in a deployment;
  • Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure;
  • Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedures (with lists of parameters) for periodic automated pre-fetching and caching. An advanced tool that lets you handle rarer but very performance-sensitive queries by pre-fetching them into cache, ensuring fast data access for users;
  • Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with a heat-map of database objects and SQL Patterns. The performance-driven configuration helps you focus on the most important settings that bring the highest performance gains. SafePeak SQL Analytics allows continuous performance monitoring and analysis, and easy identification of bottlenecks in both real-time and historical data;
  • Cloud Ready: Available for instant deployment on Amazon Web Services (AWS).
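
To make the “deterministic” vs. “non-deterministic” distinction above concrete, here is a hypothetical pair of queries (table and column names are made up): the first is a plain read that such a caching layer can treat as automatically cacheable, while the second references getdate() and would need the manual review described above.

-- Deterministic read: same inputs return the same results until the underlying data changes
SELECT OrderID, OrderTotal
FROM dbo.Orders
WHERE CustomerID = 42

-- Non-deterministic read: references GETDATE(), so the result depends on when it runs
-- (the “customers whose birthday is today” example from the list above)
SELECT CustomerID, FirstName
FROM dbo.Customers
WHERE MONTH(BirthDate) = MONTH(GETDATE())
AND DAY(BirthDate) = DAY(GETDATE())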

As you can see, there are many options for configuring SafePeak’s SQL Server database and application acceleration caching technology to best fit many situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to get you started.

You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud.

Reference: Pinal Dave (http://blog.sqlauthority.com)

MySQL – Scalability on Amazon RDS: Scale out to multiple RDS instances

Today, I’d like to discuss getting better MySQL scalability on Amazon RDS.

The question of the day: “What can you do when a MySQL database needs to scale write-intensive workloads beyond the capabilities of the largest available machine on Amazon RDS?”

Let’s take a look.

In a typical EC2/RDS set-up, users connect to app servers from their mobile devices and tablets, computers, browsers, etc.  Then app servers connect to an RDS instance (web/cloud services) and in some cases they might leverage some read-only replicas.


Figure 1. A typical RDS instance is a single-instance database, with read replicas.  This is not very good at handling high write-based throughput.

As your application becomes more popular you can expect an increasing number of users, more transactions, and more accumulated data.  User interactions can become more challenging as the application adds more sophisticated capabilities. The result of all this positive activity: your MySQL database will inevitably begin to experience scalability pressures.

What can you do?

Broadly speaking, there are four options available to improve MySQL scalability on RDS.

1. Larger RDS Instances – If you’re not already using the maximum available RDS instance, you can always scale up to larger hardware: bigger CPUs, more compute power, more memory, et cetera. But the largest available RDS instance is still limited, and the big ones get expensive.

“High-Memory Quadruple Extra Large DB Instance”:

  • 68 GB of memory
  • 26 ECUs (8 virtual cores with 3.25 ECUs each)
  • 64-bit platform
  • High I/O Capacity
  • Provisioned IOPS Optimized: 1000Mbps

2. Provisioned IOPS – You can buy provisioned IOPS to get higher throughput at the I/O level.

However, there is a hard limit: a maximum instance size and a maximum number of provisioned IOPS you can buy from Amazon, and you simply cannot scale beyond these hardware specifications.

3. Leverage Read Replicas – If your application permits, you can leverage read replicas to offload some reads from the master database. But there are a limited number of replicas you can utilize, and using them generally requires some modifications to your existing application.

And read-replicas don’t help with write-intensive applications.

4. Multiple Database Instances – Amazon offers a fourth option:

“You can implement partitioning, thereby spreading your data across multiple database instances.” (Link)

However, Amazon does not offer any guidance or facilities to help you with this. “Multiple database instances” is not an RDS feature.  And Amazon doesn’t explain how to implement this idea.

In fact, when asked, this is the response on an Amazon forum:

Q: Is there any documents that describe the partition DB across multiple RDS?
I need to use DB with more 1TB but exist a limitation during the create process, but I read in the any FAQ that you need to partition database, but I don’t find any documents that describe it.

A: “DB partitioning/sharding is not an official feature of Amazon RDS or MySQL, but a technique to scale out database by using multiple database instances. The appropriate way to split data depends on the characteristics of the application or data set. Therefore, there is no concrete and specific guidance.”

So now what?

The answer is to scale out with ScaleBase.

Amazon RDS with ScaleBase: What you get – MySQL Scalability!

ScaleBase is specifically designed to scale out a single MySQL RDS instance into multiple MySQL instances.

Critically, this is accomplished with no changes to your application code.  Your application continues to “see” one database.   ScaleBase does all the work of managing and enforcing an optimized data distribution policy to create multiple MySQL instances.

With ScaleBase, data distribution, transactions, concurrency control, and two-phase commit are all 100% transparent and 100% ACID-compliant, so applications, services and tooling continue to interact with your distributed RDS as if it were a single MySQL instance.

The result: now you can cost-effectively leverage multiple MySQL RDS instance to scale out write-intensive workloads to an unlimited number of users, transactions, and data.

Amazon RDS with ScaleBase: What you keep – Everything!

And how does this change your Amazon environment?

1. Keep your application, unchanged – There is no change to your application development life cycle at all. You still use your existing development tools, frameworks and libraries. Application quality assurance and testing cycles stay the same. And, critically, you stay with an ACID-compliant MySQL environment.

2. Keep your RDS value-added services – The value-added services that you rely on are all still available. Amazon will continue to handle database maintenance and updates for you. You can still leverage High Availability via Multi-AZ. And, if it benefits your application throughput, you can still use read replicas.

3. Keep your RDS administration – Finally the RDS monitoring and provisioning tools you rely on still work as they did before.

With your one large MySQL instance now split into multiple instances, you can actually use less expensive, smaller RDS instances and continue to see better database performance.


Amazon RDS is a tremendous service, but it doesn’t offer solutions to scale beyond a single MySQL instance. Larger RDS instances get more expensive, and when you max out on the available hardware, you’re stuck. Amazon recommends scaling out your single instance into multiple instances for transaction-intensive apps, but offers no services or guidance to help you. This is where ScaleBase comes in to save the day.

It gives you a simple and effective way to create multiple MySQL RDS instances, while removing all the complexities typically caused by “DIY” sharding, and with no changes to your applications.

With ScaleBase you continue to leverage the AWS/RDS ecosystem: commodity hardware and value-added services like read replicas, Multi-AZ, maintenance/updates, and administration with monitoring and provisioning tools.


If you’re curious to try ScaleBase on Amazon, it can be found here – Download NOW.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – DMV to Identify Incremental Statistics – Performance improvements in SQL Server 2014 – Part 3

This is the third part of the series Incremental Statistics. Here is the index of the complete series.

In the earlier two parts we saw what incremental statistics are and walked through a simple example. In this blog post we will discuss a DMV query which lists all the statistics that are enabled for Incremental Updates.

-- List statistics that are enabled for incremental updates
SELECT OBJECT_NAME(sys.stats.OBJECT_ID) AS TableName,
sys.columns.name AS ColumnName,
sys.stats.name AS StatisticsName
FROM   sys.stats
INNER JOIN sys.stats_columns ON sys.stats.OBJECT_ID = sys.stats_columns.OBJECT_ID
AND sys.stats.stats_id = sys.stats_columns.stats_id
INNER JOIN sys.columns ON sys.stats.OBJECT_ID = sys.columns.OBJECT_ID
AND sys.stats_columns.column_id = sys.columns.column_id
WHERE   sys.stats.is_incremental = 1

If you run the above script against the example built in part 1 and part 2, you will get a result set like the following.

[Screenshot: result set listing the statistics enabled for Incremental Update]

When you execute the above script, it will list all the statistics in your database which are enabled for Incremental Update. The script is very simple and effective. If you have a further improved script, I request you to post it in the comments section and I will post it on the blog with due credit.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2

This is the second part of the series Incremental Statistics. Here is the index of the complete series.

In part 1 we understood what incremental statistics are, and now in this second part we will see a simple example of them. This blog post is heavily inspired by my friend Balmukund’s must-read blog post. If you have a partitioned table and lots of data, this feature can be especially useful.


Here are two things you must know before you start with the demonstrations.

AdventureWorks – For demonstration purposes I have installed AdventureWorks 2012 as AdventureWorks2014 in this demonstration.

Partitions – You should know how partitioning works with databases.

Setup Script

Here is the setup script for creating the partition function, the partition scheme, and the table. We will populate the table from the SalesOrderDetail table in AdventureWorks.

-- Use Database
USE AdventureWorks2014
GO
-- Create Partition Function
-- (the function name IncrStatFn is a reconstructed placeholder; the boundary values are from the original script)
CREATE PARTITION FUNCTION IncrStatFn (INT) AS RANGE LEFT FOR VALUES
(44000, 54000, 64000, 74000)
GO
-- Create Partition Scheme
CREATE PARTITION SCHEME IncrStatSch AS PARTITION IncrStatFn ALL TO ([PRIMARY])
GO
-- Create Table Incremental_Statistics
CREATE TABLE [IncrStatTab](
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL,
[ModifiedDate] [datetime] NOT NULL)
ON IncrStatSch(SalesOrderID)
GO
-- Populate Table
INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID],
[CarrierTrackingNumber], [OrderQty], [ProductID],
[SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate])
SELECT     [SalesOrderID], [SalesOrderDetailID],
[CarrierTrackingNumber], [OrderQty], [ProductID],
[SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate]
FROM       [Sales].[SalesOrderDetail]
WHERE      SalesOrderID < 54000

Check Details

Now we will check the partition details of the table IncrStatTab.

-- Check the partition
SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('IncrStatTab')

[Screenshot: sys.partitions rows for IncrStatTab after the initial load]

You will notice that only a few of the partitions are filled with data and all the remaining partitions are empty.

Now we will create statistics on the table on the column SalesOrderID.

However, here we will add one more keyword: INCREMENTAL = ON. Please note this is a new keyword and feature added in SQL Server 2014; it did not exist in earlier versions.

-- Create Statistics (the statistics name IncrStatTab_Stat is a reconstructed placeholder)
CREATE STATISTICS IncrStatTab_Stat
ON [IncrStatTab] (SalesOrderID)
WITH FULLSCAN, INCREMENTAL = ON

Now that we have successfully created the statistics, let us check the statistics histogram of the table.

[Screenshot: statistics histogram after the initial load]

Now let us once again populate the table with more data. This time the data goes into different partitions than the ones populated earlier.

-- Populate Table
INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID],
[CarrierTrackingNumber], [OrderQty], [ProductID],
[SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate])
SELECT     [SalesOrderID], [SalesOrderDetailID],
[CarrierTrackingNumber], [OrderQty], [ProductID],
[SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate]
FROM       [Sales].[SalesOrderDetail]
WHERE      SalesOrderID > 54000

Let us check the status of the partitions once again with the following script.

-- Check the partition
SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('IncrStatTab')

[Screenshot: sys.partitions rows after the second load]

Statistics Update

Now here is where the new feature comes into action. Previously, if we had to update the statistics, we would have to FULLSCAN the entire table irrespective of which partition got the data.

However, in SQL Server 2014 we can specify exactly which partitions we want to update statistics for. Here is the script for the same.

-- Update Statistics Manually (the statistics name and partition numbers are reconstructed placeholders; the second load filled the higher partitions)
UPDATE STATISTICS [IncrStatTab] (IncrStatTab_Stat) WITH RESAMPLE ON PARTITIONS (3, 4, 5)

Now let us check the statistics once again.

-- Show Statistics (the statistics name is a reconstructed placeholder)
DBCC SHOW_STATISTICS ('IncrStatTab', IncrStatTab_Stat) WITH HISTOGRAM

[Screenshot: statistics histogram after the partition-level update]

Upon examining the statistics histogram, you will notice that the distribution has changed and there are far more rows in the histogram.


The new Incremental Statistics feature is indeed a boon for scenarios where there are partitions and statistics need to be updated frequently on those partitions. In earlier versions, to update statistics one had to do a FULLSCAN of the entire table, which wasted a lot of resources. With the new feature in SQL Server 2014, only those partitions which have significantly changed need to be specified in the script to update statistics.


You can clean up the database by executing the following script.

-- Clean up (the scheme/function names match the reconstructed setup script above)
DROP TABLE [IncrStatTab]
DROP PARTITION SCHEME IncrStatSch
DROP PARTITION FUNCTION IncrStatFn

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What is Incremental Statistics? – Performance improvements in SQL Server 2014 – Part 1

This is the first part of the series Incremental Statistics. Here is the index of the complete series.

Statistics are considered one of the most important aspects of SQL Server performance tuning. You might have often heard the following phrase related to performance tuning:

“Update Statistics before you take any other steps to tune performance”.

Honestly, I have said the above statement many times, and many times I have personally updated statistics before starting any performance tuning exercise. You may agree or disagree with the point, but there is no denying that statistics play an extremely vital role in performance tuning.

SQL Server 2014 has a new feature called Incremental Statistics. I have been playing with this feature for quite a while and I find it very interesting. After spending some time with it, I decided to write about this subject here.

New in SQL Server 2014 – Incremental Statistics

Well, it seems like lots of people want to start using SQL Server 2014’s new Incremental Statistics feature. However, let us first understand what this feature actually does and how it can help. I will try to simplify the feature before I start working on the demo code.

Code for all versions of SQL Server

Here is the code which you can execute on all versions of SQL Server to update the statistics of your table. The keyword you should pay attention to is WITH FULLSCAN. It scans the entire table and builds brand new statistics for you, which SQL Server can then use for better estimation of your execution plan.
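
A minimal form of that statement looks like the following (the table name here is just a placeholder):

-- Rebuild statistics for the whole table by scanning every row
UPDATE STATISTICS [YourTable] WITH FULLSCAN
GO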


Who should learn about this? Why?

If you are using partitions in your database, you should consider implementing this feature. Otherwise, this feature is pretty much not applicable to you. If you are using a single partition and your table data is in a single place, you still have to update your statistics the same way you have been doing.

If you are using multiple partitions, this may be a very useful feature for you. In most cases, users have multiple partitions because they have lots of data in their table. Each partition holds its own subset of the data. It is very common for each partition to be populated separately in SQL Server.

Real World Example

For example, if your table contains data related to sales, you will have plenty of entries in it. It would be a good idea to divide the table into multiple partitions, for example by 3 semesters, 4 quarters or even 12 months. Let us assume that we have divided our table into 12 different partitions. For the month of January our first partition is populated, and for the month of February our second partition is populated. Now assume that you have plenty of data in your first and second partitions, the month of March has just started, and your third partition has started to populate. If, for some reason, you want to update your statistics, what will you do?

In SQL Server 2012 and earlier version

You will just use the WITH FULLSCAN code and update the entire table. That means even though only the third partition has new data, you will still update the entire table. This is a VERY resource-intensive process, as you will be updating the statistics of partitions 1 and 2 where the data has not changed at all.

In SQL Server 2014

You will just update the statistics of Partition 3. There is special syntax where you can now specify which partition you want to update; a small sketch of it is shown below. The effect is that SQL Server smartly merges the new partition-level data with the old statistics and updates the overall statistics without doing a FULLSCAN of your entire table. This has a huge impact on performance.
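
For illustration, a minimal sketch of that partition-level syntax (the table name, statistics name and partition number are placeholders; the working demo is in Part 2 of this series):

-- Resample statistics only for partition 3 instead of scanning the whole table
UPDATE STATISTICS dbo.Sales (Sales_BySalesDate)
WITH RESAMPLE ON PARTITIONS (3)
GO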

Remember that the new feature in SQL Server 2014 does not change anything besides the capability to update a single partition. However, there is one aspect which is indeed attractive. Previously, a statistics update was triggered when 20% of the table data had changed. Now the same threshold is applicable to a single partition. That means if a partition sees a 20% data change, it will also trigger a partition-level statistics update which, when merged into your final statistics, will give you better performance.

In summary

If you are not using a partition, this feature is not applicable to you.

If you are using a partition, this feature can be very helpful to you.


In the next part we will see working code for SQL Server 2014 Incremental Statistics.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Good Value for Page Life Expectancy – Notes from the Field #026

[Notes from Pinal]: In the past, I have been wrong many times, but I was man enough to accept my mistakes and correct myself. Page Life Expectancy is a very similar subject for me. In the past, when I wrote a blog post based on Microsoft’s white paper, I was immediately corrected by SQL experts for my error in judgement and the incorrect information. I accepted my mistake and corrected it. I shared this story with my good friend Tim Radney, and he was kind enough to give me guidance on this subject. I asked him if he could write further on this topic and help people understand this complex subject in simple words.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 26th episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains in very simple words what a good value for Page Life Expectancy (PLE) should be.

When troubleshooting SQL Server performance issues, one of the top places to look is memory. One of the more common methods to check for memory pressure is to look at the memory counter ‘Page Life Expectancy’ (PLE). The value returned is represented in seconds. A very old and dated recommendation is that this value be 300 or higher, which is only 5 minutes. With the standard servers in place today, this value is too low.

What is the proper value to monitor for your server, you may ask? Well, it really depends on the amount of memory allocated to your instance of SQL Server. When working with my clients, the value I start with is (max memory in GB) / 4 * 300. This is the minimum value I would expect to see. On a server with 32 GB allocated to SQL Server this value would be (32/4)*300 = 2400 seconds, or 40 minutes.
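
If you want to compute that starting point directly from the instance’s configured memory, a quick sketch (assuming ‘max server memory (MB)’ has been set to a real value rather than left at the default) is:

-- Suggested minimum PLE = (max server memory in GB) / 4 * 300
SELECT (CAST(value_in_use AS INT) / 1024) / 4 * 300 AS SuggestedMinPLE
FROM sys.configurations
WHERE name = N'max server memory (MB)'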

For a server with 32 GB of RAM, maintaining a PLE value of 2400 or higher is good, but what is equally important is knowing the baseline value for the instance. What is the average value the server has at certain times during the day? If you capture this value and alarm/alert when you have big dips, you will be better equipped to get early detection of potential issues: things like improperly indexed queries, large data pulls, etc.

I blogged a while back and included the query that I run to retrieve Page Life Expectancy: http://timradney.com/2013/03/08/what-is-page-life-expectancy-ple-in-sql-server/
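
For reference, a common way to pull that counter straight from the DMVs looks something like this (a sketch, not necessarily the exact query from Tim’s post; for a named instance the object_name value differs):

-- Current Page Life Expectancy, in seconds, from the buffer manager counters
SELECT [object_name], counter_name, cntr_value AS PLE_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy'
AND [object_name] LIKE N'%Buffer Manager%'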

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)