SQL SERVER – Basic Statistics Maintenance – Notes from the Field #083

[Note from Pinal]: This is a 83rd episode of Notes from the Field series. Maintenance of the database is very critical activity and I have always seen DBA taking it very seriously. There is a only one problem – there is no single solution or guidance for how to maintain the database. Everybody has their own opinion and way to do different tasks. Statistics is one of the most important aspect of the database. The performance of entire application can depend on statistics, as it can help SQL Engine with intelligence to execute optimal plan.

In this episode of the Notes from the Field series database expert Stuart Ainsworth explains about basic statistics maintenance.


SQL SERVER - Basic Statistics Maintenance - Notes from the Field #083 stuart

Statistic maintenance is an important (but often overlooked) aspect of performance tuning for SQL Server.  The query optimizer relies on distribution statistics to determine how the query will be executed, with a particular emphasis on SEEKS vs SCANS as well as estimates of effort needed (the cost threshold for parallelism).  Out-of date stats can impact performance significantly.

Luckily, the default setting for most databases covers most database performance scenarios; SQL Server offers three basic settings for statistics maintenance:

  • Auto Create Statistics – SQL Server will create statistics during an index creation script, or when deemed necessary to satisfy a query; enabled by default.
  • Auto Update Statistics – SQL Server will update statistics when it deems them to be outdated; enabled by default.
  • Auto Update Statistics Asynchronously – When enabled, SQL Server will updated statistics after the execution of a query if it determines that an update is necessary; if disabled, the statistics will be updated first, and then the query executed. This setting is disabled by default, and there’s some controversy over whether or not it’s useful.

SQL SERVER - Basic Statistics Maintenance - Notes from the Field #083 notes-83-1

For the most part, SQL Server does a good job of maintaining statistics if these options are left with the defaults; however, statistics can still become stale over time if the data is updated at a slow rate of change.  If your statistics are more than a year old in your database, it’s probably time to do a refresh.

But how do you tell when your stats are out of date? There’s a catalog view in SQL Server called sys.stats that will give you a lot of information about statistics, but it’s very detailed; data is collected down to the column level, which may be overwhelming if you have lots of databases.  I suggest starting at a higher level, by taking a look at how out of date statistics are across all of your databases.   I use the following query to help me quickly identify troublesome databases:

/*checks last updated stats date for all databases; calculates a percentage of stats updated within the last 24 hours.
useful to determine how out-of-date statistics are.  Also identifies if auto updatestates are on and if asynchronous updates
are enabled.
*/
CREATE TABLE #dbs
(
database_name VARCHAR(100)
,
oldest_stats DATETIME
, newest_stats DATETIME
, percent_update_within_last_day DECIMAL(5, 2)
,
is_auto_update_stats_on BIT
, is_auto_update_stats_async_on BIT
)
DECLARE @dynsql NVARCHAR(4000) = 'use ?
;
if db_id() >4
BEGIN
WITH    x AS ( SELECT   STATS_DATE(s.object_id, stats_id) date_updated
FROM     sys.stats s
JOIN sys.tables t ON t.object_id = s.object_id
WHERE    t.is_ms_shipped = 0
) ,
x1
AS ( SELECT   MIN(date_updated) AS oldest_stats
, MAX(date_updated) AS newest_stats
FROM     x
)
SELECT DB_NAME() database_name
, oldest_stats
, newest_stats
, SUM(CASE WHEN DATEDIFF(d, date_updated, newest_stats) <=1 THEN 100.00 ELSE 0.00 END)/COUNT(*) "percent_update_within_last_day"
, d.is_auto_update_stats_on
, d.is_auto_update_stats_async_on
FROM    x
CROSS JOIN x1
CROSS JOIN sys.databases d
WHERE d.database_id = db_id()
GROUP BY oldest_stats
, newest_stats
, d.is_auto_update_stats_on
, d.is_auto_update_stats_async_on
END
'
INSERT  INTO #dbs
( database_name
, oldest_stats
, newest_stats
, percent_update_within_last_day
, is_auto_update_stats_on
, is_auto_update_stats_async_on
)
EXEC sp_MSforeachdb @dynsql
SELECT  *
FROM    #dbs d
DROP TABLE #dbs

Results will look similar to the following:

SQL SERVER - Basic Statistics Maintenance - Notes from the Field #083 notes-83-2

Looking at the results, if I see that the oldest database stats are more than a year old, it’s s a pretty good indicator that statistics are not being maintained by some form of ongoing maintenance operation.  If the defaults are not being used, that’s also something that needs to be investigated.

The percent_update_within_last_day is also a good trigger for me to investigate a potential issue; if less than 20% of the statistics in a database were updated in the last 24 hours, the odds of a less-than-optimal execution plan increase significantly.  If the tables in the database are large, it may take several million rows of changed data to trigger a refresh.

To refresh outdated statistics, I recommend Ola Hallengren’s maintenance scripts; they offer a lot of flexibility for developing a customized index and statistics maintenance plan, including the ability to update statistics on a scheduled basis, and focus on updating statistics that need to be refreshed.  Basic maintenance (such as this) can help prevent performance problems from occurring.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Notes from the Field, SQL Server Security, SQL Statistics
Previous Post
SQL SERVER – Finding Out Identity Column Without Using Column Name
Next Post
SQL SERVER – FIX – Error 3702, Level 16, State 3 – Cannot Drop Database “DB_Name” Because it is Currently in Use

Related Posts

9 Comments. Leave new

  • Thanks, Stuart and Pinal (and greetings to you both).

    Stuart, I find that the query you offer fails because of the reference in the first query to is_temporary. that’s not a column in the select, which returns just date_updated. Is that really working for you?

    In case someone else may search this page for the error (if they get it to), it’s:

    Invalid column name ‘is_temporary’.
    Msg 207, Level 16, State 1, Line 7

    When I remove that WHERE clause, it then at least works. That said, I do get a warning:

    Warning: Null value is eliminated by an aggregate or other SET operation.

    That occurs once each for the 4 db’s I have. I guess some col in one of those that query doing a GROUP By has a null value causing problem for one of the aggregating functions.

    But it does work otherwise, so again thanks for the nifty code! :-)

    Reply
  • Thanks for replying. Pinal, And I was having a brain fart that day, in thinking that there was a connection between cols in the WHERE clause and cols in the SELECT. (Some old DB’s used to have that limit. Just slipped into old thinking.

    And I should have checked to see if it was a version-specific thing. I suspect the server I was testing it on is in 2008 or its compat mode.

    At least the comments will stand, to help others coming along who may be in the same boat. If Stuart may tweak it to somehow make it conditional based on the version, you could freely delete my comments and your reply if you’d like. Thanks, as always, for all you (both) do.

    Reply
  • corporateworldasiseeit
    June 11, 2015 3:29 pm

    I also got same error. To make it run , it simply commented where clause.

    Reply
  • Thanks to you both, Pinal and Stuart, for your responses to the observations shared here, to make the script more widely usable by future readers. :-)

    Reply

Leave a Reply