SQL SERVER – Solution – Puzzle – Statistics are not Updated but are Created Once

Earlier I asked puzzle why statistics are not updated. Read the complete details over here: Statistics are not Updated but are Created Once

In the question I have demonstrated even though statistics should have been updated after lots of insert in the table are not updated.(Read the details SQL SERVER – When are Statistics Updated – What triggers Statistics to Update)

In this example I have created following situation:

  • Create Table
  • Insert 1000 Records
  • Check the Statistics
  • Now insert 10 times more 10,000 indexes
  • Check the Statistics – it will be NOT updated
  • Auto Update Statistics and Auto Create Statistics for database is TRUE

Now I have requested two things in the example 1) Why this is happening? 2) How to fix this issue?

I have many answers – here is the how I fixed it which has resolved the issue for me. NOTE: There are multiple answers to this problem and I will do my best to list all.

Solution:

  • Create nonclustered Index on column City

Here is the working example for the same. Let us understand this script and there is added explanation at the end.

-- Execution Plans Difference
-- Estimated Execution Plan Vs Actual Execution Plan
-- Create Sample Database
CREATE DATABASE SampleDB
GO
USE SampleDB
GO
-- Create Table
CREATE TABLE ExecTable (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
CREATE NONCLUSTERED INDEX IX_ExecTable1
ON ExecTable (City);
GO
-- Insert One Thousand Records
-- INSERT 1
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas'
ELSE 'Houston' END
FROM
sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Select Statement
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City  = 'New York'
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Replace your Statistics over here
DBCC SHOW_STATISTICS('ExecTable', IX_ExecTable1);
GO
--------------------------------------------------------------
-- Round 2
-- Insert One Thousand Records
-- INSERT 2
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas'
ELSE 'Houston' END
FROM
sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Select Statement
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City  = 'New York'
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Replace your Statistics over here
DBCC SHOW_STATISTICS('ExecTable', IX_ExecTable1);
GO
-- Clean up Database
DROP TABLE ExecTable
GO

When I created non clustered index on the column city, it also created statistics on the same column with same name as index. When we populate the data in the column the index is update – resulting execution plan to be invalided – this leads to the statistics to be updated in next execution of SELECT. This behavior does not happen on Heap or column where index is auto created. If you explicitly update the index, often you can see the statistics are updated as well. You can see this is for sure happening if you follow the tell of John Sansom.

John Sansom‘s suggestion:

That was fun!

Although the column statistics are invalidated by the time the second select statement is executed, the query is not compiled/recompiled but instead the existing query plan is reused.

It is the “next” compiled query against the column statistics that will see that they are out of date and will then in turn instantiate the action of updating statistics.

You can see this in action by forcing the second statement to recompile.

SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = ‘New York’ option(RECOMPILE)
GO

Kevin Cross also have another suggestion:

I agree with John. It is reusing the Execution Plan. Aside from OPTION(RECOMPILE), clearing the Execution Plan Cache before the subsequent tests will also work.

i.e., run this before round 2:
————————————————————–
– Clear execution plan cache before next test
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
————————————————————–

Nice puzzle!

Kevin

As this was puzzle John and Kevin both got the correct answer, there was no condition for answer to be part of best practices. I know John and he is finest DBA around – his tremendous knowledge has always impressed me. John and Kevin both will agree that clearing cache either using DBCC FREEPROCCACHE and recompiling each query every time is for sure not good advice on production server. It is correct answer but not best practice.

By the way, if you have better solution or have better suggestion please advise. I am open to change my answer and publish further improvement to this solution. On very separate note, I like to have clustered index on my Primary Key, which I have not mentioned here as it is out of the scope of this puzzle.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Puzzle – Statistics are not Updated but are Created Once

After having excellent response to my quiz – Why SELECT * throws an error but SELECT COUNT(*) does not?I have decided to ask another puzzling question to all of you.

I am running this test on SQL Server 2008 R2. Here is the quick scenario about my setup.

  • Create Table
  • Insert 1000 Records
  • Check the Statistics
  • Now insert 10 times more 10,000 indexes
  • Check the Statistics – it will be NOT updated

Note: Auto Update Statistics and Auto Create Statistics for database is TRUE

Expected Result – Statistics should be updated – SQL SERVER – When are Statistics Updated – What triggers Statistics to Update

Now the question is why the statistics are not updated?

The common answer is – we can update the statistics ourselves using

UPDATE STATISTICS TableName WITH FULLSCAN, ALL

However, the solution I am looking is where statistics should be updated automatically based on algorithm mentioned here.

Now the solution is to ____________________.

Vinod Kumar is not allowed to take participate over here as he is the one who has helped me to build this puzzle.

I will publish the solution on next week. Please leave a comment and if your comment consist valid answer, I will publish with due credit.

Here is the script to reproduce the scenario which I mentioned.

-- Execution Plans Difference
-- Create Sample Database
CREATE DATABASE SampleDB
GO
USE SampleDB
GO
-- Create Table
CREATE TABLE ExecTable (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
-- Insert One Thousand Records
-- INSERT 1
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas'
ELSE 'Houston' END
FROM
sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Display statistics of the table - none listed
sp_helpstats N'ExecTable', 'ALL'
GO
-- Select Statement
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City  = 'New York'
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
 SQL SERVER   Puzzle   Statistics are not Updated but are Created Once
-- Replace your Statistics over here
-- NOTE: Replace your _WA_Sys with stats from above query
DBCC SHOW_STATISTICS('ExecTable', _WA_Sys_00000004_7D78A4E7);
GO
--------------------------------------------------------------
-- Round 2
-- Insert Ten Thousand Records
-- INSERT 2
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego'
WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas'
ELSE 'Houston' END
FROM
sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Select Statement
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City  = 'New York'
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Replace your Statistics over here
-- NOTE: Replace your _WA_Sys with stats from above query
DBCC SHOW_STATISTICS('ExecTable', _WA_Sys_00000004_7D78A4E7);
GO
-- You will notice that Statistics are still updated with 1000 rows SQL SERVER   Puzzle   Statistics are not Updated but are Created Once
-- Clean up Database
DROP TABLE ExecTable
GO
USE MASTER
GO
ALTER DATABASE SampleDB
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
DROP DATABASE SampleDB
GO

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQLAuthority News – Solid Quality Journal – Importance of Statistics

My article on “Important of Statistics” has been published in Solid Quality Journal.

Statistics are a key part of getting solid performance. In this article we will go over the basics of the statistics and various best practices related to Statistics. We will go over various frequently asked questions like when to update statistics and difference between sync and async update of statistics. We will also discuss the pros and cons of the statistics update.

I have answered one very important questions in this article:

Should keep Auto Create Statistics and Auto Update Statistics settings true/on?

Download Importance of Statistics

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Get File Statistics Using fn_virtualfilestats

Quite often when I am staring at my SSMS I wonder what is going on under the hood in my SQL Server. I often want to know which database is very busy and which database is bit slow because of IO issue. Sometime, I think at the file level as well. I want to know which MDF or NDF is busiest and doing most of the work. Following query gets the same results very quickly.

SELECT DB_NAME(vfs.DbId) DatabaseName, mf.name,
mf.physical_name, vfs.BytesRead, vfs.BytesWritten,
vfs.IoStallMS, vfs.IoStallReadMS, vfs.IoStallWriteMS,
vfs.NumberReads, vfs.NumberWrites,
(
Size*8)/1024 Size_MB
FROM ::fn_virtualfilestats(NULL,NULL) vfs
INNER JOIN sys.master_files mf ON mf.database_id = vfs.DbId
AND mf.FILE_ID = vfs.FileId
GO

When you run above query you will get many valuable information like what is the size of the file as well how many times the reads and writes are done for each file. It also displays the read/write data in bytes. Due to IO if there has been any stall (delay) in read or write, you can know that as well.

I keep this handy but have not shared on blog earlier.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Copy Statistics from One Server to Another Server

I was recently working on a performance tuning project in Dubai (yeah I was able to see the tallest tower from the window of my work place). I had a very interesting learning experience there. There was a situation where we wanted to receive the schema of original database from a certain client. However, the client was not able to provide us any data due to privacy issues. The schema was very important because without having an access to underlying data, it was a bit difficult to judge the queries etc. For example, without any primary data, all the queries are running in 0 (zero) milliseconds and all were using nested loop as there were no data to be returned. Even though we had CPU offending queries, they were not doing anything without the data in the tables. This was really a challenge as I did not have access to production server data and I could not recreate the scenarios as production without data.

Well, I was confused but Ruben from Solid Quality Mentors, Spain taught me new tricks. He suggested that when table schema is generated, we can create the statistics consequently. Here is how we had done that:

Once statistics is created along with the schema, without data in the table, all the queries will work as how they will work on production server. This way, without access to the data, we were able to recreate the same scenario as production server on development server.

statsscript1 SQL SERVER   Copy Statistics from One Server to Another Server

statsscript2 SQL SERVER   Copy Statistics from One Server to Another Server

statsscript3 SQL SERVER   Copy Statistics from One Server to Another Server

statsscript4 SQL SERVER   Copy Statistics from One Server to Another Server

statsscript5 SQL SERVER   Copy Statistics from One Server to Another Server

statsscript6 SQL SERVER   Copy Statistics from One Server to Another Server

statsscript7 SQL SERVER   Copy Statistics from One Server to Another Server

When observed at the script, you will find that the statistics were also generated along with the query.

You will find statistics included in WITH STATS_STREAM clause.

statsscript8 SQL SERVER   Copy Statistics from One Server to Another Server

What a very simple and effective script.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQLAuthority News – Statistics and Best Practices – Virtual Tech Days – Nov 22, 2010

I am honored that I have been invited to speak at Virtual TechDays on Nov 22, 2010 by Microsoft. I will be speaking on my favorite subject of Statistics and Best Practices.

This exclusive online event will have 80 deep technical sessions across 3 days – and, attendance is completely FREE. There are dedicated tracks for Architects, Software Developers/Project Managers, Infrastructure Managers/Professionals and Enterprise Developers. So, REGISTER for this exclusive online event TODAY.

statistics SQLAuthority News   Statistics and Best Practices   Virtual Tech Days   Nov 22, 2010

Statistics and Best Practices
Timing: 11:45am-12:45pm
Statistics are a key part of getting solid performance. In this session we will go over the basics of the statistics and various best practices related to Statistics. We will go over various frequently asked questions like a) when to update statistics, b) different between sync and async update of statistics c) best method to update statistics d) optimal interval of updating statistics. We will also discuss the pros and cons of the statistics update. This session is for all of you – whether you’re a DBA or developer!

You can register for this event over here.

If you have never attended my session on this subject I strongly suggest that you attend the event as this is going to be very interesting conversation between us. If you have attended this session earlier, this will contain few new information which will for sure interesting to share with all.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Outer Join Not Allowed in Indexed Views

I recently received an email that contains a question from one of my readers. I have already replied the answer to his email, but I would still like to bring it to your attention and ask if you think I could have done any better with the example I gave.

The question was raised when the email sender read the white paper, Improving Performance with SQL Server 2008 Indexed Views. If you scroll all the way down through the said white paper, there are several questions and answers.

Q: Why can’t I use OUTER JOIN in an Indexed view?

A: Rows can logically disappear from an Indexed view based on OUTER JOIN when you insert data into a base table. This makes the OUTER JOIN view to be increasingly updated, which is relatively difficult to implement. In addition, the performance of the implementation would be slower than for views based on standard (INNER) JOIN.

The reader was confused with my answer and wanted me to explain it further. Here is the example which I have quickly put together to demonstrate the behavior described in the above statement.
USE tempdb
GO
-- Create Two Tables
CREATE TABLE BaseTable (ID1 INT, Col1 VARCHAR(100))
CREATE TABLE JoinedTable (ID2 INT, Col2 VARCHAR(100))
GO
-- Insert Values in Tables
INSERT INTO BaseTable (ID1,Col1)
SELECT 1,'First'
UNION ALL
SELECT 2,'Second'
GO
INSERT INTO JoinedTable (ID2,Col2)
SELECT 1,'First'
UNION ALL
SELECT 2,'Second'
UNION ALL
SELECT 3,'Third'
UNION ALL
SELECT 4,'Fourth'
GO
-- Use Outer Join
SELECT jt.*
FROM BaseTable bt
RIGHT OUTER JOIN JoinedTable jt ON bt.ID1 = jt.ID2
WHERE bt.ID1 IS NULL
GO

The script above will give us the following output:

outerjoinmiss1 SQL SERVER   Outer Join Not Allowed in Indexed Views

-- Now Insert Rows in Base Table
INSERT INTO BaseTable (ID1,Col1)
SELECT 3,'Third'
GO
-- You will notice that one row less retrieved from Join
SELECT jt.*
FROM BaseTable bt
RIGHT OUTER JOIN JoinedTable jt ON bt.ID1 = jt.ID2
WHERE bt.ID1 IS NULL
GO
-- Clean up
DROP TABLE BaseTable
DROP TABLE JoinedTable
GO

outerjoinmiss2 SQL SERVER   Outer Join Not Allowed in Indexed Views

After running this script, you will notice that as the base table gains one row, the result loses one row. Going back to the white paper mentioned earlier, I believe this is expensive to manage for the same reason why it is not allowed in Indexed View.

Let me know if you have a better example to demonstrate this behavior in the Outer Join.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Find Statistics Update Date – Update Statistics

Statistics are one of the most important factors of a database as it contains information about how data is distributed in the database objects (tables, indexes etc). It is quite common to listen people talking about not optimal plan and expired statistics. Quite often I have heard the suggestion to update the statistics if query is not optimal. Please note that there are many other factors for query to not perform well; expired statistics are one of them for sure.

If you want to know when your statistics was last updated, you can run the following query.

USE AdventureWorks
GO
SELECT name AS index_name,
STATS_DATE(OBJECT_ID, index_id) AS StatsUpdated
FROM sys.indexes
WHERE OBJECT_ID = OBJECT_ID('HumanResources.Department')
GO

statsupdate SQL SERVER   Find Statistics Update Date   Update Statistics

If due to any reason you think that your statistics outdated and you want to update them, you can run following statement. In following statement, I have specified an additional option of FULLSCAN, which implies that the complete table is scanned to update the statistics.

USE AdventureWorks;
GO
UPDATE STATISTICS HumanResources.Department
WITH FULLSCAN
GO

Please note that you should only run update statistics if you think they will benefit your query and when your server is not very busy. If you have auto update “usually” on, the SQL Server takes care of updating stats when necessary. Here, is a quick example where you can see the updating of older statistics using fullscan.

statsupdate1 SQL SERVER   Find Statistics Update Date   Update Statistics

Statistics is very deep subject, and a lot of things can be discussed over it. We will discuss about many other in-depth topics in some other article in future.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

SQL SERVER – Enable Automatic Statistic Update on Database

In one of the recent projects, I found out that despite putting good indexes and optimizing the query, I could not achieve an optimized performance and I still received an unoptimized response from the SQL Server. On examination, I figured out that the culprit was statistics. The database that I was trying to optimize had auto update of the statistics was disabled.

Once I enabled the auto update of statistics, the database started to respond as expected. If you ever face situation like this, please do the following:

1) Enable Auto Creation of Statistics

2) Enable Auto Update of Statistics

3) Update Statistics for whole database

USE AdventureWorks;
GO
/* Enable Stats and Auto Update */
-- Enable Auto Create of Statistics
ALTER DATABASE AdventureWorks
SET AUTO_CREATE_STATISTICS ON;
-- Enable Auto Update of Statistics
ALTER DATABASE AdventureWorks
SET AUTO_UPDATE_STATISTICS ON;
GO
-- Update Statistics for whole database
EXEC sp_updatestats
GO

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SQL Server Management Studio and Client Statistics

Client Statistics is very important. Many a time, people relate queries execution plan with query cost. This is not a good comparison. Both are different parameters, and they are not always related. It is possible that the query cost of any statement is less, but the amount of the data returned is considerably large, which is causing any query to run slow. How do we know if any query is retrieving a large amount data or very little data? In one way, it is quite easy to figure this out by just looking at the result set; however, this method cannot be relied upon all the time as it is difficult to reach a conclusion when there are many columns and many rows.

To measure how much data is retrieved from server to client side is very simple. SQL Server Management Studio has feature that can measure client statistics. There are three different ways to enable client statistics.

Method 1

Press SHIFT+ALT+S together.

Method 2

Go to Menu >> Query >> Include Client Statistics

clientstats1 SQL SERVER   SQL Server Management Studio and Client Statistics

Method 3

clientstats2 SQL SERVER   SQL Server Management Studio and Client Statistics

Once you enable the client statistics, it will display another tab in the result span; when clicked, we obtain the following:

clientstats3 SQL SERVER   SQL Server Management Studio and Client Statistics

As shown in the image, it will capture details for different trials and they are compared with each other. Green arrows indicate improved statistics and red arrows indicate degrading statistics. Please note that here an increase does not always imply a good result; sometimes some statistics when marked as “up” can imply a bad result as well. For the same reason, they are colored in green and red, signifying good and bad results, respectively.

A maximum of 10 trials are averaged in the display window. When the11th trial comes into the result, the very first trial is dropped, keeping total average of latest 10 trials. There are many other aspects about client statistics; these shall be discussed in other articles.

Do you use this particular feature of SQL Server? If you do, what do you use it for?

Reference: Pinal Dave (http://blog.SQLAuthority.com)