After having excellent response to my quiz – Why SELECT * throws an error but SELECT COUNT(*) does not?I have decided to ask another puzzling question to all of you.
I am running this test on SQL Server 2008 R2. Here is the quick scenario about my setup.
- Create Table
- Insert 1000 Records
- Check the Statistics
- Now insert 10 times more 10,000 indexes
- Check the Statistics – it will be NOT updated
Note: Auto Update Statistics and Auto Create Statistics for database is TRUE
Expected Result – Statistics should be updated – SQL SERVER – When are Statistics Updated – What triggers Statistics to Update
Now the question is why the statistics are not updated?
The common answer is – we can update the statistics ourselves using
UPDATE STATISTICS TableName WITH FULLSCAN, ALL
However, the solution I am looking is where statistics should be updated automatically based on algorithm mentioned here.
Now the solution is to ____________________.
Vinod Kumar is not allowed to take participate over here as he is the one who has helped me to build this puzzle.
I will publish the solution on next week. Please leave a comment and if your comment consist valid answer, I will publish with due credit.
Here is the script to reproduce the scenario which I mentioned.
-- Execution Plans Difference
-- Create Sample Database
CREATE DATABASE SampleDB
GO
USE SampleDB
GO
-- Create Table
CREATE TABLE ExecTable (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
-- Insert One Thousand Records
-- INSERT 1
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Display statistics of the table - none listed
sp_helpstats N'ExecTable', 'ALL'
GO
-- Select Statement
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = 'New York'
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Replace your Statistics over here
-- NOTE: Replace your _WA_Sys with stats from above query
DBCC SHOW_STATISTICS('ExecTable', _WA_Sys_00000004_7D78A4E7);
GO
--------------------------------------------------------------
-- Round 2
-- Insert Ten Thousand Records
-- INSERT 2
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Select Statement
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = 'New York'
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Replace your Statistics over here
-- NOTE: Replace your _WA_Sys with stats from above query
DBCC SHOW_STATISTICS('ExecTable', _WA_Sys_00000004_7D78A4E7);
GO
-- You will notice that Statistics are still updated with 1000 rows
-- Clean up Database
DROP TABLE ExecTable
GO
USE MASTER
GO
ALTER DATABASE SampleDB
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
DROP DATABASE SampleDB
GO
Reference: Pinal Dave (https://blog.sqlauthority.com)
11 Comments. Leave new
Ensure auto-update statistics is set?
Hi Bender,
AUTO UPDATE STATS and AUTO CREATE STATS will be enabled by default when creating new database.
Regards
Arun Raj.C
Hi Pinal,
Statistics is initially created while retrieving the data from table using the RowSample as 1000 records which are initially inserted, when we insert the Next 10000 records and retrieving the data with same Filter criteria, Query optimizer will reuse the Same Statistics, so its not triggering the engine to update Statistics
If you check the Statistics Updated time after inserting 10000 records, it will be same as before.
If we change the Filter criteria like this
SELECT FirstName, LastName, City
FROM ExecTable
WHERE FirstName=’Bob’ and City=’New York’
It will trigger the Statistics Update process and create a new statistics based on FirstName and also update the Statistics based on City
Warm Regards
Arunraj Chandrasekaran
Following link says it should not be used in 2008R2. Also it says to obtain information about statistics, query the sys.stats and sys.stats_columns catalog views.
https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/ms173416(v=sql.100)
I guess it may be becuse the table is not indexed
If we crate an Index
CREATE CLUSTERED INDEX IDX ON ExecTable(id)
Insert 1000 rows frist then
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = ‘New York’
— this statement compiles it first time to keep track of subsequent executions
and
insert next 700 rows (as per the algorithm its 500 + 20% rows changed)
Then check your statistics using
DBCC SHOW_STATISTICS
we can see statistics updated.
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
‘Bob’,’smith’,’New York’
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
CREATE CLUSTERED INDEX IDX ON ExecTable(id)
GO
SELECT FirstName, LastName, City
FROM ExecTable
WHERE city = ‘New York’
sp_helpstats N’ExecTable’, ‘ALL’
DBCC SHOW_STATISTICS(ExecTable, ‘statistics’)
GO
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 700 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
‘Bob’,’smith’,’New York’
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
DBCC SHOW_STATISTICS(ExecTable, ‘statistics’)
Go
But, when there is no Index:
Ex: when there are no Indexes
1) First insert 1000 rows
2)Compile statemnt
3) insert 700 rows
4) insert 850 rows (as per the algorithm)
Then Check your Statistics those will be updated.
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
‘Bob’,’smith’,’New York’
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
SELECT FirstName, LastName, City
FROM ExecTable
WHERE city = ‘New York’
sp_helpstats N’ExecTable’, ‘ALL’
DBCC SHOW_STATISTICS(ExecTable, ‘statistics’)
GO
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 700 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
‘Bob’,’smith’,’New York’
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
Go
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 850 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
‘Bob’,’smith’,’New York’
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
sp_helpstats N’ExecTable’, ‘ALL’
DBCC SHOW_STATISTICS(ExecTable, ‘statistics’)
Update Statistics is Triggered when ever the optimizer is trying to build an optimal execution plan for the query , in the present scenario we are trying to query the same query twice
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = ‘New York’
and allowing the optimizer to reuse the existing plan ,which is considered as most optimal plan .
That was fun!
Although the column statistics are invalidated by the time the second select statement is executed, the query is not compiled/recompiled but instead the existing query plan is reused.
It is the “next” compiled query against the column statistics that will see that they are out of date and will then in turn instantiate the action of updating statistics.
You can see this in action by forcing the second statement to recompile.
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = ‘New York’ option(RECOMPILE)
GO
I agree with John. It is reusing the Execution Plan. Aside from OPTION(RECOMPILE), clearing the Execution Plan Cache before the subsequent tests will also work.
i.e., run this before round 2:
————————————————————–
— Clear execution plan cache before next test
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
————————————————————–
Nice puzzle!
Kevin
FREEPROCCACHE will clear entire cache. Recompile a table or stored procedure is best option.
just select with id and compile it gives the right answer
This is an excellent post Pinal.
I think the issue lies with the fact that the plan generated for the query is a “Trivial Plan”. This can be checked by clicking on the Select icon on the Actual Execution plan and looking at the “Optimisation Level” value in the properties window.
In this case the statistics do get marked for update by the background auto-update stats process but aren’t actually updated since the Trivial Plan present in the cache is never recompiled irrespective of the number of times it is used. This is true even if we encapsulate the select into a stored procedure.
The statistics update on the city column is triggered by operations like (though not limited to)
– Explicit Procedure/Plan recompilation
– DBCC FREEPROCCACHE
– OPTION RECOMPILE as per John’s suggestion above
– New query compiled which makes use of the statistics created on this column as per Arunraj’s suggestion above
More information can be found under the heading “Skipping the Recompilation Step” on this link:
https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc293623(v=technet.10)
Hope this makes sense!
Regards
Chirag