SQL SERVER – Solution – Puzzle – Statistics are not Updated but are Created Once
- by pinaldave
Earlier I asked puzzle why statistics are not updated. Read the complete details over here: Statistics are not Updated but are Created Once
In the question I have demonstrated even though statistics should have been updated after lots of insert in the table are not updated.(Read the details SQL SERVER – When are Statistics Updated – What triggers Statistics to Update)
In this example I have created following situation:
Create Table
Insert 1000 Records
Check the Statistics
Now insert 10 times more 10,000 indexes
Check the Statistics – it will be NOT updated
Auto Update Statistics and Auto Create Statistics for database is TRUE
Now I have requested two things in the example 1) Why this is happening? 2) How to fix this issue?
I have many answers – here is the how I fixed it which has resolved the issue for me. NOTE: There are multiple answers to this problem and I will do my best to list all.
Solution:
Create nonclustered Index on column City
Here is the working example for the same. Let us understand this script and there is added explanation at the end.
-- Execution Plans Difference
-- Estimated Execution Plan Vs Actual Execution Plan
-- Create Sample Database
CREATE DATABASE SampleDB
GO
USE SampleDB
GO
-- Create Table
CREATE TABLE ExecTable (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
CREATE NONCLUSTERED INDEX IX_ExecTable1
ON ExecTable (City);
GO
-- Insert One Thousand Records
-- INSERT 1
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Select Statement
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = 'New York'
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Replace your Statistics over here
DBCC SHOW_STATISTICS('ExecTable', IX_ExecTable1);
GO
--------------------------------------------------------------
-- Round 2
-- Insert One Thousand Records
-- INSERT 2
INSERT INTO ExecTable (ID,FirstName,LastName,City)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 3 THEN 'Los Angeles'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 7 THEN 'La Cinega'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 13 THEN 'San Diego'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%20 = 17 THEN 'Las Vegas'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Select Statement
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = 'New York'
GO
-- Display statistics of the table
sp_helpstats N'ExecTable', 'ALL'
GO
-- Replace your Statistics over here
DBCC SHOW_STATISTICS('ExecTable', IX_ExecTable1);
GO
-- Clean up Database
DROP TABLE ExecTable
GO
When I created non clustered index on the column city, it also created statistics on the same column with same name as index. When we populate the data in the column the index is update – resulting execution plan to be invalided – this leads to the statistics to be updated in next execution of SELECT. This behavior does not happen on Heap or column where index is auto created. If you explicitly update the index, often you can see the statistics are updated as well. You can see this is for sure happening if you follow the tell of John Sansom.
John Sansom‘s suggestion:
That was fun!
Although the column statistics are invalidated by the time the second select statement is executed, the query is not compiled/recompiled but instead the existing query plan is reused.
It is the “next” compiled query against the column statistics that will see that they are out of date and will then in turn instantiate the action of updating statistics.
You can see this in action by forcing the second statement to recompile.
SELECT FirstName, LastName, City
FROM ExecTable
WHERE City = ‘New York’ option(RECOMPILE)
GO
Kevin Cross also have another suggestion:
I agree with John. It is reusing the Execution Plan. Aside from OPTION(RECOMPILE), clearing the Execution Plan Cache before the subsequent tests will also work.
i.e., run this before round 2:
————————————————————–
– Clear execution plan cache before next test
DBCC FREEPROCCACHE WITH NO_INFOMSGS;
————————————————————–
Nice puzzle!
Kevin
As this was puzzle John and Kevin both got the correct answer, there was no condition for answer to be part of best practices. I know John and he is finest DBA around – his tremendous knowledge has always impressed me. John and Kevin both will agree that clearing cache either using DBCC FREEPROCCACHE and recompiling each query every time is for sure not good advice on production server. It is correct answer but not best practice.
By the way, if you have better solution or have better suggestion please advise. I am open to change my answer and publish further improvement to this solution. On very separate note, I like to have clustered index on my Primary Key, which I have not mentioned here as it is out of the scope of this puzzle.
Reference: Pinal Dave (http://blog.SQLAuthority.com)
Filed under: Pinal Dave, PostADay, Readers Contribution, Readers Question, SQL, SQL Authority, SQL Index, SQL Puzzle, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Statistics