SQL SERVER – Update Statistics are Sampled By Default
- by pinaldave
After reading my earlier post SQL SERVER – Create Primary Key with Specific Name when Creating Table on Statistics, I have received another question by a blog reader. The question is as follows:
Question: Are the statistics sampled by default?
Answer: Yes. The sampling rate can be specified by the user and it can be anywhere between a very low value to 100%.
Let us do a small experiment to verify if the auto update on statistics is left on. Also, let’s examine a very large table that is created and statistics by default- whether the statistics are sampled or not.
USE [AdventureWorks]
GO
-- Create Table
CREATE TABLE [dbo].[StatsTest](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](100) NULL,
[LastName] [varchar](100) NULL,
[City] [varchar](100) NULL,
CONSTRAINT [PK_StatsTest] PRIMARY KEY CLUSTERED
([ID] ASC)
) ON [PRIMARY]
GO
-- Insert 1 Million Rows
INSERT INTO [dbo].[StatsTest] (FirstName,LastName,City)
SELECT TOP 1000000 'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Update the statistics
UPDATE STATISTICS [dbo].[StatsTest]
GO
-- Shows the statistics
DBCC SHOW_STATISTICS ("StatsTest"PK_StatsTest)
GO
-- Clean up
DROP TABLE [dbo].[StatsTest]
GO
Now let us observe the result of the DBCC SHOW_STATISTICS.
The result shows that Resultset is for sure sampling for a large dataset. The percentage of sampling is based on data distribution as well as the kind of data in the table. Before dropping the table, let us check first the size of the table. The size of the table is 35 MB.
Now, let us run the above code with lesser number of the rows.
USE [AdventureWorks]
GO
-- Create Table
CREATE TABLE [dbo].[StatsTest](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](100) NULL,
[LastName] [varchar](100) NULL,
[City] [varchar](100) NULL,
CONSTRAINT [PK_StatsTest] PRIMARY KEY CLUSTERED
([ID] ASC)
) ON [PRIMARY]
GO
-- Insert 1 Hundred Thousand Rows
INSERT INTO [dbo].[StatsTest] (FirstName,LastName,City)
SELECT TOP 100000 'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Update the statistics
UPDATE STATISTICS [dbo].[StatsTest]
GO
-- Shows the statistics
DBCC SHOW_STATISTICS ("StatsTest"PK_StatsTest)
GO
-- Clean up
DROP TABLE [dbo].[StatsTest]
GO
You can see that Rows Sampled is just the same as Rows of the table. In this case, the sample rate is 100%.
Before dropping the table, let us also check the size of the table. The size of the table is less than 4 MB.
Let us compare the Result set just for a valid reference.
Test 1: Total Rows: 1000000, Rows Sampled: 255420, Size of the Table: 35.516 MB
Test 2: Total Rows: 100000, Rows Sampled: 100000, Size of the Table: 3.555 MB
The reason behind the sample in the Test1 is that the data space is larger than 8 MB, and therefore it uses more than 1024 data pages. If the data space is smaller than 8 MB and uses less than 1024 data pages, then the sampling does not happen. Sampling aids in reducing excessive data scan; however, sometimes it reduces the accuracy of the data as well.
Please note that this is just a sample test and there is no way it can be claimed as a benchmark test. The result can be dissimilar on different machines. There are lots of other information can be included when talking about this subject. I will write detail post covering all the subject very soon.
Reference: Pinal Dave (http://blog.SQLAuthority.com)
Filed under: Pinal Dave, SQL, SQL Authority, SQL Index, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: SQL Statistics