The most common performance mistake SQL Server developers make:
SQL Server estimates memory requirement for queries at compilation time. This mechanism is fine for dynamic queries that need memory, but not for queries that cache the plan. With dynamic queries the plan is not reused for different set of parameters values / predicates and hence different amount of memory can be estimated based on different set of parameter values / predicates. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union. This article covers Sort with examples. It is recommended to read Plan Caching and Query Memory Part II after this article which covers Hash Match operations.
When the plan is cached by using stored procedure or other plan caching mechanisms like sp_executesql or prepared statement, SQL Server estimates memory requirement based on first set of execution parameters. Later when the same stored procedure is called with different set of parameter values, the same amount of memory is used to execute the stored procedure. This might lead to underestimation / overestimation of memory on plan reuse, overestimation of memory might not be a noticeable issue for Sort operations, but underestimation of memory will lead to spill over tempdb resulting in poor performance.
This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance.
To read additional articles I wrote click here.
In most cases it is cheaper to pay for the compilation cost of dynamic queries than huge cost for spill over tempdb, unless memory requirement for a stored procedure does not change significantly based on predicates.
The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts
Enough theory, let’s see an example where we sort initially 1 month of data and then use the stored procedure to sort 6 months of data.
Let’s create a stored procedure that sorts customers by name within certain date range.
--Example provided by www.sqlworkshops.com
create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as
begin
declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime
select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c
where c.CreationDate between @CreationDateFrom and @CreationDateTo
order by c.CustomerName
option (maxdop 1)
end
go
Let’s execute the stored procedure initially with 1 month date range.
set statistics time on
go
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-31'
go
The stored procedure took 48 ms to complete.
The stored procedure was granted 6656 KB based on 43199.9 rows being estimated.
The estimated number of rows, 43199.9 is similar to actual number of rows 43200 and hence the memory estimation should be ok.
There was no Sort Warnings in SQL Profiler.
Now let’s execute the stored procedure with 6 month date range.
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go
The stored procedure took 679 ms to complete.
The stored procedure was granted 6656 KB based on 43199.9 rows being estimated.
The estimated number of rows, 43199.9 is way different from the actual number of rows 259200 because the estimation is based on the first set of parameter value supplied to the stored procedure which is 1 month in our case. This underestimation will lead to sort spill over tempdb, resulting in poor performance.
There was Sort Warnings in SQL Profiler.
To monitor the amount of data written and read from tempdb, one can execute
select num_of_bytes_written, num_of_bytes_read from sys.dm_io_virtual_file_stats(2, NULL) before and after the stored procedure execution, for additional information refer to the webcast: www.sqlworkshops.com/webcasts.
Let’s recompile the stored procedure and then let’s first execute the stored procedure with 6 month date range.
In a production instance it is not advisable to use sp_recompile instead one should use DBCC FREEPROCCACHE (plan_handle). This is due to locking issues involved with sp_recompile, refer to our webcasts for further details.
exec sp_recompile CustomersByCreationDate
go
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go
Now the stored procedure took only 294 ms instead of 679 ms.
The stored procedure was granted 26832 KB of memory.
The estimated number of rows, 259200 is similar to actual number of rows of 259200. Better performance of this stored procedure is due to better estimation of memory and avoiding sort spill over tempdb.
There was no Sort Warnings in SQL Profiler.
Now let’s execute the stored procedure with 1 month date range.
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-31'
go
The stored procedure took 49 ms to complete, similar to our very first stored procedure execution.
This stored procedure was granted more memory (26832 KB) than necessary memory (6656 KB) based on 6 months of data estimation (259200 rows) instead of 1 month of data estimation (43199.9 rows). This is because the estimation is based on the first set of parameter value supplied to the stored procedure which is 6 months in this case. This overestimation did not affect performance, but it might affect performance of other concurrent queries requiring memory and hence overestimation is not recommended. This overestimation might affect performance Hash Match operations, refer to article Plan Caching and Query Memory Part II for further details.
Let’s recompile the stored procedure and then let’s first execute the stored procedure with 2 day date range.
exec sp_recompile CustomersByCreationDate
go
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-02'
go
The stored procedure took 1 ms.
The stored procedure was granted 1024 KB based on 1440 rows being estimated.
There was no Sort Warnings in SQL Profiler.
Now let’s execute the stored procedure with 6 month date range.
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go
The stored procedure took 955 ms to complete, way higher than 679 ms or 294ms we noticed before.
The stored procedure was granted 1024 KB based on 1440 rows being estimated. But we noticed in the past this stored procedure with 6 month date range needed 26832 KB of memory to execute optimally without spill over tempdb. This is clear underestimation of memory and the reason for the very poor performance.
There was Sort Warnings in SQL Profiler. Unlike before this was a Multiple pass sort instead of Single pass sort. This occurs when granted memory is too low.
Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Other possibility is to use recompile hint or optimize for hint to allocate memory for predefined date range.
Let’s recreate the stored procedure with recompile hint.
--Example provided by www.sqlworkshops.com
drop proc CustomersByCreationDate
go
create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as
begin
declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime
select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c
where c.CreationDate between @CreationDateFrom and @CreationDateTo
order by c.CustomerName
option (maxdop 1, recompile)
end
go
Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range.
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-30'
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go
The stored procedure took 48ms and 291 ms in line with previous optimal execution times.
The stored procedure with 1 month date range has good estimation like before.
The stored procedure with 6 month date range also has good estimation and memory grant like before because the query was recompiled with current set of parameter values.
The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit.
Let’s recreate the stored procedure with optimize for hint of 6 month date range.
--Example provided by www.sqlworkshops.com
drop proc CustomersByCreationDate
go
create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as
begin
declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime
select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c
where c.CreationDate between @CreationDateFrom and @CreationDateTo
order by c.CustomerName
option (maxdop 1, optimize for (@CreationDateFrom = '2001-01-01', @CreationDateTo ='2001-06-30'))
end
go
Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range.
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-30'
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go
The stored procedure took 48ms and 291 ms in line with previous optimal execution times.
The stored procedure with 1 month date range has overestimation of rows and memory. This is because we provided hint to optimize for 6 months of data.
The stored procedure with 6 month date range has good estimation and memory grant because we provided hint to optimize for 6 months of data.
Let’s execute the stored procedure with 12 month date range using the currently cashed plan for 6 month date range.
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-12-31'
go
The stored procedure took 1138 ms to complete.
2592000 rows were estimated based on optimize for hint value for 6 month date range. Actual number of rows is 524160 due to 12 month date range.
The stored procedure was granted enough memory to sort 6 month date range and not 12 month date range, so there will be spill over tempdb.
There was Sort Warnings in SQL Profiler.
As we see above, optimize for hint cannot guarantee enough memory and optimal performance compared to recompile hint.
This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance.
Summary: Cached plan might lead to underestimation or overestimation of memory because the memory is estimated based on first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using recompile hint, but that will lead to compilation overhead. However, in most cases it might be ok to pay for compilation rather than spilling sort over tempdb which could be very expensive compared to compilation cost. The other possibility is to use optimize for hint, but in case one sorts more data than hinted by optimize for hint, this will still lead to spill. On the other side there is also the possibility of overestimation leading to unnecessary memory issues for other concurrently executing queries. In case of Hash Match operations, this overestimation of memory might lead to poor performance. When the values used in optimize for hint are archived from the database, the estimation will be wrong leading to worst performance, so one has to exercise caution before using optimize for hint, recompile hint is better in this case.
I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.
Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.
Disclaimer and copyright information:This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners.
Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work.
This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.
R Meyyappan
[email protected]
LinkedIn: http://at.linkedin.com/in/rmeyyappan