More hardware normally delivers better performance, but there are exceptions where it can hinder performance. Understanding these exceptions and working around it is a major part of SQL Server performance tuning.
When a memory allocating query executes in parallel, SQL Server distributes memory to each task that is executing part of
the query in parallel. In our example
the sort operator that executes in parallel divides
the memory across all tasks assuming even distribution of rows. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union.
In reality, how often are column values evenly distributed, think about an example; are employees working for your company distributed evenly across all
the Zip codes or mainly concentrated in
the headquarters? What happens when you sort result set based on Zip codes? Do all products in
the catalog sell equally or are few products hot selling items?
One of my customers tested
the below example on a 24 core server with various MAXDOP settings and here are
the results:MAXDOP 1: CPU time = 1185 ms, elapsed time = 1188 msMAXDOP 4: CPU time = 1981 ms, elapsed time = 1568 msMAXDOP 8: CPU time = 1918 ms, elapsed time = 1619 msMAXDOP 12: CPU time = 2367 ms, elapsed time = 2258 msMAXDOP 16: CPU time = 2540 ms, elapsed time = 2579 msMAXDOP 20: CPU time = 2470 ms, elapsed time = 2534 msMAXDOP 0: CPU time = 2809 ms, elapsed time = 2721 ms - all 24 cores.In
the above test, when
the data was evenly distributed,
the elapsed time of parallel query was always lower than serial query.
Why does
the query get slower and slower with more CPU cores / higher MAXDOP? Maybe you can answer this question after reading
the article; let me know:
[email protected].
Well you get
the point, let’s see an example.
The best way to learn is to practice. To create
the below tables and reproduce
the behavior, join
the mailing list by using this link: www.sqlworkshops.com/ml and I will send you
the table creation script.
Let’s update
the Employees table with 49 out of 50 employees located in Zip code 2001.
update Employees set Zip = EmployeeID /
400 + 1 where EmployeeID % 50 = 1
update Employees set Zip = 2001 where EmployeeID % 50 != 1
go
update statistics Employees with fullscan
go
Let’s create
the temporary table #FireDrill with all possible Zip codes.
drop table #FireDrill
go
create table #FireDrill (Zip int primary key)
insert into #FireDrill select distinct Zip from Employees
update statistics #FireDrill with fullscan
go
Let’s execute
the query serially with MAXDOP 1.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--First serially with MAXDOP 1
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
inner join #FireDrill fd on (e.Zip = fd.Zip)
order by e.Zip
option (maxdop 1)
goThe query took 1011 ms to complete.
The execution plan shows
the 77816 KB of memory was granted while
the estimated rows were 799624.
No Sort Warnings in SQL Server Profiler.
Now let’s execute
the query in parallel with MAXDOP 0.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--In parallel with MAXDOP 0
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
inner join #FireDrill fd on (e.Zip = fd.Zip)
order by e.Zip
option (maxdop 0)
go
The query took 1912 ms to complete.
The execution plan shows
the 79360 KB of memory was granted while
the estimated rows were 799624.
The estimated number of rows between serial and parallel plan are
the same.
The parallel plan has slightly more memory granted due to additional overhead.
Sort properties shows
the rows are unevenly distributed over
the 4 threads.
Sort Warnings in SQL Server Profiler.
Intermediate Summary:
The reason for
the higher duration with parallel plan was sort spill. This is due to uneven distribution of employees over Zip codes, especially concentration of 49 out of 50 employees in Zip code 2001.
Now let’s update
the Employees table and distribute employees evenly across all Zip codes.
update Employees set Zip = EmployeeID /
400 + 1
go
update statistics Employees with fullscan
go
Let’s execute
the query serially with MAXDOP 1.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--Serially with MAXDOP 1
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
inner join #FireDrill fd on (e.Zip = fd.Zip)
order by e.Zip
option (maxdop 1)
go
The query took 751 ms to complete.
The execution plan shows
the 77816 KB of memory was granted while
the estimated rows were 784707.
No Sort Warnings in SQL Server Profiler.
Now let’s execute
the query in parallel with MAXDOP 0.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--In parallel with MAXDOP 0
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
inner join #FireDrill fd on (e.Zip = fd.Zip)
order by e.Zip
option (maxdop 0)
go
The query took 661 ms to complete.
The execution plan shows
the 79360 KB of memory was granted while
the estimated rows were 784707.
Sort properties shows
the rows are evenly distributed over
the 4 threads.
No Sort Warnings in SQL Server Profiler.
Intermediate Summary: When employees were distributed unevenly, concentrated on 1 Zip code, parallel sort spilled while serial sort performed well without spilling to tempdb. When
the employees were distributed evenly across all Zip codes, parallel sort and serial sort did not spill to tempdb. This shows uneven data distribution may affect
the performance of some parallel queries negatively. For detailed discussion of memory allocation, refer to webcasts available at www.sqlworkshops.com/webcasts.
Some of you might conclude from
the above execution times that parallel query is not faster even when there is no spill. Below you can see when we are joining limited amount of Zip codes, parallel query will be fasted since it can use Bitmap Filtering.
Let’s update
the Employees table with 49 out of 50 employees located in Zip code 2001.
update Employees set Zip = EmployeeID /
400 + 1 where EmployeeID % 50 = 1
update Employees set Zip = 2001 where EmployeeID % 50 != 1
go
update statistics Employees with fullscan
go
Let’s create
the temporary table #FireDrill with limited Zip codes.
drop table #FireDrill
go
create table #FireDrill (Zip int primary key)
insert into #FireDrill select distinct Zip
from Employees where Zip between 1800 and 2001
update statistics #FireDrill with fullscan
go
Let’s execute
the query serially with MAXDOP 1.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--Serially with MAXDOP 1
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
inner join #FireDrill fd on (e.Zip = fd.Zip)
order by e.Zip
option (maxdop 1)
go
The query took 989 ms to complete.
The execution plan shows
the 77816 KB of memory was granted while
the estimated rows were 785594.
No Sort Warnings in SQL Server Profiler.
Now let’s execute
the query in parallel with MAXDOP 0.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--In parallel with MAXDOP 0
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
inner join #FireDrill fd on (e.Zip = fd.Zip)
order by e.Zip
option (maxdop 0)
go
The query took 1799 ms to complete.
The execution plan shows
the 79360 KB of memory was granted while
the estimated rows were 785594.
Sort Warnings in SQL Server Profiler.
The estimated number of rows between serial and parallel plan are
the same.
The parallel plan has slightly more memory granted due to additional overhead.
Intermediate Summary:
The reason for
the higher duration with parallel plan even with limited amount of Zip codes was sort spill. This is due to uneven distribution of employees over Zip codes, especially concentration of 49 out of 50 employees in Zip code 2001.
Now let’s update
the Employees table and distribute employees evenly across all Zip codes.
update Employees set Zip = EmployeeID /
400 + 1
go
update statistics Employees with fullscan
go
Let’s execute
the query serially with MAXDOP 1.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--Serially with MAXDOP 1
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
inner join #FireDrill fd on (e.Zip = fd.Zip)
order by e.Zip
option (maxdop 1)
go
The query took 250 ms to complete.
The execution plan shows
the 9016 KB of memory was granted while
the estimated rows were 79973.8.
No Sort Warnings in SQL Server Profiler.
Now let’s execute
the query in parallel with MAXDOP 0.
--Example provided by www.sqlworkshops.com
--Execute query with uneven Zip code distribution
--In parallel with MAXDOP 0
set statistics time on
go
declare @EmployeeID int, @EmployeeName varchar(48),@zip int
select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e
inner join #FireDrill fd on (e.Zip = fd.Zip)
order by e.Zip
option (maxdop 0)
go
The query took 85 ms to complete.
The execution plan shows
the 13152 KB of memory was granted while
the estimated rows were 784707.
No Sort Warnings in SQL Server Profiler.
Here you see, parallel query is much faster than serial query since SQL Server is using Bitmap Filtering to eliminate rows before
the hash join.
Parallel queries are very good for performance, but in some cases it can hinder performance. If one identifies
the reason for these hindrances, then it is possible to get
the best out of parallelism. I covered many aspects of monitoring and tuning parallel queries in webcasts (www.sqlworkshops.com/webcasts) and articles (www.sqlworkshops.com/articles). I suggest you to watch
the webcasts and read
the articles to better understand how to identify and tune parallel query performance issues.
Summary: One has to avoid sort spill over tempdb and
the chances of spills are higher when a query executes in parallel with uneven data distribution. Parallel query brings its own advantage, reduced elapsed time and reduced work with Bitmap Filtering. So it is important to understand how to avoid spills over tempdb and when to execute a query in parallel.
I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them.
The best way to learn is to practice. To create
the above tables and reproduce
the behavior, join
the mailing list at www.sqlworkshops.com/ml and I will send you
the relevant SQL Scripts.
Register for
the upcoming 3 Day Level
400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.
Disclaimer and copyright information:This article refers to organizations and products that may be
the trademarks or registered trademarks of their various owners.
Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use
the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work.
This article is for informational purposes only; you use any of
the suggestions given here entirely at your own risk.
Register for
the upcoming 3 Day Level
400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.
R Meyyappan
[email protected]
LinkedIn: http://at.linkedin.com/in/rmeyyappan