Related to: Current wisdom on SQL Server and Hyperthreading
Recently we upgraded our Windows 2008 R2 database server from an X5470 to a X5560. The theory is both CPUs have very similar performance, if anything the X5560 is slightly faster.
However, SQL Server 2008 R2 performance has been pretty bad over the last day or so and CPU usage has been pretty high.
Page life expectancy is massive, we are getting almost 100% cache hit for the pages, so memory is not a problem.
When I ran:
SELECT * FROM sys.dm_os_wait_stats
order by signal_wait_time_ms desc
I got:
wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms
------------------------------------------------------------ -------------------- -------------------- -------------------- --------------------
XE_TIMER_EVENT 115166 2799125790 30165 2799125065
REQUEST_FOR_DEADLOCK_SEARCH 559393 2799053973 5180 2799053973
SOS_SCHEDULER_YIELD 152289883 189948844 960 189756877
CXPACKET 234638389 2383701040 141334 118796827
SLEEP_TASK 170743505 1525669557 1406 76485386
LATCH_EX 97301008 810738519 1107 55093884
LOGMGR_QUEUE 16525384 2798527632 20751319 4083713
WRITELOG 16850119 18328365 1193 2367880
PAGELATCH_EX 13254618 8524515 11263 1670113
ASYNC_NETWORK_IO 23954146 6981220 7110 1475699
(10 row(s) affected)
I also ran
-- Isolate top waits for server instance since last restart or statistics clear
WITH Waits AS (
SELECT
wait_type,
wait_time_ms / 1000. AS [wait_time_s],
100. * wait_time_ms / SUM(wait_time_ms) OVER() AS [pct],
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS [rn]
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE',
'SLEEP_TASK','SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR','LOGMGR_QUEUE',
'CHECKPOINT_QUEUE','REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH',
'BROKER_TASK_STOP','CLR_MANUAL_EVENT','CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE',
'FT_IFTS_SCHEDULER_IDLE_WAIT','XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN'))
SELECT W1.wait_type,
CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
CAST(W1.pct AS DECIMAL(12, 2)) AS pct,
CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM Waits AS W1
INNER JOIN Waits AS W2 ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold
And got
wait_type wait_time_s pct running_pct
CXPACKET 554821.66 65.82 65.82
LATCH_EX 184123.16 21.84 87.66
SOS_SCHEDULER_YIELD 37541.17 4.45 92.11
PAGEIOLATCH_SH 19018.53 2.26 94.37
FT_IFTSHC_MUTEX 14306.05 1.70 96.07
That shows huge amounts of time synchronizing queries involving parallelism (high CXPACKET). Additionally, anecdotally many of these problem queries are being executed on multiple cores (we have no MAXDOP hints anywhere in our code)
The server has not been under load for more than a day or so. We are experiencing a large amount of variance with query executions, typically many queries appear to be slower that they were on our previous DB server and CPU is really high.
Will disabling Hyperthreading help at reducing our CPU usage and increase throughput?