Find out which task is generating a lot of context switches on Linux
Posted by Gaks on Server Fault
Published on 2010-10-12T10:43:23Z
According to vmstat, my Linux server (2xCore2 Duo 2.5 GHz) is constantly doing around 20k context switches per second.
# vmstat 3
procs -----------memory----------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
 2  0   7292 249472  82340 2291972    0    0     0     0    0     0  7 13 79  0
 0  0   7292 251808  82344 2291968    0    0     0   184   24 20090  1  1 99  0
 0  0   7292 251876  82344 2291968    0    0     0    83   17 20157  1  0 99  0
 0  0   7292 251876  82344 2291968    0    0     0    73   12 20116  1  0 99  0
... but uptime shows a small load (load average: 0.01, 0.02, 0.01), and top doesn't show any process with high %CPU usage.
How do I find out what exactly is generating those context switches? Which process/thread?
I tried to analyze the pidstat output:
# pidstat -w 10 1
12:39:13    PID   cswch/s nvcswch/s  Command
12:39:23      1      0.20      0.00  init
12:39:23      4      0.20      0.00  ksoftirqd/0
12:39:23      7      1.60      0.00  events/0
12:39:23      8      1.50      0.00  events/1
12:39:23     89      0.50      0.00  kblockd/0
12:39:23     90      0.30      0.00  kblockd/1
12:39:23    995      0.40      0.00  kirqd
12:39:23    997      0.60      0.00  kjournald
12:39:23   1146      0.20      0.00  svscan
12:39:23   2162      5.00      0.00  kjournald
12:39:23   2526      0.20      2.00  postgres
12:39:23   2530      1.00      0.30  postgres
12:39:23   2534      5.00      3.20  postgres
12:39:23   2536      1.40      1.70  postgres
12:39:23  12061     10.59      0.90  postgres
12:39:23  14442      1.50      2.20  postgres
12:39:23  15416      0.20      0.00  monitor
12:39:23  17289      0.10      0.00  syslogd
12:39:23  21776      0.40      0.30  postgres
12:39:23  23638      0.10      0.00  screen
12:39:23  25153      1.00      0.00  sshd
12:39:23  25185     86.61      0.00  daemon1
12:39:23  25190     12.19     35.86  postgres
12:39:23  25295      2.00      0.00  screen
12:39:23  25743      9.99      0.00  daemon2
12:39:23  25747      1.10      3.00  postgres
12:39:23  26968      5.09      0.80  postgres
12:39:23  26969      5.00      0.00  postgres
12:39:23  26970      1.10      0.20  postgres
12:39:23  26971     17.98      1.80  postgres
12:39:23  27607      0.90      0.40  postgres
12:39:23  29338      4.30      0.00  screen
12:39:23  31247      4.10     23.58  postgres
12:39:23  31249     82.92     34.77  postgres
12:39:23  31484      0.20      0.00  pdflush
12:39:23  32097      0.10      0.00  pidstat
It looks like some PostgreSQL tasks are doing more than 10 context switches per second, but all the per-task figures together still come nowhere near 20k.
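To put a number on that gap, the two rate columns can simply be added up. The following is a minimal Python sketch of that arithmetic, with the (cswch/s, nvcswch/s) pairs copied row by row from the pidstat output above. The 20,000 figure is the approximate cs value from vmstat, and the names `RATES`, `VMSTAT_CS` and `total_rate` are illustrative only.

```python
# (cswch/s, nvcswch/s) pairs copied in order from the pidstat output above.
RATES = [
    (0.20, 0.00), (0.20, 0.00), (1.60, 0.00), (1.50, 0.00),
    (0.50, 0.00), (0.30, 0.00), (0.40, 0.00), (0.60, 0.00),
    (0.20, 0.00), (5.00, 0.00), (0.20, 2.00), (1.00, 0.30),
    (5.00, 3.20), (1.40, 1.70), (10.59, 0.90), (1.50, 2.20),
    (0.20, 0.00), (0.10, 0.00), (0.40, 0.30), (0.10, 0.00),
    (1.00, 0.00), (86.61, 0.00), (12.19, 35.86), (2.00, 0.00),
    (9.99, 0.00), (1.10, 3.00), (5.09, 0.80), (5.00, 0.00),
    (1.10, 0.20), (17.98, 1.80), (0.90, 0.40), (4.30, 0.00),
    (4.10, 23.58), (82.92, 34.77), (0.20, 0.00), (0.10, 0.00),
]

VMSTAT_CS = 20000  # approximate "cs" column from vmstat, switches per second


def total_rate(rates):
    """Sum voluntary and involuntary switches per second over all rows."""
    return sum(voluntary + involuntary for voluntary, involuntary in rates)


accounted = total_rate(RATES)
print(f"pidstat accounts for about {accounted:.0f} switches/s")
print(f"that is about {100 * accounted / VMSTAT_CS:.1f}% of the vmstat figure")
```

On these numbers the tasks listed by pidstat add up to roughly 377 switches per second, which is under 2% of what vmstat reports, so nearly all of the switching comes from something this pidstat run does not show.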
Any idea how to dig a little deeper for an answer?
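One possible way to dig deeper, offered as a sketch and not a definitive answer, is to sample the per-thread counters that the kernel exposes under /proc and rank the threads by how fast those counters grow. The code below assumes the kernel provides the `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` fields in `/proc/<pid>/task/<tid>/status`. The function names (`parse_status`, `snapshot`, `rank`, `top_switchers`) are made up for this example. Unlike the pidstat run above, it looks at every thread of every process, which may or may not turn out to explain the gap.

```python
import glob
import time


def parse_status(text):
    """Return (name, voluntary, nonvoluntary) from the body of one status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return (
        fields.get("Name", "?"),
        int(fields.get("voluntary_ctxt_switches", 0)),
        int(fields.get("nonvoluntary_ctxt_switches", 0)),
    )


def snapshot():
    """Map each thread's status path to (name, cumulative switch count)."""
    counts = {}
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/status"):
        try:
            with open(path) as handle:
                name, voluntary, nonvoluntary = parse_status(handle.read())
        except OSError:
            continue  # the thread exited between the glob and the read
        counts[path] = (name, voluntary + nonvoluntary)
    return counts


def rank(before, after, interval):
    """Return (switches_per_second, path, name) tuples, highest rate first."""
    rates = []
    for path, (name, total) in after.items():
        if path in before:  # skip threads that started during the interval
            rates.append(((total - before[path][1]) / interval, path, name))
    return sorted(rates, reverse=True)


def top_switchers(interval=3.0, limit=10):
    """Take two snapshots `interval` seconds apart and return the busiest threads."""
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    return rank(before, after, interval)[:limit]
```

Calling `top_switchers()` as root and printing the result gives one line per thread, with the thread ID visible in the path. Threads that start and exit within the sampling interval are not counted, so very short-lived tasks would still be missed by this approach.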