We've just taken delivery of a powerful 32-core AMD Opteron server with 128 GB of RAM. It has 2 x 6272 CPUs with 16 cores each. We are running a large, long-running Java task on 30 threads. We have the NUMA optimisations for Linux and Java turned on. Our Java threads mainly use objects that are private to that thread, sometimes read memory that other threads will also be reading, and very occasionally write to or lock shared objects.
We can't explain why the CPU cores are 25% idle. Below is a dump of "top":
top - 23:06:38 up 1 day, 23 min, 3 users, load average: 10.84, 10.27, 9.62
Tasks: 676 total, 1 running, 675 sleeping, 0 stopped, 0 zombie
Cpu(s): 64.5%us, 1.3%sy, 0.0%ni, 32.9%id, 1.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132138168k total, 131652664k used, 485504k free, 92340k buffers
Swap: 5701624k total, 230252k used, 5471372k free, 13444344k cached
...
top - 22:37:39 up 23:54, 3 users, load average: 7.83, 8.70, 9.27
Tasks: 678 total, 1 running, 677 sleeping, 0 stopped, 0 zombie
Cpu0 : 75.8%us, 2.0%sy, 0.0%ni, 22.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 77.2%us, 1.3%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 77.3%us, 1.0%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 77.8%us, 1.0%sy, 0.0%ni, 21.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 76.9%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 76.3%us, 2.0%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 12.6%us, 3.0%sy, 0.0%ni, 84.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 8.6%us, 2.0%sy, 0.0%ni, 89.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 77.0%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 77.0%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 77.6%us, 1.7%sy, 0.0%ni, 20.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 75.7%us, 2.0%sy, 0.0%ni, 21.4%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 76.6%us, 2.3%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 76.6%us, 2.3%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 76.2%us, 2.6%sy, 0.0%ni, 15.9%id, 5.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 76.6%us, 2.0%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 73.6%us, 2.6%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 74.5%us, 2.3%sy, 0.0%ni, 23.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 73.9%us, 2.3%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 72.9%us, 2.6%sy, 0.0%ni, 24.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 72.8%us, 2.6%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 72.7%us, 2.3%sy, 0.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu22 : 72.5%us, 2.6%sy, 0.0%ni, 24.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 73.0%us, 2.3%sy, 0.0%ni, 24.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu24 : 74.7%us, 2.7%sy, 0.0%ni, 22.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu25 : 74.5%us, 2.6%sy, 0.0%ni, 22.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu26 : 73.7%us, 2.0%sy, 0.0%ni, 24.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu27 : 74.1%us, 2.3%sy, 0.0%ni, 23.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu28 : 74.1%us, 2.3%sy, 0.0%ni, 23.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu29 : 74.0%us, 2.0%sy, 0.0%ni, 24.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu30 : 73.2%us, 2.3%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu31 : 73.1%us, 2.0%sy, 0.0%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132138168k total, 131711704k used, 426464k free, 88336k buffers
Swap: 5701624k total, 229572k used, 5472052k free, 13745596k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13865 root 20 0 122g 112g 3.1g S 2334.3 89.6 20726:49 java
27139 jayen 20 0 15428 1728 952 S 2.6 0.0 0:04.21 top
27161 sysadmin 20 0 15428 1712 940 R 1.0 0.0 0:00.28 top
33 root 20 0 0 0 0 S 0.3 0.0 0:06.24 ksoftirqd/7
131 root 20 0 0 0 0 S 0.3 0.0 0:09.52 events/0
1858 root 20 0 0 0 0 S 0.3 0.0 1:35.14 kondemand/0
A dump of the Java stacks confirms that none of the threads are anywhere near the few places where locks are used, nor are they anywhere near any disk or network I/O.
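For anyone wanting to repeat that check without reading a full stack dump, here is a minimal sketch that counts the JVM's threads by state using the standard `ThreadMXBean`. The class name `ThreadStateCount` and the method `countStates` are made up for illustration and are not part of our application. Note that `RUNNABLE` is a JVM-level state: it only says a thread is not blocked or waiting inside the JVM, so it cannot show whether a core is stalled on memory.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not taken from the application in question.
public class ThreadStateCount {

    /** Counts live JVM threads grouped by Thread.State. */
    static Map<Thread.State, Integer> countStates() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per state that has at least one thread.
        countStates().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Run inside the same JVM as the workload (for example from a periodic monitoring thread), a large count of `BLOCKED` or `WAITING` worker threads would point at lock contention or an empty queue, while a steady count of `RUNNABLE` workers would match what the stack dump showed.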
I had trouble finding a clear explanation of what 'top' means by "idle" versus "wait". I get the impression that "idle" means "no more threads that need to be run", but that doesn't make sense in our case. We're using an "Executors.newFixedThreadPool(30)". There are a large number of tasks pending, and each task lasts for 10 seconds or so.
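To make the setup concrete, here is a simplified sketch of the pattern described above, not our real code. The class name `PoolSketch`, the method `peakConcurrency`, and the busy-spin task body are placeholders I've assumed for illustration; the real tasks do actual work for about 10 seconds each. The sketch also records the peak number of tasks running at once, which is one way to confirm that the pool really keeps all of its threads busy while work is pending.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the real workload.
public class PoolSketch {

    /** Runs taskCount CPU-bound tasks on a fixed pool; returns the peak number running at once. */
    static int peakConcurrency(int threads, int taskCount, long spinMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                // Placeholder work: spin on the CPU without locks or I/O.
                long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(spinMillis);
                while (System.nanoTime() < end) {
                    // busy spin
                }
                running.decrementAndGet();
            });
        }

        pool.shutdown(); // accept no new tasks, let the queued ones finish
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 30 threads as in the question; task count and duration are scaled down.
        System.out.println("peak concurrent tasks: " + peakConcurrency(30, 300, 50));
    }
}
```

The peak can never exceed the pool size, so with plenty of queued tasks a value well below 30 would suggest the pool itself is being starved, whereas a value of 30 would mean the idle time has to come from somewhere below the JVM's scheduling of tasks.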
I suspect that the explanation requires a good understanding of NUMA. Is the "idle" state what you see when a CPU is waiting for a non-local access? If not, then what is the explanation?