ESX Scheduler and NUMA issue
- by babyg_wc
On our 24 core bl685 (4sockets x 6core), we find that NUMA nodes 0 and 1 are pretty busy (unfortunately resulting in elevated cpu ready times on the VMS), whilst NUMA nodes 2 and 3 are almost unused.
I thought this just maybe a ESX4 U1 issue, so I had a colleague with a 32 core (dl785) farm investigate, and it seems that his last 3 or 4 NUMA nodes are also not really being utilised.
ESX seems to have a weakness when it comes to balancing lightly loaded NUMA boxes, Im going to enabled node interleaving in the BIOS and see if the scheduler balancers across all 24 cores, instead of just 12!...
For those of you with large core counts, I would suggest you fire up you viclient, and check Physical CPU useage (or esxtop), I would be interested to hear what your results are. Please note, that its only the lightly loaded (eg less than 30% cpu load on the esx host) that seems to have the biggest issue with load imbalance.
Thoughts/comments.
PS ive logged a SR with vmware to assist, also the other "problem" could be that we have 128gb of ram in each host, and therefore the scheduler sees no good reason why it shouldnt try and cram all vms's into the first two NUMA nodes, as we only have around 50gb of ram worth of vms on each host...