Today we encountered a really strange behaviour on two identical kvm and qemu hosts.
The host systems each have 4 x 10 Cores, which means that 40 physical cores are displayed as 80 within the operating system (Ubuntu Linux 10.04 64 Bit).
We started a Windows 2003 32 Bit VM (1 CPU, 1 GB RAM, we changed those values multiple times) on one of the nodes and noticed that it took 15 minutes until the boot process began. During those 15 minutes, a black screen is shown and nothing happens. libvirt and the host system show that the qemu-kvm process for the guest is almost idling. stracing this process only shows some FUTEX entries, but nothing special.
After those 15 minutes, the Windows VM suddenly starts booting and the Windows logo occurs. After a few seconds, the VM is ready to be used. The VM itself is very performant, so this is no performance issue.
We tried to pin the CPUs with the virsh and taskset tools, but this only made things worse.
When we boot the Windows VM with a Linux Live CD there is also a black screen for several minutes, but not as long as 15. When booting another VM on this host (Ubuntu 10.04) it also has the black screen problem, and also here the black screen is only shown for 2-3 minutes (instead of 15).
So, summerinzing this:
Each guest on each of those identical nodes suffers from idling a few minutes after being started. After a few minutes, the boot process suddenly starts.
We have observed that the idling time happens right after the bios of the guest was initialized.
One of our employees had the idea to limit the amount of CPUs with maxcpus=40 (because of 40 physical cores existing) within Grub (kernel parameter) and suddenly the "black-screen-idling"-behaviour disappeared.
Searching the KVM and Qemu mailing lists, the internet, forums, serverfault and other various sites for known bugs etc. showed no useful results. Even asking in the dev IRC channels brought no new ideas. The people there recommend us to use CPU pinning, but as stated before it didn't help.
My question is now: Is there a sort of limit of CPUs for a qemu or kvm host system? Browsing the source code of those two tools showed that KVM would send a warning if your host has more than 255 CPUs. But we are not even scratching on that limit.
Some stuff about the host system:
3.0.0-20-server
kvm 1:84+dfsg-0ubuntu16+0.14.0+noroms+0ubuntu4
kvm-pxe 5.4.4-7ubuntu2
qemu-kvm 0.14.0+noroms-0ubuntu4
qemu-common 0.14.0+noroms-0ubuntu4
libvirt 0.8.8-1ubuntu6
4 x Intel(R) Xeon(R) CPU E7-4870 @ 2.40GHz, 10 Cores