Libvirt / QEmu Machine Fails and Refuses Restart Because of Memory Allocation Errors
- by Elmar Weber
I'm having a problem with libvirt. On a system restart all virtual machines (VMs) are started without a problem and keep running. Then at some point in time a set of machines shuts down according to their log. When I try to restart the machine, I'm getting an error that the memory allocation failed, although more than enough memory is free.
server ~ # free
total used free shared buffers cached
Mem: 16176648 16025476 151172 0 285432 950300
-/+ buffers/cache: 14789744 1386904
Swap: 0 0 0
server ~ # virsh start zimbra
error: Failed to start domain zimbra
error: Unable to read from monitor: Connection reset by peer
server ~ # tail -n 4 /var/log/libvirt/qemu/zimbra.log
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 3072 -smp 2,sockets=2,cores=1,threads=1 -name zimbra -uuid d05ddb7a-83c4-a77b-d8bc-a322648520cf -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/zimbra.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -drive file=/var/lib/libvirt/images/zimbra.img,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=19,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:21:a9:ad,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 192.168.1.2:25 -k de -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
char device redirected to /dev/pts/2
Failed to allocate 3221225472 B: Cannot allocate memory
2012-07-06 08:42:56.076+0000: shutting down
server ~ # uname -a
Linux server 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
The system is a Ubuntu 12.04 server. The problem seems to occurs since the last restart, which was due to a number of package upgrades and a kernel upgrade. I tried booting with the previous kernel, the problem persists. I was not able to pinpoint an exact event when the machines fail, they do it at nearly the same time. The last time a duplicity job was running, this was not always the case however.
Any suggestions on how to debug this?
Best regards,
elm