In the last week I had to reboot the host system twice and the second one by means of the power button.
The system is a Dell PE 6950 (4 Opteron dual core, 2,8 Ghz, 16 GB RAM, 900 GB disk in a RAID 10 array composed by 4 450GB 15000 rpm disks) with XenServer 5.5 U2. We're installing it and at now there are working in production a 2003 32 bit Windows server VM with 2 GB RAM, 3 vcpu and about 200 GB disk in 4 partition (12 GB boot, 20 GB program, 80 GB user data, 80 GB other data).
The first time I was compressing many Windows 2003 folders (some tens GB by means of W2003 compressed folder option) from a Windows remote console and some hours before my colleague installed Backup Exec agent that was alredy installed and that required a reboot (that was pending).
The console stopped responding, it was no more possible to connect by means of remote console or by means of the console of the XenCentre, it was still possible from the network to use the shared folder of the VM and the programs on it (2 db and a GIS program),
but the print server didn't work any more and I couldn't give remote reboot from other domain controller hosts. I couldn't stop the virtual machine neither from the XenCentre, neither from the command line of the host also forcing the reboot. I had to reboot the host server.
Yesterday it has been worst. I installed a template of another VM, CentOS 5.3 and then, put the DVD in the drive of the host. Then, before the install and after the boot I checked for defect the DVD and the W2003 VM began to respond slowly (I was connected by means of an administration remote console) the task manager showed only mid or low load, it is, only the first of the 3 vcpu was loaded (about 70%), while the other two were about at 20% and also the disk I/O was not so heavy.
Then the users were not so happy because they couldn't any more use the MS word docs on the server. I immediately stopped the check of the DVD (to do that I had to force the stop of such Centos 5.3 VM), then some users could again use their docs, but other had still problems, so I decided to reboot the VM, but it doesn't stopped, neither from XenCenter, neither from command line, neither forcing the reboot. Then I tried to reboot of the host, but it didn't worked, neither from the XenCentre, neither from the host prompt (shutdown -r now as root: it told that it was shutting down, but then it didn't did that). So I had to power off by means of the power button of the server (before I tried some Magic SyS REQ, but I saw that Xen isn't compiled with this option enabled).
What could you suggest about my problem, what can I look, search and see?
In the W2003 VM logs there are no errors or warning to explain what happened.
Some more exciting, amusing and inspiring words of poetry (the facts happened around 11 am):
\# egrep -i 'err|warning' xensource.log
[20100219 10:32:05.597|debug|culo|6301 unix-RPC|VBD.plug R:c81bcda701f6|xenops] watch: watching xenstore paths: [ /xapi/0/frontend/vbd/51712/hotplug; /local/domain/0/backend/vbd/0/51712/tapdisk-error ] with timeout 1200.000000 seconds
[20100219 10:32:05.597|debug|culo|6301 unix-RPC|VBD.plug R:c81bcda701f6|xenops] watch: fired on /local/domain/0/backend/vbd/0/51712/tapdisk-error
[20100219 10:32:14.314|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] watch: watching xenstore paths: [ /local/domain/0/backend/vbd/0/51712/shutdown-done; /local/domain/0/error/device/vbd/51712/error ] with timeout 1200.000000 seconds
[20100219 10:32:14.337|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] xenstore-rm /local/domain/0/error/backend/vbd/0
[20100219 10:32:14.337|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] xenstore-rm /local/domain/0/error/device/vbd/51712
[20100219 10:32:14.338|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] watch: fired on /local/domain/0/error/device/vbd/51712/error
[20100219 10:53:48.903|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|helpers] Ignoring exception: INTERNAL_ERROR: [ Xb.Noent ] while Vmops.destroy_domain: Destroying domid 14 guest session
[20100219 10:53:52.048|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14
[20100219 10:53:52.048|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51744
[20100219 10:53:52.085|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14
[20100219 10:53:52.086|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51728
[20100219 10:53:52.122|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14
[20100219 10:53:52.122|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51712
[20100219 10:53:52.127|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vbd/14
[20100219 10:53:52.128|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51760
[20100219 10:53:52.496|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths
[20100219 10:53:52.497|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vif/14
[20100219 10:53:52.497|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vif/0
[20100219 10:53:53.385|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths
[20100219 10:53:53.386|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vif/14
[20100219 10:53:53.386|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vif/1
[20100219 10:53:53.389| warn|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|hotplug] Warning, deleting 'vif' entry from /xapi/14/hotplug/vif/0
[20100219 10:53:53.391| warn|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|hotplug] Warning, deleting 'vif' entry from /xapi/14/hotplug/vif/1
[20100219 11:20:49.766|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|helpers] Ignoring exception: INTERNAL_ERROR: [ Xb.Noent ] while Vmops.destroy_domain: Destroying domid 11 guest session
[20100219 11:20:50.339|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/tap/11
[20100219 11:20:50.339|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/832
[20100219 11:20:50.360|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/tap/11
[20100219 11:20:50.360|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/768
[20100219 11:20:50.366|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/vbd/11
[20100219 11:20:50.366|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/5696
[20100219 11:20:50.753|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths
[20100219 11:20:50.754|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/vif/11
[20100219 11:20:50.754|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vif/1
[20100219 11:20:50.757| warn|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|hotplug] Warning, deleting 'vif' entry from /xapi/11/hotplug/vif/1
[20100219 11:28:13.803|debug|culo|6610 inet-RPC|Connection to VM console R:e9f8b76e8975|console] error: INTERNAL_ERROR: [ Unix.Unix_error(63, "connect", "") ]