Intermittent Windows Server 2008 BSOD and restart
- by Timka
Our EC2 instance (Windows Server 2008) has crashed multiple times over the past 3 months (most recently today at 1:05 EST). After reviewing the MEMORY.DMP file, we noticed that a possible cause of the crashes is rhelnet.sys (RedHat PV NIC Driver).
The server's Event Viewer shows the following records right after the crash:
Critical - Kernel Power:
The system has rebooted without cleanly shutting down first.
This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
BugCheck:
The computer has rebooted from a bugcheck. The bugcheck was:
0x000000d1 (0x000000000000002d, 0x0000000000000002, 0x0000000000000000, 0xfffff88001402d14).
A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 100113-35849-01.
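For anyone reading the BugCheck record above, here is a minimal Python sketch that pulls the code and its four parameters out of the event text. Bug check 0xD1 is DRIVER_IRQL_NOT_LESS_OR_EQUAL, and the parameter labels follow Microsoft's bug check reference. The function name `parse_bugcheck` and the `D1_PARAMS` table are made up for illustration. This only decodes the numbers; it does not by itself confirm which driver is at fault.

```python
import re

# Meaning of the four parameters for bug check 0xD1
# (DRIVER_IRQL_NOT_LESS_OR_EQUAL). Labels are paraphrased.
D1_PARAMS = (
    "memory address referenced",
    "IRQL at time of reference",
    "access type (0 = read, 1 = write)",
    "address of the instruction that referenced the memory",
)

def parse_bugcheck(message):
    """Extract the bug check code and its parameters from the event text."""
    match = re.search(r"(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", message)
    if match is None:
        raise ValueError("no bug check code found")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params

# Event text copied from the BugCheck record above.
event_text = (
    "The bugcheck was: 0x000000d1 (0x000000000000002d, 0x0000000000000002, "
    "0x0000000000000000, 0xfffff88001402d14)."
)

code, params = parse_bugcheck(event_text)
print(f"bug check 0x{code:X}")
if code == 0xD1:
    for label, value in zip(D1_PARAMS, params):
        print(f"  {label}: 0x{value:x}")
```

Read this way, the record describes a read (third parameter 0) of the near-null address 0x2d at IRQL 2, issued by code at 0xfffff88001402d14. Mapping that last address to a specific module such as rhelnet.sys still requires the loaded-module list from the dump.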
Could this be a hardware issue? Would it help if we stop and start the instance? Or is it more likely that this is caused by software running on the system?
[Update 10.01.2013]
An Amazon rep suggested replacing the RedHat drivers with Citrix PV drivers on our instance:
Upgrading PV Drivers
[Update 10.08.2013]
We performed the driver upgrade on a cloned instance. Right after the upgrade we noticed the following errors in Event Viewer:
Xennet6 errors in Event Viewer (Event ID# 5001)
After digging a bit more, I found this article, which suggests installing the latest Citrix drivers. Unfortunately, this didn't help at all, and our cloned instance became unresponsive.
[Update 10.08.2013 2]
I recreated the instance and updated the PV drivers again.
After searching the Internet, I found this article where an Amazon rep explains that:
"Event ID 5001 from source Xennet6 cannot be found" message does not
indicate anything wrong, just that the PV driver is looking for a feature
that we have not implemented in our version of Xen.
I will keep my test system running for a while to see if there are any issues with it.