Issues with Server 2012 using DFSR running on Hyper-V 2012
- by Bryan
We have a number of Server 2012 systems, all of which run virtualised on Hyper-V 2012 server. We are having problems with two such virtual instances, both of which are used as file servers, whereby they occasionally stop responding to requests to serve files to clients. After logging on to the server, attempts to shut it down gracefully fail (no error, it just fails to acknowledge a shutdown request).
Recovery is a case of power cycling the server(s) from the Hyper-V console.
These two servers don't server a large number of users (one serves no more than 6 users, and the other serves around 20 users), they are in the same domain, but on different physical hardware (and at different sites). They don't lock up at the same time. They both use DFSR to replicate a fairly large amount of data between themselves (200GB) over ADSL connections, this is working fine, and we have been using DFSR to do this on the previous two generations of server OS we have used (Server 2008 R2 and Server 2003 - both of which were physical installs however).
Today, when one of the servers crashed, I noticed an entry in the event log, which looked similar to the following:
Log Name: Application
Source: ESENT
Date: 27/11/2012 10:25:55
Event ID: 533
Task Category: General
Level: Warning
Keywords: Classic
User: N/A
Computer: HAL-FS-01.example.com
Description:
DFSRs (1500) \\.\E:\System Volume Information\DFSR\database_C8CC_101_CC00_EC0E\
dfsr.db: A request to write to the file "\\.\E:\System Volume Information\
DFSR\database_C8CC_101_CC00_EC0E\fsr.log" at offset 4423680 (0x0000000000438000)
for 4096 (0x00001000) bytes has not completed for 36 second(s). This problem is
likely due to faulty hardware. Please contact your hardware vendor for further
assistance diagnosing the problem.
When the server started up again, I went to find the event log entry to investigate further and found that the event log entry was no longer there (I assume it was in memory but failed to write to disk before the server was powered off, for the reason mentioned in the message). I found the above message by searching back further in the event log.
Both of these virtual servers have their E: volumes fully allocated as opposed to dynamically expanding, and there are no other issues on any of the other virtual servers (which include server 2012, server 2008 R2 and Ubuntu 12.04 x64). There are no signs of IO, memory or CPU starvation on the host systems.
I've used performance counters on the affected virtual servers to monitor memory usage (including non paged pool usage), as well as CPU and network utilisation, and none of these show any signs of trouble when the issue arises.
I would have thought our configuration isn't that uncommon, so I'm wondering if anyone else has seen this, and managed to resolve the problem?