We have a server 2003 R2 standard (which I'll refer to as SRV01) that's knocking on a bit now, but it still acts as a file, print and SQL server on our company's network. SRV01 hosts user profiles, home directories and pretty much all our business data. Note our AD is currently at 2008 R2 level.
This server is due to be upgraded in the next 12 months, but I've no budget to spend on it just yet.
A bit of history of this server follows:
When SRV01 was first commissioned, it acted as a domain controller (with the same 2003 R2 install it has today), paired with another server that ran Server 2003 R2 SBS.
A few years ago, we purchased a pair of dedicated DCs (2008 R2) and at this point we decommissioned the 2003 SBS server, and SRV01 was DCPROMOed out of the AD.
Up until very recently, SRV01 used to run Exchange 2003, however we've recently purchased a dedicated server for Exchange 2010 and upgraded (following Microsoft recommended upgrade path). Exchange 2003 was recently uninstalled. - Cleanly to the best of my knowledge.
Ever since Exchange was removed from SRV01, I'm finding that after a few days of uptime, when I attempt to logon, pressing CTRL-ALT-DEL just hides the Welcome to Windows Server 2003 banner, and never presents the logon dialog. All I see is a moveable mouse pointer and a blank background.
It's a similar story with an admin TS session, the RDP client connects and gives me a blank background, but no logon dialog is presented. The RDP session indefinitely hangs until I give up and close it.
The only way I've been able to gain access to the server is to pull the plug on it. Whilst the server does have a battery backed up RAID 5 controller, I'm unhappy about having to do this, so as a temporary measure, I've created a scheduled job to reboot SRV01 each night.
Not only do I not like the idea of scheduling a reboot of a server like this, but it is also causing problems for users that leave desktop PCs left logged on overnight. Users complain of 'Delayed Write Failures', and there has also been a number of users that have started to complain about account lockout problems, as well as users not able to connect to shares on SRV01 until they reboot their desktop PCs.
I've examined event logs on SRV01 and on the DCs looking for clues as to what the problem is, but there really is nothing untoward being logged. How could I being to investigate this problem when nothing of any relevance is being logged? Is there some additional logging that can be enabled that might give some clues as to what could be causing this problem? Could performance monitor help me out here, and if so, what counters would you consider monitoring?
It's worth mentioning that whilst the server is unresponsive via the console and TS, it does still respond to clients connecting to shares without problems for several days, but after about a week I then start to hear users reporting problems accessing shares, but this seems quite sporadic.
I've also tried leaving the console logged on (and locked), when I notice I can no longer logon via TS, I can unlock the server console without problem, but it refuses to reboot/shutdown, and subsequent attempts to reboot report that a system shutdown is already in progress and the system then completely hangs.
I've tried playing the waiting game for several hours thinking that a timeout might allow the shutdown to continue, but to no avail.