Troubleshooting a failover cluster problem in W2K8 / SQL05
- by paulland
I have an active/passive W2K8 (64-bit) cluster pair running SQL05 Standard. Shared storage is on an HP EVA SAN (FC).
I recently expanded the file system for a database on the active node by adding a drive designation. The shared storage drives are designated F:, I:, J:, L: and X:, with SQL file systems on the first four and X: used as a backup destination.
Last night, as part of a validation process (the passive node had been offline for maintenance), I moved the SQL instance to the other cluster node. The database in question immediately went into Suspect status.
A review of the system logs showed that the database would not load because the file "K:\SQLDATA\whatever.ndf" could not be found. (Note that we do not have a K: drive designation.)
The J: storage drive, where "whatever.ndf" should have been, was completely empty.
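One way to catch this kind of mismatch before a failover is to compare every database file path against the drive letters of the shared cluster disks. The sketch below is a minimal, hypothetical check: it assumes you have already exported the file paths (for example, the physical_name column of SQL Server's sys.master_files view) into a Python list, and it hard-codes the drive letters named in this post. The function name and the sample file names are illustrative only.

```python
from pathlib import PureWindowsPath

# Shared cluster disks named in this post (adjust for your own cluster).
CLUSTER_DRIVES = {"F:", "I:", "J:", "L:", "X:"}


def files_off_cluster_disks(physical_names, cluster_drives=CLUSTER_DRIVES):
    """Return the paths whose drive letter is not one of the shared cluster disks.

    PureWindowsPath parses Windows paths on any OS, so this can run anywhere.
    Drive letters are compared case-insensitively.
    """
    allowed = {d.upper() for d in cluster_drives}
    return [
        p for p in physical_names
        if PureWindowsPath(p).drive.upper() not in allowed
    ]


# Hypothetical file list; in practice it would come from sys.master_files.
files = [
    r"J:\SQLDATA\whatever.mdf",
    r"K:\SQLDATA\whatever.ndf",
    r"L:\SQLLOGS\whatever_log.ldf",
]
print(files_off_cluster_disks(files))  # -> ['K:\\SQLDATA\\whatever.ndf']
```

Any path this reports points at a drive the other node will not be able to bring online as part of the cluster group, which is exactly the symptom described above.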
Hmm, I thought. Problem with the server. I'll just move SQL back to the other server and figure out what's wrong.
Still no database. Suspect. Uh-oh. "Whatever.ndf" had gone into the bit bucket.
I finally decided to just restore from the backup (which had been taken immediately before the validation test), so nothing was lost but a few hours of sleep.
My questions: (1) Why did the passive node think the whatever.ndf file was supposed to be on drive "K:", when no such drive existed as a resource on the active node?
(2) How can I get the cluster nodes "re-synced" so that failover can be accomplished?
I can't rule out that a "K:" drive existed as a cluster resource at some point in the past, but I do know that no such drive existed on the original cluster at the time of the resource move.