Bad disks in ancient server
- by Joel Coel
I have a 1998-era Netware 3.12 server that runs everything on our campus: general ledger, purchasing, payroll, student information, grades, you name it. The server has an Adaptec RAID controller with two volumes:
RAID 1, 2 17GB scsi disks, Seagate ST318417W
RAID 5, 3 4GB scsi disks, 2 Seagate ST34573W and 1 ST34572W.
We are currently in the early stages of a project to replace this system, but you don't just jump into a new system like that and so I need to keep this server running until at least November 2011.
This week we had not one but two hard drives fail. Thankfully they are from different volumes and we're able to keep running for the moment, but given the close nature of these failures I have serious doubts that I'll be able to avoid catastrophic failure from this server through the November target as is without restoring the RAID redundancy — it'll only take one more drive failure anywhere and I'm completely hosed.
We are fortunate enough to have exact match "spares" lying around for both drives, but the spares are in unknown condition. I tried swapping just them in, but the RAID controller isn't smart enough to handle this and it renders the system unbootable.
As for the RAID controller itself, there is utility I can get into during POST via a Ctrl-A shortcut, but I can't do much useful from there. To actually manage volumes I must first boot in to Netware, at which point I can use CI/O Array Management Software Version 2.0 to actually look at volume information. I suspect that the normal way to manage things is to boot from a special floppy with the controller software on it, but that floppy is long gone.
Going through the options in the RAID software, I think the only supported way to replace a disk in an existing RAID volume is to physically add the disk, boot up and configure it as a "spare" for a volume, force the volume to use the spare to replace an existing down disk (and at this point I'm only guessing) so that the down disk becomes the spare, repair the volume, remove the spare from the volume, and then shut down and remove the disk. Then start all over for the other failed disk. All this amounts to a lot of downtime, assuming I can even make it work and that my spares are any good.
As for finding reliable spares, I have no clue where to even begin looking to find a new 4GB scsi drive, or even which exact scsi system I'm looking for, as it's gone through a few different iterations over time.
Another option is to migrate this to a virtual machine (hyper-v), but all previous attempts we've made in this area have failed to get very far. When this machine was installed I was just graduating from high school, and so it requires lower level knowledge of netware and dos than I ever developed, or if I did have since forgotten (I'm not exactly a dos neophyte, either).
Part of my problem is this is a high-use server, and taking it down for a few days to figure things out isn't gonna fly very well.
As for the question, I'm looking for anything that might be helpful in this situation: a recommendation on a place to find good spares from this era, personal experience repairing RAID volumes using a similar controller or building a hyper-v vm from an old netware server, a line on a floppy with better software for the RAID controller, recommendation on a good Novell consultant in Nebraska that would be able to put things right, a whole other option I haven't considered yet, etc.
Update:
For backups, we have good (recently verified via restore) backups of the data only -- nothing for the software that actually runs things.
Update 2:
Just a progress report that I currently have a working Netware 3.12 install in VMWare Virtual Server 2.0, thanks largely to the guide I found here:
http://cerbulescubogdan.blogspot.com/2010/11/novell-netware-312-on-vmware.html
The next steps are preparing empty netware volumes to match the additional volumes on my existing server, taking a dump of everything on the C:\ drive and netware volumes on my existing server, and figuring out from that information what modules need added to netware, installing my licenses (we do still have that disk, if it's any good), and moving data over.
I have approval to bring the server down for a week after the first of the year (sadly not before), so, aside from creating empty volumes, the rest of the work will have to wait until then.
Final Update (Jan 5, 2011):
I was able to get spares working in both raid arrays without data loss this week. Both are now listed by the controller as "FAULT TOLLERANT" (yay!). I was also able to build on the progress from my last update and now have a functional "spare" server in VMWare Server 2.0. The spare can run and use our erp software, but I can't put it into production because I can't (yet) print from that box (and I have no idea why). Even so, this VM will do in a pinch if I have no other choice, and between it and the repaired RAID arrays I'm comfortable pushing on until I can junk the machine in November.