Diagnosing SAN connectivity issues (RHEL5)
- by Matthew
We are currently utilizing GFS2 to share a SAN LUN between 3 servers. However due to a feature problem with vendor software we are using, we currently have the volume unmounted on two of the boxes, and are instead exporting the GFS2 filesystem via NFS from the first one (the software requires some weird locking mechanics that GFS2 doesn't support).
As of this morning, NFS was no longer able to read/write to the volume from any of the servers, including the NFS server. I then tried checking the normal mount (the directory that is exported on the NFS server) and I received a weird input/output error just trying to CD into it. When I tried running multipath, I got a DM error, however multipath -l worked just fine. I tried unmounting the GFS2 volume, and the CLI hung. I ran init 0 which killed most services, but then the shutdown appeared to have been hung. I logged in via out of band access (hp ILO) and saw that the shutdown was hung trying to unmount GFS2 volumes.
My main priority was getting the box back online so after about 5 minutes of waiting I did a hard reset. I am now trying to figure out what went wrong. What are the correct logs to investigate? I've never run into SAN issues like this before. The SAN is connected via 2 fibre connections. Any help would really be appreciated. Everything appears to be up and functional now.