Troubleshooting an NFS server hanging after authenticated mount request
- by Christoph
I need some advice on troubleshooting an NFS server problem on Scientific Linux (RHEL) 6.1. The log on the server shows that an authenticated mount request was made:
Jan 13 16:30:02 ??? rpc.mountd[3996]: authenticated mount request from ????:784 for /shared-storage/cm/shared (/shared-storage/cm/shared)
But after that, nothing further happens, and the client hangs as well. Interestingly, I have two NFS servers that should be identical: one works perfectly, while the other exhibits the behaviour described above. The problem is also not completely persistent, i.e. sometimes the mount request succeeds.
I assume the problem is related to the server rather than the client, because everything works perfectly with the other server. My question is where I should look for the problem. I have already re-created the exports using exportfs -r, restarted the NFS server, and compared the rpcinfo output of both servers, all without success. The problem even survives a reboot. Any other ideas are appreciated.
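For anyone who wants to repeat the rpcinfo comparison, here is a minimal sketch of how it could be done, not necessarily how I did it. It assumes the output of `rpcinfo -p <server>` from each machine has been saved as text. The function names are made up for illustration, and the port column is deliberately ignored because dynamically assigned ports can differ between otherwise identical servers.

```python
def parse_rpcinfo(text):
    """Parse `rpcinfo -p` output into a set of (program, version, proto, service)."""
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a data row.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        program, version, proto = fields[0], fields[1], fields[2]
        # The service name column can be missing for unnamed programs.
        service = fields[4] if len(fields) > 4 else ""
        entries.add((program, version, proto, service))
    return entries


def diff_rpcinfo(good_text, bad_text):
    """Return (only on the working server, only on the hanging server)."""
    good = parse_rpcinfo(good_text)
    bad = parse_rpcinfo(bad_text)
    return sorted(good - bad), sorted(bad - good)
```

Two empty lists mean both servers register the same RPC programs, versions and protocols, which matches what I saw: no difference.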
In answer to Tim's question: I sporadically see the following in dmesg, but I do not know whether it is related:
e1000e 0000:0c:00.0: eth4: Detected Hardware Unit Hang:
TDH <24>
TDT <25>
next_to_use <25>
next_to_clean <24>
buffer_info[next_to_clean]:
time_stamp <1c3d12940>
next_to_watch <24>
jiffies <1c3d12940>
next_to_watch.status <0>
MAC Status <80383>
PHY Status <792d>
PHY 1000BASE-T Status <7800>
PHY Extended Status <3000>
PCI Status <10>
Further edit: The dmesg message above does not occur on the machine that is working, so it is probably related.
Another edit: The error is not on the (software) device that is used for NFS, but on a different one. An NFS mount attempt also does not trigger the message.
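To check which interfaces report the hang, something like the following sketch could be run over saved dmesg output. It only assumes that the message has the form shown above (`e1000e <PCI address>: <interface>: Detected Hardware Unit Hang`). The function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   e1000e 0000:0c:00.0: eth4: Detected Hardware Unit Hang:
HANG_RE = re.compile(r"e1000e \S+: (\w+): Detected Hardware Unit Hang")


def count_hangs_by_interface(dmesg_text):
    """Count 'Detected Hardware Unit Hang' messages per network interface."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Comparing the interface names returned here with the interface that carries the NFS traffic shows whether the hang affects the NFS path at all.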