NFS v4, HA Migration, and stale handles on clients

Posted by Karl Katzke on Server Fault
Published on 2009-06-05T12:55:49Z

I'm managing a server running NFS v4 with Pacemaker/OpenAIS. NFS is configured to use TCP. When I migrate the NFS server to another node in the Pacemaker cluster, connections from the clients 'hang' and eventually time out after 90 seconds, even though the metadata is persisted. Once those 90 seconds have passed, the old mountpoint becomes 'stale' and the mounted files can no longer be accessed.

The 90-second grace period seems to be part of the server configuration rather than the client configuration. I see this message on the server:

kernel: NFSD: starting 90-second grace period
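
To confirm which grace period the server actually announces after each failover, the message above can be picked out of the logs. This is only a minimal sketch: the pattern is assumed from that single log line, and the name grace_seconds is hypothetical, not part of any NFS tooling.

```python
import re

# Pattern assumed from the one nfsd message quoted above; other kernels may word it differently.
GRACE_RE = re.compile(r"NFSD: starting (\d+)-second grace period")


def grace_seconds(log_line):
    """Return the announced grace period in seconds, or None if the line does not match."""
    match = GRACE_RE.search(log_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(grace_seconds("kernel: NFSD: starting 90-second grace period"))
```

Running this over the syslog of each cluster node after a migration would show whether both nodes announce the same value.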

If I restart the NFS client on the client nodes after I migrate (unmounting and then remounting the share), I don't experience the stale-handle problem, but connections and file transfers are still interrupted.

Three questions:

  1. What is the 90-second grace period? What's it there for?
  2. How can I keep the files from going stale on the clients without restarting them after I migrate the NFS server to another node?
  3. Is it actually possible to migrate the NFS server without having large file uploads drop?

