What happens to missed writes after a zpool clear?
- by Kevin
I am trying to understand ZFS' behaviour under a specific condition, but the documentation is not very explicit about it, so I'm left guessing.
Suppose we have a zpool with redundancy. Take the following sequence of events:
1. A problem arises in the connection between device D and the server. This causes a large number of failures, so ZFS faults the device, putting the pool in a degraded state.
2. While the pool is degraded, the pool is mutated (data is written and/or changed).
3. The connectivity issue is physically repaired such that device D is reliable again.
4. Knowing that most data on D is valid, and not wanting to stress the pool with a needless resilver, the admin instead runs zpool clear pool D. Oracle's documentation indicates this as the appropriate action where the fault was due to a transient problem that has been corrected.
I've read that zpool clear only clears the error counters and restores the device to online status. However, this is a bit troubling, because if that's all it does, it will leave the pool in an inconsistent state!
This is because the mutations in step 2 will not have been successfully written to D. Instead, D will reflect the state of the pool prior to the connectivity failure. This is of course not the normative state for a zpool and could lead to hard data loss if another device fails. Yet the pool status will not reflect this issue!
I would at least assume based on ZFS' robust integrity mechanisms that an attempt to read the mutated data from D would catch the mistakes and repair them. However, this raises two problems:
1. Reads are not guaranteed to hit all mutations unless a scrub is done; and
2. Once ZFS does hit the mutated data, it (I'm guessing) might fault the drive again because it would appear to ZFS to be corrupting data, since it doesn't remember the previous write failures.
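To make the first concern concrete, here is a toy Python model of a two-way mirror in which clearing a device does nothing but mark it online again. It is only a sketch of my assumption, not ZFS code: the ToyMirror class, the second device name C, and the stale_blocks and scrub helpers are all hypothetical, and the checksums are kept outside the devices purely so that a stale copy can be detected on read.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Hypothetical two-way mirror; NOT how ZFS is implemented."""

    def __init__(self):
        self.devices = {"C": {}, "D": {}}      # device name -> {block id: data}
        self.online = {"C": True, "D": True}
        self.expected = {}                     # block id -> checksum of latest write

    def write(self, block, data):
        self.expected[block] = checksum(data)
        for name, dev in self.devices.items():
            if self.online[name]:              # offline devices silently miss the write
                dev[block] = data

    def is_stale(self, name, block):
        data = self.devices[name].get(block)
        return data is None or checksum(data) != self.expected[block]

    def read(self, block, prefer):
        """Read from `prefer`; on a bad copy, fall back to the other side and repair."""
        other = next(n for n in self.devices if n != prefer)
        for name in (prefer, other):
            if not self.is_stale(name, block):
                data = self.devices[name][block]
                if name != prefer:
                    self.devices[prefer][block] = data   # self-heal the bad copy
                return data
        raise IOError(f"no valid copy of block {block}")

    def stale_blocks(self, name):
        return sorted(b for b in self.expected if self.is_stale(name, b))

    def scrub(self):
        # Touch every block on every device so all stale copies get repaired.
        for block in self.expected:
            for name in self.devices:
                self.read(block, prefer=name)


pool = ToyMirror()
pool.write(1, b"old-1")
pool.write(2, b"old-2")

pool.online["D"] = False        # step 1: D is faulted
pool.write(1, b"new-1")         # step 2: the pool is mutated while degraded
pool.write(2, b"new-2")
pool.write(3, b"new-3")
pool.online["D"] = True         # step 4: "clear" that only flips the status

stale_after_clear = pool.stale_blocks("D")
pool.read(1, prefer="D")        # one ordinary read repairs only the block it touches
stale_after_one_read = pool.stale_blocks("D")
pool.scrub()
stale_after_scrub = pool.stale_blocks("D")
```

In this model D still holds stale copies of every block that was not read after the clear, and only the full scrub brings it back in line with the rest of the pool. Whether real ZFS behaves like this is exactly what I am asking.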
Theoretically, ZFS could circumvent this problem by keeping track of mutations that occur during a degraded state, and writing them back to D when it's cleared. For some reason I suspect that's not what happens, though.
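The kind of bookkeeping I have in mind can be sketched in a few lines of Python. Again, this is a toy model under my own assumptions and not a claim about ZFS internals: the TrackedMirror class, its missed sets, and the device name C are hypothetical names made up for illustration.

```python
class TrackedMirror:
    """Hypothetical mirror that remembers which writes an offline device missed."""

    def __init__(self):
        self.devices = {"C": {}, "D": {}}            # device name -> {block id: data}
        self.online = {"C": True, "D": True}
        self.missed = {"C": set(), "D": set()}       # per-device log of missed blocks

    def write(self, block, data):
        for name, dev in self.devices.items():
            if self.online[name]:
                dev[block] = data
            else:
                self.missed[name].add(block)         # remember the missed write

    def clear(self, name):
        """Bring `name` back online and replay only the blocks it missed.

        Returns the number of blocks copied.
        """
        source = next(n for n in self.devices if n != name and self.online[n])
        self.online[name] = True
        copied = 0
        for block in sorted(self.missed[name]):
            self.devices[name][block] = self.devices[source][block]
            copied += 1
        self.missed[name].clear()
        return copied


pool = TrackedMirror()
for i in range(100):
    pool.write(i, b"v1")         # 100 blocks written while both devices are healthy

pool.online["D"] = False         # D is faulted
pool.write(3, b"v2")             # one overwrite and one new block while degraded
pool.write(100, b"v2")

copied = pool.clear("D")         # replay touches only the two missed blocks
in_sync = pool.devices["D"] == pool.devices["C"]
```

Here the clear copies just the two blocks that changed while D was away, rather than re-reading all 101, which is the cheap targeted repair I would hope for instead of a full resilver.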
I'm hoping someone with intimate knowledge of ZFS can shed some light on this aspect.