ZFS - destroying deduplicated zvol or data set stalls the server. How to recover?
- by ewwhite
I'm using Nexentastor on a secondary storage server running on an HP ProLiant DL180 G6 with 12 Midline (7200 RPM) SAS drives. The system has an E5620 CPU and 8GB RAM. There is no ZIL or L2ARC device.
Last week, I created a 750GB sparse zvol with dedup and compression enabled to share via iSCSI to a VMWare ESX host. I then created a Windows 2008 file server image and copied ~300GB of user data to the VM. Once happy with the system, I moved the virtual machine to an NFS store on the same pool.
Once up and running with my VMs on the NFS datastore, I decided to remove the original 750GB zvol. Doing so stalled the system. Access to the Nexenta web interface and NMC halted. I was eventually able to get to a raw shell. Most OS operations were fine, but the system was hanging on the zfs destroy -r vol1/filesystem command. Ugly. I found the following two OpenSolaris bugzilla entries and now understand that the machine will be bricked for an unknown period of time. It's been 14 hours, so I need a plan to be able to regain access to the server.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6924390
and
http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=593704962bcbe0743d82aa339988?bug_id=6924824
In the future, I'll probably take the advice given in one of the buzilla workarounds:
Workaround
Do not use dedupe, and do not attempt to destroy zvols that had dedupe enabled.
Update:
I had to force the system to power off. Upon reboot, the system stalls at Importing zfs filesystems. It's been that way for 2 hours now.