What free space thresholds/limits are advisable for 640 GB and 2 TB hard disk drives with ZEVO ZFS on OS X?
- by Graham Perrin
Assuming that free space advice for ZEVO will not differ from advice for other modern implementations of ZFS …
Question
Please, what percentages or amounts of free space are advisable for hard disk drives of the following sizes?
640 GB
2 TB
Thoughts
A standard answer for modern implementations of ZFS might be "no more than 96 percent full". However, if I apply that to (say) a single-disk 640 GB dataset where some of the most commonly used files (by VirtualBox) are larger than 15 GB each, then I guess that blocks for those files will become suboptimally spread across the platters with around 26 GB free.
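For concreteness, here is a back-of-envelope calculation (a minimal Python sketch, nothing ZFS-specific; the 80/90/96 percent figures are simply thresholds that come up in the references below) of how much space those rules leave free on the two drives; the 26 GB figure above is the 96 percent case:

    # Back-of-envelope arithmetic only (no ZFS involved): free space left on the
    # two drives in question at the "percent full" thresholds the references mention.

    def free_gb(capacity_gb, percent_full):
        """Free space in GB when a drive of capacity_gb is percent_full percent full."""
        return capacity_gb * (100 - percent_full) / 100.0

    for capacity in (640, 2000):            # 640 GB and 2 TB, in decimal gigabytes
        for threshold in (80, 90, 96):      # thresholds that come up in the references
            print("%4d GB at %d%% full leaves about %.0f GB free"
                  % (capacity, threshold, free_gb(capacity, threshold)))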
I read that in most cases, fragmentation and defragmentation should not be a concern with ZFS. Still, I like the mental picture of most fragments of a large .vdi being in reasonably close proximity to each other. (Do features of ZFS make that wish for proximity too old-fashioned?)
Side note: the question of how to optimise performance after a threshold is 'broken' might arise. If it does, I'll keep it as a separate question.
Background
In the past, on a 640 GB StoreJet Transcend (product ID 0x2329), I probably went beyond an advisable threshold. Currently the largest file is around 17 GB, and I doubt that any .vdi or other file on this disk will grow beyond 40 GB. (Ignore the purple masses; those are bundles of 8 MB band files.)
Without HFS Plus, the thresholds of twenty, ten and five percent that I associate with the Mobile Time Machine file system need not apply.
I currently use ZEVO Community Edition 1.1.1 with Mountain Lion, OS X 10.8.2, but I'd like answers to be not too version-specific.
References, chronological order
ZFS Block Allocation (Jeff Bonwick's Blog) (2006-11-04)
Space Maps (Jeff Bonwick's Blog) (2007-09-13)
Doubling Exchange Performance (Bizarre ! Vous avez dit Bizarre ?) (2010-03-11), where the explanation includes the following (a rough code sketch of this behaviour appears after the references):
… So to solve this problem, what went in 2010/Q1 software release is multifold. The most important thing is: we increased the threshold at which we switched from 'first fit' (go fast) to 'best fit' (pack tight) from 70% full to 96% full. With TB drives, each slab is at least 5GB and 4% is still 200MB, plenty of space and no need to do anything radical before that. This gave us the biggest bang. Second, instead of trying to reuse the same primary slabs until it failed an allocation, we decided to stop giving the primary slab this preferential treatment as soon as the biggest allocation that could be satisfied by a slab was down to 128K (metaslab_df_alloc_threshold). At that point we were ready to switch to another slab that had more free space. We also decided to reduce the SMO bonus. Before, a slab that was 50% empty was preferred over slabs that had never been used. In order to foster more write aggregation, we reduced the threshold to 33% empty. This means that a random write workload now spreads to more slabs, where each one will have a larger amount of free space, leading to more write aggregation. Finally, we also saw that slab loading was contributing to lower performance and implemented a slab prefetch mechanism to reduce down time associated with that operation.
The conjunction of all these changes led to 50% improved OLTP and 70% reduced variability from run to run …
OLTP Improvements in Sun Storage 7000 2010.Q1 (Performance Profiles) (2010-03-11)
Alasdair on Everything » ZFS runs really slowly when free disk usage goes above 80% (2010-07-18), where commentary includes:
… OpenSolaris has changed this in onnv revision 11146 …
[CFT] Improved ZFS metaslab code (faster write speed) (2010-08-22)
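To make the behaviour quoted from 'Doubling Exchange Performance' concrete, here is a rough sketch of the switch it describes, in Python rather than the actual C metaslab code. The 4 percent and 128 K figures come from the quote; the constant names are only modelled on the metaslab_df_* tunables it mentions, and none of this should be read as ZEVO's implementation:

    # Illustration only: a rough model of the first-fit / best-fit switch described
    # in the quoted post, not ZEVO's or OpenSolaris's actual metaslab code.

    METASLAB_DF_FREE_PCT = 4                    # below 4% free in a slab, pack tight ('best fit')
    METASLAB_DF_ALLOC_THRESHOLD = 128 * 1024    # stop preferring the primary slab below 128 K

    def allocation_strategy(slab_size_bytes, slab_free_bytes):
        """'first fit' (fast) while the slab has room, 'best fit' (tight) when nearly full."""
        free_pct = 100.0 * slab_free_bytes / slab_size_bytes
        return "first fit" if free_pct >= METASLAB_DF_FREE_PCT else "best fit"

    def keep_primary_slab(largest_free_segment_bytes):
        """Move on from the primary slab once its largest free segment drops below 128 K."""
        return largest_free_segment_bytes >= METASLAB_DF_ALLOC_THRESHOLD

    # A 5 GB slab, as in the quote: 4% of it is roughly 200 MB.
    slab = 5 * 1024**3
    print(allocation_strategy(slab, 300 * 1024**2))   # ~5.9% free -> 'first fit'
    print(allocation_strategy(slab, 150 * 1024**2))   # ~2.9% free -> 'best fit'
    print(keep_primary_slab(64 * 1024))               # only 64 K left -> False, switch slabs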