We have an Apache setup with a huge disk_cache (500,000 entries, 50 GB of disk space used).
The cache grows by 16 GB every day.
My problem is that the cache seems to be growing nearly as fast as files and directories can be removed from the cache filesystem!
The cache partition is an ext3 filesystem (100 GB, "-t news") on iSCSI storage.
The Apache server (which acts as a caching proxy) is a VM.
The disk_cache is configured with CacheDirLevels=2 and CacheDirLength=1, and includes variants. A typical file path is "/htcache/B/x/i_iGfmmHhxJRheg8NHcQ.header.vary/A/W/oGX3MAV3q0bWl30YmA_A.header".
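
To make the directory layout concrete, here is a rough sketch of how a hashed name splits into the path components seen above with CacheDirLevels=2 and CacheDirLength=1. The function names are made up for illustration, and the 64-character name alphabet is an assumption, not something taken from this setup. The per-directory figure ignores the nesting introduced by the .vary directories.

```python
ALPHABET_SIZE = 64  # assumption: hashed names use a 64-character alphabet


def cache_path(hash_name, dir_levels, dir_length):
    """Split a hashed cache name into directory components plus the file stem."""
    parts = []
    pos = 0
    for _ in range(dir_levels):
        parts.append(hash_name[pos:pos + dir_length])
        pos += dir_length
    parts.append(hash_name[pos:])  # remainder becomes the file name stem
    return "/".join(parts)


def leaf_directories(dir_levels, dir_length):
    """Number of possible lowest-level directories for a given layout."""
    return (ALPHABET_SIZE ** dir_length) ** dir_levels


if __name__ == "__main__":
    # Reassembles the first part of the example path from the post.
    print(cache_path("Bxi_iGfmmHhxJRheg8NHcQ", 2, 1))
    leaves = leaf_directories(2, 1)
    print(leaves, "possible leaf directories")
    print(round(500_000 / leaves), "entries per leaf directory on average (rough)")
```

Under these assumptions the entries spread over a few thousand leaf directories, so any full scan has to visit all of them plus the nested variant directories.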
When I try to call htcacheclean to tame the cache (non-daemon mode, "htcacheclean -t -p/htcache -l15G"), IOwait goes through the roof for several hours without any visible action. Only after hours does htcacheclean start to delete files from the cache partition, which takes a couple more hours. (A similar problem was brought up on the Apache mailing list in 2009, without a solution: http://www.mail-archive.com/[email protected]/msg42683.html)
The high IOwait leads to problems with the stability of the web server (the bridge to the Tomcat backend server sometimes stalls).
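
For reference, IOwait figures like the ones quoted here can be derived from two samples of the aggregate "cpu" line in /proc/stat, where iowait is the fifth value after the label. The sketch below is only an illustration with hypothetical function names and synthetic sample data; it is not how the numbers in this post were measured.

```python
def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    # Sum user..steal only; guest time is already counted inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


if __name__ == "__main__":
    # Synthetic samples; on a real system, read /proc/stat twice with a pause.
    sample_1 = "cpu 100 0 50 800 50 0 0 0 0 0\n"
    sample_2 = "cpu 110 0 60 850 80 0 0 0 0 0\n"
    print(iowait_percent(sample_1, sample_2))
```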
I came up with my own prune script, which removes files and directories from random subdirectories of the cache, only to find that the deletion rate of the script is just slightly higher than the cache growth rate.
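
To give an idea of the approach, here is a minimal sketch of a prune script of this kind. It is not the script actually used: the function names, the age-based deletion rule, and the one-day threshold are all assumptions, and it ignores the pairing of header and data files as well as any expiry information stored in the headers.

```python
import os
import random
import time


def pick_random_subdir(cache_root, levels=2):
    """Descend `levels` directory levels, picking a random subdirectory each time."""
    path = cache_root
    for _ in range(levels):
        subdirs = [e.path for e in os.scandir(path)
                   if e.is_dir(follow_symlinks=False)]
        if not subdirs:
            return None
        path = random.choice(subdirs)
    return path


def prune_subdir(subdir, max_age_seconds):
    """Delete files older than max_age_seconds; return the number of bytes freed."""
    cutoff = time.time() - max_age_seconds
    freed = 0
    # Bottom-up walk so emptied directories can be removed right away.
    for dirpath, _dirnames, filenames in os.walk(subdir, topdown=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
                if st.st_mtime < cutoff:
                    os.unlink(full)
                    freed += st.st_size
            except FileNotFoundError:
                pass  # the server removed the file in the meantime
        if dirpath != subdir:
            try:
                os.rmdir(dirpath)
            except OSError:
                pass  # directory is not empty or already gone
    return freed


if __name__ == "__main__":
    target = pick_random_subdir("/htcache")
    if target is not None:
        print(target, prune_subdir(target, max_age_seconds=24 * 3600))
```

Each run touches a single second-level directory such as /htcache/B/x, so the I/O cost per run stays bounded, but every file still needs a stat call before it can be judged.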
The script takes ~10 seconds to read a subdirectory (e.g. /htcache/B/x) and frees some 5 MB of disk space. In those 10 seconds, the cache has grown by another 2 MB. As with htcacheclean, IOwait goes up to 25% when running the prune script continuously.
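
As a back-of-the-envelope check of these numbers, the sketch below computes the net shrink rate and the time it would take to get from 50 GB down to the 15 GB limit. It assumes the rates stay constant, the script runs without pause, and 1 GB = 1024 MB; the function names are hypothetical.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def rates(freed_mb, grown_mb, interval_s):
    """Return (deletion, growth, net shrink) rates in MB/s."""
    deletion = freed_mb / interval_s
    growth = grown_mb / interval_s
    return deletion, growth, deletion - growth


def hours_to_shrink(current_gb, target_gb, net_mb_per_s):
    """Hours of continuous pruning needed to reach the target size."""
    return (current_gb - target_gb) * MB_PER_GB / net_mb_per_s / 3600


if __name__ == "__main__":
    deletion, growth, net = rates(freed_mb=5, grown_mb=2, interval_s=10)
    print("growth per day (GB):", growth * 86400 / MB_PER_GB)
    print("net shrink rate (MB/s):", net)
    print("hours from 50 GB to 15 GB:", hours_to_shrink(50, 15, net))
```

The implied growth of roughly 17 GB per day matches the 16 GB per day observed, and at a net 0.3 MB/s the cache would need about 33 hours of continuous pruning, with IOwait elevated the whole time.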
Any ideas?
Is this a problem specific to the (rather slow) iSCSI storage?
Should I choose a different file system for a huge disk_cache? ext2? ext4?
Are there any kernel parameter optimizations for this kind of scenario? (I already tried the deadline scheduler and a smaller read_ahead_kb, without effect.)
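
For anyone wanting to check the same two settings, they are exposed under /sys/block/<device>/queue/. The sketch below reads them back; the device name "sda" is a placeholder, not the device from this setup, and the function names are hypothetical.

```python
import os


def active_scheduler(scheduler_text):
    """Return the scheduler marked with [brackets], or None if none is marked."""
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_queue_settings(device, sysfs_root="/sys/block"):
    """Read the active I/O scheduler and read_ahead_kb for a block device."""
    queue_dir = os.path.join(sysfs_root, device, "queue")
    with open(os.path.join(queue_dir, "scheduler")) as fh:
        scheduler = active_scheduler(fh.read())
    with open(os.path.join(queue_dir, "read_ahead_kb")) as fh:
        read_ahead_kb = int(fh.read().strip())
    return {"scheduler": scheduler, "read_ahead_kb": read_ahead_kb}


if __name__ == "__main__":
    device = "sda"  # placeholder device name
    if os.path.isdir(os.path.join("/sys/block", device, "queue")):
        print(read_queue_settings(device))
```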