We have an Apache setup with a huge disk_cache (500,000 entries, 50 GB of
disk space used).
The cache grows by 16 GB every day.
My problem is that the cache seems to be growing nearly as fast as it's possible to remove files and directories from the cache filesystem!
The cache partition is an ext3 filesystem (100 GB, "-t news") on iSCSI storage. The Apache server (which acts as a caching proxy) is a VM. The disk_cache is configured with CacheDirLevels=2 and CacheDirLength=1, and includes variants. A typical file path is "/htcache/B/x/i_iGfmmHhxJRheg8NHcQ.header.vary/A/W/oGX3MAV3q0bWl30YmA_A.header".
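To put the directory layout in numbers, here is a rough back-of-the-envelope sketch. It assumes that each character of a cache directory name is drawn from a 64-symbol alphabet (my understanding of how the disk cache hashes keys, not something I have verified) and that entries are spread evenly. The names `ALPHABET_SIZE` and `leaf_directories` are made up for illustration.

```python
# Assumption: each directory-name character comes from a 64-symbol alphabet.
ALPHABET_SIZE = 64
CACHE_DIR_LEVELS = 2   # CacheDirLevels from the config above
CACHE_DIR_LENGTH = 1   # CacheDirLength from the config above
ENTRIES = 500_000      # approximate number of cache entries


def leaf_directories(levels, length, alphabet=ALPHABET_SIZE):
    """Number of possible bottom-level directories for one hash tree."""
    return (alphabet ** length) ** levels


leaves = leaf_directories(CACHE_DIR_LEVELS, CACHE_DIR_LENGTH)
print(leaves)                    # possible leaf directories
print(round(ENTRIES / leaves))   # average entries per leaf, if evenly spread
```

Under these assumptions the top-level tree has 4096 leaf directories with roughly 122 entries each. Every ".header.vary" entry adds its own nested two-level tree, so the real number of directories to traverse is considerably higher.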
When I try to call htcacheclean to tame the cache (non-daemon mode, "htcacheclean -t -p/htcache -l15G"), IOwait goes through the roof for several hours without any visible activity. Only after hours does htcacheclean start to delete files from the cache partition, which takes a couple more hours. (A similar problem was brought up on the Apache mailing list in 2009, without a solution: http://www.mail-archive.com/[email protected]/msg42683.html)
The high IOwait leads to problems with the stability of the web server (the bridge to the Tomcat backend server sometimes stalls).
I came up with my own prune script, which removes files and directories from random subdirectories of the cache, only to find that the deletion rate of the script is just slightly higher than the cache growth rate. The script takes ~10 seconds to read a subdirectory (e.g. /htcache/B/x) and frees some 5 MB of disk space. In those 10 seconds, the cache has grown by another 2 MB. As with htcacheclean, IOwait goes up to 25% when running the prune script continuously.
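For reference, this is a minimal sketch of the kind of prune pass described above, not the exact script. The function name `prune_random_subdir` and the age threshold parameter are illustrative assumptions. It picks files purely by modification time and ignores the pairing of .header and .data files, so it can leave orphans behind.

```python
import os
import random
import time


def prune_random_subdir(cache_root, max_age_seconds, rng=random):
    """Delete old files under one randomly chosen <root>/<X>/<y> directory.

    Returns the number of bytes freed.
    """
    # Pick one first-level and one second-level directory at random.
    level1 = [e.path for e in os.scandir(cache_root)
              if e.is_dir(follow_symlinks=False)]
    if not level1:
        return 0
    level2 = [e.path for e in os.scandir(rng.choice(level1))
              if e.is_dir(follow_symlinks=False)]
    if not level2:
        return 0
    target = rng.choice(level2)

    cutoff = time.time() - max_age_seconds
    freed = 0
    # Walk bottom-up so emptied directories can be removed on the way out.
    for dirpath, dirnames, filenames in os.walk(target, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
                if st.st_mtime < cutoff:
                    os.unlink(path)
                    freed += st.st_size
            except FileNotFoundError:
                pass  # Another process removed the file first.
        if dirpath != target:
            try:
                os.rmdir(dirpath)  # Succeeds only if the directory is empty.
            except OSError:
                pass
    return freed
```

A call such as `prune_random_subdir("/htcache", 3600)` would handle one subdirectory like /htcache/B/x per pass. At roughly 5 MB freed against 2 MB of growth per 10 seconds, the net gain is only about 0.3 MB/s, so shrinking the cache from 50 GB to 15 GB this way would take on the order of 30 hours of continuous pruning.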
Any ideas?
Is this a problem specific to the (rather slow) iSCSI storage?
Should I choose a different file system for a huge disk_cache? ext2? ext4?
Are there any kernel parameter optimizations for this kind of scenario? (I already tried the deadline scheduler and a smaller read_ahead_kb, without effect.)