Tweaking Hudson memory usage
- by rovarghe
Hudson 3.1 has some performance optimizations that greatly reduce its memory footprint. Prior to this, Hudson always held the entire data model (all jobs and all builds) in memory, which hurt scalability. Some installations configured heap sizes in excess of 1GB to counteract this. Hudson 3.1.x maintains an MRU (most recently used) cache and loads jobs and builds only as they are required. Because existing APIs could not be changed without breaking backward compatibility with plugins, there were limits to how far this approach could go.
Memory optimizations almost always come with a related cost; in this case it is the additional I/O that has to be performed to load data on request. On a small site with frequent traffic this is usually not noticeable, since the MRU cache will usually hold on to all the data. A large site with infrequent traffic might experience some delay when the first request hits the server after a long gap. If you have a large heap and are able to allocate more memory, the cache settings can be adjusted to take advantage of it, and even to go back to the pre-3.1 behavior.
All the cache settings can be passed as options to the JVM container (Tomcat or the default Jetty container) using the -D option. There are two caches, independent of each other: one for Jobs and the other for Builds.
For the jobs cache:
hudson.jobs.cache.evict_in_seconds ( default=60 )
The number of seconds after the last access (whether from a servlet request or a background cron thread) before a job is purged from the cache.
Set this to 0 to never purge based on time.
hudson.jobs.cache.initial_capacity ( default=1024 )
Initial number of jobs the cache can accommodate. Setting this to the number of jobs you typically display on your Hudson landing page or home page will speed up consecutive accesses to that page. If the default is too large, consider downsizing it and using that memory for the builds cache instead.
hudson.jobs.cache.max_entries ( default=1024 )
Maximum number of jobs in the cache. The default is large enough for most installations, but if you see I/O activity every time the Hudson home page is accessed you might consider increasing this. First verify whether the I/O is caused by frequent eviction (see above) rather than by the cache being too small.
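For example, a site whose home page lists a couple of thousand jobs could size the jobs cache to hold all of them and disable time-based eviction. This is only a sketch; the capacities are illustrative and the war file name will differ per installation:
java -Dhudson.jobs.cache.evict_in_seconds=0 \
     -Dhudson.jobs.cache.initial_capacity=2048 \
     -Dhudson.jobs.cache.max_entries=2048 \
     -jar hudson.war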
For the builds cache:
The builds cache is used to store Build objects as they are read from storage. Typically this happens when a user drills down into the details of a particular job from the Hudson home page. The cache is shared across jobs: in most installations not all jobs are accessed with the same frequency, so a per-job builds cache would waste memory.
hudson.job.builds.cache.evict_in_seconds ( default=60 )
Same as the equivalent jobs cache setting, applied to builds.
hudson.job.builds.cache.initial_capacity" ( default=512 )
Same as the equivalent jobs cache setting. Note the smaller initial size. If your site stores a large number of builds and they are accessed frequently, you might consider bumping this up.
hudson.job.builds.cache.max_entries ( default=10240 )
The default maximum is large enough for most installations. Entries in the builds cache are relatively large objects, so be careful about increasing this upper limit. See the section on monitoring below.
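As noted earlier, an installation with a large heap can approximate the pre-3.1 keep-everything-in-memory behavior by disabling time-based eviction and raising the limits on both caches. The following is only a sketch; the heap size and capacities are illustrative and should be matched to your job and build counts:
java -Xmx2g \
     -Dhudson.jobs.cache.evict_in_seconds=0 \
     -Dhudson.jobs.cache.max_entries=4096 \
     -Dhudson.job.builds.cache.evict_in_seconds=0 \
     -Dhudson.job.builds.cache.max_entries=40960 \
     -jar hudson.war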
Sample usage:
java -Dhudson.jobs.cache.evict_in_seconds=300 \
     -Dhudson.job.builds.cache.evict_in_seconds=300 \
     -jar hudson-war-3.1.2-SNAPSHOT.war
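If Hudson is deployed to Tomcat rather than started from the executable war, the same system properties can be added to Tomcat's JVM options, typically through CATALINA_OPTS. This is a sketch, assuming a standard Tomcat setup where bin/setenv.sh is used for extra JVM options:
export CATALINA_OPTS="$CATALINA_OPTS -Dhudson.jobs.cache.evict_in_seconds=300 -Dhudson.job.builds.cache.evict_in_seconds=300"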
Monitoring cache usage
The 'jmap' tool that comes with the JDK can be used to monitor cache performance in an indirect way by looking at the number of Job and Build objects in each cache.
Find the PID of the Hudson instance and run:
$ jmap -histo:live <pid> | grep 'hudson.model.*Lazy.*Key$'
Here's a sample output:
 num     #instances         #bytes  class name
 523:            28            896  hudson.model.RunMap$LazyRunValue$Key
1200:             3             96  hudson.model.LazyTopLevelItem$Key
These are the keys to the jobs (LazyTopLevelItem$Key) and builds (RunMap$LazyRunValue$Key) in the caches, so counting the keys is a good indicator of the number of items in each cache at any given moment. The sizes in bytes can be ignored; they are just the sizes of the keys, not of the objects they refer to. Those sizes can only be obtained with a profiler. From the output above we can conclude that there are 3 jobs and 28 builds in memory. The 28 builds could all belong to one job or be spread across all 3. Over time on an idle system these entries should be evicted and the caches should empty out. In practice, because of background cron threads and triggers, the job count rarely drops to zero; any access to a job or a build by a cron thread resets its eviction timer.
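To watch eviction behavior over time, the same command can be run periodically, for example once a minute. This is a sketch; replace <pid> with the actual process id, and keep in mind that the :live option forces a full GC on the target JVM, so avoid running it too aggressively against a busy production server:
while true; do
    date
    jmap -histo:live <pid> | grep 'hudson.model.*Lazy.*Key$'
    sleep 60
done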