The Unspoken - The Why of GC Ergonomics
- by jonthecollector
Do you use GC ergonomics, -XX:+UseAdaptiveSizePolicy,
with the UseParallelGC collector? The gist of GC ergonomics
for that collector is
that it tries to grow or shrink the heap to meet a specified goal.
The goals that you can choose are maximum pause time and/or throughput.
Don't get too excited there. I'm speaking about UseParallelGC (the
throughput collector) so there are definite limits to what pause
goals can be achieved. When you say out loud "I don't care about pause times, give me the best throughput I can get" and then say to yourself "Well, maybe 10 seconds really is too long", then think about a pause time goal. By default there is no pause time goal and
the throughput goal is high (98% of the time doing application work
and 2% of the time doing GC work). You can get more details on
this in my very first blog, "GC ergonomics".
The UseG1GC collector has its own version of GC ergonomics, but I'll be talking
only about the UseParallelGC version.
If you use this option and want to know what it (GC ergonomics)
is thinking, try
-XX:AdaptiveSizePolicyOutputInterval=1
This will print out information every i-th GC (here i is 1)
about what GC ergonomics is trying to do. For example,
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
GC overhead (%)
Young generation: 16.10 (attempted to grow)
Tenured generation: 4.67 (attempted to grow)
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
GC ergonomics tries to meet (in order)
Pause time goal
Throughput goal
Minimum footprint
The first line says that it's trying to meet the throughput
goal.
UseAdaptiveSizePolicy actions to meet *** throughput goal ***
This run has the default pause time goal (i.e., no pause time
goal) so it is trying to reach a 98% throughput.
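The ordering of the goals can be pictured with a small sketch. This is not the HotSpot implementation; the class, method, and parameter names are made up for illustration, and it assumes the total GC overhead is the sum of the young and tenured percentages in the output.

```java
// Simplified illustration of the goal ordering; not HotSpot's actual code.
final class ErgonomicsGoalSketch {
    enum Goal { PAUSE_TIME, THROUGHPUT, FOOTPRINT }

    /**
     * Picks the goal to work on, checking them in priority order.
     * All parameters are hypothetical inputs for this sketch.
     *
     * @param worstPauseMillis      longest recent GC pause
     * @param pauseGoalMillis       pause goal, or Double.POSITIVE_INFINITY if none is set
     * @param gcOverheadPercent     percent of time recently spent in GC
     * @param throughputGoalPercent percent of time that should go to the application
     */
    static Goal goalToPursue(double worstPauseMillis, double pauseGoalMillis,
                             double gcOverheadPercent, double throughputGoalPercent) {
        if (worstPauseMillis > pauseGoalMillis) {
            return Goal.PAUSE_TIME;   // pauses too long: fix that first
        }
        if (100.0 - gcOverheadPercent < throughputGoalPercent) {
            return Goal.THROUGHPUT;   // too much time in GC: fix that next
        }
        return Goal.FOOTPRINT;        // both goals met: keep the heap small
    }

    public static void main(String[] args) {
        // No pause goal, overhead of 16.10% + 4.67%, throughput goal of 98%.
        Goal goal = goalToPursue(80.0, Double.POSITIVE_INFINITY, 16.10 + 4.67, 98.0);
        System.out.println(goal);
    }
}
```

With no pause goal set, the pause check can never fail, so the sketch falls through to the throughput check, which matches the output above.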
The lines
Young generation: 16.10 (attempted to grow)
Tenured generation: 4.67 (attempted to grow)
say that we're currently spending about 16% of the time doing
young GC's and about 5% of the time doing full
GC's. These percentages are a decaying, weighted average (earlier
contributions to the average are given less weight). The source
code is available as part of the OpenJDK so you can take a look
at it if you want the exact definition. GC ergonomics is trying
to increase the throughput by growing the heap (so says the
"attempted to grow").
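To give a feel for what a decaying, weighted average does, here is a minimal sketch. It is an illustration only: the class name and the weight of 0.5 are assumptions, and the exact weighting used by GC ergonomics is defined in the OpenJDK source, not here.

```java
// Illustrative decaying average; the real weighting lives in the OpenJDK source.
final class DecayingAverageSketch {
    private final double weight;   // share given to the newest sample, in (0, 1]
    private double average;
    private boolean hasSample;

    DecayingAverageSketch(double weight) {
        if (!(weight > 0.0 && weight <= 1.0)) {
            throw new IllegalArgumentException("weight must be in (0, 1]");
        }
        this.weight = weight;
    }

    /** Folds in a new sample; every earlier sample loses some of its weight. */
    double add(double sample) {
        average = hasSample ? weight * sample + (1.0 - weight) * average : sample;
        hasSample = true;
        return average;
    }

    double average() {
        return average;
    }

    public static void main(String[] args) {
        DecayingAverageSketch gcOverhead = new DecayingAverageSketch(0.5); // assumed weight
        for (double percent : new double[] {10.0, 20.0, 16.0}) {           // made-up samples
            System.out.println(gcOverhead.add(percent));
        }
    }
}
```

Each new sample pulls the average toward itself, and the influence of an old sample shrinks by the same factor on every update.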
The last line
Tenuring threshold: (attempted to decrease to balance GC costs) = 1
says that the ergonomics is trying to balance the GC times between
young GC's and full GC's by decreasing the tenuring threshold.
During a young collection the younger objects are copied to the
survivor spaces while the older objects are copied to the
tenured generation. Younger and older are defined by the
tenuring threshold. If the tenuring threshold is 4, an
object that has survived fewer than 4 young collections (and
has remained in the young generation
by being copied to the part of the young generation called a survivor space)
is younger and is copied again to a survivor space. If it has
survived 4 or more young collections, it is older and gets copied to the
tenured generation. A lower tenuring threshold moves objects more
eagerly to the tenured generation and, conversely, a higher
tenuring threshold keeps copying objects between survivor spaces
longer. The tenuring threshold varies dynamically with the
UseParallelGC collector. That is different than our other
collectors which have a static tenuring threshold. GC ergonomics
tries to balance the amount of work done by the young GC's and
the full GC's by varying the tenuring threshold. Want more work
done in the young GC's? Keep objects longer in the survivor
spaces by increasing the tenuring threshold.
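The younger/older rule can be written down as a tiny sketch. The names are hypothetical and the sketch is deliberately simplified: it looks only at the age rule described here and ignores any other reason an object might be promoted, such as a survivor space that is too full.

```java
// Simplified view of where a young collection copies a surviving object.
final class TenuringSketch {
    enum Destination { SURVIVOR_SPACE, TENURED_GENERATION }

    /**
     * @param collectionsSurvived number of young collections the object has survived
     * @param tenuringThreshold   current tenuring threshold
     */
    static Destination destinationFor(int collectionsSurvived, int tenuringThreshold) {
        return collectionsSurvived < tenuringThreshold
                ? Destination.SURVIVOR_SPACE        // still "younger"
                : Destination.TENURED_GENERATION;   // "older": promote it
    }

    public static void main(String[] args) {
        int threshold = 4;
        for (int age = 0; age <= 5; age++) {
            System.out.println("survived " + age + " -> " + destinationFor(age, threshold));
        }
    }
}
```

Lowering the threshold in this sketch from 4 to 1 sends every object that has survived a single young collection to the tenured generation, which is the shift of work from young GC's to full GC's that the output line describes.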
This is an example of the output when GC ergonomics is trying to
achieve a pause time goal:
UseAdaptiveSizePolicy actions to meet *** pause time goal ***
GC overhead (%)
Young generation: 20.74 (no change)
Tenured generation: 31.70 (attempted to shrink)
The pause goal was set at 50 millisecs and the last GC was
0.415: [Full GC (Ergonomics) [PSYoungGen: 2048K->0K(26624K)] [ParOldGen: 26095K->9711K(28992K)] 28143K->9711K(55616K), [Metaspace: 1719K->1719K(2473K/6528K)], 0.0758940 secs] [Times: user=0.28 sys=0.00, real=0.08 secs]
The full collection took about 76 millisecs so GC ergonomics wants to shrink
the tenured generation to reduce that pause time.
The previous young GC was
0.346: [GC (Allocation Failure) [PSYoungGen: 26624K->2048K(26624K)] 40547K->22223K(56768K), 0.0136501 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
The pause time there was about 14 millisecs, which is under the
50 millisec goal, so no changes are needed for the young generation.
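If you want to check pauses against a goal yourself, a rough sketch like the following pulls the pause out of a log line and compares it to the goal. The class and method names are made up, and the regular expression simply assumes the pause is the first "<seconds> secs]" on the line, as it is in the two log lines above.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough helper for eyeballing GC log lines against a pause goal.
final class PauseGoalCheckSketch {
    // Assumes the pause is the first "<seconds> secs]" that appears on the line.
    private static final Pattern PAUSE = Pattern.compile("(\\d+\\.\\d+) secs\\]");

    /** Returns the pause in milliseconds, or empty if the line has no pause. */
    static OptionalDouble pauseMillis(String logLine) {
        Matcher m = PAUSE.matcher(logLine);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group(1)) * 1000.0)
                : OptionalDouble.empty();
    }

    static boolean exceedsGoal(String logLine, double goalMillis) {
        OptionalDouble pause = pauseMillis(logLine);
        return pause.isPresent() && pause.getAsDouble() > goalMillis;
    }

    public static void main(String[] args) {
        double goalMillis = 50.0;
        // Tails of the full GC and young GC log lines shown above.
        String fullGc = ", 0.0758940 secs] [Times: user=0.28 sys=0.00, real=0.08 secs]";
        String youngGc = ", 0.0136501 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]";
        System.out.println("full GC over goal:  " + exceedsGoal(fullGc, goalMillis));
        System.out.println("young GC over goal: " + exceedsGoal(youngGc, goalMillis));
    }
}
```

Only the full collection is over the 50 millisec goal, which is why only the tenured generation shows "attempted to shrink".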
If trying to meet a pause time goal, the generations are typically
shrunk. With a pause time goal in play, watch the GC
overhead numbers and you will usually see the cost of setting
a pause time goal (i.e., throughput goes down). If the pause goal
is too low, you won't achieve your pause time goal and you will
spend all your time doing GC.
GC ergonomics is meant to be simple because it is meant to be
used by anyone. It was not meant to be mysterious and so
this output was added. If you don't like what GC ergonomics is
doing, you can turn it off with -XX:-UseAdaptiveSizePolicy, but
be forewarned that you have to manage the size of the generations
explicitly. If UseAdaptiveSizePolicy is turned off, the heap
does not grow. The size of the heap (and the generations) at
the start of execution is always the size of the heap. I don't
like that and tried to fix it once (with some help from an
OpenJDK contributor) but it unfortunately never made it out
the door. I still have hope though.
Just a side note. With the default throughput goal of 98% the heap
often grows to its maximum value and stays there. Definitely reduce
the throughput goal if footprint is important. Start with -XX:GCTimeRatio=4
for a more modest throughput goal (20% of the time spent in GC). A higher
value means a smaller fraction of time in GC (i.e., a more demanding throughput goal).
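The arithmetic behind that 20% is that a GCTimeRatio of N asks for a GC share of 1/(1+N). Here is a small sketch of the calculation; the class and method names are made up, and the ratios other than 4 are just sample values.

```java
import java.util.Locale;

// Converts a -XX:GCTimeRatio value into the fraction of time allowed for GC.
final class GcTimeRatioSketch {
    /** GC time goal as a fraction of total time: 1 / (1 + ratio). */
    static double gcTimeFraction(int gcTimeRatio) {
        if (gcTimeRatio < 0) {
            throw new IllegalArgumentException("ratio must not be negative");
        }
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int ratio : new int[] {4, 9, 19}) {   // sample values only
            System.out.println(String.format(Locale.ROOT,
                    "-XX:GCTimeRatio=%d allows %.1f%% of the time in GC",
                    ratio, 100.0 * gcTimeFraction(ratio)));
        }
    }
}
```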