C-states and P-states : confounding factors for benchmarking
- by Dave
I was recently looking into a performance issue in the java.util.concurrent (JUC) fork-join pool framework related to particularly long latencies when trying to wake (unpark) threads in the pool. Eventually I tracked the issue down to the power & scaling governor and idle-state policies on x86. Briefly, P-states refer to the set of clock rates (speeds) at which a processor can run. C-states reflect the possible idle states. The deeper the C-state (higher numerical values) the less power the processor will draw, but the longer it takes the processor to respond and exit that sleep state on the next idle to non-idle transition. In some cases the latency can be worse than 100 microseconds. C0 is normal execution state, and P0 is "full speed" with higher Pn values reflecting reduced clock rates. C-states are P-states are orthogonal, although P-states only have meaning at C0. You could also think of the states as occupying a spectrum as follows : P0, P1, P2, Pn, C1, C2, ... Cn, where all the P-states are at C0. Our fork-join framework was calling unpark() to wake a thread from the pool, and that thread was being dispatched onto a processor at deep C-state, so we were observing rather impressive latencies between the time of the unpark and the time the thread actually resumed and was able to accept work. (I originally thought we were seeing situations where the wakee was preempting the waker, but that wasn't the case. I'll save that topic for a future blog entry). It's also worth pointing out that higher P-state values draw less power and there's usually some latency in ramping up the clock (P-states) in response to offered load.
The issue of C-states and P-states isn't new and has been described at length elsewhere, but it may be new to Java programmers, adding a new confounding factor to benchmarking methodologies and procedures. To get stable results I'd recommend running at C0 and P0, particularly for server-side applications. As appropriate, disabling "turbo" mode may also be prudent. But it also makes sense to run with the system defaults to understand if your application exhibits any performance sensitivity to power management policies.
The operating system power management sub-system typically control the P-state and C-states based on current and recent load. The scaling governor manages P-states. Operating systems often use adaptive policies that try to avoid deep C-states for some period if recent deep idle episodes proved to be very short and futile. This helps make the system more responsive under bursty or otherwise irregular load. But it also means the system is stateful and exhibits a memory effect, which can further complicate benchmarking. Forcing C0 + P0 should avoid this issue.