Java 7 introduced support for parallel classloading. A description of that project and its goals can be found here:
http://openjdk.java.net/groups/core-libs/ClassLoaderProposal.html
The solution for parallel classloading was to add to each class loader a ConcurrentHashMap, referenced through a new field, parallelLockMap. This contains a mapping from class names to Objects to use as a classloading lock for that class name. This was then used in the following way:
protected Class loadClass(String name, boolean resolve)
throws ClassNotFoundException
{
synchronized (getClassLoadingLock(name)) {
// First, check if the class has already been loaded
Class c = findLoadedClass(name);
if (c == null) {
long t0 = System.nanoTime();
try {
if (parent != null) {
c = parent.loadClass(name, false);
} else {
c = findBootstrapClassOrNull(name);
}
} catch (ClassNotFoundException e) {
// ClassNotFoundException thrown if class not found
// from the non-null parent class loader
}
if (c == null) {
// If still not found, then invoke findClass in order
// to find the class.
long t1 = System.nanoTime();
c = findClass(name);
// this is the defining class loader; record the stats
sun.misc.PerfCounter.getParentDelegationTime().addTime(t1 - t0);
sun.misc.PerfCounter.getFindClassTime().addElapsedTimeFrom(t1);
sun.misc.PerfCounter.getFindClasses().increment();
}
}
if (resolve) {
resolveClass(c);
}
return c;
}
}
Where getClassLoadingLock simply does:
protected Object getClassLoadingLock(String className) {
Object lock = this;
if (parallelLockMap != null) {
Object newLock = new Object();
lock = parallelLockMap.putIfAbsent(className, newLock);
if (lock == null) {
lock = newLock;
}
}
return lock;
}
This approach is very inefficient in terms of the space used per map and the number of maps. First, there is a map per-classloader. As per the code above under normal delegation the current classloader creates and acquires a lock for the given class, checks if it is already loaded, then asks its parent to load it; the parent in turn creates another lock in its own map, checks if the class is already loaded and then delegates to its parent and so on till the boot loader is invoked for which there is no map and no lock. So even in the simplest of applications, you will have two maps (in the system and extensions loaders) for every class that has to be loaded transitively from the application's main class. If you knew before hand which loader would actually load the class the locking would only need to be performed in that loader. As it stands the locking is completely unnecessary for all classes loaded by the boot loader.
Secondly, once loading has completed and findClass will return the class, the lock and the map entry is completely unnecessary. But as it stands, the lock objects and their associated entries are never removed from the map.
It is worth understanding exactly what the locking is intended to achieve, as this will help us understand potential remedies to the above inefficiencies. Given this is the support for parallel classloading, the class loader itself is unlikely to need to guard against concurrent load attempts - and if that were not the case it is likely that the classloader would need a different means to protect itself rather than a lock per class. Ultimately when a class file is located and the class has to be loaded, defineClass is called which calls into the VM - the VM does not require any locking at the Java level and uses its own mutexes for guarding its internal data structures (such as the system dictionary). The classloader locking is primarily needed to address the following situation: if two threads attempt to load the same class, one will initiate the request through the appropriate loader and eventually cause defineClass to be invoked. Meanwhile the second attempt will block trying to acquire the lock. Once the class is loaded the first thread will release the lock, allowing the second to acquire it. The second thread then sees that the class has now been loaded and will return that class. Neither thread can tell which did the loading and they both continue successfully. Consider if no lock was acquired in the classloader. Both threads will eventually locate the file for the class, read in the bytecodes and call defineClass to actually load the class. In this case the first to call defineClass will succeed, while the second will encounter an exception due to an attempted redefinition of an existing class. It is solely for this error condition that the lock has to be used. (Note that parallel capable classloaders should not need to be doing old deadlock-avoidance tricks like doing a wait() on the lock object\!).
There are a number of obvious things we can try to solve this problem and they basically take three forms:
Remove the need for locking. This might be achieved by having a new version of defineClass which acts like defineClassIfNotPresent - simply returning an existing Class rather than triggering an exception.
Increase the coarseness of locking to reduce the number of lock objects and/or maps. For example, using a single shared lockMap instead of a per-loader lockMap.
Reduce the lifetime of lock objects so that entries are removed from the map when no longer needed (eg remove after loading, use weak references to the lock objects and cleanup the map periodically).
There are pros and cons to each of these approaches. Unfortunately a significant "con" is that the API introduced in Java 7 to support parallel classloading has essentially mandated that these locks do in fact exist, and they are accessible to the application code (indirectly through the classloader if it exposes them - which a custom loader might do - and regardless they are accessible to custom classloaders). So while we can reason that we could do parallel classloading with no locking, we can not implement this without breaking the specification for parallel classloading that was put in place for Java 7. Similarly we might reason that we can remove a mapping (and the lock object) because the class is already loaded, but this would again violate the specification because it can be reasoned that the following assertion should hold true:
Object lock1 = loader.getClassLoadingLock(name);
loader.loadClass(name);
Object lock2 = loader.getClassLoadingLock(name);
assert lock1 == lock2;
Without modifying the specification, or at least doing some creative wordsmithing on it, options 1 and 3 are precluded. Even then there are caveats, for example if findLoadedClass is not atomic with respect to defineClass, then you can have concurrent calls to findLoadedClass from different threads and that could be expensive (this is also an argument against moving findLoadedClass outside the locked region - it may speed up the common case where the class is already loaded, but the cost of re-executing after acquiring the lock could be prohibitive. Even option 2 might need some wordsmithing on the specification because the specification for getClassLoadingLock states "returns a dedicated object associated with the specified class name". The question is, what does "dedicated" mean here? Does it mean unique in the sense that the returned object is only associated with the given class in the current loader? Or can the object actually guard loading of multiple classes, possibly across different class loaders?
So it seems that changing the specification will be inevitable if we wish to do something here. In which case lets go for something that more cleanly defines what we want to be doing: fully concurrent class-loading.
Note: defineClassIfNotPresent is already implemented in the VM as find_or_define_class. It is only used if the AllowParallelDefineClass flag is set. This gives us an easy hook into existing VM mechanics.
Proposal: Fully Concurrent ClassLoaders
The proposal is that we expand on the notion of a parallel capable class loader and define a "fully concurrent parallel capable class loader" or fully concurrent loader, for short.
A fully concurrent loader uses no synchronization in loadClass and the VM uses the "parallel define class" mechanism.
For a fully concurrent loader getClassLoadingLock() can return null (or perhaps not - it doesn't matter as we won't use the result anyway). At present we have not made any changes to this method.
All the parallel capable JDK classloaders become fully concurrent loaders. This doesn't require any code re-design as none of the mechanisms implemented rely on the per-name locking provided by the parallelLockMap.
This seems to give us a path to remove all locking at the Java level during classloading, while retaining full compatibility with Java 7 parallel capable loaders.
Fully concurrent loaders will still encounter the performance penalty associated with concurrent attempts to find and prepare a class's bytecode for definition by the VM. What this penalty is depends on the number of concurrent load attempts possible (a function of the number of threads and the application logic, and dependent on the number of processors), and the costs associated with finding and preparing the bytecodes. This obviously has to be measured across a range of applications.
Preliminary webrevs:
http://cr.openjdk.java.net/~dholmes/concurrent-loaders/webrev.hotspot/
http://cr.openjdk.java.net/~dholmes/concurrent-loaders/webrev.jdk/
Please direct all comments to the mailing list
[email protected].