Java 7 introduced support for parallel classloading. A description of that project and its goals can be found here:
http://openjdk.java.net/groups/core-libs/ClassLoaderProposal.html
The solution for parallel classloading was to add to each class loader a ConcurrentHashMap, referenced through a new field, parallelLockMap. This contains a mapping from class names to Objects to use as a classloading lock for that class name. This was then used in the following way:
    protected Class loadClass(String name, boolean resolve)
        throws ClassNotFoundException
    {
        synchronized (getClassLoadingLock(name)) {
            // First, check if the class has already been loaded
            Class c = findLoadedClass(name);
            if (c == null) {
                long t0 = System.nanoTime();
                try {
                    if (parent != null) {
                        c = parent.loadClass(name, false);
                    } else {
                        c = findBootstrapClassOrNull(name);
                    }
                } catch (ClassNotFoundException e) {
                    // ClassNotFoundException thrown if class not found
                    // from the non-null parent class loader
                }
                if (c == null) {
                    // If still not found, then invoke findClass in order
                    // to find the class.
                    long t1 = System.nanoTime();
                    c = findClass(name);
                    // this is the defining class loader; record the stats
                    sun.misc.PerfCounter.getParentDelegationTime().addTime(t1 - t0);
                    sun.misc.PerfCounter.getFindClassTime().addElapsedTimeFrom(t1);
                    sun.misc.PerfCounter.getFindClasses().increment();
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
Where getClassLoadingLock simply does:
    protected Object getClassLoadingLock(String className) {
        Object lock = this;
        if (parallelLockMap != null) {
            Object newLock = new Object();
            lock = parallelLockMap.putIfAbsent(className, newLock);
            if (lock == null) {
                lock = newLock;
            }
        }
        return lock;
    }
This approach is very inefficient in terms of the space used per map and the number of maps. First, there is a map per classloader. As per the code above, under normal delegation the current classloader creates and acquires a lock for the given class, checks if it is already loaded, then asks its parent to load it; the parent in turn creates another lock in its own map, checks if the class is already loaded and then delegates to its parent, and so on until the boot loader is invoked, for which there is no map and no lock. So even in the simplest of applications, you will have two maps (in the system and extensions loaders), each holding an entry for every class that has to be loaded transitively from the application's main class. If you knew beforehand which loader would actually load the class, the locking would only need to be performed in that loader. As it stands, the locking is completely unnecessary for all classes loaded by the boot loader.
Secondly, once loading has completed and findLoadedClass will return the class, the lock and the map entry are completely unnecessary. But as it stands, the lock objects and their associated entries are never removed from the map.
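The growth described above can be illustrated with a small model. This is only a sketch: ModelLoader and LockGrowthDemo are hypothetical names, the model is not java.lang.ClassLoader, and every class is assumed to end up being defined by the boot loader.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a delegating loader; it is NOT java.lang.ClassLoader.
class ModelLoader {
    private final ModelLoader parent; // null means "delegate to the boot loader"
    private final ConcurrentHashMap<String, Object> parallelLockMap =
            new ConcurrentHashMap<String, Object>();

    ModelLoader(ModelLoader parent) {
        this.parent = parent;
    }

    // Same idiom as getClassLoadingLock: one lock object per name, never removed.
    private Object lockFor(String name) {
        Object newLock = new Object();
        Object lock = parallelLockMap.putIfAbsent(name, newLock);
        return lock != null ? lock : newLock;
    }

    // Models loadClass for a class that the boot loader ends up defining.
    String load(String name) {
        synchronized (lockFor(name)) {
            return parent != null ? parent.load(name) : "boot:" + name;
        }
    }

    int lockCount() {
        return parallelLockMap.size();
    }
}

class LockGrowthDemo {
    // Loads each name through a two-level chain (app -> ext -> boot) and
    // returns { entries in the app map, entries in the ext map }.
    static int[] run(String... names) {
        ModelLoader ext = new ModelLoader(null);
        ModelLoader app = new ModelLoader(ext);
        for (String n : names) {
            app.load(n);
        }
        return new int[] { app.lockCount(), ext.lockCount() };
    }
}
```

In this model, loading three boot classes through the chain leaves three entries in each of the two maps, even though neither modelled loader defined anything.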
It is worth understanding exactly what the locking is intended to achieve, as this will help us understand potential remedies to the above inefficiencies. Given this is the support for parallel classloading, the class loader itself is unlikely to need to guard against concurrent load attempts - and if that were not the case it is likely that the classloader would need a different means to protect itself rather than a lock per class. Ultimately when a class file is located and the class has to be loaded, defineClass is called which calls into the VM - the VM does not require any locking at the Java level and uses its own mutexes for guarding its internal data structures (such as the system dictionary).
The classloader locking is primarily needed to address the following situation: if two threads attempt to load the same class, one will initiate the request through the appropriate loader and eventually cause defineClass to be invoked. Meanwhile the second attempt will block trying to acquire the lock. Once the class is loaded the first thread will release the lock, allowing the second to acquire it. The second thread then sees that the class has now been loaded and will return that class. Neither thread can tell which did the loading and they both continue successfully. Consider what would happen if no lock were acquired in the classloader. Both threads will eventually locate the file for the class, read in the bytecodes and call defineClass to actually load the class. In this case the first to call defineClass will succeed, while the second will encounter an exception due to an attempted redefinition of an existing class. It is solely for this error condition that the lock has to be used. (Note that parallel capable classloaders should not need to be doing old deadlock-avoidance tricks like doing a wait() on the lock object!).
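The failure mode can be replayed sequentially in a single thread. The following is a minimal sketch, not taken from the JDK sources: DuplicateDefineDemo, DefiningLoader and RaceVictim are hypothetical names, and the helper writes the class file for an empty class by hand so that the example needs no files on disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DuplicateDefineDemo {
    // Builds the class file of an empty public class with no members.
    static byte[] emptyClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                           // magic
            out.writeShort(0);                                  // minor version
            out.writeShort(51);                                 // major version (Java 7)
            out.writeShort(5);                                  // constant pool count (entries + 1)
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF(internalName);       // #2 Utf8 (u2 length + bytes)
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces
            out.writeShort(0);                                  // fields
            out.writeShort(0);                                  // methods
            out.writeShort(0);                                  // attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
    }

    // Exposes the protected defineClass so both "threads" can call it.
    static class DefiningLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    // Returns true if the second definition is rejected with a LinkageError.
    static boolean secondDefineFails() {
        byte[] bytes = emptyClassBytes("RaceVictim");
        DefiningLoader loader = new DefiningLoader();
        loader.define("RaceVictim", bytes);     // the "first thread" succeeds
        try {
            loader.define("RaceVictim", bytes); // the "second thread" arrives late
            return false;
        } catch (LinkageError expected) {
            return true;
        }
    }
}
```

On HotSpot the second call is expected to fail with a LinkageError reporting a duplicate class definition, which is the exception the per-name lock exists to prevent.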
There are a number of obvious things we can try to solve this problem, and they basically take three forms:
1. Remove the need for locking. This might be achieved by having a new version of defineClass which acts like defineClassIfNotPresent - simply returning an existing Class rather than triggering an exception.
2. Increase the coarseness of locking to reduce the number of lock objects and/or maps. For example, using a single shared lockMap instead of a per-loader lockMap.
3. Reduce the lifetime of lock objects so that entries are removed from the map when no longer needed (e.g. remove after loading, or use weak references to the lock objects and clean up the map periodically).
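As one possible reading of the second form, the sketch below uses a fixed table of lock objects shared by all loaders, so the number of locks no longer grows with the number of classes. StripedLoadingLocks is a hypothetical name and this is not something the proposal itself implements; it only illustrates what coarser locking could look like.

```java
// Hypothetical fixed-size lock table that could be shared by all loaders.
class StripedLoadingLocks {
    private final Object[] stripes;

    StripedLoadingLocks(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new Object[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new Object();
        }
    }

    // Maps a class name onto one of the shared locks.
    Object lockFor(String className) {
        // Mask off the sign bit so the index is never negative.
        int index = (className.hashCode() & 0x7fffffff) % stripes.length;
        return stripes[index];
    }
}
```

The trade-off is that unrelated class names can share a lock, which costs some parallelism. Because loading one class can trigger loading another while a lock is still held, lock ordering between stripes would also need careful analysis before anything like this could be used.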
There are pros and cons to each of these approaches. Unfortunately a significant "con" is that the API introduced in Java 7 to support parallel classloading has essentially mandated that these locks do in fact exist, and they are accessible to the application code (indirectly through the classloader if it exposes them - which a custom loader might do - and regardless they are accessible to custom classloaders). So while we can reason that we could do parallel classloading with no locking, we cannot implement this without breaking the specification for parallel classloading that was put in place for Java 7. Similarly we might reason that we can remove a mapping (and the lock object) because the class is already loaded, but this would again violate the specification because it can be reasoned that the following assertion should hold true:
    Object lock1 = loader.getClassLoadingLock(name);
    loader.loadClass(name);
    Object lock2 = loader.getClassLoadingLock(name);
    assert lock1 == lock2;
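Since getClassLoadingLock is protected, application code can only run this check through a loader that exposes it. A minimal sketch of such a loader follows; LockIdentityDemo, ExposingLoader and lockFor are hypothetical names, while registerAsParallelCapable and getClassLoadingLock are the Java 7 ClassLoader methods.

```java
class LockIdentityDemo {
    static class ExposingLoader extends ClassLoader {
        static {
            // Opt in to per-name locks; this has to be called from the
            // loader class's own static initializer.
            ClassLoader.registerAsParallelCapable();
        }

        ExposingLoader(ClassLoader parent) {
            super(parent);
        }

        // Widens the visibility of the protected method for demonstration only.
        Object lockFor(String name) {
            return getClassLoadingLock(name);
        }
    }

    // Returns true if the same dedicated lock is seen before and after loading.
    static boolean lockIsStable(String name) {
        ExposingLoader loader = new ExposingLoader(ClassLoader.getSystemClassLoader());
        Object lock1 = loader.lockFor(name);
        try {
            loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        Object lock2 = loader.lockFor(name);
        // lock1 != loader confirms a per-name lock was created rather than
        // falling back to the loader object itself.
        return lock1 == lock2 && lock1 != loader;
    }
}
```

Calling lockIsStable("java.lang.String") also shows the earlier point about wasted entries: the loader creates and retains a lock for a class that only the boot loader can define.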
Without modifying the specification, or at least doing some creative wordsmithing on it, options 1 and 3 are precluded. Even then there are caveats, for example if findLoadedClass is not atomic with respect to defineClass, then you can have concurrent calls to findLoadedClass from different threads and that could be expensive (this is also an argument against moving findLoadedClass outside the locked region - it may speed up the common case where the class is already loaded, but the cost of re-executing after acquiring the lock could be prohibitive). Even option 2 might need some wordsmithing on the specification because the specification for getClassLoadingLock states "returns a dedicated object associated with the specified class name".
The question is, what does "dedicated" mean here? Does it mean unique in the sense that the returned object is only associated with the given class in the current loader? Or can the object actually guard loading of multiple classes, possibly across different class loaders?
So it seems that changing the specification will be inevitable if we wish to do something here. In which case let's go for something that more cleanly defines what we want to be doing: fully concurrent class-loading.
Note: defineClassIfNotPresent is already implemented in the VM as find_or_define_class. It is only used if the AllowParallelDefineClass flag is set. This gives us an easy hook into existing VM mechanics.
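For comparison, the same "return the existing class" behaviour can be approximated at the Java level by catching the duplicate-definition error and looking the class up again. This is a sketch under stated assumptions, not the VM mechanism: DefineIfNotPresentDemo, TolerantLoader, defineIfNotPresent and SharedTarget are hypothetical names, and the class bytes are generated by hand so the example is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DefineIfNotPresentDemo {
    static class TolerantLoader extends ClassLoader {
        static {
            ClassLoader.registerAsParallelCapable();
        }

        // Hypothetical helper, not a JDK method: if another thread defined the
        // class first, return that class instead of propagating the error.
        Class<?> defineIfNotPresent(String name, byte[] b) {
            try {
                return defineClass(name, b, 0, b.length);
            } catch (LinkageError e) {
                Class<?> existing = findLoadedClass(name);
                if (existing != null) {
                    return existing;
                }
                throw e; // a genuine linkage problem, e.g. a malformed class file
            }
        }
    }

    // Builds the class file of an empty public class with no members.
    static byte[] emptyClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);                                  // minor version
            out.writeShort(51);                                 // major version (Java 7)
            out.writeShort(5);                                  // constant pool count
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF(internalName);       // #2 Utf8
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            for (int i = 0; i < 4; i++) {
                out.writeShort(0); // no interfaces, fields, methods, attributes
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lets several threads define the same class with no Java-level lock and
    // reports whether every thread ended up with the same Class object.
    static boolean allThreadsAgree(int threadCount) {
        byte[] bytes = emptyClassBytes("SharedTarget");
        TolerantLoader loader = new TolerantLoader();
        List<Class<?>> results = Collections.synchronizedList(new ArrayList<Class<?>>());
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> results.add(loader.defineIfNotPresent("SharedTarget", bytes)));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        if (results.size() != threadCount) {
            return false; // some thread failed to obtain a class
        }
        for (Class<?> c : results) {
            if (c != results.get(0)) {
                return false;
            }
        }
        return true;
    }
}
```

Doing this in the VM avoids the cost of raising and catching an exception on every lost race, which is presumably part of the appeal of reusing the existing hook.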
Proposal: Fully Concurrent ClassLoaders
The proposal is that we expand on the notion of a parallel capable class loader and define a "fully concurrent parallel capable class loader" or fully concurrent loader, for short.
A fully concurrent loader uses no synchronization in loadClass and the VM uses the "parallel define class" mechanism.
For a fully concurrent loader getClassLoadingLock() can return null (or perhaps not - it doesn't matter as we won't use the result anyway). At present we have not made any changes to this method.
All the parallel capable JDK classloaders become fully concurrent loaders. This doesn't require any code re-design as none of the mechanisms implemented rely on the per-name locking provided by the parallelLockMap.
This seems to give us a path to remove all locking at the Java level during classloading, while retaining full compatibility with Java 7 parallel capable loaders.
Fully concurrent loaders will still encounter the performance penalty associated with concurrent attempts to find and prepare a class's bytecode for definition by the VM. What this penalty is depends on the number of concurrent load attempts possible (a function of the number of threads and the application logic, and dependent on the number of processors), and the costs associated with finding and preparing the bytecodes. This obviously has to be measured across a range of applications.
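The shape of that penalty can be seen in a toy model that counts how many threads redundantly "prepare" the same class when no lock is taken. This is a sketch only: LockFreeLoaderModel is a hypothetical name, plain Objects stand in for classes, and a ConcurrentHashMap stands in for the VM's define-if-not-present behaviour, so it says nothing about real costs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a lock-free loader; "classes" are plain Objects.
class LockFreeLoaderModel {
    private final ConcurrentHashMap<String, Object> defined =
            new ConcurrentHashMap<String, Object>();
    final AtomicInteger preparations = new AtomicInteger();

    Object load(String name) {
        Object existing = defined.get(name);              // stands in for findLoadedClass
        if (existing != null) {
            return existing;
        }
        preparations.incrementAndGet();                   // stands in for finding and reading bytecode
        Object fresh = new Object();
        Object winner = defined.putIfAbsent(name, fresh); // stands in for define-if-not-present
        return winner != null ? winner : fresh;
    }

    // Releases threadCount threads at once against one class name. Returns the
    // number of preparations performed, or -1 if the threads disagreed.
    static int race(int threadCount, String name) {
        LockFreeLoaderModel loader = new LockFreeLoaderModel();
        Object[] seen = new Object[threadCount];
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen[slot] = loader.load(name);
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        for (Object o : seen) {
            if (o == null || o != seen[0]) {
                return -1;
            }
        }
        return loader.preparations.get();
    }
}
```

The count returned by race varies from run to run between 1 and the number of threads; every preparation beyond the first is work that a per-name lock would have avoided, at the price of blocking the other threads.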
Preliminary webrevs:
http://cr.openjdk.java.net/~dholmes/concurrent-loaders/webrev.hotspot/
http://cr.openjdk.java.net/~dholmes/concurrent-loaders/webrev.jdk/
Please direct all comments to the mailing list [email protected].