Bug 229621 - ClassLoader Deadlock Occurring with IBM JDK.
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Framework
Version: 3.2.1
Hardware: Other Linux
Importance: P3 major
Target Milestone: 3.4 RC1
Assignee: Thomas Watson CLA
Reported: 2008-04-30 09:50 EDT by Raymond Scott CLA
Modified: 2008-05-21 15:39 EDT

simon_kaegi: review+


Attachments
classname lock patch (7.40 KB, patch)
2008-05-05 12:26 EDT, Thomas Watson CLA
updated patch (8.74 KB, patch)
2008-05-09 16:23 EDT, Thomas Watson CLA
3.2 maintenance stream patch (8.39 KB, patch)
2008-05-21 15:39 EDT, Thomas Watson CLA

Description Raymond Scott CLA 2008-04-30 09:50:00 EDT
Build ID: M20060921-0945

Steps To Reproduce:
This bug is difficult to reproduce, but we do experience it on occasion. The details are similar to Bug #227587, except that this one happens under the IBM JDK.  In short, you need:

1. Two threads, each holding a classloader lock on a different bundle.
2. Code running in each thread then tries to access/load a class from the bundle whose classloader is locked by the other thread.
3. The two threads are now deadlocked.
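
The three steps above can be modeled with a minimal Java sketch. It is only an illustration under stated assumptions: each ReentrantLock is a hypothetical stand-in for one bundle classloader's lock, and a timed tryLock replaces the real blocking acquire so that the demo terminates instead of hanging. None of the names below come from Equinox.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: each ReentrantLock stands in for one bundle classloader's lock.
class LoaderLockOrderSketch {

    // Hold our own loader lock, wait until the peer holds its lock too (step 1),
    // then try to take the peer's lock (step 2).
    static boolean crossLoad(ReentrantLock own, ReentrantLock other, CountDownLatch bothHeld) {
        own.lock();
        try {
            bothHeld.countDown();
            bothHeld.await();
            // A real monitor would block forever here (step 3);
            // the timeout only keeps this demo finite.
            if (other.tryLock(200, TimeUnit.MILLISECONDS)) {
                other.unlock();
                return true;
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            own.unlock();
        }
    }

    // Returns true when at least one thread could not obtain the other loader's lock.
    static boolean lockOrderConflictObserved() {
        ReentrantLock loaderA = new ReentrantLock();
        ReentrantLock loaderB = new ReentrantLock();
        CountDownLatch bothHeld = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];
        Thread t1 = new Thread(() -> { acquired[0] = crossLoad(loaderA, loaderB, bothHeld); });
        Thread t2 = new Thread(() -> { acquired[1] = crossLoad(loaderB, loaderA, bothHeld); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !(acquired[0] && acquired[1]);
    }

    public static void main(String[] args) {
        System.out.println("conflict observed: " + lockOrderConflictObserved());
    }
}
```

With plain synchronized blocks in place of the timed tryLock, both threads would wait on each other indefinitely, which is the hang described in this report.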

More information can be found in the aforementioned bug.
This bug was opened at the request of the Eclipse team: because the IBM JDK uses a different locking scheme, they asked that this bug be tracked separately.

More information:
OS Info:
  OS Level         : Linux 2.6.9-22.0.2.ELsmp

JDK Info:
  J2RE 6.0 IBM J9 2.4 Linux amd64-64 build jvmxa6460-20080328_18302

Eclipse Info:
  Equinox 3.2.1 - eclipse.buildId=M20060921-0945
    
Thread Dumps available upon request.
Comment 1 Thomas Watson CLA 2008-05-01 10:09:55 EDT
This is very much related to bug 212262 and bug 121737.

In bug 121737 we attempted to work around this situation by introducing a mode that forces single threaded class loads.  A new global lock was introduced which would force a thread to give up a lock on a class loader and wait to obtain the global single threaded classload lock.

On modern VMs (1.6 and some 1.5 versions) this approach is useless because the VMs use a native lock that locks the classname for the classloader being used.  In addition, some VMs (Sun and JRockit) also obtain the actual ClassLoader lock natively.  This type of lock introduces additional deadlock scenarios (see bug 227587 for a great explanation).

On the IBM VM the native VM still obtains the classname lock for the classloader being used but it does *not* obtain the lock on the ClassLoader object.  One way to solve the deadlock for the IBM VM could be to use the same strategy as the IBM VM for classname locking when the OSGi bundle classloader needs to define the class.  Currently the bundle class loader obtains the class loader lock to make the calls to ClassLoader.findLoadedClass and ClassLoader.defineClass atomic.  This lock is too coarse-grained.  We could instead have the thread obtain a lock on the classname it is finding/defining.  But I fear this would still leave room for deadlock to occur with out-of-order locks from the VM.
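
To make the classname-lock idea concrete, here is a minimal sketch. It is not the Equinox implementation: the class, the per-name lock map, and the findOrDefine method are all hypothetical, and plain map operations stand in for ClassLoader.findLoadedClass and ClassLoader.defineClass. It shows only the granularity change: the find/define pair stays atomic per class name, while loads of different names no longer contend on one loader-wide lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical model of per-classname locking; not taken from Equinox.
class ClassNameLockSketch {
    // One lock object per class name (never evicted in this sketch).
    private final ConcurrentHashMap<String, Object> nameLocks = new ConcurrentHashMap<>();
    // Stands in for the loader's table of already defined classes.
    private final ConcurrentHashMap<String, Object> defined = new ConcurrentHashMap<>();
    final AtomicInteger defineCalls = new AtomicInteger();

    Object findOrDefine(String className, Function<String, Object> definer) {
        Object lock = nameLocks.computeIfAbsent(className, k -> new Object());
        synchronized (lock) { // locks this name only, not the whole loader
            Object existing = defined.get(className); // analogous to findLoadedClass
            if (existing != null) {
                return existing;
            }
            defineCalls.incrementAndGet();
            Object result = definer.apply(className); // analogous to defineClass
            defined.put(className, result);
            return result;
        }
    }

    // Many threads race to load the same name; returns how often "define" ran.
    static int defineCallsUnderContention(int threadCount) {
        ClassNameLockSketch loader = new ClassNameLockSketch();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> loader.findOrDefine("X", name -> new Object()));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return loader.defineCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("define calls: " + defineCallsUnderContention(8));
    }
}
```

The sketch says nothing about locks taken natively by the VM, which is the open concern raised in this comment.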

Imagine two threads.

Thread A - uses ClassLoader.loadClass to load a class X.  In this case the native VM is *not* initiating the load and does *not* lock the classname.  This is very common in Eclipse for loading classes dynamically from Bundles (for example using Bundle.loadClass() in DS or the extension registry to load impl classes).

Thread B - is executing code that needs to use class X.  In this case the native VM is initiating the load and *does* lock the classname.

The following scenario could occur:

Thread B - obtains the X classname lock natively
Thread A - obtains the X classname lock in OSGi Bundle classloader
Thread B - waits to obtain the X classname lock in OSGi Bundle classloader
Thread A - attempts to define class X.  At this point I think a native classname lock is attempted by the native VM before we are allowed to continue (this needs to be confirmed)

At this point we will have a deadlock between Thread A and B.

Comment 2 Thomas Watson CLA 2008-05-05 12:02:51 EDT
(In reply to comment #1)
> The following scenario could occur:
> 
> Thread B - obtains the X classname lock natively
> Thread A - obtains the X classname lock in OSGi Bundle classloader
> Thread B - waits to obtain the X classname lock in OSGi Bundle classloader
> Thread A - attempts to define class X.  At this point I think a native
> classname lock is attempted by the native VM before we are allowed to continue
> (this needs to be confirmed)
> 
> At this point we will have a deadlock between Thread A and B.
> 

This is not a real concern on the current IBM VMs.  I have confirmed that Thread A is allowed to define class X even when another thread B holds the classname lock natively.  I think the classname lock is only used by the native VM to prevent other threads from using the classloader natively to load the class X.

I need to do more investigation on lazy activation scenarios.  We lazy activate bundles outside any OSGi held locks, but the native classname lock could still be held by the VM while we are lazy activating a bundle.  I suspect there could be out of order locks that could cause deadlock for lazy activated bundles.
Comment 3 Thomas Watson CLA 2008-05-05 12:26:36 EDT
Created attachment 98665
classname lock patch

Here is a patch that implements a classname lock at the OSGi level.  The classname lock is disabled by default (classloader locking is used by default).  To enable classname locking you must use the config property osgi.classloader.lock=classname

There are consequences to using a finer-grained lock.  For example, we must now provide a lock when defining packages to prevent multiple threads from defining the same package using the same class loader.
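
A minimal sketch of the package-definition concern follows. It is an assumption-laden illustration, not the patch: PackageLockSketch, ensurePackage, and the package name are hypothetical. Without the loader-wide lock, two threads defining classes with different names in the same package could both see no package and both call definePackage, and the second call would throw IllegalArgumentException. A dedicated lock makes the check-then-define step atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical loader that guards package definition with its own lock.
class PackageLockSketch extends ClassLoader {
    private final Object pkgLock = new Object();

    // getPackage is deprecated on newer JDKs but still available; it is used
    // here so the check-then-define pattern reads the same on older JDKs.
    @SuppressWarnings("deprecation")
    Package ensurePackage(String packageName) {
        synchronized (pkgLock) {
            Package pkg = getPackage(packageName);
            if (pkg == null) {
                // Manifest-derived attributes are omitted (null) in this sketch.
                pkg = definePackage(packageName, null, null, null, null, null, null, null);
            }
            return pkg;
        }
    }

    // Many threads race to define the same package; returns the failure count.
    static int failuresUnderContention(int threadCount) {
        PackageLockSketch loader = new PackageLockSketch();
        AtomicInteger failures = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    loader.ensurePackage("demo.hypothetical.pkg");
                } catch (IllegalArgumentException e) {
                    failures.incrementAndGet(); // package was defined twice
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + failuresUnderContention(8));
    }
}
```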
Comment 4 Thomas Watson CLA 2008-05-05 17:54:36 EDT
(In reply to comment #2)
> I need to do more investigation on lazy activation scenarios.  We lazy activate
> bundles outside any OSGi held locks, but the native classname lock could still
> be held by the VM while we are lazy activating a bundle.  I suspect there could
> be out of order locks that could cause deadlock for lazy activated bundles.
> 

I spent some time coming up with scenarios that could cause deadlock when lazy loading.  So far I have not been able to construct a deadlock case.  It seems that the native classname lock only takes effect up to the point when the class is actually defined.  Once the class is defined, all other threads will see that class and the VM will not block other threads that attempt to load the same classname; instead, it just returns the defined Class without even delegating to the OSGi classloader.  We lazy activate bundles after the class is defined and we are not holding any OSGi level classname locks.  The native classname locks no longer matter at this point because we have already defined the class.

This brings up an interesting timing issue for lazy activated bundles.  Imagine that some class Q is loaded from two different threads, X and Y, by the native VM, and thread X is the first to load and define the class.  Once the class is defined, thread X will proceed to start the bundle.  At this point thread Y may need to access class Q.  It will be able to find the already loaded class Q and continue even though thread X may not be done activating the bundle.
Comment 5 Simon Kaegi CLA 2008-05-09 16:02:49 EDT
Patch looks good...
I have two minor tweaks.

In EclipseClassLoadingHook.processClass we should also synchronize on pkgLock the first time we call manager.getBaseClassLoader().publicGetPackage(packageName). [line 69]

In ClasspathManager.lockClassName we should reset the interrupted state of the thread after catching the InterruptedException. [line 486]

+1 with those two changes.
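
For readers unfamiliar with the second tweak, the idiom can be sketched as follows. This is a generic illustration, not the code in ClasspathManager.lockClassName: the class and its methods are hypothetical. Catching InterruptedException clears the thread's interrupted status, so a lock method that swallows the exception and keeps waiting should restore the status before returning, otherwise callers never learn that an interrupt was requested.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical wait-based lock showing how the interrupted status is restored.
class InterruptRestoreSketch {
    private final Object monitor = new Object();
    private boolean locked;

    void lock() {
        boolean interrupted = false;
        synchronized (monitor) {
            while (locked) {
                try {
                    monitor.wait();
                } catch (InterruptedException e) {
                    interrupted = true; // remember it and keep waiting for the lock
                }
            }
            locked = true;
        }
        if (interrupted) {
            Thread.currentThread().interrupt(); // reset the interrupted state
        }
    }

    void unlock() {
        synchronized (monitor) {
            locked = false;
            monitor.notifyAll();
        }
    }

    // Interrupts a worker around its lock() call and reports whether the
    // worker still sees its interrupted status afterwards.
    static boolean interruptSurvivesLock() {
        InterruptRestoreSketch nameLock = new InterruptRestoreSketch();
        CountDownLatch running = new CountDownLatch(1);
        boolean[] flagSeen = new boolean[1];
        nameLock.lock(); // the calling thread holds the lock first
        Thread worker = new Thread(() -> {
            running.countDown();
            nameLock.lock();
            flagSeen[0] = Thread.currentThread().isInterrupted();
            nameLock.unlock();
        });
        worker.start();
        try {
            running.await();
            worker.interrupt();
            nameLock.unlock();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flagSeen[0];
    }

    public static void main(String[] args) {
        System.out.println("interrupt preserved: " + interruptSurvivesLock());
    }
}
```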
Comment 6 Thomas Watson CLA 2008-05-09 16:23:18 EDT
Created attachment 99562
updated patch

Updated patch with recommended changes.
Comment 7 Thomas Watson CLA 2008-05-09 16:28:51 EDT
Patch released for RC1.  Again, note that there is no change in the locking strategy by default.  You must set osgi.classloader.lock=classname for the new classname locking to take effect.

In the future we may want to consider enabling this by default for the latest VMs.  For example, Java SE 7 could fix the class loader locking in the native VM as part of the work in JSR 277.  If these class loader enhancements become part of the Java SE 7 spec then we should enable class name locking by default for Java SE 7 and greater.
Comment 8 Thomas Watson CLA 2008-05-21 15:39:56 EDT
Created attachment 101356
3.2 maintenance stream patch

Here is a patch against R3_2_maintenance stream.