Build ID: I20070625-1500

Steps To Reproduce:
1. Have a sufficiently complex application.
2. Run it many times (this happens approximately 1 in 50 runs).
2a. Use Sun JDK 1.5.0_12 or JRockit 1.5.0_12.
3. Wait until it gets stuck.

More information:
I will attach a file that contains the stack dumps when this happens. In the Sun case it seems to get stuck in "defineClass", while in the JRockit case it seems to get stuck elsewhere. When this happens we see several threads stuck waiting for the classloader.

Other observations:
1. We have not been able to reproduce this problem when we set osgi.classloader.singleThreadLoads=false (we went up to 3000 four-minute runs).
2. The application we have uses Spring, ActiveMQ and our own declarative system for wiring things. It is fairly complex. We have *not* seen this with less complex suites.
3. This is *not* the PermGen bug described here: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6320642. We have run this test with the PermGen space jacked high (256m) and the problem still occurs. Furthermore, one of the characteristics of that failure is that the JDK spins, and we are seeing no CPU load at all when this happens.
4. There is no Java deadlock. No Java lock cycles are causing this. Instead the JRE seems to be stuck in native code! We are continuing to investigate with our own JRockit team (they are getting into the debugging of the native code), but I figured there might be people here who have seen this or know what we can do to fix it!
Created attachment 84728 [details] Sun JDK 1.5.0_12 stack dumps on failure
Created attachment 84729 [details] Jrockit 1.5.0_12 with stack traces when failure happens
BTW, we *know* this happens on Sun and BEA JVMs. We have not tried other VMs so we don't know whether or not this happens elsewhere. Like... for example with the IBM JVM ;-)
I don't see this in the dumps, but in your system do you have other classloaders that sit on top and/or do not participate in the global lock? This is likely a red herring, but I thought the Spring-OSGi stuff sometimes used its own classloaders for proxy classes. If that is true, maybe that classloader is locked and the guts of the VM are unable to lock it while defining a class in another thread. Again, the dumps do not seem to indicate this, but it was a thought I had while reading through them.

My other guess is that the VM does not like us releasing the lock that the native VM established. I know there used to be Sun VM bugs around this, but I thought they had gotten fixed in the latest 1.5 VMs after 1.5.0_08.

(In reply to comment #3)
> BTW, we *know* this happens on Sun and BEA JVMs. We have not tried other VMs
> so we don't know whether or not this happens elsewhere. Like... for example
> with the IBM JVM ;-)

The IBM VM is quite different with respect to locking the classloader from the native VM while loading a class (it does not lock the classloader). Without testing the complex environment we cannot say for certain, but if the native VM lock is causing the issue then it should not be an issue on the IBM VM.
We have been "spinning" the test suite that causes the problem with the IBM VM and have not seen the problem in approximately 4000 runs. So I think you are correct that this bug does *not* show up in the IBM VM. We are continuing to spin it (probably all through the week) just to be "sure".
See bug 227587 for a detailed description of why this deadlock occurs.
*** Bug 227587 has been marked as a duplicate of this bug. ***
Note that bug 121737 introduced the osgi.classloader.singleThreadLoads option. Please see that bug for details on what that option does and how it was supposed to solve the deadlock issues. It appears the option is pretty much useless with the classloader-to-classname locking strategies of some of the latest VMs.
From the JR team:

I have looked through the source code attached to this CR and found two circular dependencies. These could lead to deadlocks under some conditions. It is far from certain that this is your problem, but they are worth looking into:

STATE_OBJECT_FACTORY and STATE_OBJECT_FACTORY_IMPL:
==============
eclipse/osgi/service/resolver/StateObjectFactory.java:

    public interface StateObjectFactory {
        ...
        public static final StateObjectFactory defaultFactory = new StateObjectFactoryImpl();

==============
org/eclipse/osgi/internal/resolver/StateObjectFactoryImpl.java:

    public class StateObjectFactoryImpl implements StateObjectFactory {
        ...

----------------------------------------------------------

In the StateObjectFactory interface, we initialize a static field with a call to the constructor of StateObjectFactoryImpl, and StateObjectFactoryImpl in turn extends from StateObjectFactory. This means that if two threads, A and B, try to initialize the two classes at exactly the same time, we could get a deadlock (classes are initialized the first time a static field in them is read or modified, or the first time a method in them is called):

* A starts to initialize class StateObjectFactory.
* B starts to initialize class StateObjectFactoryImpl.
* A comes to 'new StateObjectFactoryImpl()'. To call this constructor, class StateObjectFactoryImpl must be initialized, but another thread (B) is currently initializing it.
* A will therefore wait until B has initialized class StateObjectFactoryImpl.
* B comes to 'implements StateObjectFactory'. Before a class is initialized, its superclass must be initialized, but another thread (A) is currently initializing it.
* B will therefore wait until A has initialized class StateObjectFactory.

We have a deadlock!

We have a similar case, but that might not be a problem:

CONDITION and BOOLEAN_CONDITION:
==============
org/osgi/service/condpermadmin/Condition.java

    public interface Condition {
        ...
        public final static Condition TRUE = new BooleanCondition(true);
        public final static Condition FALSE = new BooleanCondition(false);
        ...
    }

    final class BooleanCondition implements Condition {
        ...

----------------------------------------------------------

(This one might not be a problem, since BooleanCondition is not a public class and can only be created from inside Condition.)

INVESTIGATING FURTHER:

The most interesting dependency is the first one:

    StateObjectFactory => StateObjectFactoryImpl => StateObjectFactory

The best would be to get rid of this dependency, for example by not setting the variable in the class initializer, but instead having an initializer method somewhere. One could have an initializer method that is called explicitly (not through a static block or something) before the class is used, which sets:

    StateObjectFactory.defaultFactory = new StateObjectFactoryImpl();

Note that just having a static variable does not create a dependency, so:

    public class StateObjectFactory {
        ...
        public static StateObjectFactory defaultFactory;

is OK, since we don't try to instantiate any StateObjectFactoryImpl objects.

ADDING PAUSES:

Another interesting thing to try could be to add pauses in the class loading. In the classes above, add this to the absolute top of the classes:

    static {
        try {
            Thread.sleep(1000);
        } catch (Exception e) {
        }
    }

This will make class initialization wait for a second before continuing. If the deadlock depends on any of these classes, this should make the 'bad timing' happen more often. If thread A tries to initialize StateObjectFactory, then we have a full second for another thread to initialize StateObjectFactoryImpl and hit the deadlock. This could be an interesting way to see if we can reproduce the issue more often.

Another idea on how to remove the circular dependency:

    public interface StateObjectFactory {
        ...
        private static final StateObjectFactory defaultFactory = null;

        public static StateObjectFactory getFactory() {
            if (defaultFactory == null) {
                defaultFactory = new StateObjectFactoryImpl();
            }
            return defaultFactory;
        }

That removes the dependency. Now you can initialize StateObjectFactory without initializing StateObjectFactoryImpl. Besides, I think it looks nicer as well =)
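A note on the lazy-getter sketch above: it will not compile as written, because fields in a Java interface are implicitly public static final and cannot be reassigned (and static methods in interfaces only exist since Java 8). The cycle the JR team describes is usually broken with the initialization-on-demand holder idiom instead. A minimal, self-contained sketch — Factory and FactoryImpl are simplified stand-ins, not the actual Equinox types:

```java
// Simplified stand-in for StateObjectFactory. The interface itself has no
// static state referring to the implementation, so initializing Factory
// never triggers initialization of FactoryImpl.
interface Factory {

    // Holder is only initialized on the first access to DEFAULT, i.e. the
    // first call to getDefault(), long after Factory itself is initialized.
    final class Holder {
        static final Factory DEFAULT = new FactoryImpl();
    }

    // Static interface methods require Java 8+.
    static Factory getDefault() {
        return Holder.DEFAULT;
    }
}

// Simplified stand-in for StateObjectFactoryImpl. Initializing this class
// no longer races with a Factory static initializer, because Factory has none.
class FactoryImpl implements Factory {
}

public class HolderDemo {
    public static void main(String[] args) {
        Factory f1 = Factory.getDefault();
        Factory f2 = Factory.getDefault();
        System.out.println(f1 == f2);                  // single lazily created instance
        System.out.println(f1 instanceof FactoryImpl);
    }
}
```

Unlike the null-check getter, the holder idiom is also thread-safe for free: the JVM guarantees that Holder's static initializer runs exactly once.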
(In reply to comment #9)
Andy, thanks again for the detailed information. I doubt the circular dependency here is causing any deadlock, though, because the StateObjectFactory class is loaded very early in the launch of the framework and should be initialized before you get to running any code from bundles in the framework. But I will try some of the suggestions you mentioned to force a deadlock, to see if this is a general issue for other classes with a similar pattern.
I, too, am having problems with osgi.classloader.singleThreadLoads=true. I had a reproducible situation where I couldn't even see any tricky circularity. Basically, I had a deadlock between two threads trying to use a ProgressManager, where ProgressManager$1.updateFor was trying to load interface IJobChangeEvent, both threads locking on the same classloader. Apparently, the loader.wait() statement in BundleLoader.lock() did not sufficiently release the lock.

It looked scary in the debugger, since a perfectly simple method call could not be stepped into; the debugger deterministically lost the connection to its debug target at this statement (reflecting the observation that the hang occurs inside the native JVM code). I read that you already understood the cause, which is good.

So far our tool *required* singleThreadLoads=true to run fairly smoothly. Given that this strategy is dead, too, does anybody have any suggestions for what I should tell our users for now? I see these options:

* Whenever a deadlock occurs, kill eclipse and *toggle* the singleThreadLoads flag, as it seems to choose between two potential deadlocks, which *usually* don't both occur together.
* Strictly avoid using a Sun VM (which seems to be the only JVM most people have installed)? But which ones don't natively lock the classloader and are known to work well for Eclipse? IBM? Others?

Is there anything else that can be done?
(In reply to comment #11)
> Is there anything else that can be done?

Not officially, until Java SE 7 comes out, which should allow us to develop a graph of delegating class loaders without deadlocking, as part of the modularity work of JSR 277. In the meantime the only thing I can think of you trying is the Sun VM options described in bug 121737 comment 8. I know of some products using these options successfully. You can also go add your vote to the very popular Sun bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4670071
(In reply to comment #12)
> In the mean time the only thing I can think of you trying is the Sun VM options
> described in bug 121737 comment 8. I know of some products using these options
> successfully. You can also go add your vote to the very popular Sun bug
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4670071

Let me report that we moved to adding those two obscure Sun VM options to eclipse.ini (successfully using p2's EclipseTouchpoint ;-) ) and thus could move away from single thread loads. The system appears to be stable in this configuration.

Another piece of positive feedback: thanks to the hooks added in bug 208591, switching between different locking strategies didn't affect our own implementation, since our hooks simply execute in whatever locking context they are invoked in. I like that architecture ;-)
Created attachment 111365 [details]
Stacks showing a ghost lock

(In reply to comment #13)
> Let me report that we moved to adding those two obscure Sun vm options
> to eclipse.ini (successfully using p2's EclipseTouchpoint ;-) )
> and thus could move away from single thread loads.
>
> The system appears to be stable in this configuration.

FWIW: I just observed a deadlock in the debugger where the blocking lock was never taken. More specifically: as mentioned, we are using these VM args:

-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass

The attached stack trace shows two threads interlocked in an unearthly fashion:

Thread "main" has the classname lock and waits for a DefaultClassLoader lock, which is supposed to be owned by thread "Worker-2".

Thread "Worker-2" tries to obtain the classname lock, but is *not* executing any code that should cause a DefaultClassLoader lock =:-0 (the debugger shows that all frames mentioning DefaultClassLoader indeed happened at the very instant that "main" is waiting for).

I guess we should read this as: the -XX.. options are still buggy in a way that the VM still takes a lock which is invisible in the stack trace (or perhaps: a lock had been taken without ever being released, or something similar).

Reporting here just for completeness. *Most* of the time the machine runs smoothly, at least significantly better than with singleThreadLoads.
If you get into a situation where you can reproduce this hang then please try the following configuration option to see if it goes away: osgi.support.class.certificate=false From the stack trace it appears to be an issue when the VM is checking the certificates of the class on the main thread and needs the classloader lock to do so. But I'm not sure where the thread Worker-2 is holding onto the class loader lock. Perhaps it is as you say and the VM is holding the lock under the covers when it should not be.
Created attachment 123096 [details]
deadlock in disabled checkCerts

While most of the time things work fine, I added -Dosgi.support.class.certificate=false to my command line, just in case. My eclipse.ini also has:

-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass

config.ini has:

osgi.classloader.lock=classname
osgi.classloader.singleThreadLoads=false

A while ago I recorded a stack dump of a deadlock that happened despite this precaution. The deadlock occurred while trying to enter ClassLoader.checkCerts. Is the switch intended to stop this invocation, or were you talking about a different call chain? In another thread, loadClassInternal() again took a lock, despite being told not to do so. Still nothing reproducible on my side, it _usually_ works :-/
We should look at using the new method in Java 7 on ClassLoader.registerAsParallelCapable() method to fix the deadlock issues. If this method is available and returns true then we should lock on class name instead of on the class loader object. see http://download.java.net/jdk7/docs/api/java/lang/ClassLoader.html#registerAsParallelCapable()
Created attachment 132826 [details]
work in progress

Also see http://openjdk.java.net/groups/core-libs/ClassLoaderProposal.html

Java 7 is a ways off yet. I would like to add some support for this into 3.5, but I think it may have to remain disabled by default, with an option to enable it. This patch illustrates what I am thinking. The patch is untested; I need to get Java 7 set up on my Windows machine. Unfortunately there are no Java 7 builds available for the Mac :(
Created attachment 132840 [details]
work in progress 2

The registerAsParallelCapable method is a protected static method. This forces me to use getDeclaredMethod and setAccessible in order to use it with reflection. This code now does the right thing on Java 7. Still need to do some testing to try to force a deadlock.
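For readers on Java 7 or later who control the class loader's source: the reflection dance is only needed when compiling against older class libraries. A subclass can call the protected static registerAsParallelCapable() directly from its own static initializer, which is where the registration must happen anyway. A minimal sketch — ParallelLoader is a hypothetical loader, not the Equinox DefaultClassLoader:

```java
// Hypothetical class loader that registers itself as parallel capable
// (Java 7+). With registration in place, the VM locks per class name
// being loaded instead of locking the whole loader object.
public class ParallelLoader extends ClassLoader {

    // Must run in the static initializer of the registering class;
    // returns true when this loader and all its ancestors are registered
    // (java.lang.ClassLoader itself is parallel capable).
    static final boolean PARALLEL = ClassLoader.registerAsParallelCapable();

    public ParallelLoader(ClassLoader parent) {
        super(parent);
    }

    public static void main(String[] args) {
        System.out.println("parallel capable: " + PARALLEL);
    }
}
```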
Using the original testcase from bug 121737 I am able to reproduce the deadlock on Java 7 without the parallel option enabled. Once I enable this option I cannot get the deadlock to happen, even when trying to force the deadlock by breaking in the class loader with the debugger.
Created attachment 132976 [details] updated patch Updated patch to include javadoc for ParallelClassLoader interface. While this is not true API (it is considered the SPI for the framework hooks), it is nice to give a description of how this interface is used by the ClasspathManager. I tried to leave any mention of Java 7 APIs out of the description since this API is not final and will not be available for some time.
Created attachment 133002 [details] final patch Did a review with John. I had a typo in the ParallelClassLoader.isParallelCapable() method name. I also added a note to the interface stating that the interface is an interim API and subject to change.
Renaming bug to reflect the content of this bug report and fix more accurately. We need to document that the osgi.classloader.singleThreadLoads option is deprecated and useless on modern VMs (1.5 or greater) because of the native VM class name locking. We also need to document the new option "osgi.classloader.type". If set to "parallel" and run on Java 7, then the OSGi class loader will lock on the class name instead of the class loader when finding/defining classes.
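To make the documentation concrete, the configuration knobs discussed in this report could be summarized in a config.ini sketch like the following (property names are taken from the comments above; which one applies depends on the VM, and the values shown are illustrative, not recommendations):

```properties
# Deprecated; effectively useless on modern (1.5+) VMs because of
# native VM class name locking:
osgi.classloader.singleThreadLoads=false

# On Java 7, lock on the class name instead of the class loader object:
osgi.classloader.type=parallel

# On VMs that do not lock the class loader natively (e.g. the IBM VM):
osgi.classloader.lock=classname
```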
Hey Tom, related question - is this problem known to exist or not exist on the IBM VM? Thanks -- andy
(In reply to comment #24)
> Hey Tom, related question - is this problem known to exist or not exist on the
> IBM VM? Thanks -- andy

No, the IBM VM does not lock the class loader object natively. But in Equinox we still lock the class loader object when defining classes by default. This can still lead to a rare case of deadlock when circularity is involved. You can enable the same class name locking strategy that we use on Java 7 by setting the following property on the IBM VM (or any other VM that does not lock the class loader natively):

osgi.classloader.lock=classname
*** Bug 221329 has been marked as a duplicate of this bug. ***
*** Bug 301640 has been marked as a duplicate of this bug. ***
Is this fix included in Eclipse 3.6M5? For one particular workspace I tried about 10 times, of which 9 locked up.
It seems like 3.6M5 has the patch. It still locks for me, and very frequently.
The patch only works if you are running on Java SE 7. Are you?
I tried with OpenJDK 1.6 and Sun JDK 1.6. What's the solution for 1.6?
There isn't one. Fundamentally it's a problem in the VM. You could try with JRockit as this is much less susceptible to the problem.
-XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass also works tolerably well for many people.
Thanks for the directions....
*** Bug 215834 has been marked as a duplicate of this bug. ***
*** Bug 331818 has been marked as a duplicate of this bug. ***
(In reply to comment #33) > -XX:+UnlockDiagnosticVMOptions > -XX:+UnsyncloadClass > > also works tolerably well for many people. Except for EMF's EPackage registry which does unsynchronized classloader operations that can trip this bug, even with the above settings. https://bugs.eclipse.org/bugs/show_bug.cgi?id=340061
*** Bug 354844 has been marked as a duplicate of this bug. ***
*** Bug 362154 has been marked as a duplicate of this bug. ***
*** Bug 364202 has been marked as a duplicate of this bug. ***
*** Bug 369917 has been marked as a duplicate of this bug. ***
*** Bug 377609 has been marked as a duplicate of this bug. ***
*** Bug 389659 has been marked as a duplicate of this bug. ***
*** Bug 394363 has been marked as a duplicate of this bug. ***