Community
Participate
Working Groups
The deadlock occurs intermittently after startup. For background, please refer to this posting: http://forums.terracotta.org/forums/posts/list/2002.page#12093 These are the stack traces of the conflicting threads when reproduced by Terracotta: "Job_Executor2": at org.aspectj.weaver.loadtime.Aj$WeaverContainer.getWeaver(Aj.java:245) - locked <0xafb85598> (a java.util.Collections$SynchronizedMap) at org.aspectj.weaver.loadtime.Aj.preProcess(Aj.java:89) - locked <0x8592f6b0> (a java.net.URLClassLoader) at org.aspectj.weaver.loadtime.ClassPreProcessorAgentAdapter.transform(ClassPreProcessorAgentAdapter.java:52) at sun.instrument.TransformerManager.transform(TransformerManager.java:169) at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:365) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:621) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124) at java.net.URLClassLoader.defineClass(URLClassLoader.java:260) at java.net.URLClassLoader.access$000(URLClassLoader.java:56) at java.net.URLClassLoader$1.run(URLClassLoader.java:195) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) - locked <0x8592f6b0> (a java.net.URLClassLoader) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) - locked <0x8592f6b0> (a java.net.URLClassLoader) at javax.management.remote.generic.ServerIntermediary$RequestHandler.handleNotifReqMessage(ServerIntermediary.java:700) at javax.management.remote.generic.ServerIntermediary$RequestHandler.execute(ServerIntermediary.java:626) at com.sun.jmx.remote.generic.ServerSynchroMessageConnectionImpl$RemoteJob.run(ServerSynchroMessageConnectionImpl.java:266) at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208) at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59) "main": at org.aspectj.weaver.loadtime.Aj$WeaverContainer.getWeaver(Aj.java:235) - waiting to lock <0xafb85598> (a java.util.Collections$SynchronizedMap) at org.aspectj.weaver.loadtime.Aj.preProcess(Aj.java:89) - locked <0x8592c130> (a sun.misc.Launcher$AppClassLoader) at org.aspectj.weaver.loadtime.ClassPreProcessorAgentAdapter.transform(ClassPreProcessorAgentAdapter.java:52) at sun.instrument.TransformerManager.transform(TransformerManager.java:169) at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:365) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:621) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124) at java.net.URLClassLoader.defineClass(URLClassLoader.java:260) at java.net.URLClassLoader.access$000(URLClassLoader.java:56) at java.net.URLClassLoader$1.run(URLClassLoader.java:195) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at sun.misc.Launcher$AppClassLoader.findClass(Launcher.java) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) - locked <0x8592c130> (a sun.misc.Launcher$AppClassLoader) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) - locked <0x8592c130> (a sun.misc.Launcher$AppClassLoader) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) - locked <0x8592c130> (a sun.misc.Launcher$AppClassLoader)
I've attempted to adjust the code so that it touches a few less types in total and actively touches (outside of the synchronized block) types it really will need. new aspectjweaver.jar with this change (possible fix) in is here: http://www.eclipse.org/downloads/download.php?file=/tools/aspectj/dev/aspectjweaver_271840.jar
Using that version of aspectweaver.jar my test program seems to get stuck in a different place. The deadlock detector in VM doesn't seem to going off this time, but I'm pretty sure it is deadlocked again -- maybe something do with the implicit lock taken on <clinit>? Attached is the full thread dump. I can put together something to let you run my test program if you'd like.
Created attachment 131466 [details] thread dump using patched aspectjweaver jar
having a test program i can use myself would be great!
Created attachment 131471 [details] test case for this issue This is just the "ltw" example from the AJ distribution. ltw.tar.gz has a README.txt in with the steps for running it with terracotta in the mix
One interesting behavior I've noticed: Once a deadlock occurs for a given client process, if left running, a subsequent invocation of the same client will never deadlock. It may be (in my case) related to the log files and the fact that subsequent invocations attempt to log to the same folder and I get this notice: “WARN: The log directory, '…\client-logs', is already in use by another Terracotta process. Logging will proceed to the console only.” If I get this message, never-a-deadlock.
thanks for the instructions, I have now recreated the hang. Not really sure what I'll do about it, but at least I can play around now.
Good to hear you can recreated the problem, that is often half the battle. Terracotta is perhaps a tad esoteric here in that we start threads very early (that end up in the AJ agent concurrently). I haven't looked at your code, but some eager initialization of classes can go a long way here. We've had a long and checkered history with deadlocks (and friends like LinkageError and ClassCircularity) due to needing to load our classes in the path of application class loading.
Hi Tim, > I haven't looked at your code, but some eager initialization of classes > can go a long way here. Do you have a recommended pattern for ensuring eager class initialization? I did a bit of that in the first attempt at addressing this, just by trying to exercise a few classes before heading into the sync block - is that the best approach?
ok - it was failing about one in three or four times for me. I've done two things and now it has yet to hang in about 80 runs. First change is that we get awfully far into the process before recognizing the loader as one created internally by the VM to load generated reflection classes. We are already inside some locks by this point. So moved that test up to a higher level and we exit far sooner in this scenario now. On adding some debug I can see that even this simple testcase causes this situation to occur a few times. I also added a static initalizer that touches the two key classes that are later accessed inside the lock. Touching these will also exercise some of the trace infrastructure (and so ensure those types are touched before proper weaving occurs) Changes are committed and will be in the next development build - i'll add a note here when that happens but it'll probably be in a few hours.
the dev build with the changes in is now on the AspectJ download page
Sorry for the long delay -- I was on extended holiday. It sounds like you've already done some eager class loading. I don't know any reasonable systematic approach for this stuff other than discovering what needs eager loading and hard coding it. Unfortunately that is really just a point in time solution. I toyed with the idea of eager loading every single Terracotta class (plus our dependencies) at startup time but that proved prohibitively expensive and actually created some other problems.
believed fixed in 1.6.5
(In reply to comment #13) > believed fixed in 1.6.5 > Thanks for the help Andy!