Bug 271840 - Load-Time weaving deadlocks when used with Terracotta
Summary: Load-Time weaving deadlocks when used with Terracotta
Status: RESOLVED FIXED
Alias: None
Product: AspectJ
Classification: Tools
Component: LTWeaving (show other bugs)
Version: 1.6.4   Edit
Hardware: PC Windows Vista
: P3 critical with 1 vote (vote)
Target Milestone: 1.6.5   Edit
Assignee: aspectj inbox CLA
QA Contact:
URL: http://forums.terracotta.org/forums/p...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-04-09 16:00 EDT by Troy Anderson CLA
Modified: 2009-06-19 12:25 EDT (History)
3 users (show)

See Also:


Attachments
thread dump using patched aspectjweaver jar (21.06 KB, text/plain)
2009-04-09 17:51 EDT, Tim Eck CLA
no flags Details
test case for this issue (1.42 MB, application/gzip)
2009-04-09 18:37 EDT, Tim Eck CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Troy Anderson CLA 2009-04-09 16:00:13 EDT
The deadlock occurs intermittently after startup.

For background, please refer to this posting:
http://forums.terracotta.org/forums/posts/list/2002.page#12093

These are the stack traces of the conflicting threads when reproduced by Terracotta:

"Job_Executor2":
         at org.aspectj.weaver.loadtime.Aj$WeaverContainer.getWeaver(Aj.java:245)
         - locked <0xafb85598> (a java.util.Collections$SynchronizedMap)
         at org.aspectj.weaver.loadtime.Aj.preProcess(Aj.java:89)
         - locked <0x8592f6b0> (a java.net.URLClassLoader)
         at org.aspectj.weaver.loadtime.ClassPreProcessorAgentAdapter.transform(ClassPreProcessorAgentAdapter.java:52)
         at sun.instrument.TransformerManager.transform(TransformerManager.java:169)
         at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:365)
         at java.lang.ClassLoader.defineClass1(Native Method)
         at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
         at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
         at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
         - locked <0x8592f6b0> (a java.net.URLClassLoader)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
         at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
         - locked <0x8592f6b0> (a java.net.URLClassLoader)
         at javax.management.remote.generic.ServerIntermediary$RequestHandler.handleNotifReqMessage(ServerIntermediary.java:700)
         at javax.management.remote.generic.ServerIntermediary$RequestHandler.execute(ServerIntermediary.java:626)
         at com.sun.jmx.remote.generic.ServerSynchroMessageConnectionImpl$RemoteJob.run(ServerSynchroMessageConnectionImpl.java:266)
         at com.sun.jmx.remote.opt.util.ThreadService$ThreadServiceJob.run(ThreadService.java:208)
         at com.sun.jmx.remote.opt.util.JobExecutor.run(JobExecutor.java:59)
 
 "main":
         at org.aspectj.weaver.loadtime.Aj$WeaverContainer.getWeaver(Aj.java:235)
         - waiting to lock <0xafb85598> (a java.util.Collections$SynchronizedMap)
         at org.aspectj.weaver.loadtime.Aj.preProcess(Aj.java:89)
         - locked <0x8592c130> (a sun.misc.Launcher$AppClassLoader)
         at org.aspectj.weaver.loadtime.ClassPreProcessorAgentAdapter.transform(ClassPreProcessorAgentAdapter.java:52)
         at sun.instrument.TransformerManager.transform(TransformerManager.java:169)
         at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:365)
         at java.lang.ClassLoader.defineClass1(Native Method)
         at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
         at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
         at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
         at sun.misc.Launcher$AppClassLoader.findClass(Launcher.java)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
         - locked <0x8592c130> (a sun.misc.Launcher$AppClassLoader)
         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
         - locked <0x8592c130> (a sun.misc.Launcher$AppClassLoader)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
         at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
         - locked <0x8592c130> (a sun.misc.Launcher$AppClassLoader)
Comment 1 Andrew Clement CLA 2009-04-09 17:16:52 EDT
I've attempted to adjust the code so that it touches a few less types in total and actively touches (outside of the synchronized block) types it really will need.

new aspectjweaver.jar with this change (possible fix) in is here:

http://www.eclipse.org/downloads/download.php?file=/tools/aspectj/dev/aspectjweaver_271840.jar
Comment 2 Tim Eck CLA 2009-04-09 17:50:49 EDT
Using that version of aspectweaver.jar my test program seems to get stuck in a different place. The deadlock detector in VM doesn't seem to going off this time, but I'm pretty sure it is deadlocked again -- maybe something do with the implicit lock taken on <clinit>? 

Attached is the full thread dump. 

I can put together something to let you run my test program if you'd like. 
Comment 3 Tim Eck CLA 2009-04-09 17:51:43 EDT
Created attachment 131466 [details]
thread dump using patched aspectjweaver jar
Comment 4 Andrew Clement CLA 2009-04-09 17:56:45 EDT
having a test program i can use myself would be great!
Comment 5 Tim Eck CLA 2009-04-09 18:37:00 EDT
Created attachment 131471 [details]
test case for this issue

This is just the "ltw" example from the AJ distribution. ltw.tar.gz has a README.txt in with the steps for running it with terracotta in the mix
Comment 6 Troy Anderson CLA 2009-04-14 13:30:33 EDT
One interesting behavior I've noticed:  Once a deadlock occurs for a given client process, if left running, a subsequent invocation of the same client will never deadlock.  It may be (in my case) related to the log files and the fact that subsequent invocations attempt to log to the same folder and I get this notice:

“WARN: The log directory, '…\client-logs', is already in use by another Terracotta process. Logging will proceed to the console only.”

If I get this message, never-a-deadlock.
Comment 7 Andrew Clement CLA 2009-04-20 14:30:27 EDT
thanks for the instructions, I have now recreated the hang.  Not really sure what I'll do about it, but at least I can play around now.
Comment 8 Tim Eck CLA 2009-04-20 14:40:36 EDT
Good to hear you can recreated the problem, that is often half the battle. Terracotta is perhaps a tad esoteric here in that we start threads very early (that end up in the AJ agent concurrently). I haven't looked at your code, but some eager initialization of classes can go a long way here. We've had a long and checkered history with deadlocks (and friends like LinkageError and ClassCircularity) due to needing to load our classes in the path of application class loading. 
Comment 9 Andrew Clement CLA 2009-04-20 17:24:17 EDT
Hi Tim,

> I haven't looked at your code, but some eager initialization of classes 
> can go a long way here.

Do you have a recommended pattern for ensuring eager class initialization?  I did a bit of that in the first attempt at addressing this, just by trying to exercise a few classes before heading into the sync block - is that the best approach?
Comment 10 Andrew Clement CLA 2009-04-28 20:55:49 EDT
ok - it was failing about one in three or four times for me.  I've done two things and now it has yet to hang in about 80 runs.

First change is that we get awfully far into the process before recognizing the loader as one created internally by the VM to load generated reflection classes.  We are already inside some locks by this point.  So moved that test up to a higher level and we exit far sooner in this scenario now.  On adding some debug I can see that even this simple testcase causes this situation to occur a few times.

I also added a static initalizer that touches the two key classes that are later accessed inside the lock.  Touching these will also exercise some of the trace infrastructure (and so ensure those types are touched before proper weaving occurs)

Changes are committed and will be in the next development build - i'll add a note here when that happens but it'll probably be in a few hours.
Comment 11 Andrew Clement CLA 2009-04-30 11:44:21 EDT
the dev build with the changes in is now on the AspectJ download page
Comment 12 Tim Eck CLA 2009-05-12 15:39:28 EDT
Sorry for the long delay -- I was on extended holiday. It sounds like you've already done some eager class loading. I don't know any reasonable systematic approach for this stuff other than discovering what needs eager loading and hard coding it. Unfortunately that is really just a point in time solution. I toyed with the idea of eager loading every single Terracotta class (plus our dependencies) at startup time but that proved prohibitively expensive and actually created some other problems. 
Comment 13 Andrew Clement CLA 2009-06-18 00:00:44 EDT
believed fixed in 1.6.5
Comment 14 Tim Eck CLA 2009-06-19 12:25:30 EDT
(In reply to comment #13)
> believed fixed in 1.6.5
> 

Thanks for the help Andy!