Bug 120817 - [performance]Thread Safety & 8% Optimization : Use ThreadLocal for CompilationAndWeavingContext
Summary: [performance]Thread Safety & 8% Optimization : Use ThreadLocal for Compilatio...
Status: RESOLVED FIXED
Alias: None
Product: AspectJ
Classification: Tools
Component: Compiler (show other bugs)
Version: DEVELOPMENT   Edit
Hardware: PC Windows XP
: P3 enhancement (vote)
Target Milestone: 1.5.3   Edit
Assignee: aspectj inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-14 03:02 EST by Ron Bodkin CLA
Modified: 2006-10-17 04:43 EDT (History)
0 users

See Also:


Attachments
HTML export of profiling data, showing the times spent in different parts of CompilationAndWeavingContext (11.38 KB, text/html)
2005-12-14 03:03 EST, Ron Bodkin CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ron Bodkin CLA 2005-12-14 03:02:54 EST
In profiling start up load-time weaving of 4 Web apps in Tomcat, I see 8% of total CPU time spent weaving (a total of 1000 ms out of 13000) spent in CompilationAndWeavingContext.enteringPhase and CompilationAndWeavingContext.leavingPhase. Almost all of it comes from calls in BcelClassWeaver.match(BcelShadow, List)

It looks to me this overhead could be cut a fair bit by having a strategy to use a ThreadLocal for 1.2 and later. I also think the code could be optimized by preallocating arrays and just setting values and an offset (in the per thread data structure). Is there a reasonable limit on context stack depth?

Also, is the code thread safe? It looks to me like two threads could call getContextStack at the same time, resulting in a concurrent modification exception (if one is adding a new thread for the first time while the other is trying to read)

I will attach a snapshot of profiling data from BcelClassWeaver.match that highlights the time spent in this section...
Comment 1 Ron Bodkin CLA 2005-12-14 03:03:57 EST
Created attachment 31709 [details]
HTML export of profiling data, showing the times spent in different parts of CompilationAndWeavingContext
Comment 2 Ron Bodkin CLA 2005-12-15 00:53:37 EST
The JavaDoc comments say that this class can't use ThreadLocal, but ThreadLocal has been in Java since 1.2. E.g., http://java.sun.com/j2se/1.3/docs/api/java/lang/ThreadLocal.html 

Or is the concern that ThreadLocal is too inefficient before Java 1.4?

I'd advocate just using ThreadLocal to ensure thread safety and as a first optimization here. I think there is room for more improvement too. I'd be glad to submit a patch, but I'd like some feedback on this first.
Comment 3 Ron Bodkin CLA 2005-12-15 01:47:34 EST
I did a quick test and using a thread local with an array of formatters cuts the overhead by 1/3, and should be thr   ead safe. It looks like using arrays and avoiding object allocations would further optimize this...
Comment 4 Andrew Clement CLA 2006-02-15 06:13:57 EST
CompilationAndWeavingContext is a timesink for all forms of weaving - Adrian recently changed it to only manage stacks per thread when LTW which helped my command line compilation case a little.  I'm still not sure the cost outways the benefit - I'd be tempted to make the use of this conditional on running the system in debug mode, at least for the very very expensive calls (like the one made for every shadow).

We should decide for 1.5.1
Comment 5 Andrew Clement CLA 2006-09-27 03:39:14 EDT
much of the this area has changed in the last few months.  Capturing low level context (the many 'match' events that occur) is now conditional (defaulting to OFF) and we recognize when not in a multi threaded environment and avoid using thread maps.

Do you still see this as an overhead Ron?
Comment 6 Andrew Clement CLA 2006-10-17 04:43:42 EDT
no reply - presume OK now