While profiling start-up load-time weaving of 4 web apps in Tomcat, I see 8% of total weaving CPU time (1000 ms out of 13000 ms) spent in CompilationAndWeavingContext.enteringPhase and CompilationAndWeavingContext.leavingPhase. Almost all of it comes from calls in BcelClassWeaver.match(BcelShadow, List). It looks to me like this overhead could be cut a fair bit by having a strategy that uses a ThreadLocal on Java 1.2 and later. I also think the code could be optimized by preallocating arrays and just setting values and an offset in the per-thread data structure. Is there a reasonable limit on context stack depth? Also, is the code thread safe? It looks to me like two threads could call getContextStack at the same time, resulting in a ConcurrentModificationException (if one is adding a new thread for the first time while the other is trying to read). I will attach a snapshot of profiling data from BcelClassWeaver.match that highlights the time spent in this section...
Created attachment 31709 [details] HTML export of profiling data, showing the times spent in different parts of CompilationAndWeavingContext
The JavaDoc comments say that this class can't use ThreadLocal, but ThreadLocal has been in Java since 1.2; see http://java.sun.com/j2se/1.3/docs/api/java/lang/ThreadLocal.html. Or is the concern that ThreadLocal is too inefficient before Java 1.4? I'd advocate using ThreadLocal both to ensure thread safety and as a first optimization here. I think there is room for further improvement too. I'd be glad to submit a patch, but I'd like some feedback on this first.
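For illustration, here is a minimal sketch (class and method names are hypothetical, not the actual AspectJ code) of what a ThreadLocal-based context stack could look like. ThreadLocal.withInitial is a Java 8 convenience; on older JDKs the same thing is done by overriding initialValue().

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch only. Each thread lazily gets its own stack, so
// no cross-thread synchronization is needed and the shared-map
// ConcurrentModificationException hazard disappears.
public class PerThreadContextStack {
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    public static void enteringPhase(String phase) {
        STACK.get().push(phase);
    }

    public static void leavingPhase() {
        STACK.get().pop();
    }

    // Top of this thread's stack, or null when no phase is active.
    public static String currentPhase() {
        return STACK.get().peek();
    }
}
```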
I did a quick test: using a ThreadLocal with an array of formatters cuts the overhead by about a third, and should be thread safe. It looks like using arrays and avoiding object allocations would optimize this further...
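A sketch of the preallocated-array idea from the original report (the names and the depth limit are assumptions): entering and leaving a phase become an array store plus an index bump, with no allocation on the hot path.

```java
// Hypothetical sketch: a fixed-size per-thread stack held in a
// ThreadLocal, so push/pop are just array stores and index updates.
final class FixedContextStack {
    private static final int MAX_DEPTH = 256; // assumed reasonable bound

    private static final ThreadLocal<FixedContextStack> STACKS =
            ThreadLocal.withInitial(FixedContextStack::new);

    private final Object[] entries = new Object[MAX_DEPTH];
    private int depth;

    static FixedContextStack forCurrentThread() {
        return STACKS.get();
    }

    void push(Object context) {
        entries[depth++] = context; // no allocation on the hot path
    }

    Object pop() {
        Object c = entries[--depth];
        entries[depth] = null; // don't pin the popped context
        return c;
    }

    int depth() {
        return depth;
    }
}
```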
CompilationAndWeavingContext is a time sink for all forms of weaving. Adrian recently changed it to only manage per-thread stacks when load-time weaving, which helped my command-line compilation case a little. I'm still not sure the cost outweighs the benefit - I'd be tempted to make its use conditional on running the system in debug mode, at least for the very expensive calls (like the one made for every shadow). We should decide for 1.5.1.
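The conditional-use idea could look roughly like this (the flag and method names are illustrative, not the real API): with capture off by default, the very hot per-shadow call reduces to a single field check.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: context capture guarded by a flag defaulting
// to off, so per-shadow calls cost only a field read in normal runs.
public class GuardedWeavingContext {
    static volatile boolean captureContext = false; // off unless debugging

    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    public static void enteringPhase(String phase) {
        if (!captureContext) return; // fast path: single volatile read
        STACK.get().push(phase);
    }

    public static void leavingPhase() {
        if (!captureContext) return;
        STACK.get().pop();
    }

    public static int recordedDepth() {
        return STACK.get().size();
    }
}
```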
Much of this area has changed in the last few months. Capturing low-level context (the many 'match' events that occur) is now conditional (defaulting to OFF), and we recognize when we are not in a multi-threaded environment and avoid using thread maps. Do you still see this as an overhead, Ron?
no reply - presume OK now