While profiling start-up load-time weaving of 4 web apps in Tomcat, I see 8% of total weaving CPU time (1000 ms out of 13000 ms) spent in CompilationAndWeavingContext.enteringPhase and CompilationAndWeavingContext.leavingPhase. Almost all of it comes from calls in BcelClassWeaver.match(BcelShadow, List). It looks to me like this overhead could be cut a fair bit by having a strategy that uses a ThreadLocal on Java 1.2 and later. I also think the code could be optimized by preallocating arrays and just setting values and an offset in the per-thread data structure. Is there a reasonable limit on context stack depth? Also, is the code thread safe? It looks to me like two threads could call getContextStack at the same time, resulting in a ConcurrentModificationException (if one is adding a new thread for the first time while the other is trying to read). I will attach a snapshot of profiling data from BcelClassWeaver.match that highlights the time spent in this section...
Created attachment 31709 [details] HTML export of profiling data, showing the times spent in different parts of CompilationAndWeavingContext
The JavaDoc comments say that this class can't use ThreadLocal, but ThreadLocal has been in Java since 1.2; see http://java.sun.com/j2se/1.3/docs/api/java/lang/ThreadLocal.html. Or is the concern that ThreadLocal is too inefficient before Java 1.4? I'd advocate using ThreadLocal both to ensure thread safety and as a first optimization here. I think there is room for further improvement too. I'd be glad to submit a patch, but I'd like some feedback on this first.
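For illustration, here is a minimal sketch (class and method names are hypothetical, not the actual AspectJ code) of what a ThreadLocal-based context stack could look like. ThreadLocal.withInitial is a Java 8 convenience; on older JDKs the same thing is done by overriding initialValue().

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch only. Each thread lazily gets its own stack, so
// no cross-thread synchronization is needed and the shared-map
// ConcurrentModificationException hazard disappears.
public class PerThreadContextStack {
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    public static void enteringPhase(String phase) {
        STACK.get().push(phase);
    }

    public static void leavingPhase() {
        STACK.get().pop();
    }

    // Top of this thread's stack, or null when no phase is active.
    public static String currentPhase() {
        return STACK.get().peek();
    }
}
```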
I did a quick test: using a ThreadLocal with an array of formatters cuts the overhead by about a third, and should be thread safe. It looks like using arrays and avoiding object allocations would optimize this further...
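A sketch of the preallocated-array idea from the original report (the names and the depth limit are assumptions): entering and leaving a phase become an array store plus an index bump, with no allocation on the hot path.

```java
// Hypothetical sketch: a fixed-size per-thread stack held in a
// ThreadLocal, so push/pop are just array stores and index updates.
final class FixedContextStack {
    private static final int MAX_DEPTH = 256; // assumed reasonable bound

    private static final ThreadLocal<FixedContextStack> STACKS =
            ThreadLocal.withInitial(FixedContextStack::new);

    private final Object[] entries = new Object[MAX_DEPTH];
    private int depth;

    static FixedContextStack forCurrentThread() {
        return STACKS.get();
    }

    void push(Object context) {
        entries[depth++] = context; // no allocation on the hot path
    }

    Object pop() {
        Object c = entries[--depth];
        entries[depth] = null; // don't pin the popped context
        return c;
    }

    int depth() {
        return depth;
    }
}
```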
CompilationAndWeavingContext is a time sink for all forms of weaving. Adrian recently changed it to only manage per-thread stacks when load-time weaving, which helped my command-line compilation case a little. I'm still not sure the cost outweighs the benefit - I'd be tempted to make its use conditional on running the system in debug mode, at least for the very expensive calls (like the one made for every shadow). We should decide for 1.5.1.
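The conditional-use idea could look roughly like this (the flag and method names are illustrative, not the real API): with capture off by default, the very hot per-shadow call reduces to a single field check.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: context capture guarded by a flag defaulting
// to off, so per-shadow calls cost only a field read in normal runs.
public class GuardedWeavingContext {
    static volatile boolean captureContext = false; // off unless debugging

    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    public static void enteringPhase(String phase) {
        if (!captureContext) return; // fast path: single volatile read
        STACK.get().push(phase);
    }

    public static void leavingPhase() {
        if (!captureContext) return;
        STACK.get().pop();
    }

    public static int recordedDepth() {
        return STACK.get().size();
    }
}
```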
Much of this area has changed in the last few months. Capturing low-level context (the many 'match' events that occur) is now conditional (defaulting to OFF), and we recognize when we are not in a multi-threaded environment and avoid using thread maps. Do you still see this as an overhead, Ron?
no reply - presume OK now