Bug 251019 - [plan] [memory] Possible resource leak in aspectj 1.6.1
Summary: [plan] [memory] Possible resource leak in aspectj 1.6.1
Status: NEW
Alias: None
Product: AspectJ
Classification: Tools
Component: Runtime (show other bugs)
Version: 1.6.1   Edit
Hardware: PC Linux
: P2 critical with 11 votes (vote)
Target Milestone: ---   Edit
Assignee: aspectj inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-10-15 18:32 EDT by Pablo Gra\~na CLA
Modified: 2013-06-24 11:07 EDT (History)
8 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Pablo Gra\~na CLA 2008-10-15 18:32:43 EDT
I found a possible resource leak that stops web application class loaders from being garbage collected. There is a class CFlowStack that creates a ThreadStack object as a ThreadLocal or a static map element depending on java version.

The version that uses ThreadLocal (ThreadStackFactoryImpl) apparently does not provide a mechanism for removing the ThreadLocal.

This manifests as out of PermGen memory after several redeploys.
Comment 1 Andrew Clement CLA 2008-12-02 17:45:16 EST
I need a testcase before proceeding with further investigation of this.  I'll write one when I get time unless someone wants to contribute one.
Comment 2 Simone Gianni CLA 2010-01-23 12:34:02 EST
Hi Andy, I'm hitting this bug as well.

The problem is quite simple :
- An aspect is instantiated, and it creates a CFlowCounter/CFlowStack
- (I'll keep on CFlowCounter cause that is my case)
- CFlowcounter delegates to ThreadStackFactoryImpl to create a counter.
- It delegates to ThreadCounterImpl
- This is a ThreadLocal, which loads in a Counter.

Now, I deploy a webapplication, then I redeploy it. Tomcat (as many other application servers) use a thread pool, so when I redeploy the webapplication, the old application classloader is (should be) eligible for garbage collection, but threads are still alive.

So, in some threads there is still the reference to the ThreadCounterImpl.Counter, cause it is inside a ThreadLocal.

This class itself would be not a problem cause it just holds an int. However, via getClass().getClassLoader() the entire class loader is referenced, preventing it from garbage collection. 

Since it is not garbage collected, all classes are still there in the permgen space of the JVM, which is quite limited, and this causes the app server to crash after a few redeploys.

See here http://www.jroller.com/tackline/entry/fixing_threadlocal for a discussion of the generic ThreadLocal problem.
Comment 3 Simone Gianni CLA 2010-01-23 12:47:24 EST
I cannot find an easy solution to this atm.

The first idea is cleaning the thread local. This is unfortunately nearly impossible : there is no clear boundary on when "an aspect ends", expecially when it is a singleton.

Unless AspectJ offer some kind of way to "clean all thread locals" to be called on some events, like in a servlet context listener. But this would mean to keep somewhere a list of all created ThreadLocals or something similar.

The only "natural" boundary is "when it is garbage collected", so when the class loader gets into garbage collection. But this unfortunately does not happen beacuse of the reference in the thread local as explained before.

So, we need to break that reference using a soft or weak reference.

One solution could be to put a weak reference to the Counter instance.

Right now there are the following references : Thread -> ThreadLocal entry -> Counter -> Counter class -> ClassLoader -> rest of the webapp, all strong references.

Instead what is needed is Thread -> ThreadLocal entry --weak--> Counter to detach from the thread.

This would remove the problem, but also prevent cflow to work, because it may happen that the Counter instance is removed randomly by the GC.

So we also need : ClassLoader -> Aspect class -> CFlowCounter -> * -> Counter
as strong references.

But what is that "*"? 
- A List<Counter> : we have one for each thread, so we are solving the case of "small number of threads and many class loaders" but causing the opposite problem : a leak when many threads are used on the same class loader.
- A Map<Thread,Counter> : basically duplicating what ThreadLocal was intended for, with worse performances, so dropping ThreadLocal.

Any idea?
Comment 4 Simone Gianni CLA 2010-01-23 12:52:44 EST
Can it be cleaned when it arrives to "0" during .dec() and recreated again when an .inc() needs it? Being .dec() called also when athrow happens, there should be no leak, but a possible hit on performance with all those additions/removals of a thread local value.
Comment 5 Andrew Clement CLA 2010-01-25 14:25:22 EST
removal on 0 and recreation when required doesn't sound like too bad of a solution.
Comment 6 Andrew Clement CLA 2010-03-22 17:31:40 EDT
So... if I commit the change to call remove() on the ThreadLocal when the stack reaches 0 (for counter) or empty (for stack) - can someone who can easily recreate this issue try it out for me?
Comment 7 Simone Gianni CLA 2010-03-25 13:36:42 EDT
hi Andy,
sure, I have around 20 webapps on our devel servers that suffer from not being able to undeploy cause of this bug. I will surely test it! :D
Comment 8 Sam CLA 2010-04-26 16:17:28 EDT
We may have the same problem with one of our apps.   Is there a fix available to try?
Comment 9 Simone Gianni CLA 2010-04-27 09:03:04 EDT
Hi Sam,
Andy hasn't committed it yet, at least not on main. I've been playing with it changing the "dec" methods of ThreadStackImpl and ThreadCounteImpl with a clear of a thread local, and the "inc" methods with a check for null.
Comment 10 Andrew Clement CLA 2010-04-27 12:01:57 EDT
I just built a version of aspectjrt.jar that includes the proposed solution (not 100% sure if it is the same as what Simone has done though...).  It is here:

http://www.eclipse.org/downloads/download.php?file=/tools/aspectj/dev/aspectjrt_251019.jar

let me know if you try it...
Comment 11 Daniel Serodio CLA 2012-11-13 10:35:53 EST
I'm getting a Tomcat warning on undeploy about a memory leak with AspectJ 1.7.0. Andrew, can you build a version of aspectjrt.jar that includes the proposed solution for 1.7.0? Thanks.
Comment 12 Andrew Clement CLA 2012-11-13 11:47:39 EST
I suspect runtime hasn't changed since I built that last one (we don't change runtime unless absolutely necessary) - have to tried it?
Comment 13 David Kensche CLA 2013-03-20 07:28:51 EDT
I tried 1.7.2 and Andrew's jar from from comment 10. Yet I keep getting this error. I assume there is no new SNAPSHOT that fixes this?
(In reply to comment #12)
> I suspect runtime hasn't changed since I built that last one (we don't
> change runtime unless absolutely necessary) - have to tried it?
Comment 14 Andrew Clement CLA 2013-03-20 14:21:44 EDT
I guess I put out that rebuilt rt in comment #10 to get some feedback on the proposed solution and didn't hear much positive or negative about it, so it hasn't been rolled into master.  @David are you saying you are seeing the same leak around usage of cflow? Or that you are just seeing some kind of leak?
Comment 15 David Kensche CLA 2013-03-21 03:33:51 EDT
Yes, I am getting the same error from tomcat:

20.03.2013 14:51:34 org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
... created a ThreadLocal with key of type [org.aspectj.runtime.internal.cflowstack.ThreadStackFactoryImpl.ThreadCounterImpl] (value [org.aspectj.runtime.internal.cflowstack.ThreadStackFactoryImpl$ThreadCounterImpl@4105c033]) and a value of type [org.aspectj.runtime.internal.cflowstack.ThreadStackFactoryImpl.ThreadCounterImpl.Counter] (value [org.aspectj.runtime.internal.cflowstack.ThreadStackFactoryImpl$ThreadCounterImpl$Counter@608d759e]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.

I wrote some little utility class that gets as input an ExecutorService and one or more aspect classes and removes the thread locals itself. However, I cannot execute this with all threads because some librararies use their pools. So, this does not fix the problem at all.

(In reply to comment #14)
Comment 16 Andrew Clement CLA 2013-03-22 18:14:26 EDT
I've just created a runtime jar with a slight variant of the approach in it, can you try it out? The jar is here:

http://www.eclipse.org/downloads/download.php?file=/tools/aspectj/dev/aspectjrt.bug251019_20130322.jar

If this doesn't work I will have to recreate it myself to investigate further.
Comment 17 David Kensche CLA 2013-04-02 05:13:02 EDT
Sorry for the late response, I was on holiday for the last few days. 

So I've tried the jar but unfortunately the problem persists. However, this does not always happen. I tried this some 10 times now and I think I found some potential pattern:

If I start the application and stop it at once or only click around for a few seconds and then stop the application I will run into the issue. However, as it seems if I wait for a longer time (half a minute or so) before stopping the application, all thread locals are cleanly removed. 

In our application we retrieve quite some data from the database in background threads while showing the user already a view of the data that is already in memory. This code for prefetching data runs in thread pools and uses aspectj. Now when the user clicks somewhere where the data has not yet been loaded, we interrupt the prefetching thread and load data for that location at once so as to assert that the application is responsive. Could it be that the fix does not deal with exceptions such as an InterruptedException?

BTW, I am running this on Windows 7 with 
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)
Comment 18 Andrew Clement CLA 2013-04-02 11:14:00 EDT
The fix we are working with creates the threadlocals on entry to a control flow and should then clear the threadlocal on exit from the control flow. If the code were made to violently exit then it is possible the tidy up is missed, however the tidy up should be getting done regardless of how the call stack is exited (exception or normal exit). I could probably create a debug version of this library that produced information that gave us an idea of how many outstanding thread locals there are and when they come and go, if you had the time to investigate with it? I don't have a setup that simulates this problem so I can't investigate myself.

Actually it just occurred to me that there is another alternative for you to try too... When ThreadLocals came along that option was added to the codebase for managing this state but the 'old version' was left around for users on older jdks, you could use that via this system property:

aspectj.runtime.cflowstack.usethreadlocal=false

that will use a static Hashtable.
Comment 19 David Kensche CLA 2013-04-03 02:12:30 EDT
Ok, I will try the system property. However, if you could provide me the debug-version of the jar I will try to find the problem. 

In fact, I already did some debugging and the thread locals seem to be in the threads of the GUI library (ZK) that we are using. While we are creating our own pools we explicitly clean them up on application shutdown. ZK, however, creates some event processing threads that exist outside any pool and run indefinitely. I cannot think of a way to clean these threads on shutdown because I cannot give them a task that removes the thread locals.
Comment 20 Missing name CLA 2013-04-10 12:25:56 EDT
I can confirm I get this bug using tomcat7 and aspectjrt version 1.7.2.

Will try out the bugfix jar tomorrow and report back.
Comment 21 Boris Kuschel CLA 2013-06-19 11:47:48 EDT
We are also seeing this defect in 1.6.11. What is the planned release for the fix mentioned above?
Comment 22 Andrew Clement CLA 2013-06-24 11:07:07 EDT
unsetting the target field which is currently set for something already released