400649 – Improve re-entrancy robustness of the weaver

Bug 400649 - Improve re-entrancy robustness of the weaver

Summary: Improve re-entrancy robustness of the weaver

Status:	NEW

Alias:	None

Product:	AspectJ
Classification:	Tools
Component:	LTWeaving (show other bugs)
Version:	DEVELOPMENT
Hardware:	PC Windows 7

Importance:	P3 enhancement (vote)
Target Milestone:	---
Assignee:	aspectj inbox
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-02-13 01:14 EST by Lyor Goldstein
Modified:	2013-02-21 02:14 EST (History)
CC List:	1 user (show)

See Also:

Attachments
The WeavingAdaptor skeleton code (2.60 KB, text/plain) 2013-02-21 02:13 EST, Lyor Goldstein	no flags	Details
The jUnit test that illustrates the problem (7.01 KB, text/plain) 2013-02-21 02:13 EST, Lyor Goldstein	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Lyor Goldstein

2013-02-13 01:14:09 EST

Aj.WeaverCotainer#getWeaver creates a new WeavingAdaptor instance for each class loader using an AdaptorKey whose equals method simply checks the hash code of the loader it is supposed to represent (multiplying the original hash code by 37 does not really do anything…). This key is then used to check if this is a new adaptor or not. Obviously, relying only on the hash code – and the one declared by the loader is “weak” – there are great chances of “re-using” the same adaptor for several distinct class loaders (which is what may be happening here – I now understand better the need for the weaverRunning mechanism).

A better AdaptorKey would greatly decrease the chance of such a “conflict” (i.e., same adaptor for several loaders) – and I think the following improvement is both easy to implement and also efficient (see below) – basically, relying only on the loader’s hash code is a very “weak” uniqueness check. Best would be to rely on the loader instance, but we don’t want to create a hard reference to it. Therefore, here is my suggestion:

private static class AdaptorKey extends WeakReference {

		private final int loaderHashcode, sysHashCode, hashValue;
private final String loaderClass;

		public AdaptorKey(ClassLoader loader) {
			super(loader, adaptorQueue);
			loaderHashCode = loader.hashCode();
			sysHashCode = System.identityHashCode(loader);	// In case the loader returns a constant hash code – which is allowed by Java rules
			loaderClass = loader.getClass().getName();
			hashValue = loaderHashCode + sysHashCode + loaderClass.hashCode();
		}

		public ClassLoader getClassLoader() {
			ClassLoader instance = (ClassLoader) get();
			// Assert instance!=null - shouldn't be asked for after a GC of the referent has occurred !
			return instance;
		}

		public boolean equals(Object obj) {
			if (!(obj instanceof AdaptorKey))
				return false;
			AdaptorKey other = (AdaptorKey) obj;
			return (other.loaderHashCode == loaderHashCode)
   &&  (other.sysHashCode == sysHashCode)
   && loaderClass.equals(other.loaderClass)
   ;
		}

		public int hashCode() {
			return hashValue;
		}
}

Comment 1 Andrew Clement

2013-02-13 18:21:30 EST

I've already built the 1.7.2 final build. To re-open it and slip a change like this in, I'd need to make sure this fixes your problems. You mention:

> which is what may be happening here – I now understand better the need for the weaverRunning mechanism

Can you say for certain?

Comment 2 Lyor Goldstein

2013-02-14 00:31:52 EST

I cannot say for certain as it is extremely difficult to reproduce. However, from the code analysis I believe I am right. Here is what I think is happening (or at least is a likely bug producing scenario):

- 2 (or more) threads using 2 (or more) loaders (i.e. each thread is using one loader) where (by chance) all the used loader have the same hash code
- Whichever thread is 1st registers its loader's AdaptorKey - let's assume thread A with loader A
- When thread B with loader B (that we assume has the same hash code as loader A) looks up its WeaverAdaptor instance, it receives thread/loader A's instance
- Both threads call the same (!) WeaverAdaptor instance "weaveClass" method
- When the method asks weaverRunning.get() it receives FALSE in both (!) cases since the "weaverRunning" member is a ThreadLocal (!) - i.e., each thread believes it is the only one using the WeaverAdaptor class when in fact it isn't.

==> Both threads use the same WeaverAdaptor instance without really locking each other out as initially intended.

I am pretty sure that this is indeed a bug (even if it might not be what is happening in my case). I believe the fix for this scenario is twofold - (1) improve the AdaptorKey uniqueness along the lines I have suggested, (2) modify the "weaverRunning" member of the WeaverAdaptor class into an AtomicBoolean

In this context, since there is a chance (slim as it may be) that 2 loaders may be mapped to the same AdaptorKey (regardless of the mapping algorithm) I don't think the AdaptorKey should hold the mapped loader's reference (weak or otherwise) since the result of the "getLoader" call may be misleading - i.e., we would get loader A even though in reality it is loader B that has had the "mischance" of being mapped to the same AdaptorKey

As far as having the 1.7.2 final build - can you maybe publish a 1.7.2.SR1 release with these 2 fixes in them ? In this context, there is no immediate need for this patch, but we would like to have it ready by the same we announce JBoss 7.x support (which I estimate to be a few more weeks from now - I can let you know 2-3 weeks beforehand)

Comment 3 Andrew Clement

2013-02-14 11:41:07 EST

Ok, so in the problem case we think secondary threads end up with the weaver for the wrong classloader.  If this is fixed, why does the threadlocal change need to be made?  The threadlocal is designed to prevent re-entrancy of a single thread into a weaver. It was added because the same thread will pass the attempted 'synchronize (classloader) }' check that we make.  Different threads can't both pass the synchronize check.

Comment 4 Lyor Goldstein

2013-02-17 01:17:19 EST

"If this is fixed" - relying on names, hash codes, etc. can only get you so far without having a hard reference to the loader - which is something you want to avoid. Therefore I think we should assume that there is a chance (which we can make very small but not zero) that 2 threads might use the same WeaverAdaptor instance for different (!) loaders.

"Why does the threadlocal change need to be made?  The threadlocal is designed to prevent re-entrancy of a single thread into a weaver. It was added because the same thread will pass the attempted 'synchronize (classloader) }' check that we make.  Different threads can't both pass the synchronize check."

    I am not sure how the same thread can (eventually) call into itself - unless something that the weaver does causes 'transform' method to be called again. In any case, if this is an issue then we need to address 2 problems - 2 different loaders using the same WeaverAdaptor AND same thread attempting to re-use the WeaverAdaptor from within itself.

If you can indeed ensure 100% (not 99.9999...) that a WeaverAdaptor is associated with a unique (!) loader instance, then I think that the ThreadLocal is enough. But in such a case, since you ensure that the WeaverAdaptor can only be used for a specific loader instance you don't need a ThreadLocal and a simple 'boolean' will do since any other thread (as you mentioned) would be blocked by the 'synchronized' lock on the associated class loader - so again, no need for a ThreadLocal.

Bottom line - my analysis of the situation is this:

- if you can ensure 100% certainty that a WeaverAdaptor is associated with a specific class loader instance then a simple 'boolean' will do the trick since the class loader serves as the mutual exclusion lock between concurrent threads using the same loader instance

- otherwise, you must use an AtomicBoolean that will take care of both scenarios - (a) same thread (somehow) attempts to re-use the WeaverAdaptor and (b) 2 loaders that are mapped (by chance) to the same WeaverAdaptor are being used by 2 separate threads.

In any case, I still recommend a "trace.warning" call when this state is detected.

Comment 5 Lyor Goldstein

2013-02-20 07:58:45 EST

Hi Andy,

I see you added my AdaptorKey fix - in this context, I also think you should get rid of the loader reference (weak or otherwise). As I have mentioned, I don't think it is needed in the AdaptorKey and might even be misleading if 2 loaders are mapped by chance to the same AdaptorKey.

Lyor

P.S. What about the "weaverRunning" field - have you been persuaded by my arguments ?

Comment 6 Andrew Clement

2013-02-20 11:42:22 EST

I just made the change you said would fix your problems, if you need to change it further let me have another patch or code snippet. I can't test some of these situations as my regression tests aren't setup for sophisticated LTW tests so I don't really want to make unguided changes.

>  What about the "weaverRunning" field - have you been persuaded by my arguments ?

not yet. I'll probably need concrete confirmation there is a problem before I change it. I'm nervous about changes here when I don't have any testcases to verify the behaviour.

Comment 7 Lyor Goldstein

2013-02-21 02:11:47 EST

(In reply to comment #6)
> I just made the change you said would fix your problems, if you need to
> change it further let me have another patch or code snippet. I can't test
> some of these situations as my regression tests aren't setup for
> sophisticated LTW tests so I don't really want to make unguided changes.

I am not sure it would fix my problems since the behavior is extremely difficult to reproduce. I am confident though that the code as originally written did pose a problem and the fix was warranted - thanks for that.

> >  What about the "weaverRunning" field - have you been persuaded by my arguments ?
> 
> not yet. I'll probably need concrete confirmation there is a problem before
> I change it. I'm nervous about changes here when I don't have any testcases
> to verify the behaviour.

Like I said, it is extremely difficult to reproduce so I can't really provide you with a "live" testcase. However, I believe you can "synthesize" a test that can prove my point as follows:

* you create an WeaverAdaptor instance

* create 2 threads whose code calls WeaverAdaptor#weaveClass but where one thread blocks once 'weaveClass' is called right after the call to "weaverRunning.set(true);" - let's call it thread A. I believe you can do this by providing Trace instance that checks when "enter" is called with the "weaveClass" name (see line 321) and then blocks on some mutex/semaphore if the caller is thread A (the one who is supposed to block). Or you can make some slight change to the 'weaveClass' code just for this test...

* once the code detects that thread A is blocked (e.g., it can release a mutex/semaphore/send a message - etc.) launch thread B that also calls "weaveClass" on the same (!) WeaverAdaptor instance accessed by the (blocked) thread A.

==> If "weaverRunning" was doing its job as it was supposed to, thread B should not go past the "if (weaverRunning.get())" (line 314) - you can check this by having some assertion fail if thread B reaches the same code location where thread A is blocked. I believe you will see the failing assertion - thus demonstrating the bug.

* Repeat the same test with "weaverRunning" defined as an AtomicBoolean and I believe you will see that now the behavior is as expected.

I am attaching 2 files for such a test that demonstrate the issue - feel free to analyze them and tell me what you think.

Comment 8 Lyor Goldstein

2013-02-21 02:13:24 EST

Created attachment 227387 [details]
The WeavingAdaptor skeleton code

Contains the same code as the original - minus the non-relevant parts

Comment 9 Lyor Goldstein

2013-02-21 02:13:56 EST

Created attachment 227388 [details]
The jUnit test that illustrates the problem

The jUnit test code