Bug 401333 - Duplicate filed / invalid class file when reweaving
Summary: Duplicate filed / invalid class file when reweaving
Status: NEW
Alias: None
Product: AspectJ
Classification: Tools
Component: Compiler (show other bugs)
Version: 1.7.1   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: aspectj inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-20 12:19 EST by Hermann Vosseler CLA
Modified: 2013-08-07 15:49 EDT (History)
3 users (show)

See Also:


Attachments
Excerpt of the Classfile diff, showing the additional (and duplicate) field descriptors and the diffs in the location where an before-advice has been applied (13.43 KB, text/plain)
2013-02-20 12:19 EST, Hermann Vosseler CLA
no flags Details
Another diff of bytecode outlines, showing the duplicate fields and additional advice (68.63 KB, text/plain)
2013-06-19 15:47 EDT, Hermann Vosseler CLA
no flags Details
Yet another example diff between good and failed build results (16.29 KB, text/plain)
2013-06-19 15:48 EDT, Hermann Vosseler CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Hermann Vosseler CLA 2013-02-20 12:19:57 EST
Created attachment 227349 [details]
Excerpt of the Classfile diff, showing the additional (and duplicate) field descriptors and the diffs in the location where an before-advice has been applied

The build process for our enterprise application sometimes produces invalid class files. This behaviour is not really reproducible, but happened repeatedly, roughly once every two weeks or so. Just re-doing the same build process again usually produces a clean result.


We are building the application as a Multimodule Maven project. The Builds are performed by Jenkins starting from a clean tree. Several modules are built with the AspectJ compiler, thereby also applying aspects defined in other modules they're depending on.

The incriminating aspect is contained in the "server" module and applied to other code in that module. In the later build steps, an "operations" module consumes this "server" module and other modules, re-applies all aspects contained herein, and applies additional monitoring and maintenance aspects. The resulting JAR is packaged as part of an EAR.



In the case reported here, one of the re-woven class files contained several duplicate field descriptors of type org.aspectj.lang.JoinPoint.StaticPart


We can provide the incriminating class file and a correct version of the same class created by the next build. Also we could provide the source code on demand, but please understand that we'll refrain from posting this kind of data right into a public bug report.
Comment 1 Andrew Clement CLA 2013-02-21 11:32:37 EST
I looked at the attached diff, this seems the relevant bit:

   // Field descriptor #349 Ljava/lang/annotation/Annotation;
   private static synthetic java.lang.annotation.Annotation ajc$anno$2;
   
+  // Field descriptor #331 Lorg/aspectj/lang/JoinPoint$StaticPart;
+  private static final synthetic org.aspectj.lang.JoinPoint$StaticPart ajc$tjp_0;
+  
+  // Field descriptor #349 Ljava/lang/annotation/Annotation;
+  private static synthetic java.lang.annotation.Annotation ajc$anno$0;
+  
+  // Field descriptor #331 Lorg/aspectj/lang/JoinPoint$StaticPart;
+  private static final synthetic org.aspectj.lang.JoinPoint$StaticPart ajc$tjp_1;
+  
+  // Field descriptor #349 Ljava/lang/annotation/Annotation;
+  private static synthetic java.lang.annotation.Annotation ajc$anno$1;
+  
+  // Field descriptor #331 Lorg/aspectj/lang/JoinPoint$StaticPart;
+  private static final synthetic org.aspectj.lang.JoinPoint$StaticPart ajc$tjp_2;
+  
+  // Field descriptor #349 Ljava/lang/annotation/Annotation;
+  private static synthetic java.lang.annotation.Annotation ajc$anno$2;
+  

If I'm interpreting it correctly you are saying all those lines prefixed '+' are duplicates.  And this is in an environment using reweaving (the default), and not overweaving (which you'd have to turn on).

Given the intermittent nature, I'm not sure where I'll get to if you share the source with me.  Maybe I'll need to add some diagnostics to a special build of aspectj that you can then run with, which will trigger when this problem occurs.
Comment 2 Hermann Vosseler CLA 2013-02-25 17:36:32 EST
(In reply to comment #1)
> I looked at the attached diff, this seems the relevant bit:
.....


> If I'm interpreting it correctly you are saying all those lines prefixed '+'
> are duplicates.

Exactly.

org.aspectj.lang.JoinPoint$StaticPart ajc$tjp_0;
java.lang.annotation.Annotation ajc$anno$0;
  
org.aspectj.lang.JoinPoint$StaticPart ajc$tjp_1;
java.lang.annotation.Annotation ajc$anno$1;
  
org.aspectj.lang.JoinPoint$StaticPart ajc$tjp_2;
java.lang.annotation.Annotation ajc$anno$2;


These are present two times.



The other interesting observation (not sure if this is relevant)
is at the point, where the before advice has been applied.

   // Method descriptor #225 (Lbaag/boa/model/PufferDecide;Lbaag/boa/model/components/ExtTransId;Lbaag/boa/model/components/ProcessingInstruction;)V
   @baag.boa.common.interfaces.Scharfschaltung(value="DECIDE")
   public synchronized void Aenderungsdienst


In the "bad" classfile, I can see two invocations to the before advice
of the relevant aspect, while in the "good" classfile, there seems to be only one invocation (this is present in both files)

    100  invokevirtual baag.boa.esn.server.systemdienst.DerRoteKnopf.ajc$before$baag_boa_esn_server_systemdienst_DerRoteKnopf$1$d857e651(baag.boa.common.interfaces.Scharfschaltung, org.aspectj.lang.JoinPoint) : void [368]

....

(and the following is present only in the "bad" class)

+    203  invokevirtual baag.boa.esn.server.systemdienst.DerRoteKnopf.ajc$before$baag_boa_esn_server_systemdienst_DerRoteKnopf$1$d857e651(baag.boa.common.interfaces.Scharfschaltung, org.aspectj.lang.JoinPoint) : void [368]




> And this is in an environment using reweaving (the
> default), and not overweaving (which you'd have to turn on).

Yes, we are just using the defaults. The only non-standard(?) setting
we us is the "hasMember"


> Given the intermittent nature, I'm not sure where I'll get to if you share
> the source with me.  Maybe I'll need to add some diagnostics to a special
> build of aspectj that you can then run with, which will trigger when this
> problem occurs.

sounds like a good plan.
We'll try to use that for our regular development builds, and hopefully we'll catch the bug eventually....
Comment 3 Hermann Vosseler CLA 2013-06-19 15:13:16 EDT
BUMP

Just as a reminder: this bug continues to haunt us.
It is not just an annoyance. At times, we're almost unable to produce any successful build, as the failures tend to happen in row. Also, intermittently some of our team members weren't able to get a single successful build on their local PC during several days. Unfortunately, we couldn't find any tangible parameters triggering this behaviour (we tried to change between Java-6 or OpenJDK and we tried different AspectJ versions up to 1.7.2).

Also we weren't able to narrow it down to specific aspects. Both the affected classes and the advice directly linked to the duplicates visible in the bytecode diff seem to change randomly, as does the frequency of the incidents.

The only common denominator seems to be that the problematic behaviour happens in our final build step, where we re-apply all aspects contained in the individual modules and additionally apply some monitoring and management aspects, all in the same compilation and weaving step.
Comment 4 Hermann Vosseler CLA 2013-06-19 15:47:07 EDT
Created attachment 232570 [details]
Another diff of bytecode outlines, showing the duplicate fields and additional advice

Another example class bytecode diff.
The "evil" outline was taken from the classfile produced by the failing Build, while the reference outline was taken from the next successful build
Comment 5 Hermann Vosseler CLA 2013-06-19 15:48:38 EDT
Created attachment 232571 [details]
Yet another example diff between good and failed build results
Comment 6 Andrew Clement CLA 2013-06-19 20:07:22 EDT
Sorry for the delay Hermann, let me try and look at this asap.
Comment 7 Andrew Clement CLA 2013-06-25 12:45:35 EDT
The most recent diffs contain something I didn't see in the original (possibly just because it was an excerpt?) The duplicate WeaverState attribute - that is serious and may be what leads to the later problems. The WeaverState includes the recovery information that takes us back to the original class form against which we apply the full set of aspects. If a rogue second weaver state was introduced we may be failing to unwind to the original classfile form. Let me see if I can work out what might lead to that.
Comment 8 Andrew Clement CLA 2013-06-25 12:52:27 EDT
The intermittent nature of the failure *might* depend on which WeaverState is discovered for that final weave.  There is no defined ordering for how multiple WeaverState attributes are written out (it shouldn't happen!) so it might depend on what order they are pulled out of some collection.
Comment 9 Andrew Clement CLA 2013-06-25 13:04:17 EDT
There is one place where WeaverState attributes are created:

if (!world.isOverWeaving() || !myGen.hasAttribute(WeaverState.AttributeName)) {
  if (myType != null && myType.getWeaverState() != null) {
    myGen.addAttribute(Utility.bcelAttribute(
      new AjAttribute.WeaverState(myType.getWeaverState()), getConstantPool()));
  }
}
Comment 10 Andrew Clement CLA 2013-06-25 14:14:27 EDT
Added an assert to fire if we attempt to add a second weaver state and actually a few tests in the suite fail.

It seems if you pass the same aspect twice into the weaver (on different entries in the inpath, for example), then the duplicate weaverstate situation can arise.
Comment 11 Hermann Vosseler CLA 2013-06-26 10:33:56 EDT
This explanation makes sense.
I'd bet in our build setup the problematic final stage sees several aspects multiple times, through (transitive) dependencies at the inpath and added explicitly from the aspectpath.


What can we do to try out your new assert? Do we need to get a development build?
Comment 12 Andrew Clement CLA 2013-06-26 11:25:14 EDT
The problem is that I can only get it to happen in aspects in my experiments (and I can see why it might in aspects). I can't see how it might happen for regular classes, and your attached traces seem to show the problem in regular classes.
I can add asserts for when it attempts to do something wrong but that won't tell me how it got into the situation, just that it is going wrong. I'm trying to work out where to put debug to help me.

If you had it failing reliably I'd really try to get debug access to your setup, but I bet it is one of those things where you connect the debugger and it goes away...
Comment 13 Andrew Clement CLA 2013-06-26 13:24:53 EDT
> I'd bet in our build setup the problematic final stage sees several aspects multiple times

Do you think the final compile sees classes multiple times too, or only once? eg. the same class coming in via two different jars on the inpath?
Comment 14 Hermann Vosseler CLA 2013-06-26 13:55:07 EDT
From the aspect path and the inpath.

We're feeding several other maven modules, basically the whole core of this server component into the "weave dependencies" (that's the inpath in maven terminology). The result is one huge jar where we can be sure that any relevant aspects have been applied. The actual source classpath of this maven module holds some additional management and diagnostics aspects, which get applied here too.

But the funny thing was, we weren't able to get this build step to work, without adding 6 of the framwork modules to the aspectpath -- so these modules are present on both aspect path and inpath. Each of these modules uses and provides some fundamental aspects
Comment 15 Andrew Clement CLA 2013-06-26 14:00:30 EDT
I uploaded a modified 1.7.4.BUILD-SNAPSHOT to the maven.springframework.org:

<repository>
    <id>maven.springframework.org</id>
    <name>SpringSource snapshots</name>
    <url>http://maven.springframework.org/snapshot</url>
</repository>


Is that usable for you or do you need a downloadable copy?

This includes three places where we will throw an IllegalStateException if it looks like things are going wrong:

1) If we unpack a weaverstateinfo and on reverting to the original class file we discover it has AspectJ things in it already (checking for ajc$tjp_0).

2) If we set the 'original' form of a class file on a weaverstateinfo and that original form includes AspectJ things (ajc$tjp_0)

This will start to help us understand if it is reweaving that is getting us into the mess.

3) If we add a ajc$tjp_ field and discover there is already one there. I'm assuming one of the earlier cases will have triggered before this, but if they don't and we hit this it will tell us that reweaving isn't to blame. This should also cause the build that leads to the problem to fail rather than whatever consumes the broken class later.
Comment 16 Hermann Vosseler CLA 2013-06-26 14:18:52 EDT
to what extent are transitive dependencies relevant to this situation?

There are some classes, most notably our model beans, which are referenced for this build as transitive dependency, and are also present on both the inpath and the aspectpath, since our model module also contains an framework aspect and is targeted by other framework aspects from the other framework modules ;-)

The following is just a wild random guess: could it happen that the weaving step first treats one of these classes from the inpath (transforming it into a really different bytecode due to the weaving) and then later encounters another version of the same class through a transitive dependency of another module? Note the version visible as transitive dependency fetched from the maven artefactory has also already been woven by the same framework aspects in a previous maven build step. Especially we'd get duplicate versions of $AjcClosure1 inner classes (maybe these are investigated if a 3rd aspect is applicable to them?)
Comment 17 Hermann Vosseler CLA 2013-06-26 14:23:19 EDT
(In reply to comment #15)
> I uploaded a modified 1.7.4.BUILD-SNAPSHOT to the maven.springframework.org:
...

> Is that usable for you or do you need a downloadable copy?

let me check if I can reference or mirror it somehow into our local Nexus artefactory here (since we're behind a firewall). Otherwise I'll need something which I can upload into our local Nexus
Comment 18 Hermann Vosseler CLA 2013-06-26 14:39:10 EDT
Basically yes, I can access it

But the corresponding aspectjrt isn't available (maybe needs time to sync?)

Unresolveable build extension: Plugin org.codehaus.mojo:aspectj-maven-plugin:1.3.fix or one of its dependencies could not be resolved: The following artifacts could not be resolved: org.aspectj:aspectjrt:jar:1.7.4-BUILD-SNAPSHOT, org.aspectj:aspectjtools:jar:1.7.4-BUILD-SNAPSHOT: Could not find artifact org.aspectj:aspectjrt:jar:1.7.4-BUILD-SNAPSHOT in nexus


Another question: will this custom diagnostics stay there for some time? so we could just change our aspectj dependecy version to point to it?
Comment 19 Hermann Vosseler CLA 2013-06-26 14:46:06 EDT
there was a typo '1.7.4-BUILD-SNAPSHOT' instead of '1.7.4.BUILD-SNAPSHOT'

But still the same problem. And the remote repository index display shows me only 1.7.3.BUILD-SNAPSHOT versions for http://maven.springframework.org/snapshot/
Comment 20 Andrew Clement CLA 2013-06-26 16:06:30 EDT
Sorry Hermann, shouldn't have kicked off that upload script 'assuming' it would work, silly thing.  it is up there now and named 1.7.4.BUILD-SNAPSHOT as you guessed.
Comment 21 Hermann Vosseler CLA 2013-06-26 18:08:36 EDT
Thanks.

Indeed, the new Version shows up in the remote index now, but we'll have to await the replication. We'll switch to this dev-version of AspectJ for our
development and continuous integration builds and will report back when we see the IllegalStateException...
Comment 22 Andrew Clement CLA 2013-06-27 17:41:38 EDT
> to what extent are transitive dependencies relevant to this situation?

transitive dependencies shouldn't figure into this.  The problem is only with classes that get woven and those must be directly passed to the compiler either on the inpath (not the aspectpath) or as source files.  My current thoughts are that we are seeing the same class twice on the inpath and that is leading to the double weaving but in none of my testing can I trigger a route through that would cause it.
Comment 23 Hermann Vosseler CLA 2013-07-01 12:14:08 EDT
so we caught some ajcore dumps produced by the new diagnostics.

They happened -- as usual -- in the final build step; within this build the failure was somewhat reproducible, happening roughly on every second build attempt. 

However, after a complete pristine rebuild with a clean maven repo the problem went away, which indicates the condition triggering the incidents might indeed be buried somewhere in the build products consumed by this last step.


I've sent the actual core dump files via mail to Andrew Clement
Comment 24 Andrew Clement CLA 2013-07-02 18:56:40 EDT
It actually hit the third spot where I included a new check:

Bug401333: Inserting duplicate field ajc$tjp_0 into XXX

Further up the trace we see XXX coming in as a reweavable type:

info processing reweavable type XXX 
info successfully verified type YYY exists.  Originates from YYY.aj
info successfully verified type ZZZ exists.  Originates from ZZZ.aj

importantly I also don't see two attempts to weave XXX (at least I don't see two info messages for weaving it).

What is the easiest way for you to consume a further debug build? Through maven or shall I just give you the new jars? (tools/weaver)

I need to think about where to log this time, it may have to be much more verbose as we need to see what leads up to the problem.
Comment 25 Hermann Vosseler CLA 2013-07-03 13:34:08 EDT
> What is the easiest way for you to consume a further debug build? Through
> maven or shall I just give you the new jars? (tools/weaver)

Maven please, if possible. Maybe you're able to assign a specific suffix to the version number? like e.g. "1.7.4.bug401333-SNAPSHOT"

Since this way we'd be able to switch our ${aspectj.version} in the root pom, causing all of our development and integration builds to run on this specific compiler version, thereby increasing the probability to trigger the problem ;-)
Comment 26 Andrew Clement CLA 2013-07-09 16:04:25 EDT
Pushed out a new debug version using your scheme:

1.7.4.bug401333-SNAPSHOT

From the class file you sent me I observed that the class file does *not* look broken on its way into the final weave (so a previous compile did not leave it in a damaged state).  The reweavable data is correct and once recovered the class file does not contain ajc$tjp_0.

This suggests that somehow we are double weaving the type in that final build - the int used to provide the ajc$tjp_NNN suffix is maintained in a class called LazyClassGen. If a new LazyClassGen is created then the counter resets.  I've added some debug in this build that should be included with the exception that was previously occurring, it will tell us how many LazyClassGens are being built and the thread stack when the construction happens.
Comment 27 Hermann Vosseler CLA 2013-07-09 18:07:18 EDT
thanks, we'll start using this for our development builds....
Comment 28 Hermann Vosseler CLA 2013-07-15 15:33:15 EDT
we had some problems fetching this new version via maven. Our Nexus repository manager couldn't download the jars and got an "authorisation required". If we try to browse this URL via HTTP, we get an prompt for autorisation at

http://maven.springframework.org/snapshot/org/aspectj/aspectjrt/1.7.4.bug401333-SNAPSHOT/maven-metadata.xml.md5

while we can access the following file just fine via HTTP:

http://maven.springframework.org/snapshot/org/aspectj/aspectjrt/1.7.4.BUILD-SNAPSHOT/maven-metadata.xml.md5


Probably the reason is that you used a symlink?


Anyway, we were able to fetch the JARs via

http://shrub.appspot.com/maven.springframework.org/snapshot/org/aspectj/



My intention to propose using a specific version number was just an attempt to keep matter simple. If the most recent build at version 1.7.4.BUILD-SNAPSHOT does indeed contain your diagnostics, we can as well use this ID in our pom
Comment 29 Hermann Vosseler CLA 2013-07-15 15:34:12 EDT
but most importantly -- we captured a new stacktrace (sent per PM) with your additional diagnostics
Comment 30 Hermann Vosseler CLA 2013-07-17 18:26:01 EDT
interesting observation regarding this bug:

one of our buildservers was configured with a single CPU. Last week, we reconfigured the VM to expose multiple cores. After this change, *every* build failed and produced the ajcore and diagnostics.

Today, as an experiment, we reconfigured the VM back to use a single CPU. And guess what: all builds since then went through without triggering the diagnostics.

While indeed we *did* observe this problem occasionally in the past, even with a single CPU, the presence of real parallelism somehow seems to have increased the probability of its formation.
Comment 31 Hermann Vosseler CLA 2013-08-07 15:49:32 EDT
some new information....

Am 24.07.2013 17:45, schrieb Andy Clement:
> I had another thought yesterday that our type demotion system (which throws
> things out to save memory and brings them back in later) might be part of the
> issue here, possibly in concert with your situation of weaving things multiple
> times (as they come in from multiple jars). It can be turned off via
> -Xset:typeDemotion=false but I'm not sure you can set those options via
> maven... so I've just pushed up a new snapshot debug build to our maven that
> has demotion forcibly turned off.  My simple test programs may not have led to
> type demotion occurring as some weakreference data structures are used with
> reference clearance being up to the JVM.  (No demotion may mean higher memory
> requirements - but hopefully we can still get a build run through to see if
> this is part of the problem).  If you could try it, it'll be a bit of extra info.
> 

After using this new debug build for about one week, yesterday and today we observed 3 new error incidents on two different machines.

Obviously the problem happens less frequently than with the standard settings, but this might just be due to the different load pattern and the increased memory requirements.