Bug 39470 - Repeating a compilation multiple times produces class files that vary in size
Summary: Repeating a compilation multiple times produces class files that vary in size
Status: RESOLVED FIXED
Alias: None
Product: AspectJ
Classification: Tools
Component: Compiler (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P2 normal (vote)
Target Milestone: 1.1.1   Edit
Assignee: Jim Hugunin CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-06-30 06:28 EDT by Andrew Clement CLA
Modified: 2003-08-28 08:06 EDT (History)
1 user (show)

See Also:


Attachments
New version of the BCEL MethodGen class that fixes the bug. (30.83 KB, text/plain)
2003-07-25 09:53 EDT, Andrew Clement CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Clement CLA 2003-06-30 06:28:19 EDT
I've just started using AJC to compile a large project, each time I ran 
exactly the same compilation, the resultant jar of all my classes was a 
different size.  With JAVAC, the resultant jar was always the same size.  

I can't quite work out what is causing it - When the size varies, it is always 
by a multiple of 7 bytes (sometimes 7bytes sometimes 14bytes), I thought it 
was the order I passed them to the compiler, but now I've just seen it vary 
just by me repeating compilation with exactly the same command line.

Here is a java file and an aspect that shows the problem:

public class MyClass {

  public static void main(String[] argv) {
    int foo1 = 1;
    String foo2 = "2";
    Integer foo3 = new Integer(3);
    String foo4 = "4";
    callfoo(foo1,foo2,foo3,foo4);
  }

  public static void callfoo(int input1,String input2, Integer input3,String 
input4) {
    bar_1(input1);
    bar_2(input2);
    bar_1(input3.intValue());
    bar_2(input4);
  }

  public static void bar_1(int i) {}
  public static void bar_2(String s) {}

}

And here is an aspect:

public aspect Tracer {

  public pointcut rasScope() : within(*);
  
  pointcut toStringMethod() : execution(* *.toString());
  
  pointcut publicMethods() : rasScope() &&
    execution( public * *(..)) && !toStringMethod();
    
    after() returning:  publicMethods() {
      System.err.println(thisJoinPointStaticPart.getSignature().toLongString
());
    }
}


1) Erasing all class files and compiling with 'ajc Tracer.java MyClass.java' 
results in:
  MyClass.class 2196 bytes
  Tracer.class  1856 bytes

2) Now, erasing all class files and compiling with 'ajc Tracer.java 
MyClass.java' results in:
  MyClass.class 2196 bytes
  Tracer.class  1856 bytes
So, they are the same size..

3) Now, erasing all class files and compiling with 'ajc Tracer.java 
MyClass.java' results in:
  MyClass.class 2182 bytes
  Tracer.class  1856 bytes
Class file now 14bytes smaller.

I have decompiled the resultant classes and the difference appears to be the 
kind of debug information included in the classes, for example, the method 
signature from the original source:
public static void callfoo(int input1,String input2, Integer input3,String 
input4) {

Comes out as this in some cases:
public static void callfoo(int arg0, String input2, Integer arg2, String arg3){

Or this in other cases:
public static void callfoo(int arg0, String input2, Integer arg2, String 
input4){

So, in the second case, the class file remembers the final parameter was 
called 'input4'.

I'm concerned by this behaviour - I need to convince some groups to swap from 
using JAVAC to using AJC - and I'll have trouble if the size of the resultant 
binaries appears non-deterministic like this...

I hope this isnt a dupe, I had a quick look in the buglist but couldnt find 
anything...

Andy.
Comment 1 Jim Hugunin CLA 2003-07-22 20:37:48 EDT
This bug could indicate a serious problem under the hood.  I'm raising it to a 
P2 to be sure it is investigated before 1.1.1.  This investigation may 
conclude that the bug is hard to fix and non-sinister which would get its 
priority lowered before the release.  I'll put in a little time tonight on a 
first investigation.
Comment 2 Andrew Clement CLA 2003-07-25 04:31:07 EDT
A few comments from looking into it ... might be useful?  I've learned a lot if 
nothing else :)

It seems a little odd that for some methods the targeters returned for an 
instruction handle contain duplicate slot entries.  In the case I was just 
looking at I saw two entries for type Tracer, name 'this' *both* slot 0.  In 
other cases there are also multiple entries for the same slot number, with the 
same type *but* different 'names'.  I think this is how we are losing some of 
the names?  Usually when there are duplicate entries, one is a generated name 
(e.g. arg0) whilst another will be from the localvariabletable in the class 
file (e.g. 'input1').

It appears to be when a MethodGen is built, we do some construction of 
LocalVariableGen objects - the ctor for LocalVariableGen calls setStart() which 
does a BranchInstruction.notifyTarget() - this call puts entries into the 
targeters set.  In the case of duplicate slots, we build two localvariablegen 
instances and the targeters set is just extended, rather than checked to see if 
a variable for that slot already exists and removing it first.  Because 
processing of the localvariabletable happens last, data in it contains more 
information about the localvars (i.e. names!) and this data should replace 
anything we've already built up.

Its very awkward to try mess about in the code in this area because its BCEL 
stuff.
Comment 3 Andrew Clement CLA 2003-07-25 06:25:44 EDT
Progress I think.  Persuing that idea from before - I made a patch to a BCEL 
class, this is the new main bit of code in 
org.apache.bcel.generic.MethodGen.addLocalVariable()

LocalVariableGen l = null;
	
// Check if this slot already has an entry - if so then lets update that
// entry with the 'new name' - this assumes that the 'new name' is more
// useful than any existing name.  This is probably true as we parse
// the local variable table last?
boolean found = false;
Iterator it = variable_vec.iterator();
while (it.hasNext()) {
  l = (LocalVariableGen)it.next();
  if (l.getIndex()==slot && l.getType().equals(type) ) {
    // Should we be checking that l.getStart().equals(start) and 
    // l.getEnd().equals(end) are the same too?
    //System.err.println("Processing duplicate: CurrentName="+l.getName()+"  
NewName="+name);
    l.setName(name);
    found = true;
  }
}
	
// If we didn't find it, then lets build a new one.
if (!found) {
  l = new LocalVariableGen(slot, name, type, start, end);
    
  int i;

  if((i = variable_vec.indexOf(l)) >= 0) // Overwrite if necessary
    variable_vec.set(i, l);
  else
    variable_vec.add(l);
}

As you can see, it attempts to update an existing entry for a slot if it finds 
one, before creating a new variable (which causes the duplicate targeters 
entries).

With the fix, the sizes of the classes are consistently:

2168 MyClass.class
1854 Tracer.class

Over about 200runs of ajc.

I'm currently building the large codebase that revealed the problem to me 
originally - to check it gives me consistent results now.

I haven't submitted a patch yet, I'm not sure how to when its a modification of 
an internal BCEL class - whats the process for addressing bugs like this?

I guess this means the targeter set will generally be smaller - whether this 
will influence anything else (faster compiling?) - I dont know ...
Comment 4 Andrew Clement CLA 2003-07-25 09:52:17 EDT
Final append from me on this (promise!).  I've just built my 2500 java files 
into a jar - this used to give me a size that varied by 14k each time I built 
it with ajc.  With the patch I describe above, the size remains constant across 
repeated compiles (7,109,999 bytes).

I can imagine a better fix than mine - perhaps where 'variable_vec' is changed 
to a data structure that is indexed by the slot number (so we can quickly see 
if there is already something in the slot - without building a LocalMethodGen 
instance) - but my fix is small and contained.  I'll attach the complete new 
MethodGen file to this bug for completeness.
Comment 5 Andrew Clement CLA 2003-07-25 09:53:59 EDT
Created attachment 5556 [details]
New version of the BCEL MethodGen class that fixes the bug.
Comment 6 Erik Hilsdale CLA 2003-08-05 02:20:07 EDT
This bug is fixed in 1.1rc1, in that I've gotten rid of the non-deterministic
local variable naming.  Multiple compiles will once again generate the 
exact same classfiles.

I made a workaround in LazyMethodGen to avoid using the (correct) 
BCEL patch Andy provided, but there are still issues with BCEL 
generating sub-optimal variable names.  It may be that to fix _those_ 
issues, we'll have to use the Andy's BCEL patch and damn the build 
consequences.  But since consistently good variable names doesn't seem
to be 1.1 blocker material, I've added a new P3 bug and am requesting 
this one be closed (I can't close it since I think I "accepted" 
the bug when it was assigned to Jim, thus "accepting" it for Jim.  Weird).
Comment 7 Jim Hugunin CLA 2003-08-05 17:03:16 EDT
Fixed by Andy and applied by Erik.
Comment 8 Adrian Colyer CLA 2003-08-28 08:06:15 EDT
updated target milestone field to 1.1.1