Bug 410948 - Tycho (apparently) uses old or non-standard jar routines
Summary: Tycho (apparently) uses old or non-standard jar routines
Status: RESOLVED WORKSFORME
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Tycho
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-17 14:44 EDT by David Williams CLA
Modified: 2021-04-28 16:51 EDT

See Also:


Attachments
output from zipinfo -v on jar in eclipse repository (91.88 KB, text/plain)
2013-06-17 14:45 EDT, David Williams CLA
output from zipinfo -v on jar in kepler repository (92.69 KB, text/plain)
2013-06-17 14:46 EDT, David Williams CLA

Description David Williams CLA 2013-06-17 14:44:41 EDT
I'm opening up this bug as something of a "continuation" of bug 408944. Martin looked at one of the Orbit bundles in detail ... to find out why a "diff" reports the jars as different, even though once unjarred, there is no difference, and he found a pretty good answer for that. Tycho, though, doesn't "touch" the jars it gets from Orbit, and simply passes them along. 

Looking deeper into one of our eclipse jars, I found larger differences, which imply that Tycho is using an older or non-standard routine to create jar files, if I am interpreting the output of "zipinfo" correctly. 

I was trying to confirm Martin's findings, and decided arbitrarily to look at org.eclipse.core.jobs_3.5.300.v20130429-1813.jar. Looking closely, one would notice the one in our Eclipse repository is a slightly different size from the one in the common simultaneous release repository ... 144 bytes different. The only difference should be that the one in the common repository is created from the pack.gz file ... essentially "re-creating" it. It was done this way as a check that the pack.gz file was indeed a valid one. 

If in fact one "unzips" the jars, and does a 'diff -r' on the unzipped content, then diff says they are identical. 
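
As a rough illustration of the "unzip, then diff -r" check described above, here is a minimal Python sketch that compares two jars by entry name and uncompressed bytes only, ignoring every archive-level detail (compression method, header versions, timestamps). The function names are hypothetical and not part of any Eclipse or Tycho tooling.

```python
import zipfile


def unpacked_view(jar):
    """Return {entry name: uncompressed bytes} for a jar path or file object."""
    with zipfile.ZipFile(jar) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist()}


def logically_equal(jar_a, jar_b):
    """True when both jars unpack to the same names with the same bytes.

    This mirrors what 'diff -r' sees after unzipping; it says nothing about
    whether the two jar files themselves are byte-for-byte identical.
    """
    return unpacked_view(jar_a) == unpacked_view(jar_b)
```

Called with the two copies of the jar (for example, one path from each repository), this would be expected to return True in the situation described here even though a plain diff of the jar files reports a difference.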

But the jars being different sizes, and diff reporting them as different, motivated me to look at each with "zipinfo -v" to see if I could pinpoint the difference ... it would have to be more than the value of the one language bit that Martin discovered, since the sizes differ. 

I'll attach some output from zipinfo -v but in brief, where 
'<' represents "pure" jar from Tycho, (running under Java 1.7) and 
'>' represents jar created from unpack200 (running under Java 1.6). 


<   version of encoding software:                   1.0
---
>   version of encoding software:                   2.0
122,123c125,127
<   minimum software version required to extract:   1.0
<   compression method:                             none (stored)
---
>   minimum software version required to extract:   2.0
>   compression method:                             deflated
>   compression sub-type (deflation):               normal
125c129
<   extended local header:                          no
---
>   extended local header:                          yes


It seems that Tycho is using something "older" (1.0, vs. 2.0) with no compression? and no "extended local header" ... which I'd assume is where the "language encoding bit" would be, if there was one? 
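
For anyone wanting to check these fields without zipinfo, the sketch below reads the same central-directory values with Python's standard zipfile module. It rests on one assumption worth stating loudly: to my understanding, zipinfo's "extended local header" line reflects bit 3 of the general-purpose flag field (a data descriptor follows the entry data), while the language encoding (UTF-8 names) flag is bit 11 of that same flag field, so it is stored independently of the extended local header. The function name and dictionary keys are illustrative only.

```python
import zipfile

DATA_DESCRIPTOR_BIT = 0x0008  # flag bit 3: assumed to be zipinfo's "extended local header"
UTF8_NAMES_BIT = 0x0800       # flag bit 11: the "language encoding" (UTF-8 names) flag

METHOD_NAMES = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}


def describe_entries(jar):
    """Return one dict of header fields per entry of a jar path or file object."""
    rows = []
    with zipfile.ZipFile(jar) as zf:
        for info in zf.infolist():
            rows.append({
                "name": info.filename,
                # Versions are stored as integers, e.g. 20 means 2.0.
                "encoder_version": info.create_version / 10,
                "extract_version": info.extract_version / 10,
                "method": METHOD_NAMES.get(info.compress_type, str(info.compress_type)),
                "extended_local_header": bool(info.flag_bits & DATA_DESCRIPTOR_BIT),
                "utf8_names": bool(info.flag_bits & UTF8_NAMES_BIT),
            })
    return rows
```

Running this over both copies of the jar and comparing the rows entry by entry should surface the same kind of differences as the zipinfo diff above, with the two flag bits reported separately.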

Noticing these differences, I thought I'd open this bug under Tycho. Ideally, we'd have identical jars in the various repositories, and it's not completely clear to me why we do not, but it seemed it might be related to the "method" Tycho uses to create them.
Comment 1 David Williams CLA 2013-06-17 14:45:59 EDT
Created attachment 232458
output from zipinfo -v on jar in eclipse repository
Comment 2 David Williams CLA 2013-06-17 14:46:36 EDT
Created attachment 232459
output from zipinfo -v on jar in kepler repository
Comment 3 Igor Fedorenko CLA 2013-06-18 01:35:04 EDT
I think this report conflates two separate issues -- jars in release repositories do not include extended file attributes, and multiple versions of the jar are present in various release repositories. I also think neither of these two issues is related to Tycho.

For extended file attribute, here is what I see for m2e 1.4 

$ zipinfo -v ~/downloads/technology/m2e/releases/1.4/1.4.0.20130601-0317/plugins/org.eclipse.m2e.core_1.4.0.20130601-0317.jar

...

Central directory entry #42:
---------------------------

  org/eclipse/m2e/core/lifecyclemapping/model/PluginExecutionAction.class

  offset of local header from start of archive:     29821 (0000747Dh) bytes
  file system or operating system of origin:        MS-DOS, OS/2 or NT FAT
  version of encoding software:                     2.0
  minimum file system compatibility required:       MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:     2.0
  compression method:                               deflated
  compression sub-type (deflation):                 normal
  file security status:                             not encrypted
  extended local header:                            yes
  file last modified on (DOS date/time):            2013 May 31 23:17:44
  32-bit CRC value (hex):                           3e8a243f
  compressed size:                                  628 bytes
  uncompressed size:                                1361 bytes
  length of filename:                               71 characters
  length of extra field:                            0 bytes
  length of file comment:                           0 characters
  disk number on which file begins:                 disk 1
  apparent file type:                               binary
  non-MSDOS external file attributes:               000000 hex
  MS-DOS file attributes (00 hex):                  none

  There is no file comment.

...

This jar was produced by a Tycho build and, from what I can tell, it is compressed and has extended file attributes.

As for multiple jar versions, Tycho is expected to use prebuilt artifacts as is, without any modifications. So if a Tycho build needs to include a prebuilt jar and jar.tar.gz in a p2 repository produced by the build, the original jar and jar.tar.gz files are expected to be included, not some transformed/repackaged files. I believe the only way to guarantee the same exact jar/jar.tar.gz are present in all repositories is to make sure all tools involved in production of these repositories follow the same approach as Tycho. It is unreasonable to expect that different tools will be able to unpack and repack binary identical jar files.

At this point I don't know what is causing the behaviour you observe, and I am not even sure if it is caused by some not-always-reproducible bug(s) in Tycho or by the way the Platform build is set up. I may be able to have a closer look at the jar file format issue, but somebody will have to provide a small complete example project and exact steps to produce an "old" jar format with Tycho.
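
To make the repacking point concrete, here is a small Python sketch, not tied to Tycho, pack200, or any Eclipse tool, that builds a jar in memory, unpacks and rewrites it, and compares the two. All helper names are hypothetical. The rewritten archive carries the same entries, but its bytes differ because the writer picks its own compression method and timestamps.

```python
import hashlib
import io
import zipfile


def build_jar(entries, compression):
    """Hypothetical helper: write (name, bytes) pairs into an in-memory jar."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression) as zf:
        for name, data in entries:
            zf.writestr(name, data)
    return out.getvalue()


def repack(jar_bytes, compression=zipfile.ZIP_DEFLATED):
    """Unpack every entry and write a fresh archive, as a post-processing step might."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src:
        entries = [(info.filename, src.read(info)) for info in src.infolist()]
    return build_jar(entries, compression)


def payload(jar_bytes):
    """Entry names and uncompressed bytes, ignoring all archive metadata."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist()}


def digest(data):
    """SHA-256 of the raw archive bytes, i.e. what a binary comparison sees."""
    return hashlib.sha256(data).hexdigest()
```

With an original written as stored entries and a repacked copy written as deflated entries, payload() matches while digest() does not, which is the same pattern as jars that unzip identically but fail a binary diff.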
Comment 4 Jan Sievers CLA 2013-06-21 10:28:39 EDT
(In reply to comment #0)
> It seems that Tycho is using something "older" (1.0, vs. 2.0) with no
> compression? 

I cannot reproduce this claim using tycho 0.18.0.

1. git clone http://git.eclipse.org/gitroot/tycho/org.eclipse.tycho-demo.git
2. cd itp04-rcp
3. mvn clean package
4. cd itp04-rcp/eclipse-repository/target/repository/plugins
5. zipinfo -v example-bundle_*.jar | grep "version of encoding"
yields   
"version of encoding software: 2.0" 
for all entries
6. zipinfo -v example-bundle_*.jar | grep "compression method"
yields 
"compression method: deflated"
for all file entries


> and no "extended local header" 

zipinfo -v example-bundle_*.jar | grep "extended"
extended local header:                          no

that's true. However I'm not sure whether this header should be there or not, or whether it matters at all.

FWIW tycho is using the plexus-archiver 2.2 component [1] to create jar files (same as pretty much any standard maven plugin out there).
plexus archiver is not using java.util.zip/jar but rather its own zip stream implementations.
plexus archiver is also used to create the product zip/tar.gz archives.

Overall as long as tycho behaves consistently and in a reproducible way, I don't see anything to be concerned about in Tycho.
As far as I can see Tycho does behave consistently thanks to using a fixed version of plexus archiver. This is in contrast to java.util.jar and pack200, where apparently you get whatever patch level is provided by the JDK you happen to run your build with.

If you see differences between original tycho build output and the output of whatever post-processing/aggregation etc. that you run after the tycho build, I would say this is not a tycho problem.
I could imagine pack200 is part of the problem as it aggressively re-orders/optimizes class files, but hard to say without a sample project.

As Igor said, we would need a small example with steps to reproduce, otherwise this looks to me like "works for me" as far as Tycho is concerned.

[1] http://search.maven.org/#artifactdetails%7Corg.codehaus.plexus%7Cplexus-archiver%7C2.2%7Cjar
Comment 5 David Williams CLA 2013-06-21 12:13:33 EDT
It will take me a while to think through your comments ... I've already learned a lot from them, thanks! ... but wanted to leave a comment so you'd know I hadn't lost interest. 

I agree that if the behaviour is correct (such as, after un-jarring, contents are identical) then it is not a huge problem, but I still think having identical binary content is a good and valid goal. The reason I think that is that, after years of working with consumers of our bundles, the least little deviation in binary identical jars has to be explained. Sometimes there are perfectly good explanations, sometimes not ... but in any case it causes a lot of extra work, for me! 

I'm not familiar with "plexus-archiver" or exactly why you think it's better than using the jar routines in Java (I can imagine, in the past, Java 1.4 and previous, it might have been!) but at least conceptually, this seems no different than "compiler vendor and version" ... some people might prefer one, some might prefer another. 

I'll try to do some reading-up on "plexus-archiver" and some experiments to see if it's doing what I think it's doing ... but my hypothesis is that it decides those headers just use ASCII characters, so it can save space by using the older spec version of non-extended headers. Myself, I'd prefer identical jars over the small space savings ... but I can imagine those working on applets or something might see any space savings as a big plus. 

I've started a small demo to test this hypothesis, but it will take me a while to finish (as there is lots of other stuff to do). 

Thank you sincerely for your comments and "education" on how Tycho works.
Comment 6 Tobias Oberlies CLA 2013-07-04 07:08:09 EDT
(In reply to comment #4)
> As Igor said, we would need a small example with steps to reproduce, otherwise
> this looks to me like "works for me" as far as Tycho is concerned.

Feel free to re-open if there is a concrete, reproducible problem.