Bug 352923 - DefaultBundleReader unzips the same dependency several times
Summary: DefaultBundleReader unzips the same dependency several times
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Tycho (show other bugs)
Version: unspecified   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Jan Sievers CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2011-07-23 00:02 EDT by Anthony Dahanne CLA
Modified: 2021-04-28 16:55 EDT (History)
3 users (show)

See Also:


Attachments
fix to extract only once each nestedjar (1.36 KB, patch)
2011-07-25 12:12 EDT, Anthony Dahanne CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Anthony Dahanne CLA 2011-07-23 00:02:51 EDT
Build Identifier: 0.13.0-SNAPSHOT

Hello !

Before building, Tycho core resolves the classpath of each plugin to compile, they will then be added to the maven reactor.

During this phase (getBundleClasspath), Tycho will dig into each dependency of the project; in particular it will look for all the extra runtime libraries defined in build.properties (called nestedJars)

getBundleClasspath : 
https://github.com/sonatype/sonatype-tycho/blob/master/tycho-core/src/main/java/org/eclipse/tycho/core/osgitools/OsgiBundleProject.java#L312



Everytime Tycho finds a nestedJar (getEntry), it unzips the containing jar (in the .cache directory), and adds the nested jar location (now inside a .cache/exploded_dependency  directory) to the classpath.


getEntry : 
https://github.com/sonatype/sonatype-tycho/blob/master/tycho-core/src/main/java/org/eclipse/tycho/core/osgitools/DefaultBundleReader.java#L200


Here lies the performance problem :
* if the dependency contains 10 nested jars, then tycho will unzip 10 times this dependency
* if this same dependency is needed across 20 modules, then tycho will unzip it 20*10 =200 times before even publishing the reactor

Unziping a jar is usually fast, but I found this to be a serious performance issue as I compiled a project that needs a dependency (an OSGi bundle) that contains 7000+ classes (native unzip utility on my machine takes 45 seconds to unzip this jar, multiply by 200, and you have the extra time it takes to run tycho for this project...).

Sure the dependency I'm talking about is unusual (legacy stuff you know...), but still, even for other projects tycho will be faster if it just unzips the dependency once when adding nestedJars to the classpath.

I suggest that Tycho only unzips the nested jar to the .cache directory once; then if the directory is already created (outputDirectory .exists()) skip the unzip (zipUnArchiver.extract(path, outputDirectory);)and just add the location to classpath (result = new File(outputDirectory, path); ).

I could send you a patch (or pull request on github ?) if this not clear enough.
Removing the unzip step dramatically speeded up my build.

Reproducible: Always

Steps to Reproduce:
1. launch a tycho build on a project that depends on OSGi bundles that contains extra runtime library (libs defined in the build.properties)
2. tycho will unzip those bundles for each of their nestedJars, and this for module
Comment 1 Jan Sievers CLA 2011-07-25 04:25:05 EDT
your suggestion makes sense. 

The only case I can imagine where the optimization fails is if there are overlapping classpath entries of type directory, e.g. 

Bundle-ClassPath: lib/lib.jar, lib/

If we extracted lib/lib.jar we will not extract lib/ recursively because the directory already exists...
Comment 2 Anthony Dahanne CLA 2011-07-25 12:12:47 EDT
Created attachment 200293 [details]
fix to extract only once each nestedjar

Jan,
you're right that this solution (checking the directory exists) is not valid; even more because the unArchiver extracts from the bundle just the required lib;so if you have 
Bundle-ClassPath: lib/lib.jar, lib/lib2.jar
then the first call to 
zipUnArchiver.extract(path, outputDirectory);
just extracts lib.jar but not lib2.jar ...

I just orignally thought that zipUnArchiver was extracting the whole jar each time.

So, the enhancement could be just about extracting those libs once (not for every module in the reactor that needs it)

I added a patch, please tell me if you think this is acceptable.
Comment 3 Jan Sievers CLA 2011-08-01 10:20:48 EDT
(In reply to comment #2)
> I added a patch, please tell me if you think this is acceptable.

your patch has no tests to prove it works and it introdues the bug I described in comment#1.

commit  0bf42ea85b5 implements an extraction cache that handles both nested dirs and jars properly.
Comment 4 Anthony Dahanne CLA 2011-08-01 10:33:30 EDT
Hello Jan,
Thanks for reviewing my patch and committing an enhanced version + tests.