Bug 180219 - Core Headless Startup performance decreased for Eclipse 3.3
Summary: Core Headless Startup performance decreased for Eclipse 3.3
Status: RESOLVED WONTFIX
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime (show other bugs)
Version: 3.3   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: platform-runtime-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2007-03-30 11:35 EDT by Oleg Besedin CLA
Modified: 2009-09-08 17:12 EDT (History)
9 users (show)

See Also:


Attachments
Simple test class (1.39 KB, text/java)
2007-04-03 16:41 EDT, John Arthorne CLA
no flags Details
illustrative patch (2.53 KB, application/octet-stream)
2007-04-18 16:45 EDT, Thomas Watson CLA
no flags Details
Patch (10.34 KB, patch)
2007-04-26 18:01 EDT, Oleg Besedin CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Oleg Besedin CLA 2007-03-30 11:35:06 EDT
The Core Headless Startup tests spikes up and down on all platforms. In some cases headless takes twice as long in M6 comparing to Eclipse 3.2.
Comment 1 Oleg Besedin CLA 2007-03-30 11:43:44 EDT
The most likely cause of the slowdown is the introduction of JAR signing.

Positive confirmation of this hypothesis:

- On all test machines problems started going from the build I20061102-1715 to the build I20061107-0800. That's when the JAR signing was enabled.

- There are several spikes where startup test performance returns to "normal" on all machines (most noticeably, on I20070206 and I20070319). On those days our build system was having problems with signing and signing was turned off.

Negative confirmation of this hypothesis:

- No relevant changes in the source code

- No outliers in tests that could explain such drastic changes (i.e., it is not like the first test run is way too slow and pulls the average)

The strange portion:

- It seems that some builds even as they are signed (for example, the official M6 build), don't show performance degradation. 

We need to investigate why some builds don't show the performance degradation even though they are supposed to be signed.

Setting priority to P1 for now as this could have significant impact on the startup time of small RCP applications.
Comment 2 John Arthorne CLA 2007-04-03 16:38:40 EDT
The performance cost of JAR signing on startup is expected, but I don't have an explanation for the recent inconsistency. The expected cost is due to the fact that URLClassLoader will always perform signature verification when it loads classes from a signed JAR.  Since we use URLClassLoader in Main to invoke the framework, we pay the performance cost.  The rest of Eclipse is loaded using OSGi class loaders that do not perform signature verification by default.

When I investigated last November, I found the performance hit was quite consistent, either during Eclipse startup, or a simple test that just loaded a single class from a signed JAR. On a 2.5GHz Pentium with 768 MB of RAM, I was seeing core headless startup test running in 3.476 seconds with signatures removed, and 4.116 seconds with signatures added.
Comment 3 John Arthorne CLA 2007-04-03 16:41:58 EDT
Created attachment 62831 [details]
Simple test class

Here is a simple test class that I used in the past to simulate Equinox startup. This test showed a similar performance degradation as Eclipse startup when running with a signed vs. unsigned OSGi JAR.
Comment 4 Oleg Besedin CLA 2007-04-18 14:13:18 EDT
==> Apples and Oranges: Why Test Results Are So Strange <== 

After quite a bit of research it appears that, indeed, headless startup is slower in Eclipse 3.3 comparing to Eclipse 3.2.

The test results we see from the current performance tests are inaccurate. The main problem (and there are many others :-)) is that “-dev bin” option was removed from the Eclipse 3.3 testing framework but not from the Eclipse 3.2 testing framework. 

Removal of that option accounts for about 35% of the time measured by this test and represents the miraculous improvement we see in tests around February 20, 2007.

(The removal of the “-dev bin” option from the test framework was a good thing; but in order to get meaningful comparison to 3.2 same change needs to be done in the 3.2 framework.)

If we measure the “real” startup time which I define the time between we get into Main.main() and headless application start, this time increased by about 40% from Eclipse 3.2 on my WinXP machine.

The major drop in performance (about 35% or “real” time or about 80% of the time as measured by tests) was introduced by JAR signing initially enabled on November 7, 2006. 

The measurement or “real” startup time shows that were was another degradation of about 10% between M3 and M6.

There are two separate issue performance-wise with JAR signing: main class loader and reading manifests.

==> The JAR Signing : URLClassLoader <== 

The slowdown in the “normal” startup of about 30% is caused by URLClassLoader verifying contents of the org.eclipse.osgi JAR.

The classloader in question is the  Main$StartupClassLoader which extends URLClassLoader. It is used to load most (all?) of the OSGi. In particular, profiler shows two specific places affected by it:
org.eclipse.core.launcher.Main.invokeFramework
EclipseStarter.createAdaptor()
URL.openStream (the same StartupClassLoader used to find resource here)

I am not sure how much we can do here. It does not seems to be possible to easily modify URLClassLoader to omit verification. Providing our own version of URLClassLoader is, at best, risky.

==> The JAR Signing : Manifest.mf <== 

The increased manifest.mf file does not seem to affect normal startup, but does affect startup with “-dev” (that’s why original jump in the test time was about 80%, not 30%).

This is caused by two places that parse manifest.mf files entirely. 

The first places (in the update.configurator) is triggered for every bundle installed. It is in: 
ConfigurationActivator.initialize()
	org.eclipse.update.internal.configurator.BundleManifest.parse(InputStream) 


The second place seems to be triggered by the update.configurator for every bundle on the initial bundle list:
ConfigurationActivator.installBundles()
		org.eclipse.core.runtime.internal.adaptor.EclipseStorageHook.hasPackageInfo(URL)
		java.io.BufferedReader.readLine()								java.lang.String.startsWith

While those two places won’t have effect on the time as currently measured by the startup test, they will impact all runs with “-dev” which, from what I understand, includes debugging in Eclipse. The guesstimate is that they might add about 1 sec to the startup of a debug session of 200 Mb signed Eclipse-based application.

Those two places could be fixed by introducing intermediate input streams that would stop reading after encountering the first empty line. Testing of this approach with a hacked implemention of this partial read shown that it practically eliminates extra time caused by the combination of “-dev” and signed JARs. (There is still about 2% difference likely caused by the extra level of indirection in reading the input streams and extra processing to detect the empty line).

==> Still Unknown <== 

This all well describes the picture we see on WinXP machines, but Red Hat machines still remain a bit of a mystery as they did not show a drop after “-dev” option was removed, but rather being oscillating between “good” and “bad” results.

Another 10% slowdown happened between introduction of JAR signing and M6. 

Comment 5 Jeff McAffer CLA 2007-04-18 14:33:59 EDT
Good summary.  Can you open separate bug reports for the issues outlined (and note them here) or point this bug at pre-existing bugs if any?

It seems like the manifest reading issues can be addressed.  In paticular the one in the EclipseStorageHooks.  The creation of a new clasloader that can be used to read and *optionally* verify the OSGi JAR would be great but as you say, challenging.

Context note:  Can you give some time quantifiers on the performance changes?  Percentages are interesting but this particular test does very little work so we will see large, fixed percentage changes but in set in the grand scheme of starting Eclipse it may look quite different.

Can you also confirm that simply deleting the signing related files from the org.eclipse.osgi JAR recovers all/most of the lost time?
Comment 6 Thomas Watson CLA 2007-04-18 15:00:37 EDT
Introducing intermediate input streams that would stop reading after encountering the first empty line is not acceptable for org.eclipse.core.runtime.internal.adaptor.EclipseStorageHook.hasPackageInfo case.  Package info can be declared after the main section in a jar manifest file.  Perhaps we could stop at the first section that contains signer entries, but I don't think the jar manifest spec requires an order for the sections after the main section.
Comment 7 Jeff McAffer CLA 2007-04-18 15:07:40 EDT
is this only an issue in -dev mode or all the time?
Comment 8 Oleg Besedin CLA 2007-04-18 15:32:40 EDT
To clarify, numbers below are for the "real" startup time which I define as the interval between entry into Main.main() and entry into the headless Eclipse application.

The absolute values closely match John's finding - JAR signing adds about 0.3sec to the headless startup time on the WinXP 2GHz 2Gb machine.

The total slowdown of M6 comparing to Eclipse 3.2 is 0.4 sec or 40%. Removing signature from org.eclipse.osgi brings this down to 0.1 sec or 10%.

So, removal of the signature from org.eclipse.osgi will counteract most of the slowdown. About 10% or 0.1 sec was added somewhere else after JAR signing.

(Those numbers are let's say +/-5% and +/-0.05 sec.)
Comment 9 Thomas Watson CLA 2007-04-18 15:35:22 EDT
For EclipseStorageHook.hasPackageInfo this is only an issue when an attempt is
made to install a bundle.  Otherwise the information is cached for previously
installed bundles.  In the -dev case update.configurator always tries to
consolodate the installed bundles with what it thinks it is managing with
features.  The issue is it always tries to manage the bundles on the
osgi.bundles list because these bundles are included in features it is
managing.  This causes an install exception because the bundle is already
installed.  Currently update catches the exception and logs the exception only
if the bundle is not on the osgi.bundles list.

This happens each time update.configurator checks the configuration.  When
using -dev update.configurator always checks the configuration.

Regarding the 10% slowdown, do not forget we added another bundle (org.eclipse.equinox.app) which is always getting started with each restart.  That may contribute to a small portion of the slowdown.
Comment 10 Oleg Besedin CLA 2007-04-18 16:07:01 EDT
As Tom said, the two issues with manifest.mf reading are specific to the "-dev" mode.

In the "-dev" mode:

- The EclipseStorageHook.hasPackageInfo() is triggered for bundles present on the osgi.bundles list and org.eclipse.osgi (in the default installation, 4 bundles). It's timing is going to remain constant regardless of the number of "regular" bundles installed.

As such, fixing it would be a lower priority.  

- The BundleManifest.parse() from update.configurator is triggered for every installed plugin and its timing is going to increase with the number of bundles installed. 

As such, BundleManifest.parse() would be rather nice to fix. Does it need to read past the first empty line in the manifest file?

Very good point on the new bundle, that might explain the rest of the diference (depending on how much class loading occurs in the new bundle). Very good point, totally forgot about it, thank you!

As a side note - new resolver that takes into consideration "uses" caluses - will it play in the "warm" startup picture? If it is, do you have a feeling how is it performance comparing to the "old" resolver?
Comment 11 Thomas Watson CLA 2007-04-18 16:28:55 EDT
The resolver should not be run at all on a warm startup.  We load the resolver state from cache and should only resolve if new bundles are successfully uninstalled/installed/updated.

org.eclipse.update.internal.configurator.BundleManifest.parse looks like it only cares about the main section of the manifest.  It could be changed to use the method
org.eclipse.osgi.util.ManifestElement#parseBundleManifest(InputStream, Map)

I will attach a patch
Comment 12 Thomas Watson CLA 2007-04-18 16:45:37 EDT
Created attachment 64233 [details]
illustrative patch

Here is an illustrative patch to BundleManifest.  I've done no testing with this patch.  I attempted to do some testing with self-hosting but the code never gets run in that environment.  Probably because it gets the information from the platform.xml that PDE constructs.  If that is the case it makes fixing this less important IMO.  Running with -dev in the field outside of self-hosting is a corner case.
Comment 13 Oleg Besedin CLA 2007-04-18 17:39:21 EDT
Yes, just verified that BundleManifest.parse() is not triggered in the self-hosting environment.

I opened a separate bug 183071 for Update team with details but giving that this problem is not present in the normal workflow that bug could be closed as "REMIND" :-).

To summarize, if means that one of the manifest.mf reading problems is not present in the normal workflow and another (EclipseStorageHook) is going to be a constant cost with no easy way to avoid. 

I'll run some tests to see what is the real impact of extra reading in EclipseStorageHook.hasPackageInfo(). (I only have numbers that relate to the test framework for both issues combined.) From the common sense it should be small, but we'll see.
Comment 14 Oleg Besedin CLA 2007-04-18 17:42:09 EDT
One other question here is what we should do with the performance test.

- The results of tests with and without "-dev" can't be compared to each other

- Is it possible to remove the "-dev" option from the 3.2 test framework? 

It almost seems that we need to rename this test into "headless startup performance ver 2.0", make a new baseline to it and stop comparing it to the test with "-dev". However, a better startup test is needed, see bug 138927. So if we rename the test now, eventually we'll have a "headless startup performance ver 3.0". Which might be fine. 
Comment 15 Jeff McAffer CLA 2007-04-19 09:03:50 EDT
I am fine with rerunning a baseline or adding a new test.  Both amount to the same thing.  We should fix the test either way.
Comment 16 Dani Megert CLA 2007-04-19 09:17:02 EDT
A new test would allow us to see the difference. That's what I normally do if something changes.

Is JAR signing essential to all clients i.e. would it make sense to offer unsigned builds for those that prefer better performance?
Comment 17 Jeff McAffer CLA 2007-04-19 13:52:02 EDT
we will not be able to manage signed and unsigned offering of all Eclipse downloads.  People who don't care about signing but do care about the 300ms can delete the signing related files from org.eclipse.osgi.jar.  That is ok for advanced product people but for mortals its no fun (plus we are slower out of the box).  It would be interesting to see if we can resolve this with a simple classloader in the launcher.jar.  We discussed this some time ago and thought that it was too hard.  might  be useful to review and see if there is something simple we can do.
Comment 18 Oleg Besedin CLA 2007-04-20 16:24:46 EDT
===> Summary so far: <===

- The only place identified in the earlier comments that has an actual effect on the startup is Main$StartupClassLoader. Replacing it with non-secure simplified [hacked :-)] classloader reduces startup time by 0.3 sec or 30%

===> Details: <===

I did some measurements using direct instrumentation (a fancy way of saying that I added System.out.println() where needed :-)) and hacked code to get better idea of the effects of signing:

- EclipseStorageHook.hasPackageInfo - there was no statistically significant difference if reading manifest files was limited to the first empty line (there might have been a difference of less than 1% or 40 ms)

- HookRegistry.mergeFileHookConfigurators - same: there was no statistically significant difference if file reading was removed altogether and replaced by a hard-coded string (there might have been a difference of less than 1% or 40 ms)

- URLClassLoader - replacing URLClassLoader.findClass() with a hacked non-verifying classloader that assumed that classes are always in a file in a jar resulted in about 30% or 0.3 second improvement in the startup time comparing to Eclipse 3.3M3.

- with non-verifying classloader there was still about 15% degradation in the startup time in M6 comparing to M3. The profiler did not show any apparent differences in startup of (Eclipse M3) vs (Eclipse M6 with modified classloader). 


===> Comments / Rant <===

As last year, results from profilers appear to be heavily skewed comparing to runs without profilers. However, those profilers are the only tools we have.

Add to this high variability of Java. The typical standard deviation was about 10%. This means that tracking down differences of less than let's say 15% needs multiple (30+) runs and statistical processing of results.

Another problem here is the test itself. It currently includes: VM startup, launcher, OSGi initialization, update configurator, runtime startup. Each of those is a large area in itself with no single person familiar with all the code. 

If we want to do this right, we need to write a large number of smaller performance tests that include those individual pieces and sub-pieces. Ideally they should be constructed so that total startup time equals the sum of those small tests. Small tests need to be run significant (100+) number of time to provide reliable results.

As a conclusion, a fun fact I discovered today: having signed org.eclipse.equinox.launcher actually improves startup speed. Startup with unsigned org.eclipse.equinox.launcher is 0.1 sec slower than startup with the signed org.eclipse.equinox.launcher. (Run twice baseline and tests; 150 runs each time; probability that this is a random effect < 0.1% for both series.)
Comment 19 Jeff McAffer CLA 2007-04-20 21:24:34 EDT
as a point of interest, what VMs were you using for these tests?  There is a theory that the IBM VMs have an issue if you *ever* verify a JAR.  So, for example, a signed launcher.jar would result in some overhead (unknown how much) even if org.eclipse.osgi.jar signatures are not verified.
Comment 20 Oleg Besedin CLA 2007-04-23 09:46:26 EDT
I am using Sun 1.5.<close_to_latest> VM for those tests. The build machines do tests with Sun 1.4.2 VMs.

It would be interesting at some point to do startup tests with different VMs.

Another suggestion was that the difference is due to the location of the manifest.mf file in the JAR. However, in both signed and unsigned launcher JARs used in the tests, the manifest.mf was the first file in the JAR.

Comment 21 Oleg Besedin CLA 2007-04-23 17:18:15 EDT
For completeness, I'd like to cross-reference bug 171221 that describes general code patterns to avoid in regards to JAR signing.
Comment 22 Oleg Besedin CLA 2007-04-26 18:01:21 EDT
Created attachment 65121 [details]
Patch

Patch for the org.eclipse.equinox.launcher that provides "shortcut" subclass of URLClassLoader. The subclass skips Jar versification if all sources are JAR file URLs and OSGi JAR verfication option is not set.

If any of those conditions is not met or error occurs, classloader falls back to the URLClassLoader processing.

The patch inteneded more as an illustration of approach rather than something to be released right away.

The results of direct startup time measurement on my computer are:
Baseline - Eclipse M3 - 0.96 sec
Eclipse "almost" M7 - 1.20 sec (20% slower that the baseline)
Eclipse "almost" M7 with this patch - 1.0 sec (4% slower than the baseline)
Comment 23 Oleg Besedin CLA 2007-04-30 13:31:08 EDT
I guess this is time to decide if something is to be done with this problem in 3.3 time frame.

Choices seems to be:

- Accept 0.3 sec (30%) degradation in the headless startup
- Implement custom class loader along the lines of comment 22

A separate question is if we want to update 3.2 testing framework so that it does not use "-dev".
Comment 24 John Arthorne CLA 2007-04-30 13:41:59 EDT
My preference is for accepting the 300ms signing cost. It seems too late to be messing with class loaders.
Comment 25 Thomas Watson CLA 2007-04-30 14:09:02 EDT
I agree with John, it is too late to mess around here.  For example, the illustrative patch does no package defining or CodeSource defining.  Both things others may be depending on.  We had to add these features to the bundle classloader in previous releases, removing them now from the framework classloader is risky.  There may be other things we could easily overlook with such a short runway.
Comment 26 Dani Megert CLA 2007-04-30 14:43:15 EDT
+1 to not touch this for 3.3 at this point but it might be worth to be investigated for 3.4.
Comment 27 Oleg Besedin CLA 2007-04-30 16:50:14 EDT
I agree that, from the risk vs. benefits point, the "risk" portion seems to outweigh the "benefits".

For distributions where startup time is critical the workaround would be to remove signature from org.eclipse.osgi bundle.

I added a comment to the performance test linking to this bug; setting the milestone to 3.4.
Comment 28 Jeff McAffer CLA 2007-04-30 21:00:23 EDT
removing milestone and reducing priority since we are not doing this for 3.3
Comment 29 John Arthorne CLA 2009-09-08 17:12:14 EDT
3.2-3.3 performance comparison is no longer interesting. There have been two releases with related performance passes since this bug was entered.