Bug 421706 - Can't start Eclipse M3 after installing "everything"
Summary: Can't start Eclipse M3 after installing "everything"
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Framework
Version: 3.10.0 Luna
Hardware: PC Linux
Importance: P3 major
Target Milestone: Luna M6
Assignee: Thomas Watson CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 421801
Depends on:
Blocks:
 
Reported: 2013-11-14 03:46 EST by David Williams CLA
Modified: 2014-03-07 15:36 EST
CC List: 11 users

See Also:


Attachments
log from trying to install everything at candidate M3 site (95.18 KB, text/plain)
2013-11-14 03:46 EST, David Williams CLA
log from failed attempt to install all from "staging" (3.20 MB, text/plain)
2013-12-12 15:17 EST, David Williams CLA
install all and restart log without jetty overlay (6.24 KB, text/plain)
2013-12-14 01:51 EST, David Williams CLA
install all and restart log without jetty overlay (2.93 MB, text/plain)
2013-12-14 01:53 EST, David Williams CLA
install all and restart log using -clean (342.42 KB, text/plain)
2013-12-14 01:58 EST, David Williams CLA

Description David Williams CLA 2013-11-14 03:46:01 EST
Created attachment 237462
log from trying to install everything at candidate M3 site

I did an "install everything" test from the candidate M3 site, starting with Eclipse SDK S-4.4M3-201310302000 and installing everything from the "staging" site (except runtime components). Strictly speaking, I used the repository that staging is copied to for the composite, if that matters:

http://download.eclipse.org/releases/luna/201311150900/

The install appeared to take an unusually long time, but worse, Eclipse would not start up afterwards, failing with a "stack overflow" error (java.lang.StackOverflowError).

The attached log is from the original install-everything run, followed by clicking "Restart Now" when that was complete.

I tried simply re-running Eclipse, with the same result.

I'm not sure whether this is purely a volume issue, or whether something is causing an infinite recursive call.
Comment 1 David Williams CLA 2013-11-14 03:51:42 EST
I even tried using -clean, with the same result :)

I've set the severity to "major" for now, since "installing everything" is (hopefully) rare, but it might be a "blocker" if the cause turns out to be some particular (smaller) combination of valid imports/exports rather than sheer volume.
Comment 2 Thomas Watson CLA 2013-11-14 08:34:00 EST
I will investigate
Comment 3 Thomas Watson CLA 2013-11-14 08:57:26 EST
Doing the install now.  I selected everything except the EclipseRT target components.  I assume that is what David did.  p2 detected an issue and searched for alternatives.  This took many minutes.  Then it detected that it could not install a set of things (about 7-10) but offered to install the rest.  I am letting that complete now.  But while I waited I wanted to see if David saw the same thing.
Comment 4 David Williams CLA 2013-11-14 10:05:22 EST
(In reply to Thomas Watson from comment #3)
> Doing the install now.  I selected everything except the EclipseRT target
> components.  I assume that is the same as David did.  p2 detected an issue
> and searched for alternatives.  This took many minutes.  Then it detected
> that it could not install a set of things (about 7-10) but offered to
> install the rest.  I am letting that complete now.  But while I waited I
> wanted to see if David saw the same thing.

Yes, some things in ACTF are, I think, Windows-only ... and I was using Linux. In my original test I was using M3 plus the latest I-build, and saw some messages during p2 resolution that in effect said "not installing JDT, SDK (or other platform features) because a more recent version is already installed". I repeated the test with "pure" M3, just to make sure that wasn't related ... and then saw only the ACTF feature filtered out as "not installable due to filters" or a similar message.
Comment 5 Thomas Watson CLA 2013-11-14 12:49:56 EST
Here is one thing that is causing endless recursion.

http://git.eclipse.org/c/sirius/org.eclipse.sirius.git/tree/plugins/org.eclipse.sirius/META-INF/MANIFEST.MF#n164

org.eclipse.sirius requires itself AND reexports itself!

  org.eclipse.sirius;visibility:=reexport

I have to say I never imagined anyone wanting to do that!  What in the world does it mean?
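A minimal sketch of why a self-reexport can recurse without end, and of the usual guard: any walk over reexported Require-Bundle dependencies needs a visited set. The data model (a map from a bundle name to the bundles it requires with visibility:=reexport), the class name and the org.example.dep bundle are hypothetical; this is not the Equinox implementation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: each bundle maps to the bundles it requires with
// visibility:=reexport. Names and data model are hypothetical.
final class ReexportWalker {

    // Returns the start bundle plus every bundle reachable through a chain
    // of reexported Require-Bundle dependencies.
    static Set<String> reachableThroughReexport(String start, Map<String, List<String>> reexports) {
        Set<String> visited = new LinkedHashSet<>();
        collect(start, reexports, visited);
        return visited;
    }

    private static void collect(String bundle, Map<String, List<String>> reexports, Set<String> visited) {
        // Set.add returns false for a bundle already walked. Without this
        // guard, a bundle that reexports itself recurses until StackOverflowError.
        if (!visited.add(bundle)) {
            return;
        }
        for (String required : reexports.getOrDefault(bundle, List.of())) {
            collect(required, reexports, visited);
        }
    }
}
```

The guard only stops the walk; it does not decide what a bundle that requires and reexports itself should mean, which remains an open question.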
Comment 6 Thomas Watson CLA 2013-11-14 12:59:18 EST
I opened bug 421765 for the Sirius manifest issue.  The framework should prevent the endless recursion, but at this point I'm not sure how.  The option that seems best to me is to fail the installation of a bundle that requires itself.  I don't think the spec clearly states that requiring yourself is not allowed, though, so that could break folks unexpectedly.
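One way to implement the "fail installation of a bundle that requires itself" option is a check on the manifest headers at install time. The sketch assumes the Require-Bundle value has already been joined from its manifest continuation lines and that the caller passes the bare Bundle-SymbolicName (without directives such as singleton:=true); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical install-time check; not an Equinox API.
final class SelfRequireCheck {

    // Splits a manifest header on top-level commas, ignoring commas inside
    // quoted values such as bundle-version="[1.0.0,2.0.0)".
    static List<String> splitClauses(String header) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : header.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                clauses.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString().trim());
        }
        return clauses;
    }

    // True if any Require-Bundle clause names the bundle's own symbolic name.
    static boolean requiresItself(String symbolicName, String requireBundle) {
        for (String clause : splitClauses(requireBundle)) {
            String target = clause.split(";", 2)[0].trim(); // drop attributes/directives
            if (target.equals(symbolicName)) {
                return true;
            }
        }
        return false;
    }
}
```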
Comment 7 David Williams CLA 2013-11-14 13:12:56 EST
(In reply to Thomas Watson from comment #5)
> Here is one thing that is causing endless recursion.
> 
> http://git.eclipse.org/c/sirius/org.eclipse.sirius.git/tree/plugins/org.eclipse.sirius/META-INF/MANIFEST.MF#n164
> 
> org.eclipse.sirius requires itself AND reexports itself!
> 
>   org.eclipse.sirius;visibility:=reexport
> 
> I have to say I have never imagined ever wanting to do that!  What in the
> world does it mean!

Probably just a typo, though they might have been thinking of "importing what you export":

http://blog.osgi.org/2007/04/importance-of-exporting-nd-importing.html

but I believe that applies only to packages, not to Require-Bundle.
Comment 8 Pierre-Charles David CLA 2013-11-14 15:47:49 EST
A new Sirius build with the offending line removed should be available in a few minutes (see https://hudson.eclipse.org/sirius/job/sirius-master/61/). I hope this is enough to fix the immediate issue, and I will investigate how we got into this situation and why it went unnoticed (we had never had any installation problems with Sirius before).
Comment 9 Paul Elder CLA 2013-11-15 10:03:19 EST
*** Bug 421801 has been marked as a duplicate of this bug. ***
Comment 10 Thomas Watson CLA 2013-11-15 10:14:59 EST
Just to keep folks informed: I have a few simple options to fix the endless recursion that happens with the Sirius scenario from comment 5.  But there are still other issues I need to work out for very large resolve operations.  When everything is installed at the same time, about 2800 bundles are resolving at once.  The resolver finds a number of uses constraint conflicts and attempts to solve them, but the number of candidate combinations to try has exploded in the Felix resolver algorithm.  This is consuming a lot of unexpected heap and processor time for me, and is now resulting in OOM errors.

It is likely I will need to split the resolve into chunks instead of doing a single big-bang resolve.  That is what I am investigating now for M4.
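A minimal sketch of chunked resolution, assuming a hypothetical resolveBatch callback that resolves one batch against everything resolved so far and returns the members of the batch that succeeded. How Equinox actually partitions the bundles is not described here; a fixed-size split in install order is just the simplest possible choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

// Sketch only: names are hypothetical and this is not how Equinox splits its resolve.
final class ChunkedResolve {

    // Resolves bundles in fixed-size batches instead of one big-bang call.
    // resolveBatch receives the current batch plus a read-only view of everything
    // resolved so far, and returns the members of the batch that resolved.
    static <T> List<T> resolveInChunks(List<T> bundles, int chunkSize,
                                       BiFunction<List<T>, List<T>, List<T>> resolveBatch) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<T> resolved = new ArrayList<>();
        for (int from = 0; from < bundles.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, bundles.size());
            List<T> batch = bundles.subList(from, to);
            resolved.addAll(resolveBatch.apply(batch, Collections.unmodifiableList(resolved)));
        }
        return resolved;
    }
}
```

The trade-off is that each call explores a much smaller candidate space, but a bundle whose best provider sits in a later batch may get wired to a worse one or stay unresolved, so the batch order matters.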
Comment 11 Thomas Watson CLA 2013-11-19 15:59:35 EST
The javax.annotations bundle is really introducing lots of uses conflicts.  I finally got an approach that allows the large system to complete the resolve without running out of memory, but there are still lots of unsolvable issues when applying the update to a system that includes installing the javax.annotations bundle (particularly in Papyrus).  This makes for some really long startup times until -clean is used to allow all importers to wire to the javax.annotations bundle.
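To illustrate what a uses conflict is: a bundle's class space must see each package from exactly one provider, counting both its direct imports and the packages pulled in through "uses" directives of what it imports. The sketch assumes a hypothetical flat list of (package, provider) pairs for one bundle. When two bundles export javax.annotation (for example the system bundle and a separate javax.annotation bundle, an assumed setup), every importer of that package and every package that uses it becomes a point where the resolver may have to try an alternative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of one bundle's class space.
final class UsesCheck {

    // A package the bundle can see and the bundle providing it, whether
    // imported directly or implied by a "uses" directive of another import.
    static final class Visible {
        final String pkg;
        final String provider;

        Visible(String pkg, String provider) {
            this.pkg = pkg;
            this.provider = provider;
        }
    }

    // Returns the first package reachable from two different providers
    // (a uses conflict), or null when every package has a single provider.
    static String firstConflict(List<Visible> visible) {
        Map<String, String> providerByPackage = new HashMap<>();
        for (Visible v : visible) {
            String previous = providerByPackage.putIfAbsent(v.pkg, v.provider);
            if (previous != null && !previous.equals(v.provider)) {
                return v.pkg;
            }
        }
        return null;
    }
}
```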
Comment 12 Thomas Watson CLA 2013-12-03 09:56:05 EST
I released a number of fixes for M4, but they need more testing and verification once M4 is closing down.  Moving to M5 for more testing and improvements.
Comment 13 Mickael Istria CLA 2013-12-05 03:21:20 EST
We noticed that org.eclipse.birt.jetty.overlay (a fragment contributing Jetty packages to the system bundle) also causes heavy heap consumption in the Felix resolver (creating many org.apache.felix.resolver.Candidates instances), which leads to an OOM while starting Eclipse. The exact same platform without this fragment works fine.
More background at https://issues.jboss.org/browse/JBIDE-15807
Comment 14 Thomas Watson CLA 2013-12-05 08:45:13 EST
(In reply to Mickael Istria from comment #13)
> We noticed that org.eclipse.birt.jetty.overlay (which is a fragment
> contributing jetty packages to system bundle) also causes an important Heap
> consumption in Felix Resolver (creating a lot of
> org.apache.felix.resolver.Candidates) which leads to an OOM while starting
> Eclipse. The exact same platform without this fragment works fine.
> More background at https://issues.jboss.org/browse/JBIDE-15807

I opened bug 422176 to ask what this fragment is for.  I have no idea why they have a system bundle fragment that exports jetty packages.  No response from them though.  Have you tried with the latest I-Build?
Comment 15 Thomas Watson CLA 2013-12-05 09:16:32 EST
While testing an update from the I20131119-0800 I-Build (or earlier) to the latest I-Build, I came across an issue where several fragments ended up unresolved.  This is because the persisted meta-data for the fragments is missing the new equinox.fragment capability, which is needed to locate fragments for on-demand resolving.  The new equinox.fragment namespace was introduced in commit:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=4eefdb7c23063b4f79b05619160879fe61f1613a

But I neglected to make sure the meta-data we are acting upon has this new capability for fragments.  Instead of hacking in the capability in order to "fix" the persisted meta-data, I decided to simply increment the version of the persisted meta-data, which forces a clean of the OSGi configuration area.
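The version-bump technique can be sketched as follows, assuming a hypothetical cache that writes a format version number ahead of its payload; this is not the actual Equinox persistence format. Any cache written under a different version is treated as missing, which has the same effect as starting with -clean.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical versioned cache; the format and names are assumptions.
final class PersistedState {

    // Bump whenever the shape of the persisted data changes (for example,
    // when every fragment must now carry a new capability).
    static final int CURRENT_VERSION = 2;

    static byte[] save(int version, String payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(version);
            out.writeUTF(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns null for data written under another version; the caller then
    // discards the cache and rebuilds it from the installed bundles.
    static String load(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readInt() != CURRENT_VERSION) {
                return null;
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with patching the missing capability into old data, the bump costs one full rebuild after the update but leaves no migration code to maintain.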

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=34dd34042093037aa0e72bcfc4a2cb1a9e316f36
Comment 16 Mickael Istria CLA 2013-12-05 10:23:50 EST
(In reply to Thomas Watson from comment #14)
> Have you tried with the latest I-Build?

Using I20131203-0800 and our target platform: the heap goes to 1.2 GB when org.eclipse.birt.jetty.overlay is present vs 350 MB when it's not.
Comment 17 Thomas Watson CLA 2013-12-09 15:53:03 EST
(In reply to Mickael Istria from comment #16)
> (In reply to Thomas Watson from comment #14)
> > Have you tried with the latest I-Build?
> 
> Using I20131203-0800 and out target platform: Heaps goes to 1.2 GB when
> org.eclipse.birt.jetty.overlay is present vs 350MB when it's not.

I tested your scenario and found a couple more bugs, but I'm not sure whether fixing them will reduce the overall heap required here:

There was a bug in the Felix code that would discard capabilities from fragments in some cases.  Fixed with:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=4eb5b1a47e314d9d73239d294360b427bd946e57

With that Felix fix in place, I had to fix a bug in the Equinox code that was returning "resolved" hosts for already resolved fragments, which really confuses the Felix resolver.  Fixed with:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=ca804e697a08ccbf8f2b1e206b33a3334e3fa4da

With these two fixes I don't see 1.2 GB being used, but I have not done real measurements of the heap; I'm only going by Activity Monitor on the Mac.
Comment 18 Mickael Istria CLA 2013-12-10 03:32:03 EST
Thanks Thomas. Ping me when you'd like me to run the same scenario on a newer build.
FYI, I use VisualVM to monitor the heap size while the application is running.
Comment 19 Thomas Watson CLA 2013-12-10 08:45:22 EST
(In reply to Mickael Istria from comment #18)
> Thanks Thomas. Ping me when you'd like me to run the same scenario on a
> newer build.
> FYI, I use VisualVM to monitor the Heap Size when application is running.

It would be great if you could try the latest I-Build, I20131209-2000.  But I think the heap will likely still grow while resolving org.eclipse.birt.jetty.overlay; it should get GC'ed after the resolve operation finishes, though.  At least that is what I found with your JBoss Tools scenario.

I could solve part of that issue if I allowed unresolved providers with higher versions to be preferred over resolved ones with lower versions.  This would correctly wire most importers to the real Jetty bundles instead of the strange birt.jetty.overlay one.  But this would go against the specification and would also hurt in scenarios with bundles that have substitutable exports (bundles that both export and import the same package).
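The preference order being discussed can be sketched as a comparator. My reading of the OSGi Core specification is that a resolved exporter is preferred over an unresolved one, then the higher version, then the lower bundle ID; the Exporter class, bundle IDs and versions used here are hypothetical, and a real resolver such as Felix weighs additional inputs. VERSION_FIRST models the rejected alternative.

```java
import java.util.Comparator;

// Hypothetical, simplified view of a bundle exporting a given package.
final class Exporter {
    final String name;
    final long bundleId;
    final boolean resolved;
    final int[] version; // major, minor, micro

    Exporter(String name, long bundleId, boolean resolved, int... version) {
        this.name = name;
        this.bundleId = bundleId;
        this.resolved = resolved;
        this.version = version;
    }
}

final class CandidateOrder {

    // Compares versions segment by segment; missing segments count as 0.
    static int compareVersions(int[] a, int[] b) {
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // false sorts before true, so resolved exporters come first.
    static final Comparator<Exporter> RESOLVED_FIRST = Comparator.comparing((Exporter e) -> !e.resolved);
    static final Comparator<Exporter> HIGHER_VERSION_FIRST = (a, b) -> compareVersions(b.version, a.version);

    // Spec-style order (most preferred first): resolved, higher version, lower bundle id.
    static final Comparator<Exporter> PREFERENCE =
        RESOLVED_FIRST.thenComparing(HIGHER_VERSION_FIRST).thenComparingLong(e -> e.bundleId);

    // Rejected alternative: a higher version wins even over a resolved exporter.
    static final Comparator<Exporter> VERSION_FIRST =
        HIGHER_VERSION_FIRST.thenComparing(RESOLVED_FIRST).thenComparingLong(e -> e.bundleId);
}
```

With a resolved overlay exporting version 1.0.0 and an unresolved bundle exporting 8.1.0 (made-up numbers), PREFERENCE picks the overlay and VERSION_FIRST picks the real bundle.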
Comment 20 Mickael Istria CLA 2013-12-10 10:07:39 EST
(In reply to Thomas Watson from comment #19)
> It would be great if you could try on the latest I-Build I20131209-2000. 
> But I think the heap will likely still grow to resolve
> org.eclipse.birt.jetty.overlay.  But it should get GC'ed after the resolve
> operation finishes.  At least that is what I found with your jboss tools
> scenario.

I just tried it and got the same behaviour. Because the application was taking 1.2 GB of RAM, it was too slow, and I didn't have time to let it run until the garbage collector kicked in.
Without jetty.overlay, it still consumes 330 MB (which is acceptable given the amount of stuff in the target application).
Comment 21 David Williams CLA 2013-12-12 15:17:23 EST
Created attachment 238302
log from failed attempt to install all from "staging"

This test/run/log may not be that useful, since it is just against the "staging" repository for M4 (i.e. not everything is up to date ... for example, this staging repo still has the "jetty overlay" in it), but using our M4 candidate, I20131211-2000, Eclipse still won't start after "installing everything".

But I thought I'd attach the results here in case any of the error messages in the log help you spot other problems under "extreme conditions".

I'll try again once the "jetty overlay" is no longer present. (Also, I did not try -clean ... I just wanted a quick test ... but at least there was no "stack overflow". In fact, I'm not sure why it did not start ... it seemed something ended up interfering with the framework itself?)
Comment 22 Mickael Istria CLA 2013-12-13 03:22:04 EST
@David: the issues you see in the log aren't Equinox issues caused by jetty.overlay, but rather inconsistencies in some projects (namely EGF and JWT).
Comment 23 Thomas Watson CLA 2013-12-13 08:40:23 EST
(In reply to Mickael Istria from comment #22)
> @David: the issue you see in log aren't Equinox issue caused by
> jetty.overlay, but more inconsistency in some projects (namely EGF and JWT).

I'm curious to know what the inconsistencies are that cause the issue.  Do you have some more insight?  There are lots of class-not-found errors for internal classes of the old framework, notably AbstractBundle.

For example:

java.lang.NoClassDefFoundError: org/eclipse/osgi/framework/internal/core/AbstractBundle
	at org.eclipse.egf.core.platform.internal.pde.PlatformBundle.<init>(PlatformBundle.java:60)
Comment 24 David Williams CLA 2013-12-14 01:51:29 EST
Created attachment 238347
install all and restart log without jetty overlay

jetty*overlay no longer seems to be in .../releases/staging, and the log file is not much better (and Eclipse still won't start) after "installing everything". 

(And, I know, "M4 is not done" ... but if some "gross" errors can be spotted, it would be good to make sure they are fixed in M4.)

[I can see a few minor things to open bugs on (such as stardust singleton) ... but all the "wiring traces" are hard to read.]
Comment 25 David Williams CLA 2013-12-14 01:53:42 EST
Created attachment 238348
install all and restart log without jetty overlay

Apologies ... the previous one was from the wrong directory ... this is the one I meant to attach.
Comment 26 David Williams CLA 2013-12-14 01:58:39 EST
Created attachment 238349
install all and restart log using -clean

This is the same scenario and install as the previous "long" log, but with Eclipse started with -clean. At least the log is shorter ... maybe it will be easier to understand and "attack". Unfortunately, whatever is going wrong still prevents Eclipse from starting!

Let me know if I can help further in any way. (Even if you find these useless and want me to stop attaching them :)
Comment 27 Thomas Watson CLA 2013-12-16 11:44:05 EST
(In reply to David Williams from comment #26)
> Created attachment 238349
> install all and restart log using -clean
> 
> This is same scenario and install as previous "long" log, but started
> eclipse with -clean. At least the log is shorter ... maybe it will be easier
> to understand and "attack". Unfortunately, what ever is going wrong still
> prevents Eclipse from starting! 
> 
> Let me know if I can help further in any way. (Even if, you find these
> useless and want me to stop attaching them :)

No, it is useful; it just may take me time to investigate it all.  I did find that MoDisco has a bad reprovide=true attribute (bug 424150) that is causing many of the resolver errors in MoDisco.
Comment 28 Thomas Watson CLA 2013-12-16 12:06:54 EST
I opened bug 424151 to document/discuss the fact that some interim headers/attributes are no longer supported in Luna.  This is causing many of the resolution issues.
Comment 29 Thomas Watson CLA 2014-03-07 15:36:40 EST
This one should be fixed now.