Bug 457118 - OutOfMemoryError (Java Heap Space) when resolving bundles
Summary: OutOfMemoryError (Java Heap Space) when resolving bundles
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Framework
Version: 3.10.0 Luna
Hardware: All All
Importance: P3 blocker
Target Milestone: Mars M5
Assignee: Thomas Watson CLA
QA Contact:
URL:
Whiteboard:
Keywords: greatbug
Depends on:
Blocks: 457718
 
Reported: 2015-01-09 04:34 EST by Matthieu Helleboid CLA
Modified: 2015-04-07 04:13 EDT

See Also:


Attachments
Time and performance measurements (17.45 KB, image/png)
2015-01-15 05:03 EST, Matthieu Helleboid CLA
Time and memory measurements (5.79 KB, application/unknown)
2015-01-16 05:41 EST, Matthieu Helleboid CLA

Description Matthieu Helleboid CLA 2015-01-09 04:34:57 EST
Hello, 

When trying to start Capella (https://polarsys.org/capella/) on Luna, we encounter an OutOfMemoryError (please see the stack trace below). This problem doesn't occur on Kepler and previous versions. It seems that too many blame objects are generated in our case. After some research, it seems to be related to https://issues.apache.org/jira/browse/FELIX-3465.

You can reproduce the problem with a Capella installation for Luna from here: https://hudson.polarsys.org/capella/job/capella-gerrit/81/artifact/result/publish/

!ENTRY org.eclipse.equinox.simpleconfigurator 4 0 2015-01-09 09:56:32.376
!MESSAGE FrameworkEvent ERROR
!STACK 0
org.osgi.framework.BundleException: Exception in org.eclipse.equinox.internal.simpleconfigurator.Activator.start() of bundle org.eclipse.equinox.simpleconfigurator.
	at org.eclipse.osgi.internal.framework.BundleContextImpl.startActivator(BundleContextImpl.java:792)
	at org.eclipse.osgi.internal.framework.BundleContextImpl.start(BundleContextImpl.java:721)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.startWorker0(EquinoxBundle.java:936)
	at org.eclipse.osgi.internal.framework.EquinoxBundle$EquinoxModule.startWorker(EquinoxBundle.java:319)
	at org.eclipse.osgi.container.Module.doStart(Module.java:571)
	at org.eclipse.osgi.container.Module.start(Module.java:439)
	at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1582)
	at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1562)
	at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.doContainerStartLevel(ModuleContainer.java:1533)
	at org.eclipse.osgi.container.SystemModule.startWorker(SystemModule.java:242)
	at org.eclipse.osgi.container.Module.doStart(Module.java:571)
	at org.eclipse.osgi.container.Module.start(Module.java:439)
	at org.eclipse.osgi.container.SystemModule.start(SystemModule.java:172)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:393)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:412)
	at org.eclipse.osgi.launch.Equinox.start(Equinox.java:115)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.startup(EclipseStarter.java:318)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:231)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:648)
	at org.eclipse.equinox.launcher.Main.basicRun(Main.java:603)
	at org.eclipse.equinox.launcher.Main.run(Main.java:1465)
	at org.eclipse.equinox.launcher.Main.main(Main.java:1438)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackage(ResolverImpl.java:943)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:865)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:733)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.resolve(ResolverImpl.java:251)
	at org.eclipse.osgi.container.ModuleResolver$ResolveProcess.resolveSingleRevision(ModuleResolver.java:948)
	at org.eclipse.osgi.container.ModuleResolver$ResolveProcess.resolve(ModuleResolver.java:878)
	at org.eclipse.osgi.container.ModuleResolver.resolveDelta(ModuleResolver.java:111)
	at org.eclipse.osgi.container.ModuleContainer.resolveAndApply(ModuleContainer.java:479)
	at org.eclipse.osgi.container.ModuleContainer.resolve(ModuleContainer.java:437)
	at org.eclipse.osgi.container.ModuleContainer.resolve(ModuleContainer.java:427)
	at org.eclipse.osgi.container.Module.start(Module.java:416)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:393)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:412)
	at org.eclipse.equinox.internal.simpleconfigurator.ConfigApplier.startBundles(ConfigApplier.java:438)
	at org.eclipse.equinox.internal.simpleconfigurator.ConfigApplier.install(ConfigApplier.java:111)
	at org.eclipse.equinox.internal.simpleconfigurator.SimpleConfiguratorImpl.applyConfiguration(SimpleConfiguratorImpl.java:191)
	at org.eclipse.equinox.internal.simpleconfigurator.SimpleConfiguratorImpl.applyConfiguration(SimpleConfiguratorImpl.java:205)
Comment 1 Matthieu Helleboid CLA 2015-01-09 04:40:39 EST
Quick workaround to avoid the OOMError, but the resolution is still very
slow: https://git.eclipse.org/r/#/c/39271/

Should I also open a bug at Apache Felix?
Comment 2 Thomas Watson CLA 2015-01-09 08:09:54 EST
(In reply to Matthieu Helleboid from comment #1)
> Quick workaround to avoid OOMError, but the resolution is still very
> slow : https://git.eclipse.org/r/#/c/39271/
> 
> Should I also open a bug on Felix Apache?

Yes, please do, it may be related to https://issues.apache.org/jira/browse/FELIX-4656
Comment 3 Matthieu Helleboid CLA 2015-01-09 10:43:19 EST
Ok, will do.

I tried to apply all changes from https://github.com/gnodet/felix/commits/resolver-improvements but it doesn't seem to solve the problem.
Comment 4 Matthieu Helleboid CLA 2015-01-09 10:54:28 EST
I created the bug https://issues.apache.org/jira/browse/FELIX-4762
Comment 5 Matthieu Helleboid CLA 2015-01-12 13:03:33 EST
The performance problem is also related to changes from Bug #421706.
I think the difference is mostly caused by the introduction of the ModuleResolver.ResolveProcess.resolveSingleRevision method, which creates a ResolverImpl for each Module.
* I applied my patch starting from commit 618982b49d72803c2479a461aff1b60d97acd63b (before the Bug #421706 commits)
  (see https://git.eclipse.org/r/#/c/39271/ PatchSet 2): the application starts in 76 sec (Starting application: 76654)
* Then I applied my patch starting from commit ca804e697a08ccbf8f2b1e206b33a3334e3fa4da (after the Bug #421706 commits)
  (see https://git.eclipse.org/r/#/c/39271/ PatchSet 3): the application starts in 491 sec (Starting application: 491628)

Everything seems to be related to the package consistency check in ResolverImpl.
* I also pushed https://git.eclipse.org/r/#/c/39271/ PatchSet 4, where there is no package consistency check in ResolverImpl:
  the application starts in 5 sec (Starting application: 5187)
Comment 6 Matthieu Helleboid CLA 2015-01-15 05:03:05 EST
Created attachment 249961
Time and performance measurements

Here is a list of the changes I made and propose to merge into master:

Patch 1 : SetInsteadOfList
	initial patch to avoid OOM Error
	you can forget about this one
Patch 2 : avoidMergeDuplicateExportedPackages 
	replace Patch 1
	contributed to https://issues.apache.org/jira/browse/FELIX-4762
	contributed to https://git.eclipse.org/r/#/c/39271/ (last PatchSet)
Patch 3 : ResolveRevisionsInBatch
	to improve resolution performance
	related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=421706
	contributed to https://git.eclipse.org/r/#/c/39644/

I also ran some tests on memory use and bundle resolution time (see timeAndMemory.png).
First with Capella as the target platform, for my problem, and then with all Luna plugins as the target platform, to guard against regressing https://bugs.eclipse.org/bugs/show_bug.cgi?id=421706
I think that Patch 2 and Patch 3 will solve my problems.
It would also be a very good thing if Patch 2 and Patch 3 could be merged into R4_4_maintenance for Luna SR1. Is that possible?
Comment 7 Thomas Watson CLA 2015-01-15 08:34:20 EST
(In reply to Matthieu Helleboid from comment #6)
> R4_4_maintenance for Luna SR1, is it possible?

I will need time to review the changes.  The fix would have to be Luna SR2 (which I assume you meant).
Comment 8 Matthieu Helleboid CLA 2015-01-15 08:37:23 EST
Yes of course, I meant Luna SR2 :-)
Comment 9 Thomas Watson CLA 2015-01-15 10:18:13 EST
(In reply to Matthieu Helleboid from comment #6)
> Patch 2 : avoidMergeDuplicateExportedPackages 
> 	replace Patch 1
> 	contributed to https://issues.apache.org/jira/browse/FELIX-4762
> 	contributed to https://git.eclipse.org/r/#/c/39271/ (last PatchSet)

It seems this is solving a case where you are resolving a set of bundles with a require-bundle cycle.  Is that true?

I am wondering why the other cycle detection code in the following method is not working in this case:

ResolverImpl.mergeCandidatePackages(ResolveContext, Resource, Requirement, Capability, Map<Resource, Packages>, Candidates, Map<Resource, List<Capability>>, HashMap<Resource, List<Resource>>)
Comment 10 Matthieu Helleboid CLA 2015-01-15 10:27:36 EST
No, it does not solve require-bundle cycles. But if you have
A requires B + A requires C + B requires D + C requires D, like the following:
   +-->B--+
A--+      +-->D
   +-->C--+
Then with the mergeCandidatePackage method, you'll have duplicate blames for D, and that's what will cause the OOMError in my case.
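
As an illustration only (this is not the Felix resolver code), here is a small Java sketch of why shared dependencies like D can become expensive when every require path is walked without remembering what was already visited. The class name, the bundle names N0, L0, R0, ... and the idea of chaining several diamonds are assumptions made for the example; with k chained diamonds the last bundle is reached 2^k times.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DiamondPaths {
    // Builds 'levels' diamonds chained end to end (hypothetical bundle names):
    // N0 -> {L0, R0} -> N1 -> {L1, R1} -> N2 ...
    static Map<String, List<String>> chainedDiamonds(int levels) {
        Map<String, List<String>> requires = new HashMap<>();
        for (int i = 0; i < levels; i++) {
            requires.put("N" + i, List.of("L" + i, "R" + i));
            requires.put("L" + i, List.of("N" + (i + 1)));
            requires.put("R" + i, List.of("N" + (i + 1)));
        }
        requires.put("N" + levels, List.of());
        return requires;
    }

    // Counts how many times 'target' is reached when every require path
    // is followed and no visited bundle is remembered.
    static long visits(Map<String, List<String>> requires, String from, String target) {
        if (from.equals(target)) {
            return 1;
        }
        long total = 0;
        for (String required : requires.getOrDefault(from, List.of())) {
            total += visits(requires, required, target);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int levels : new int[] {1, 5, 10}) {
            long count = visits(chainedDiamonds(levels), "N0", "N" + levels);
            System.out.println(levels + " diamond(s): last bundle reached " + count + " time(s)");
        }
    }
}
```

A single diamond (the A/B/C/D case above) reaches the shared bundle twice; the doubling per level is what makes the count grow quickly in larger graphs.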
Comment 11 Thomas Watson CLA 2015-01-15 11:52:42 EST
(In reply to Matthieu Helleboid from comment #10)
> No it does not solve requires bundles cycle. But if you have
> A requires B + A requires C + B requires D + C requires D, like the following
>    +-->B--+
> A--+      +-->D
>    +-->C--+
> Then with the mergeCandidatePackage method, you'll have duplicate blames for
> D, and that's what will cause the OOMError in my case.

Thanks for the info.  In order to trigger the code I had to add another level to the dependency tree and I had to use re-export for each level of the dependency:

       +-->C--+
A-->B--+      +-->E
       +-->D--+

Where :
 B re-exports C and D
 C re-exports E
 D re-exports E

In this scenario I would see duplicate blames for package E with the blamed requirement A->B

The fix looks good to me, but the variable names mergeExportedPackagesCycles and mergeExportedPackagesCyclesList are misleading.  Perhaps a name like visitedRequiredBundlesMap and visitedRequiredBundles for the list?
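
To make the scenario above concrete, here is a minimal Java sketch of the re-export graph (B re-exports C and D; C and D re-export E) and of what a visited set changes. It is a toy model, not the actual ResolverImpl code: the class, the `merge` method and the string form of a blame are hypothetical, and only the name visitedRequiredBundles is taken from the suggestion above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class VisitedRequiredBundlesSketch {
    // Re-export graph from the scenario: B re-exports C and D; C and D re-export E.
    static final Map<String, List<String>> REEXPORTS = new HashMap<>();
    static {
        REEXPORTS.put("B", List.of("C", "D"));
        REEXPORTS.put("C", List.of("E"));
        REEXPORTS.put("D", List.of("E"));
        REEXPORTS.put("E", List.of());
    }

    // Records one blame per bundle reached while following re-exports.
    // When visitedRequiredBundles is null nothing is remembered, so a bundle
    // reachable through two paths is merged (and blamed) twice.
    static void merge(String bundle, String blamedRequirement,
            Set<String> visitedRequiredBundles, List<String> blames) {
        if (visitedRequiredBundles != null && !visitedRequiredBundles.add(bundle)) {
            return; // already merged through another path
        }
        blames.add("package of " + bundle + " blamed on " + blamedRequirement);
        for (String reexported : REEXPORTS.getOrDefault(bundle, List.of())) {
            merge(reexported, blamedRequirement, visitedRequiredBundles, blames);
        }
    }

    // Collects the blames produced for the requirement A->B.
    static List<String> blamesFor(boolean trackVisited) {
        List<String> blames = new ArrayList<>();
        merge("B", "A->B", trackVisited ? new HashSet<>() : null, blames);
        return blames;
    }

    public static void main(String[] args) {
        System.out.println("without visited set: " + blamesFor(false));
        System.out.println("with visited set:    " + blamesFor(true));
    }
}
```

Without the visited set the list contains the blame for E twice, both with the blamed requirement A->B; with it, each bundle contributes one blame.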
Comment 12 Matthieu Helleboid CLA 2015-01-15 11:57:05 EST
OK great, you're right, my description wasn't complete!

I wasn't very satisfied with the mergeExportedPackagesCycles name, I will follow your suggestion and update the gerrit commit and the patch at apache.
Comment 13 Matthieu Helleboid CLA 2015-01-15 12:31:12 EST
(In reply to Matthieu Helleboid from comment #12)
> I wasn't very satisfied with the mergeExportedPackagesCycles name, I will
> follow your suggestion and update the gerrit commit and the patch at apache.

done.

I took the liberty of quoting you on https://issues.apache.org/jira/browse/FELIX-4762, and the patch also seems to have been applied to felix/resolver.
Comment 14 Thomas Watson CLA 2015-01-15 13:17:11 EST
(In reply to Matthieu Helleboid from comment #13)
> I took the liberty to quote you on
> https://issues.apache.org/jira/browse/FELIX-4762 and the patch seems to be
> applied also on felix/resolver

No problem, I also reviewed the fix with Richard Hall so he would apply it to Felix.  I will merge that fix into master soon, but want to review your other fix to the equinox bit first.

Marking bug as 'greatbug'.  Really appreciate your work on this one.  The resolver code in felix and equinox is not the most simple thing to get into.
Comment 15 Thomas Watson CLA 2015-01-15 14:02:29 EST
(In reply to Matthieu Helleboid from comment #6)
> Patch 3 : ResolveRevisionsInBatch
> 	to improve resolution performance
> 	related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=421706
> 	contributed to https://git.eclipse.org/r/#/c/39644/
> 

My concern about this patch is that 100 may prove to be too big for the 'batch' resolve and increase our likelihood of blowing up the felix resolver.  Did you try other batch counts to see where the balance is?
Comment 16 Matthieu Helleboid CLA 2015-01-15 14:30:21 EST
(In reply to Thomas Watson from comment #15)
> (In reply to Matthieu Helleboid from comment #6)
> > Patch 3 : ResolveRevisionsInBatch
> > 	to improve resolution performance
> > 	related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=421706
> > 	contributed to https://git.eclipse.org/r/#/c/39644/
> > 
> 
> My concern about this patch is that 100 may prove to be too big for the
> 'batch' resolve and increase our likelihood of blowing up the felix
> resolver.  Did you try other batch counts to see where the balance is?

I quickly tried 1, 10 and 100 as values, and in my case 100 was the better value. I can try different values tomorrow and come back to you with more data next week if you want.

The other thing is that the first resolution will always try to resolve RESOLVE_REVISIONS_BATCH_SIZE bundles, but as resolution goes on, later attempts will resolve fewer bundles because some of them were already resolved. See "if (wirings.containsKey(single) || failedToResolve.contains(single)) revisions.remove(single);"

Maybe I'll change this to always "send" the same number of bundles to the Felix resolver.
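
For readers who do not have the patch open, the batching behaviour described above can be sketched roughly as follows. This is a simplified model and not the code in ModuleResolver: the class, the `resolver` function (which stands in for a call to the Felix resolver and returns every revision that ended up resolved) and the revision names are hypothetical; only the idea of skipping revisions that are already wired or have already failed comes from the comment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class BatchResolveSketch {
    // Slices 'revisions' into chunks of 'batchSize', drops revisions that were
    // already resolved or already failed, and hands the rest to 'resolver'.
    // Returns the number of revisions actually sent in each call.
    static List<Integer> resolveInBatches(List<String> revisions, int batchSize,
            Function<List<String>, Set<String>> resolver) {
        Set<String> wired = new HashSet<>();           // stands in for wirings.containsKey(...)
        Set<String> failedToResolve = new HashSet<>();
        List<Integer> sentSizes = new ArrayList<>();
        for (int start = 0; start < revisions.size(); start += batchSize) {
            int end = Math.min(start + batchSize, revisions.size());
            List<String> batch = new ArrayList<>(revisions.subList(start, end));
            batch.removeIf(r -> wired.contains(r) || failedToResolve.contains(r));
            if (batch.isEmpty()) {
                continue;
            }
            sentSizes.add(batch.size());
            Set<String> resolved = resolver.apply(batch);
            wired.addAll(resolved);
            for (String r : batch) {
                if (!resolved.contains(r)) {
                    failedToResolve.add(r);
                }
            }
        }
        return sentSizes;
    }

    // Toy resolver: resolves the whole batch and, as a side effect of
    // dependency resolution, also resolves r5 and r9.
    static Function<List<String>, Set<String>> toyResolver() {
        return batch -> {
            Set<String> resolved = new HashSet<>(batch);
            resolved.add("r5");
            resolved.add("r9");
            return resolved;
        };
    }

    // Generates hypothetical revision names r0 .. r(n-1).
    static List<String> revisions(int n) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            names.add("r" + i);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(resolveInBatches(revisions(10), 4, toyResolver()));
    }
}
```

In this model the later calls receive fewer revisions than the batch size, which is the effect mentioned above; topping each batch up from the remaining revisions would be the "always send the same number" variant.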
Comment 17 Matthieu Helleboid CLA 2015-01-15 14:30:52 EST
(In reply to Thomas Watson from comment #14)
> (In reply to Matthieu Helleboid from comment #13)
> > I took the liberty to quote you on
> > https://issues.apache.org/jira/browse/FELIX-4762 and the patch seems to be
> > applied also on felix/resolver
> 
> No problem, I also reviewed the fix with Richard Hall so he would apply it
> to Felix.  I will merge that fix into master soon, but want to review your
> other fix to the equinox bit first.
> 
> Marking bug as 'greatbug'.  Really appreciate your work on this one.  The
> resolver code in felix and equinox is not the most simple thing to get into.

Thanks also to you :-)
Comment 18 Thomas Watson CLA 2015-01-15 14:35:15 EST
I put some additional comments in gerrit review https://git.eclipse.org/r/#/c/39644/  

Sorry for being inconsistent in how I provide feedback.
Comment 19 Matthieu Helleboid CLA 2015-01-16 05:41:18 EST
Created attachment 249987
Time and memory measurements

I took your comments into account and pushed a new patchSet https://git.eclipse.org/r/#/c/39644/ 

I ran some more performance tests, and in the end "RESOLVE_REVISIONS_BATCH_SIZE = 100" seems like a good tradeoff between time and memory.
Comment 20 Matthieu Helleboid CLA 2015-01-16 09:01:07 EST
Do you plan to backport these modifications to R4_4_maintenance for Luna SR2? 

Looking at https://bugs.eclipse.org/bugs/attachment.cgi?id=249987, I think that 100 is a good value. But for Luna SR2, maybe it would be a good thing to offer a runtime parameter to change the value and keep 100 as the default value.
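
A minimal sketch of what such a runtime parameter could look like, assuming a plain Java system property. The property name `example.resolver.batch.size` and the class are made up for this illustration and are not an Equinox option; the only value taken from this bug is the default of 100.

```java
final class BatchSizeOption {
    // Hypothetical property name, chosen for this sketch only.
    static final String PROP_BATCH_SIZE = "example.resolver.batch.size";
    static final int DEFAULT_BATCH_SIZE = 100;

    // Parses the configured value; falls back to the default when the value
    // is missing, not a number, or smaller than 1.
    static int parseBatchSize(String configured) {
        if (configured == null) {
            return DEFAULT_BATCH_SIZE;
        }
        try {
            int value = Integer.parseInt(configured.trim());
            return value >= 1 ? value : DEFAULT_BATCH_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_BATCH_SIZE;
        }
    }

    public static void main(String[] args) {
        // Example: java -Dexample.resolver.batch.size=1 BatchSizeOption
        int batchSize = parseBatchSize(System.getProperty(PROP_BATCH_SIZE));
        System.out.println("batch size: " + batchSize);
    }
}
```

Falling back to the default on invalid input keeps a mistyped setting from changing resolver behaviour silently in a worse direction; a stricter variant could log or reject the value instead.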
Comment 21 Thomas Watson CLA 2015-01-16 09:47:21 EST
(In reply to Matthieu Helleboid from comment #13)
> (In reply to Matthieu Helleboid from comment #12)
> > I wasn't very satisfied with the mergeExportedPackagesCycles name, I will
> > follow your suggestion and update the gerrit commit and the patch at apache.
> 
> done.
> 
> I took the liberty to quote you on
> https://issues.apache.org/jira/browse/FELIX-4762 and the patch seems to be
> applied also on felix/resolver

The Felix resolver change has been released to Equinox:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=e7db81bab4bce237fcafd3d624e56d183bbd6dae

(In reply to Matthieu Helleboid from comment #19)
> Created attachment 249987 [details]
> Time and memory measurements
> 
> I took your comments into account and pushed a new patchSet
> https://git.eclipse.org/r/#/c/39644/ 
> 
> I made some more tests about performance, and finally
> "RESOLVE_REVISIONS_BATCH_SIZE = 100" seems like a good tradeoff for time and
> memory.

This change has been released to Equinox as:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=16bb483bd75d665c3ff6a554419ddf8ad93bab57

I also made a few modifications and added a simple test for the multiple-path case that caused duplicates to be added.  The test is not complete: I still need to extend it to force a conflict, to ensure other paths are used if the first solution is not valid.  Leaving the bug open for that.
Comment 22 Thomas Watson CLA 2015-01-16 09:49:31 EST
(In reply to Matthieu Helleboid from comment #20)
> Do you plan to backport these modifications to R4_4_maintenance for Luna
> SR2? 
> 
> By looking at https://bugs.eclipse.org/bugs/attachment.cgi?id=249987 I think
> that 100 is a good value. But for Luna SR2, maybe it would be a good thing
> to offer a runtime parameter to change the value and keep 100 as the default value.

I already did add a new configuration option to control that (see my last comment :)

For Luna, I am worried about the Equinox resolver changes.  They look good, but I fear we do not have full confidence to release the change.  I know it is not ideal, but would it be sufficient to just go with the felix resolver changes in Luna SR2?  It would result in a long initial startup, but we do cache the results so subsequent starts would not be slowed down by resolution time.
Comment 24 Matthieu Helleboid CLA 2015-01-16 10:02:45 EST
(In reply to Thomas Watson from comment #21)
> The Felix resolver change has been released to Equinox:
> 
> I also did a few modifications and added a simple test for the multiple path
> case that caused duplicates to be added.  The test is not complete, I need
> to add to it to force a conflict that will ensure other paths are used if
> the first solution is not valid.  Leaving bug open for that.

Thanks a lot for your time and responsiveness!

(In reply to Thomas Watson from comment #22)
> I already did add a new configuration option to control that (see my last
> comment :)

Ok perfect :)

> For Luna, I am worried about the Equinox resolver changes.  They look good,
> but I fear we do not have full confidence to release the change.  I know it
> is not ideal, but would it be sufficient to just go with the felix resolver
> changes in Luna SR2?  It would result in a long initial startup, but we do
> cache the results so subsequent starts would not be slowed down by
> resolution time.

I can understand. Maybe a solution would be to have a default value of "1" for RESOLVE_REVISIONS_BATCH_SIZE in Luna SR2, so the behavior would be the same as Luna SR1 in the default case.
Anyway, you're right the cache will do the work for subsequent starts.
Comment 25 Thomas Watson CLA 2015-01-16 14:20:45 EST
(In reply to Matthieu Helleboid from comment #24)
> I can understand. Maybe a solution would be to have a default value "1" for
> RESOLVE_REVISIONS_BATCH_SIZE in Luna SR2, so the behavior would be the same
> as Luna SR1 in the default case.
> Anyway, you're right the cache will do the work for subsequent starts.

After getting burned badly in Luna SR1 [1] I am inclined to avoid the unnecessary changes in order to keep the risk as low as possible.


[1] see bug 445122
Comment 26 Thomas Watson CLA 2015-01-16 16:12:04 EST
Finished test off with a case that would force the resolution to back off a require-bundle decision for the case where we used to have duplicate blames.

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=b57c97c7177699e33baca80cd305e20b17153526
Comment 27 Marc-André Laperle CLA 2015-02-20 01:30:39 EST
Hi! I just want to let you know that the fix for this bug in the master branch (Mars) seems to have caused a regression (infinite loop), see bug 460393.
Comment 28 Martin Halle CLA 2015-04-07 04:13:13 EDT
I am badly struggling with this bug on Luna SR2.

I read the comments carefully and it seems there will be no easy Luna fix/update.

What can I do to get my project working again? A downgrade to Kepler is not possible for me. Any instructions on how to work around this would be REALLY helpful!