Hello,

When trying to start Capella (https://polarsys.org/capella/) on Luna, we encounter an OutOfMemoryError (see the stack trace below). The problem does not occur on Kepler or earlier versions. It seems that too many blame objects are generated in our case. After some research, it appears to be related to https://issues.apache.org/jira/browse/FELIX-3465.

You can reproduce the problem with a Capella installation for Luna from here: https://hudson.polarsys.org/capella/job/capella-gerrit/81/artifact/result/publish/

!ENTRY org.eclipse.equinox.simpleconfigurator 4 0 2015-01-09 09:56:32.376
!MESSAGE FrameworkEvent ERROR
!STACK 0
org.osgi.framework.BundleException: Exception in org.eclipse.equinox.internal.simpleconfigurator.Activator.start() of bundle org.eclipse.equinox.simpleconfigurator.
	at org.eclipse.osgi.internal.framework.BundleContextImpl.startActivator(BundleContextImpl.java:792)
	at org.eclipse.osgi.internal.framework.BundleContextImpl.start(BundleContextImpl.java:721)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.startWorker0(EquinoxBundle.java:936)
	at org.eclipse.osgi.internal.framework.EquinoxBundle$EquinoxModule.startWorker(EquinoxBundle.java:319)
	at org.eclipse.osgi.container.Module.doStart(Module.java:571)
	at org.eclipse.osgi.container.Module.start(Module.java:439)
	at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1582)
	at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1562)
	at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.doContainerStartLevel(ModuleContainer.java:1533)
	at org.eclipse.osgi.container.SystemModule.startWorker(SystemModule.java:242)
	at org.eclipse.osgi.container.Module.doStart(Module.java:571)
	at org.eclipse.osgi.container.Module.start(Module.java:439)
	at org.eclipse.osgi.container.SystemModule.start(SystemModule.java:172)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:393)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:412)
	at org.eclipse.osgi.launch.Equinox.start(Equinox.java:115)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.startup(EclipseStarter.java:318)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:231)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:648)
	at org.eclipse.equinox.launcher.Main.basicRun(Main.java:603)
	at org.eclipse.equinox.launcher.Main.run(Main.java:1465)
	at org.eclipse.equinox.launcher.Main.main(Main.java:1438)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackage(ResolverImpl.java:943)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:865)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.mergeCandidatePackages(ResolverImpl.java:889)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:733)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.calculatePackageSpaces(ResolverImpl.java:741)
	at org.apache.felix.resolver.ResolverImpl.resolve(ResolverImpl.java:251)
	at org.eclipse.osgi.container.ModuleResolver$ResolveProcess.resolveSingleRevision(ModuleResolver.java:948)
	at org.eclipse.osgi.container.ModuleResolver$ResolveProcess.resolve(ModuleResolver.java:878)
	at org.eclipse.osgi.container.ModuleResolver.resolveDelta(ModuleResolver.java:111)
	at org.eclipse.osgi.container.ModuleContainer.resolveAndApply(ModuleContainer.java:479)
	at org.eclipse.osgi.container.ModuleContainer.resolve(ModuleContainer.java:437)
	at org.eclipse.osgi.container.ModuleContainer.resolve(ModuleContainer.java:427)
	at org.eclipse.osgi.container.Module.start(Module.java:416)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:393)
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:412)
	at org.eclipse.equinox.internal.simpleconfigurator.ConfigApplier.startBundles(ConfigApplier.java:438)
	at org.eclipse.equinox.internal.simpleconfigurator.ConfigApplier.install(ConfigApplier.java:111)
	at org.eclipse.equinox.internal.simpleconfigurator.SimpleConfiguratorImpl.applyConfiguration(SimpleConfiguratorImpl.java:191)
	at org.eclipse.equinox.internal.simpleconfigurator.SimpleConfiguratorImpl.applyConfiguration(SimpleConfiguratorImpl.java:205)
Here is a quick workaround that avoids the OutOfMemoryError, although resolution is still very slow: https://git.eclipse.org/r/#/c/39271/

Should I also open a bug against Apache Felix?
(In reply to Matthieu Helleboid from comment #1)
> Quick workaround to avoid OOMError, but the resolution is still very
> slow : https://git.eclipse.org/r/#/c/39271/
>
> Should I also open a bug on Felix Apache?

Yes, please do, it may be related to https://issues.apache.org/jira/browse/FELIX-4656
Ok, will do. I tried to apply all changes from https://github.com/gnodet/felix/commits/resolver-improvements but it doesn't seem to solve the problem.
I created the bug https://issues.apache.org/jira/browse/FELIX-4762
The performance problem is also related to the changes from Bug #421706. I think the difference is mostly caused by the introduction of the ModuleResolver.ResolveProcess.resolveSingleRevision method, which creates a ResolverImpl for each Module.

* I applied my patch on top of commit 618982b49d72803c2479a461aff1b60d97acd63b (before the Bug #421706 commits; see https://git.eclipse.org/r/#/c/39271/ PatchSet 2): the application starts in 76 s (Starting application: 76654).
* I then applied my patch on top of commit ca804e697a08ccbf8f2b1e206b33a3334e3fa4da (after the Bug #421706 commits; see https://git.eclipse.org/r/#/c/39271/ PatchSet 3): the application starts in 491 s (Starting application: 491628).

Everything seems to be related to the package consistency check in ResolverImpl.

* I also pushed https://git.eclipse.org/r/#/c/39271/ PatchSet 4, in which ResolverImpl performs no package consistency check: the application starts in 5 s (Starting application: 5187).
Created attachment 249961 [details]
Time and performance measurements

Here is the list of changes I made and propose to merge into master:

Patch 1 : SetInsteadOfList
  initial patch to avoid the OOM error
  you can forget about this one

Patch 2 : avoidMergeDuplicateExportedPackages
  replaces Patch 1
  contributed to https://issues.apache.org/jira/browse/FELIX-4762
  contributed to https://git.eclipse.org/r/#/c/39271/ (last PatchSet)

Patch 3 : ResolveRevisionsInBatch
  to improve resolution performance
  related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=421706
  contributed to https://git.eclipse.org/r/#/c/39644/

I also ran some tests on the memory and time needed to resolve bundles (see timeAndMemory.png): first with Capella as the target platform, for my problem, and then with all Luna plugins as the target platform, to guard against a regression of https://bugs.eclipse.org/bugs/show_bug.cgi?id=421706.

I think that Patch 2 and Patch 3 will solve my problems. It would also be very good if Patch 2 and Patch 3 could be merged on R4_4_maintenance for Luna SR1. Is that possible?
(In reply to Matthieu Helleboid from comment #6)
> R4_4_maintenance for Luna SR1, is it possible?

I will need time to review the changes. The fix would have to be Luna SR2 (which I assume you meant).
Yes of course, I meant Luna SR2 :-)
(In reply to Matthieu Helleboid from comment #6)
> Patch 2 : avoidMergeDuplicateExportedPackages
> replace Patch 1
> contributed to https://issues.apache.org/jira/browse/FELIX-4762
> contributed to https://git.eclipse.org/r/#/c/39271/ (last PatchSet)

It seems this is solving a case where you are resolving a set of bundles with a require-bundle cycle. Is that true? I am wondering why the other cycle detection code in the following method is not working in this case:

ResolverImpl.mergeCandidatePackages(ResolveContext, Resource, Requirement, Capability, Map<Resource, Packages>, Candidates, Map<Resource, List<Capability>>, HashMap<Resource, List<Resource>>)
No, it does not address a require-bundle cycle. But if you have A requires B + A requires C + B requires D + C requires D, as in the following:

   +-->B--+
A--+      +-->D
   +-->C--+

then with the mergeCandidatePackage method you will get duplicate blames for D, and that is what causes the OOMError in my case.
(In reply to Matthieu Helleboid from comment #10)
> No it does not solve requires bundles cycle. But if you have
> A requires B + A requires C + B requires D + C requires D, like the following
>    +-->B--+
> A--+      +-->D
>    +-->C--+
> Then with the mergeCandidatePackage method, you'll have duplicate blames for
> D, and that's what will cause the OOMError in my case.

Thanks for the info. In order to trigger the code I had to add another level to the dependency tree and I had to use re-export for each level of the dependency:

       +-->C--+
A-->B--+      +-->E
       +-->D--+

Where :
B re-exports C and D
C re-exports E
D re-exports E

In this scenario I would see duplicate blames for package E with the blamed requirement A->B

The fix looks good to me, but the variable names mergeExportedPackagesCycles and mergeExportedPackagesCyclesList are misleading. Perhaps a name like visitedRequiredBundlesMap and visitedRequiredBundles for the list?
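To illustrate the re-export scenario described above, here is a minimal, self-contained Java sketch. It is not the Felix resolver code: the class, the methods and the string-based "blames" are hypothetical stand-ins, and the graph simply mirrors the A, B, C, D, E example. It only shows how walking every path records E twice, and how a visited set (in the spirit of the suggested visitedRequiredBundles name) records it once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the real ResolverImpl logic.
final class DuplicateBlamesSketch {

    // Bundle name -> bundles it requires and re-exports (assumed example graph).
    static Map<String, List<String>> reexportGraph() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("A", Arrays.asList("B"));
        graph.put("B", Arrays.asList("C", "D"));
        graph.put("C", Arrays.asList("E"));
        graph.put("D", Arrays.asList("E"));
        graph.put("E", Collections.<String>emptyList());
        return graph;
    }

    // Naive merge: one blame per path, so a bundle reachable by two paths is blamed twice.
    static void mergeNaive(Map<String, List<String>> graph, String bundle, List<String> blames) {
        for (String required : graph.get(bundle)) {
            blames.add(required);
            mergeNaive(graph, required, blames);
        }
    }

    // Merge with a visited set: each required bundle is merged at most once.
    static void mergeVisited(Map<String, List<String>> graph, String bundle,
                             Set<String> visitedRequiredBundles, List<String> blames) {
        for (String required : graph.get(bundle)) {
            if (!visitedRequiredBundles.add(required)) {
                continue; // already merged through another path
            }
            blames.add(required);
            mergeVisited(graph, required, visitedRequiredBundles, blames);
        }
    }

    static List<String> naiveBlames(String root) {
        List<String> blames = new ArrayList<>();
        mergeNaive(reexportGraph(), root, blames);
        return blames;
    }

    static List<String> dedupedBlames(String root) {
        List<String> blames = new ArrayList<>();
        mergeVisited(reexportGraph(), root, new HashSet<String>(), blames);
        return blames;
    }

    public static void main(String[] args) {
        System.out.println(naiveBlames("A"));   // [B, C, E, D, E]
        System.out.println(dedupedBlames("A")); // [B, C, E, D]
    }
}
```

With a single diamond the naive walk only doubles the entries for E, but each additional diamond layer doubles the number of paths again, which is consistent with the memory growth reported in this bug.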
Ok great, you're right my description wasn't complete! I wasn't very satisfied with the mergeExportedPackagesCycles name, I will follow your suggestion and update the gerrit commit and the patch at apache.
(In reply to Matthieu Helleboid from comment #12)
> I wasn't very satisfied with the mergeExportedPackagesCycles name, I will
> follow your suggestion and update the gerrit commit and the patch at apache.

Done.

I took the liberty of quoting you on https://issues.apache.org/jira/browse/FELIX-4762, and the patch seems to have been applied on felix/resolver as well.
(In reply to Matthieu Helleboid from comment #13)
> I took the liberty to quote you on
> https://issues.apache.org/jira/browse/FELIX-4762 and the patch seems to be
> applied also on felix/resolver

No problem, I also reviewed the fix with Richard Hall so he would apply it to Felix. I will merge that fix into master soon, but want to review your other fix to the equinox bit first.

Marking bug as 'greatbug'. Really appreciate your work on this one. The resolver code in felix and equinox is not the most simple thing to get into.
(In reply to Matthieu Helleboid from comment #6)
> Patch 3 : ResolveRevisionsInBatch
> to improve resolution performance
> related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=421706
> contributed to https://git.eclipse.org/r/#/c/39644/
>

My concern about this patch is that 100 may prove to be too big for the 'batch' resolve and increase our likelihood of blowing up the felix resolver. Did you try other batch counts to see where the balance is?
(In reply to Thomas Watson from comment #15)
> (In reply to Matthieu Helleboid from comment #6)
> > Patch 3 : ResolveRevisionsInBatch
> > to improve resolution performance
> > related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=421706
> > contributed to https://git.eclipse.org/r/#/c/39644/
> >
>
> My concern about this patch is that 100 may prove to be too big for the
> 'batch' resolve and increase our likelihood of blowing up the felix
> resolver. Did you try other batch counts to see where the balance is?

I quickly tried 1, 10 and 100 as values, and in my case 100 gave the better result. I can try different values tomorrow and come back to you with more data next week if you want.

The other thing is that the first resolution will always try to resolve RESOLVE_REVISIONS_BATCH_SIZE bundles, but as resolution proceeds, subsequent attempts will resolve fewer bundles because some of them were already resolved. See

"if (wirings.containsKey(single) || failedToResolve.contains(single)) revisions.remove(single);"

Maybe I'll change this to always "send" the same number of bundles to the Felix resolver.
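The batching idea discussed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the Gerrit change: the class and method names are hypothetical, revisions are plain strings, and the call to the Felix resolver is replaced by a stand-in that simply marks the batch as wired. It shows the "always send the same number" variant, where revisions that are already wired are skipped while the batch is being filled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of batched resolution; the resolver call is a stand-in.
final class BatchResolveSketch {

    // Returns how many times the (stand-in) resolver was invoked.
    static int resolveInBatches(List<String> revisions, int batchSize, Set<String> wirings) {
        Deque<String> pending = new ArrayDeque<>(revisions);
        int resolverCalls = 0;
        while (!pending.isEmpty()) {
            List<String> batch = new ArrayList<>();
            // Fill the batch up to batchSize, skipping revisions that are already wired.
            while (batch.size() < batchSize && !pending.isEmpty()) {
                String next = pending.poll();
                if (!wirings.contains(next)) {
                    batch.add(next);
                }
            }
            if (batch.isEmpty()) {
                break; // nothing left to resolve
            }
            wirings.addAll(batch); // stand-in for handing the batch to the resolver
            resolverCalls++;
        }
        return resolverCalls;
    }

    // Builds n hypothetical revision names: r0, r1, ...
    static List<String> sampleRevisions(int n) {
        List<String> revisions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            revisions.add("r" + i);
        }
        return revisions;
    }

    public static void main(String[] args) {
        System.out.println(resolveInBatches(sampleRevisions(250), 100, new HashSet<String>())); // 3
        System.out.println(resolveInBatches(sampleRevisions(250), 1, new HashSet<String>()));   // 250
    }
}
```

A batch size of 1 corresponds to one resolver call per revision, while larger batches trade fewer calls against a bigger problem per call, which is the balance being asked about.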
(In reply to Thomas Watson from comment #14)
> (In reply to Matthieu Helleboid from comment #13)
> > I took the liberty to quote you on
> > https://issues.apache.org/jira/browse/FELIX-4762 and the patch seems to be
> > applied also on felix/resolver
>
> No problem, I also reviewed the fix with Richard Hall so he would apply it
> to Felix. I will merge that fix into master soon, but want to review your
> other fix to the equinox bit first.
>
> Marking bug as 'greatbug'. Really appreciate your work on this one. The
> resolver code in felix and equinox is not the most simple thing to get into.

Thanks to you too :-)
I put some additional comments in the gerrit review https://git.eclipse.org/r/#/c/39644/

Sorry for being inconsistent in the way I provide feedback.
Created attachment 249987 [details]
Time and memory measurements

I took your comments into account and pushed a new patchSet https://git.eclipse.org/r/#/c/39644/

I made some more tests about performance, and finally "RESOLVE_REVISIONS_BATCH_SIZE = 100" seems like a good tradeoff for time and memory.
Do you plan to backport these modifications to R4_4_maintenance for Luna SR2?

Looking at https://bugs.eclipse.org/bugs/attachment.cgi?id=249987, I think that 100 is a good value. But for Luna SR2, maybe it would be good to offer a runtime parameter to change the value and keep 100 as the default.
(In reply to Matthieu Helleboid from comment #13)
> (In reply to Matthieu Helleboid from comment #12)
> > I wasn't very satisfied with the mergeExportedPackagesCycles name, I will
> > follow your suggestion and update the gerrit commit and the patch at apache.
>
> done.
>
> I took the liberty to quote you on
> https://issues.apache.org/jira/browse/FELIX-4762 and the patch seems to be
> applied also on felix/resolver

The Felix resolver change has been released to Equinox:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=e7db81bab4bce237fcafd3d624e56d183bbd6dae

(In reply to Matthieu Helleboid from comment #19)
> Created attachment 249987 [details]
> Time and memory measurements
>
> I took your comments into account and pushed a new patchSet
> https://git.eclipse.org/r/#/c/39644/
>
> I made some more tests about performance, and finally
> "RESOLVE_REVISIONS_BATCH_SIZE = 100" seems like a good tradeoff for time and
> memory.

This change has been released to Equinox as:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=16bb483bd75d665c3ff6a554419ddf8ad93bab57

I also did a few modifications and added a simple test for the multiple path case that caused duplicates to be added. The test is not complete, I need to add to it to force a conflict that will ensure other paths are used if the first solution is not valid. Leaving bug open for that.
(In reply to Matthieu Helleboid from comment #20)
> Do you plan to backport these modifications to R4_4_maintenance for Luna
> SR2?
>
> By looking at https://bugs.eclipse.org/bugs/attachment.cgi?id=249987 I think
> that 100 is a good value. But for Luna SR2, maybe it would be a good thing
> to offer a runtime parameter to change value and keep 100 as a defaut value.

I already did add a new configuration option to control that (see my last comment :)

For Luna, I am worried about the Equinox resolver changes. They look good, but I fear we do not have full confidence to release the change. I know it is not ideal, but would it be sufficient to just go with the felix resolver changes in Luna SR2? It would result in a long initial startup, but we do cache the results so subsequent starts would not be slowed down by resolution time.
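As a rough illustration of a configurable batch size with 100 as the default, here is a small Java sketch. The thread does not name the actual Equinox option, so the property key below is a made-up placeholder, and the class and fallback rules are assumptions rather than the released implementation.

```java
// Hypothetical sketch; the property key is a placeholder, not the real Equinox option.
final class BatchSizeOption {

    static final String BATCH_SIZE_PROPERTY = "example.resolver.batch.size"; // assumed name
    static final int DEFAULT_BATCH_SIZE = 100;

    // Parses the raw setting, falling back to the default for missing or invalid values.
    static int parseBatchSize(String raw) {
        if (raw == null) {
            return DEFAULT_BATCH_SIZE;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value >= 1 ? value : DEFAULT_BATCH_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_BATCH_SIZE;
        }
    }

    // Reads the setting from a JVM system property (one possible source of configuration).
    static int configuredBatchSize() {
        return parseBatchSize(System.getProperty(BATCH_SIZE_PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println(configuredBatchSize());
    }
}
```

Under this sketch, a value of 1 would correspond to resolving one revision per resolver call, and any missing or malformed value falls back to 100.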
I missed removing the unused constant, now fixed:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=a464c1c67a96df050efb67d69a1692886ecc134e
(In reply to Thomas Watson from comment #21)
> The Felix resolver change has been released to Equinox:
>
> I also did a few modifications and added a simple test for the multiple path
> case that caused duplicates to be added. The test is not complete, I need
> to add to it to force a conflict that will ensure other paths are used if
> the first solution is not valid. Leaving bug open for that.

Thanks a lot for your time and responsiveness!

(In reply to Thomas Watson from comment #22)
> I already did add a new configuration option to control that (see my last
> comment :)

Ok perfect :)

> For Luna, I am worried about the Equinox resolver changes. They look good,
> but I fear we do not have full confidence to release the change. I know it
> is not ideal, but would it be sufficient to just go with the felix resolver
> changes in Luna SR2? It would result in a long initial startup, but we do
> cache the results so subsequent starts would not be slowed down by
> resolution time.

I can understand. Maybe a solution would be to have a default value of "1" for RESOLVE_REVISIONS_BATCH_SIZE in Luna SR2, so the behavior would be the same as Luna SR1 in the default case.
Anyway, you're right, the cache will do the work for subsequent starts.
(In reply to Matthieu Helleboid from comment #24)
> I can understand. Maybe a solution would be to have a default value "1" for
> RESOLVE_REVISIONS_BATCH_SIZE in Luna SR2, so the behavior would be the same
> as Luna SR1 in the default case.
> Anyway, you're right the cache will do the work for subsequent starts.

After getting burned badly in Luna SR1 [1] I am inclined to avoid the unnecessary changes in order to keep the risk as low as possible.

[1] see bug 445122
Finished off the test with a case that forces the resolution to back off a require-bundle decision in the situation where we used to have duplicate blames.

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=b57c97c7177699e33baca80cd305e20b17153526
Hi! I just want to let you know that the fix for this bug in the master branch (Mars) seems to have caused a regression (infinite loop), see bug 460393.
I am badly struggling with this bug on Luna SR2. I read the comments carefully and it seems there will be no easy Luna fix/update. What can I do to get my project working again? A downgrade to Kepler is not possible for me. Any instructions on how to bypass this would be REALLY helpful!