Hi, when I try to "import projects" on the Apache Camel project, which contains a bit more than 700 Maven modules, I have the following issues:
- it takes several hours
- the UI freezes
- it ends with an out-of-memory error
Here is a heap dump taken during the import: https://drive.google.com/open?id=0B-rING_Zzzceb1JfZ3BIUlVpSWc
Here is the GitHub repo URL that you can use to reproduce the error: https://github.com/apache/camel
I'm currently using Oxygen M6.
Created attachment 267935 [details] MavenProjectCobnfigurator$UpdateMavenConfigurationJob seems to retain all the memory
Let me know if you are planning/interested to improve m2e project import memory footprint. I won't have time to work on this myself in the near future, but I should be able to help vet the ideas and review/merge the changes.
I'm interested in working on it, but it is currently not at the top of my todo list, so I don't know when I will have time to work on it personally. I reported the issue to provide details on a known problem and to gather ideas from the community (including you ;)) on how to improve it. It might also be a candidate for the FEEP program.
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug. If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. -- The automated Eclipse Genie.
The issue still occurs.
I'm raising priority and severity. I hope I'll be able to audit that soon and provide a patch.
I've generated a new, smaller dump on the same project to ease analysis, and then started my analysis. The issue is that there are too many MavenProject instances created during the update. The same pom.xml file gets parsed into a MavenProject object as many times as it is referenced. So in the case of Camel, with its 763 modules, every one of those 763 MavenProjects contains its own copy of the MavenProject for the parent pom. This could most likely be factorized by sharing a pool of MavenProjects and making sure the existing project is reused for a given pom (+ extra settings?) input.
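To make the duplication concrete, here is a minimal sketch that only counts instances, with and without a shared pool keyed by pom path. It does not use any Maven or m2e API: `ProjectPoolSketch`, `Node`, `parseUnshared` and `parseShared` are hypothetical names, and the hierarchy is simplified to one root pom with 763 direct children.

```java
import java.util.HashMap;
import java.util.Map;

final class ProjectPoolSketch {
    /** Hypothetical stand-in for a parsed pom; this is NOT Maven's MavenProject. */
    static final class Node {
        final String pom;
        final Node parent; // null for the root pom
        Node(String pom, Node parent) { this.pom = pom; this.parent = parent; }
    }

    private final Map<String, String> parentOf;          // pom path -> parent pom path
    private final Map<String, Node> pool = new HashMap<>(); // shared instances, keyed by pom path
    int created = 0;                                      // number of Node instances allocated

    ProjectPoolSketch(Map<String, String> parentOf) { this.parentOf = parentOf; }

    /** Behaviour described in the comment: each call rebuilds the whole parent chain. */
    Node parseUnshared(String pom) {
        String parentPom = parentOf.get(pom);
        Node parent = parentPom == null ? null : parseUnshared(parentPom);
        created++;
        return new Node(pom, parent);
    }

    /** Proposed behaviour: reuse the instance already built for the same pom. */
    Node parseShared(String pom) {
        Node cached = pool.get(pom);
        if (cached != null) {
            return cached;
        }
        String parentPom = parentOf.get(pom);
        Node parent = parentPom == null ? null : parseShared(parentPom);
        Node node = new Node(pom, parent);
        created++;
        pool.put(pom, node);
        return node;
    }

    /** One root pom plus `modules` children; returns {unshared count, shared count}. */
    static int[] demo(int modules) {
        Map<String, String> parentOf = new HashMap<>();
        for (int i = 0; i < modules; i++) {
            parentOf.put("module-" + i + "/pom.xml", "pom.xml");
        }
        ProjectPoolSketch unshared = new ProjectPoolSketch(parentOf);
        ProjectPoolSketch shared = new ProjectPoolSketch(parentOf);
        for (String pom : parentOf.keySet()) {
            unshared.parseUnshared(pom);
            shared.parseShared(pom);
        }
        return new int[] { unshared.created, shared.created };
    }

    public static void main(String[] args) {
        int[] counts = demo(763);
        System.out.println("unshared=" + counts[0] + ", shared=" + counts[1]);
        // prints "unshared=1526, shared=764"
    }
}
```

With deeper parent chains the unshared count grows with the depth of each module, while the shared count stays at one instance per distinct pom.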
Refining a bit: the culprit code can be (not exclusively) ProjectConfigurationManager, lines 281 & 286: iterating over each project separately and invoking facade.getMavenProjects() results in one copy of the parent hierarchy for the current project. This hierarchy is not shared, so many duplicated MavenProject instances remain in memory.
This could be improved, but one issue remains: if projects were imported separately, one by one, then changing this method wouldn't help and we would still get too many copies of the same MavenProject. So the fix needs to be one layer below, in facade.getMavenProject(...), which resolves to ProjectRegistry.getMavenProject(facade, monitor). This method needs to smarten up and be aware of the other MavenProjects that were already instantiated in the hierarchies of other modules, so that it reuses them instead of duplicating them.
It seems there is no Maven API to build a MavenProject while providing some pre-existing MavenProjects to reuse as context. So as a first approach, we could go either for replacement of the MavenProject, and/or for a greedier feeding of the contextProjects: if we load a project, then create the MavenFacade of the parent projects and populate the contextProjects with those so they'd be reused.
ProjectRegistryManager.readProjectWithDependencies L807 is probably where the fix should happen. It doesn't make sense to improve only the project import case, because the model and hierarchy are also needed while m2e is running, and m2e recreates the model when projects already exist. So if import takes hours and GBs of RAM, fixing only the import would still leave a restart taking hours and GBs of RAM.
Maven 3.6.0 introduces a new cache https://jira.apache.org/jira/browse/MNG-6311, that might help. But we need to make sure it won't introduce memory leaks in the context of the IDE
So just to reformulate the issue: the MavenProject model always keeps a reference to its parent, and those references are duplicated (so two nodes which have the same parent would hold two duplicate parent descriptions, although the values are the same). The memory cost of a given project is basically linear in the depth of its hierarchy: a project which has a hierarchy of 8 parents will have 8 models loaded, resulting in ~8MB. In the case of Camel, there are many modules which have a long parent chain (8 parents), costing ~9MB each. The sum over all those modules reaches several GB. If the hierarchy were "flat", then each module wouldn't reference parents and would be only ~1MB, resulting in an affordable ~700MB cost.
The goal is to reduce the number of MavenProject instances without breaking m2e.

> Maven 3.6.0 introduces a new cache https://jira.apache.org/jira/browse/MNG-6311, that might help. But we need to make sure it won't introduce memory leaks in the context of the IDE

This could indeed help a lot.

@Fred: Do you know whether the parent reference is useful in m2e, in the MavenFacade, or in the MavenProject model once we've parsed the pom? If not, we could try setting it to null once the model is built so it can be garbage collected. It wouldn't be optimal but could just make things work.
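The estimate above can be written down as back-of-the-envelope arithmetic. This is only a sketch under stated assumptions: every module is taken to have the full 8-parent chain and each model is taken to cost ~1MB, which gives an upper bound rather than a measurement. `HierarchyCostSketch` and its method names are hypothetical.

```java
final class HierarchyCostSketch {
    /** Each module holds its own model plus a private copy of every parent model. */
    static long duplicatedCostMb(int modules, int parentsPerModule, int mbPerModel) {
        return (long) modules * (parentsPerModule + 1) * mbPerModel;
    }

    /** "Flat" case from the comment: each module holds only its own model. */
    static long flatCostMb(int modules, int mbPerModel) {
        return (long) modules * mbPerModel;
    }

    public static void main(String[] args) {
        // Assumed inputs: 763 modules, 8 parents each, ~1MB per model.
        System.out.println("duplicated: " + duplicatedCostMb(763, 8, 1) + " MB"); // prints "duplicated: 6867 MB"
        System.out.println("flat: " + flatCostMb(763, 1) + " MB");               // prints "flat: 763 MB"
    }
}
```

The two results line up with the "several GB" and "~700MB" orders of magnitude quoted in the comment.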
References to MavenProject.getParent() in the whole m2e code are:
* M2EUtils.getDefiningProjects()
* DefaultMavenDependencyResolver.addProjectStructureRequirements()
* ManageDependenciesDialog.ContentProvider.getParent()
(In reply to Fred Bricon from comment #10)
> Maven 3.6.0 introduces a new cache
> https://jira.apache.org/jira/browse/MNG-6311,

This is "only" a model cache, not a project cache. To get an idea of how much memory it would save, we need to measure what share of a MavenProject's memory consumption the model represents. If it's the vast majority, then using the cache can be sufficient. If not, then we will really need something specific in m2e to avoid creating too many MavenProject instances.
About the cache being a field of the DefaultProjectBuilder that we don't control, one question is whether we need to keep a single instance of the DefaultProjectBuilder for m2e, or whether we can use a separate one for import at least. If the model is the majority and if we can have a DefaultProjectBuilder that's only used for the import, then it could be enough to make the import story acceptable without risk for other m2e parts.
Created attachment 276307 [details]
Screenshot - influence of model on overall size

On this tree, we can see that for one example module (from the Camel import) retaining 5.6MB of memory, 1.6MB is the "payload" of the current MavenProject (if we exclude the 4MB parent). Out of those 1.6MB, about 800kB is taken by the model, i.e. about 50% of the size. If we can take advantage of the model cache, it would reduce memory consumption by 50% at best. It's definitely worth it, but it's still not enough, as the import of a project like Camel would take 3GB instead of 6GB. We need something more drastic (in addition to the model cache or in place of it). We basically need to divide it by 10 to consider it usable. Avoiding duplication of the parent MavenProject still seems to be the important thing here.
Created attachment 276309 [details]
Test project showing memory consumption with various build strategies

The attached sample project shows how things could be drastically improved in m2e import (and only import at the moment, but it seems to be worth it). In the import, projects are imported one by one, cascading to multiple ProjectBuilder.build(pomFile, request) calls, which leads to duplicated MavenProjects, while there is a ProjectBuilder.build(/*List*/ pomFiles, request) which reuses the already produced MavenProjects. So the import should take advantage of this method, build all the projects that have the same "request" together with it, and populate the registry with the results. For Apache Camel, it would turn multiple gigabytes into ~800MB, avoiding many, many garbage collection attempts and making the operation just work.
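The grouping step described above could look roughly like the sketch below. It is not m2e code and does not call Maven: `GroupedBuildSketch`, `PendingPom` and `requestKey` are hypothetical names, and `requestKey` stands for whatever makes two build requests equivalent (the comment does not say how m2e would decide that). Each resulting group is what would be handed to the list-based build call in a single invocation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupedBuildSketch {
    /** Hypothetical pairing of a pom file with a key identifying its build request. */
    static final class PendingPom {
        final String pomPath;
        final String requestKey;
        PendingPom(String pomPath, String requestKey) {
            this.pomPath = pomPath;
            this.requestKey = requestKey;
        }
    }

    /** Groups poms by request key, preserving the order in which keys first appear. */
    static Map<String, List<String>> groupByRequest(List<PendingPom> pending) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (PendingPom p : pending) {
            groups.computeIfAbsent(p.requestKey, k -> new ArrayList<>()).add(p.pomPath);
        }
        return groups;
    }

    /** Small fixed input used for illustration. */
    static Map<String, List<String>> demo() {
        return groupByRequest(Arrays.asList(
                new PendingPom("camel-core/pom.xml", "default"),
                new PendingPom("camel-http/pom.xml", "default"),
                new PendingPom("tooling/pom.xml", "with-profile-x")));
    }

    public static void main(String[] args) {
        // One list-based build call per group instead of one call per pom.
        demo().forEach((key, poms) ->
                System.out.println(key + " -> " + poms.size() + " pom(s) built together"));
    }
}
```

In this illustration, three poms result in two batched calls; for an import where all modules share one request, the whole workspace would go through a single call.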
Mickael, is there any code to try out?
(In reply to Peter Palaga from comment #16)
> Mickael, is there any code to try out?

Not yet. I'm working on it, and m2e internals iterate instead of grouping in many places. I hope I'm getting close to a raw patch to try out, and I'll be able to share it by the end of this week. But no guarantee on the timing.
Thanks for the update, Mickael
New Gerrit change created: https://git.eclipse.org/r/132484
(In reply to Eclipse Genie from comment #19)
> New Gerrit change created: https://git.eclipse.org/r/132484

@Pete @Aurelien: do you think you could try this patch? Locally, I constantly got OOM when importing Camel; with this patch, the import of the 767 modules completes and uses ~2GB of RAM at most. It would be really nice if you could give it a try, see how it behaves, and -more importantly- verify that the resulting projects work "as expected" in the IDE (i.e. that the configuration seems correct).
@Fred: I'll extract some small "preparation" commits from this patch to make review incremental and easier.
@All: m2e project management is a can of worms. m2e has a lot of tweaks to avoid keeping too many projects in memory, which is more or less a workaround for the issue I'm trying to fix here at import (i.e. quadratic memory consumption when refreshing projects). As a result, m2e discards and re-parses the MavenProject from pom files very frequently, wasting a lot of CPU. With this new possibility of saving memory, once we have this first case decently covered, we can try to propagate this to other operations and revise how caching is done. I've opened bug 541200 on this topic; there will probably be other ones.
New Gerrit change created: https://git.eclipse.org/r/132487
New Gerrit change created: https://git.eclipse.org/r/132490
New Gerrit change created: https://git.eclipse.org/r/132493
I was able to install the artifacts from https://ci.eclipse.org/m2e/job/m2e-gerrit/270/ combined with http://download.eclipse.org/eclipse/updates/4.10-I-builds/. You may want to have a look at the installation details to check whether I have the right artifacts: https://paste.fedoraproject.org/paste/LnbF1fWEXRJ7zufalhUs1Q

I can confirm that I could import Camel using -Xmx3g. Before the import, I removed all Eclipse metadata from the source tree using

find . -type d -name ".settings" -exec rm -Rf {} + \
&& find . -type f -name ".classpath" -exec rm -Rf {} + \
&& find . -type f -name ".project" -exec rm -Rf {} +

I imported in a situation when the project had all maven deps cached in the local maven repo because it was recently built via CLI on this machine. I'll be using this instance in the coming days. I am especially curious if the UI freezes reported in Bug#541040 will go away.
Created attachment 276587 [details]
Updating Maven Dependencies during startup

The startup of the patched workbench took minutes today. I could see the "Initializing Java Tooling" and "Updating Maven Dependencies" tasks for most of that time. I'd say seeing "Updating Maven Dependencies" during startup is not normal. There was also a UI freeze of tens of seconds. I tried storing a thread dump during the freeze, but I suspect it was taken shortly after the UI freeze was over (attached). But "Updating Maven Dependencies" can still be seen there. When I restarted the workbench afterwards, it started quickly, within seconds. Hence the problem does not occur on every startup.
(In reply to Peter Palaga from comment #25)
> The startup of the patched workbench took minutes today. I could notice
> "Initializing Java Tooling" and "Updating Maven Dependencies" tasks for the
> most of the time. I'd say seeing "Updating Maven Dependencies" during
> startup is not normal.

Are you sure this issue is related to that patch?
I don't see any reason why the change I submitted should affect whether projects are built at startup. It would be interesting to sort out what triggers the update: is it m2e internal or something unrelated? Did you happen to modify a pom file before the restart?

> There was also a UI freeze of tens of seconds. I
> tried storing a thread dump during the freeze, but I suspect, it was taken
> shortly after the UI freeze was over (attached).

I think the stack contains the freeze. See, the main thread (which is the UI thread) is processing some NavigatorContentService stuff. On big trees, it can indeed take a while. But I don't think it's related to that change, which has no relationship with the UI.

> But "Updating Maven
> Dependencies" can still be seen there. When I restarted the workbench
> afterwards it started quickly within seconds. Hence the problem does not
> occur on every startup.

OK, this points even more to the idea that the issue is "external" to this patch.
(In reply to Peter Palaga from comment #25)
> But "Updating Maven
> Dependencies" can still be seen there. When I restarted the workbench
> afterwards it started quickly within seconds. Hence the problem does not
> occur on every startup.

It seems to come from ProjectRegistryRefreshJob, which is fed with "dirty" projects for further refreshing. The complexity and cost of a refresh are similar to those of an import (so it takes minutes and GBs if all modules are in it). The question is why this job was triggered and what fed it with changes to process. I don't know the answer, but I don't believe anything in the proposed patch can be a part of it.
(In reply to Mickael Istria from comment #26)
> (In reply to Peter Palaga from comment #25)
> > The startup of the patched workbench took minutes today. I could notice
> > "Initializing Java Tooling" and "Updating Maven Dependencies" tasks for the
> > most of the time. I'd say seeing "Updating Maven Dependencies" during
> > startup is not normal.
>
> Are you sure this issue is related to that patch?

No, I am not sure, but at the same time, I do not remember having seen this with stock Eclipse 2018-09 or Photon.

> I don't see any reason why the change I submitted should affect when
> projects are built or not at startup. What would be interesting would be to
> sort out what does trigger the update: is this m2e internal or something
> else not related? Did you happen to modify one pom file before the restart?

The last change I made in any of the projects in the workspace was done using stock Eclipse 2018-09 yesterday. After that I git-committed all changes, closed 2018-09, deleted Eclipse metafiles, started the patched workbench and imported Camel and WildFly Camel. I made no changes. I closed the workbench yesterday and opened it today.

> > There was also a UI freeze of tens of seconds. I
> > tried storing a thread dump during the freeze, but I suspect, it was taken
> > shortly after the UI freeze was over (attached).
>
> I think the stack contains the freeze. See the main thread (which is the UI
> Thread) is processing some NavigatorContentService stuff. On big trees, it
> can indeed take a while. But I don't think it's related to that change that
> has 0 relationship with UI.

Interesting.

> > But "Updating Maven
> > Dependencies" can still be seen there. When I restarted the workbench
> > afterwards it started quickly within seconds. Hence the problem does not
> > occur on every startup.
>
> Ok, this drives even more to the idea that the issue is "external" to this
> patch.

Maybe.
Would you be so kind to just retry a workbench restart 4-5 times and report whether there is a kind of pattern popping-up?
Restarted five times without any issues.
I could not reproduce the freeze on startup even if I removed the projects from the workspace, deleted eclipse metadata from the trees, re-imported the projects and restarted the workbench.
New Gerrit change created: https://git.eclipse.org/r/132656
Gerrit change https://git.eclipse.org/r/132487 was merged to [master]. Commit: http://git.eclipse.org/c/m2e/m2e-core.git/commit/?id=f69fa0086effca3438f750cba9ce6660b8d0c067
Gerrit change https://git.eclipse.org/r/132493 was merged to [master]. Commit: http://git.eclipse.org/c/m2e/m2e-core.git/commit/?id=3286bf1719a34397c694ad67e9353913cb888865
Gerrit change https://git.eclipse.org/r/132490 was merged to [master]. Commit: http://git.eclipse.org/c/m2e/m2e-core.git/commit/?id=25b27d33879c2f15692cfe89e76d5e06d97dd19c
New Gerrit change created: https://git.eclipse.org/r/132895
New Gerrit change created: https://git.eclipse.org/r/132897
So, comparing the import of Eclipse Che (295 projects) on Eclipse Java EE 2018-12 M2, w/ 3GB of heap:

Before
- Imported and configured 295 project(s) in 1535 sec
- Full import+build took 31 min 19 sec

After
- Imported and configured 295 project(s) in 275 sec
- Full import+build took 8 min 53 sec

With 2GB of heap, Eclipse becomes completely unresponsive without the patch, but completes the import if it's applied.

Very nice results Mickael!
So Che can import in Eclipse with just 1GB of heap, BUT only if automatic source download is disabled; otherwise it runs into an OOME. I've noticed before that DownloadSourcesJob quickly eats up a lot of heap, which can be problematic.
(In reply to Fred Bricon from comment #39)
> SO Che can import in Eclipse with just 1GB of heap BUT, only if automatic
> source download is disabled, else it runs into an OOME. I've noticed that
> before, DownloadSourcesJob quickly eats up a lot of heap, which can be
> problematic

Is there already a bugzilla ticket for this?
I think there is a low-hanging fruit we could pick: I've looked at Aurelien's original heap dump, and I've noticed that we have tons of separate string instances that basically hold the same constant values. For example, the string "RELEASE" occurs ~180000 times at 56 bytes per instance, for ~10MB of space. I suspect that Maven project descriptions contain many strings which are reused a lot; think of the artifact reference scope ("test"), for example.
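As an illustration of the string duplication described above, the following sketch counts distinct String objects by identity, with and without interning. It is plain JDK code, not Maven's parser: `StringDedupSketch` and `readValues` are hypothetical names, and `new String(char[])` merely simulates a parser allocating a fresh string for each value it reads. For scale, 180000 instances at 56 bytes each is 10,080,000 bytes, which matches the ~10MB quoted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class StringDedupSketch {
    /** Simulates reading the same value `count` times, optionally interning each result. */
    static List<String> readValues(int count, boolean intern) {
        List<String> values = new ArrayList<>(count);
        char[] raw = "RELEASE".toCharArray();
        for (int i = 0; i < count; i++) {
            String parsed = new String(raw); // always a fresh object, like a parser would produce
            values.add(intern ? parsed.intern() : parsed);
        }
        return values;
    }

    /** Counts distinct String objects by identity (==), not by equals(). */
    static int distinctInstances(List<String> values) {
        Set<String> byIdentity = Collections.newSetFromMap(new IdentityHashMap<>());
        byIdentity.addAll(values);
        return byIdentity.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(readValues(1000, false))); // prints "1000"
        System.out.println(distinctInstances(readValues(1000, true)));  // prints "1"
    }
}
```

`String.intern()` is only one way to deduplicate; a small application-level canonicalizing map would be another option with different lifetime trade-offs.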
(In reply to Thomas Mäder from comment #41)
> I think there is a low-hanging fruit we could pick:
>
> I've looked at Aurelien's original heap dump, and I've noticed that we have
> tons of separate string instances that basically reference string constants.
> For example, the string "RELEASE" is referenced ~180000 times with 56 bytes
> per instance for ~10MB of space. I suspect that Maven project descriptions
> contain many strings which are reused a lot, think of the artifact reference
> scope ("test"), for exmample.

Please open another bug report for that. There is a myriad of places where m2e can be improved. This ticket should stay focused on the number of MavenProject instances, which is by far the main cause of the issue. Having distinct tickets ensures we don't forget any idea and gives us enough visibility to pick the most profitable ones first.
@Mickael, please ensure you don't introduce regressions by running the tests from https://github.com/tesla/m2e-core-tests.

mvn clean integration-test -Puts,its -fae -Dm2e-core.url=file:///path/to/m2e-core/org.eclipse.m2e.site/target/repository

Right now, I see
Tests run: 535, Failures: 56, Errors: 11, Skipped: 0 for org.eclipse.m2e.tests
Tests run: 45, Failures: 3, Errors: 0, Skipped: 0 for org.eclipse.m2e.editor.xml.tests
(In reply to Fred Bricon from comment #43)
> @Mickael, please ensure you don't introduce regressions by running the tests
> from
> https://github.com/tesla/m2e-core-tests.

Would be good if m2e actually made it easier for contributors to do that. Bug 541525 and Bug 541526
Thanks to Fred's review, I managed to fix a few things in one of the commits, and could investigate another issue highlighted by the ProjectRegistryManagerTest. The issue is in Maven itself; I've opened https://issues.apache.org/jira/browse/MNG-6529. This seems to be a blocker for usage in m2e.
Maven currently has 2 blockers for this:
* https://issues.apache.org/jira/browse/MNG-6533
* https://issues.apache.org/jira/browse/MNG-6529
New Gerrit change created: https://git.eclipse.org/r/135223
Gerrit change https://git.eclipse.org/r/135223 was merged to [master]. Commit: http://git.eclipse.org/c/m2e/m2e-core.git/commit/?id=c696e3cd599f3cd62770c723c06add4784cc004b
Is this one fixed in 2019-03? I guess it is not, because I see the usual out of memory exceptions when migrating from 2018-12.
(In reply to Peter Palaga from comment #49)
> Is this one fixed in 2019-03? I guess it is not, because I see the usual out
> of memory exceptions when migrating from 2018-12.

The bug is still not marked as resolved, so it's not fixed yet. The patches are ready, but we're waiting for Maven to release 3.6.1 before we can merge the patches into m2e and fix this issue. Maven is a bit too flexible on release dates; we've been lobbying on the dev@maven.apache.org mailing list to have 3.6.1 released ASAP, but without much success so far.
3.6.1 was recently released. Now hopefully this issue can be closed soon.
Gerrit change https://git.eclipse.org/r/132895 was merged to [master]. Commit: http://git.eclipse.org/c/m2e/m2e-core.git/commit/?id=d0cd1dbaf370f7e7602c21ed653abc1e4f825f0a
Gerrit change https://git.eclipse.org/r/132897 was merged to [master]. Commit: http://git.eclipse.org/c/m2e/m2e-core.git/commit/?id=15d9b8d7181c39369615107519830590b6d7a2b4
With
* Eclipse SDK nightly build from https://download.eclipse.org/eclipse/downloads/drops4/I20190508-1800/
* Changed the eclipse.ini to -Xmx2048m (I guess 1024M is really not suitable for the size of this project)
* Added m2e snapshot from download.eclipse.org/technology/m2e/snapshots/1.12.0/latest/
* Made sure "Download Sources" is deactivated as per comment #39 (Fred, it'd be great if you can open a separate ticket for that).

Then
* Took the Apache Camel master archive from https://github.com/apache/camel/archive/master.zip
* Expanded it in some random directory, ran `mvn clean package -DskipTests` to populate .m2/repository and pre-fetch dependencies (otherwise m2e fetching all of them pollutes the experiment too much) -> Be ready to wait a looooong time.
* Deleted the directory
* Expanded the archive in another directory

And finally
* File > Import > Existing Maven Projects... > new directory > Finish
* Stopwatch on...
* Waited for the Progress report to not show any running job
* Stopwatch off... 16min34s
* Checked some random folders; they seem to be usable as Java/Maven projects (completion, validation, navigation... work).

There is still room for improvement. I see for example that some projects seem to be processed more than once, and "Update Maven . But the IDE was not frozen and RAM didn't exceed 1880MB, so I guess we can mark this issue as resolved.
Should you see any other performance issue in m2e (and I know you will), please open new bugs with a similar level of detail, and -why not- a heap dump if the issue seems to be caused by excessive memory consumption. We'll do our best to fix those.
@Mickael importing https://github.com/eclipse/che, I get "unable to read Maven project" for most (all) projects
(In reply to Fred Bricon from comment #55)
> @Mickael importing https://github.com/eclipse/che, I get "unable to read
> Maven project" for most (all) projects

Can you please elaborate on the steps one can use to try to reproduce it? I just tried it, with the same IDE as the one described earlier, and I could import all Che modules. The only errors I see on some pom.xml files are "plugin execution not covered...".
Just importing the che projects as Maven projects, it's an error marker projects get eventually. But I can't reproduce on my Linux box. I'll keep investigating on my mac.
I've managed to isolate the regression and opened Bug #547172 to address the issue.
New Gerrit change created: https://git.eclipse.org/r/141997
Gerrit change https://git.eclipse.org/r/141997 was merged to [master]. Commit: http://git.eclipse.org/c/m2e/m2e-core.git/commit/?id=fe5d8590794dc4a35a2eb12d9a1d694e7fb8bb10
Gerrit change https://git.eclipse.org/r/132656 was merged to [master]. Commit: http://git.eclipse.org/c/m2e/m2e-core.git/commit/?id=c345d47d4ac0d9e2335a0bb65e55ec02062acd60
Moved to https://github.com/eclipse-m2e/m2e-core/issues/