The "Complete for Eclipse" phase of Capilano Install Manager updates of the RAD 7.5 (3.4-based) product is very slow on 1GB machines. Some improvements came with the fix to bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=238312.

Here are some data taken on a 2GB RAM desktop and a 1GB RAM laptop. I tried this on my desktop, which has 2GB of RAM and is a 3GHz machine. These are the times I gathered from the IM log file; each step ran the same command:

C:\Program Files\IBM\SDP\eclipse.exe -application org.eclipse.equinox.p2.reconciler.application -product org.eclipse.equinox.p2.reconciler.dropins.product -nosplash --launcher.suppressErrors -vmArgs -Dosgi.bundles.defaultStartLevel=4

- Initial install of RAD 20080824_1401 on a clean system with default features selected - elapsed time 02:03.87
- Update to RAD 20080825_1442 - elapsed time 02:28.49
- Rollback to RAD 20080824_1401 - elapsed time 02:21.24
- Install of RSA 20080821_2306 (shell share) - elapsed time 02:44.25
- Update of RSA to 20080825_1125 - elapsed time 05:46.36

On my laptop, a T40 with only 1GB of RAM and a 1.6GHz processor, the update times for RAD were to the tune of 10 minutes. It looks like there is a lot of memory thrashing going on in this case. We discussed this performance issue in some detail with Pascal in the Capilano CCB meeting today.
There is a timing patch in bug 224579 that, when applied, provides timing info that would really help to identify where we're spending our time here. Would it be possible to use that to test?
I ran this on a P3-1GHz machine with 1 GB of RAM. I didn't observe any thrashing, due to the smaller memory size. It took 11.5 minutes to complete, consumed 7:19 minutes of CPU time, and read over 1.2GB of data. The peak working set was 591MB. (I wish I had seen the previous append about the patch before I ran this. :-) )
I tried to profile this by simply running with the -application org.eclipse.equinox.p2.reconciler.application -product org.eclipse.equinox.p2.reconciler.dropins.product arguments, but the slowdown only occurs the first time you run this (i.e. the profiled run was much quicker than the initial run, even though sampling was enabled). I'll try this again the next time I go through an update cycle.
(In reply to comment #2) > I ran this on a P3-1GHz machine with 1 GB of RAM. I didn't observe any > thrashing, due to the smaller memory size. Correction. I ran this on a P4-3GHz with 2 GB of RAM.
I took a YourKit profile of the Java process. If you are interested in getting it, let me know; it is 0.5MB zipped. I'm not sure that I caught the problem. This is what I did:
1) I did an update.
2) As soon as the java process started (to do the -application org.eclipse.equinox.p2.reconciler.application), I killed it.
3) I let the install finish.
4) I manually started the product with profiling turned on and with the -application org.eclipse.equinox.p2.reconciler.application -product org.eclipse.equinox.p2.reconciler.dropins.product parameters.
The process did read 1GB of data (which I think is the root of this problem), but it also ran quicker than I expected.
With the instrumented plugin for org.eclipse.equinox.p2.reconciler, these were the times I observed in the update-of-RAD scenario:

0ms - start
0ms - isUpToDate done
40ms - startEarly done
4271ms - getCurrentProfile done
4271ms - checkConfigIni done
4381ms - removeOldRepos done
4681ms - watchDropins done
4891ms - watchConfiguration done
4961ms - readTimestamps done
4961ms - isUpToDate false
8632ms - synchronizeCacheExtensions done
124361ms - createProfileChangeRequest done
138915ms - createProvisioningPlan done
506552ms - executePlan done
506776ms - writeTimestamps done
506776ms - applyConfiguration done
506776ms - synchronize done
506806ms - writeTimestamps done
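As a side note, the kind of per-phase elapsed-time logging shown above can be done with a very small helper. This is a minimal sketch of the idea, not the actual bug 224579 patch; the class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Records milliseconds elapsed since construction at each named phase,
// producing lines in the "NNNNms - phase" style seen in the log above.
public class PhaseTimer {
    private final long start = System.currentTimeMillis();
    private final List<String> log = new ArrayList<>();

    public void checkpoint(String phase) {
        log.add((System.currentTimeMillis() - start) + "ms - " + phase);
    }

    public List<String> entries() {
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseTimer t = new PhaseTimer();
        t.checkpoint("start");
        Thread.sleep(50); // stand-in for real work, e.g. getCurrentProfile
        t.checkpoint("getCurrentProfile done");
        t.entries().forEach(System.out::println);
    }
}
```

The value of dumping absolute elapsed times rather than per-phase deltas is that large jumps between consecutive lines point directly at the expensive phase.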
Thanks Nalini, that helps.

The first hiccup...
124361ms - createProfileChangeRequest done
...is a bit misleading: the time spent creating the profile change request is tiny; however, creating the request triggers the loading of the extension location repository, where I imagine the actual file updating has happened. In this case I believe we're taking a positively huge hit as a result of bug 244628. I'm seeing a similar time spent in the test case I constructed to mimic your environment, where I'm updating 2000 dummy bundles and in the process doing more than 1GB of writing.

The second major hiccup...
506552ms - executePlan done
...I'm still trying to sort out, but I suspect it is occurring in FrameworkAdmin when we try to resolve the bundle state repeatedly as bundles are updated. While profiling in the past I have seen a memory spike here, but in my current test case I'm actually not seeing such a dramatic time spent so far. I'm still playing around with restricting physical memory, though, so hopefully I will be able to reproduce it.
Created attachment 112112 [details]
3.4.x patch for bulk artifact removal (first hiccup)

Here's a patch for the first problem that should eliminate about 100s in your use-case. Rather than removing each artifact one-by-one when a change is required, the implementation keeps track of what has been removed and then rewrites the whole contents of the repository in one write to disk.
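The batching idea behind this patch can be sketched as follows. This is an illustrative model with made-up names, not the actual p2 repository code: removals are recorded in memory, and the surviving contents are serialized in a single pass instead of one disk write per removed artifact.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy repository that defers removals until save(), so N removals
// cost one full-contents write instead of N rewrites.
public class BatchedRepository {
    private final Map<String, String> artifacts = new LinkedHashMap<>();
    private final Set<String> pendingRemovals = new HashSet<>();
    private int diskWrites = 0; // counts full-repository serializations

    public void add(String key, String descriptor) {
        artifacts.put(key, descriptor);
    }

    // O(1) bookkeeping; no I/O happens per removal.
    public void markRemoved(String key) {
        pendingRemovals.add(key);
    }

    // One call flushes all pending removals to "disk".
    public void save() {
        artifacts.keySet().removeAll(pendingRemovals);
        pendingRemovals.clear();
        diskWrites++; // real code would serialize `artifacts` here
    }

    public int writeCount() { return diskWrites; }
    public int size() { return artifacts.size(); }
}
```

With N artifacts removed in one change, the old scheme rewrites the repository N times; this pattern rewrites it once, which is where the roughly 100s savings in the 2000-bundle test case came from.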
Created attachment 112414 [details] Further patch to RepositoryListener This patch builds on the previous patch to more efficiently serialize large metadata repositories when artifacts are both added and removed.
Created attachment 112417 [details]
Extension Location patch

This patch contains two optimizations in the extension location bundle:
1) It optimizes the "isInterested" method in the SiteListener when comparing the current file with those in the site's "list": we use a Set lookup instead of iterating over a list.
2) It optimizes the code path when calling ExtensionLocationMetadataRepositoryFactory.create to use "validate" instead of "load" when testing for the existence of the underlying LocalMetadataRepository. This allows us to avoid an extra load (and subsequent parse) of the repo contents.
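Optimization (1) is the classic list-to-set membership change. This sketch uses illustrative names, not the actual SiteListener code: the site's file list is copied once into a HashSet, turning each per-file membership check from an O(n) scan into an O(1) lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Answers "is this file on the site's list?" in constant time.
public class SiteFilter {
    private final Set<String> listed;

    public SiteFilter(List<String> siteList) {
        this.listed = new HashSet<>(siteList); // one-time O(n) copy
    }

    public boolean isInterested(String fileName) {
        return listed.contains(fileName); // O(1) instead of an O(n) scan
    }

    public static void main(String[] args) {
        SiteFilter f = new SiteFilter(Arrays.asList("a.jar", "b.jar"));
        System.out.println(f.isInterested("a.jar")); // true
        System.out.println(f.isInterested("c.jar")); // false
    }
}
```

When the listener is consulted once per file in a large dropins/extension location, the total cost drops from O(n*m) to O(n+m), which matters at the bundle counts discussed in this bug.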
Created attachment 112419 [details] rollback repo experimental patch While doing testing I ran into a situation where an inordinate amount of time was being spent working with the rollback repo. I was updating back and forth probably in excess of 50 times so the rollback repo became really large. This patch eliminates the use of the rollback repo when using the reconciler application and might be useful to eliminate a large rollback repo as a source of problems.
I want to set some expectations here. My original patch is unlikely to drop the time by 100s, since that was only the case when I was updating 2000 plugins. The real test case only updates 33 artifacts, so it doesn't really have the same characteristics. Even with all patches installed I saw only a relatively small improvement in performance, on the order of 15%. My simulated low-memory environment never produced the sort of results logged here, so I would be curious to see what impact the patches have.
I finally got some results that are more in line with what you're seeing by reducing my available memory down close to zero. Here are the results I saw:

16ms - start
78ms - isUpToDate done
391ms - startEarly done
4687ms - getCurrentProfile done
4687ms - checkConfigIni done
5312ms - removeOldRepos done
6125ms - watchDropins done
30141ms - watchConfiguration done
30281ms - readTimestamps done
30281ms - isUpToDate false
30328ms - synchronizeCacheExtensions done
30516ms - createProfileChangeRequest done
42031ms - createProvisioningPlan done
573375ms - executePlan done
573500ms - writeTimestamps done
573500ms - applyConfiguration done
573500ms - synchronize done
573562ms - writeTimestamps done

I also captured a profiler snapshot, so I could get a pretty good idea of where we're spending our time. In a nutshell, it's spent reading zip files and, to a lesser degree, reading and writing XML. This happens when scanning the extension location to create the extension location repository and when reading and writing the profile, but mostly in framework admin when validating and updating the bundles in an install. CPU utilization for the most part is very low (<5%). This isn't all that surprising, as in a memory-starved environment we're spending our time re-reading and writing blocks to and from disk instead of doing work. To have a significant impact on this problem we would have to either substantially re-think the way framework admin works or, more likely, use it more effectively. I don't think either of those are changes we could make easily in the 3.4 stream without significantly de-stabilizing the installation process as a whole. As a result, I'm going to suggest we defer changes in this area until 3.5.
(In reply to comment #13) If your machine has low memory and is thrashing, then I would expect the times to be bad, but I don't think that is the only problem here. I am still finding that this phase takes a fairly long time, even on fast machines with lots of memory. I just finished an update using a Q9300 (quad-core machine) with 4GB of RAM. This phase of the update took 4.5 minutes of elapsed time, consumed 2:14 minutes of CPU time, read over 750MB of data, and hit a peak working set of 763MB.
(In reply to comment #13)
> CPU utilization for the most part is very low (<5%).

(In reply to comment #14)
> This phase of the update took 4.5 minutes of elapsed time. It consumed 2:14 minutes of CPU time.

These two comments don't seem to agree.
(In reply to comment #15) In my case it is a quad processor, so 4.5 minutes of elapsed time means 4.5x4 = 18 available CPU-minutes; consuming 2:14 of those works out to about 12.5% of the CPU on average. Also, in my case it was not memory constrained, so there would be much less paging activity. Given that Simon's machine was doing a lot of paging, I would expect his CPU utilization to be lower than mine.
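For the record, the utilization arithmetic above is: elapsed time times core count gives the available CPU-minutes, and consumed CPU time divided by that gives average utilization. A quick check (2:14 is about 2.23 minutes; the exact result is about 12.4%, which the comment rounds to 12.5%):

```java
// Average CPU utilization = consumed CPU time / (elapsed time * cores).
public class CpuUtil {
    public static double utilization(double elapsedMin, int cores, double cpuMin) {
        return cpuMin / (elapsedMin * cores);
    }

    public static void main(String[] args) {
        // 4.5 min elapsed on 4 cores, with 2:14 (= 2 + 14/60 min) of CPU time
        double u = utilization(4.5, 4, 2.0 + 14.0 / 60.0);
        System.out.printf("%.1f%%%n", u * 100); // prints 12.4%
    }
}
```

By the same formula, the <5% figure from comment #13 is consistent with a heavily paging machine where the process is blocked on disk I/O most of the time.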
The bug title described an uninstall perf problem ("Complete for Eclipse phase of uninstall is very slow on 1GB mem machines"), but the discussion is only of install and update, so I changed the bug title to reflect the content.
Created attachment 116137 [details]
Patch to only call manipulator load and save once per operation

This patch should reduce the time spent executing the plan by only reading/writing the bundles.info once.
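The pattern behind this patch can be sketched as follows. The names here are illustrative, not the actual FrameworkAdmin/Manipulator API: configuration state is loaded once at the start of an operation, every bundle change is applied in memory, and the state is saved once at the end, instead of round-tripping bundles.info for each bundle.

```java
import java.util.ArrayList;
import java.util.List;

// Toy "manipulator" that brackets a whole operation with one load
// and one save of the configuration, rather than one pair per bundle.
public class ManipulatorSession {
    private final List<String> bundlesInfo = new ArrayList<>();
    private int loads = 0, saves = 0;

    public void load() { loads++; /* real code would parse bundles.info */ }
    public void save() { saves++; /* real code would write bundles.info */ }

    // All mutations happen against the in-memory state.
    public void addBundle(String id) { bundlesInfo.add(id); }

    public void applyPlan(List<String> bundlesToAdd) {
        load();                              // one read per operation
        bundlesToAdd.forEach(this::addBundle);
        save();                              // one write per operation
    }

    public int loadCount() { return loads; }
    public int saveCount() { return saves; }
}
```

With N bundles touched by a plan, this turns 2N file round-trips into 2, which is exactly the kind of repeated-I/O cost the executePlan timings above were dominated by.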
Closing as fixed.