Bug 246444 - Complete for Eclipse phase of install/update is very slow on 1GB mem machines
Summary: Complete for Eclipse phase of install/update is very slow on 1GB mem machines
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2 (show other bugs)
Version: 3.4   Edit
Hardware: PC Windows XP
: P3 major (vote)
Target Milestone: 3.4.2   Edit
Assignee: Simon Kaegi CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2008-09-05 18:04 EDT by Stuart Frische CLA
Modified: 2008-10-25 15:02 EDT (History)
11 users (show)

See Also:


Attachments
3.4.x patch for bulk artifact removal (first hiccup) (1.90 KB, patch)
2008-09-09 14:04 EDT, Simon Kaegi CLA
no flags Details | Diff
Further patch to RepositoryListener (3.79 KB, patch)
2008-09-12 09:19 EDT, Simon Kaegi CLA
no flags Details | Diff
Extension Location patch (8.35 KB, patch)
2008-09-12 09:51 EDT, Simon Kaegi CLA
no flags Details | Diff
rollback repo experimental patch (1.13 KB, patch)
2008-09-12 09:55 EDT, Simon Kaegi CLA
no flags Details | Diff
Patch to only call manipulator load and save once per operation. (3.76 KB, patch)
2008-10-25 14:36 EDT, Pascal Rapicault CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Stuart Frische CLA 2008-09-05 18:04:24 EDT
The Complete for Eclipse phase of Capilano Install Manager updates of the RAD 7.5 (3.4 based)product are very slow on 1GB machines.  Some improvements in this came with the fix to bugzilla https://bugs.eclipse.org/bugs/show_bug.cgi?id=238312. 

++++
Here are some data taken for a 2GB RAM Desktop and a 1GB RAM Laptop.

I tried this on my desktop which has 2 GB of RAM and is a 3GHz machine.
These are the times I gathered from the IM log file -
Initial Install of RAD 20080824_1401 on a clean system with default features selected - Elapsed time 02:03.87 for: C:\Program Files\IBM\SDP\eclipse.exe -application org.eclipse.equinox.p2.reconciler.application -product org.eclipse.equinox.p2.reconciler.dropins.product -nosplash --launcher.suppressErrors -vmArgs -Dosgi.bundles.defaultStartLevel=4

Update to RAD 0080825_1442 - Elapsed time 02:28.49 for: C:\Program Files\IBM\SDP\eclipse.exe -application org.eclipse.equinox.p2.reconciler.application -product org.eclipse.equinox.p2.reconciler.dropins.product -nosplash --launcher.suppressErrors -vmArgs -Dosgi.bundles.defaultStartLevel=4

Rollback to RAD 20080824_1401 - Elapsed time 02:21.24 for: C:\Program Files\IBM\SDP\eclipse.exe -application org.eclipse.equinox.p2.reconciler.application -product org.eclipse.equinox.p2.reconciler.dropins.product -nosplash --launcher.suppressErrors -vmArgs -Dosgi.bundles.defaultStartLevel=4

Install of RSA 20080821_2306 - shell share - Elapsed time 02:44.25 for: C:\Program Files\IBM\SDP\eclipse.exe -application org.eclipse.equinox.p2.reconciler.application -product org.eclipse.equinox.p2.reconciler.dropins.product -nosplash --launcher.suppressErrors -vmArgs -Dosgi.bundles.defaultStartLevel=4

Update of RSA to 20080825_1125 - Elapsed time 05:46.36 for: C:\Program Files\IBM\SDP\eclipse.exe -application org.eclipse.equinox.p2.reconciler.application -product org.eclipse.equinox.p2.reconciler.dropins.product -nosplash --launcher.suppressErrors -vmArgs -Dosgi.bundles.defaultStartLevel=4

++++
On my laptop T40 that has only 1G of RAM and is 1.6GHz machine, the update times for RAD were to the tune of 10 minutes. Looks like there is a lot of thrashing in memory going on in this case.  

We discussed this performance issue in some detail with Pascal in the Capilano CCB meeting today.
Comment 1 Simon Kaegi CLA 2008-09-05 18:31:46 EDT
There is a timing patch in bug 224579 that when applied provides timing info that would really help to identify where we're spending our time here. Would it be possible to use that to test.
Comment 2 Gary Karasiuk CLA 2008-09-08 07:32:59 EDT
I ran this on a P3-1GHz machine with 1 GB of RAM. I didn't observe any thrashing, due to the smaller memory size.

It took 11.5 minutes to complete. It consumed 7:19 minutes of CPU time, and read over 1.2GB bytes of data. The peak working set was 591MB.

(I wish I would have seen the previous append about the patch before I ram this :-) ) 
Comment 3 Gary Karasiuk CLA 2008-09-08 08:07:17 EDT
I tried to profile this by simply running with the
 
  -application org.eclipse.equinox.p2.reconciler.application 
  -product org.eclipse.equinox.p2.reconciler.dropins.product

arguments, but the slowdown only occurs the first you run this. (i.e. the profiled run was much quicker than the initial run, even through sampling was enabled).

I'll try this again the next time I go through an update cycle. 
Comment 4 Gary Karasiuk CLA 2008-09-08 10:06:16 EDT
(In reply to comment #2)
> I ran this on a P3-1GHz machine with 1 GB of RAM. I didn't observe any
> thrashing, due to the smaller memory size.
Correction. I ran this on a P4-3GHz with 2 GB of RAM. 
Comment 5 Gary Karasiuk CLA 2008-09-08 11:12:54 EDT
I took a Yourkit profile of the Java process. If you are interested in getting it let me know, it is .5 MB zipped.

I'm not sure that I caught the problem. This is what I did.

1) I did an update.
2) As soon as the java process started (to do the -application org.eclipse.equinox.p2.reconciler.application) I killed it.
3) I let the install finish.
4) I manually started the product, with profiling turned on and with the 
 -application org.eclipse.equinox.p2.reconciler.application 
 -product org.eclipse.equinox.p2.reconciler.dropins.product
parameters.

The process did read 1GB of data (which I think is the root of this problem), but it also ran quicker than I expected. 
Comment 6 Nalini Ganapati CLA 2008-09-08 15:02:16 EDT
With the instrumented plugin for org.eclipse.equinox.p2.reconciler, these were the times I observed in the update of RAD scenario -

0ms - start
0ms - isUpToDate done
40ms - startEarly done
4271ms - getCurrentProfile done
4271ms - checkConfigIni done
4381ms - removeOldRepos done
4681ms - watchDropins done
4891ms - watchConfiguration done
4961ms - readTimestamps done
4961ms - isUpToDate false
8632ms - synchronizeCacheExtensions done
124361ms - createProfileChangeRequest done
138915ms - createProvisioningPlan done
506552ms - executePlan done
506776ms - writeTimestamps done
506776ms - applyConfiguration done
506776ms - synchronize done
506806ms - writeTimestamps done
Comment 7 Simon Kaegi CLA 2008-09-08 21:52:03 EDT
Thanks Nalini that helps.

The first hiccup... 124361ms - createProfileChangeRequest done

...is a bit misleading as the time spent creating a profile change request is tiny however in creating the request it triggers the loading of the extension location repository where I imagine the actual file updating has happened.

In this case I believe we're taking a positively huge hit as a result of bug 244628. I'm seeing a similar time spent in the test case I constructed to mimic your environment where I'm updating 2000 dummy bundles and in the process doing more than 1GB of writing.

The second major hiccup... 506552ms - executePlan done
...I'm still trying to sort out but suspect that it is occuring in FrameworkAdmin when we try to resolve the bundle state repeatedly as bundles are updated. While profiling in the past I have seen a memory spike here but in my current test case I'm actually not seeing such a dramatic time spent so far. I'm still playing around with restricting physical memory though so hopefully will be able to reproduce.
Comment 8 Simon Kaegi CLA 2008-09-09 14:04:18 EDT
Created attachment 112112 [details]
3.4.x patch for bulk artifact removal (first hiccup)

Here's a patch for the first problem that should eliminate about 100s in your use-case. Rather than removing each artifact one-by-one when a change is required it the implementation keeps track of what's been removed and then just rewrites the whole contents of the repository in one write to disk.
Comment 9 Simon Kaegi CLA 2008-09-12 09:19:50 EDT
Created attachment 112414 [details]
Further patch to RepositoryListener

This patch builds on the previous patch to more efficiently serialize large metadata repositories when artifacts are both added and removed.
Comment 10 Simon Kaegi CLA 2008-09-12 09:51:18 EDT
Created attachment 112417 [details]
Extension Location patch

This patch contains two optimzations in the extension location bundle:
1) optimizes the "isInterested" method in the SiteListener when comparing the current file with those in the site's "list". We use a Set lookup instead of iterating over a list.
2) optimizes the code path when calling ExtensionLocationMetadataRepositoryFactory.create to use "validate" instead of "load" when testing for the existence of the underlying LocalMetadataRepository. This allows us to avoid an extra load (and subsequent parse)of the repo contents.
Comment 11 Simon Kaegi CLA 2008-09-12 09:55:54 EDT
Created attachment 112419 [details]
rollback repo experimental patch

While doing testing I ran into a situation where an inordinate amount of time was being spent working with the rollback repo. I was updating back and forth probably in excess of 50 times so the rollback repo became really large. This patch eliminates the use of the rollback repo when using the reconciler application and might be useful to eliminate a large rollback repo as a source of problems.
Comment 12 Simon Kaegi CLA 2008-09-12 10:09:16 EDT
I want to set some expectations here. My original patch is unlikely to drop the time by 100s since that was only the case when I was updating 2000 plugins. The real test case only updates 33 artifacts so doesn't really have the same characteristics.

Even with all patches installed I saw only a relatively small improvement in performance. On the order of 15%. My simulated low memory environment never produced the sort of results logged here so I would be curious to see what impact they have.
Comment 13 Simon Kaegi CLA 2008-09-16 20:56:39 EDT
I finally got some results that are more inline with what you're seeing by reducing my available memory down close to zero.  Here are the results I saw:

16ms - start
78ms - isUpToDate done
391ms - startEarly done
4687ms - getCurrentProfile done
4687ms - checkConfigIni done
5312ms - removeOldRepos done
6125ms - watchDropins done
30141ms - watchConfiguration done
30281ms - readTimestamps done
30281ms - isUpToDate false
30328ms - synchronizeCacheExtensions done
30516ms - createProfileChangeRequest done
42031ms - createProvisioningPlan done
573375ms - executePlan done
573500ms - writeTimestamps done
573500ms - applyConfiguration done
573500ms - synchronize done
573562ms - writeTimestamps done

I also captured a profiler snapshot so could get a pretty good idea on where we're spending our time; in a nutshell it's spent reading zip files and to a lesser degree reading and writing xml. This is done both when scanning the extension location to create the extension location repository, reading and writing the profile but mostly in framework admin when validating and updating the bundles in an install.

CPU utilization for the most part is very low (<5%). This isn't all that surprising as in a memory starved environment we're spending our time re-reading and writing blocks to and from disk instead of doing work.

To have a significant impact on this problem we would have to either substantially re-think the way framework admin works or more likely use it more effectively. I don't think either are changes we could make easily in the 3.4 stream without significantly de-stabilizing the installation process as a whole. As a result I'm going to suggest we defer changes in this area until 3.5.
Comment 14 Gary Karasiuk CLA 2008-09-17 06:36:58 EDT
(In reply to comment #13)
If your machine has low memory and is thrashing, then I would expect the times to be bad, but I don't think that that is the only problem here.

I am still finding that this phase takes a fairly long time, even on fast machines with lots of memory. I just finished an update using a Q9300 (Quad core machine) with 4 GB of RAM.  

This phase of the update took 4.5 minutes of elapsed time. It consumed 2:14 minutes of CPU time, read over 750MB of data, and hit a peak working set of 763 MB. 

Comment 15 Mike Wilson CLA 2008-09-17 14:18:37 EDT
(In reply to comment #13)
> CPU utilization for the most part is very low (<5%).
>

(In reply to comment #14)
> This phase of the update took 4.5 minutes of elapsed time. It consumed 2:14
> minutes of CPU time, 
>

These two comments don't seem to agree.
Comment 16 Gary Karasiuk CLA 2008-09-18 10:55:40 EDT
(In reply to comment #15)
In my case it is a quad processor so there are 4.5x4=18 CPU minutes. So it consumed on average 12.5% of the CPU.

Also in my case it was not memory constrained, so there would be much less paging activity. Given that Simon's machine was doing a lot of paging, I would expect his CPU utilization to be lower than mine.

Comment 17 Dave Stevenson CLA 2008-09-23 17:31:26 EDT
Bug title described an uninstall perf problem:

   Complete for Eclipse phase of uninstall is very slow on 1GB mem machines

but the discussion is only of install and update, so changed bug title to reflect content.
Comment 18 Pascal Rapicault CLA 2008-10-25 14:36:53 EDT
Created attachment 116137 [details]
Patch to only call manipulator load and save once per operation.

This patch should reduce the time spent in executing the plan, by only reading / writing the bundles.info once.
Comment 19 Pascal Rapicault CLA 2008-10-25 15:02:52 EDT
Closing as fixed.