Bug 329386 - Optimize manifest TouchPointData memory footprint for MetadataRepositories
Summary: Optimize manifest TouchPointData memory footprint for MetadataRepositories
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2 (show other bugs)
Version: unspecified   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: 3.7 M4   Edit
Assignee: Dean Roberts CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks: 329384 330463 333894
  Show dependency tree
 
Reported: 2010-11-03 14:38 EDT by Dean Roberts CLA
Modified: 2011-01-10 13:31 EST (History)
7 users (show)

See Also:


Attachments
(Patch 1) Don't write manifest data during build (3.79 KB, patch)
2010-11-17 09:38 EST, Dean Roberts CLA
no flags Details | Diff
(Patch 2) Install without manifest data (includes tests) (22.26 KB, patch)
2010-11-17 09:39 EST, Dean Roberts CLA
no flags Details | Diff
(Patch 3) Ignore existing manifest data when reading to memory (3.38 KB, patch)
2010-11-17 09:40 EST, Dean Roberts CLA
no flags Details | Diff
Current version and xml tolerence change suitable for HEAD (3.7) (2.33 KB, patch)
2010-11-17 09:40 EST, Dean Roberts CLA
no flags Details | Diff
XML tolerence change suitable for 3.6 branch (2.25 KB, text/plain)
2010-11-17 09:44 EST, Dean Roberts CLA
no flags Details
Build and install from repositories without manifest data (26.29 KB, patch)
2010-11-17 15:38 EST, Dean Roberts CLA
no flags Details | Diff
Optional - Do not read existing manifest repo data into memory (3.68 KB, patch)
2010-11-17 15:40 EST, Dean Roberts CLA
no flags Details | Diff
Touchpoint no longer reads manifest from touchpoint data (22.60 KB, patch)
2010-12-02 15:16 EST, DJ Houghton CLA
no flags Details | Diff
Publisher only publishes required attributes (2.16 KB, patch)
2010-12-02 16:49 EST, DJ Houghton CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dean Roberts CLA 2010-11-03 14:38:43 EDT
Build Identifier: 

In certain (typical?) repositories such as the Helios repository manifest TouchPointData represents a significant contribution to memory footprint.  Since each manifest entry in the repository is unique string interning does not provide any savings.

Strategies for moving the manifests out of memory, or perhaps out of the repository in general need to be investigated and implemented.



Reproducible: Always
Comment 1 Pascal Rapicault CLA 2010-11-05 04:41:48 EDT
The manifest got originally added because we thought that the eclipse touchpoint could do additional verification as part of the installation. For example, we thought it could load the state of the framework, extract the information from the bundle, build a new state and do some checks.
This never got implemented and will probably never be.

What kind of saving are we talking about by doing that?
Comment 2 Dean Roberts CLA 2010-11-05 08:06:49 EDT
For Helios the savings should be about 10 meg or 35%.  For an Eclipse I-build repository the savings is about 1 Meg or 13%

The detailed analysis of the savigns of this and two other complementary strategies can be found in the parent bug at https://bugs.eclipse.org/bugs/show_bug.cgi?id=329384

I made a few code changes that have allowed me to install an SDK using a modified Helios repository with no manfifest data.  Snif test seems to indicate it is ok.  Considerably more testing is required.

The on disk savings of a single child repository from Helios without manifest data is minimal shrinking 90 KB in compressed form and 700 KB in uncompressed form.
Comment 3 Jeff McAffer CLA 2010-11-08 13:21:02 EST
It would be interesting to know about in-memory space and loading time numbers.  Disk space is IMHO a secondary topic.
Comment 4 Dean Roberts CLA 2010-11-08 13:27:11 EST
I agree.  The disk space number was just thrown in there for interests sake.    The nubmers sumerized in comment #2 and listed in detail in the referenced parent bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=329384 I believe are the numbers you are interested in.
Comment 5 Pascal Rapicault CLA 2010-11-09 22:34:51 EST
Setting target since I understand that you will be fixing this in 3.7.
Comment 6 Dean Roberts CLA 2010-11-16 10:09:07 EST
(Comment originally made in incorrect defect)

DJ, John and myself had a discussion about the compatibility impact of proposed
changes to address this defect.

There will be three patches associated with this change.

Patch 1) Change code such that manifest data is not persisted to repositories
         during a build and increment the repository version so that 
         unpatched products don't fail during install with a missing manifest 
         error.

Patch 2) Modify install code such that manifest data is not required during
         install and increase the required version range so that patched 
         products can read the new repository version.

Patch 3) Modify code so that manifest data contained in repositories created 
         by unpatched products is not persisted in memory.

During the discussion there was an argument made for omitting patch 3.  Without
patch 3 the memory savings will only be enjoyed by new Eclipse versions reading
repositories created by new Eclipse versions. 

The primary motivation for omitting patch 3 was a concern over repositories
containing IUs with the same ID and version having but having different
physical representations on disk.  This happens, for example, when a patched
eclipse mirrors a repository with manifest data in it.  The repository is read
but the manifest data is ignored.  When the in memory IU is persisted it is
persisted without the manifest data.

Personally I believe we should include patch 3 but welcome input from the
community.  My arguments for inclusion are:

1) The change is minor (1 line)

2) What is important is the model representation of the IU.  With the patches
   in place the manifest data is ignored.  Thus the IU is conceptually 
   identical regardless of whether it was read from bytes that contained or 
   did not contain manifest information.

3) I believe a new Eclipse reading older repositories will not be a rare 
   occurance and the memory savings should be as widely available as possible.

Assuming we proceed, the following staging is proposed.

1) Release patch 1, 2 and 3 to 3.7 HEAD
2) Back port patch 2 to older streams.  3.6 for sure ... how far back do
   we want to go?  Suggestions please.

Patches 1 and 3 could never be released to older Eclipse streams since we would
not want to end up in a situation where repositories built by a 3.6.x Eclipse
would be unable to update a 3.6.(x - n) product.  Presumably particular
customers could take these patches if they where required and the ramifications
understood.

Once I add repository version number changes and checking and get some feed
back from this comment I will attach the three patches discussed.
Comment 7 Dean Roberts CLA 2010-11-16 10:09:59 EST
(Pasting comment into correct defect)

To be clear, the changes that Dean proposes in Patch 3 would only benefit 3.7
clients who are reading pre-3.7 repositories. 

Pros: Allows 3.7 clients to read older repositories. Could be important to
products built on Eclipse.
Cons: Questions about bundle uniqueness. When doing mirroring, etc you can end
up in a state with 2 IUs with the same id and version but different touchpoint
data. Is this invalid?

Comments from people about their use in delivering repositories would be
helpful in deciding whether or not to release this part of the code.
Comment 8 Dean Roberts CLA 2010-11-17 09:38:02 EST
Created attachment 183295 [details]
(Patch 1) Don't write manifest data during build
Comment 9 Dean Roberts CLA 2010-11-17 09:39:09 EST
Created attachment 183296 [details]
(Patch 2) Install without manifest data (includes tests)
Comment 10 Dean Roberts CLA 2010-11-17 09:40:03 EST
Created attachment 183297 [details]
(Patch 3) Ignore existing manifest data when reading to memory
Comment 11 Dean Roberts CLA 2010-11-17 09:40:55 EST
Created attachment 183299 [details]
Current version and xml tolerence change suitable for HEAD (3.7)
Comment 12 Dean Roberts CLA 2010-11-17 09:44:33 EST
Created attachment 183300 [details]
XML tolerence change suitable for 3.6 branch

This patch is suitable for any older Eclipse branch that receives Patch 2.  It will allow that branch to read repositories written without manifest data.
Comment 13 DJ Houghton CLA 2010-11-17 11:21:53 EST
Patch 1:
- We can't just remove the methods on BundleInfo, they are API. If we need to remove them or don't feel they are useful anymore, then we need to come up with a strategy to roll this change out to users.
- I don't know the code well, but I would have expected the code change in BundlesAction in Patch 2 to be a part of this patch.

Patch 3:
- you are making a constant into API in MetadataFactory. Is this intentional?
- I don't understand the reason for the code change in TouchpointDataHandler#getTouchpointData
Comment 14 DJ Houghton CLA 2010-11-17 11:22:58 EST
Also, I have opened bug 330463 to track any changes we want to release into the R3.6.x stream. Once we decide on which changes and their shape, please attach patches there. Thanks.
Comment 15 DJ Houghton CLA 2010-11-17 13:20:07 EST
Also I believe the patches are missing the code which increments the metadata repository version when writing it out, and increases the tolerance when reading it in.
Comment 16 DJ Houghton CLA 2010-11-17 13:20:51 EST
Sorry, ignore comment #15. That seems to be the content of Patch 4.
Comment 17 Dean Roberts CLA 2010-11-17 13:37:10 EST
The partitioning of the patches is discussed in comment #6.  

Patch 1 and Patch 2 are not combined because Patch 1 prevents the build from
writing manifest data to new repositories.  As such, Patch 1 is not intended
for older Eclipse streams.

We could leave the manifest field and its getters and setters on BundleInfo,
they would just never be called by any of our code after the patches are
applied.  But I would worry that an API user would make the calls and then
expect something to happen to the information.  I don't really know how to
proceed here and bow to your experience.

Patch 3

The change in TouchpointDataHandler#getTouchpointData prevents the creation of
of an extra null element in the ITouchPointData[] of TouchPoints that have
manifest data.  The original code reads the size attribute in the xml and
creates an ITouchPointData[].  However, in the new code we do not persist the
manifest instruction when we come across it.

Making the constant public was expedient for checking for an empty instruction.
 However, I did not appreciate the API change.  I could revert that change and
do a different empty check if you think that is more appropriate.
Comment 18 Dean Roberts CLA 2010-11-17 15:38:11 EST
Created attachment 183343 [details]
Build and install from repositories without manifest data

This single patch contains all manifest changes for head EXCEPT the optional patch of not reading existing manifest data from disk into memory.
Comment 19 Dean Roberts CLA 2010-11-17 15:40:11 EST
Created attachment 183344 [details]
Optional - Do not read existing manifest repo data into memory

We still need to decide if we want to include this part of the patch.  It will reduce memory footprint for new Eclipses reading repositories created by old Eclipses at the expense of possible confusion with IUs with same version and ID but different on disk representations.
Comment 20 Pascal Rapicault CLA 2010-11-20 14:25:59 EST
I looked at the patch, overall it looks good (some minor comments below). However I have a general question on how we want to handle the migration from any of the 3.6 versions to 3.7 if we are removing the manifest from the metadata repositories for 3.7.

Comments on the patch:
In the Util class, the code setting the BundleInfo version should use the version of the capability instead of the version of the IU. The modified code looks something like that:
			if (nameSpace.equals("osgi.bundle")) { //$NON-NLS-1$
				bundleInfo.setSymbolicName(capability.getName());
				bundleInfo.setVersion(capability.getVersion().toString()); <<-- NEW LINE
			} else if (nameSpace.equals("osgi.fragment")) { //$NON-NLS-1$
				String fragmentName = capability.getName();
				bundleInfo.setFragmentHost(getFragmentHost(unit, fragmentName));
				bundleInfo.setVersion(capability.getVersion().toString());
			}

Log an exception if we can't find the fragment host. I know this should not happen because we should have good metadata, but I think it worth doing the check.
Comment 21 DJ Houghton CLA 2010-12-02 15:10:25 EST
We have discussed this further and have decided to proceed with the following in HEAD:
1. release code in the eclipse touchpoint so it doesn't read the manifest from the touchpoint data
2. do NOT rev the repository version
3. do NOT change the metadata repository I/O code
4. change the publisher so rather than publishing the whole manifest, it only publishes the legacy attributes required by 1.

I will attach/release modified patches based on this desired behaviour.
Comment 22 DJ Houghton CLA 2010-12-02 15:16:14 EST
Created attachment 184394 [details]
Touchpoint no longer reads manifest from touchpoint data

Here is a patch (and changes to the tests) which makes the Eclipse touchpoint no longer read the manifest from the touchpoint data. It instead gets values (to create bundle info objects) from the IU itself. I have modified the patch from the original to include update copyrights and also include the changes outlined by Pascal in comment #20.
Comment 23 DJ Houghton CLA 2010-12-02 16:49:56 EST
Created attachment 184414 [details]
Publisher only publishes required attributes

Here is a patch which modifies the publisher so it only puts the bundle id, bundle version, and fragment host in the manifest stored in the touchpoint data. These are the only 3 properties required by old versions of the eclipse touchpoint to create BundleInfo objects.
Comment 24 DJ Houghton CLA 2010-12-02 16:50:49 EST
Both patches are released to HEAD.
Closing.
Comment 25 Dean Roberts CLA 2010-12-07 13:33:24 EST
The patches as finally released have no memory savings for a 3.7 Eclipse reading a 3.6 repository.

Benchmarking of the milestone candidate on 3.7 integration build repo shows a peak heap alloc of 84 MB and a retained alloc of 38 MB.

This is compared to the patch as originally submitted which shows a peak alloc of 37 MB and a retained alloc of 19 MB.

I find these numbers a bit disappointing.
Comment 26 Jeff McAffer CLA 2010-12-07 22:12:12 EST
disappointing yes.  Is there something we can do about that?  What where did we lose out?

For the record, what are the numbers from before the patch?  That is, what savings do we get to write home about with the current fix?  Did we get 80% of the savings? or 20%
Comment 27 Dean Roberts CLA 2010-12-08 09:11:52 EST
Actually it turns out the shared IU patch did not go into this release, so that accounts for some of the difference.

The manifest change that went in is a compromise required so that a 3.6 Eclipse can read a 3.7 repository.  We do expect to see better memory savings when a 3.7 Eclipse reads a 3.7 repository since the 3.7 repository will  have less manifest data stored in it.
Comment 28 Jeff McAffer CLA 2010-12-08 21:14:06 EST
OK. So then it will be interesting to see the numbers when loading the Indigo M4 repo. If I understand right, that should have the reduced manifest effect and thus the smaller footprint.  That combined with the upcoming IU sharing changes should make for noteworthy numbers.
Comment 29 Dean Roberts CLA 2010-12-09 10:42:10 EST
Well not exactly,

I checked with Kim and builds are done with a builder based on version n-1, where n is the Milestone being built.  So the M4 Indigo M4 milestone still has all the manifest data.  Additionally, even if M4 had the new minimum manifest data, it is still not an interesting repository since all earlier milestones where built with older builders, and would thus have maximum manifest data.

That being said, Kim was able to provide me with an M4 test build that she did with all the new bundles, so a repo with minimum manifest data.

To get some kind of measurement, I took this repo, duplicated it 10 times and hand crafted a composite repo containing the 10 copies.  I then took the M4 candidate build with maximum manifest data and did the same thing.

I then used the M4 candidate build to run my test repo loads on both of these repositories and analyzed the YourKit memory numbers.  This simulates a 3.7 build reading both a 3.7 repository and a 3.6 repository.

However, these results had one further problem.  Since the manifests in all 10 copies of the fake composite repo are indentical, the new StringPool masks the savings provided by the manifest fix.  Recall that the manifest fix is most helpful with large repositories of unrepeated IUs such as the Helios repository.

As a hack to simulate reading large unique repositories I modified the String.intern code to not intern the bundle manifest strings.

So you could argue that, at this point, any memory numbers I claim are meaningless ... but I think the result is a reasonable aproximation of the memory savings you will see reading a composite repository without large amounts of repeated data.

And the results are a peak heap usage decrease from 92 MB to 71 MB (23%), which leads to an allocated heap reduction of 128 MB to 100 MB (22%).  And once garbage collection is forced we see a reduction in retained memory of 48 MB to 39 MB (19%).

So is there time and do we want to report these kinds of numbers in New and Noteworthy.  I can get them to John if we do.

Just for the interested, once we add the Shared IUs fix to this kind of repository we will see a 75% reduction in retained memory.