Bug 329384 - Reduce memory footprint of Composite Repositories
Summary: Reduce memory footprint of Composite Repositories
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2
Version: unspecified
Hardware: All All
Importance: P3 normal
Target Milestone: 3.7 M5
Assignee: Dean Roberts CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on: 324873 329385 329386 330463 331762
Blocks: 333894
 
Reported: 2010-11-03 14:25 EDT by Dean Roberts CLA
Modified: 2011-01-19 17:01 EST
Description Dean Roberts CLA 2010-11-03 14:25:00 EDT
Build Identifier: 

I have been doing some memory footprint profiling on composite repositories.  So far I have found three strategies that should produce significant footprint savings.

For profiling I have been considering two different composite repository scenarios.

1) Helios.  A repository with few duplicate IUs but many, many IUs in general.
2) Eclipse I-build repository just before 3.7 M3 was declared.  Many duplicate IUs.

I would be interested in looking at problematic repositories whose characteristics differ from these.  If anybody in the community has repositories they think should be considered, I would like to hear about them.

The three footprint reduction strategies that seem immediately promising are:

1) Sharing IUs across a CompositeRepository's children.
2) Sharing a StringPool across a CompositeRepository's children.
3) Optimizing manifest TouchPointData.
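
A minimal sketch of the idea behind strategy 1, assuming two IUs can be treated as the same unit when their id and version match. `Unit` and `UnitPool` are hypothetical names used for illustration only; they are not part of the p2 API, and the real implementation may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an installable unit (not the p2 interface).
final class Unit {
    final String id;
    final String version;

    Unit(String id, String version) {
        this.id = id;
        this.version = version;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Unit)) {
            return false;
        }
        Unit that = (Unit) other;
        return id.equals(that.id) && version.equals(that.version);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, version);
    }
}

// Hypothetical pool shared by all children of one composite repository.
final class UnitPool {
    private final Map<Unit, Unit> canonical = new HashMap<>();

    // Returns the first instance seen that equals the candidate, so
    // duplicates loaded by different children collapse to one object.
    Unit share(Unit candidate) {
        Unit existing = canonical.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    int size() {
        return canonical.size();
    }
}
```

Each child would pass every unit it loads through `share` and keep only the returned reference, which is why a repository with many duplicates (the I-build case) would be expected to benefit most.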

After loading a composite repository and forcing a garbage collection, heap size had increased by the following:
	Helios:	36 Meg
	I-Build:	28 Meg
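
For readers who want a rough version of this kind of measurement without a profiler, one option is to compare used heap before and after the load, requesting a collection each time. This is a sketch only: `HeapProbe` is a hypothetical helper, `System.gc()` is merely a hint to the JVM, and the figures above were not necessarily obtained this way.

```java
// Hypothetical helper for coarse heap measurements.
final class HeapProbe {
    // Requests a collection, then reports heap currently in use, in bytes.
    static long usedHeapAfterGc() {
        Runtime runtime = Runtime.getRuntime();
        System.gc(); // a request only; the JVM may ignore it
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapAfterGc();
        // Stand-in for the object graph created by loading a repository.
        byte[] payload = new byte[8 * 1024 * 1024];
        long after = usedHeapAfterGc();
        // payload is used after the second measurement so it stays reachable.
        System.out.println("payload bytes: " + payload.length);
        System.out.println("approximate growth in bytes: " + (after - before));
    }
}
```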

Testing and analysis indicate that the strategies should produce approximately the following savings:
	
1) Shared IUs
	Helios	8 Meg
	I-Build	21 Meg

2) Adding optimized manifest TouchPointData on top of shared IUs
	Helios 10 Meg additional savings
	I-Build  1 Meg additional savings

3) Adding shared StringPool on top of #1 & #2
	Helios 3 Meg additional savings
	I-Build 2 Meg additional savings
	
In total the combined strategies lead to the following savings:
	Helios:	20.9 Meg (57%) reduction for a retained size of 15.6 Meg
	I-Build:	24 Meg (84%) reduction for a retained size of 4.6 Meg

In addition to the savings in retained heap, the strategies reduced the maximum allocated heap, which has other benefits such as decreased GC time.
	Helios:	164 Meg max heap allocated, down to 122 Meg.
	I-Build:	86 Meg max heap allocated, down to 40 Meg.

From this investigation I propose we attempt all three strategies, since each strategy affects the different repository types differently.  I will open a new dependent bug report for each strategy, since some strategies have implementation details that need to be investigated and fleshed out.





Reproducible: Always
Comment 1 Pascal Rapicault CLA 2010-11-08 16:18:11 EST
There is the babel repo: www.eclipse.org/babel/downloads.php
Comment 2 Dean Roberts CLA 2010-11-15 10:36:05 EST
DJ, John and I had a discussion about the compatibility impact of the proposed changes to address this defect.

There will be three patches associated with this change.

Patch 1) Change code such that manifest data is not persisted to repositories
         during a build and increment the repository version so that 
         unpatched products don't fail during install with a missing manifest 
         error.

Patch 2) Modify install code such that manifest data is not required during
         install and increase the required version range so that patched 
         products can read the new repository version.

Patch 3) Modify code so that manifest data contained in repositories created 
         by unpatched products is not persisted in memory.
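
The interaction between patches 1 and 2 can be pictured as a version gate on the reader side. The sketch below uses made-up integer versions and a hypothetical `RepositoryVersionGate` class; real p2 repository versions and version ranges are not modelled here.

```java
// Hypothetical reader-side check: a product accepts repository versions
// in the half-open range [minInclusive, maxExclusive).
final class RepositoryVersionGate {
    private final int minInclusive;
    private final int maxExclusive;

    RepositoryVersionGate(int minInclusive, int maxExclusive) {
        this.minInclusive = minInclusive;
        this.maxExclusive = maxExclusive;
    }

    boolean canRead(int repositoryVersion) {
        return repositoryVersion >= minInclusive && repositoryVersion < maxExclusive;
    }

    public static void main(String[] args) {
        int oldRepository = 1; // assumed: written with manifest data
        int newRepository = 2; // assumed: written after patch 1, version incremented

        RepositoryVersionGate unpatched = new RepositoryVersionGate(1, 2);
        RepositoryVersionGate patched = new RepositoryVersionGate(1, 3); // range widened by patch 2

        System.out.println("unpatched reads new repository: " + unpatched.canRead(newRepository));
        System.out.println("patched reads old repository: " + patched.canRead(oldRepository));
        System.out.println("patched reads new repository: " + patched.canRead(newRepository));
    }
}
```

Under these assumptions an unpatched product rejects the new repository up front because of its version, instead of failing later during install with a missing manifest error.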

During the discussion there was an argument made for omitting patch 3.  Without patch 3 the memory savings will only be enjoyed by new Eclipse versions reading repositories created by new Eclipse versions. 

The primary motivation for omitting patch 3 was a concern over repositories containing IUs with the same ID and version but different physical representations on disk.  This happens, for example, when a patched Eclipse mirrors a repository with manifest data in it.  The repository is read but the manifest data is ignored.  When the in-memory IU is persisted, it is persisted without the manifest data.

Personally, I believe we should include patch 3, but I welcome input from the community.  My arguments for inclusion are:

1) The change is minor (1 line)

2) What is important is the model representation of the IU.  With the patches
   in place the manifest data is ignored.  Thus the IU is conceptually 
   identical regardless of whether it was read from bytes that contained or 
   did not contain manifest information.

3) I believe a new Eclipse reading older repositories will not be a rare 
   occurrence, and the memory savings should be as widely available as possible.
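
Argument 2 can be illustrated with a small sketch, assuming touchpoint data is held as a map of instruction names to bodies and that the manifest lives under a key named `manifest`. Both the key name and the `ManifestFilter` class are assumptions made for this example, not a description of the actual one-line change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical read-time filter that drops manifest data.
final class ManifestFilter {
    // Assumed key under which manifest text is stored.
    static final String MANIFEST_KEY = "manifest";

    // Returns a copy of the instructions without the manifest entry.
    static Map<String, String> withoutManifest(Map<String, String> instructions) {
        Map<String, String> kept = new LinkedHashMap<>(instructions);
        kept.remove(MANIFEST_KEY);
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> fromOldRepository = new LinkedHashMap<>();
        fromOldRepository.put("zipped", "true");
        fromOldRepository.put(MANIFEST_KEY, "Bundle-SymbolicName: org.example.a");

        Map<String, String> fromNewRepository = new LinkedHashMap<>();
        fromNewRepository.put("zipped", "true");

        // After filtering, both sources yield the same in-memory model.
        System.out.println(withoutManifest(fromOldRepository)
                .equals(withoutManifest(fromNewRepository)));
    }
}
```

The point of the sketch is that the bytes on disk differ while the in-memory model does not, which is the sense in which the two IUs are conceptually identical.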

Assuming we proceed, the following staging is proposed.

1) Release patches 1, 2 and 3 to 3.7 HEAD
2) Back port patch 2 to older streams.  3.6 for sure ... how far back do
   we want to go?  Suggestions please.

Patches 1 and 3 could never be released to older Eclipse streams, since we would not want to end up in a situation where repositories built by a 3.6.x Eclipse would be unable to update a 3.6.(x - n) product.  Presumably particular customers could take these patches if they were required and the ramifications were understood.

Once I add the repository version number changes and checking, and get some feedback on this comment, I will attach the three patches discussed.
Comment 3 DJ Houghton CLA 2010-11-15 14:30:11 EST
To be clear, the changes that Dean proposes in Patch 3 would only benefit 3.7 clients who are reading pre-3.7 repositories. 

Pros: Allows 3.7 clients to read older repositories. Could be important to products built on Eclipse.
Cons: Questions about bundle uniqueness. When doing mirroring, etc., you can end up in a state with two IUs that have the same id and version but different touchpoint data. Is this invalid?

Comments from people about their use in delivering repositories would be helpful in deciding whether or not to release this part of the code.
Comment 4 Dean Roberts CLA 2010-11-16 10:10:54 EST
Pasted Comment #2 and Comment #3 into correct defect: https://bugs.eclipse.org/bugs/show_bug.cgi?id=329386
Comment 5 Dean Roberts CLA 2010-11-17 10:02:56 EST
Posted the following question to the p2-dev mailing list about the use of uncompressed metadata repositories and the impact of license text on their size.

=====

Hi folks,

I am trying to get a feel for how widely used uncompressed metadata repositories are.

A typical content.xml file contains many copies of identical license text.  Memory use is not an issue, since the implementation uses a StringPool to intern string references.  Compressed repositories do not present an issue for disk footprint or network traffic, as the content.xml compresses extremely well, typically 95% or more.

However, uncompressed metadata repositories may pose a significant concern here if they are widely used.

So does anybody have an opinion on how widely used uncompressed repositories are?

Thanks
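
The compression behaviour described in the message above can be checked informally with a sketch like the following. It builds synthetic XML in which every unit repeats the same license text and compresses it with gzip. The element names and the license string are invented for the example and do not reflect the real content.xml schema, so the resulting ratio is only indicative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo: repeated license text in uncompressed versus gzipped metadata.
final class LicenseCompressionDemo {
    // Builds synthetic metadata with one identical license per unit.
    static byte[] syntheticContent(int units) {
        String license = "Hypothetical license text repeated verbatim for every unit in the repository.";
        StringBuilder xml = new StringBuilder();
        for (int i = 0; i < units; i++) {
            xml.append("<unit id='example.unit.").append(i).append("'><license>")
               .append(license).append("</license></unit>\n");
        }
        return xml.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses the bytes with gzip and returns the compressed form.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = syntheticContent(1000);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n", raw.length, packed.length);
    }
}
```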
Comment 6 DJ Houghton CLA 2011-01-19 17:01:58 EST
I think this can be closed now since all dependent bug reports have been closed. If there are remaining outstanding issues then please re-open or open a new report.