Bug 329385 - [repository] Share StringPool for Composite Repositories
Summary: [repository] Share StringPool for Composite Repositories
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2 (show other bugs)
Version: unspecified   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: 3.7 M4   Edit
Assignee: Dean Roberts CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks: 329384 331762 333894
  Show dependency tree
 
Reported: 2010-11-03 14:32 EDT by Dean Roberts CLA
Modified: 2011-01-10 15:54 EST (History)
3 users (show)

See Also:


Attachments
Replace StringPool implementation with String.intern() (5.59 KB, patch)
2010-11-19 11:20 EST, Dean Roberts CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dean Roberts CLA 2010-11-03 14:32:14 EDT
Build Identifier: 

Currently MetaDataRepository uses StringPool to intern shared strings.  However, the entry point is on XMLParser and results in each child repository of a CompositeRepository having its own copy of duplicate strings.

Additional savings can be gained by using the same StringPool to intern the strings for all child repositories.

Reproducible: Always
Comment 1 Dean Roberts CLA 2010-11-19 11:20:02 EST
Created attachment 183481 [details]
Replace StringPool implementation with String.intern()

Through empirical testing and discussions with GC team members it is appropriate to use String.intern() instead of our own StringPool implementation.  This has the effect of sharing the pooled strings amongst all repositories in a composite repository.

Although the Java Language Specification is silent on whether interned Strings are garbage collected all modern VMs do, in fact, garbage collect interned Strings.  The IBM 1.4.2 VM started this practice in 2005.  The Sun vms where doing it prior to this.

Interned Strings are placed immediately in OLD space requiring a major collection to clean them up. However, given the long lived (minutes) nature of repository strings even our StringPool implementation would have ended up in OLD space, especially considering the heap churn evident during repository loads.

As a final check I analyzed pre and post patch test runs using YourKit on 32 and 64 bit flavours of Java 5, and Java 6 Sun and IBM VMs.  (YOurKit did not like the 1.4.2).  There was no unwanted GC behaviour during the runs or significant changes in execution time.
Comment 2 Dean Roberts CLA 2010-11-19 11:24:35 EST
I would suggest that this fix is a good candidate for back porting to the 3.6 stream.

It is a trivial fix and will provide significant memory savings.  Especially in 3.6 where we are unlikely to back port the SharedIU change.

In loading the 3.7 I-Build repository it saves 11 Meg, 25% of retained memory.  But perhaps more importantly it saves 50% or 90 Meg of heap expansion during the load.
Comment 3 Pascal Rapicault CLA 2010-11-20 13:22:20 EST
This is a very interesting change since it allows for sharing across all the repos (and more), therefore the gain is potentially huge!
Back in 2003/2004 we had attempted to use this and the performance at the time were rather deceiving both from a point of view of time to add to intern and GC (not GC'ed at the time).
Obviously things have changed since then, so if we are confident about the speed and the GC characteristics, then I don't why we should refrain from using this and also backport this in 3.6.x.
Comment 4 Ian Bull CLA 2010-11-20 13:43:25 EST
I looked a the patch quickly (didn't apply it, just tried to do the patch math in my head), and it appears that we are doing the intern() on all strings (versions, IDs, names, etc...).  I'm not complaining about that approach, but I wonder if there is any reason to only intern() strings that we know are likely to be duplicated?
Comment 5 Pascal Rapicault CLA 2010-11-20 13:48:50 EST
The patch just change the invocation from canonicalize to intern, to that end the logic of what is 'shared' is unmodified before / after this patch. We could likely try to be smarter about what we end up sharing or not by making the decision of interning higher up in the call hierarchy but this is the scope of another bug.
Comment 6 DJ Houghton CLA 2010-12-03 08:57:00 EST
Released patch (with updated copyright stmts) to HEAD.