Re: [equinox-dev] interning strings in RegistryCacheReader


There are two notions of interning at play here. One is String.intern(); the other is the uniquification(tm) of objects read from the registry. You will note that readCachedString() has a case for when the value coming in from the stream is an index. This allows Strings to be uniquified within the registry itself: on the second run, every string written and read through the cached version of the method is written and read only once. Whether that string should also be interned with String.intern() is a separate choice.
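A rough Java sketch of the index scheme may help. The class names (CachedStringWriter, CachedStringReader), the tag bytes and the stream layout are all hypothetical, chosen for illustration; this is not the actual RegistryCacheReader format. A string is written in full the first time it is seen and as an index afterwards, and the reader hands back the same String object for every occurrence. The intern flag is applied independently of that.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical writer: NOT the real registry cache format.
// First occurrence of a string is written in full; later ones as an index.
class CachedStringWriter {
    static final byte NEW_STRING = 0; // assumed tag values
    static final byte INDEX = 1;

    private final DataOutputStream out;
    private final Map<String, Integer> written = new HashMap<>();

    CachedStringWriter(DataOutputStream out) { this.out = out; }

    void writeCachedString(String s) throws IOException {
        Integer index = written.get(s);
        if (index != null) {
            out.writeByte(INDEX);
            out.writeInt(index);
        } else {
            written.put(s, written.size()); // index = position of first write
            out.writeByte(NEW_STRING);
            out.writeUTF(s);
        }
    }
}

// Hypothetical reader: keeps every full string it has read, in order,
// so an index resolves to the very same String object.
class CachedStringReader {
    private final DataInputStream in;
    private final List<String> read = new ArrayList<>();

    CachedStringReader(DataInputStream in) { this.in = in; }

    // 'intern' is a separate choice: uniquification within the cache
    // happens whether or not String.intern() is called.
    String readCachedString(boolean intern) throws IOException {
        byte tag = in.readByte();
        if (tag == CachedStringWriter.INDEX) {
            return read.get(in.readInt());
        }
        String s = in.readUTF();
        if (intern) {
            s = s.intern();
        }
        read.add(s);
        return s;
    }
}
```

Reading back a stream in which the same ID was written twice yields one String object for both reads even with intern set to false; passing true only additionally canonicalizes it against the JVM-wide intern table.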

When you look through the objects in YourKit, compare identity (==) to check whether the strings that appear to be duplicates really are distinct copies. In many cases I would expect up to about three copies of a string: ones read from files, ones in constants, etc. If you start seeing more than that, it gets interesting.
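For the identity check, something along these lines could be run over strings pulled out of a debugger session or a test harness. DuplicateCheck and distinctCopies are hypothetical names; the idea is that an IdentityHashMap compares keys with == rather than equals(), so it counts actual copies rather than equal values.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper: given strings that are all equals() to each other,
// count how many distinct objects (by ==) actually back them.
class DuplicateCheck {
    static int distinctCopies(List<String> equalStrings) {
        // A set view of an IdentityHashMap uses reference identity, not equals().
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(equalStrings);
        return identities.size();
    }
}
```

For example, a literal, a new String built from it, and the result of calling intern() on that copy give two distinct copies, because string literals are already interned.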

More generally, we are very interested in ideas for improving this model. IMHO it was flawed from the outset, but we did not notice until it was too late. 3.0 maintains soft references to configuration elements, so at least some of this data can be reclaimed, but in many cases (in the UI in particular) people keep pointers into the registry structure. This greatly inhibits the runtime's ability to manage the space. Our challenge now is to a) come up with a better model and b) introduce it in a compatible way.
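To illustrate why held pointers defeat the soft references, here is a minimal sketch assuming a hypothetical SoftlyHeld wrapper that rebuilds its value through a loader (for example, from the registry cache) after the garbage collector has cleared it. It is an illustration of the idea, not the 3.0 implementation.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical wrapper: holds its value only softly, so the GC may clear it
// under memory pressure; the next get() reloads it via the supplied loader.
class SoftlyHeld<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);

    SoftlyHeld(Supplier<T> loader) { this.loader = loader; }

    synchronized T get() {
        T value = ref.get();
        if (value == null) {          // never loaded, or cleared by the GC
            value = loader.get();
            ref = new SoftReference<>(value);
        }
        return value;
    }
}
```

The wrapper only helps if callers go through get() each time. A client that stores the returned object in a field keeps it strongly reachable, so the collector can never clear it, which is the situation described above for the UI.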

Please (please!) contribute ideas if you have them.

Jeff



From: "Ed Burnette" <Ed.Burnette@xxxxxxx>
Sent by: equinox-dev-admin@xxxxxxxxxxx
Date: 07/21/2004 12:42 PM
Please respond to: equinox-dev
To: <equinox-dev@xxxxxxxxxxx>
cc:
Subject: [equinox-dev] interning strings in RegistryCacheReader

I've been looking at Eclipse startup in YourKit 3.0 beta, and about half of the memory used is taken up by Strings. Looking at them, I found the same strings repeated over and over again, for example "org.eclipse.ui.defaultAcceleratorConfiguration". I traced this back to the org.eclipse.core.internal.registry.RegistryCacheReader class. It has two methods, readString() and readCachedString(), which take an 'intern' boolean parameter that would cause String.intern() to be called on the strings, thus eliminating the duplicates. Only a few callers pass true to these methods, though.

http://www.eclipse.org/eclipse/development/performance/bloopers.html talks about this briefly, saying "On some JVM implementations the performance of intern() degrades dramatically. Interning the registry strings eagerly and early seeds the intern() table increasing the collision rate". This makes it sound as if at some point in the past somebody tried using intern() everywhere and didn't like the results. Can anybody shed some light on the design decision not to use intern(), and on whether this caveat is still true?
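One commonly used alternative to String.intern() is a canonicalizing map private to the reader. The sketch below (the hypothetical LocalInterner) is a swapped-in technique, not what the registry does, and it does not answer whether the bloopers caveat still holds on current JVMs; it only shows how duplicates could be removed without seeding the JVM-wide intern table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative to String.intern(): a canonicalizing map owned by
// the reader, so the JVM-wide intern table is never touched.
class LocalInterner {
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the first String seen with this value; later equal strings
    // are replaced by it, so callers end up sharing one object.
    String intern(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() { return canonical.size(); }
}
```

The trade-off is that the map holds strong references to every string while it exists, so it would be discarded once the cache has been read; after that the canonical strings stay reachable only through the objects that use them.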