RE: [equinox-dev] interning strings in RegistryCacheReader

There must be a lot of uniquification going on already, because when I changed every call to set the intern flag to true, it saved only a small amount. YourKit 3.0 beta says there were 4182 fewer String objects, for a total saving of 388K (which left 189,331 String objects totalling 18M). So... interning in RegistryCacheReader may not be worth it.
With the tool I can only see the first 500 string values, but without the intern change I can eyeball dozens of duplicates of these strings near the top:
Looking at this small sample, maybe uniquification is not working for KeySequenceBindingDefinitions.
(I wish this tool would do a report of duplicate Strings; maybe they will accept an enhancement request.)
Looking at the memory dump more closely, the org.eclipse.core package was only responsible for 39% of the memory usage on startup. jdt was 32%, pde 13%, ui 5%, osgi 3%, jface 2%, and swt 1%.
Most of core's memory is in the indexing package at 29% (11M just in core.internal.indexing.Buffer), followed by resources at 7% (ResourceInfo, ProjectInfo) and dtree at 5% (DeltaDataTree).
Most of jdt is in the corext TypeInfo array at 19%, and related things like JavaModelCache 6%. The jdt ui only takes 4%, mostly in the javaeditor ASTProvider.
Most of pde is in the Plugin extensions vector at 10%. 244 Plugin classes took 5.3M. Most of that (8% of total) was in the PluginElement attributes Hashtable, with 4% (1.8M) in the value strings for PluginAttribute.
This is just the live objects once startup is done. I didn't look at garbage collected objects.
I don't know whether this is info you already have, but I hope it's helpful. I just started looking at this to try to figure out why a) my workbench takes so long to come up, and b) it is so slow doing garbage collection when I haven't used it in a while, no matter what memory options I use.
-----Original Message-----
From: equinox-dev-admin@xxxxxxxxxxx [mailto:equinox-dev-admin@xxxxxxxxxxx] On Behalf Of Jeff McAffer
Sent: Wednesday, July 21, 2004 2:10 PM
To: equinox-dev@xxxxxxxxxxx
Subject: Re: [equinox-dev] interning strings in RegistryCacheReader

There are two notions of interning at play here. There is String.intern(), and then there is the uniquification(tm) of objects read from the registry. You will note that in readCachedString() there is a case for the value coming in from the stream being an index. This allows for uniquification of Strings within the registry itself. The net result is that on the second run, every string written and read using the cached version of the method is written and read only once. There is then a separate choice as to whether or not that string should also be interned using String.intern().
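
To make the distinction concrete, here is a minimal sketch of index-based uniquification on a stream, with String.intern() as a separate, optional step. It is not the actual RegistryCacheReader code: the class StringTableSketch, the LITERAL/INDEX tags, the nested TableWriter/TableReader classes and the roundTrip helper are all assumptions made up for illustration. Only the method name readCachedString() and the intern flag come from the discussion above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; the real registry cache format is not shown here.
class StringTableSketch {
    static final byte LITERAL = 0; // the string's characters follow
    static final byte INDEX = 1;   // an int index of an earlier string follows

    static class TableWriter {
        private final Map<String, Integer> seen = new HashMap<>();

        void writeCachedString(DataOutputStream out, String s) throws IOException {
            Integer index = seen.get(s);
            if (index == null) {
                // First occurrence: remember its position and write it in full.
                seen.put(s, seen.size());
                out.writeByte(LITERAL);
                out.writeUTF(s);
            } else {
                // Repeat occurrence: write only the index.
                out.writeByte(INDEX);
                out.writeInt(index);
            }
        }
    }

    static class TableReader {
        private final List<String> table = new ArrayList<>();

        String readCachedString(DataInputStream in, boolean intern) throws IOException {
            if (in.readByte() == INDEX) {
                // Same object as the first occurrence: uniquified within the stream.
                return table.get(in.readInt());
            }
            String s = in.readUTF();
            if (intern) {
                s = s.intern(); // separate choice: also share with the JVM-wide pool
            }
            table.add(s);
            return s;
        }
    }

    // Writes the values to an in-memory stream and reads them back.
    static String[] roundTrip(String[] values, boolean intern) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            TableWriter writer = new TableWriter();
            for (String v : values) {
                writer.writeCachedString(out, v);
            }
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            TableReader reader = new TableReader();
            String[] result = new String[values.length];
            for (int i = 0; i < result.length; i++) {
                result[i] = reader.readCachedString(in, intern);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = roundTrip(new String[] {"org.eclipse.ui", "org.eclipse.ui"}, false);
        System.out.println("same instance within the stream: " + (r[0] == r[1]));
        System.out.println("same instance as the literal: " + (r[0] == "org.eclipse.ui"));
    }
}
```

In this sketch the index case already removes duplicates among strings that go through the same reader, so the intern flag only helps when the same value also exists elsewhere in the VM.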

When you look through the objects using YourKit, see if you can compare identity to check whether the strings that appear as duplicates really are duplicates. In many cases I would expect up to about 3 copies of a string, between ones read from files, ones in constants, etc. If you start seeing more than that, it gets interesting.
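
For illustration, the identity check described above can be sketched in plain Java. This assumes the strings are available as a list, which a profiler snapshot does not give you directly. The class DuplicateStringReport and its copiesPerValue method are hypothetical names, not part of YourKit or Eclipse.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how many distinct String objects carry each value.
class DuplicateStringReport {

    static Map<String, Integer> copiesPerValue(List<String> strings) {
        // Outer map groups by value (equals); inner map tracks instances by identity (==).
        Map<String, Map<String, Boolean>> instances = new TreeMap<>();
        for (String s : strings) {
            instances.computeIfAbsent(s, k -> new IdentityHashMap<>()).put(s, Boolean.TRUE);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Map<String, Boolean>> e : instances.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        String shared = "org.eclipse.ui";
        List<String> sample = new ArrayList<>();
        sample.add(shared);
        sample.add(shared);                       // same object seen twice: one copy
        sample.add(new String("org.eclipse.ui")); // equal value, separate object
        for (Map.Entry<String, Integer> e : copiesPerValue(sample).entrySet()) {
            System.out.println(e.getValue() + " copies of " + e.getKey());
        }
    }
}
```

A value listed many times in the tool but with a count of 1 here would be a single shared object reached through many references, not a real duplicate.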

More generally, we are very much interested in ideas for how to improve this model. IMHO it was flawed from the outset, but we did not notice until it was too late. 3.0 maintains soft references to configuration elements, so at least some of this stuff goes away, but in many cases (in the UI in particular) people keep pointers to the registry structure. This greatly inhibits the runtime's ability to manage the space. Our challenge now is to a) come up with a better model and b) introduce it in a compatible way.
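
As a rough illustration of the soft-reference idea and why client pointers defeat it, here is a minimal sketch. It is not how the 3.0 registry is implemented: the SoftlyHeld class, its loader and its loads counter are assumptions invented for this example.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical holder: keeps a value only softly and reloads it on demand.
class SoftlyHeld<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);
    int loads = 0; // how many times the value had to be (re)loaded

    SoftlyHeld(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            // Never loaded, or cleared by the collector under memory pressure.
            value = loader.get();
            loads++;
            ref = new SoftReference<>(value);
        }
        return value;
    }

    public static void main(String[] args) {
        SoftlyHeld<String[]> elements =
                new SoftlyHeld<>(() -> new String[] {"element-1", "element-2"});
        // A client that stores this result in a field keeps the whole array
        // strongly reachable, so the soft reference can never be cleared.
        String[] pinned = elements.get();
        System.out.println(pinned.length + " elements, loads so far: " + elements.loads);
    }
}
```

The holder can only give memory back while nobody else holds the value, which is the problem with clients that keep pointers into the registry structure.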

Please (please!) contribute ideas if you have them.