RE: [equinox-dev] interning strings in RegistryCacheReader


I would not be surprised by duplicates or triplicates in cases where intern() is not called.  What we were really trying to get rid of was the hundreds of copies of some really common strings (like extension point ids, which occur in every contributed extension).  This is the uniquification step.  Anyway, this area is interesting to explore...

For more string fun, take a look at the story wrt ResourceBundles.  Sigh.  This one at least has some tractable solutions.

As for slow GC, this is more likely because all the memory is swapped out (if you let a program sit for a while, it slowly decays as other stuff is swapped in).  It seems to be the nature of Java VMs or GCs that they painfully swap individual pages back in to do a global GC.  You can watch the virtual memory counter slowly tick up by 1K or so as each page is swapped in.  Double sigh.

Jeff

p.s., I suspect that the index store related strings will go away after some GC work.



"Ed Burnette" <Ed.Burnette@xxxxxxx>
Sent by: equinox-dev-admin@xxxxxxxxxxx
07/21/2004 04:52 PM
Please respond to: equinox-dev
To: <equinox-dev@xxxxxxxxxxx>
cc:
Subject: RE: [equinox-dev] interning strings in RegistryCacheReader





There must be a lot of uniquification going on because when I changed every call to set the intern flag to true it only saved a small amount. YourKit 3.0 beta says there were 4182 fewer String objects for a total savings of 388K (which left 189,331 String objects for a total of 18M). So... interning in RegistryCacheReader may not be worth it.
 
With the tool I can only see the first 500 string values, but without the intern change I can eyeball dozens of duplicates of these strings near the top:
 
org.eclipse.ui.defaultAcceleratorConfiguration
org.eclipse.ui.contexts.dialogAndWindow
org.eclipse.ui.textEditorScope
org.eclipse.jdt.ui.javaEditorScope
 
Looking at this small sample, maybe uniquification is not working for KeySequenceBindingDefinition objects.
 
(I wish this tool would do a report of duplicate Strings; maybe they will accept an enhancement request.)
 
Looking at the memory dump more closely, the org.eclipse.core package was only responsible for 39% of the memory usage on startup. jdt was 32%, pde 13%, ui 5%, osgi 3%, jface 2%, and swt 1%.
 
Most of core's memory is in the indexing package which is at 29% (11M just in core.internal.indexing.Buffer), resources at 7% (ResourceInfo, ProjectInfo), dtree at 5% (DeltaDataTree).
 
Most of jdt is in the corext TypeInfo array at 19%, and related things like JavaModelCache 6%. The jdt ui only takes 4%, mostly in the javaeditor ASTProvider.
 
Most of pde is in the Plugin extensions vector at 10%. 244 Plugin classes took 5.3M. Most of that (8% of total) was in the PluginElement attributes Hashtable, 4% (1.8M) in the value strings for PluginAttribute.
 
This is just the live objects once startup is done. I didn't look at garbage collected objects.
 
I don't know if this is info you already have or not, but I hope it's helpful. I just started looking at this to try to figure out a) why my workbench takes so long to come up, and b) why it is so slow doing garbage collection when I haven't used it in a while, no matter what memory options I use.
 
-----Original Message-----
From: equinox-dev-admin@xxxxxxxxxxx [mailto:equinox-dev-admin@xxxxxxxxxxx] On Behalf Of Jeff McAffer
Sent: Wednesday, July 21, 2004 2:10 PM
To: equinox-dev@xxxxxxxxxxx
Subject: Re: [equinox-dev] interning strings in RegistryCacheReader


There are two notions of interning at play here.  There is String.intern() and then there is the uniquification(tm) of objects read from the registry.  You will note that in readCachedString() there is a case for the value coming in from the stream being an index.  This allows for uniquification of Strings within the registry itself.  The net result is that on second run, all strings written and read using the cached version of the method are written/read once.  There is then a separate choice as to whether or not that string should also be interned using String.intern().  
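
To make the two notions concrete, here is a minimal sketch of index-based uniquification in a cache stream, with String.intern() as a separate, optional step. It is not the actual RegistryCacheReader code: the class name StringTableSketch, the LITERAL/INDEX tag bytes, the writer side, and the roundTrip helper are all assumptions made up for illustration. Only the idea that the value coming in from the stream may be an index comes from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the stream format here is an assumption, not the real cache format.
final class StringTableSketch {
    static final byte LITERAL = 0; // the full string follows
    static final byte INDEX = 1;   // an int index into the strings seen so far follows

    static final class Writer {
        private final Map<String, Integer> seen = new HashMap<>();

        // Writes each distinct value once; later occurrences become an index.
        void writeCachedString(DataOutputStream out, String s) throws IOException {
            Integer index = seen.get(s);
            if (index != null) {
                out.writeByte(INDEX);
                out.writeInt(index);
            } else {
                seen.put(s, seen.size());
                out.writeByte(LITERAL);
                out.writeUTF(s);
            }
        }
    }

    static final class Reader {
        private final List<String> table = new ArrayList<>();

        // An index yields the very same String object that was read earlier.
        // Whether that object is also interned is a separate choice.
        String readCachedString(DataInputStream in, boolean intern) throws IOException {
            if (in.readByte() == INDEX) {
                return table.get(in.readInt());
            }
            String s = in.readUTF();
            if (intern) {
                s = s.intern();
            }
            table.add(s);
            return s;
        }
    }

    // Convenience helper for trying the sketch: write all values, then read them back.
    static String[] roundTrip(boolean intern, String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            Writer writer = new Writer();
            for (String v : values) {
                writer.writeCachedString(out, v);
            }
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Reader reader = new Reader();
            String[] result = new String[values.length];
            for (int i = 0; i < result.length; i++) {
                result[i] = reader.readCachedString(in, intern);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With intern set to false, two equal values written through the same writer still come back as one shared object within this stream, but that object is not necessarily the one used by string constants elsewhere in the VM. That gap is what the separate String.intern() choice would close.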


When you look through the objects using YourKit, see if you can compare identity to check whether the strings that appear to be duplicates really are duplicates.  In many cases I would expect up to about 3 copies of a string, between ones read from files, ones in constants, etc.  If you start seeing more than that, it gets interesting.
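
As a rough illustration of comparing identity rather than value, the sketch below counts how many distinct String objects carry each value. The class name DuplicateStringReport and the idea of feeding it an Iterable of strings are assumptions for illustration; getting at the live strings of a running VM would still need a profiler or heap dump, which this sketch does not attempt.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper; not part of YourKit or the Eclipse runtime.
final class DuplicateStringReport {
    // Maps each string value to the number of distinct String objects that hold it.
    // The same object passed twice counts once; equal but separate objects count separately.
    static Map<String, Integer> copiesPerValue(Iterable<String> strings) {
        Map<String, Set<String>> instancesByValue = new TreeMap<>();
        for (String s : strings) {
            instancesByValue
                .computeIfAbsent(s, k -> Collections.newSetFromMap(new IdentityHashMap<String, Boolean>()))
                .add(s); // identity-based set: membership uses ==, not equals()
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : instancesByValue.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }
}
```

A value with a count of 1 is already unique, however many references point at it. Counts well above 3 would be the interesting cases.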


More generally, we are very much interested in ideas for how to improve this model.  IMHO it was flawed from the outset but we did not notice until it was too late.  3.0 maintains soft references to configuration elements so at least some of this stuff goes away but in many cases (in the UI in particular), people keep pointers to the registry structure.  This greatly inhibits the runtime's ability to manage the space.  Our challenge now is to a) come up with a better model and b) introduce it in a compatible way.
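
For readers unfamiliar with the soft reference approach, here is a minimal sketch of the general technique using java.lang.ref.SoftReference. It is not the 3.0 registry implementation: the class name SoftlyCached and the loader callback are hypothetical, and generics are used for brevity.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical holder: keeps a value only softly, reloading it if the GC cleared it.
final class SoftlyCached<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);

    SoftlyCached(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            // Never loaded, or cleared by the GC under memory pressure: load again.
            value = loader.get();
            ref = new SoftReference<>(value);
        }
        return value;
    }
}
```

The value can only be reclaimed while nobody else holds a strong reference to it. A client that stores the result of get() in a field pins it in memory, which is the problem described above with callers keeping pointers to the registry structure.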


Please (please!) contribute ideas if you have them.


Jeff