duplicates or triplicates would not
surprise me in cases where intern() is not called. What we were really
trying to get rid of was the 100s of copies of some really common strings
(like extension point ids which occur in every contributed extension).
This is the uniquification step. Anyway, this area is interesting.
For more string fun, take a look at
the story wrt ResourceBundles. Sigh. This one at least has
some tractable solutions.
As for slow GC, this is more likely
because all the memory has been swapped out (if you leave a program idle
for a while, it slowly decays as other stuff is swapped in). It seems that the
nature of Java VMs or GCs is that they painfully swap individual pages
back in to do a global GC. You can watch the virtual memory counter
slowly tick up by 1K or so as each page is swapped in. double sigh.
p.s., I suspect that the index store
related strings will go away after some GC work.
From: <Ed.Burnette@xxxxxxx>
Sent by: equinox-dev-admin@xxxxxxxxxxx
Date: 07/21/2004 04:52 PM
Subject: RE: [equinox-dev] interning strings in RegistryCacheReader
There must be a lot of uniquification
going on because when I changed every call to set the intern flag to true
it only saved a small amount. YourKit 3.0 beta says there were 4182 fewer
String objects for a total savings of 388K (which left 189,331 String objects
for a total of 18M). So... interning in RegistryCacheReader may not be
With the tool I can only see the
first 500 strings values, but without the intern change I can eyeball dozens
of duplicates of these strings near the top:
Looking at this small sample,
maybe uniquification is not working for KeySequenceBindingDefinitions.
(I wish this tool would do a report
of duplicate Strings; maybe they will accept an enhancement request.)
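For what it's worth, the kind of duplicate report wished for here is easy to sketch once you have the String objects in hand. The following is only an illustration, not anything YourKit provides: the class name, method name, and sample ids are made up, and it assumes the strings are already available as a list. It groups strings by value and counts how many distinct instances (by identity) carry each value, so a count above 1 means real duplicates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DuplicateStringReport {
    /**
     * Maps each string value to the number of distinct String instances
     * (compared by identity) that carry it. A count above 1 means duplicates.
     */
    public static Map<String, Integer> countInstances(List<String> strings) {
        Map<String, Set<String>> byValue = new TreeMap<>();
        for (String s : strings) {
            // Identity-based set: equal strings are kept apart unless they are the same object.
            byValue.computeIfAbsent(s,
                    k -> Collections.newSetFromMap(new IdentityHashMap<String, Boolean>()))
                   .add(s);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : byValue.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample values are illustrative only.
        String shared = "sample.commands";
        List<String> sample = new ArrayList<>();
        sample.add(shared);
        sample.add(shared);                        // same instance, not a duplicate
        sample.add(new String("sample.commands")); // equal value, separate instance
        sample.add("sample.views");
        System.out.println(countInstances(sample));
        // prints {sample.commands=2, sample.views=1}
    }
}
```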
Looking at the memory dump more
closely, the org.eclipse.core package was only responsible for 39% of the
memory usage on startup. jdt was 32%, pde 13%, ui 5%, osgi 3%, jface 2%,
and swt 1%.
Most of core's memory is in the
indexing package, which is at 29% (11M just in core.internal.indexing.Buffer),
followed by resources at 7% (ResourceInfo, ProjectInfo) and dtree at 5% (DeltaDataTree).
Most of jdt is in the corext TypeInfo
array at 19%, and related things like JavaModelCache 6%. The jdt ui only
takes 4%, mostly in the javaeditor ASTProvider.
Most of pde is in the Plugin extensions
vector at 10%. 244 Plugin classes took 5.3M. Most of that (8% of total)
was in the PluginElement attributes Hashtable, with 4% (1.8M) in the value strings.
These are just the live objects
once startup is done. I didn't look at garbage collected objects.
I don't know if this is info you
already have or not but I hope it's helpful. I just started looking at
this to try to figure out why a) my workbench takes so long to come up,
and b) it is so slow doing garbage collection when I haven't used it
in a while, no matter what memory options I use.
From: equinox-dev-admin@xxxxxxxxxxx [mailto:equinox-dev-admin@xxxxxxxxxxx]
On Behalf Of Jeff McAffer
Sent: Wednesday, July 21, 2004 2:10 PM
Subject: Re: [equinox-dev] interning strings in RegistryCacheReader
There are two notions of interning at play here. There is String.intern()
and then there is the uniquification(tm) of objects read from the registry.
You will note that in readCachedString() there is a case for the
value coming in from the stream being an index. This allows for uniquification
of Strings within the registry itself. The net result is that on
second run, all strings written and read using the cached version of the
method are written/read once. There is then a separate choice as
to whether or not that string should also be interned using String.intern().
When you look through the objects using YourKit, see if you can compare
identity to check whether the strings that appear as duplicates really are duplicates.
In many cases I would expect up to about 3 copies of a string, counting
ones read from files, ones in constants, etc. If you start seeing
more than that, it gets interesting.
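To make the two notions above concrete, here is a rough sketch of index-based uniquification with intern() as a separate, optional step. This is not the actual RegistryCacheReader code: the tag bytes, the class names, and the stream layout are all assumptions made for illustration. Only the method name readCachedString() and the idea of "the value coming in from the stream being an index" come from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringTableSketch {
    // Hypothetical tags; the real cache format may differ.
    public static final byte LITERAL = 0;
    public static final byte INDEX = 1;

    /** Writer side: emit each distinct value once, then only its index. */
    public static class TableWriter {
        private final Map<String, Integer> indexes = new HashMap<>();

        public void writeCachedString(DataOutputStream out, String value) throws IOException {
            Integer index = indexes.get(value);
            if (index == null) {
                indexes.put(value, indexes.size());
                out.writeByte(LITERAL);
                out.writeUTF(value);
            } else {
                out.writeByte(INDEX);
                out.writeInt(index);
            }
        }
    }

    /** Reader side: an index resolves to the instance that was already read. */
    public static class TableReader {
        private final List<String> table = new ArrayList<>();

        public String readCachedString(DataInputStream in, boolean intern) throws IOException {
            byte tag = in.readByte();
            if (tag == INDEX) {
                return table.get(in.readInt()); // uniquification within the cache
            }
            String value = in.readUTF();
            if (intern) {
                value = value.intern(); // the separate String.intern() choice
            }
            table.add(value);
            return value;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        TableWriter writer = new TableWriter();
        writer.writeCachedString(out, "sample.extension.point");
        writer.writeCachedString(out, "sample.extension.point");
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        TableReader reader = new TableReader();
        String first = reader.readCachedString(in, false);
        String second = reader.readCachedString(in, false);
        System.out.println(first == second); // prints true, even without intern()
    }
}
```

In this sketch the two reads return the same instance whether or not the intern flag is set, which would be consistent with a small saving from turning the flag on everywhere. Interning only helps against copies that come from outside the cache, such as constants in code.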
More generally, we are very much interested in ideas for how to improve
this model. IMHO it was flawed from the outset but we did not notice
until it was too late. 3.0 maintains soft references to configuration
elements so at least some of this stuff goes away but in many cases (in
the UI in particular), people keep pointers to the registry structure.
This greatly inhibits the runtime's ability to manage the space.
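A small sketch of why held pointers defeat the soft references, under stated assumptions: Element is a made-up stand-in for a configuration element, and resolve() with its "reload" is hypothetical, not the runtime's actual mechanism. The point is only that a soft reference can be cleared when nothing else holds the object, and that any client keeping its own reference pins it.

```java
import java.lang.ref.SoftReference;

public class SoftElementSketch {
    /** Stand-in for a registry element; name and shape are hypothetical. */
    public static class Element {
        public final String id;
        public Element(String id) { this.id = id; }
    }

    /** Returns the cached element, or rebuilds it if the reference was cleared. */
    public static Element resolve(SoftReference<Element> ref, String id) {
        Element element = ref.get();
        return element != null ? element : new Element(id); // stand-in for a reload
    }

    public static void main(String[] args) {
        Element element = new Element("sample.element");
        SoftReference<Element> cached = new SoftReference<>(element);

        // A client that stores the element keeps it strongly reachable,
        // so the collector cannot clear the soft reference.
        Element pinnedByClient = element;
        System.out.println(resolve(cached, "sample.element") == pinnedByClient); // prints true

        // clear() simulates the collector reclaiming an element nobody pins;
        // the next lookup has to rebuild it.
        cached.clear();
        System.out.println(resolve(cached, "sample.element") == pinnedByClient); // prints false
    }
}
```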
Our challenge now is to a) come up with a better model and b) introduce
it in a compatible way.
Please (please!) contribute ideas if you have them.