Bug 532088 - Remove obsoleted hacks around String memory optimization
Summary: Remove obsoleted hacks around String memory optimization
Status: RESOLVED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 4.8   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: 4.9 M3   Edit
Assignee: Andrey Loskutov CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-06 17:41 EST by Andrey Loskutov CLA
Modified: 2021-12-07 13:44 EST (History)
2 users (show)

See Also:


Attachments
test projects configured to use different JVM's (89.36 KB, application/x-zip-compressed)
2018-08-05 18:39 EDT, Andrey Loskutov CLA
no flags Details
Test results on different VM's (25.67 KB, text/plain)
2018-08-05 18:43 EDT, Andrey Loskutov CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrey Loskutov CLA 2018-03-06 17:41:29 EST
I've per occasion stumbled upon code introduced in bug 120410 in org.eclipse.jdt.internal.compiler.problem.DefaultProblemFactory.getLocalizedMessage(int, int, String[]).

return new String(output.toString());

This was needed in earlier (before Java 6) Java versions where StringBuffer shared its internal buffer with created String.

After that I've checked who else is doing some hacks around Strings and found some places in JDT which could be safely removed.

bug 73891 introduced WeakHashSet and JavaModelManager.intern(String) which is synchronized (!) and maintains a stringSymbols WeakHashSet with Strings created from given strings. This was needed before Java 7 added improved string pool. From Java 7 on it is most likely just another obsoleted hack (see http://java-performance.info/string-intern-in-java-6-7-8/).

So I would propose to revisit this old hacks around inefficient String code and remove them altogether. JDt is running on minimum Java 8, so we can trust the JVM will do the right things in this area, and we may get some performance increase (no need to call syncronized methods, decrease String reference numbers, remove our own WeakHashSet with its own performance issues on rehash() etc).
Comment 1 Christian Dietrich CLA 2018-03-07 02:30:26 EST
+1 for evaluating this.
Comment 2 Andrey Loskutov CLA 2018-08-05 07:08:27 EDT
See also bug 84872 optimizations, which are totally unneeded and can be safely today.

1) The String.substring() in 1.4 kept original char[] reference in the new String, which was of course not good. In 1.8+ it always copies the portion of char array.

So all constructs can be safely changed:

new String(s.substring(x)) -> s.substring(x)
new String(s.substring(x,y)) -> s.substring(x,y)

2) String(String) in 1.4 copied the original char array if the size was different, so by new String(String) one tried to avoid memory leaks. In 1.8+ new String(String) simply reuses the original pointers, so no gain here and we create new (wasted) String reference. 

So except few places where a *different* String *pointer* is required, all constructs can be safely changed:

new String(string) -> string
new String(stringBuffer.toString()) -> stringBuffer.toString()

For this two cases above we don't even need any measures, it is simply outdated and not reasonable code anymore.

Patch following.
Comment 3 Eclipse Genie CLA 2018-08-05 07:12:45 EDT
New Gerrit change created: https://git.eclipse.org/r/127073
Comment 5 Andrey Loskutov CLA 2018-08-05 18:39:21 EDT
Created attachment 275279 [details]
test projects configured to use different JVM's

(In reply to Andrey Loskutov from comment #0)
> bug 73891 introduced WeakHashSet and JavaModelManager.intern(String) which
> is synchronized (!) and maintains a stringSymbols WeakHashSet with Strings
> created from given strings. This was needed before Java 7 added improved
> string pool. From Java 7 on it is most likely just another obsoleted hack
> (see http://java-performance.info/string-intern-in-java-6-7-8/).

See also https://bugs.java.com/view_bug.do?bug_id=6962930 and https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6962931

The assumption is that the interned String objects are NOW garbage collected in Java 8+ JVM's - this was the original problem one tried to fix with JavaModelManager.intern() via bug 73891.

I've instrumented JavaModelManager to see which and how many String values are cached by JavaModelManager in my usual "Platform" workspace (containing 100 projects with ~65k files).

The JavaModelManager.stringSymbols cache was about 35000 elements big with the median length of 18 characters and the range from 1 to 153 characters.

I've created a naive test (see attachment) which creates lot of (pseudo-random) strings of different sizes and interns them via String.intern() or JavaModelManager.intern(String).

Results coming next.
Comment 6 Andrey Loskutov CLA 2018-08-05 18:43:45 EDT
Created attachment 275280 [details]
Test results on different VM's

I've run the test on all JVM's I could download from 1.4.1 to 10.0.2 (I haven't found 1.5 for Windows) on my Windows 10 notebook (32 GB RAM, 8 cores i7-8550u).

All VM's except 1.4 were running -Xmx4G, 1.4 was able to run with -Xmx1024m only :-).

Note, that Strings were put on permgen before 1.7, so the heap use we see in the test is not accurate for those JVM's.

Note to the JavaModelManager.intern(String): this code uses hand made WeakHashSet to store Strings.

In the test I've also checked how the default WeakHashMap would work - the numbers are almost identical.

"nada" is the test which does nothing with Strings.
"keep in map" is the test which puts all strings in the usual HashMap.
"JDT::intern" is the default JDT implementation ujsing custom weak table.
"JDT::intern2" is the implementation ujsing default WeakHashMap
"String::intern" is just interning Strings via String.intern()
 
Discussion follows.
Comment 7 Manoj N Palat CLA 2018-08-30 00:45:25 EDT
Bulk move out of 4.9
Comment 8 Andrey Loskutov CLA 2018-10-23 17:42:08 EDT
See also https://shipilev.net/jvm-anatomy-park/10-string-intern/

Looking on the numbers I had measured, there is no clear win-win. Without further investigation with some more *real* test case (not just synthetic benchmark) it is hard to say which way is the right one, especially that in most real cases the performance impact of other JDT code is much more higher. So one need some really crazy scenario where the String cache is a bottle neck.

From what I see there is at least no strong performance drop, so at least it seem OK to keep the old JDT way of String interning.

Since I was already done with other parts, I think we can close this one. Feel free to reopen if there is a real use case which can benefit from a change.