Community
Participate
Working Groups
I've per occasion stumbled upon code introduced in bug 120410 in org.eclipse.jdt.internal.compiler.problem.DefaultProblemFactory.getLocalizedMessage(int, int, String[]). return new String(output.toString()); This was needed in earlier (before Java 6) Java versions where StringBuffer shared its internal buffer with created String. After that I've checked who else is doing some hacks around Strings and found some places in JDT which could be safely removed. bug 73891 introduced WeakHashSet and JavaModelManager.intern(String) which is synchronized (!) and maintains a stringSymbols WeakHashSet with Strings created from given strings. This was needed before Java 7 added improved string pool. From Java 7 on it is most likely just another obsoleted hack (see http://java-performance.info/string-intern-in-java-6-7-8/). So I would propose to revisit this old hacks around inefficient String code and remove them altogether. JDt is running on minimum Java 8, so we can trust the JVM will do the right things in this area, and we may get some performance increase (no need to call syncronized methods, decrease String reference numbers, remove our own WeakHashSet with its own performance issues on rehash() etc).
+1 for evaluating this.
See also bug 84872 optimizations, which are totally unneeded and can be safely today. 1) The String.substring() in 1.4 kept original char[] reference in the new String, which was of course not good. In 1.8+ it always copies the portion of char array. So all constructs can be safely changed: new String(s.substring(x)) -> s.substring(x) new String(s.substring(x,y)) -> s.substring(x,y) 2) String(String) in 1.4 copied the original char array if the size was different, so by new String(String) one tried to avoid memory leaks. In 1.8+ new String(String) simply reuses the original pointers, so no gain here and we create new (wasted) String reference. So except few places where a *different* String *pointer* is required, all constructs can be safely changed: new String(string) -> string new String(stringBuffer.toString()) -> stringBuffer.toString() For this two cases above we don't even need any measures, it is simply outdated and not reasonable code anymore. Patch following.
New Gerrit change created: https://git.eclipse.org/r/127073
Gerrit change https://git.eclipse.org/r/127073 was merged to [master]. Commit: http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=fdff0425910a362eb56ee6287e38ac65a2bbe647
Created attachment 275279 [details] test projects configured to use different JVM's (In reply to Andrey Loskutov from comment #0) > bug 73891 introduced WeakHashSet and JavaModelManager.intern(String) which > is synchronized (!) and maintains a stringSymbols WeakHashSet with Strings > created from given strings. This was needed before Java 7 added improved > string pool. From Java 7 on it is most likely just another obsoleted hack > (see http://java-performance.info/string-intern-in-java-6-7-8/). See also https://bugs.java.com/view_bug.do?bug_id=6962930 and https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6962931 The assumption is that the interned String objects are NOW garbage collected in Java 8+ JVM's - this was the original problem one tried to fix with JavaModelManager.intern() via bug 73891. I've instrumented JavaModelManager to see which and how many String values are cached by JavaModelManager in my usual "Platform" workspace (containing 100 projects with ~65k files). The JavaModelManager.stringSymbols cache was about 35000 elements big with the median length of 18 characters and the range from 1 to 153 characters. I've created a naive test (see attachment) which creates lot of (pseudo-random) strings of different sizes and interns them via String.intern() or JavaModelManager.intern(String). Results coming next.
Created attachment 275280 [details] Test results on different VM's I've run the test on all JVM's I could download from 1.4.1 to 10.0.2 (I haven't found 1.5 for Windows) on my Windows 10 notebook (32 GB RAM, 8 cores i7-8550u). All VM's except 1.4 were running -Xmx4G, 1.4 was able to run with -Xmx1024m only :-). Note, that Strings were put on permgen before 1.7, so the heap use we see in the test is not accurate for those JVM's. Note to the JavaModelManager.intern(String): this code uses hand made WeakHashSet to store Strings. In the test I've also checked how the default WeakHashMap would work - the numbers are almost identical. "nada" is the test which does nothing with Strings. "keep in map" is the test which puts all strings in the usual HashMap. "JDT::intern" is the default JDT implementation ujsing custom weak table. "JDT::intern2" is the implementation ujsing default WeakHashMap "String::intern" is just interning Strings via String.intern() Discussion follows.
Bulk move out of 4.9
See also https://shipilev.net/jvm-anatomy-park/10-string-intern/ Looking on the numbers I had measured, there is no clear win-win. Without further investigation with some more *real* test case (not just synthetic benchmark) it is hard to say which way is the right one, especially that in most real cases the performance impact of other JDT code is much more higher. So one need some really crazy scenario where the String cache is a bottle neck. From what I see there is at least no strong performance drop, so at least it seem OK to keep the old JDT way of String interning. Since I was already done with other parts, I think we can close this one. Feel free to reopen if there is a real use case which can benefit from a change.