Bug 227986

Summary: Avoid duplicated strings in Java model
Product: [Eclipse Project] JDT
Reporter: Martin Aeschlimann <martinae>
Component: Core
Assignee: Jerome Lanneluc <jerome_lanneluc>
Status: VERIFIED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: benno.baumgartner, caniszczyk, daniel_megert, david_audel, markus.kohler, mlists, philippe_mulet, sw
Version: 3.4
Keywords: performance
Target Milestone: 3.5 M4
Hardware: PC
OS: Windows XP
Whiteboard:
Attachments:
Description Flags
Proposed fix none

Description Martin Aeschlimann CLA 2008-04-21 07:11:23 EDT
20080421

Using YourKit 7.0 on my development workspace, I found that there were 2,600 instances of the string 'refactoring' in memory, taking 2.7 MB.

Almost all references to this string came from instances of type IPackageFragment. Each instance of an IPackageFragment seems to have its own string instance for each package name segment.

When creating an IPackageFragment, maybe you can use the string from the IPath of the underlying resource: the resource model already makes sure that all IPath elements share their segment Strings.
Comment 1 Martin Aeschlimann CLA 2008-04-21 07:15:18 EDT
Other duplicates are 'org' (1.8 MB waste), 'eclipse' (1.9 MB), 'jdt' (1.6 MB waste), 'ui' (1.5 MB), 'internal' (1.1 MB) ..

So a fix in this area could really pay off...
Comment 2 Jerome Lanneluc CLA 2008-04-23 09:43:25 EDT
Martin, do you have more details on your scenario? Interning Strings has a cost, and I don't want to slow down every IPackageFragment handle creation. So I will optimize for space only if it does not come at the cost of speed. This is why I need to know more about your scenario, so that I know how the IPackageFragments that you saw were created.
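
To make the trade-off concrete, here is a minimal sketch of what canonicalizing package name segments could look like. It is not JDT Core code: the class SegmentPool and its methods are hypothetical names chosen for illustration. Every call to canonical() pays for one hash lookup, which is the kind of per-handle cost mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of JDT Core. */
final class SegmentPool {
    // Maps each distinct segment to the single instance that should be shared.
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the shared instance for the given segment, registering it if new. */
    String canonical(String segment) {
        String existing = pool.putIfAbsent(segment, segment);
        return existing != null ? existing : segment;
    }

    /** Returns a new array whose elements are the shared instances. */
    String[] canonicalize(String[] segments) {
        String[] result = new String[segments.length];
        for (int i = 0; i < segments.length; i++) {
            result[i] = canonical(segments[i]);
        }
        return result;
    }
}
```

With such a pool, the segments of "org.eclipse.jdt" and "org.eclipse.ui" would share the same "org" and "eclipse" instances, at the price of one map lookup per segment.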
Comment 3 Martin Aeschlimann CLA 2008-04-23 10:37:00 EDT
I took the memory snapshot after a day of work, so I can't say exactly. But just before taking the snapshot I was doing searches in the jdt.ui source code, for example searching for 'IJavaElement.getElementName', and looking at all search results in the search view. Hope this helps.

If I find some time I'll try to construct some steps.


Comment 4 Markus Kohler CLA 2008-05-20 05:00:23 EDT
Hi all,
I can help you out with this. 
See my blog at http://kohlerm.blogspot.com/2008/05/analyzing-memory-consumption-of-eclipse.html on how to analyze this with the Eclipse Memory Analyzer.

If Martin could provide us an hprof heap dump, the analysis should be easy. 

Regards,
Markus
Comment 5 Jerome Lanneluc CLA 2008-09-02 05:23:57 EDT
Steps or an hprof heap dump are still needed.
Comment 6 Jerome Lanneluc CLA 2008-11-26 10:41:21 EST
I was able to find a case where package fragments hold duplicate strings. To observe this I took a snapshot of my development workspace using YourKit 7.5.11, and I ran the "Duplicate Strings" inspection. It showed that most instances of "jdt" and "org" were duplicates.

After investigation, it appears that NameLookup would create instances of PackageFragment with the String[] resulting from splitting the package name, instead of reusing the String[] from the packageFragments cache.
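
The idea can be sketched as follows. This is a simplified illustration, not the attached patch: PackageNameCache, register() and lookup() are hypothetical names, and the real packageFragments cache in NameLookup is structured differently. The point is only that a lookup should hand back the already cached String[] rather than the freshly split one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the actual NameLookup code. */
final class PackageNameCache {
    // Canonical segment arrays, keyed by their contents.
    private final Map<List<String>, String[]> known = new HashMap<>();

    /** Registers a package name and returns its canonical segment array. */
    String[] register(String[] segments) {
        // The key wraps a copy so later changes to the array cannot corrupt the map.
        return known.computeIfAbsent(Arrays.asList(segments.clone()), k -> segments);
    }

    /**
     * Splits a dotted package name, then returns the cached array if one
     * with equal contents exists, so the temporary strings can be collected.
     */
    String[] lookup(String dottedName) {
        String[] split = dottedName.split("\\.");
        String[] canonical = known.get(Arrays.asList(split));
        return canonical != null ? canonical : split;
    }
}
```

A handle built from the result of lookup("org.eclipse.jdt") then references the cached array, and the strings produced by the split become garbage immediately instead of living as long as the handle.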
Comment 7 Jerome Lanneluc CLA 2008-11-26 10:43:15 EST
Created attachment 118802
Proposed fix

Note that no regression tests can be written for memory improvements. So to verify, either run YourKit's inspection or check the code.
Comment 8 Jerome Lanneluc CLA 2008-11-27 07:13:44 EST
Fix released for 3.5M4
Comment 9 David Audel CLA 2008-12-09 10:07:49 EST
Verified for 3.5M4 using build I20081208-1800