Bug 265775 - [performance] The XML editor uses a lot of memory
Status: CLOSED FIXED
Alias: None
Product: WTP Source Editing
Classification: WebTools
Component: wst.xml
Version: 3.1
Hardware: All All
Importance: P2 enhancement
Target Milestone: 3.2 M7
Assignee: Nick Sandonato CLA
QA Contact: Nitin Dahyabhai CLA
Keywords: performance
Reported: 2009-02-22 17:58 EST by Valentin Baciu CLA
Modified: 2010-04-12 21:13 EDT

Attachments
patch (9.30 KB, patch)
2010-01-06 17:30 EST, Nick Sandonato CLA

Description Valentin Baciu CLA 2009-02-22 17:58:47 EST
The SSE/XML editor provides great functionality but this seems to come with a hefty price: the editor is quite the memory gourmand. 

I tried to determine why the editor uses so much memory when editing large XML files. I found that even the plain text editor has a significant footprint, and that the footprint grows considerably when the backing model is SSE/XML. I measured the memory usage empirically with the Eclipse heap monitor, so while the following numbers may not be entirely accurate, they should be representative.

Text editor (plain Eclipse SDK, no WTP/SSE/XML)

                            File 1    File 2
Size on disk (MB)              1.5      10.9
Heap before (MB)                22        22
Heap after (MB)                 33        99
Growth (MB)                     11        77
Growth / size on disk (x)      7.3       7.6

Text editor with SSE/XML model backing

                            File 1    File 2
Size on disk (MB)              1.5      10.9
Heap before (MB)                23        23
Heap after (MB)                 57       300
Growth (MB)                     34       273
Growth / size on disk (x)       22        25
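
For anyone repeating the measurement outside the heap monitor, here is a minimal Java sketch of the same bookkeeping. It is an approximation, not the method used for the numbers above: the class and method names are hypothetical, and Runtime-based readings vary between runs because gc() is only a hint to the JVM.

```java
/** Hypothetical helper for rough before/after heap comparisons. */
final class HeapGrowth {

    /** Approximate used heap in bytes: allocated heap minus the free part of it. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // a request only; the JVM may ignore it, so treat readings as approximate
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Heap growth expressed as a multiple of the file's size on disk. */
    static double growthFactor(double heapBeforeMb, double heapAfterMb, double sizeOnDiskMb) {
        return (heapAfterMb - heapBeforeMb) / sizeOnDiskMb;
    }
}
```

With the first column of the plain text editor table, growthFactor(22, 33, 1.5) gives roughly 7.3, which matches the last row of that table.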

I have also profiled a bit to find out what's taking so much memory. The snapshots are quite large, so I won't attach them here. At first glance, the biggest hitters are:
- what appear to be three redundant copies of the original document content
- the linked list with the document regions.

Some things I noted looking at the code:
- the attribute name region could probably use a short instead of an int (not sure it will help much, but the other region types seem to use short)
- the element/attribute names are stored as strings, and many large XML documents contain many duplicates. Assuming the names are not cached somewhere else, perhaps some cache would be in order?
- the growth strategy for the region list could probably be optimized. For example, when an attribute name region is encountered the array could grow by three automatically, and probably similarly for tags. There also seems to be a lot of array copying going on there.
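
To make the name-cache idea concrete, here is a minimal sketch of a canonicalizing map. NameCache is a hypothetical class, not something in the SSE code base; the assumption is that the parser would pass every element and attribute name through it before storing the name in the model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-document cache that keeps one String instance per distinct name. */
final class NameCache {
    private final Map<String, String> canonical = new HashMap<>();

    /** Returns the shared instance for this name, registering it on first sight. */
    String intern(String name) {
        String existing = canonical.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    /** Number of distinct names seen so far. */
    int size() {
        return canonical.size();
    }
}
```

A document with 100,000 <item> elements would then hold one "item" string instead of 100,000 equal copies. String.intern() would achieve a similar effect through the JVM-wide pool; a per-document map has the advantage that it is released together with the model.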

I wonder if the entire design could not be improved to avoid creating all these little objects (linked list with region with list with array with regions).
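
As a rough illustration of what a flatter design might look like, the sketch below keeps regions in parallel primitive arrays instead of one object per region, and lets the caller reserve room for several regions at once. Everything in it is an assumption made for illustration: FlatRegionList, its fields, and the numeric type codes are hypothetical and do not reflect the existing SSE classes.

```java
import java.util.Arrays;

/** Hypothetical region store: one slot per region, no per-region objects. */
final class FlatRegionList {
    private short[] types = new short[16];
    private int[] starts = new int[16];
    private int[] lengths = new int[16];
    private int size;

    /** Makes room for 'extra' more regions, growing by at least 50% to limit copying. */
    void ensureCapacity(int extra) {
        int needed = size + extra;
        if (needed > types.length) {
            int newCapacity = Math.max(needed, types.length + (types.length >> 1));
            types = Arrays.copyOf(types, newCapacity);
            starts = Arrays.copyOf(starts, newCapacity);
            lengths = Arrays.copyOf(lengths, newCapacity);
        }
    }

    /** Appends one region described by its type code, start offset, and length. */
    void add(short type, int start, int length) {
        ensureCapacity(1);
        types[size] = type;
        starts[size] = start;
        lengths[size] = length;
        size++;
    }

    int size() { return size; }
    short type(int index) { return types[index]; }
    int start(int index) { return starts[index]; }
    int length(int index) { return lengths[index]; }
}
```

On seeing an attribute name, a parser could call ensureCapacity(3) once and then add the name, equals sign, and value regions without any further array copies.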

Should you decide to do more profiling, make sure to get a recent platform build to avoid being distracted by bug 265449.
Comment 1 David Carver CLA 2009-02-25 19:04:53 EST
Well, if we are looking at performance, another long-term option is to look at using an EMF/CDO-backed model. CDO is designed for efficient handling of large models. It would be of great benefit for handling fairly large XML documents. It's not uncommon to have to look at files in the 500MB to 2GB range.

Just a thought. You might also want to look into an indexing option to handle some of the lookup information.

Comment 2 Nick Sandonato CLA 2010-01-06 17:30:05 EST
Created attachment 155460
patch

Attaching a patch that takes some steps towards reducing the memory footprint of the XML editor. Based on some YourKit profiling that I conducted, I tried to clean up things like redundant strings, references that were null 100% of the time, and strings that were held onto past their use.

The file I used was 993200 bytes.

Without patch:
Retained size (initial): 26814448 bytes
Retained size (final): 67967488 bytes
Growth: 41153040 bytes

With patch:
Retained size (initial): 26834828 bytes
Retained size (final): 64022316 bytes
Growth: 37187488 bytes

Result: roughly a 9.6% decrease in retained memory growth.
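
The percentage follows from the two growth figures above. A small Java sketch of the arithmetic (the class and method names are hypothetical, used only for this illustration):

```java
/** Hypothetical helper that reproduces the growth comparison from the retained sizes. */
final class RetainedGrowth {

    /** Percentage by which 'after' is smaller than 'before'. */
    static double percentDecrease(long before, long after) {
        return 100.0 * (before - after) / before;
    }
}
```

Calling percentDecrease(67967488L - 26814448L, 64022316L - 26834828L), i.e. growth without the patch versus growth with it, gives about 9.6%. Applied to the final retained sizes instead (67967488 versus 64022316), the decrease is about 5.8%.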
Comment 3 David Carver CLA 2010-01-06 18:28:09 EST
You might also want to review the FindBugs report and a couple of the patches I supplied to help clean some of these up, particularly areas where variables are assigned but never read, or areas where null will always be the case.

In many cases we just have duplicate code as well, which could be cleaned up to help reduce the memory footprint of the variables that are used and created.

The Hudson build instance runs the FindBugs reports with each build.


Comment 4 Nick Sandonato CLA 2010-01-08 10:33:25 EST
To get the ball rolling on this, I've checked in the changes from the patch. Dave, I'll look into the patches you supplied based on FindBugs reports. Thanks!

Comment 5 Nick Sandonato CLA 2010-04-07 11:02:38 EDT
We've had almost a 10% reduction in memory use, plus we've added some of the patches Dave generated based on the FindBugs reports. As always, we'll continue to improve performance, but I'd say for now we can close this one out.
Comment 6 Valentin Baciu CLA 2010-04-12 21:13:06 EDT
Thanks Nick, every bit counts.