Bug 244631 - [Markers] Problems view cache is too large for very large marker sets
Summary: [Markers] Problems view cache is too large for very large marker sets
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: UI (show other bugs)
Version: 3.4   Edit
Hardware: PC Windows XP
: P3 critical (vote)
Target Milestone: 3.4.1   Edit
Assignee: Tod Creasey CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2008-08-19 17:56 EDT by Ernest Mah CLA
Modified: 2008-09-11 16:40 EDT (History)
13 users (show)

See Also:
Mike_Wilson: pmc_approved+
bokowski: review+
Tod_Creasey: review? (john.arthorne)
eclipse: review+


Attachments
Example to generate many marlers (10.92 KB, application/octet-stream)
2008-08-21 09:28 EDT, Tod Creasey CLA
no flags Details
Screen shot of YourKit duplicate string trace (29.24 KB, image/png)
2008-08-21 14:29 EDT, John Arthorne CLA
no flags Details
Patch with the collation keys removed (4.07 KB, patch)
2008-08-21 16:11 EDT, Tod Creasey CLA
no flags Details | Diff
Patch that clears the cache after sorting (2.15 KB, text/plain)
2008-08-21 16:47 EDT, Tod Creasey CLA
no flags Details
Update to Tod's patch that nulls the collation key cache (2.72 KB, patch)
2008-08-21 17:50 EDT, John Arthorne CLA
no flags Details | Diff
Update to Johns patch with NPE fixed. (5.78 KB, text/plain)
2008-09-02 09:46 EDT, Tod Creasey CLA
no flags Details
Patch with the caches reduced to one dictionary (8.02 KB, patch)
2008-09-02 10:50 EDT, Tod Creasey CLA
Tod_Creasey: review?
Details | Diff
More versatile version of the examples (13.07 KB, application/zip)
2008-09-03 11:20 EDT, Tod Creasey CLA
no flags Details
Proposed patch for 3.4.1 (3.10 KB, patch)
2008-09-03 12:28 EDT, Boris Bokowski CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ernest Mah CLA 2008-08-19 17:56:59 EDT
Build ID: I20080617-2000

Steps To Reproduce:
In some of our large customer workspaces, builds can create a high number of markers.

One workspace ends up with 305,000 markers.  In analyzing a heap dump of this scenario I discovered 246MB was being held on to by org.eclipse.ui.internal.views.markers.MarkerMap.  This causes our application to crash with an OutOfMemoryException (our max heap size is 768MB)

Some of the larger entries in this marker map are 2.5Kb each.  Note that I haven't expanded anything in the problems view.

I can provide heapdump data if required.

More information:
Comment 1 Kevin McGuire CLA 2008-08-19 18:57:19 EDT
Well aren't we little piggies...

Tod, thoughts on how we can condense this?
Comment 2 Ernest Mah CLA 2008-08-19 19:47:42 EDT
Additional information.  In one of the larger MarkerEntry hashmaps, there's a com.ibm.icu.text.CollationKey class that is taking 2K out of the 2.5K.
Comment 3 Tod Creasey CLA 2008-08-20 09:12:18 EDT
The CollationKeys are from ICU - it would be interesting to see what it was like if icu.base was used instead. However they are clearly not all this size as 2K collation keys would take 1.4G of space if there were 705,000 of them.

However as we sort primarily on description and access the other values lazily on expansion this is something we should focus on if there are size issues.

The Collation keys are based on the description. What was the description of the problem with the large key?

Are you running with your heap monitor on? If so you could get an idea of the heap size with and without the problems view.
Comment 4 Ernest Mah CLA 2008-08-20 10:01:31 EDT
Hi Todd,

I couldn't quite do the comparo since I hit the OOME.

Here's what I did try for fun though.  I upped my max heap size to 1024M and it still died.
Then I went back to 768M and close the problems view before performing the import and build, and it went through clean without OOME.

Comment 5 John Arthorne CLA 2008-08-20 10:03:30 EDT
For reference, this caching was added to fix performance bug 65711.
Comment 6 Tod Creasey CLA 2008-08-21 09:28:54 EDT
Created attachment 110564 [details]
Example to generate many marlers

Here is an example that generates 305,000 markers. I tried this using an empty workspace with the problems view closed and it will still cause an OutOfMemory error with the argument -Xmx768m.

As a product I think you need to investigate why you generate 305,000 markers if you want to run with 768M of memory? Even if we do manage to reduce the amount of space the view requires the overhead created by a 1/3 of a million markers will blow the memory.

We should still look into ways to improve the MarkerMap but I'm afraid anything we do here won't help you with this problem.
Comment 7 Ernest Mah CLA 2008-08-21 12:24:07 EDT
Hi Tod, the plugins does demonstrate the issue and it definitely blows the 768M even with the problems view closed.

I did some further experimentation because in my scenario closing the problems view did allow it to finish.

I ran without Xmx parameter so I could better catch what was going on.  I enabled the heap monitor.
1. Launch your generate 305,000 markers action.
2. Let the heap grow to about 500Mb then cancel the operation.  Hit the garbage collection button and it wen down to 250Mb.
3. Open the problems view and it goes up to 507MB (without expanding the errors node).
4. Close the problems view.  If I wait, it drops down to 253MB.

So I looks like closing the problems view does save a bunch of memory, as your mention, 300,000 markers is an issue.  I'm going to try and see if we had a mechanism to limit the markers generated and whether it was by validator, or by a general preference.
Comment 8 Tod Creasey CLA 2008-08-21 13:05:57 EDT
This is consistent with what I am seeing too. We can look at possible improvements in the problem view for sure - I just wanted to point out that there is only so much anyone can do with this many markers if we need to sort them.

Doing some crude testing using the heap monitor it appears that the use of Collation keys seems to roughly double the memory we used for opening the problems view so we should assess if the performance improvements in sorting are worth the extra space the CollationKeys require.

In smaller (10,000 element or so) sets the space performance wasn't that big of an issue but when you are so close to maximum heap size it can make a big difference.

Note that this is not a change in behaviour from 3.3 - we actually cached much more  previously.

You could also define some filtering using the markerSupport extension point to cut down on the number of problems that get shown in the view.
Comment 9 John Arthorne CLA 2008-08-21 14:29:40 EDT
Created attachment 110608 [details]
Screen shot of YourKit duplicate string trace

I took a quick look at memory usage of MarkerMap in YourKit.  It looks like there is a lot of string duplication here that could be saved by doing a uniquifying pass over the map (walk over the map and throw all the strings in a set to weed out duplicates).  In this screen shot there are 5 unique strings (two description attributes and three marker path attributes) that have 2MB worth of duplicates. The 5000 copies of the string "0:" are also held by the MarkerMap, but I have no idea where this string comes from (it's some marker attribute). This is on a self-hosting workspace with org.eclipse.ui* and org.eclipse.swt checked out.

The underlying IMarker mechanism does do a pass where it eliminates all duplicate strings. However, it happens in the background after startup and I suspect the problems view is grabbing the description strings before they are canonicalized. I think a similar canonicalization pass on MarkerMap would be a big win with little effort.

Another smaller optimization would be to combine the "attributeCache" and "collationKeys" maps into a single map on MarkerEntry. Since you can obtain the source string from the collation key (CollationKey.getSourceString()), there is no need to hang onto both the CollationKey and attribute String in separate maps.
Comment 10 Tod Creasey CLA 2008-08-21 16:11:19 EDT
Created attachment 110618 [details]
Patch with the collation keys removed

Here is a patch with the collation key use removed to test performance
Comment 11 Tod Creasey CLA 2008-08-21 16:25:39 EDT
I'm not sure what you are getting at with MarkerMap John. There are no strings cached there - just a mapping from marker to MarkerEntry.

Are you referring to MarkerEntry? If you mean the attributeCache we are using keys that are constants already. If it is the values are you suggesting that we do a clearing of the cached data after the first sort to avoid these duplicate values?
Comment 12 Tod Creasey CLA 2008-08-21 16:47:20 EDT
Created attachment 110621 [details]
Patch that clears the cache after sorting

As sorting populates the marker entry for every marker we can get a big optimization by clearing the caches after sorting as generally we are only accessing the attributes for the top 100 entries after sorting.

My quick benchmark indicates that this is a very significant reduction in the space we will take. Ernest if you could try this out I would be interested in your results.
Comment 13 John Arthorne CLA 2008-08-21 17:49:49 EDT
I was referring to MarkerEntry. It contains two maps: one for attributes, and another for the collation key of string attributes. I was suggesting merging them into one map, whose value would either be a String or a CollationKey. If the map contains a CollationKey the attribute value can be obtained by calling CollationKey.getSourceString().

However your optimization to flush the cache likely makes this kind of optimization unnecessary. It would only save on the extra overhead of a Map and MapEntry[] for each MarkerEntry that used a collation key.

If the MarkerEntry.collationKeys field will only end up being used on a small number of markers after the initial sort, I suggest nulling that field rather than just clearing the Map. Otherwise you still have 300,000 empty Map objects taking up space.
Comment 14 John Arthorne CLA 2008-08-21 17:50:42 EDT
Created attachment 110627 [details]
Update to Tod's patch that nulls the collation key cache
Comment 15 Ernest Mah CLA 2008-08-21 22:19:56 EDT
Trying out the updated patch now.  One question, when is the sorting finished?  In the scenario we have, a long running build is run that includes validation that generates the markers as it goes.  Will there be opportunities for this cache clearing code to be run periodically?

FYI, I got a NPE in the following:
java.lang.NullPointerException
	at org.eclipse.ui.internal.views.markers.MarkerEntry.getCollationKey(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerDescriptionField.getDescriptionKey(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerDescriptionField.compare(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerProblemSeverityAndMessageField.compare(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerComparator.compare(Unknown Source)

I updated the getCollationKey() as follows:
	CollationKey getCollationKey(String attribute, String defaultValue) {
		if (collationKeys != null && collationKeys.containsKey(attribute))
			return (CollationKey) collationKeys.get(attribute);
Comment 16 Ernest Mah CLA 2008-08-22 02:47:47 EDT
The patch seems to be an improvement on one of the large workspaces, but before it finished building I got the following exception:

!ENTRY org.eclipse.core.jobs 4 2 2008-08-22 02:38:00.623
!MESSAGE An internal error occurred during: "Process resource updates".
!STACK 0
java.lang.ClassCastException: java.lang.ref.SoftReference incompatible with java.util.Map
	at com.ibm.icu.impl.ICUService.getKey(Unknown Source)
	at com.ibm.icu.impl.ICUService.getKey(Unknown Source)
	at com.ibm.icu.impl.ICULocaleService.get(Unknown Source)
	at com.ibm.icu.impl.ICULocaleService.get(Unknown Source)
	at com.ibm.icu.text.CollatorServiceShim.getInstance(Unknown Source)
	at com.ibm.icu.text.Collator.getInstance(Unknown Source)
	at com.ibm.icu.text.Collator.getInstance(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerEntry.getCollationKey(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerDescriptionField.getDescriptionKey(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerDescriptionField.compare(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerProblemSeverityAndMessageField.compare(Unknown Source)
	at org.eclipse.ui.internal.views.markers.MarkerComparator.compare(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.mergeSort(Unknown Source)
	at java.util.Arrays.sort(Unknown Source)
	at java.util.Arrays.sort(Unknown Source)
	at org.eclipse.ui.internal.views.markers.CachedMarkerBuilder.sortAndMakeCategories(Unknown Source)
	at org.eclipse.ui.internal.views.markers.CachedMarkerBuilder.buildAllMarkers(Unknown Source)
	at org.eclipse.ui.internal.views.markers.CachedMarkerBuilder$1.run(Unknown Source)
	at org.eclipse.core.internal.jobs.Worker.run(Unknown Source)
Comment 17 Ernest Mah CLA 2008-08-25 12:52:44 EDT
FYI, I haven't encountered the exception I noted in my last note in further testing on the weekend.  The exception may have happened at the time where the OOME was being hit.

Comment 18 Ernest Mah CLA 2008-08-26 15:37:10 EDT
My latest test run where the build is at 100%.  I generated a heapdump.  There are 2 places where memory use is high in relation to the MarkerEntry.

1) MarkerMap - this is at 276MB, with 317,298 entries in the map.
2) Problems view itself - this ends up holding onto a TreeItem that has a MarkerCategory object.  This object is 209MB.

This adds up to 485MB tied up in memory for markers (not sure if the marker entries overlap between the two structures).

John, anything further we can tweak here?  (I believe Tod is on vacation until next Wednesday).
Comment 19 Min Idzelis CLA 2008-08-27 10:50:57 EDT
(In reply to comment #6)
> 
> As a product I think you need to investigate why you generate 305,000 markers
> if you want to run with 768M of memory? Even if we do manage to reduce the
> amount of space the view requires the overhead created by a 1/3 of a million
> markers will blow the memory.

I thought creating a IMarker was basically a resource operation. To me it seems that should mean that the amount of markers that can be created should only be limited by available disk space. 

With regards to problem view. I would consider the limit of IMarkers to show to be a property of the UI. I don't think it should be a property of (non-UI) things that create markers. There are some basic problems with that since there is no "global marker size" limit that marker-creators can query.

> 
> We should still look into ways to improve the MarkerMap but I'm afraid anything
> we do here won't help you with this problem.
> 

(I don't know how problems view is implemented, so this is just a shot out at the dark...) Could the problems view take advantage of the SWT.VIRTUAL flag which will only load the model objects for the objects that are currently being displayed? The problems view will NEVER display all 300,000+ markers at a time, so why pay the cost of keeping them in memory? I think at in the optimal case you could just keep the IMarkers that you are currently displaying in memory. 
Comment 20 Ernest Mah CLA 2008-08-27 11:44:42 EDT
Hi Min, some interesting notes.  

One thing that Tod mentioned was that all the info was required in order to sort the markers before showing the top 100.  Once the sort is done and the top 100 known, then this patch discards the cached attribute information.  

Thinking more about this, maybe we can have a hard upper bound in the number of markers to consider for sorting.  Something that no properly setup workspace would ever hit, but a value that would keep things from getting out of hand.  One guess would be 100,000 markers.  This rough number would potentially cut the memory used by this particular workspace down to 1/3 and definitely speed things up for the sorting.  An 100,000 markers is more than enough for consideration for the typical user.  Would this suggestion also limit the MarkerCategory object's memory use?
Comment 21 Min Idzelis CLA 2008-08-27 13:26:20 EDT
Yes, sorting the array of 305,000 needs to have the whole set in memory before it can return any results. What if there was a disk-based "marker index" that could be used by the sort routine? That way, you wouldn't have to load all 300,000+ markers before performing actions on them. 

SQL databases regularly sort complex queries without loading all the tables into memory so this is not an insurmountable problem. 
Comment 22 Boris Bokowski CLA 2008-08-27 13:57:41 EDT
(In reply to comment #15)
> FYI, I got a NPE in the following:
> java.lang.NullPointerException
>         at
> org.eclipse.ui.internal.views.markers.MarkerEntry.getCollationKey(Unknown
> Source)

There is a problem with the patch, in MarkerEntry.getCollationKey, the first
line should be:

if (collectionKeys != null && collationKeys.containsKey(attribute))
Comment 23 Boris Bokowski CLA 2008-08-27 14:02:41 EDT
(In reply to comment #21)
> Yes, sorting the array of 305,000 needs to have the whole set in memory before
> it can return any results. What if there was a disk-based "marker index" that
> could be used by the sort routine? That way, you wouldn't have to load all
> 300,000+ markers before performing actions on them. 
> 
> SQL databases regularly sort complex queries without loading all the tables
> into memory so this is not an insurmountable problem. 

Are we still talking about a fix for 3.4.1? We can only do surgical fixes with minimal risk in a point release.

If you know in advance that only the first 100 markers (after sorting) are relevant, you can compute the result without having to load all markers into memory. You just iterate over all markers while maintaining a list of the 100 lexicographically smallest markers. I don't think we can change the current code to work this way though, for 3.4.1.
Comment 24 Ernest Mah CLA 2008-08-27 14:22:07 EDT
(In reply to comment #23)
> (In reply to comment #21)
> > Yes, sorting the array of 305,000 needs to have the whole set in memory before
> > it can return any results. What if there was a disk-based "marker index" that
> > could be used by the sort routine? That way, you wouldn't have to load all
> > 300,000+ markers before performing actions on them. 
> > 
> > SQL databases regularly sort complex queries without loading all the tables
> > into memory so this is not an insurmountable problem. 
> 
> Are we still talking about a fix for 3.4.1? We can only do surgical fixes with
> minimal risk in a point release.
> 
> If you know in advance that only the first 100 markers (after sorting) are
> relevant, you can compute the result without having to load all markers into
> memory. You just iterate over all markers while maintaining a list of the 100
> lexicographically smallest markers. I don't think we can change the current
> code to work this way though, for 3.4.1.
> 
Agreed Boris, Min's suggestion would be something to consider for future.  For now, something not too invasive is required.
Comment 25 Ernest Mah CLA 2008-08-27 14:23:27 EDT
(In reply to comment #22)
> (In reply to comment #15)
> > FYI, I got a NPE in the following:
> > java.lang.NullPointerException
> >         at
> > org.eclipse.ui.internal.views.markers.MarkerEntry.getCollationKey(Unknown
> > Source)
> 
> There is a problem with the patch, in MarkerEntry.getCollationKey, the first
> line should be:
> 
> if (collectionKeys != null && collationKeys.containsKey(attribute))
> 

Yup, I hit that early on, and made the change as mentioned at the end of comment #15.
Comment 26 Boris Bokowski CLA 2008-08-28 12:30:52 EDT
Note that due to a change to our schedule (see http://dev.eclipse.org/mhonarc/lists/eclipse-dev/msg08304.html), we won't be able to address this for 3.4.1 RC1. It's still on the table for 3.4.1 RC2 at this point.

Here are a couple more ideas, but nothing that I would be comfortable with for 3.4.1. Note that there are two aspects to this - reducing the memory that we hold on to long-term, while the Problems view is open; and reducing the memory that we hold on to temporarily while recomputing/sorting/categorizing the MarkerEntry objects. I suspect that it is the latter that causes the OOMEs you are seeing, but improving the former (such as through the proposed patch) will help too. This is because we still hold onto the old "currentMap" while CachedMarkerBuilder.sortAndMakeCategories is working to compute a new "currentMap". As observed by Ernest in comment 18, we end up holding onto roughly double the amount of memory of just one "currentMap" while we are sorting.

- (proposed by John in comment 9 and comment 13) Combine the "attributeCache" and "collationKeys" maps into a single map on MarkerEntry.
- Going even further, on the assumption that there are only a few entries in these maps, convert them (or only one of them) into an array with linear search. For example, the typical case for "collationKeys" is to contain one entry.
- (proposed by John in comment 9) Eliminate duplicate strings - we might be able to do this as we sort in CachedMarkerBuilder.sortAndMakeCategories but we would have to pass the data structure (HashMap?) used for this down to MarkerEntry.getAttributeValue - probably through the comparator and the field objects that are used during the sort operation.
- (proposed by me in comment 23) Avoid sorting over all MarkerEntry objects - iterate over all markers incrementally while maintaining a TreeSet of the 100 first MarkerEntry objects. This would cut down the memory consumption dramatically but requires quite a bit of work to implement.
- Get rid of MarkerEntry objects that are not displayed. Note that even though we only display the first 100, we still hold onto all MarkerEntry objects we used for sorting.

Note that there are many potential optimizations you could do on the client side:
- Don't generate as many markers - I doubt that end users will be able to do anything about them if they see markers in the 100,000s. Some ideas:
  . When you encounter problems that may have been caused by earlier problems, don't report them if they are indeed a follow-up problem.
  . After reporting a certain number of problems per file/folder/project (the exact number and scope depends on your domain), stop reporting more and instead generate one more marker saying that you stopped looking for more problems.
  . Disable validators if there is something fundamentally wrong with the project structure you are seeing. For example, JDT will not report problems with individual files if there is a serious problem with your project's class path.
- Use shorter strings in your description. Why are some of them 2.5K? Discounting the CollationKey overhead, these must still be pretty long strings.
Comment 27 Boris Bokowski CLA 2008-08-28 12:32:58 EDT
(In reply to comment #25)
> Yup, I hit that early on, and made the change as mentioned at the end of
> comment #15.

I missed that - probably thought it was part of the stack trace you posted. ;-)
Comment 28 Ernest Mah CLA 2008-08-29 14:18:03 EDT
(In reply to comment #26)
> Here are a couple more ideas, but nothing that I would be comfortable with for
> 3.4.1. Note that there are two aspects to this - reducing the memory that we
> hold on to long-term, while the Problems view is open; and reducing the memory
> that we hold on to temporarily while recomputing/sorting/categorizing the
> MarkerEntry objects.

Boris, can we get the first part of this fixed with an official patch for our product (long term memory held by the view in between sorting), and a more optimal fix for the second aspect (memory used during sort) later?
Comment 29 Tod Creasey CLA 2008-09-02 08:25:00 EDT
BTW all VIRTUAL will save us is the widget creation (which we already save by limiting the number of problems shown) but is no help with object held onto by the model.
Comment 30 Tod Creasey CLA 2008-09-02 09:46:25 EDT
Created attachment 111473 [details]
Update to Johns patch with NPE fixed.

Here is an update to John's patch with the NPE fixed. I am also working on one to amalgamate the caches which should be what we look at for 3.5.

As far as trying a String reduction technique similar to the marker one goes I don't think it is necessary after we clear the caches as we will be asking the markers for theiur Strings after the String reduction from Core has happened.
Comment 31 Tod Creasey CLA 2008-09-02 10:50:21 EDT
Created attachment 111478 [details]
Patch with the caches reduced to one dictionary

Here is the patch with the maps amalgamated to be reviewed by John and Boris.
Comment 32 Min Idzelis CLA 2008-09-02 13:14:19 EDT
(In reply to comment #29)
> BTW all VIRTUAL will save us is the widget creation (which we already save by
> limiting the number of problems shown) but is no help with object held onto by
> the model.
> 

What about using VIRTUAL in combination with ILazyTreeContentProvider which only retrieves the "model" for visible elements? 

Or perhaps creating a proxy for the IMarker that just hold the IMarker's "id", and retrieve the rest of the information from the IMarker on it's IResource on demand? 
Comment 33 Tod Creasey CLA 2008-09-02 13:43:31 EDT
Min the problem is sorting the elements first. In both of those cases I have to sort first.
Comment 34 Tod Creasey CLA 2008-09-03 10:51:23 EDT
We should look at second last (lower risk) fix for 3.4.1 and the other one for 3.5.
Comment 35 Tod Creasey CLA 2008-09-03 11:20:30 EDT
Created attachment 111588 [details]
More versatile version of the examples
Comment 36 Tod Creasey CLA 2008-09-03 11:43:28 EDT
I have made a branch for org.eclipse.ui.ide called R3_4_maintenance_patches based off of version M20080827-0800a (the latest ide maintenance contribution)
Comment 37 Boris Bokowski CLA 2008-09-03 12:28:11 EDT
Created attachment 111595 [details]
Proposed patch for 3.4.1

This is another update to patch 111473 "Update to Johns patch with NPE fixed" without the whitespace changes.
Comment 38 Boris Bokowski CLA 2008-09-03 12:31:21 EDT
+1 for releasing patch 111595 "Proposed patch for 3.4.1" into the R3_4_maintenance stream for inclusion in 3.4.1.
Comment 39 Boris Bokowski CLA 2008-09-03 13:56:03 EDT
(In reply to comment #36)
> I have made a branch for org.eclipse.ui.ide called R3_4_maintenance_patches
> based off of version M20080827-0800a (the latest ide maintenance contribution)

I had to create a new branch (called R3_4_maintenance_patches_2) based off of the original version I20080606-1300 in R3_4 to avoid picking up unrelated changes.
Comment 40 Boris Bokowski CLA 2008-09-03 14:08:02 EDT
Patch released to R3_4_maintenance_patches_2.
Comment 41 Mike Wilson CLA 2008-09-03 14:34:55 EDT
I reviewed the patch and it seems reasonable. This late in the game, I would not recommend that we attempt anything more pervasive for 3.4.1.

In addition to revisiting the performance issue in 3.5, we should also add words somewhere in the Javadoc (maybe on IMarker?) that indicate what we believe are reasonable numbers of markers for users to work with and identify best practices for reporting catastrophic failures (as is done by JDT, for example).


Comment 42 Boris Bokowski CLA 2008-09-03 14:37:52 EDT
Fix released to R3_4_maintenance.
Comment 43 Tod Creasey CLA 2008-09-05 09:49:15 EDT
Verified in M20080903-2000
Comment 44 Boris Bokowski CLA 2008-09-05 15:17:38 EDT
Verified that we indeed do not hold on to as much memory anymore.
Comment 45 Ernest Mah CLA 2008-09-10 17:49:09 EDT
(In reply to comment #26)

> Note that there are many potential optimizations you could do on the client
> side:
> - Don't generate as many markers - I doubt that end users will be able to do
> anything about them if they see markers in the 100,000s. Some ideas:
>   . When you encounter problems that may have been caused by earlier problems,
> don't report them if they are indeed a follow-up problem.
>   . After reporting a certain number of problems per file/folder/project (the
> exact number and scope depends on your domain), stop reporting more and instead
> generate one more marker saying that you stopped looking for more problems.

Very interesting discovery I just made as I was trying to determine which one of our validators was producing so many markers in one of our large workspaces we test with.

1) First I grouped the markers by type and found that 186,374 out of 228,187 (82%) were Java Problems!

2) Then I grouped the markers by Java Problem type and found 131,702 out of 186,374 (71%) were due to type safety and raw types warnings (which .  These type safety and raw type warnings contribute to 58% of the markers here on the entire workspace.

3) I compared it with a run of our previous product version (based on Eclipse 3.2).  There were 77,447 Java problems.  This means our product based on Eclipse 3.4 saw an increase of 108,927 Java problem type markers on the same workspace.  The Type safety and Raw type category increased from 30,545 to 131,702 markers.

Is there any optimization JDT can do here with those warnings?  Perhaps taking some of the tips you've outlined here Boris?
Comment 46 Ernest Mah CLA 2008-09-10 18:02:10 EDT
(In reply to comment #31)
> Created an attachment (id=111478) [details]
> Patch with the caches reduced to one dictionary
> 
> Here is the patch with the maps amalgamated to be reviewed by John and Boris.
> 

I did a rough test of this patch and found it saved approximately 30M in step 2 compared to just the clearing of the cache after sort was done (attachment id=111595).

1. Start with no problems view (294M).
2. Open problems view (Peaked at 501M).
3. View refreshes after sort (377M after GC).

Still curious as to why having the problems view open costs us 83M (better than 186M I suppose), where as in Eclipse 3.2 the problems view didn't seem to cost any additional space whether it was open or closed.
Comment 47 Dani Megert CLA 2008-09-11 01:54:19 EDT
>Is there any optimization JDT can do here with those warnings? 
Can't you disable the "Usage of raw type" warning? Or, in case you don't code against 1.5 set the JRE or EE to 1.4?
Comment 48 Ernest Mah CLA 2008-09-11 10:00:13 EDT
(In reply to comment #47)
> >Is there any optimization JDT can do here with those warnings? 
> Can't you disable the "Usage of raw type" warning? Or, in case you don't code
> against 1.5 set the JRE or EE to 1.4?
> 

Yes that's certainly possible.  For our customers, this won't be the first thing to do for them (though it may become a learned response).  With the default settings, autobuild is on and JDT generates those messages.  They'll import the project interchange, or checkout from cvs and they'll hit this use case (since most of our customers will have legacy workspaces they're extending).  Can we cap the number of these warnings generated per project or trim the number of warnings per file after a certain number has been encountered?  If they're interested in cleaning it up, they still can (and it could be 100 per project that's visible.

In the meantime, we'll have to provide a tech note in our product for this and hope our customers read this before potentially hitting an OOME from this and then figuring out this setting or calling support :)
Comment 49 Dani Megert CLA 2008-09-11 10:13:15 EDT
> Can we cap the number of these warnings generated per project or
>trim the number of warnings per file 
Since the user can decide how a problem is reported (ignore, warning, error) we can only allow to cap per number of problems. This is currently set to 100 per file (see Java > Compiler > Building).

>In the meantime, we'll have to provide a tech note in our product for this and
>hope our customers read this before potentially hitting an OOME
If you know that most customers have an issue with this you could set that warning to 'ignore' in your config file and in the tech note / readme inform the users how to enable it if desired.
Comment 50 Boris Bokowski CLA 2008-09-11 16:40:44 EDT
We should continue the discussion in bug 246593. Sorry for not pointing to the 3.5 bug earlier!

I have taked comment #45 and filed bug 247081 against JDT Core.