Bug 292267 - OutOfMemoryError during workspace refresh due to leak in UnifiedTree
Summary: OutOfMemoryError during workspace refresh due to leak in UnifiedTree
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: 3.5.1   Edit
Hardware: All All
: P3 major (vote)
Target Milestone: 3.6 M3   Edit
Assignee: Platform-Resources-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 245008 293519 351942
  Show dependency tree
 
Reported: 2009-10-14 10:24 EDT by Martin Oberhuber CLA
Modified: 2011-07-13 06:44 EDT (History)
13 users (show)

See Also:


Attachments
Patch making the improvement (2.72 KB, patch)
2009-10-14 10:40 EDT, Martin Oberhuber CLA
no flags Details | Diff
Updated patch against R3_5_maintenance (3.38 KB, patch)
2009-10-16 08:40 EDT, Martin Oberhuber CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Oberhuber CLA 2009-10-14 10:24:30 EDT
+++ This bug was initially created as a clone of Bug #245008 +++

We have a workspace with 400000 files, which gets populated outside Eclipse:
 * Create project foo, Quit Eclipse
 * Outside Eclipse, add symbolic link under foo pointing to 400000 files tree
 * Restart

During restart, Eclipse performs a Workspace refresh. This workspace refresh runs into an OutOfMemoryError with -Xmx320m heap size.

We analyzed the heap dump using the MAT tool and came to the conclusion that the UnifiedTree#freeList is the problem in our case. This freeList grows to 167000 UnifiedTreeNode elements, each of which is relatively heavy-weight (holding an IResource, IFileInfo and IFileStore). The freeList is allowed to grow unbounded, even though it is just there as a performance improvement to avoid excessive "new" calls.

I implemented 2 improvements for the situation:
  1) Delete the unused fields of UnifiedTreeNode before adding to the freeList.
  2) Bound the freeList at 32000 elements maximum (approx 1.2 MB)
With these two changes, our use-case completes without going OutOfMemory.

See also bug 245008 for more possible memory performance improvements.
Comment 1 Martin Oberhuber CLA 2009-10-14 10:40:05 EDT
Created attachment 149537 [details]
Patch making the improvement

Attached patch makes the improvement.

The 32767 items limit for the freeList is somewhat randomly chosen, the more important piece is freeing the UnifiedTreeNode elements before adding to the list.

I'm going to commit this into the 3.6 Stream since I really can't see any negative impact of this. Could I get fellow committer's review for also backporting to 3.5.2 ?

PS with the fix, we can complete our use case at 320MB heap limit, and after refreshing memory usage drops to 130MB. Still, at 256MB heap limit, we still run OOME. We continue investigating to find additional memory hogs -- most likely related to java.lang.String#substring() not freeing the underlying large byte[] array, as discussed in bug 245008.
Comment 2 Martin Oberhuber CLA 2009-10-14 11:16:18 EDT
Committed into 3.6, are we OK with putting this into 3.5.2 too?
Comment 3 Szymon Brandys CLA 2009-10-14 11:23:24 EDT
(In reply to comment #2)
> Committed into 3.6
Thanks Martin.

> are we OK with putting this into 3.5.2 too?
Since John A. and Pawel P. were looking at the issue already, I would like to give them a chance to look at the fix. I would also suggest to give the fix some time to settle down in 3.6.
Comment 4 James Blackburn CLA 2009-10-14 11:31:12 EDT
(In reply to comment #1)
> Created an attachment (id=149537) [details]
> Patch making the improvement
> 
> Attached patch makes the improvement.

Is this intentional:
      * Releases elements that won't be needed any more for garbage collection.
      * Should be called before adding a node to the free list.

but it's called in the opposite order:
	freeNodes.add(node);
	//free memory-consuming elements of the node for garbage collection
	node.releaseForGc();

?
Comment 5 Martin Oberhuber CLA 2009-10-14 15:35:19 EDT
Sharp eyes, James! It's not intentional but it won't matter since the UnifiedTree code is single-threaded. We are just putting a reference into the list... it' won't matter whether the content of what we reference is cleaned first or afterwards. 

But in order to be consistent and not confuse future readers, reversing the oder would likely make sense (first releaseForGc, then add to freeList).

BTW, I think that the issue here is a completely different one than bug 245008. So Szymon I'm not quite sure if I understand your comment about "John A and Pawel P having looked at this already".
Comment 6 Dani Megert CLA 2009-10-15 04:39:21 EDT
>Szymon I'm not quite sure if I understand your comment about "John A and
>Pawel P having looked at this already".
I think he simply wants them to review your fix.
Comment 7 Martin Oberhuber CLA 2009-10-16 08:40:34 EDT
Created attachment 149742 [details]
Updated patch against R3_5_maintenance

Attached is an updated patch against R3_5_maintenance which (1) addresses James' concern, and (2) fixes an NPE which we detected in testing. This patch brings the fix back in sync with what I committed into 3.6.

The NPE was caused by a problem in the cyclic symlink handling which I had committed 2 years ago. In the case where an existing resource was followed by a cyclic symbolic link, a duplicate of the same UnifiedTreeNode denoting that resource was added into the tree twice. Since the same object was found in the tree twice, clearing its fields for re-used led to an NPE.

The 2-year-old bug had broken the assumption that each node in the tree must be unique. Without the new memory optimization, that bug had gone unnoticed because just doing the same work twice did not lead to any problems. With the memory optimization, a node that had already been freed for re-use could not be used any more without running into the NPE.
Comment 8 James Blackburn CLA 2009-10-16 09:38:30 EDT
It would be really interesting to know what, if any, benefit this performance optimisation gives with real, large trees.

Object creation is relatively cheap in java, and in this case UnifiedTreeNodes are 5 obj refs + 1 bool.  Evidence (google) suggests much of the cost is in garbage collecting these objects, but this is mitigated by generational garbage collection (and if you're exceeding 32k entries in your cache, are these objects are being created faster than they're being used?)

Given the above and the fact that, up until now, the free UTN list has had strong references to its children, I would have expected this memory hit to have more of an adverse effect on the system (larger heap => memory pressure => more frequent + expensive GC) than the cost of the new()s that the optimisation was attempting to mitigate.  The fact that few have noticed this suggests that the effect is either marginal, or people are used to Eclipse eating all their memory ;).

I guess I'd be very interested to see a comparison of heap usage and overall CPU time with and without the freeNodes optimisation. But it sounds like good repeatable benchmarks are hard to come by?
Comment 9 Martin Oberhuber CLA 2009-10-16 10:00:10 EDT
(In reply to comment #8)
> It would be really interesting to know what, if any, benefit this performance
> optimisation gives with real, large trees.

James, this was from a real, large tree in our product, which customers of ours get by every day. We are building embedded linux systems, and we're adding a linked resource pointing to the tree populating the entire linux file system -- 400.000 files in small file system, more in a large one.

By using the (arguably somewhat artificial) limit of 32727 nodes in the free list, I was trying to make a compromise between saving time for the new() operation -- which, according to my information, is still more expensive than nulling out the elements -- and using memory. "Normal" refresh jobs should still be fast enough, whereas large ones now pass through without an OOME even on huge workspaces.

I did consider removing the freeList entirely and see what, if any, additional performance gain that would give. But for one, I do not have the time doing so right now, and second I do not expect any big difference.

Fact is, there was a real life problem and it has been fixed. If you feel like doing any more analysis to further optimize it, you are more than welcome! The one thing that I don't want is postponing the current working and tested fix just because a slightly better one might also be possible. I think we need to take one step at a time.

> was attempting to mitigate.  The fact that few have noticed this suggests that
> the effect is either marginal, 

The effect is easy to compute by looking at the heapdump, which I can share with you if you want: In our case, the freeList grew to 168000 elements (which by itself is ridiculous since I would not expect so many nodes to ever be needed!). Each element retained 24 bytes shallow memory, but the real memory pigs were the Resource and FileStore objects which held absolute paths pointing to the elements -- together around 300 bytes each. 

As a result, 50MB of main memory were unnecessarily retained by the UnifiedTree during the refresh operation -- enough to kill our app with a default heap size of 320MB which we cannot easily increase since then on Linux we would run into problems doing Runtime.getRuntime().exec() due to the known linux limitations of fork() and exec() -- see http://developers.sun.com/solaris/articles/subprocess/subprocess.html . Acutally I guess this fork-and-exec limitation is the main reason why people haven't noticed this before: they just uped their max heap size to 768M or 1 Gig so they hid this problem. We cannot do that since then our app won't be able to fork processes any more on linux system with little virtual memory.

Arguably, a better solution for the large linux file system of ours would be using lazy resources as per bug 244979 since nobody needs full workspace functionality on those resources anyways. While we don't have that, we need to optimize memory :)
Comment 10 James Blackburn CLA 2009-10-16 10:25:15 EDT
(In reply to comment #9)
> (In reply to comment #8)
> > It would be really interesting to know what, if any, benefit this performance
> > optimisation gives with real, large trees.
> 
> James, this was from a real, large tree in our product, which customers of ours
> get by every day. We are building embedded linux systems,

Woops, I should have been clearer: I was wondering about the motivation behind the addition of the 'freeNodes' list rather than your fix for the leak Martin!
Comment 11 John Arthorne CLA 2009-10-16 10:54:46 EDT
+1. Good catch Martin, this fix looks good to me.
Comment 12 Szymon Brandys CLA 2009-10-28 07:10:25 EDT
Martin, could you please verify the fix using the M3 candidate? Thanks.
Comment 13 Martin Oberhuber CLA 2009-10-28 17:11:27 EDT
Verified in I20091027-0100 by source review:

I imported the org.eclipse.core.resources plugin from the self-hosted target platform with source folders, renamed it and compared against HEAD in CVS. My fixes are in.

Full functional verification (that actually less memory is used) requires bug 245008 to be fixed first, because the effects of that bug would otherwise mask this one -- Eclipse would run OutOfMemory due to bug 245008 before even running into the issue fixed here. At Wind River, however, we have been running a version that has both this patch and the one for bug 245008 for a while now without any issues and proven reduction of memory need.