Community
Participate
Working Groups
Created attachment 151533 [details] Screenshot of the analysis in MAT +++ This bug was initially created as a clone of Bug #245008 +++ Build ID: Wind River workbench test build (based on eclipse-SDK-3.5.1-win32) Consider the heap dump on http://dl.dropbox.com/u/1975868/heapdump_wb37Alx20.zip The OutOfMemoryError happened during a workspace refresh of a large workspace (around 500.000 files, most of the tree linked in under a linked resource root), after a large C/C++ build had been performed. * Unzip the heap dump and load in http://www.eclipse.org/mat/ * Perform the default automatica leak suspects analysis - Problem Suspect 1 is internal to Wind River and not relevant here - Problem Suspect 2 is the UnifiedTreeNode queue, which we will look at here * Expand details of suspect 2 - click java.lang.Object[331129] > Show Objects by Class > outgoing refs - Expand Object[] > IFileInfo > String > char[] - Right-click the char[] > Java Basics > Group by Value > Finish The analysis shows that the UnifiedTreeNode queue has allocated an array of 331129 elements. In the queue, 274382 UnifiedTreeNode elements have been allocated. The underlying char[] arrays of those UnifiedTreeNode elements take over 90 MB of heap space, because the char[] arrays carry around the baggage of original full paths, rather than just holding the names. Attached screenshot also shows the analysis. Note that other than the FileInfo nodes, the File nodes which are stored in the Workspace organize the file names way more efficiently in their Path elements. I had suspected in bug 245008 before that the extra substring baggage might be introduced into the Workspace through the FileInfo class, and I believe that this heapdump proves my assumption.
Small correction: The heap dump was actually taken on Linux and not Windows. Ubuntu 8.04 64-Bit, but Eclipse running 32-bit on a Sun 1.5.0_11 JVM.
Created attachment 151534 [details] Simple patch fixing the issue Attached simple patch fixes the issue. The patch is against R3_5_maintenance because this is what we are testing with. The FileInfo class is not really meant to be held in memory for long, so the extra memory cost of creating a new String object should not be relevant. Also, in terms of Performance, the cost for creating the extra String object will be way lower than the cost of retrieving file status from the OS (which actually involves converting the file Path to platform bytes anyways). What's way more important, is that IFileInfo.getName() is meant to return the exact canonical name of a file object from the OS, and is therefore the "entry point" into Eclipse for the file name. It is therefore important to get it right and optimized.
I will test with this patch to see whether it's actually sufficient to also fix bug 245008. Comments welcome.
I should mention that I also see potential for a completely different route fixing this. Fact is that in EFS, the FileStore and FileInfo can be considered as related; and since the FileStore implementation carries the underlying java.io.File object, having the FileInfo's byte[] array re-use that array from java.io.File may not be problematic in most cases. It's really the code in UnifiedTree which creates the problem by (a) allocating the huge queue of 300000 elements, where IFileInfo objects are usually meant to be short-lived; (b) creating the Workspace / DataTreeNode structure, which is meant to be long-lived, out of the IFileInfo information, which is meant to be short-lived. With this in mind, one could argue that the "optimzation" of the file names should be done in UnifiedTree, with (a) being the bigger problem of both. At the moment, I cannot quite see why it should be necessary to ever remember the FileInfo objects of 300000 elements. The FileInfo should likely be retrieved later, on demand, when it can actually be processed; and that processing should perform the substring optimization. An implementation along these lines will need a bit more investigation than the simple patch attached here, but I would appreciate fellow committer's comments at this point. Thanks!
(In reply to comment #3) > I will test with this patch to see whether it's actually sufficient to also fix > bug 245008. Comments welcome. The patch looks good to me, and likely fixes the issue as shown in the screenshot I attached on the other bug: https://bugs.eclipse.org/bugs/attachment.cgi?id=148627 It's also interesting that these Strings aren't shortened by the StringPool by default. Is it worth also tweaking StringPool so that it only adds shortened paths to its map? --- src/org/eclipse/core/internal/utils/StringPool.java +++ src/org/eclipse/core/internal/utils/StringPool.java @@ -50,6 +50,7 @@ public final class StringPool { savings += 44 + 2 * string.length(); return (String) result; } + string = new String(string.toCharArray()); map.put(string, string); return string; }
(In reply to comment #5) Yes, I tend to agree that a StringPool should always only hold optimized Strings. That's the whole purpose of a StringPool. On the other hand, if we can be reasonably sure that at the point where StringPool kicks in (which is pretty late) we already have optimized Strings, that extra work would be unnecessary. But then again, the cost of storing a String in the HashMap is probably higher than the cost of creating the new String object. At any rate, right now I do not have evidence that doing the StringPool optimization would solve any real-world problem. Should we open a separate bug for discussing this? Or just get back to the StringPool topic when the current known real-world problems have been addressed? (In reply to comment #4) A very superficial look at UnifiedTree along the lines of comment 4 shows that it might be possible to simply not remember the FileInfo in UnifiedTreeNode, but remember individual attributes of the IFileInfo instead. The only client currently using the FileInfo directly after it has been created is DeleteVisitor#recursiveKeepHistory(), which really only needs the modificationTime. But I am not entirely happy with refactoring the UnifiedTreeNode. I like the concept of encapsulating IFileInfo such that we do not need to know any details. I would be more interested in changing the order of processing in the UnifiedTree algorithm, such that we could live with a shorter queue length and create the IFileInfo more on-demand. But at the moment I cannot quite see how this could be done.
Fix looks good to me.
I have committed the fix to CVS, since our testing indicates that this indeed fixes the problem. Our large workspace can now be opened without going OOME. I'm going to file a separate bug requesting a backport of this to Eclipse 3.5.2.