I20040226
On moving to I20040226, I can no longer use the workspace that I was developing with under I20040219. My workspace happens to have projects that use both JDK 1.4 and JDK 1.5. John W and I have been looking at it and there is a ton of SimpleWordSets allocated and staying in memory (82000ish). As these are being allocated out of AddJarFileToIndex.execute, the OutOfMemoryError occurs. It would appear that the MemoryIndex is not getting totally cleaned up?
This is in the log:
!SESSION Feb 26, 2004 15:12:22.560 ---------------------------------------------
java.version=1.4.1-rc
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=win32, ARCH=x86, WS=win32, NL=en_US
Command-line arguments: -data c:\1119 -showlocation
!ENTRY org.eclipse.jdt.core 4 4 Feb 26, 2004 15:12:22.560
!MESSAGE Background Indexer Crash Recovery
!STACK 0
java.lang.OutOfMemoryError
!ENTRY org.eclipse.jdt.core 4 4 Feb 26, 2004 15:12:25.504
!MESSAGE Background Indexer Crash Recovery
!STACK 0
java.lang.OutOfMemoryError
Darin: what VM args do you start your workspace with?
-Xmx300M
AddJarFileToIndex saves the index once it's finished indexing all the .class files. This replaces the MemoryIndex with a new one... the old one should be GC'ed. All of your jar files are indexed one after the other so there should be only 1 reference to a 'full' MemoryIndex. Are you getting any walkbacks in the log? Are the save index calls failing? How many projects do you have using 1.4? How many using 1.5?
All that was in the log I provided to you. There was no indication that the move to the DiskIndex was failing. I have 17 source projects using 1.4.2 and 1 source project using 1.5. Watching with OptimizeIt, we did not see any evidence of the SimpleWordSets getting GC'd. The number just kept growing until OutOfMemory.
Ok I have a full Eclipse source workspace (on 1.4.1) in which I added a new project on 1.5.0. I deleted all the index files in <workspace>\.metadata\.plugins\org.eclipse.jdt.core then restarted.
These are the numbers I am getting running I20040226 with -Xmx300M on a jdk1.4.1 VM:
  memory used according to the TaskManager: less than 140Mb
  time to reindex, save to disk & run AllTypesCache: 1 minute 40 seconds
  Total disk space is 20.1Mb:
    6 large index files (512K, 735K, 917K, 1.5Mb, 4.1Mb, 6.0Mb)
    26 index files between 100K and 350K
    ~75 smaller index files that are less than 100K
Can you duplicate & let me know your numbers please.
Will do test case in the next 2 hours
I have the debug and Ant source projects (17 projects), and one test project set to use 1.5. Delete the index files. The task manager never indicates more than 100mb but I get two Out of Memory exceptions once the index files start showing up again. I end up with 21 index files. The first OOM occurs right after a <number>.index.tmp file is created. I get only one large index file (4.2 mb)
And you don't have any disk space problems? This doesn't make any sense... if the OS thinks you're using < 100Mb of 300Mb allowed, why should you get OOMs? I was able to get OOMs if I ran with default VM args (no -Xmx parameter at all). The OOMs started showing up at 96Mb. Can you double check your VM args please.
So I am a bonehead...I was incorrectly specifying the vmargs so I was just using the default heap size for this Eclipse session. Sorry.
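A quick way to rule out this kind of mix-up is to ask the running VM what heap ceiling it actually received. The sketch below is only an illustration: HeapCheck and its methods are hypothetical names, not part of Eclipse or JDT, and the only real API used is Runtime.maxMemory(). With a correctly applied -Xmx300M the reported value should be roughly 300; a much smaller number suggests the VM fell back to its default heap size.

```java
// Hypothetical helper (not part of Eclipse): reports the heap ceiling
// that the running JVM was actually started with.
class HeapCheck {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Maximum amount of heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return toMegabytes(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

The exact figure can differ slightly from the -Xmx value depending on the VM, so treat it as a sanity check rather than an exact readback.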
No problem.
I ran a related test on this one: new workspace, create a project named P1. I watched the Windows memory usage.
In M7, the memory usage went from 30->60 Mbytes.
In 226, the memory usage went from 30->90 Mbytes.
My (unproven) claim is that this delta (30, 60 Mbytes respectively) is the amount of heap required to compute the index for rt.jar (jdk 1.4). 60 Mbytes seems like too much memory to require.
Yes I'm seeing 35 to 93Mb. I'll look into it today.
So to launch an empty workspace on the Java perspective, we're at ~34Mb. To create the first project & run the indexer, but don't keep anything (ie. commented out the memoryIndex.addIndexEntry call) increases to ~44Mb... the cost to read every .class file & extract the info. To keep it in memory but never save to disk, it increases to ~81Mb (~74Mb for rt.jar which is first). Then to write each index out to disk, we jump to ~89Mb. The saved files on disk take 4.73Mb. Once the AllTypesCache is finished, we're at ~93Mb. Even though we index each jar one after the other & free the space, the GC never seems to reuse the space. So I need to look into the 44Mb to 74/81Mb jump. The MemoryIndex representation is too EXPENSIVE!
Forcing a GC at the end of the Index.save() settles the memory used to ~60Mb, but it still peaks at ~82Mb. So we've got back the easiest 10Mb.
For rt.jar, we're looking at storing ~1 million words in the index spread across 8 categories. There are 85,000 unique words... so roughly every word is extracted 11 times from the 9189 files. Interning the words reduces the peak memory use from ~82Mb to ~71Mb (the idle memory use is still ~60Mb). The additional cost of interning the words for rt.jar is 0.5 seconds (approximately 10%).
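To illustrate the interning idea: since only ~85,000 of the ~1 million extracted words are unique, keeping one canonical char[] per distinct word lets the duplicates be garbage collected. The following is a minimal sketch under assumptions, not the actual JDT change. WordInterner is a hypothetical name, and keying a HashMap by String is a simplification; a char[]-keyed hash set would avoid the extra String per unique word.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of word interning; names are not taken from JDT.
class WordInterner {

    // One canonical char[] per distinct word seen so far.
    private final Map<String, char[]> pool = new HashMap<>();

    /**
     * Returns the canonical array for this word. The first array seen for
     * a given word becomes canonical; later duplicates are not stored, so
     * callers that keep only the returned reference let them be collected.
     */
    char[] intern(char[] word) {
        String key = new String(word);
        char[] canonical = pool.get(key);
        if (canonical == null) {
            canonical = word;
            pool.put(key, canonical);
        }
        return canonical;
    }

    /** Number of distinct words interned so far. */
    int uniqueCount() {
        return pool.size();
    }
}
```

The lookup on every extracted word is where the extra indexing time comes from; the saving comes from each index category sharing one array per word.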
Reopening to track the changes.
So for rt.jar, we are extracting & storing 942,413 words with an average size of 10.74 characters. Headers on char[] are 16 bytes & characters are 2 bytes each -> 35Mb! So the bottom line is indexing every .class file in rt.jar can allocate over 40Mb (reading & extracting) before we ever start remembering the info. I also just ran M7 to check the numbers, it takes 68Mb on my machine (after starting at 33Mb)... so with the 2 changes we will now peak a little higher but settle down at 60Mb after rt.jar is saved.
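The 35Mb figure can be reproduced from the numbers above: 942,413 words, each costing a 16-byte array header plus 2 bytes per character at an average of 10.74 characters, which comes to about 35.3 million bytes. The sketch below just spells out that arithmetic. WordFootprint is a hypothetical name, and the estimate ignores alignment padding and any objects that reference the arrays.

```java
// Hypothetical helper that reproduces the back-of-the-envelope estimate.
class WordFootprint {

    /**
     * Estimated bytes for wordCount char[] words, assuming a fixed header
     * per array and a fixed size per character (no alignment padding).
     */
    static double estimateBytes(long wordCount, double avgChars,
                                int headerBytes, int bytesPerChar) {
        return wordCount * (headerBytes + bytesPerChar * avgChars);
    }

    public static void main(String[] args) {
        // Figures quoted for rt.jar: 942,413 words averaging 10.74 characters.
        double bytes = estimateBytes(942_413L, 10.74, 16, 2);
        System.out.println(Math.round(bytes) + " bytes");
    }
}
```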
*** Bug 53070 has been marked as a duplicate of this bug. ***
The manual call to GC feels like a hack. What about building indexes for libraries (zip/jar) in a different manner, where the merge would occur during the indexing process?
Increased the threshold for the GC call to 1000 changes so it will only be triggered by very large index files... we now peak and stay at 72Mb instead of shrinking to 60Mb. The space is not in the representation but in the words generated by the indexer. By eliminating the duplicates we are in the same range as the old implementation.
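A threshold-gated GC request of this kind might look roughly like the sketch below. This is an assumption-laden illustration, not the JDT code: IndexSaver, its fields, and the strict "greater than" comparison are all hypothetical, and the GC request is passed in as a Runnable only so the behaviour can be observed. Note that System.gc() is a hint to the VM, not a guarantee that a collection runs.

```java
// Hypothetical sketch; names and structure are illustrative, not JDT's.
class IndexSaver {

    private final int gcThreshold;     // e.g. 1000 changes
    private final Runnable gcRequest;  // e.g. System::gc

    IndexSaver(int gcThreshold, Runnable gcRequest) {
        this.gcThreshold = gcThreshold;
        this.gcRequest = gcRequest;
    }

    /**
     * Stands in for saving an index with the given number of changes.
     * Returns true if a GC was requested afterwards.
     */
    boolean save(int changeCount) {
        // ... writing the index to disk would happen here ...
        if (changeCount > gcThreshold) {  // assumption: strictly greater than
            gcRequest.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        IndexSaver saver = new IndexSaver(1000, System::gc);
        System.out.println(saver.save(50));    // small index: no GC request
        System.out.println(saver.save(9189));  // very large index: GC requested
    }
}
```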
Verified for 3.0 using build I200403240800. Opened an M7 workspace, ran open type, noticed full re-indexing, modified some files. No OutOfMemory error encountered.