Bug 53242 - Consistent Out of Memory problems indexing (with multiple Java libraries)
Summary: Consistent Out of Memory problems indexing (with multiple Java libraries)
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.0
Hardware: PC Windows 2000
Importance: P2 critical
Target Milestone: 3.0 M8
Assignee: Kent Johnson CLA
Duplicates: 53070
Reported: 2004-02-26 18:23 EST by Darin Swanson CLA
Modified: 2018-01-19 11:14 EST
CC List: 3 users

Description Darin Swanson CLA 2004-02-26 18:23:54 EST
I20040226

After moving to I20040226, I can no longer use the workspace I was developing 
with under I20040219.
My workspace happens to have projects that use both JDK 1.4 and JDK 1.5.

John W and I have been looking at it and there are a ton of SimpleWordSets 
allocated and staying in memory (82000ish). As these are being allocated from 
AddJarFileToIndex.execute, the OutOfMemoryError occurs.

It would appear that the MemoryIndex is not getting totally cleaned up?

This is in the log:
!SESSION Feb 26, 2004 15:12:22.560 ---------------------------------------------
java.version=1.4.1-rc
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=win32, ARCH=x86, WS=win32, NL=en_US
Command-line arguments: -data c:\1119 -showlocation
!ENTRY org.eclipse.jdt.core 4 4 Feb 26, 2004 15:12:22.560
!MESSAGE Background Indexer Crash Recovery
!STACK 0
java.lang.OutOfMemoryError
!ENTRY org.eclipse.jdt.core 4 4 Feb 26, 2004 15:12:25.504
!MESSAGE Background Indexer Crash Recovery
!STACK 0
java.lang.OutOfMemoryError
Comment 1 Kent Johnson CLA 2004-02-27 10:51:11 EST
Darin: what VM args do you start your workspace with?
Comment 2 Darin Swanson CLA 2004-02-27 11:43:21 EST
-Xmx300M
Comment 3 Kent Johnson CLA 2004-02-27 11:56:13 EST
AddJarFileToIndex saves the index once it's finished indexing all the .class 
files. This replaces the MemoryIndex with a new one... the old one should be 
GC'ed.

All of your jar files are indexed one after the other so there should be only 1 
reference to a 'full' MemoryIndex.

Are you getting any walkbacks in the log? Are the save index calls failing?

How many projects do you have using 1.4? How many using 1.5?
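
The save-then-replace behaviour described in this comment can be sketched roughly as follows. This is a minimal illustration with hypothetical names (IndexSketch, InMemoryWords), not the JDT implementation; writing to disk is stood in for by a counter.

```java
import java.util.ArrayList;
import java.util.List;

class IndexSketch {
    // Hypothetical stand-in for the in-memory index; not the JDT MemoryIndex class.
    static final class InMemoryWords {
        final List<String> words = new ArrayList<>();
    }

    private InMemoryWords memory = new InMemoryWords();
    private int savedWordCount = 0;

    void addWord(String word) {
        memory.words.add(word);
    }

    void save() {
        // Stand-in for writing the in-memory words out to the disk index.
        savedWordCount += memory.words.size();
        // Replace the in-memory index; the old instance is no longer
        // referenced from here, so it becomes eligible for collection.
        memory = new InMemoryWords();
    }

    int pendingWords() {
        return memory.words.size();
    }

    int savedWords() {
        return savedWordCount;
    }
}
```

If anything else still holds a reference to the old in-memory instance, it stays reachable, which is the kind of leak the questions above are probing for.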
Comment 4 Darin Swanson CLA 2004-02-27 12:01:17 EST
All that was in the log I provided to you.
There was no indication that the moving to the DiskIndex was failing.

I have 17 source projects using 1.4.2
I have 1 source project using 1.5.

Watching with OptimizeIt, we did not see any evidence of the SimpleWordSets 
getting GC'd. The number just kept growing until OutOfMemory.
Comment 5 Kent Johnson CLA 2004-02-27 13:47:49 EST
Ok I have a full Eclipse source workspace (on 1.4.1) in which I added a new 
project on 1.5.0. I deleted all the index files in 
<workspace>\.metadata\.plugins\org.eclipse.jdt.core then restarted.

These are the numbers I am getting running I20040226 with -Xmx300M on a 
jdk1.4.1 VM:

memory used according to the TaskManager: less than 140Mb

time to reindex, save to disk & run AllTypesCache: 1 minute 40 seconds

Total disk space is 20.1Mb:
6 large index files (512K, 735K, 917K, 1.5Mb, 4.1Mb, 6.0Mb)
26 index files between 100K and 350K
~75 smaller index files that are less than 100K

Can you duplicate this & let me know your numbers, please?
Comment 6 Darin Swanson CLA 2004-02-27 13:59:30 EST
Will do test case in the next 2 hours
Comment 7 Darin Swanson CLA 2004-02-27 15:18:36 EST
I have the debug and Ant source projects (17 projects), and one test project 
set to use 1.5.
I deleted the index files.
The task manager never indicates more than 100Mb, but I get two Out of Memory 
exceptions once the index files start showing up again.

I end up with 21 index files. The first OOM occurs right after a 
<number>.index.tmp file is created. 
I get only one large index file (4.2Mb).
Comment 8 Kent Johnson CLA 2004-02-27 15:38:55 EST
And you don't have any disk space problems?

This doesn't make any sense... if the OS thinks you're using < 100Mb of 300Mb 
allowed, why should you get OOMs?

I was able to get OOMs if I ran with default VM args (no -Xmx parameter at 
all). The OOMs started showing up at 96Mb.

Can you double check your VM args please.
Comment 9 Darin Swanson CLA 2004-02-27 15:40:54 EST
So I am a bonehead...I was incorrectly specifying the vmargs so I was just 
using the default heap size for this Eclipse session. Sorry.
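
One way to confirm which heap limit a session is really running with is to ask the VM directly. The snippet below is a small sketch using the standard Runtime API; the class name HeapLimitCheck is made up for illustration, and the thread does not say how the arguments were mis-specified.

```java
class HeapLimitCheck {
    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx300M in effect this should report a value close to 300;
        // a much smaller number suggests the option never reached the VM.
        System.out.println("Max heap: " + maxHeapMb() + " Mb");
    }
}
```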
Comment 10 Kent Johnson CLA 2004-02-27 15:44:45 EST
No problem.
Comment 11 John Wiegand CLA 2004-02-27 19:37:33 EST
I ran a related test on this one: new workspace, create a project named P1.
I watched the Windows memory usage.
In M7, the memory usage went from 30->60 Mbytes
In 226, the memory usage went from 30->90 Mbytes

My (unproven) claim is that this delta (30, 60 Mbytes respectively) is the 
amount of heap required to compute the index for rt.jar (jdk 1.4).

60 Mbytes seems like too much memory to require.
Comment 12 Kent Johnson CLA 2004-03-01 12:39:38 EST
Yes I'm seeing 35 to 93Mb.

I'll look into it today.
Comment 13 Kent Johnson CLA 2004-03-01 16:03:42 EST
So to launch an empty workspace on the Java perspective, we're at ~34Mb.

Creating the first project & running the indexer without keeping anything (i.e. 
with the memoryIndex.addIndexEntry call commented out) increases it to ~44Mb... 
the cost to read every .class file & extract the info.

Keeping it in memory but never saving to disk increases it to ~81Mb (~74Mb for 
rt.jar, which is first).

Then to write each index out to disk, we jump to ~89Mb. The saved files on disk 
take 4.73Mb.

Once the AllTypesCache is finished, we're at ~93Mb.

Even though we index each jar one after the other & free the space, the GC 
never seems to reuse the space.

So I need to look into the 44Mb to 74/81Mb jump. The MemoryIndex representation 
is too EXPENSIVE!
Comment 14 Kent Johnson CLA 2004-03-01 16:18:44 EST
Forcing a GC at the end of the Index.save() settles the memory used to ~60Mb, 
but it still peaks at ~82Mb.

So we've got back the easiest 10Mb.
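
A rough sketch of how the effect of a GC request after a save could be measured, assuming only the standard Runtime API. The class and method names are hypothetical, and the save itself is passed in as a Runnable rather than being the real Index.save().

```java
class GcAfterSave {
    /** Heap currently in use, in bytes, as reported by the VM. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Runs the save action, then requests a collection.
     * Returns {used bytes before the request, used bytes after it}.
     */
    static long[] saveThenCollect(Runnable saveAction) {
        saveAction.run();
        long before = usedBytes();
        System.gc(); // a request only; the VM is free to ignore it
        long after = usedBytes();
        return new long[] { before, after };
    }
}
```

The numbers from this kind of probe describe the Java heap only, so they will not match the process size shown by the Task Manager exactly.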
Comment 15 Kent Johnson CLA 2004-03-01 18:12:13 EST
For rt.jar, we're looking at storing ~1 million words in the index spread 
across 8 categories.

There are 85,000 unique words... so roughly every word is extracted 11 times 
from the 9189 files.

Interning the words reduces the peak memory use from ~82Mb to ~71Mb (the idle 
memory use is still ~60Mb). The additional cost of interning the words for 
rt.jar is 0.5 seconds (approximately 10%).
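
Interning here means keeping one canonical char[] per distinct word, so the ~11 repeated extractions of each word share a single array. A minimal sketch, assuming a String-keyed HashMap for simplicity (WordInterner is a hypothetical name, and the actual change may well use a char[]-aware table instead):

```java
import java.util.HashMap;
import java.util.Map;

class WordInterner {
    // Keyed by String because char[] only has identity-based equals/hashCode.
    private final Map<String, char[]> canonical = new HashMap<>();

    /** Returns the canonical array for this word, registering it if new. */
    char[] intern(char[] word) {
        String key = new String(word);
        char[] existing = canonical.get(key);
        if (existing != null) {
            return existing; // duplicate: caller can drop its own copy
        }
        canonical.put(key, word);
        return word;
    }

    int uniqueCount() {
        return canonical.size();
    }
}
```

The String keys add their own overhead, so this shows the idea rather than a memory-optimal table.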
Comment 16 Kent Johnson CLA 2004-03-01 18:14:10 EST
Reopening to track the changes.
Comment 17 Kent Johnson CLA 2004-03-02 11:36:55 EST
So for rt.jar, we are extracting & storing 942,413 words with an average size 
of 10.74 characters.

Headers on char[] are 16 bytes & characters are 2 bytes each -> 35Mb!
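
The arithmetic behind that figure, written out as a small sketch (the class name is hypothetical; the word count, average length, header size and character size are the ones quoted above):

```java
class WordMemoryEstimate {
    /** Estimated bytes for wordCount char[] arrays of avgChars characters each. */
    static long estimateBytes(long wordCount, double avgChars,
                              int headerBytes, int bytesPerChar) {
        return Math.round(wordCount * (headerBytes + bytesPerChar * avgChars));
    }

    public static void main(String[] args) {
        // 942,413 words * (16 + 2 * 10.74) bytes each
        long bytes = estimateBytes(942_413L, 10.74, 16, 2);
        System.out.println(bytes + " bytes");
    }
}
```

Each word costs about 37.5 bytes on average, which over 942,413 words comes to roughly 35.3 million bytes.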

So the bottom line is indexing every .class file in rt.jar can allocate over 
40Mb (reading & extracting) before we ever start remembering the info.

I also just ran M7 to check the numbers, it takes 68Mb on my machine (after 
starting at 33Mb)... so with the 2 changes we will now peak a little higher but 
settle down at 60Mb after rt.jar is saved.
Comment 18 John Arthorne CLA 2004-03-02 12:09:45 EST
*** Bug 53070 has been marked as a duplicate of this bug. ***
Comment 19 Philipe Mulet CLA 2004-03-02 12:12:47 EST
The manual call to GC feels like a hack.
What about building indexes for libraries (zip/jar) in a different manner, 
where the merge would occur during the indexing process?
Comment 20 Kent Johnson CLA 2004-03-02 13:28:24 EST
Increased the threshold for the GC call to 1000 changes so it will only be 
triggered by very large index files... we now peak and stay at 72Mb instead of 
shrinking to 60Mb.

The space is not in the representation but in the words generated by the 
indexer. By eliminating the duplicates we are in the same range as the old 
implementation.
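
The thresholded GC call might look something like the sketch below. Only the value 1000 comes from this comment; the names, what exactly counts as a "change", and whether the comparison is strict are assumptions.

```java
class ThresholdedGc {
    // 1000 is the threshold quoted above; the unit ("changes") and the
    // strict comparison are assumptions of this sketch.
    static final int GC_THRESHOLD = 1000;

    static boolean shouldRequestGc(int changeCount) {
        return changeCount > GC_THRESHOLD;
    }

    /** Call after a save; returns true if a collection was requested. */
    static boolean afterSave(int changeCount) {
        if (!shouldRequestGc(changeCount)) {
            return false; // small index: skip the GC request entirely
        }
        System.gc(); // only a hint to the VM
        return true;
    }
}
```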
Comment 21 Frederic Fusier CLA 2004-03-24 13:26:21 EST
Verified for 3.0 using build I200403240800.
Opened an M7 workspace, ran Open Type, noticed full re-indexing, modified some files.
No OutOfMemory error encountered.