Bug 20844 - Indexing space usage
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 2.0
Hardware: PC Windows 2000
Importance: P3 normal
Target Milestone: 2.1 M5
Assignee: Kent Johnson CLA
Keywords: performance
Reported: 2002-06-21 16:39 EDT by Nick Edgar CLA
Modified: 2003-02-11 10:04 EST

Attachments
Memory diff from OptimizeIt (data.html) (285.62 KB, text/html)
2002-06-21 16:40 EDT, Nick Edgar CLA
redbar.gif (to go with data.html) (842 bytes, image/gif)
2002-06-21 16:42 EDT, Nick Edgar CLA

Description Nick Edgar CLA 2002-06-21 16:39:32 EDT
Build 20020621

- new workspace
- new Java project
- watch VM Size in the Windows Task Manager (this column has to be added; it 
is more accurate than Mem Usage, which shows only the in-memory footprint)
- add all the plugin jars as external jars
- OK
- watch the VM size keep going up to about 170 meg
- doesn't go down when indexing is done
- profiling it shows that (after repeated GCs) 15843 instances of 
org.eclipse.jdt.internal.core.ClassFile are being held onto
- tracing the refs, they are all in the list of children of the corresponding 
JarPackageFragmentInfo, which is in the JavaModelCache
- there are 892 instances of JarPackageFragmentInfo

However, this doesn't account for the large VM size.
See the attached export from OptimizeIt.
The footprint for all new ClassFile, Object[], char[] and String instances 
only totals about 6 meg.
Comment 1 Nick Edgar CLA 2002-06-21 16:40:03 EDT
Created attachment 1553
Memory diff from OptimizeIt (data.html)
Comment 2 Nick Edgar CLA 2002-06-21 16:42:04 EDT
Created attachment 1554
redbar.gif (to go with data.html)
Comment 3 Nick Edgar CLA 2002-06-21 16:44:07 EDT
To get it to reindex, I did the following
- close all perspectives
- open resource perspective
- exit
- delete index files under .metadata\.plugins\org.eclipse.jdt.core
- restart
- open a .java file
Comment 4 Nick Edgar CLA 2002-06-21 16:51:57 EDT
I noticed that while indexing, the count for WordEntry keeps going up into the 
tens of thousands.  It did go down occasionally, but the numbers seem higher 
than expected.  Might be for the larger jars like rt.jar.
It does go down to 0 when done.
Comment 5 Nick Edgar CLA 2002-06-21 16:52:14 EDT
Turns out there's nothing new here.  KH knows about this pattern.
Comment 6 Philipe Mulet CLA 2002-06-24 07:19:10 EDT
Jerome - please verify if 892 is a sound number of jar pkg roots

Not critical, but we should double check how we get 892 jar package fragment 
root infos. Might be a race condition, where multiple identical infos are 
created at once (like the bug we had with NameLookup creation).

Post 2.0 we want to change our locking approach to finer-grained locks. 


Comment 7 Jerome Lanneluc CLA 2002-06-26 10:25:52 EDT
Following Nick's steps and running under OptimizeIt, I see only 227 
JarPackageFragmentInfos which is exactly the number of folders in my 1.3.1 
rt.jar. Other package fragment infos are not cached since the .java file that I 
opened is in a project that does not prereq the other plugin projects.

This shows that indexing doesn't create JarPackageFragmentInfos. The problem 
must be somewhere else.

Nick, if you have more details, please annotate this bug.
Comment 8 Nick Edgar CLA 2002-11-29 12:35:03 EST
Some more recent figures, using build I20021127.

- workspace contains a regular self-hosting setup: all Core and UI plugins in 
source, the rest in binary
- delete any index files under .metadata\.plugins\org.eclipse.jdt.core
- start up workspace under OptimizeIt, in resource perspective with no editors 
open (so as not to activate JDT)
- force GC, check heap: 15M allocated, 6.8M used
- Window / Show View / Other... / Java / Hierarchy (to trigger JDT in a 
minimal way, i.e. no editors)
- it indexes everything
- just before the end of indexing, force GC, check heap: 160M allocated, 48M 
used 
- after indexing, force GC, check heap: 120M allocated, 9.2M used
- repeated GCs don't collapse the heap more (although it clearly collapsed a 
bit after indexing); I'm using the IBM 1.3.1 VM
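
The "allocated" and "used" figures in the steps above came from OptimizeIt. As a rough illustration of the same two numbers, here is a minimal sketch using the standard java.lang.Runtime API. The HeapSnapshot class is hypothetical and is not part of JDT or OptimizeIt, and Runtime.gc() is only a request to the VM, so the readings are approximate.

```java
/** Hypothetical helper: samples allocated vs. used heap after requesting a GC. */
final class HeapSnapshot {
    final long allocatedBytes; // heap currently reserved by the VM
    final long usedBytes;      // part of that heap occupied by objects

    private HeapSnapshot(long allocatedBytes, long usedBytes) {
        this.allocatedBytes = allocatedBytes;
        this.usedBytes = usedBytes;
    }

    /** Asks the VM to collect garbage (a hint only), then reads the heap sizes. */
    static HeapSnapshot afterGc() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        long allocated = rt.totalMemory();
        long used = allocated - rt.freeMemory();
        return new HeapSnapshot(allocated, used);
    }

    @Override
    public String toString() {
        double mb = 1024.0 * 1024.0;
        return String.format("%.1fM allocated, %.1fM used",
                allocatedBytes / mb, usedBytes / mb);
    }
}
```

A large gap between the two values after indexing is the "large, mostly empty heap" symptom: the objects are gone, but the VM has not returned the memory.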

It seems like the indexer hangs on to all IndexedFile and all WordEntry 
objects until the end of indexing, when it does a final merge.  
Just before the end of indexing, OptimizeIt showed 13100 IndexedFile and 
279000 WordEntry instances (after a GC), and many strings and arrays that went 
along with this.  These both went down to 0 after indexing, but there was 
still a 2.4M difference in heap after indexing, although this may be due to 
other areas in JDT.

Indexing should merge more often.  The Index tries to maintain an estimate of 
its footprint and should merge when this reaches 10M, but this is apparently 
not working.  Another heuristic would be to merge after every few thousand 
indexed files (rt.jar would count as ~5600 files, not 1), in addition to 
merging at the end.

While indexing mostly cleans up at the end, it uses a lot of heap while 
indexing, which leaves the IBM VM with a large, mostly empty, heap at the end.
Merging more often would reduce this.
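
The two merge heuristics suggested above (a footprint estimate, or a count of indexed files) can be sketched as follows. This is only an illustration under stated assumptions: BufferedIndex, its thresholds, and the 2-bytes-per-char estimate are hypothetical and do not reflect the real Index implementation, and merge() merely stands in for writing entries to disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory index that merges when either threshold is hit. */
final class BufferedIndex {
    private final long maxFootprintBytes; // e.g. 10M in the discussion above
    private final int maxFiles;           // e.g. a few thousand indexed files
    private final List<String> pendingWords = new ArrayList<>();
    private long estimatedFootprint;
    private int filesSinceMerge;
    private int mergeCount;

    BufferedIndex(long maxFootprintBytes, int maxFiles) {
        this.maxFootprintBytes = maxFootprintBytes;
        this.maxFiles = maxFiles;
    }

    /** Records one indexed file; each entry of a jar would be one call. */
    void addFile(List<String> words) {
        for (String w : words) {
            pendingWords.add(w);
            // Rough estimate only: 2 bytes per char, ignoring object overhead.
            estimatedFootprint += 2L * w.length();
        }
        filesSinceMerge++;
        if (estimatedFootprint >= maxFootprintBytes || filesSinceMerge >= maxFiles) {
            merge();
        }
    }

    /** Stand-in for merging the in-memory entries into the on-disk index. */
    void merge() {
        if (filesSinceMerge == 0) {
            return; // nothing buffered since the last merge
        }
        pendingWords.clear();
        estimatedFootprint = 0;
        filesSinceMerge = 0;
        mergeCount++;
    }

    int mergeCount() { return mergeCount; }
    int pendingWordCount() { return pendingWords.size(); }
}
```

With the file-count trigger in place, peak memory is bounded by the threshold even if the footprint estimate is off, which is the failure mode suspected here.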

I did not see any ClassFile instances when just doing indexing, but they do 
appear when doing other operations like opening an editor (anything touching 
the Java model), which I did do in my original steps above.  I'll file a 
separate PR for this.
Comment 9 Kent Johnson CLA 2002-11-29 15:29:18 EST
I'm not convinced we have a problem with the 10Mb watermark not working... I 
suspect we are holding onto in-memory indexes longer than we need to... I'll 
look into it.
Comment 10 Kent Johnson CLA 2003-01-06 18:01:57 EST
Removed all index consistency checks from JDT Core startup.

Released changes that add save index jobs immediately after rebuildAll jobs to 
free up space. Previously, each index could keep up to 10Mb in memory before 
flushing the information to disk. Index files were only saved when the 
IndexManager was idle.

Now with SaveIndex jobs, rebuildAll jobs, which fork 10-100 small update file 
jobs, are immediately followed by save jobs to free all the space.
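
The effect of queuing a save job right behind each rebuild can be illustrated with a small FIFO queue. This is a sketch only: IndexJobQueue and its methods are hypothetical names, not the real IndexManager or job classes, and each "update file" job simply counts one in-memory entry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical FIFO job queue; not the real IndexManager. */
final class IndexJobQueue {
    private final Queue<Runnable> jobs = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();
    private int unsavedEntries;     // entries held in memory, not yet on disk
    private int peakUnsavedEntries; // high-water mark over the whole run

    /** Queues a rebuild as small update jobs, then a save job right behind them. */
    void requestRebuildAll(String indexName, int fileCount) {
        for (int i = 0; i < fileCount; i++) {
            jobs.add(() -> {
                unsavedEntries++;
                peakUnsavedEntries = Math.max(peakUnsavedEntries, unsavedEntries);
            });
        }
        jobs.add(() -> {
            log.add("save " + indexName + " (" + unsavedEntries + " entries)");
            unsavedEntries = 0; // flushing to disk frees the in-memory entries
        });
    }

    /** Runs every queued job in FIFO order. */
    void runAll() {
        Runnable job;
        while ((job = jobs.poll()) != null) {
            job.run();
        }
    }

    int unsavedEntries() { return unsavedEntries; }
    int peakUnsavedEntries() { return peakUnsavedEntries; }
    List<String> log() { return log; }
}
```

Because each save runs before the next rebuild's update jobs, the peak is bounded by the largest single rebuild instead of the sum of all rebuilds waiting for the queue to go idle.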
Comment 11 David Audel CLA 2003-02-11 10:04:33 EST
Verified.