65831 – search for all types slow/memory intensive [search]

Status:	VERIFIED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core
Version:	3.0
Hardware:	PC Windows XP

Importance:	P3 normal
Target Milestone:	3.0 RC3
Assignee:	Jerome Lanneluc
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2004-06-04 17:56 EDT by John Wiegand
Modified:	2004-06-18 09:20 EDT
CC List:	1 user

See Also:

Attachments
The difference between Heap before and Heap after search for all types (191.13 KB, image/jpeg) 2004-06-05 18:29 EDT, Dirk Baeumer
Screen shot showing the allocated objects (100.41 KB, image/jpeg) 2004-06-05 18:42 EDT, Dirk Baeumer
Proposed patch (12.47 KB, patch) 2004-06-15 11:39 EDT, Jerome Lanneluc

Description John Wiegand

2004-06-04 17:56:57 EDT

20040604

New workspace/import all projects binary/linked
1. Search for all types named *.
This results in 22.8k types.
The operation takes 5 minutes and my task manager shows an increase of > 200M 
during the operation (from 60M to 290M)

Both of these numbers seem high.

2. Pressing "group by type" makes Eclipse non-responsive for 10 minutes.
(This is related to bug 63247 (treeviewer slow with many items); annotating 
that PR to reflect this case.)

Comment 1 Dirk Baeumer

2004-06-05 18:28:13 EDT

Tested the following scenario:

- all 3.0 plug-ins in linked binary format
- 22812 types
- opened Eclipse and forced all indexes to be built
- closed/reopened Eclipse
- opened search result view
- opened search dialog
- took memory snapshot
- searched for all types (*, declaration)
- memory went up ~200 MB
- took memory snapshot

The actual diff between the two snapshots is 28MB (see attached screen shot). 
So it seems that we are generating a lot of intermediate garbage.

Comment 2 Dirk Baeumer

2004-06-05 18:29:38 EDT

Created attachment 11631
The difference between Heap before and Heap after search for all types

Comment 3 Dirk Baeumer

2004-06-05 18:41:27 EDT

I created another workspace containing the following projects as binary:

org.apache.ant
org.eclipse.core.resources
org.eclipse.core.resources.win32
org.eclipse.core.runtime
org.eclipse.core.runtime.compatibility
org.eclipse.jdt.core
org.eclipse.osgi
org.eclipse.team.core
org.eclipse.text
org.eclipse.update.configurator

Searching for all types in these projects, excluding the runtime jars, produces 
a result of 2078 types. While computing the result, temporary objects with a 
total size of ~400MB are created. 

I will upload the memory dump to our ftp server on Monday. Thomas please have 
a look at the dump. 

Philippe, can you please look at the dump as well. It seems that a lot of 
objects are allocated in JDT/Core.

Comment 4 Dirk Baeumer

2004-06-05 18:42:25 EDT

Created attachment 11632
Screen shot showing the allocated objects

Comment 5 Thomas Mäder

2004-06-07 11:46:34 EDT

I've looked at the allocation statistics in the 2000 type snapshot. 95% of all
allocations happen inside JavaSearchQuery.run(...), which pretty much just calls
the search engine. However, only 1% of the allocations occur inside the
SearchRequestor I pass to the search engine (see
NewSearchResultCollector.acceptSearchMatch(...)). If I understand this right,
94% of the allocations happen in the SearchEngine in Core, outside of our
influence. Unless I misunderstand the trace, there's nothing I can do on the
search UI side. 
Moving to JDT-Core.

Comment 6 Philipe Mulet

2004-06-10 07:53:51 EDT

Scanner#getLineEnds() alone is allocating close to 20Mb of transient objects 
(copies of line tables). Reduced that number significantly by reworking clients. 
There were 2 hot spots: JavadocParser creating a copy of these for every single 
Javadoc, and source element requestors (which should have stolen the 
existing copy from the compilation result). Early measurements show the 
transient memory for this drops down to 3Mb.
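
The copy-once idea described above can be illustrated with a minimal sketch. ScannerSketch and ResultSketch are hypothetical stand-ins, not the JDT Core classes, and the copiesMade counter exists only to make the allocation behaviour visible. The assumption is that the line table is copied on every request unless a caller keeps and shares the first copy.

```java
import java.util.Arrays;

// Hypothetical stand-in for a scanner that records line-end offsets.
class ScannerSketch {
    private final int[] lineEnds; // may be over-allocated
    private final int linePtr;    // index of the last valid entry
    int copiesMade = 0;           // illustration only: counts allocations

    ScannerSketch(int[] lineEnds, int linePtr) {
        this.lineEnds = lineEnds;
        this.linePtr = linePtr;
    }

    // Returns a trimmed copy; every call allocates a new array.
    int[] getLineEnds() {
        copiesMade++;
        return Arrays.copyOf(lineEnds, linePtr + 1);
    }
}

// Hypothetical stand-in for a per-unit result that owns one shared copy.
class ResultSketch {
    private final ScannerSketch scanner;
    private int[] cachedLineEnds; // filled on first request, then reused

    ResultSketch(ScannerSketch scanner) {
        this.scanner = scanner;
    }

    // Clients ask the result instead of the scanner, so only one copy exists.
    int[] lineEnds() {
        if (cachedLineEnds == null) {
            cachedLineEnds = scanner.getLineEnds();
        }
        return cachedLineEnds;
    }
}
```

With this shape, a parser that needs the line table once per Javadoc comment would read the shared array rather than trigger a new copy each time.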

Comment 7 Philipe Mulet

2004-06-10 17:35:02 EDT

Rescheduling for RC3. I have a change in progress which should halve the 
source char[] stored in the scanner (making unicode support lazier).

Comment 8 Philipe Mulet

2004-06-10 19:34:19 EDT

On a smaller testcase (linked jdtcore with prereqs), search for '*' type decls 
allocated 1,474 megs of transient memory. With the above scanner 
optimization, it drops to 971 megs.

Comment 9 Philipe Mulet

2004-06-14 06:04:34 EDT

Released scanner changes to HEAD, JCK tests are ok (lots of unicode tests).

Need to further check the search engine behavior on this scenario.
- it should not resolve any type name to find type declarations
- would reducing the number of units processed at once benefit search 
(500->250?).

Note: search is usually more memory intensive than build, since it will go and 
parse source attachments for all binaries (whereas build simply skips binaries). 
So search is always dealing with a larger amount of source to process.

Comment 10 Jerome Lanneluc

2004-06-15 09:56:26 EDT

Several problems remain:
1. The Java model is populated when it is not necessary:
- By using IType#isMember(), #isLocal() and #isAnonymous(),
  SourceMapper#findSourceFileName(IType, IBinaryType) is forcing a new
  ClassFileReader to be created even if we have one in hand: IBinaryType.
  -> Propose to change #findSourceFileName(...) to use IBinaryType#isMember(),
     #isLocal() and #isAnonymous() instead.
- By using IMember#getNameRange(), MatchLocator#reportBinaryMemberDeclaration
  (IResource, IMember, IBinaryType, int) is forcing the IMember to be opened.
  -> Propose to change #reportBinaryMemberDeclaration(...) to use the
     SourceMapper directly to find the name range if the member is not opened.
2. Resolution of possible matches is always requested even if the search
   pattern doesn't need it.
  -> Propose to change MatchLocator#locateMatches(JavaProject,PossibleMatch[], 
     int, int) to skip the resolution and process each possible match if the
     pattern doesn't need resolution.
3. ASTNodes are kept in its MatchingNodeSet after a possible match has been
   processed.
  -> Propose to change MatchLocator#locateMatches(JavaProject,PossibleMatch[], 
     int, int)  to nullify the node set when done with the possible match as
     we do for the source field.

With these 4 changes, I'm able to search for all types in a workspace 
containing all Eclipse SDK plugins without increasing the VM's maximum 
Java heap size.

All JDT Core and JDT UI tests are green.
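
Points 2 and 3 above can be sketched roughly as follows. This is not the attached patch: PossibleMatchSketch and LocatorSketch are hypothetical, simplified stand-ins for the real classes, and the resolution step is reduced to a counter. The sketch only shows the control flow, namely skipping resolution when the pattern does not need it and releasing per-match state as soon as a match has been processed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one candidate compilation unit or class file.
class PossibleMatchSketch {
    final String name;
    Object source;  // stands in for the source contents
    Object nodeSet; // stands in for the set of matching AST nodes

    PossibleMatchSketch(String name) {
        this.name = name;
        this.source = new Object();
        this.nodeSet = new Object();
    }
}

// Hypothetical stand-in for the component that turns candidates into matches.
class LocatorSketch {
    int resolutions = 0; // illustration only: counts expensive resolution steps
    final List<String> reported = new ArrayList<>();

    void locateMatches(List<PossibleMatchSketch> matches, boolean patternNeedsResolution) {
        for (PossibleMatchSketch match : matches) {
            if (patternNeedsResolution) {
                resolutions++; // placeholder for the costly binding resolution
            }
            reported.add(match.name);
            // Drop references once the match is done so the garbage
            // collector can reclaim them before the next match is handled.
            match.source = null;
            match.nodeSet = null;
        }
    }
}
```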

Comment 11 Jerome Lanneluc

2004-06-15 11:04:00 EDT

Reporting progress also slows the whole process a lot. We report progress for 
each possible match. Changing this to report progress for each batch of 500 
possible matches makes the whole search twice as fast.
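
A minimal sketch of batched progress reporting, assuming a simple callback in place of the platform progress monitor. ProgressSink and BatchedProgress are hypothetical names introduced for illustration; the batch size of 500 is the value quoted above.

```java
// Hypothetical callback standing in for the real progress monitor.
interface ProgressSink {
    void worked(int units);
}

// Accumulates completed items and forwards them to the sink in batches.
class BatchedProgress {
    private final ProgressSink sink;
    private final int batchSize;
    private int pending = 0;

    BatchedProgress(ProgressSink sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    // Called once per processed item; reports only when a batch is full.
    void itemDone() {
        pending++;
        if (pending == batchSize) {
            flush();
        }
    }

    // Reports any remainder; call once after the last item.
    void flush() {
        if (pending > 0) {
            sink.worked(pending);
            pending = 0;
        }
    }
}
```

For 1200 possible matches and a batch size of 500, the sink is called three times (500, 500, 200) instead of 1200 times, while the total amount of reported work stays the same.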

Comment 12 Philipe Mulet

2004-06-15 11:08:20 EDT

Jerome: pls attach patch to this defect.

Comment 13 Jerome Lanneluc

2004-06-15 11:39:13 EDT

Created attachment 12160
Proposed patch

Comment 14 Jerome Lanneluc

2004-06-15 12:11:47 EDT

Entered bug 67276 against Platform Search for the slowness in their progress 
monitor.

Comment 15 Jerome Lanneluc

2004-06-15 12:21:42 EDT

Testing with the following scenario:
- workspace with org.eclipse.jdt.core as a linked binary project
- JRE : JDK1.4.2
- group by project in the search view
- search for '*' Type Declarations in Workspace
11 169 matches are found.

With 3.0 RC2:
- memory peak: 403 124 K
- time to find all the matches: 1 minute 50 sec

With 3.0 RC2 + Philippe's changes + attached patch:
- memory peak: 85 240 K
- time to find all the matches: 32 sec
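
To put the numbers above in relative terms, the helper below computes the improvement factors. The Improvement class is a hypothetical name for illustration; the inputs are the figures quoted in this comment, with 1 minute 50 sec taken as 110 seconds.

```java
// Small arithmetic helper for comparing before/after measurements.
class Improvement {
    // How many times smaller the "after" value is.
    static double ratio(double before, double after) {
        return before / after;
    }

    // Reduction expressed as a percentage of the "before" value.
    static double percentReduction(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Memory peak in K: 403 124 before, 85 240 after
        System.out.printf("memory: %.1fx smaller, %.1f%% less%n",
                ratio(403124, 85240), percentReduction(403124, 85240));
        // Search time in seconds: 110 before, 32 after
        System.out.printf("time: %.1fx faster, %.1f%% less%n",
                ratio(110, 32), percentReduction(110, 32));
    }
}
```

The memory peak drops by a factor of roughly 4.7 and the search time by a factor of roughly 3.4.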

Comment 16 Philipe Mulet

2004-06-15 12:27:16 EDT

Impressive numbers.

Comment 17 Philipe Mulet

2004-06-15 12:54:44 EDT

Approved by John and me for RC3.

Comment 18 Jerome Lanneluc

2004-06-15 13:05:52 EDT

Patch + change to batch progress reporting released in HEAD.

Comment 19 David Audel

2004-06-18 09:20:41 EDT

Verified for 3.0RC3 I200406180010