Bug 168354 - [indexing] Java Type Indicator eats CPU time
Summary: [indexing] Java Type Indicator eats CPU time
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: UI (show other bugs)
Version: 3.3   Edit
Hardware: PC All
: P3 major (vote)
Target Milestone: 3.3 M6   Edit
Assignee: JDT-UI-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on: 138309
Blocks:
  Show dependency tree
 
Reported: 2006-12-17 09:44 EST by Michael Woski CLA
Modified: 2007-03-20 11:44 EDT (History)
8 users (show)

See Also:


Attachments
snapshot 1 (112.25 KB, text/html)
2006-12-20 06:45 EST, Martin Aeschlimann CLA
no flags Details
hot spots in 3.3 M4, with classfiles (12.82 KB, text/html)
2006-12-20 06:46 EST, Martin Aeschlimann CLA
no flags Details
hot spots in 3.3 M4, with source files (72.72 KB, text/html)
2006-12-20 06:47 EST, Martin Aeschlimann CLA
no flags Details
hot spots in 3.2.2, with class files (8.97 KB, text/html)
2006-12-20 06:47 EST, Martin Aeschlimann CLA
no flags Details
Performance test results comparing different ways to get flags (15.13 KB, text/plain)
2007-03-15 05:15 EDT, Frederic Fusier CLA
no flags Details
Patch to apply on InterfaceIndicatorLabelDecorator (2.79 KB, patch)
2007-03-15 06:15 EDT, Frederic Fusier CLA
no flags Details | Diff
patch (2.50 KB, patch)
2007-03-15 10:11 EDT, Martin Aeschlimann CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Woski CLA 2006-12-17 09:44:55 EST
Build ID: I20061214-1445

Steps To Reproduce:
1. enter a big package in a library or project
2. take a break
3.


More information:
basically, the system thread that calculates the decorators is  on heavy business as soon as the "Java Type Indicator" in the preferences is activated and one chooses to open a package in the Project Explorer. It not only uses the CPU to the brim but also uses tons of memory.
I've started realizing this since 3.3M4.
Comment 1 Martin Aeschlimann CLA 2006-12-20 06:44:13 EST
You are right. Decorations on class files inside JARs is really slow.
It seems that using the search engine to find out the 'type' (class/interface/...) of the public type is not very efficient here.

My setup:
- Fresh workspace (3.2.1, 3.3 M1, M3 and M4) started with JDK 1.5_10
- Turn off 'Java Type Indication' on General > Appearance > Label Decorators
- create a test project, open the JRE container and expand 'java.io' (nothing else expanded)
- Now turn on  'Java Type Indication' on General > Appearance > Label Decorators
and wait for label decorations (for example on 'Closable' an interface decorator will appear)

With 3.2.1 this goes very quickly: There we used IType.getFlags()
With 3.3 this takes very long: There we use the search engine as recommended by Philippe (see InterfaceIndicatorLabelDecorator)

This is not a problem for ICompilationUnits where using the search engine is very fast.

Please let me know if I should use again IType.getFlags() for class files or if this is a problem in the search engine.
Performance traces attached.
Comment 2 Martin Aeschlimann CLA 2006-12-20 06:45:59 EST
Created attachment 55954 [details]
snapshot 1
Comment 3 Martin Aeschlimann CLA 2006-12-20 06:46:47 EST
Created attachment 55955 [details]
hot spots in 3.3 M4, with classfiles
Comment 4 Martin Aeschlimann CLA 2006-12-20 06:47:05 EST
Created attachment 55956 [details]
hot spots in 3.3 M4, with source files
Comment 5 Martin Aeschlimann CLA 2006-12-20 06:47:34 EST
Created attachment 55957 [details]
hot spots in 3.2.2, with class files
Comment 6 Frederic Fusier CLA 2006-12-21 04:55:52 EST
The problem is that index category table is not cached when table has more than 10,000 entries and unfortunately, this is the case for rt.jar (11,772 classes)...
So, each search all type names request opens and reads the entire index file and makes this request slower...

Philippe, we need to talk about this issue to figure out what could be the best strategy in this case...
Comment 7 Frederic Fusier CLA 2007-01-12 06:38:46 EST
I've talked with Philippe about this issue... It seems we need to increase the limit of entries for cached category table.

I've made some measure on rt.jar of 1.4.2, 5.0 and 6.0 Sun VMs and here are the results I got:
  - 1.4.2: 'typeDecl' table has 8559 keys, estimated size = 1001540 bytes
  - 5.0:   'typeDecl' table has 11772 keys, estimated size = 1449332 bytes
  - 6.0:   'typeDecl' table has 14572 keys, estimated size = 1834954 bytes

So, it sounds like the limit needs to be pushed to at least 15,000 but perhaps more (20,000?) in prevision of next JDK versions...

As the size of the cached table is near multiplied by 2, we need to optimize table memory size to avoid memory footprint increasing. Bug 138309 already proposes some ways for this optimization.

I think we need to find some others, typically try to reduce the 'typeDecl' table by interning package names the same way it is done in TypeDeclarationPattern
Comment 8 Frederic Fusier CLA 2007-01-24 13:54:17 EST
See patch attached to bug 138309
Comment 9 Philipe Mulet CLA 2007-01-25 09:08:06 EST
Candidating the cache size increase for 3.2.2. If the cache is defeated, performances are terrible.
Comment 10 Frederic Fusier CLA 2007-01-25 11:08:15 EST
Released for 3.3 M5 in HEAD stream.
Leave it opened as we'll backport this fix to R3_2_maintenance stream.
Comment 11 Frederic Fusier CLA 2007-01-25 11:13:42 EST
Released for 3.2.2 in R3_2_maintenance stream.
Comment 12 Frederic Fusier CLA 2007-01-25 11:20:05 EST
(In reply to comment #11)
> Released for 3.2.2 in R3_2_maintenance stream.
> 
Note that, of course, this is not the patch attached to bug 138309 which has been released in R3_2_maintenance stream but just the change of the threshold from 10,000 to 20,000 to cache category table...

So, the patch applied for this fix in R3_2_maintenance stream was:
### Eclipse Workspace Patch 1.0
#P org.eclipse.jdt.core
Index: search/org/eclipse/jdt/internal/core/index/DiskIndex.java
===================================================================
RCS file: /cvsroot/eclipse/org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/DiskIndex.java,v
retrieving revision 1.49.4.2
diff -u -r1.49.4.2 DiskIndex.java
--- search/org/eclipse/jdt/internal/core/index/DiskIndex.java	4 Aug 2006 14:48:01 -0000	1.49.4.2
+++ search/org/eclipse/jdt/internal/core/index/DiskIndex.java	25 Jan 2007 16:19:26 -0000
@@ -618,7 +618,7 @@
 		this.categoryTables.put(categoryName, categoryTable);
 		// cache the table as long as its not too big
 		// in practise, some tables can be greater than 500K when the contain more than 10K elements
-		this.cachedCategoryName = categoryTable.elementSize < 10000 ? categoryName : null;
+		this.cachedCategoryName = categoryTable.elementSize < 20000 ? categoryName : null;
 	} finally {
 		stream.close();
 	}
Comment 13 Kent Johnson CLA 2007-01-26 13:57:14 EST
Looked at reducing the size of a type declarations table.

For JDK6, there are 15841 documents in the index. 14315 type declarations reference a single file & 256 reference more than 1 (anonymous types).

The category table size is mostly because of:

14571 keys, 670,766 characters (avg. size is 46 chars) -> ~1.5Mb for char[]s

As for the int[] values, it would be possible to save ~160K if we changed the format of the values from mostly int[1] to just an int. BUT this could not be done for reference tables that have very few, if any int[].length == 1, since it would take more space to store their int[]s. Is 160K out of 1.7Mb worth the added complexity of declaration vs. reference tables? I'm not convinced it is.

Clearly the majority of the space is in the keys, which the DiskIndex does not control. Even though we know 20-30 of the 46 characters of each key is the package name, we would lose any savings (to object headers) if we create other objects by splitting the char[] into 2-3 parts.

Unless we can come up with a way to replace the package name with an int in the key, I don't see how we can reduce the size of the table significantly.

We could create an interned name table & map a small number of names (ie. only package names) to ints. Then those names could be replaced with their int value in the keys of the index category tables.
Comment 14 Kent Johnson CLA 2007-01-26 14:14:05 EST
Since TypeDeclarationPattern already has an in-memory:

// want to save space by interning the package names for each match
static PackageNameSet internedPackageNames = new PackageNameSet(1001);

We should change it to a table that maps the name to an int & persist it. Then each key in the TypeDeclarationPattern could substitute the int for the package name.
Comment 15 Frederic Fusier CLA 2007-01-26 14:38:28 EST
(In reply to comment #14)
> Since TypeDeclarationPattern already has an in-memory:
> 
> // want to save space by interning the package names for each match
> static PackageNameSet internedPackageNames = new PackageNameSet(1001);
> 
> We should change it to a table that maps the name to an int & persist it. Then
> each key in the TypeDeclarationPattern could substitute the int for the package
> name.
> 
I already thought about this possibility, but then there will be a problem when this set is rehashed, won't there?
Comment 16 Kent Johnson CLA 2007-01-26 14:48:48 EST
Replace the set with a compound table that maps the package name to an int ID AND also stores the reverse lookup by holding onto a char[][], where each int ID is the index of the package name.

Adding a new package name would grow the char[][] and the added spot would be the int ID used to store the name in the lookup table.

This way its quick to find the ID associated with a package name and also quick to decode the name given its ID.
Comment 17 Maxime Daniel CLA 2007-02-06 03:02:03 EST
Using the following test case, we are twice faster with M5 than with M4, but still three+ times slower with M5 than with 3.2.1. Discussed with Frédéric and decided to reopen for now.

Test case (same as Martin's but with jdk6):
- open Eclipse on a new workspace;
- create a simplistic Java project that uses jdk 6;
- expand jdk/rt.jar/java.io;
- switch 'Java Type Indication' off and on.
Comment 18 Frederic Fusier CLA 2007-02-06 13:18:57 EST
There was also a problem with the read of index file document names while performing the search all type names request. DiskIndex uses a cache for these names (cachedChunks) but this cache is systematically reset at the end of the query (stopQuery() method).

On the scenario given by Maxime (but using a 1.5 VM) I got following numbers to decorate all types:
 - 3.2.1: 1,1 second 
 - 3.3 M5: 4,3 second (2,6 spent in DiskIndex.cacheDocumentNames())
Comment 19 Kent Johnson CLA 2007-02-06 13:26:55 EST
Martin, is there any chance you can call:

searchAllTypeNames(char[][] qualifications, char[][] typeNames, ...)

Do you know all the types from the jar file up front or just 1 at a time?
Comment 20 Frederic Fusier CLA 2007-02-06 14:03:32 EST
Baxk to JDT/UI land as I got new numbers while changing a little bit the searchAllTypeNames request done in InterfaceIndicatorLabelDecorator.

In method getOverlayWithSearchEngine, there's currently one (or sometimes several requests) per type with following parameters:
    engine.searchAllTypeNames(
        null,
        SearchPattern.R_EXACT_MATCH,
        null,
        SearchPattern.R_EXACT_MATCH,
        IJavaSearchConstants.TYPE,
        scope,
        requestor,
        IJavaSearchConstants.WAIT_UNTIL_READY_TO_SEARCH,
        null);

This is definitely inefficient has there's neither package nor type name given to the request. The scope is reduced to the java element but it only reduce the search to the jar for class file.... That explains why all the document names of rt.jar are cached and consume so much times.

If I change the request as follow:
    String typeName = element.getElementName();
    typeName = typeName.substring(0, typeName.indexOf('.'));
    engine.searchAllTypeNames(
        null,
        SearchPattern.R_EXACT_MATCH,
        typeName.toCharArray(),
        SearchPattern.R_EXACT_MATCH,
        IJavaSearchConstants.TYPE,
        scope,
        requestor,
        IJavaSearchConstants.WAIT_UNTIL_READY_TO_SEARCH,
        null);

then the time spent in getOverlayWithSearchEngine to decorate all types of the java.io package is:
 - cold (ie, just after eclipse has started): around 400ms
 - warm (ie. Java Type Indicator set/unset twice): 200-250ms

So, this solution sounds to be enough to definitely fix the problem as these numbers are at least two times faster than 3.2.1 :-)

Comment 21 Dani Megert CLA 2007-02-07 09:42:37 EST
The suggested optimization goes into the right direction but
- it will not handle types like 'EntrySet' that are in 'HashMap$EntrySet.class'
- JDT UI is the wrong area for this as it already tells JDT Core to only
  search in a CU or class file i.e. this performance optimization has to go
  into JDT Core so that all other clients can profit. From the Javadoc of the
  search engine it is not obvious that giving the type and package name is faster
  than providing the exact Java element in the scope

Frédéric and I agreed to do the following:
- JDT Core will do this optimization but will need more time for a clean and
  well tested fix. Hence this will be done for M6.
- for M5 JDT UI will put in a fix along comment 20 for class files but with 
  better handling for inner class types

Side Note: the fix that was put into 3.2.2 should probably go into its own bug report and this one be retargeted for 3.3 M6.
Comment 22 Dani Megert CLA 2007-02-07 09:51:08 EST
Committed workaround to HEAD.
Will be in one of the next M5 candidate builds.
Comment 23 Frederic Fusier CLA 2007-02-07 11:25:49 EST
As suggested, I've opened separate bug 173279 for the category table entries number limit increase and set current target t 3.3 M6.
I've changed the entry in R3_2_maintenance and HEAD buildnotes to reflect bug change...
Comment 24 Frederic Fusier CLA 2007-03-15 05:15:39 EDT
Created attachment 60909 [details]
Performance test results comparing different ways to get flags

I wrote a test corresponding to the initial test case and took some measures using different ways to get the flags on each class file of the java.io package of rt.jar (1.4.2).

Full results of these tests are in this attached file.
Here's the summary:

Search all type names request:			Time	Heap
=============================
1) no package, no type name:			4.26s	9.46M
2) no package, no type name, case sensitive:	4.16s	12.05M
3) package name, no type name:			1.82s	9.64M
4) package name, no type name, case sensitive:	1.01s	9.6M
5) no package, type name:			0.521s	1.89M
6) no package, type name, case sensitive:	0.438s	1.94M
7) package, type name:				0.258s	4.67M
8) package, type name, case sensitive:		0.130s	4.67M

JavaModel getFlags():
====================
1) No class files opened:			0.035s	1.88M
1) All class files opened:			0.0068s	715
Comment 25 Frederic Fusier CLA 2007-03-15 05:45:49 EDT
The two results to compare in comment 24 numbers are the test 8) of search all type names request and test 1) of java model.

As even the best search all type names request is still 4 times slower and consumes around 3 times more memory than the getFlags() when the java element is not opened, I do not think we can apply this change in JDT/Core directly in the getFlags() method as suggested during the JDT/Call 2 weeks ago.

The other solution would be to create a new API method equivalent to getFlags() and which would not populate the model (getModifiers() for example), but sounds like a little bit late 2 days before the API freeze...

My conclusion is that the fix must be done in JDT/UI land for now and leave the getFlags() method working as it does right now (i.e. open the java element to get the modifiers).

Note: searchAllTypeNames request has been already optimized (see releng performance tests results on last I20070313-1051 build: http://fullmoon.ottawa.ibm.com/downloads/drops/I20070313-1051/performance/performance.php)
and I didn't find any easy other way to speed-up this request...
Comment 26 Frederic Fusier CLA 2007-03-15 06:15:12 EDT
Created attachment 60913 [details]
Patch to apply on InterfaceIndicatorLabelDecorator

With this patch CPU does no longer heat and I can see decorator appear in a reasonable amount of time (around 2s on my box)...
Comment 27 Martin Aeschlimann CLA 2007-03-15 10:11:12 EDT
Created attachment 60941 [details]
patch

I suggest to use this patch. JavaElementUtil.getMainType(unit) is not something to use (very inefficient). Please comment so I can release.
Comment 28 Frederic Fusier CLA 2007-03-15 10:23:48 EDT
(In reply to comment #27)
> Created an attachment (id=60941) [details]
> patch
> 
> I suggest to use this patch. JavaElementUtil.getMainType(unit) is not something
> to use (very inefficient). Please comment so I can release.
> 
Fine by me :-)
Comment 29 Martin Aeschlimann CLA 2007-03-16 05:17:58 EDT
patch released > 20070316
Comment 30 Frederic Fusier CLA 2007-03-16 05:21:13 EDT
Move to JDT/UI as the fix was released in JDT/UI code...
Comment 31 Martin Aeschlimann CLA 2007-03-16 05:23:54 EDT
mark as fixed
Comment 32 Dani Megert CLA 2007-03-20 11:44:14 EDT
Verified in I20070320-0010.