281078 – Performance issue with org.eclipse.dltk.core.search.matching.MatchLocator

Bug 281078 - Performance issue with org.eclipse.dltk.core.search.matching.MatchLocator

Summary: Performance issue with org.eclipse.dltk.core.search.matching.MatchLocator

Status:	NEW

Alias:	None

Product:	DLTK
Classification:	Technology
Component:	Common (show other bugs)
Version:	1.0
Hardware:	All All

Importance:	P3 normal (vote)
Target Milestone:	---
Assignee:	dltk.common-inbox
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2009-06-22 09:42 EDT by Zviki Cohen
Modified:	2009-06-23 10:30 EDT (History)
CC List:	2 users (show)

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Zviki Cohen

2009-06-22 09:42:16 EDT

The org.eclipse.dltk.core.search.matching.MatchLocator has a resolveDuplicates which is used to resolve duplicate source reference elements when resolving a node from the AST to a model element. This method resolves duplication with between integers by using a hashset. This is extremely expensive. After profiling the code, I came to the conclusion that this has dramatic affect on the code performance. 

In the example below, I replaced the hashset with an array of booleans. True at position i indicates that the index i was already taken.

1. Declaration -

Replace: 
	private HashSet handles;

with:
	private static final int SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS = 30;
	private boolean[] sourceRefElementCache = new boolean[SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS];

	
2. Initialization - 
	in method reportMatching(ModuleDeclaration unit)
	
Replace:
	 this.handles = new HashSet();

with:
	clearSourceRefElementCache();
	
Add the method:
	private void clearSourceRefElementCache() {
		for (int i = 0; i < sourceRefElementCache.length; i++) {
			sourceRefElementCache[i] = false;
		}
	}
	
Remove the line:
	this.handles = null;
	
3. Resolution - 

The method resolveDuplicates(IMember handle) should look like this:
	protected void resolveDuplicates(IMember handle) {
		if (handle instanceof SourceRefElement) {
			try {
				SourceRefElement element = ((SourceRefElement) handle);
				while (sourceRefElementCache[element.occurrenceCount]) {
					element.occurrenceCount++;
				}
				sourceRefElementCache[element.occurrenceCount] = true;
			} catch (IndexOutOfBoundsException e) {
				resizeSourceRefElementCache();
				resolveDuplicates(handle);
			}
		}
	}

Add the following method:
	private void resizeSourceRefElementCache() {
		boolean newArray[] = new boolean[sourceRefElementCache.length
				+ SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS];
		for (int i = 0; i < sourceRefElementCache.length; i++) {
			newArray[i] = sourceRefElementCache[i];
		}
		for (int i = sourceRefElementCache.length; i < newArray.length; i++) {
			newArray[i] = false;
		}
		sourceRefElementCache = newArray;
	}

Comment 1 Michael Spector

2009-06-22 11:23:48 EDT

May be I miss something, but I think this solution is incorrect. Handles hash set is used for incrementing occurrence counts of elements that have multiple matches. I don't quite understand how can you check that element A existed previously by checking it's occurrence count in the boolean array?

Comment 2 Alex Panchenko

2009-06-22 11:37:15 EDT

I think contains() test is not that expensive, especially comparing to source parsing time. 

Could you please provide some numbers, to illustrate how this code affects overall performance?

Thank you,
Alex

Comment 3 Zviki Cohen

2009-06-22 11:41:03 EDT

@Michael, 
I agree that it is not the same, but I believe it will work. The numbering or occurrences will not be sequential - but does it really matter?

If it does, you can still greatly improve the performance by keeping a hashtable from the sourceRefElement to the max occurrence. 
As a key to this table you may use the toString after temporarily setting the occurrence count to 0.

The set lookup of the original function is a very very expensive operation - it's a full collection scan with full object comparison for each item (the equals is a complex method for these objects).
This is being done in a loop several times until we get the right occurrence count.

Comment 4 Michael Spector

2009-06-22 11:53:33 EDT

(In reply to comment #3)

Suppose we have 3 matches of class A and 1 match of function foo().
When resolving duplicates (in order: A, foo) using your method we'll get final numbers:

3 occurrences of A
4 occurrences of foo

Is this correct?

Comment 5 Michael Spector

2009-06-22 11:58:32 EDT

(In reply to comment #3)

In addition, handle.contains() checks ModelElement.hashCode() first, then it invokes equals() if needed. What is a chance that hash code will be the same for different elements?

Comment 6 Zviki Cohen

2009-06-22 12:00:42 EDT

As far as I can tell, yes. Again, I didn't find any problem with that in the code, but I can agree that this is not the functionality you had before.

You can still go with my second suggestion which will still be much much better than what you have today.

Comment 7 Zviki Cohen

2009-06-23 08:47:19 EDT

Hi guys,

Is there a fix for this coming for 2.1 - this is really a major issue. 
I can send you a profiler output to prove my point (it will take me time to produce it, but I can do it).

Thanks,
Zviki

Comment 8 Alex Panchenko

2009-06-23 10:28:30 EDT

Hi Zviki,

I do want to see the profiling results.

Thank you,
Alex

Comment 9 Alex Panchenko

2009-06-23 10:30:08 EDT

(removed [AST] prefix from summary, since this issue is about search + structural model)