Bug 281078 - Performance issue with org.eclipse.dltk.core.search.matching.MatchLocator
Summary: Performance issue with org.eclipse.dltk.core.search.matching.MatchLocator
Status: NEW
Alias: None
Product: DLTK
Classification: Technology
Component: Common (show other bugs)
Version: 1.0   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: dltk.common-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-22 09:42 EDT by Zviki Cohen CLA
Modified: 2009-06-23 10:30 EDT (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Zviki Cohen CLA 2009-06-22 09:42:16 EDT
The org.eclipse.dltk.core.search.matching.MatchLocator has a resolveDuplicates which is used to resolve duplicate source reference elements when resolving a node from the AST to a model element. This method resolves duplication with between integers by using a hashset. This is extremely expensive. After profiling the code, I came to the conclusion that this has dramatic affect on the code performance. 

In the example below, I replaced the hashset with an array of booleans. True at position i indicates that the index i was already taken.

1. Declaration -

Replace: 
	private HashSet handles;

with:
	private static final int SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS = 30;
	private boolean[] sourceRefElementCache = new boolean[SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS];

	
2. Initialization - 
	in method reportMatching(ModuleDeclaration unit)
	
Replace:
	 this.handles = new HashSet();

with:
	clearSourceRefElementCache();
	
Add the method:
	private void clearSourceRefElementCache() {
		for (int i = 0; i < sourceRefElementCache.length; i++) {
			sourceRefElementCache[i] = false;
		}
	}
	
Remove the line:
	this.handles = null;
	
3. Resolution - 

The method resolveDuplicates(IMember handle) should look like this:
	protected void resolveDuplicates(IMember handle) {
		if (handle instanceof SourceRefElement) {
			try {
				SourceRefElement element = ((SourceRefElement) handle);
				while (sourceRefElementCache[element.occurrenceCount]) {
					element.occurrenceCount++;
				}
				sourceRefElementCache[element.occurrenceCount] = true;
			} catch (IndexOutOfBoundsException e) {
				resizeSourceRefElementCache();
				resolveDuplicates(handle);
			}
		}
	}

Add the following method:
	private void resizeSourceRefElementCache() {
		boolean newArray[] = new boolean[sourceRefElementCache.length
				+ SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS];
		for (int i = 0; i < sourceRefElementCache.length; i++) {
			newArray[i] = sourceRefElementCache[i];
		}
		for (int i = sourceRefElementCache.length; i < newArray.length; i++) {
			newArray[i] = false;
		}
		sourceRefElementCache = newArray;
	}
Comment 1 Michael Spector CLA 2009-06-22 11:23:48 EDT
May be I miss something, but I think this solution is incorrect. Handles hash set is used for incrementing occurrence counts of elements that have multiple matches. I don't quite understand how can you check that element A existed previously by checking it's occurrence count in the boolean array?
Comment 2 Alex Panchenko CLA 2009-06-22 11:37:15 EDT
I think contains() test is not that expensive, especially comparing to source parsing time. 

Could you please provide some numbers, to illustrate how this code affects overall performance?

Thank you,
Alex
Comment 3 Zviki Cohen CLA 2009-06-22 11:41:03 EDT
@Michael, 
I agree that it is not the same, but I believe it will work. The numbering or occurrences will not be sequential - but does it really matter?

If it does, you can still greatly improve the performance by keeping a hashtable from the sourceRefElement to the max occurrence. 
As a key to this table you may use the toString after temporarily setting the occurrence count to 0.

The set lookup of the original function is a very very expensive operation - it's a full collection scan with full object comparison for each item (the equals is a complex method for these objects).
This is being done in a loop several times until we get the right occurrence count.
Comment 4 Michael Spector CLA 2009-06-22 11:53:33 EDT
(In reply to comment #3)

Suppose we have 3 matches of class A and 1 match of function foo().
When resolving duplicates (in order: A, foo) using your method we'll get final numbers:

3 occurrences of A
4 occurrences of foo

Is this correct?

Comment 5 Michael Spector CLA 2009-06-22 11:58:32 EDT
(In reply to comment #3)

In addition, handle.contains() checks ModelElement.hashCode() first, then it invokes equals() if needed. What is a chance that hash code will be the same for different elements?

Comment 6 Zviki Cohen CLA 2009-06-22 12:00:42 EDT
As far as I can tell, yes. Again, I didn't find any problem with that in the code, but I can agree that this is not the functionality you had before.

You can still go with my second suggestion which will still be much much better than what you have today.
Comment 7 Zviki Cohen CLA 2009-06-23 08:47:19 EDT
Hi guys,

Is there a fix for this coming for 2.1 - this is really a major issue. 
I can send you a profiler output to prove my point (it will take me time to produce it, but I can do it).

Thanks,
Zviki
Comment 8 Alex Panchenko CLA 2009-06-23 10:28:30 EDT
Hi Zviki,

I do want to see the profiling results.

Thank you,
Alex
Comment 9 Alex Panchenko CLA 2009-06-23 10:30:08 EDT
(removed [AST] prefix from summary, since this issue is about search + structural model)