Community
Participate
Working Groups
The org.eclipse.dltk.core.search.matching.MatchLocator has a resolveDuplicates which is used to resolve duplicate source reference elements when resolving a node from the AST to a model element. This method resolves duplication with between integers by using a hashset. This is extremely expensive. After profiling the code, I came to the conclusion that this has dramatic affect on the code performance. In the example below, I replaced the hashset with an array of booleans. True at position i indicates that the index i was already taken. 1. Declaration - Replace: private HashSet handles; with: private static final int SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS = 30; private boolean[] sourceRefElementCache = new boolean[SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS]; 2. Initialization - in method reportMatching(ModuleDeclaration unit) Replace: this.handles = new HashSet(); with: clearSourceRefElementCache(); Add the method: private void clearSourceRefElementCache() { for (int i = 0; i < sourceRefElementCache.length; i++) { sourceRefElementCache[i] = false; } } Remove the line: this.handles = null; 3. Resolution - The method resolveDuplicates(IMember handle) should look like this: protected void resolveDuplicates(IMember handle) { if (handle instanceof SourceRefElement) { try { SourceRefElement element = ((SourceRefElement) handle); while (sourceRefElementCache[element.occurrenceCount]) { element.occurrenceCount++; } sourceRefElementCache[element.occurrenceCount] = true; } catch (IndexOutOfBoundsException e) { resizeSourceRefElementCache(); resolveDuplicates(handle); } } } Add the following method: private void resizeSourceRefElementCache() { boolean newArray[] = new boolean[sourceRefElementCache.length + SOURCE_REF_ELEMENT_CACH_SIZE_INCREMENTS]; for (int i = 0; i < sourceRefElementCache.length; i++) { newArray[i] = sourceRefElementCache[i]; } for (int i = sourceRefElementCache.length; i < newArray.length; i++) { newArray[i] = false; } sourceRefElementCache = newArray; }
May be I miss something, but I think this solution is incorrect. Handles hash set is used for incrementing occurrence counts of elements that have multiple matches. I don't quite understand how can you check that element A existed previously by checking it's occurrence count in the boolean array?
I think contains() test is not that expensive, especially comparing to source parsing time. Could you please provide some numbers, to illustrate how this code affects overall performance? Thank you, Alex
@Michael, I agree that it is not the same, but I believe it will work. The numbering or occurrences will not be sequential - but does it really matter? If it does, you can still greatly improve the performance by keeping a hashtable from the sourceRefElement to the max occurrence. As a key to this table you may use the toString after temporarily setting the occurrence count to 0. The set lookup of the original function is a very very expensive operation - it's a full collection scan with full object comparison for each item (the equals is a complex method for these objects). This is being done in a loop several times until we get the right occurrence count.
(In reply to comment #3) Suppose we have 3 matches of class A and 1 match of function foo(). When resolving duplicates (in order: A, foo) using your method we'll get final numbers: 3 occurrences of A 4 occurrences of foo Is this correct?
(In reply to comment #3) In addition, handle.contains() checks ModelElement.hashCode() first, then it invokes equals() if needed. What is a chance that hash code will be the same for different elements?
As far as I can tell, yes. Again, I didn't find any problem with that in the code, but I can agree that this is not the functionality you had before. You can still go with my second suggestion which will still be much much better than what you have today.
Hi guys, Is there a fix for this coming for 2.1 - this is really a major issue. I can send you a profiler output to prove my point (it will take me time to produce it, but I can do it). Thanks, Zviki
Hi Zviki, I do want to see the profiling results. Thank you, Alex
(removed [AST] prefix from summary, since this issue is about search + structural model)