310258 – MiExpression.ExpressionInfo.hashCode() produces very inefficient behavior

Bug 310258 - MiExpression.ExpressionInfo.hashCode() produces very inefficient behavior

Summary: MiExpression.ExpressionInfo.hashCode() produces very inefficient behavior

Status:	NEW

Alias:	None

Product:	CDT
Classification:	Tools
Component:	cdt-debug-dsf-gdb (show other bugs)
Version:	0 DD 1.1
Hardware:	PC All

Importance:	P3 normal (vote)
Target Milestone:	---
Assignee:	Project Inbox
QA Contact:	Jonah Graham

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2010-04-23 05:54 EDT by Derek Foster
Modified:	2020-09-04 15:21 EDT (History)
CC List:	2 users (show)

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Derek Foster

2010-04-23 05:54:58 EDT

Build Identifier: 20100218-1602

Various people at my company noticed that when we were debugging, if the Variables View was open and contained a large array ("large" meaning around 4000 elements or more), if a large number of elements of the array were selected, then stepping or doing various other GUI debugging actions took upwards of several seconds.

After a long profiling/debugging session through the Eclipse source code, this slowdown turned out to be a result of a great many backed-up jobs in the UI thread.

This backup, in turn, turned out to be due to AbstractDVMNode.update(final
IHasChildrenUpdate[] updates), which saves the current tree selection, performs an update, and then restores the current tree selection. This save/restore was taking a very long time.

This, in turn, turned out to be due to TreeSelection.InitializeData taking a long time, which turned out to be due to a CustomHashTable instance that is being constructed. Inspecting this object revealed that all of the entries had the same hashCode(!).

The culprit ultimately turned out to be in MiExpression.ExpressionInfo, whose hashCode method looks like this:

@Override
        public int hashCode() {
            return (fullExpression == null ? 0 : fullExpression.hashCode()) ^ 
                   (relativeExpression == null ? 0 :
relativeExpression.hashCode());
        }

Note that in the above method, that if fullExpression.equals(relativeExpression), that this hashCode method will always return zero, regardless of what the values of fullExpression and relativeExpression happen to be. In the case of large arrays, this can be true for every element in the arrays (where the expressions might be foo[1], foo[2], foo[3], etc., with both relativeExpression and fullExpression set to the same value for each expression).

Hence a large number of objects are created with the same hashCode, and hence end up being put in the SAME hash bucket within the CustomHashTable instance within TreeSelection.InitializeData. This makes the hash table basically function as a LinkedList, with the same (horrible) performance characteristics (O(N) search, O(N^2) compare, etc.)

I rewrote the MiExpression.ExpressionInfo.hashCode() method above to look like this:

@Override
        public int hashCode() {
            final int hash1 = (fullExpression == null ? 0 :
fullExpression.hashCode());
            final int hash2 = (relativeExpression == null ? 0 :
relativeExpression.hashCode());
            return hash1 ^ (Integer.rotateLeft(hash2, Short.SIZE));
        }

and performance was much improved: Selecting large numbers of elements in the array no longer resulted in noticeable delays.

(Even after this fix, Eclipse still works quite badly in the presence of large arrays due to an SWT Tree bug/design deficiency which I will file right after I file this one. However, THIS bug is about it working much, much, worse when many elements of a large array are selected.)



Reproducible: Always

Steps to Reproduce:
1. Create a large array in C or C++ code (at least 4000 elements).
2. Start a debug session.
3. In the Variables View, select the array and most of its elements.
4. Step, or do other GUI actions.
5. Observe that response times are horrible. The GUI is essentially unusable.

Comment 1 Derek Foster

2010-04-23 06:32:43 EDT

The other bug that I mentioned above has now been filed as bug #310260 (' Large SWT Trees are horribly slow on Windows due to inefficient selection status fetch code').

Comment 2 Marc Khouzam

2010-04-23 08:58:52 EDT

(In reply to comment #0)

> The culprit ultimately turned out to be in MiExpression.ExpressionInfo, whose
> hashCode method looks like this:
> 
> @Override
>         public int hashCode() {
>             return (fullExpression == null ? 0 : fullExpression.hashCode()) ^ 
>                    (relativeExpression == null ? 0 :
> relativeExpression.hashCode());
>         }
> 
> Note that in the above method, that if
> fullExpression.equals(relativeExpression), that this hashCode method will
> always return zero, regardless of what the values of fullExpression and
> relativeExpression happen to be.

Woops.

What if we changed the exclusive or (^) with an addition followed by a divide by 2?
 
 
> @Override
>         public int hashCode() {
>             final int hash1 = (fullExpression == null ? 0 :
> fullExpression.hashCode());
>             final int hash2 = (relativeExpression == null ? 0 :
> relativeExpression.hashCode());
>             return hash1 ^ (Integer.rotateLeft(hash2, Short.SIZE));
>         }

Does the rotate give better distribution?

Comment 3 Derek Foster

2010-04-23 20:02:15 EDT

I tried something similar to that (actually, I tried fullExpression^(3*relativeExpression) and ended up with a fairly lumpy distribution (most hash buckets empty, with those non-empty ones containing 5 or more elements).

Your suggestion (fullExpression.hashCode()+relativeExpression.hashCode())/2 seems to perform nicely in the case where fullExpression.equals(relativeExpression), but it is not clear to me that it will perform well in the cases when they are not equal, since for any given value of low-order hash bits, it seems clear that at least two of these values (namely, replacing fullExpression with relativeExpression or vice versa) will produce aliasing. Other more subtle aliases can exist as well, although it is not easy to predict where they will occur.

Keep in mind that the data being fed to these values is likely to be sequential array elements extending, for instance, from "foo[0]" to "foo[4000]" for a 4000-element array. This kind of "counting index" substring is present in both the fullExpression and relativeExpression, and is probably equal in both of them. We don't want to take a chance of the hash contributions of the characters associated with those indices might totally or partially cancel each other out arithmetically, since that will produce a large number of duplicate hash codes (or at least duplicate low-order bits within the hash codes).

I used the rotate-and-xor approach in an attempt to create as little interference between the low-order bits of the two operands as possible. The low-order bits of the fullExpression end up offset 16 bits from the low-order bits of relativeExpression, so the potentially-interfering bits overlap very little and should therefore cancel each other out extraordinarily rarely, regardless of whether fullExpression.equals(relativeExpression) or not. In the light testing I did, this hash function seems to produce a fairly even distribution within the hash buckets.