Bug 102279 - [search] method reference performance depends on method name
Summary: [search] method reference performance depends on method name
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.1   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.6 M6   Edit
Assignee: Frederic Fusier CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks: 182346 187227
  Show dependency tree
 
Reported: 2005-06-30 05:03 EDT by David Saff CLA
Modified: 2010-03-09 06:39 EST (History)
8 users (show)

See Also:


Attachments
Draft patch (12.97 KB, patch)
2010-02-04 13:18 EST, Frederic Fusier CLA
no flags Details | Diff
Proposed patch (5.66 KB, patch)
2010-02-05 11:27 EST, Frederic Fusier CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Saff CLA 2005-06-30 05:03:04 EDT
In 3.1.  Open FailureTrace (in the junit plug-in).  Open Call Hierarchy on the
method refresh.  On my system, it takes about 5.5 seconds.  Now, rename refresh
to refresh2, and Open Call Hierarchy again.  On my system, it takes about .5
seconds.  refresh neither overrides nor is overridden, so the difference seems
to come from the fact that my workspace has 188 declared methods named
"refresh", and 0 named "refresh2".  However, there is no semantic relationship
between FailureTrace.refresh and the 187 other methods with the same name.  What
would it take to build the search indices to take advantage of this fact?
Comment 1 Dirk Baeumer CLA 2005-07-14 12:27:46 EDT
Markus, I think this belongs to core since there isn't much we can do here in UI.
Comment 2 Markus Keller CLA 2005-07-14 12:39:57 EDT
Yes, moving to JDT/Core.
This is a general problem with search for method references. 
Comment 3 Frederic Fusier CLA 2005-07-20 06:06:35 EDT
Search needs to verify that there's no relationship between your pattern and
entries returned by indexer. Search needs to parse file containing these entries
to know whether these possible matches may be accurate or not...
Don't know if there's something to do in this peculiar case, but I'll have a look.
Comment 4 Markus Keller CLA 2005-07-20 06:27:16 EDT
The index for method invocations could contain information about the target type
and prune non-matching index entries without parsing those files.
Comment 5 Frederic Fusier CLA 2006-03-17 04:49:29 EST
Unfortunately indexing cannot store any other information for method callers than method name and number of arguments. Parser definitely needs resolution to have more information on them and indexing does not and won't use resolution for performances concerns...
So, I'm afraid that there's nothing we can do to fix this problem with current design...
Please reopen and increase severity if you think we definitely need to find a solution to it.
Comment 6 quartz quartz CLA 2006-12-15 14:11:51 EST
Methods should simply be indexed by declaring classes AND signature, not only signature. Having experienced the problem with 'add', 'get' and 'close', it is clear to me that having the owning class/interface as part of the primary key to the method is mandatory to improove the search performance.

If the the search engine is text aware but not java aware, (and I foxus on thf 'if') then there are no reasons to avoid improoving the JDT. I feel such java search should be delegated or implemented in a fully java-aware layer, for  opportunity reasons.

The book keeping performance of such lookup would be simply a cloned variant of the current lookup. If the current indexing cost 1x, having an index that adds the declaring class to primary key would overall cost 2x.

When a piece of code uses FileOutputStream.close(), it registers its location under the OutputStream.close() anyway, since FileOutputStream isn't declaring close(). Then if you search for calls to OutputStream.close() (not any other close like Socket.close()), the java-aware search would be launched with OutputStream.close() and it would return a narrower list of locations. Then, the filtering for FileOutputStream.close() can occur, if required. If you want immediate searching of FileOutputStream.close(), it requires the calling code to register with all pairs of possible superclass, superinterface, declaringclass, or declaring interface with the method. (Such book keeping is ugly I gree, that why I would only register under the 1 declaring class/interface + method).

The only memory impact it the number of entries in such new lookup map, since, for example, the OutputStream.close() key is not equals to the Socket.close() key (factorization of all colliding methods names under their declaring classes/interfaces).

This issue should be reopenned as an improovement for the search engine.
Comment 7 Markus Keller CLA 2007-10-23 10:49:14 EDT
(In reply to comment #5)
> So, I'm afraid that there's nothing we can do to fix this problem with current
> design...

Maybe we just didn't check the right places to improve the search speed. What I usually do now if I run into an often-used method name (see comment 1), is this:

- rename the method to refresh2
- save and let the incremental Java builder run
- all references show up in the the problems view as errors ("The method refresh2(..) is undefined for the type ..")

This is much faster than letting the search engine parse all files that contain a call to any refresh(..) method.

This optimization could be implemented right in the search engine: If there are a lot of index matches and the workspace is fully built, only parse the CUs that have a dependency on the method declaration's CU. The builder must know these dependencies, otherwise an incremental build would take much longer.


Comment 8 Frederic Fusier CLA 2007-10-23 11:11:18 EDT
(In reply to comment #7)
> (In reply to comment #5)
> > So, I'm afraid that there's nothing we can do to fix this problem with current
> > design...
> 
> Maybe we just didn't check the right places to improve the search speed. What I
> usually do now if I run into an often-used method name (see comment 1), is
> this:
> 
> - rename the method to refresh2
> - save and let the incremental Java builder run
> - all references show up in the the problems view as errors ("The method
> refresh2(..) is undefined for the type ..")
> 
> This is much faster than letting the search engine parse all files that 
> contain a call to any refresh(..) method.
> 
> This optimization could be implemented right in the search engine: If there are
> a lot of index matches and the workspace is fully built, only parse the CUs
> that have a dependency on the method declaration's CU. The builder must know
> these dependencies, otherwise an incremental build would take much longer.
> 
Why don't you simply use a search only in sources?

I think if there's a strong need for this requirement, it would be a more simple way to investigate: turn off the parse of files in libraries if too many matches are found in jar indexes...
Comment 9 Markus Keller CLA 2007-10-23 11:42:35 EDT
> Why don't you simply use a search only in sources?

I just tried it again and measured the time to search for references to org.eclipse.ltk.core.refactoring.Change.getParent() in my workspace
(all values warm, i.e. do the operation twice and only measure 2nd run):

No matter whether I turn off libraries or not, search takes about 30s until the first result is shown and 50s until the search is done. Incremental build, however, is done in 7s, which is a factor of 4-7 better.
Comment 10 Kent Johnson CLA 2007-10-23 12:15:54 EDT
We already have a notion of compound searches, so is it possible to search for references to the type org.eclipse.ltk.core.refactoring.Change + senders of the method getParent() ?

Then use the reduced set to parse/match correctly.
Comment 11 Markus Keller CLA 2007-10-23 12:27:45 EDT
> We already have a notion of compound searches, so is it possible to search for
> references to the type org.eclipse.ltk.core.refactoring.Change + senders of the
> method getParent() ?

Nope, the invocation c.getParent() could be a polymorphic call (type of c is a subtype of Change). In this case, the CU containing the invocation does not have to reference the type Change explicitly.
Comment 12 Markus Keller CLA 2007-11-22 13:13:00 EST
(In reply to comment #5)
> Please reopen and increase severity if you think we definitely need to find a
> solution to it.

We should really see whether comment 7 could be implemented. I almost starved to death while searching for references to my own getShell() method. And this problem will inevitably become worse with bigger and bigger workspaces.
Comment 13 Frederic Fusier CLA 2007-12-06 06:59:04 EST
First need to optimize the PossibleMatch storage in PossibleMatchSet and see the new performance results after it has been implemented...
Comment 14 Olivier Thomann CLA 2009-08-13 10:54:27 EDT
Frédéric, could you please investigate this one before you move to another project ?
Comment 15 Frederic Fusier CLA 2009-10-05 04:45:43 EDT
(In reply to comment #9)
> > Why don't you simply use a search only in sources?
> 
> I just tried it again and measured the time to search for references to
> org.eclipse.ltk.core.refactoring.Change.getParent() in my workspace
> (all values warm, i.e. do the operation twice and only measure 2nd run):
> 
> No matter whether I turn off libraries or not, search takes about 30s until the
> first result is shown and 50s until the search is done. Incremental build,
> however, is done in 7s, which is a factor of 4-7 better.

I've also made some measures and get different results...

My workspace have all JDT sources, i.e. all projects got from JDT/Core, JDT/UI and JDT/Debug releng map files. Searching for all references in the workspace of Change.getParent() takes around 7 seconds in this workspace when I precise only sources in the Java Search dialog... For comparison, the incremental build solution takes 5-6 seconds in the same workspace.

So, I do not observe the same improvement factor... Do I miss something?
Comment 16 Markus Keller CLA 2009-10-05 06:48:30 EDT
My dev workspace also contains Ant and PDE in source (without tests). With I20090929-0800, I get these numbers for a search to the IMethod Change#getParent() in warm state (i.e. not realistic in real life):
- all "Search In" options enabled: 27s
- "JRE libraries" disabled: 23s
- only "Sources": 21s

Add a "2" after name of Change#getParent() and save: 8s until incremental build is done. And the building includes all the overhead of writing files, adding problems, etc.

For comparison, search for references to Change#setEnabledShallow(boolean) is not measurable by hand, i.e. < 1s.


> So, I do not observe the same improvement factor... Do I miss something?

Maybe getParent() is not frequent enough in your workspace to make it obvious. Furthermore, class Change has quite a few dependencies, so it's not that good as an example because it makes incremental building slow as well (but I guess most of the slowness is disk IO, which would be less of a problem if you only read dependencies).

In UI code, "refresh" and "getShell" are even better examples, e.g. to find references to org.eclipse.jface.preference.ListEditor.getShell(), I need 65s, while incremental build is done in 2s.
Comment 17 Olivier Thomann CLA 2009-11-03 11:35:56 EST
This would be a good item to look at post current work on the formatter.
Targetting M4 for now. Will reconsider once we access how much work is needed.
Comment 18 Olivier Thomann CLA 2009-12-07 11:07:44 EST
Frédéric, please investigate for M5.
Comment 19 Frederic Fusier CLA 2010-01-06 13:40:57 EST
(In reply to comment #16)

With additional projects, I can get the same numbers...

However, I'm still not convinced that this solution is feasible. At a first look, the builder status does not seem to be publicly accessible, hence it may be really difficult to get its content for the Search Engine.

But of course this is not the difficulty which must stop us... However, even if we could access the builder status content, it might be out-of-date or even not exist if no full build has been done before the search...

So, as Olivier remind us for another bug, the accuracy of the compiler and its tools is the first law... Hence, the S.E. would always need to search by itself when the builder status was not accurate (which is a common case when the 'Build automatically' preference is not checked).

An other point is that when I looked at the incremental builder, I saw that it was of course listening to resource events to know which resources will need to be updated in the status. Currently the S.E. does not listen to resources events. If we decide to change it, this would imply big changes that unfortunately we could not afford for 3.6 version.

So, my conclusion is that:
1) I strongly doubt that using the build status could be the solution here
2) I agree that if we really want to fix this, a possible idea would be that the S.E. implement a new strategy to have something similar to the builder and indexing one (i.e. status + listen event resources)
Comment 20 Markus Keller CLA 2010-01-07 05:25:13 EST
> Hence, the S.E. would always need to search by itself
> when the builder status was not accurate (which is a common case when the
> 'Build automatically' preference is not checked).

Yes, this optimization can only work when the affected projects are fully built, so the old way has to stay. But for Java development, the common case is that auto-build is enabled and the workspace is fully built.

> An other point is that when I looked at the incremental builder, I saw that it
> was of course listening to resource events to know which resources will need to
> be updated in the status. Currently the S.E. does not listen to resources
> events. If we decide to change it, this would imply big changes that
> unfortunately we could not afford for 3.6 version.

Why would the S.E. have to listen to resource changes? The S.E. can just ask the builder whether it is ready. If the workspace changes while a search is running, then it's OK to miss matches that are only available in the new state, or to report matches that are out of date. Clients that can't cope with outdated matches have to lock the workspace anyway, since the matches could also become outdated 1ms after the search ended (or even during the requestor callback).
Comment 21 Frederic Fusier CLA 2010-01-07 13:26:33 EST
(In reply to comment #20)
> Yes, this optimization can only work when the affected projects are fully
> built, so the old way has to stay. But for Java development, the common case is
> that auto-build is enabled and the workspace is fully built.
>
Not sure about this assumption, all Java developers here have disabled auto-build... ;-) 
> 
> Why would the S.E. have to listen to resource changes? The S.E. can just ask
> the builder whether it is ready. If the workspace changes while a search is
> running, then it's OK to miss matches that are only available in the new state,
> or to report matches that are out of date. Clients that can't cope with
> outdated matches have to lock the workspace anyway, since the matches could
> also become outdated 1ms after the search ended (or even during the requestor
> callback).

Unfortunately I have looked at the Incremental JDT builder + asked Philippe about this and the conclusion is that there's currently no other way to have the idea the builder state than asking for resource deltas using the API method IncrementalProjectBuilder.getDeltas(IProject). Expect if we missed something, it seems that this currently can only be done through a builder, hence I cannot figure out how the S.E. could get them...

As I said, I talked with Philippe about this issue and his advice was to try to reduce the scope, as in fact the builder do. The builder never works on the entire workspace (except for the full build). It uses a stored state to know what is the smallest scope to recompile when a file is changed.

The simple idea is to do the same in this peculiar case. My first idea was to reduce the scope by restricting the search to the sources only. After having talked with Philippe, I realized that we could also reduced the scope to the Enclosing projects instead of the entire workspace...

And here are the number for org.eclipse.jface.preference.ListEditor.getShell() on the workspace I use for this issue:
 - Search in 'Sources' only using 'Workspace' scope: ~28s (warm)
 - Search in 'Sources' only using 'Enclosing projects' scope: ~2s (cold)
   ~1s (warm)
 - Search in 'Sources' + 'Required projects' + 'Application libraries'
   using 'Enclosing projects' scope: ~3s (cold)  ~1s (warm)
 - Build: ~5s (cold) ~1s (warm)

Maybe we could think about to give the users access to this kind of search in an easier way than the Java Search dialog...?
Comment 22 Olivier Thomann CLA 2010-01-19 20:28:31 EST
Moving to M6.
Comment 23 Markus Keller CLA 2010-02-02 06:26:02 EST
Re getting the build state from the builder: I see that this can get tricky. But you could probably just hock into the existing DeltaProcessor. I saw that it already calls JavaBuilder.buildFinished(), so it could also set a flag there and reset it on resource change events.

> As I said, I talked with Philippe about this issue and his advice was to try to
> reduce the scope, as in fact the builder do. The builder never works on the
> entire workspace (except for the full build). It uses a stored state to know
> what is the smallest scope to recompile when a file is changed.
> 
> The simple idea is to do the same in this peculiar case.

That's what I'm proposing, but I thought it's not worth to duplicate this dependency information in search, so I suggested to take this state from the builder when it is available.


> My first idea was to reduce the scope

That's fine too, but AFAIK, the search engine already does this where it doesn't reduce the possible match count (e.g. strips away unrelated projects).

> Maybe we could think about to give the users access to this kind of search in
> an easier way than the Java Search dialog...?

The user should not be bothered with the 'Search In' flags all the time. The options in the Search dialog are mainly a way to reduce the match count if the user knows e.g. that there are no relevant references from the JRE (although there could be references through polymorphism, but Search cannot know that they will never happen because it does not do any flow analysis).

The validity of those settings also depend on the workspace setup (e.g. whether I have some projects from source or imported as binary), but the user's request is always the same: Give me all references in the workspace as fast as possible, doing whatever tricks are necessary to speed up the operation.
Comment 24 Frederic Fusier CLA 2010-02-04 13:18:27 EST
Created attachment 158208 [details]
Draft patch

Markus, if you can make a quick test with this patch and let me know if it goes in the right direction... TIA

Note that I still need to discuss with Jerome about it to be sure of the DeltaProcessor changes (it seems to be a little naive so far).

I should be back with an update tomorrow...
Comment 25 Markus Keller CLA 2010-02-04 14:06:25 EST
(In reply to comment #24)
I quickly tried it, but I'm not sure what I'm supposed to test. The changes look promising, but the shortcut in the IndexSelector is too easy, since it only visits the enclosing projects of the generatedResources, but it misses to check references to these resources. The incremental builder must know about these references somehow, and that knowledge most be used here as well.
Comment 26 Frederic Fusier CLA 2010-02-05 11:27:02 EST
Created attachment 158321 [details]
Proposed patch

A new proposal to address this issue...

The key point is that Search Engine can be improved in this area *only* when the Auto-build is activated in the workspace. Unfortunately, all my attempts to implement a needRebuild() using the DeltaProcessor failed mainly due to the fact that there's no order specified between POST_BUILD and POST_CHANGE resources events coming from the Platform...

So, assuming that the user has activated the Auto-build preference in his/her workspace, with this patch, the IndexSelector will be able to reduce the number of project index files to look at by using the project builder state and see whether it includes a reference to the focus type or not...

Using this patch, searching for the ListEditor.getShell() method references on the *entire* Eclipse full sources workspace takes 14s just after having opened the session (i.e. when it's cold) and only 2s if I repeat the search (i.e. when it's warm)...

Comparatively, on the same workspace, renaming the method and trigger a build takes less than 2s when it's cold and less than 1s when it's warm...

So, even if it's not as fast as the builder, I guess this is the simplest change we can have for an acceptable result.
Comment 27 Frederic Fusier CLA 2010-02-05 11:32:24 EST
(In reply to comment #25)
> I quickly tried it, but I'm not sure what I'm supposed to test.
Just play with the last posted patch and see if it definitely addresses your problem. That means, do you think that the response time for the search query you perform in your usual workspace is now acceptable...

Also, with the last patch I posted, do you think it's acceptable that the S.E. speed-up only works when the auto-build is ON?
Comment 28 Markus Keller CLA 2010-02-08 10:10:28 EST
Speedup is great for some examples (ListEditor#getShell()), but not noticeable for others (Change#getParent()). I guess that's a consequence of the strategy to only use dependency information to include/exclude complete projects. I'm not sure how frequent the two patterns are in practice -- I'll have to test this a bit longer in practice to see if the current fix is already enough.


> Also, with the last patch I posted, do you think it's acceptable that the S.E.
> speed-up only works when the auto-build is ON?

Yes, that was the whole idea (use the available dependency data if it is available for free, but don't spend more cycles for resolving dependencies while maintaining the index).


I found that the new strategy reduces the amount of potential matches in some cases. E.g. when you import jface, swt, and all plug-ins that match "*.ui" as source, then search for references to ListEditor#getShell() used to find potential matches in org.eclipse.compare which is e.g. required by org.eclipse.equinox.p2.ui. The found potential matches were all wrong though, so I think it's no big deal.
Comment 29 Frederic Fusier CLA 2010-02-15 15:22:18 EST
(In reply to comment #26)
> Created an attachment (id=158321) [details]
> Proposed patch
> 
Released for 3.6M6 in HEAD stream.
Comment 30 Srikanth Sankaran CLA 2010-03-09 06:15:54 EST
Frederic,

The following code looks a bit suspicious to me: (reformatted to fit)

while (javaElement != null &&
       !(javaElement instanceof ICompilationUnit) && 
         !(javaElement instanceof IClassFile)) {
		javaElement = javaElement.getParent();
}

if (javaElement != null) {
    ICompilationUnit compilationUnit = (ICompilationUnit) javaElement;

If we exit the while with javaElement being an instanceof IClassFile then
the cast to ICompilationUnit could throw an class cast exception ? 

Let me know if I have misunderstood something here.

Otherwise the fix looks ok, I verified it by code review since
the performance angle has already been checked by Markus.
Comment 31 Frederic Fusier CLA 2010-03-09 06:33:20 EST
(In reply to comment #30)
> Frederic,
> 
> The following code looks a bit suspicious to me: (reformatted to fit)
> 
> while (javaElement != null &&
>        !(javaElement instanceof ICompilationUnit) && 
>          !(javaElement instanceof IClassFile)) {
>         javaElement = javaElement.getParent();
> }
> 
> if (javaElement != null) {
>     ICompilationUnit compilationUnit = (ICompilationUnit) javaElement;
> 
> If we exit the while with javaElement being an instanceof IClassFile then
> the cast to ICompilationUnit could throw an class cast exception ? 
> 
> Let me know if I have misunderstood something here.
> 
> Otherwise the fix looks ok, I verified it by code review since
> the performance angle has already been checked by Markus.


You definitely right, you points out here the potential CCE I was talking about in bug 304841 comment 4... Hopefully, I also saw it and it's now fixed in HEAD :-)
Comment 32 Srikanth Sankaran CLA 2010-03-09 06:39:25 EST
Verified for 3.6M6 by code inspection.