Bug 182738 - [search] Avoid keeping list of roots in JavaWorkspaceScope
Summary: [search] Avoid keeping list of roots in JavaWorkspaceScope
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.3   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.4 M7   Edit
Assignee: Jerome Lanneluc CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2007-04-17 08:48 EDT by Martin Aeschlimann CLA
Modified: 2008-05-05 05:45 EDT (History)
5 users (show)

See Also:


Attachments
Proposed patch (7.09 KB, patch)
2008-02-29 04:43 EST, Frederic Fusier CLA
no flags Details | Diff
New proposed patch (13.81 KB, patch)
2008-02-29 12:13 EST, Frederic Fusier CLA
no flags Details | Diff
New proposed patch after Jerome's review (13.81 KB, patch)
2008-03-04 05:48 EST, Frederic Fusier CLA
no flags Details | Diff
New proposed patch after Jerome's review (12.39 KB, patch)
2008-03-04 07:23 EST, Frederic Fusier CLA
no flags Details | Diff
Last proposed patch (9.71 KB, patch)
2008-03-04 09:05 EST, Frederic Fusier CLA
no flags Details | Diff
Improved patch (15.93 KB, patch)
2008-04-22 07:47 EDT, Jerome Lanneluc CLA
no flags Details | Diff
Final version (16.12 KB, patch)
2008-04-23 07:08 EDT, Jerome Lanneluc CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Aeschlimann CLA 2007-04-17 08:48:23 EDT
20070417

JavaWorkspaceScope.enclosingProjectsAndJars returns all projects and JARs of the workspace. This is expensive to compute and maintain.

It should be possible to avoid calls to this API: introduce special treatment for the JavaWorkspaceScope. By definition, it contains everything, so it should be possible to iterate through every index.
Comment 1 Martin Aeschlimann CLA 2007-04-17 09:07:19 EDT
see also bug 179275 (bug with caching)
and bug 146577 (performance when creating a workspace scope)

Of course I don't know the internals of the search engine, but I'm guessing it should be possible to avoid the cache.
Comment 2 Frederic Fusier CLA 2007-04-17 10:47:00 EDT
JavaWorkspaceScope#encloses(...) methods already do not need the list of roots to work properly as they obviously always return true.

However, problem is for #enclosingProjectsAndJars() method which currently uses the superclass implementation. Do not keep this list in cache would mean to recompute the list for each call and would have a great performance impact on existing callers.

In eclipse workspace, this method is currently used in:
 - JDT/Core:
   + IndexSelector#initializeIndexLocations()
     could be changed to get all indexes locations directly from IndexManager
   + HandleFactory#getJarPkgFragmentRoot(String, IJavaSearchScope)
     could be changed to work as if the scope was null...
 - JDT/UI:
   + TextMacthUpdate#getProjectsInScope()
   + JavaSearchScopeFactory#getProjects(IJavaSearchScope)

Looking at JDT/UI methods names, I guess they also could be changed to get the projects directly from the workspace.

However, as this would break the behavior of current JavaWorkspaceScope for other unknown clients, I think it was not safe to change this at this late stage of 3.3 dvpt.

Jerome, what's your mind on this?
Comment 3 Martin Aeschlimann CLA 2007-04-17 11:14:46 EDT
I just checked and it would be quite simple in our code to avoid calling 'enclosingProjectsAndJars' on workspace scopes.
I see two possibilities for us to find out if it is a workspace scope:
- test if scope contains the 'IJavaModel' element
- test scope.getClass().equals(SearchEngine.createWorkspaceScope().getClass())
What do you prefer us to do?

You can keep the cache if you want, but make the initialization lazy (move it to enclosingProjectsAndJars()). Our and your code wouldn't call it, so it would only be initialized if others do.
Comment 4 Jerome Lanneluc CLA 2007-04-17 12:36:30 EDT
We also need to check whether the result of the initialization is used by other methods than enclosingProjectAndJars(). For example getAccessRuleSet(...) is also using the result of the initialization. Can we simply return an empty AccessRuleSet for the workspace scope ? Are there other methods that would be affected by not initializing the workspace scope ?
Comment 5 Philipe Mulet CLA 2007-08-31 12:41:21 EDT
Time permitting for 3.4. What would help is an explanation of the performance ramifications with this issue ?
Comment 6 Frederic Fusier CLA 2008-02-29 04:43:14 EST
Created attachment 91131 [details]
Proposed patch

Note that this patch also fixes bug 158526...

Jerome, can you have a look on it?
Thanks
Comment 7 Frederic Fusier CLA 2008-02-29 12:13:49 EST
Created attachment 91211 [details]
New proposed patch

After having talked with Jerome, here's a new proposal to fix this bug.
The JavaWorkspaceScope will no longer store any information.

The single hot spot is on the enclosingProjectsAndJars() method which will recompute the list of paths each time it is called. I'll spent some time to measure it and have a precise idea of what will be the overload for existing JavaWorkspaceScope clients.
Comment 8 Frederic Fusier CLA 2008-02-29 12:33:16 EST
Numbers on my test box show times between 0 and 47ms for 172 projects and jars on a Eclipse full source workspace. IMO, this sounds reasonable...

So, if nobody shouts about the new proposed patch, I'll release it for the next I-build.
Comment 9 Frederic Fusier CLA 2008-03-03 07:02:28 EST
Numbers on my thinkpad show times between 42 and 78ms for 686 projects and jars
on a Ganymede full source workspace. IMO, this still sounds reasonable...
Comment 10 Frederic Fusier CLA 2008-03-04 05:48:50 EST
Created attachment 91493 [details]
New proposed patch after Jerome's review

This patch remove unnecessary cases while adding jars files to the list...

However, JDT/Core "search all type names" performance tests show strong regression with this patch: between -28% and -50%!

This is not acceptable and should be fixed first before releasing...
Comment 11 Frederic Fusier CLA 2008-03-04 07:22:20 EST
Comment on attachment 91493 [details]
New proposed patch after Jerome's review

Sorry, wrong patch
Comment 12 Frederic Fusier CLA 2008-03-04 07:23:29 EST
Created attachment 91500 [details]
New proposed patch after Jerome's review

This patch is the right one...
Comment 13 Jerome Lanneluc CLA 2008-03-04 08:54:59 EST
(In reply to comment #12)
This patch looks good. For the perf problem we might want to cache the ecnclosingProjectAndJARs strings as I suspect they will not take much space.
Comment 14 Frederic Fusier CLA 2008-03-04 09:05:38 EST
Created attachment 91511 [details]
Last proposed patch

Looking at the size of the JavaWorkspaceScope on my Ganymede full sources workspace, it appears that the paths list for enclosingProjectsAndJars would be only around 2,700 bytes with the last proposed patch although the entire workspace scope is 101,240 bytes with HEAD...

So, it looks reasonable to store this list in the workspace scope avoiding to recompute it on each method call. Of course, this list would be lazily initialized and reset if needed while processing deltas.
Comment 15 Frederic Fusier CLA 2008-03-04 11:51:15 EST
Sizes on our monster workspace are:
 - 8,096 bytes with the last proposed patch
 - 311,024 bytes with HEAD...
for 2017 projects and JARs...

However, there's still a regression on testSearchAllTypeNameMatches test of FullSourceWorkspaceSearchTests... :-(

After having investigated, the reason is that the JavaWorkspaceScope is no longer a JavaSearchScope and so the TypeNameMatchRequestorWrapper now needs to use a HandleFactory to retrieve the IType while accepting a match.

So, it's still a no GO for this patch (which won't be the last one). I have to figure out how to speed up the HandleFactory or implement the packageFragmentRoot(String) method on JavaWorkspaceScope...

Perhaps can we also accept that ~300K is not so a big memory foot print for a monster workspace and leave the JavaWorkspaceScope as before...
Comment 16 Frederic Fusier CLA 2008-04-17 13:19:30 EDT
Jerome,
The patch can still be applied on top of HEAD and the last comment was the real last update I had on this issue.
Thanks
Comment 17 Jerome Lanneluc CLA 2008-04-22 07:47:05 EDT
Created attachment 97001 [details]
Improved patch

With this new patch, I'm seeing 38% improvement (over what's currently in HEAD) for FullSourceWorkspaceSearchTests.testSearchAllTypeNameMatches().
Comment 18 Jerome Lanneluc CLA 2008-04-23 07:08:16 EDT
Created attachment 97199 [details]
Final version
Comment 19 Jerome Lanneluc CLA 2008-04-23 07:14:55 EDT
Final version released for 3.4M7
Comment 20 Maxime Daniel CLA 2008-04-29 04:16:09 EDT
Looking at the source code (as of v_856) to verify the  bug.

I think the desired effect is obtained.

In terms of potential further improvements, I would contend that if the code of lines 72-93 is time consuming - and it seems to be so, we're at risk of eating up some deltas, but I'll leave this to Jérôme's appreciation.

Verified for 3.4 M7 by code inspection of v_856.
Comment 21 Jerome Lanneluc CLA 2008-05-05 05:45:20 EDT
(In reply to comment #20)
> In terms of potential further improvements, I would contend that if the code of
> lines 72-93 is time consuming - and it seems to be so, we're at risk of eating
> up some deltas, but I'll leave this to Jérôme's appreciation.
> 
Thanks Maxime. Entered bug 230168 to capture this issue.