Community
Participate
Working Groups
20070417 JavaWorkspaceScope.enclosingProjectsAndJars returns all projects and JARs of the workspace. This is expensive to compute and maintain. It should be possible to avoid calls to this API: introduce special treatment for the JavaWorkspaceScope. By definition, it contains everything, so it should be possible to iterate through every index.
see also bug 179275 (bug with caching) and bug 146577 (performance when creating a workspace scope) Of course I don't know the internals of the search engine, but I'm guessing it should be possible to avoid the cache.
JavaWorkspaceScope#encloses(...) methods already do not need the list of roots to work properly as they obviously always return true. However, problem is for #enclosingProjectsAndJars() method which currently uses the superclass implementation. Do not keep this list in cache would mean to recompute the list for each call and would have a great performance impact on existing callers. In eclipse workspace, this method is currently used in: - JDT/Core: + IndexSelector#initializeIndexLocations() could be changed to get all indexes locations directly from IndexManager + HandleFactory#getJarPkgFragmentRoot(String, IJavaSearchScope) could be changed to work as if the scope was null... - JDT/UI: + TextMacthUpdate#getProjectsInScope() + JavaSearchScopeFactory#getProjects(IJavaSearchScope) Looking at JDT/UI methods names, I guess they also could be changed to get the projects directly from the workspace. However, as this would break the behavior of current JavaWorkspaceScope for other unknown clients, I think it was not safe to change this at this late stage of 3.3 dvpt. Jerome, what's your mind on this?
I just checked and it would be quite simple in our code to avoid calling 'enclosingProjectsAndJars' on workspace scopes. I see two possibilities for us to find out if it is a workspace scope: - test if scope contains the 'IJavaModel' element - test scope.getClass().equals(SearchEngine.createWorkspaceScope().getClass()) What do you prefer us to do? You can keep the cache if you want, but make the initialization lazy (move it to enclosingProjectsAndJars()). Our and your code wouldn't call it, so it would only be initialized if others do.
We also need to check whether the result of the initialization is used by other methods than enclosingProjectAndJars(). For example getAccessRuleSet(...) is also using the result of the initialization. Can we simply return an empty AccessRuleSet for the workspace scope ? Are there other methods that would be affected by not initializing the workspace scope ?
Time permitting for 3.4. What would help is an explanation of the performance ramifications with this issue ?
Created attachment 91131 [details] Proposed patch Note that this patch also fixes bug 158526... Jerome, can you have a look on it? Thanks
Created attachment 91211 [details] New proposed patch After having talked with Jerome, here's a new proposal to fix this bug. The JavaWorkspaceScope will no longer store any information. The single hot spot is on the enclosingProjectsAndJars() method which will recompute the list of paths each time it is called. I'll spent some time to measure it and have a precise idea of what will be the overload for existing JavaWorkspaceScope clients.
Numbers on my test box show times between 0 and 47ms for 172 projects and jars on a Eclipse full source workspace. IMO, this sounds reasonable... So, if nobody shouts about the new proposed patch, I'll release it for the next I-build.
Numbers on my thinkpad show times between 42 and 78ms for 686 projects and jars on a Ganymede full source workspace. IMO, this still sounds reasonable...
Created attachment 91493 [details] New proposed patch after Jerome's review This patch remove unnecessary cases while adding jars files to the list... However, JDT/Core "search all type names" performance tests show strong regression with this patch: between -28% and -50%! This is not acceptable and should be fixed first before releasing...
Comment on attachment 91493 [details] New proposed patch after Jerome's review Sorry, wrong patch
Created attachment 91500 [details] New proposed patch after Jerome's review This patch is the right one...
(In reply to comment #12) This patch looks good. For the perf problem we might want to cache the ecnclosingProjectAndJARs strings as I suspect they will not take much space.
Created attachment 91511 [details] Last proposed patch Looking at the size of the JavaWorkspaceScope on my Ganymede full sources workspace, it appears that the paths list for enclosingProjectsAndJars would be only around 2,700 bytes with the last proposed patch although the entire workspace scope is 101,240 bytes with HEAD... So, it looks reasonable to store this list in the workspace scope avoiding to recompute it on each method call. Of course, this list would be lazily initialized and reset if needed while processing deltas.
Sizes on our monster workspace are: - 8,096 bytes with the last proposed patch - 311,024 bytes with HEAD... for 2017 projects and JARs... However, there's still a regression on testSearchAllTypeNameMatches test of FullSourceWorkspaceSearchTests... :-( After having investigated, the reason is that the JavaWorkspaceScope is no longer a JavaSearchScope and so the TypeNameMatchRequestorWrapper now needs to use a HandleFactory to retrieve the IType while accepting a match. So, it's still a no GO for this patch (which won't be the last one). I have to figure out how to speed up the HandleFactory or implement the packageFragmentRoot(String) method on JavaWorkspaceScope... Perhaps can we also accept that ~300K is not so a big memory foot print for a monster workspace and leave the JavaWorkspaceScope as before...
Jerome, The patch can still be applied on top of HEAD and the last comment was the real last update I had on this issue. Thanks
Created attachment 97001 [details] Improved patch With this new patch, I'm seeing 38% improvement (over what's currently in HEAD) for FullSourceWorkspaceSearchTests.testSearchAllTypeNameMatches().
Created attachment 97199 [details] Final version
Final version released for 3.4M7
Looking at the source code (as of v_856) to verify the bug. I think the desired effect is obtained. In terms of potential further improvements, I would contend that if the code of lines 72-93 is time consuming - and it seems to be so, we're at risk of eating up some deltas, but I'll leave this to Jérôme's appreciation. Verified for 3.4 M7 by code inspection of v_856.
(In reply to comment #20) > In terms of potential further improvements, I would contend that if the code of > lines 72-93 is time consuming - and it seems to be so, we're at risk of eating > up some deltas, but I'll leave this to Jérôme's appreciation. > Thanks Maxime. Entered bug 230168 to capture this issue.