Bug 567521 - Parallelize JDT index search
Summary: Parallelize JDT index search
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 4.17   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: 4.19 M1   Edit
Assignee: Andrey Loskutov CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy, performance
: 569026 (view as bug list)
Depends on:
Blocks: 569285 570132
  Show dependency tree
 
Reported: 2020-10-01 10:53 EDT by Gunnar Wagenknecht CLA
Modified: 2022-02-09 05:30 EST (History)
11 users (show)

See Also:


Attachments
Patch with marker interface (53.37 KB, application/octet-stream)
2020-12-07 16:16 EST, Gayan Perera CLA
no flags Details
Type Hierarchy duplicates (79.54 KB, image/png)
2021-01-07 13:52 EST, Wim Jongman CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gunnar Wagenknecht CLA 2020-10-01 10:53:39 EDT
See https://bugs.eclipse.org/bugs/show_bug.cgi?id=481323#c21
Comment 1 Julian Honnen CLA 2020-10-02 06:44:51 EDT
replies to comments from bug 481323:
(In reply to Gunnar Wagenknecht from comment #29)
> -Djdt.codeCompleteSubstringMatch=false as this was the only option specified
> in the commit.
> 
> Preferences are the defaults.
Subword completion is enabled by default. To get the previous behavior you need to both disable substring completion via vmarg and subword completion via preferences.

> Not sure this is a Java 11+ issue.
> 
> It's not that workspace size that matters. It's the classpath of the single
> project where the completion happens.
> 
> These stats are covered in the analysis:
> -> pkg roots size: 5862
> -> pkgs size: 56835
Makes sense, I can't find any project with that magnitude in our workspace. Largest I found so far are ~150 package roots with ~500ms time spent in findTypes for the pattern "e".

(In reply to Gunnar Wagenknecht from comment #30)
> In this case it's PatternSearchJob. It has to loop over >5000 index files
> (there is one per jar and the majority of pkg roots are jars).
> 
> Adding only 0.5ms (average) to its search operation increases #findTypes and
> others by 2500ms. O(N) issue. It's not that bad but it is to blame.
I'll check if that constant factor can be improved.

But I would be surprised if there's an optimization that would bring that time down by 10x into the realm of responsive UI. Your N is simply too large.

Maybe we should make substring/subword completion for types optional with a separate preference.

> One other observation: Looking through the commit I cannot see how
> 'jdt.codeCompleteSubstringMatch' would have any impact on the search flags.
> Perhaps that's the issue?
The flag is evaluated in AssistOptions:242
Comment 2 Gunnar Wagenknecht CLA 2020-10-02 07:44:45 EDT
(In reply to Julian Honnen from comment #1)
> > Preferences are the defaults.
> Subword completion is enabled by default. To get the previous behavior you
> need to both disable substring completion via vmarg and subword completion
> via preferences.

Found it. Disabling the preference in combination with the flag is restoring performance.

How we do we continue from here?

I don't think a full revert is necessary. However, I think we need to improve the solution. 

What is the difference between subword and substring completion?
Can we disable both with a single system property?

I'm a bit confused why one is a system property and the other a preference.
Comment 3 Julian Honnen CLA 2020-10-02 08:21:21 EDT
(In reply to Gunnar Wagenknecht from comment #2)
> What is the difference between subword and substring completion?
> Can we disable both with a single system property?
> 
> I'm a bit confused why one is a system property and the other a preference.
Substring matches are exactly that: name.contains(pattern) - but case insensitive.
Subword matching is more advanced, e.g. "linkedmap" matches "LinkedHashMap".

We decided in bug 561709 that there's no reason to disable the basic substring matches (that's in line with other IDEs). The preference was therefore removed, the system property was only added as a fallback, like your regression.

Subword matches can be configured by the user.

(In reply to Gunnar Wagenknecht from comment #2)
> How we do we continue from here?
> 
> I don't think a full revert is necessary. However, I think we need to
> improve the solution. 
In my samples, most of the time is spent reading index files here:

org.eclipse.jdt.internal.core.SearchableEnvironment.findTypes ()	482 ms (68,3 %)	482 ms (68,3 %)
...
org.eclipse.jdt.core.search.SearchPattern.findIndexMatches ()	482 ms (68,3 %)	482 ms (68,3 %)
org.eclipse.jdt.internal.core.index.EntryResult.getDocumentNames ()	274 ms (38,9 %)	274 ms (38,9 %)
org.eclipse.jdt.internal.core.index.DiskIndex.readDocumentName ()	274 ms (38,9 %)	274 ms (38,9 %)

I.e. it scales with the number of matches, not directly with the number of package roots.
Can you confirm that?

That means we could mitigate it by disabling substring/word matches for short patterns (e.g. minimum of 3 chars).
Typing after invoking content assist would only filter the initial result though. 
-> typing "ma", invoke content assist and type "p" would not show HashMap.

Going further, it might be possible to parallelize the search loop in PatternSearchJob::execute.
Comment 4 Gunnar Wagenknecht CLA 2020-10-02 08:37:41 EDT
(In reply to Julian Honnen from comment #3)

Thanks for the explanation!

> > I don't think a full revert is necessary. However, I think we need to
> > improve the solution. 
> In my samples, most of the time is spent reading index files here:

..

> I.e. it scales with the number of matches, not directly with the number of
> package roots.
> Can you confirm that?

I haven't check the number of matches. I only noticed that it has to loop/scan over 5000+ index files in my case. There is one index per package root (jar).

> Going further, it might be possible to parallelize the search loop in
> PatternSearchJob::execute.

That sounds more promising than reducing the results. This would directly benefit Type Hierarchy as well.

I think I'm good converting this bug into an action for parallelizing PatternSearchJob::execute. Very likely this needs to be configurable (# of parallel read threads) to not cause too much stress on system disc.
Comment 5 Gunnar Wagenknecht CLA 2020-10-02 08:38:59 EDT
(In reply to Gunnar Wagenknecht from comment #4)
> > Going further, it might be possible to parallelize the search loop in
> > PatternSearchJob::execute.
> 
> That sounds more promising than reducing the results. This would directly
> benefit Type Hierarchy as well.

Disclaimer: I don't know top of my head what Type Hierarchy uses. But it also has to do a lot of loops on the index in our case.
Comment 6 Julian Honnen CLA 2020-10-02 08:44:47 EDT
(In reply to Gunnar Wagenknecht from comment #4)
> I haven't check the number of matches. I only noticed that it has to
> loop/scan over 5000+ index files in my case. There is one index per package
> root (jar).
Can you record a CPU sample with visualvm? Make sure to set the sampling frequency to 20ms.

> That sounds more promising than reducing the results. This would directly
> benefit Type Hierarchy as well.
Type hierarchy does indeed use PatternSearchJob.

> I think I'm good converting this bug into an action for parallelizing
> PatternSearchJob::execute. Very likely this needs to be configurable (# of
> parallel read threads) to not cause too much stress on system disc.
I will play around with that, if it looks promising, I'll rename this bug.
Comment 7 Eclipse Genie CLA 2020-10-07 07:45:55 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/170431
Comment 8 Manoj N Palat CLA 2020-11-18 05:39:18 EST
bulk move out of 4.18
Comment 9 Gayan Perera CLA 2020-11-22 14:02:53 EST
@Julian and others what do you think about this fix ? https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/172632
Comment 10 Andrey Loskutov CLA 2020-11-27 14:58:33 EST
*** Bug 569026 has been marked as a duplicate of this bug. ***
Comment 11 Gayan Perera CLA 2020-11-27 15:05:36 EST
@Andrey i found some issues in both implementations, mine and julian's. The problem is that most of the IJavaSearchScope s are not threadsafe. Because of that some of the scoping fails. I found this while running and debugging the unit tests for search. 

Option1: is to make all SearchScopes thread-safe. But that doesn't grantees implementations outside Eclipse.

Option2: is to decorate the searchscope with a synchronized wrappers, But we do have some instanceof checks which will cause problems. But for those we could change those places to actually make the decision based on decoratee.

Option3: is to make sure all access to searchscope operations are synchronized inside SearchPattern implementations. Which introduce more complexity since we need to make sure we use one lock object throughout all pattern instances.

I prefer the Option2 since it make the API backward compatible and also solves the problem with less changes and complexity than option3.

But i did gave a try with Option2 and looking more closer i see that we have more methods on subtypes which are invoked while searching the indexes.

So i think Option3 would be more safer.

WDYT ?
Comment 12 Gayan Perera CLA 2020-11-28 13:59:34 EST
I submitted a my gerrit here with option3. I do see that we are expecting a order of found results in org.eclipse.jdt.core.tests.model.JavaSearchBugsTests.testBug345807()

It does make sense to have a higher order for source matches. But do we really need to do that on index level, isn't it more or less part of the sorting of the found results ?
Comment 13 Gayan Perera CLA 2020-12-02 16:11:00 EST
Please check my latest gerrit, it handles
- Synchronized access to SearchScope
- Simplified parallel execution based on streams
- OperationCancelException handling when using streams

I still thinking how to preserve the current contract of giving binary results after the source results which is tested with org.eclipse.jdt.core.tests.model.JavaSearchBugsTests.testBug345807()

Ideas are really appreciated and reviews on my patch is really appreciated. My expectation is to finish this quickly and save some time for all Eclipse Java Developers.
Comment 14 Gayan Perera CLA 2020-12-05 13:47:14 EST
@Andrey @Julian i started looking at the marker interface usage. One problem with that is there if there are other production like groovy or scalar which uses internal implementation and extending them. Can we ignore those because internals are not meant to maintain the contracts over time ?


I also found out the we need to make sure the search patterns are not thread safe as well. So we need to mark them too. WDYT ?
Comment 15 Gayan Perera CLA 2020-12-05 14:44:56 EST
At the same time i was thinking of checking object.getClass().getInterfaces() to see if the IParallelize is available, so then we will safe, because being a super type thread safe doesn't mean the subtype will be unless it is also implemented as such.

Also i thought of using ReentarantReadWrite locks where locks are needed, such as in scopes. With this we could reduce the thread waiting times on read only operations.

WDYT ?
Comment 16 Gayan Perera CLA 2020-12-05 14:47:33 EST
The downside of object.getClass().getInterfaces() you need to implement the interface on all classes, its not something the java interfaces are made for the way i'm trying to use them.

How about using a annotation instead of the Marker interface ?
Comment 17 Gayan Perera CLA 2020-12-06 10:45:19 EST
Also what do you think about cloning the scope ? We could introduce a extension for clonable support of the scope and make clone for each thread. That will reduce the complexity of concurrency. Because some scopes refresh the scope and also update them self while the search is running, I do see a risk of getting wrong results based on thread scheduling.
Comment 18 Gunnar Wagenknecht CLA 2020-12-06 11:33:16 EST
I suggest that you create some reliable and repeatable benchmarks around this code area. https://github.com/openjdk/jmh

Then you can play with various ideas and identify the ones that are not regressing any other scenario.
Comment 19 Gayan Perera CLA 2020-12-06 14:47:13 EST
(In reply to Gunnar Wagenknecht from comment #18)
> I suggest that you create some reliable and repeatable benchmarks around
> this code area. https://github.com/openjdk/jmh
> 
> Then you can play with various ideas and identify the ones that are not
> regressing any other scenario.

Thats a good idea, but the problem right now is using java locks like read/write locks in HierarchyScope doesn't seems to behave well, because time to time some unit tests get different results, which seems be due to scope refreshs which cause internal data to change. 

seems like when we are working on a index with a scope, we need to keep that scope unchanged by other threads, the onlyway of doing that while keeping the parallelism is to copy the scope and handover the different threads.
Comment 20 Julian Honnen CLA 2020-12-07 03:21:31 EST
(In reply to Gayan Perera from comment #14)
> @Andrey @Julian i started looking at the marker interface usage. One problem
> with that is there if there are other production like groovy or scalar which
> uses internal implementation and extending them. Can we ignore those because
> internals are not meant to maintain the contracts over time ?
Yes. If those projects are using internal classes, it's their responsibility to adapt.

> I also found out the we need to make sure the search patterns are not thread
> safe as well. So we need to mark them too. WDYT ?
Which patterns are not thread safe?

(In reply to Gayan Perera from comment #17)
> Also what do you think about cloning the scope ? We could introduce a
> extension for clonable support of the scope and make clone for each thread.
> That will reduce the complexity of concurrency. Because some scopes refresh
> the scope and also update them self while the search is running, I do see a
> risk of getting wrong results based on thread scheduling.
The scopes are only modified by java model changes. That update mechanism is probably not even thread safe in the current single threaded scenario.

An alternative to copying the scope might be to delay the processDelta call in DeltaProcessor until all current searches are complete.
That would all JDT scope safe for parallel searches AFAICT.
Comment 21 Andrey Loskutov CLA 2020-12-07 05:11:48 EST
One interesting concern was raised on the patch: looks like clients expect that search results are supposed to be in a *specific* order, see for example bug 348507 and related commit 22d76cb44cc4a2aa3afa5a732ef27b4f46c352c3 that added JavaSearchBugsTests.testBug345807(). This test is failing now with parallel search, which brings us to the question how we want handle the changed search result order.

I'm honestly not the indexing expert, so from naive expectation I've assumed the search engine is a black box that returns some results and it is on clients to decide how to process those. Obviously that's not the case.

I don't see however Javadoc saying anywhere that the search results are supposed to be reported in some particular order, contrary, org.eclipse.jdt.core.search.SearchRequestor says "The order of the search results is unspecified and may vary from request to request; when displaying results, clients should not rely on the order but should instead arrange the results in an order that would be more meaningful to the user.".

So we see that while the API told clients that "order is not specified", clients simply ignored that (may be they were unaware about it) and used the fact that the search was sequential and *had* a *specific* order. The whole patch for bug 348507 was based on this implicit assumption.

The concrete code for bug 348507 triggers the search via  org.eclipse.jdt.internal.debug.ui.actions.OpenTypeAction.findUniqueTypeInWorkspace() and stops immediately after the first match (because with the sequential order that is the "right" match).

So from client point of view we don't want to continue the search once we've found the "right" match, but from parallel search point of view we have no idea which search index is the "right" one. For this particular client it was beneficial that the search was done in some particular order, because otherwise it would need to wait for all search results...

This is just one example we've stumbled upon because the good guys added a regression tests to bug 348507. So how many clients are there that alsy rely on (implicit) sequential search results order? No idea, but I guess a lot, and not only in SDK.

So if we would unconditionally change the search result order, we will break those clients.

We can't break existing clients => we must make sure we run parallel search only if we work with "good clients" (that either don't care about order or know how to sort results by themselves). So it looks like all that goes back again to the JavaSearchParticipant (that extends SearchParticipant) that is used to accept matches - it should be able to signal that parallel search is allowed or not.

In the concrete case we have client that works with org.eclipse.jdt.core.search.SearchEngine API method org.eclipse.jdt.core.search.SearchEngine.searchAllTypeNames() that must keep the sequential search order, but I assume that for "good" clients we can provide another version of this API method that can run search in parallel.

The next question arises here - what should all *other* methods in SearchEngine do? Also provide a second variant with "parallel" flag?

I assume the same applies to all of them - but if we would add new versions of each method to this class, we will end up in an unmaintainable code. 

I have no ready solution here, but it looks like we are going to add new ParalleSearchEngine  API with only few methods that could be used by "parallel aware" clients (which could be identified either by introducing IParallelizable interface or a new method to SearchParticipant)?

Thoughts?
Comment 22 Gayan Perera CLA 2020-12-07 06:43:10 EST
(In reply to Andrey Loskutov from comment #21)
> 
> I have no ready solution here, but it looks like we are going to add new
> ParalleSearchEngine  API with only few methods that could be used by
> "parallel aware" clients (which could be identified either by introducing
> IParallelizable interface or a new method to SearchParticipant)?
> 
> Thoughts?

I think it will do the minimum damage to client who depends on this order. But having a separate engine means we need to duplicate the same tests other than the test which expect order to this new search engine right ? That nearly 500+ test cases. 

What do you think ?
Comment 23 Julian Honnen CLA 2020-12-07 07:07:14 EST
(In reply to Andrey Loskutov from comment #21)
> So from client point of view we don't want to continue the search once we've
> found the "right" match, but from parallel search point of view we have no
> idea which search index is the "right" one. For this particular client it
> was beneficial that the search was done in some particular order, because
> otherwise it would need to wait for all search results...
We could do both: parallel execution while strictly maintaining the index order.
The parallel search threads would just record their results and calling thread would notify the requestor in order.

The requestor would not even have to be thread safe for this.
Comment 24 Andrey Loskutov CLA 2020-12-07 07:15:51 EST
(In reply to Julian Honnen from comment #23)
> (In reply to Andrey Loskutov from comment #21)
> > So from client point of view we don't want to continue the search once we've
> > found the "right" match, but from parallel search point of view we have no
> > idea which search index is the "right" one. For this particular client it
> > was beneficial that the search was done in some particular order, because
> > otherwise it would need to wait for all search results...
> We could do both: parallel execution while strictly maintaining the index
> order.
> The parallel search threads would just record their results and calling
> thread would notify the requestor in order.
> 
> The requestor would not even have to be thread safe for this.

This would mean however, that clients that wanted to have the first result only (like OpenTypeAction) would now need to wait until all results arrive => regression.
Comment 25 Julian Honnen CLA 2020-12-07 07:18:34 EST
(In reply to Andrey Loskutov from comment #24)
> This would mean however, that clients that wanted to have the first result
> only (like OpenTypeAction) would now need to wait until all results arrive
> => regression.
Not necessarily. We could cancel all remaining tasks once the requestor throws an OperationCanceledException.
That would certainly do more work, but the wall clock time should stay more or less the same.
Comment 26 Andrey Loskutov CLA 2020-12-07 07:40:35 EST
(In reply to Julian Honnen from comment #25)
> (In reply to Andrey Loskutov from comment #24)
> > This would mean however, that clients that wanted to have the first result
> > only (like OpenTypeAction) would now need to wait until all results arrive
> > => regression.
> Not necessarily. We could cancel all remaining tasks once the requestor
> throws an OperationCanceledException.
> That would certainly do more work, but the wall clock time should stay more
> or less the same.

No, it won't work this way. The current first result is already "right one", so OpenTypeAction can cancel immediately. But in the parallel world, the first result is not the "right one", it is just some random one, so in order to guarantee the search results order to clients we have to wait for *all* results so we can sort them and call org.eclipse.jdt.internal.core.search.IndexQueryRequestor.acceptIndexMatch() in right order.
Comment 27 Julian Honnen CLA 2020-12-07 07:48:10 EST
(In reply to Andrey Loskutov from comment #26)
> But in the parallel world, the
> first result is not the "right one", it is just some random one, so in order
> to guarantee the search results order to clients we have to wait for *all*
> results so we can sort them and call
> org.eclipse.jdt.internal.core.search.IndexQueryRequestor.acceptIndexMatch()
> in right order.

Why? We would schedule a parallel task for each Index which returns a Future<List<Result>>. In the same order we can then await the futures and pass them to the requestor.

Once the task for the first index completes (and contains an acceptable result), all others can be canceled.
Comment 28 Andrey Loskutov CLA 2020-12-07 08:08:09 EST
(In reply to Julian Honnen from comment #27)
> (In reply to Andrey Loskutov from comment #26)
> > But in the parallel world, the
> > first result is not the "right one", it is just some random one, so in order
> > to guarantee the search results order to clients we have to wait for *all*
> > results so we can sort them and call
> > org.eclipse.jdt.internal.core.search.IndexQueryRequestor.acceptIndexMatch()
> > in right order.
> 
> Why? We would schedule a parallel task for each Index which returns a
> Future<List<Result>>. In the same order we can then await the futures and
> pass them to the requestor.
> 
> Once the task for the first index completes (and contains an acceptable
> result), all others can be canceled.

OK, I've got what you mean. That should work.
Comment 29 Julian Honnen CLA 2020-12-07 10:43:24 EST
I pushed a new version of Gayan's patch based on this approach.
Comment 30 Gayan Perera CLA 2020-12-07 14:18:03 EST
(In reply to Julian Honnen from comment #29)
> I pushed a new version of Gayan's patch based on this approach.
Did you only added the api_filters or you did some more changes ? 

Should we work on the current patch or should be go with the IParallelizable Markup interface and may be preserve the order of the execution ?

Also if we decide so we may need to work on the delata processing as well.

WDYT ?

The AndPatterns has a issue with parallel search. Specially in my experimentation with IParallelizable and read/write locks on the scopes. Seems because more threads are reading parallel than sync block implementation.

So i thought to check Patterns as well to make sure they implement IParallelizable.
Comment 31 Gayan Perera CLA 2020-12-07 16:16:44 EST
Created attachment 284986 [details]
Patch with marker interface

@Julian and @Andrey following is a working in progress patch of new implementation. Please let me know your feedback. I will try to rewrite the stream usage with Futures to enable ordering.
Comment 32 Julian Honnen CLA 2020-12-08 02:10:00 EST
(In reply to Gayan Perera from comment #30)
> (In reply to Julian Honnen from comment #29)
> > I pushed a new version of Gayan's patch based on this approach.
> Did you only added the api_filters or you did some more changes ? 
The consistent ordering we discussed above, please check the gerrit.

> The AndPatterns has a issue with parallel search. Specially in my
> experimentation with IParallelizable and read/write locks on the scopes.
> Seems because more threads are reading parallel than sync block
> implementation.
The pattern iteration in IntersectingPattern is not threadsafe and would need to change in a follow up bug.

> So i thought to check Patterns as well to make sure they implement
> IParallelizable.
Looks like that's necessary. But a marker interface is not enough. AndPattern should only be parallelizable iff itself and its contained patterns are parallelizable. --> we need a supportsParallelSearch() or similar.
Comment 33 Gayan Perera CLA 2020-12-08 05:04:03 EST
(In reply to Julian Honnen from comment #32)
> > So i thought to check Patterns as well to make sure they implement
> > IParallelizable.
> Looks like that's necessary. But a marker interface is not enough.
> AndPattern should only be parallelizable iff itself and its contained
> patterns are parallelizable. --> we need a supportsParallelSearch() or
> similar.

I thought about it as well. But since we are reluctant on changing api i still didn’t went that far implementing it. But since this And and Or patterns are internal we could use a internal extension to check if this composite patterns and parallel safe, wdyt ?
Comment 34 Julian Honnen CLA 2020-12-08 05:11:13 EST
(In reply to Gayan Perera from comment #33)
> I thought about it as well. But since we are reluctant on changing api i
> still didn’t went that far implementing it. But since this And and Or
> patterns are internal we could use a internal extension to check if this
> composite patterns and parallel safe, wdyt ?
The marker interface is already an API change. Non-JDT patterns might need that method as well.
Comment 35 Gayan Perera CLA 2020-12-08 05:38:33 EST
(In reply to Julian Honnen from comment #34)
> (In reply to Gayan Perera from comment #33)
> > I thought about it as well. But since we are reluctant on changing api i
> > still didn’t went that far implementing it. But since this And and Or
> > patterns are internal we could use a internal extension to check if this
> > composite patterns and parallel safe, wdyt ?
> The marker interface is already an API change. Non-JDT patterns might need
> that method as well.

Not the IParallelizable API. I was thinking about a internal interface with a boolean method which will be implemented to OR and AND patterns. And when i check for parallel search i will check if the pattern is implementating that interface and if so check the method. This way we can easily adapt other composition patterns easily if we add more in future, rather than casting for OR and AND patterns in when checking.
Comment 36 Julian Honnen CLA 2020-12-08 07:08:38 EST
(In reply to Gayan Perera from comment #35)
> Not the IParallelizable API. I was thinking about a internal interface with
> a boolean method which will be implemented to OR and AND patterns. And when
> i check for parallel search i will check if the pattern is implementating
> that interface and if so check the method. This way we can easily adapt
> other composition patterns easily if we add more in future, rather than
> casting for OR and AND patterns in when checking.
IParallelizable needs the method. External projects need the same opt-in mechanism, not just that specific JDT pattern.
Comment 37 Gayan Perera CLA 2020-12-08 07:22:15 EST
(In reply to Julian Honnen from comment #36)
> (In reply to Gayan Perera from comment #35)
> > Not the IParallelizable API. I was thinking about a internal interface with
> > a boolean method which will be implemented to OR and AND patterns. And when
> > i check for parallel search i will check if the pattern is implementating
> > that interface and if so check the method. This way we can easily adapt
> > other composition patterns easily if we add more in future, rather than
> > casting for OR and AND patterns in when checking.
> IParallelizable needs the method. External projects need the same opt-in
> mechanism, not just that specific JDT pattern.

Ok you are saying that we might have composite patterns in external projects which just a marker only cannot satisfy the eligibility like AND and OR patterns. 

Did you check the attached patch ? I can add ordering there and push it for review if thats ok. 

And I didn’t try to look into delaying process delta. Do you think we need to do that ?
Comment 38 Julian Honnen CLA 2020-12-08 07:27:51 EST
(In reply to Gayan Perera from comment #37)
> Ok you are saying that we might have composite patterns in external projects
> which just a marker only cannot satisfy the eligibility like AND and OR
> patterns.
Exactly. 
> Did you check the attached patch ? I can add ordering there and push it for
> review if thats ok. 
No, please push all patches to gerrit. Or in this case: modify your existing one (make sure to pull my latest changeset first).

> And I didn’t try to look into delaying process delta. Do you think we need
> to do that ?
I had another look, the processDelta is not the issue but the lazy initialization of HierarchyScope.
Comment 39 Gayan Perera CLA 2020-12-10 16:36:18 EST
I incorporate Julians patch with my Parallelizable implementation. As soon as the IndexqueryRequester is not using a single synchronized instance it starts to give wrong results for many tests :(.
Comment 40 Gayan Perera CLA 2020-12-11 03:43:19 EST
I think i found the issue. It seems the patterns cannot be consumed parallel since some of instance fields are common to all threads. With the previous implementation a pattern is only invoked once because of synchronization. So now we need to make the patterns to be used by multiple threads by making the fields involved in data processing to be thread local
Comment 41 Julian Honnen CLA 2020-12-11 11:06:15 EST
(In reply to Gayan Perera from comment #40)
> I think i found the issue. It seems the patterns cannot be consumed parallel
> since some of instance fields are common to all threads. With the previous
> implementation a pattern is only invoked once because of synchronization. So
> now we need to make the patterns to be used by multiple threads by making
> the fields involved in data processing to be thread local
Any pattern that has mutable state should declare supportsParallelSearch=false for now. Then we can make them thread safe in followup bugs.
Comment 42 Julian Honnen CLA 2020-12-11 11:09:23 EST
BTW: the current patchset #13 without the synchronized requestor has no test failures.
Comment 43 Gayan Perera CLA 2020-12-11 14:02:00 EST
(In reply to Julian Honnen from comment #41)
> (In reply to Gayan Perera from comment #40)
> > I think i found the issue. It seems the patterns cannot be consumed parallel
> > since some of instance fields are common to all threads. With the previous
> > implementation a pattern is only invoked once because of synchronization. So
> > now we need to make the patterns to be used by multiple threads by making
> > the fields involved in data processing to be thread local
> Any pattern that has mutable state should declare
> supportsParallelSearch=false for now. Then we can make them thread safe in
> followup bugs.

If we do that, then there will be not much we will gain from this work. I think i can make the patterns thread safe, at least the ones which are used for Type Hierarchy and SubType.

The current patchset it working because of the scope lock. Which blocks the Patterns from getting parallel executed.
Comment 44 Eclipse Genie CLA 2020-12-13 06:44:54 EST
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/173731
Comment 45 Gayan Perera CLA 2020-12-13 14:45:50 EST
@Julian @Andrey I have uploaded the latest patch with support for preserving the order as well. Please check and let me know your feedback.
Comment 46 Eclipse Genie CLA 2020-12-30 10:51:47 EST
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.ui/+/174149
Comment 49 Eclipse Genie CLA 2021-01-04 04:04:35 EST
New Gerrit change created: https://git.eclipse.org/r/c/www.eclipse.org/eclipse/news/+/174208
Comment 50 Andrey Loskutov CLA 2021-01-04 04:17:21 EST
The patch is merged and parallel search is enabled by default, so unless we will find some last minute major issue, it will be in 4.19 M1.

New preference option was added and enabled by default: Preferences > Java > [x] Enable parallel index search. Depending on available hardware, this option should improve performance for all index based Java search operations, but could also lead to possible regressions. To switch back to the old sequential index search, turn this option off.

@Gunnar / Julian / Gayan: I see that different search operations are faster now, haven't however reproduced original problem with 5000 index files, so it would be nice if you could double check if the original issue is resolved too.

Resolving for now, if you see any issues with new search, feel free to reopen this bug.
Comment 52 Sarika Sinha CLA 2021-01-04 04:56:10 EST
The subject line and "regression" keyword needs to be re looked for this bug. 

This has been a good work and substring completion just unearthed the existing issues.
Comment 53 Gayan Perera CLA 2021-01-04 06:18:29 EST
(In reply to Andrey Loskutov from comment #50)
> The patch is merged and parallel search is enabled by default, so unless we
> will find some last minute major issue, it will be in 4.19 M1.
> 
> New preference option was added and enabled by default: Preferences > Java >
> [x] Enable parallel index search. Depending on available hardware, this
> option should improve performance for all index based Java search
> operations, but could also lead to possible regressions. To switch back to
> the old sequential index search, turn this option off.
> 
> @Gunnar / Julian / Gayan: I see that different search operations are faster
> now, haven't however reproduced original problem with 5000 index files, so
> it would be nice if you could double check if the original issue is resolved
> too.
> 
> Resolving for now, if you see any issues with new search, feel free to
> reopen this bug.

I checked the my large project and it seems its better now, but still its slow when compared to IJ. I think now the problem is in the hierarchy builder which tries to perform the same index search with types that are found in current search iteration. But i think we will gain nothing by trying to make that parallel as well. So we need to find a way to limit the search in a meaning full manner. 

Because for example in mu project i have a interface which is implemented by 20 classes. When try to see all implementation of this interface using TH, the time i need to wait until i get 20 classes is too much.
Comment 54 Lars Vogel CLA 2021-01-06 08:01:10 EST
We should also add new key words to the search for this setting to make it easier for the user to find this setting. Opened Bug 570132 for that.
Comment 55 Lars Vogel CLA 2021-01-06 08:12:06 EST
Thanks to Julian, Gayan and Andrey for this work, JDT search feels much faster to me in my "all eclipse top-level projects" workspace.
Comment 56 Vikas Chandra CLA 2021-01-07 02:24:26 EST
verified that new option is present and enabled by default.
Comment 57 Vikas Chandra CLA 2021-01-07 03:17:38 EST
verified on
Version: 2021-03 (4.19)
Build id: I20210102-1800
Comment 58 Wim Jongman CLA 2021-01-07 13:52:55 EST
Created attachment 285224 [details]
Type Hierarchy duplicates

I see type hierarchy duplicates with the new indexer.
Comment 59 Andrey Loskutov CLA 2021-01-07 14:40:21 EST
(In reply to Wim Jongman from comment #58)
> Created attachment 285224 [details]
> Type Hierarchy duplicates
> 
> I see type hierarchy duplicates with the new indexer.

I assume you have tried with non-parallel search too, and it was OK?

Please provide steps to reproduce, I can't reproduce it with org.eclipse.jface.dialogs.InputDialog & I20210106-1800.

My steps: open org.eclipse.jface.dialogs.InputDialog in the editor, right click and say "Open Type Hierarchy". Another way: "Ctrol+3" and say "Open type in Hierarchy..". Both ways I see only single org.eclipse.jface.window.Window type.
Comment 60 Wim Jongman CLA 2021-01-07 16:24:52 EST
(In reply to Andrey Loskutov from comment #59)

It's the same in both indexes. 

After some investigation I found it was caused by the PDE pref setting "include all plugins from target ..."