Bug 570078 - Open Type Hierarchy is slow and cause UI Freeze for large projects
Summary: Open Type Hierarchy is slow and cause UI Freeze for large projects
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.8.2   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: 4.21 M2   Edit
Assignee: Gayan Perera CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy
: 570395 574817 (view as bug list)
Depends on: 574171 574172 574845
Blocks:
  Show dependency tree
 
Reported: 2021-01-05 05:38 EST by Gayan Perera CLA
Modified: 2022-03-10 09:51 EST (History)
12 users (show)

See Also:


Attachments
single type hierarchy creation with patch set 1 (11.29 KB, application/octet-stream)
2021-01-11 03:52 EST, Julian Honnen CLA
no flags Details
same type hierarchy without patch (9.12 KB, application/octet-stream)
2021-01-11 05:59 EST, Julian Honnen CLA
no flags Details
patchset 2 (10.57 KB, application/octet-stream)
2021-01-12 02:22 EST, Julian Honnen CLA
no flags Details
patch set 3 (9.00 KB, application/octet-stream)
2021-01-13 02:28 EST, Julian Honnen CLA
no flags Details
patch3 withfix cold eclipse (57.08 KB, application/octet-stream)
2021-01-13 16:34 EST, Gayan Perera CLA
no flags Details
Patch3 withfix warm eclipse (33.20 KB, application/octet-stream)
2021-01-13 16:35 EST, Gayan Perera CLA
no flags Details
Patch3 withoutfix cold eclipse (59.09 KB, application/octet-stream)
2021-01-13 16:35 EST, Gayan Perera CLA
no flags Details
Patch3 withoutfix warm eclipse (35.04 KB, application/octet-stream)
2021-01-13 16:36 EST, Gayan Perera CLA
no flags Details
IJ cold TH (155.78 KB, application/octet-stream)
2021-01-13 16:42 EST, Gayan Perera CLA
no flags Details
IJ cold TH (155.78 KB, application/octet-stream)
2021-01-13 16:43 EST, Gayan Perera CLA
no flags Details
IJ warm TH (69.24 KB, application/octet-stream)
2021-01-13 16:44 EST, Gayan Perera CLA
no flags Details
out.txt (888.47 KB, text/plain)
2021-02-11 11:55 EST, Gayan Perera CLA
no flags Details
Large workspace snapshot (79.57 KB, application/octet-stream)
2021-02-11 11:56 EST, Gayan Perera CLA
no flags Details
hierarchy change test (1.13 KB, text/plain)
2021-03-15 05:32 EDT, Julian Honnen CLA
no flags Details
failing test case for hierarchy change (1.30 KB, text/plain)
2021-03-15 07:48 EDT, Julian Honnen CLA
no flags Details
Stat Scrips Python (1.54 KB, application/zip)
2021-05-01 14:52 EDT, Gayan Perera CLA
no flags Details
pde deadlock (141.46 KB, text/plain)
2021-06-12 15:44 EDT, Andrey Loskutov CLA
no flags Details
navigator deadlock (91.82 KB, text/plain)
2021-06-12 15:44 EDT, Andrey Loskutov CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gayan Perera CLA 2021-01-05 05:38:26 EST
Even with the recent parallel search enabled the Type Hierarchy is still slow for projects which has lot of dependencies. The dependencies include all transitive dependencies as well since the TH search runs through all indexes for all dependencies.

Even though the result is less like 20 subtypes, it take lot of time since the TH builder will run through all the indexes every time it finds a subtype to expand it.

Here I would like to suggest that we try to optimize this operation either by making the search asynchronous or do the expansion at n + 1 level and repeat it when user expand that n + 1 level.

If you have more better ideas please let discuss.
Comment 1 Eclipse Genie CLA 2021-01-09 14:49:19 EST
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/174531
Comment 2 Gayan Perera CLA 2021-01-09 17:06:50 EST
The current implementation is 2x faster from the tests i did.
Comment 3 Stephan Herrmann CLA 2021-01-10 07:58:59 EST
(In reply to Gayan Perera from comment #0)
> Here I would like to suggest that we try to optimize this operation either
> by making the search asynchronous or do the expansion at n + 1 level and
> repeat it when user expand that n + 1 level.

regarding the incremental approach: do you have an idea how to still fulfill the contract of ITypeHierarchy.getAllSubtypes() (which explicitly mentions "direct and indirect")?

Do you plan to make this a new option of TypeHierarchy and CreateTypeHierarchyOperation?
Comment 4 Gayan Perera CLA 2021-01-10 11:35:04 EST
(In reply to Stephan Herrmann from comment #3)
> (In reply to Gayan Perera from comment #0)
> > Here I would like to suggest that we try to optimize this operation either
> > by making the search asynchronous or do the expansion at n + 1 level and
> > repeat it when user expand that n + 1 level.
> 
> regarding the incremental approach: do you have an idea how to still fulfill
> the contract of ITypeHierarchy.getAllSubtypes() (which explicitly mentions
> "direct and indirect")?
> 
> Do you plan to make this a new option of TypeHierarchy and
> CreateTypeHierarchyOperation?

Hi Stephan

I took another approach after looking closer at the problem we have right now before going for drastic change in building the Type Hierarchy. Because these methods are using in internal JDT features as well in addition to Type Hierarchy view. So i thought why not optimize the operation while keeping the existing API contract.

I have created a gerrit with the change. The idea was to accumulate the result of the single search iteration and feed it to again to the next search iteration instead of doing search iteration for each result one by one.

Basically the earlier operation was like: given a hierarchy with n number of nodes, there will be n+1 number of search iterations.

With my fix we will have d + 1 search iterations where d is depth of the hierarchy.
Comment 5 Julian Honnen CLA 2021-01-11 03:52:29 EST
Created attachment 285240 [details]
single type hierarchy creation with patch set 1
Comment 6 Gayan Perera CLA 2021-01-11 04:20:15 EST
@Julian can i ask you to attach a snapshot without this fix as well ? And also can you check the quick type hierarchy as well to see whether there is a difference?
Comment 7 Julian Honnen CLA 2021-01-11 05:59:31 EST
Created attachment 285244 [details]
same type hierarchy without patch

quick type hierarchy performs the same as the regular one
Comment 8 Andrey Loskutov CLA 2021-01-11 06:12:18 EST
(In reply to Julian Honnen from comment #7)
> Created attachment 285244 [details]
> same type hierarchy without patch
> 
> quick type hierarchy performs the same as the regular one

Julian, how one can open .nps files? Is the only way to use Netbeans? Can you share something else (png/csv/whatever)?
Comment 9 Gayan Perera CLA 2021-01-11 06:21:00 EST
@Andrey you can use visualvm.
Comment 10 Andrey Loskutov CLA 2021-01-11 06:23:26 EST
(In reply to Gayan Perera from comment #9)
> @Andrey you can use visualvm.

Not really, my OpenJDK from Red Hat doesn't have it :-)
Comment 11 Gayan Perera CLA 2021-01-11 06:35:38 EST
Download from here https://visualvm.github.io/
Comment 12 Gayan Perera CLA 2021-01-11 12:44:36 EST
Found the issue, the problem is with the SubType Index cache, Previously the indexes are startQuery once for the whole search and stoped after finishing, But with this change it is called each iteration since we create a new job. This is getting highlighted when searching for small hierarchies. 

I will look into fixing this.
Comment 13 Gayan Perera CLA 2021-01-11 16:19:38 EST
@Julian and @Andrey i fixed the issue, please check again with your datasets. I profiled the before and after and the after fix measurement is 500ms less. But since those results are effected by instrumentation. Please let me know what you find.
Comment 14 Julian Honnen CLA 2021-01-12 02:22:01 EST
Created attachment 285256 [details]
patchset 2

Better, but on average still a bit slower than before.

What is a "small" hierarchy in your tests?
Comment 15 Gayan Perera CLA 2021-01-12 02:30:50 EST
It was a simple 2 level class hierarchy. But i think the problem you might is due to the wrong focus type update on the pattern. From the test failures i see that we are processing lot of classes which might be the reason for slowness you see.
Comment 16 Gayan Perera CLA 2021-01-12 12:16:36 EST
@Julian can you check with the latest patchset ?
Comment 17 Julian Honnen CLA 2021-01-13 02:28:24 EST
Created attachment 285266 [details]
patch set 3

it's now about as fast as the original code
Comment 18 Gayan Perera CLA 2021-01-13 03:04:45 EST
@Julian can you try it on large project with many dependencies with considerable hierarchy to see if the patchset 3 is efficient than before ?
Comment 19 Julian Honnen CLA 2021-01-13 03:29:11 EST
(In reply to Gayan Perera from comment #18)
> @Julian can you try it on large project with many dependencies with
> considerable hierarchy to see if the patchset 3 is efficient than before ?
All my tests had exactly this scenario.
Comment 20 Gayan Perera CLA 2021-01-13 03:34:12 EST
@Julian you sample rate is set to minimum right which is 20ms ?
Comment 21 Gayan Perera CLA 2021-01-13 05:27:43 EST
@Andrey will you be able to do some tests to see if we have improved the TH build with large projects ?
Comment 22 Andrey Loskutov CLA 2021-01-13 06:18:51 EST
(In reply to Gayan Perera from comment #21)
> @Andrey will you be able to do some tests to see if we have improved the TH
> build with large projects ?

Not soon, we have few major issues that require my time, bug 570238 & bug 569878 :-(
Comment 23 Julian Honnen CLA 2021-01-13 06:39:08 EST
(In reply to Gayan Perera from comment #20)
> @Julian you sample rate is set to minimum right which is 20ms ?
No, it was sampled with 100ms.

My workspace has 1800 bundles, with >1000 (transitive) dependents on the types I've tested.
I don't know in which cases you saw a performance improvement, but the new algorithm apparently doesn't scale any better than the old one.
Comment 24 Gayan Perera CLA 2021-01-13 09:01:08 EST
(In reply to Julian Honnen from comment #23)
> (In reply to Gayan Perera from comment #20)
> > @Julian you sample rate is set to minimum right which is 20ms ?
> No, it was sampled with 100ms.
> 
> My workspace has 1800 bundles, with >1000 (transitive) dependents on the
> types I've tested.
> I don't know in which cases you saw a performance improvement, but the new
> algorithm apparently doesn't scale any better than the old one.

I will try to upload my sample and profile snapshots. I think i need to run both types on my spring framework workspace and see.
Comment 25 Gayan Perera CLA 2021-01-13 12:11:01 EST
@Julian you are correct there is not significant improvement over the new approach. Seems like the initial improvement i saw must be due to flaw in my sampling.
Comment 26 Gayan Perera CLA 2021-01-13 14:45:26 EST
I will try if to see if there any other way to improve the search. But does anyone has any suggestion on improving the type hierarchy is different way ? At least the Quick Type Hierarchy. I assume the Quick should be Quick, may be just show the TH from current project scope or only show the first level of subtypes and may be have a toggle to expand it with a preference to set the default toggle mode.

WDYT ?
Comment 27 Gayan Perera CLA 2021-01-13 16:34:57 EST
Created attachment 285275 [details]
patch3 withfix cold eclipse
Comment 28 Gayan Perera CLA 2021-01-13 16:35:23 EST
Created attachment 285276 [details]
Patch3 withfix warm eclipse
Comment 29 Gayan Perera CLA 2021-01-13 16:35:49 EST
Created attachment 285277 [details]
Patch3 withoutfix cold eclipse
Comment 30 Gayan Perera CLA 2021-01-13 16:36:26 EST
Created attachment 285278 [details]
Patch3 withoutfix warm eclipse
Comment 31 Gayan Perera CLA 2021-01-13 16:38:04 EST
@Julian please look at my new snapshots, earlier i was testing on a wrong branch. if you look at both warm and cold with and without fix you can see a improvement in the subtype search. I will see whats going on in the buildFromPotentialSubtypes as well.
Comment 32 Gayan Perera CLA 2021-01-13 16:42:04 EST
Created attachment 285279 [details]
IJ cold TH
Comment 33 Gayan Perera CLA 2021-01-13 16:43:11 EST
Created attachment 285280 [details]
IJ cold TH
Comment 34 Gayan Perera CLA 2021-01-13 16:44:07 EST
Created attachment 285281 [details]
IJ warm TH
Comment 35 Gayan Perera CLA 2021-01-13 16:45:10 EST
In IJ snapshots search for com.intellij.ide.hierarchy.type.SubtypesHierarchyTreeStructure.buildChildren ()

You will see with this patch we are really close with the times compare to IJ on the same project and same TH
Comment 36 Julian Honnen CLA 2021-01-14 02:25:54 EST
(In reply to Gayan Perera from comment #25)
> @Julian you are correct there is not significant improvement over the new
> approach. Seems like the initial improvement i saw must be due to flaw in my
> sampling.
Better to measure timings directly (e.g. in TypeHierarchy::compute) rather than look via sampling.

(In reply to Gayan Perera from comment #26)
> I will try if to see if there any other way to improve the search. But does
> anyone has any suggestion on improving the type hierarchy is different way ?
> At least the Quick Type Hierarchy. I assume the Quick should be Quick, may
> be just show the TH from current project scope or only show the first level
> of subtypes and may be have a toggle to expand it with a preference to set
> the default toggle mode.
> 
> WDYT ?
Computing it only for the current project would make it worthless for many projects.
Comment 37 Gayan Perera CLA 2021-01-14 03:32:16 EST
> Computing it only for the current project would make it worthless for many
> projects.

What about the second option. Computing the first level to start with and build the rest on demand with the toggle I explained?

Please check the new samples as well. Hot and cold once. I wonder why you don’t see such improvement in your samples.
Comment 38 Gayan Perera CLA 2021-01-14 03:41:23 EST
> Better to measure timings directly (e.g. in TypeHierarchy::compute) rather
> than look via sampling.

You mean add support to dump the timing in verbose mode ?
Comment 39 Stephan Herrmann CLA 2021-01-14 06:45:42 EST
(In reply to Gayan Perera from comment #37)
> > Computing it only for the current project would make it worthless for many
> > projects.
> 
> What about the second option. Computing the first level to start with and
> build the rest on demand with the toggle I explained?

Providing new API for this purpose is one thing. The other is to think of an intuitive gesture. You are certainly aware of the fact that the Quick Type Hierarchy already has a toggle to switch between super and sub types. What is your proposal for a new gesture?

Also note, that there is a text field to search within the hierarchy. What behavior do you have in mind for this, when the number of levels is constrained?
Comment 40 Gayan Perera CLA 2021-01-14 07:10:56 EST
> Providing new API for this purpose is one thing. The other is to think of an
> intuitive gesture. You are certainly aware of the fact that the Quick Type
> Hierarchy already has a toggle to switch between super and sub types. What
> is your proposal for a new gesture?

It is to get the first level of hierarchy quickly. So that you can switch to full hierarchy if you want to. 


> Also note, that there is a text field to search within the hierarchy. What
> behavior do you have in mind for this, when the number of levels is
> constrained?

Well i do see it as a filter. So even with one level you can 100s of nodes in that level. So this could me use to filter the types you want out of that list.
Comment 41 Gayan Perera CLA 2021-01-14 07:19:16 EST
Also Yes this behavior should be in new (overloaded) API. So the old API should always return the full hierarchy
Comment 42 Stephan Herrmann CLA 2021-01-14 07:50:20 EST
(In reply to Gayan Perera from comment #40)
> > Providing new API for this purpose is one thing. The other is to think of an
> > intuitive gesture. You are certainly aware of the fact that the Quick Type
> > Hierarchy already has a toggle to switch between super and sub types. What
> > is your proposal for a new gesture?
> 
> It is to get the first level of hierarchy quickly. So that you can switch to
> full hierarchy if you want to. 

This doesn't answer the question about a gesture :)
(key/modifier press? button click? menu option? swipe? blink "SOS" with both eyes :) ?)

> > Also note, that there is a text field to search within the hierarchy. What
> > behavior do you have in mind for this, when the number of levels is
> > constrained?
> 
> Well i do see it as a filter. So even with one level you can 100s of nodes
> in that level. So this could me use to filter the types you want out of that
> list.

If name-search operates on an incomplete hierarchy, we need clear visual indication that more hits could be found when searching more levels. IMHO, in this scenario good visual feedback is even more important than when directly navigating the tree. Search is more of a global operation, not typically bound to layers.

Also: should the gesture expand all further levels at once (that would still be expensive, right?), or only one at a time? If the latter, how will users see whether or not more levels can still be expanded?

IMHO, only if all these questions are answered with a good concept, it's worth considering the proposal, because users confused by a complex UI could easily spend more time navigating their way through options than just waiting for the full hierarchy in the first place.

E.g., I don't want the following dialog in a future bug report:
User: Quick Type Hierarchy no longer finds my type, although I know it's there.
Dev: What do you mean by "no longer finds my type"
... several answers later: ...
User: I started to type the name of the type in the search field of the dialog, after 4 characters typed, the list was empty.
Dev: Did you expand all levels of the hierarchy?
... several answers later: ...
User: No.
Dev closes as WORKSFORME (while still having to write a nice explanation to the user).
Comment 43 Gayan Perera CLA 2021-01-14 08:51:22 EST
(In reply to Stephan Herrmann from comment #42)
> (In reply to Gayan Perera from comment #40)
> > > Providing new API for this purpose is one thing. The other is to think of an
> > > intuitive gesture. You are certainly aware of the fact that the Quick Type
> > > Hierarchy already has a toggle to switch between super and sub types. What
> > > is your proposal for a new gesture?
> > 
> > It is to get the first level of hierarchy quickly. So that you can switch to
> > full hierarchy if you want to. 
> 
> This doesn't answer the question about a gesture :)
> (key/modifier press? button click? menu option? swipe? blink "SOS" with both
> eyes :) ?)
> 

Ahh sorry for misunderstanding the question. Well I’m not really good at UX stuff so thats why i just mentioned way of toggling. Since what we is a keybinding there i would say another keybinding for full and shallow hierarchy
Comment 44 Gayan Perera CLA 2021-01-14 12:04:43 EST
I do agree that making a the UI more complex with partially loading the hierarchy can confuse the users. so i would think this as the last option to go for.

Even if we go for that, i would keep it at two step like
1. load the first level 
2. load everything else

But this needs to be verified with majority of developer use case with type hierarchy.
Comment 45 Andrey Loskutov CLA 2021-01-17 14:17:49 EST
*** Bug 570395 has been marked as a duplicate of this bug. ***
Comment 46 Gayan Perera CLA 2021-01-17 14:39:58 EST
I sharing the questionnaire link here as well https://forms.gle/LjyNe2VAKcRvYyfy9
Comment 47 Gayan Perera CLA 2021-02-11 11:55:16 EST
Created attachment 285525 [details]
out.txt

This is a patched jdt version i ran to capture how many entries are read from addQueryResults and how many have been returned.

As you can see in this case there are nearly 15k index reads while building the TH for just 3 classes including the parent.
Comment 48 Gayan Perera CLA 2021-02-11 11:56:21 EST
Created attachment 285526 [details]
Large workspace snapshot
Comment 49 Gayan Perera CLA 2021-02-11 12:31:58 EST
Provided with the data from my workplace workspace, i feel TH build actually run through all dependencies in all projects in the workspace. Is that case ? @Julian and @Andrey
Comment 50 Gayan Perera CLA 2021-02-11 15:25:41 EST
I just checked the code, basically the subtypes i'm looking for is actually only in one jar file, but it nearly search in 1273 jar and projects, this is done recursively. And the number of indexes the IndexSelector gives is 658.

So i think its really good if we can narrow down the indexes which we gonna search. Because in this case i only have the subtypes only in one jar file, which means if i new the index, i only want to search that.

When i say if i knew, i mean if we could locate indexes from package fragment usages,

like 
Index123 = com.google.guava, org.apache.commons

Now this will say that the index123 contains usages of these package fragments which means when i do a subtype search for anything inside com.google.guava i will search Index123 to find more accurate results.

WDYT @Andrey @Lars @Julian about this idea ?
Comment 51 Lars Vogel CLA 2021-02-14 10:21:12 EST
(In reply to Gayan Perera from comment #50)
> WDYT @Andrey @Lars @Julian about this idea ?

Sounds like a cool idea. Please provide a Gerrit so that other can test this approach.
Comment 52 Gayan Perera CLA 2021-03-02 12:39:49 EST
Some early results from the work.

Fix fix: 1162ms
Without fix: 2139ms

This is for a small workspace with 21 index files.

I will try this fix with my 3k index file workspace as well. And once master is open for 4.20 i will submit a patch to get feedback.
Comment 53 Gayan Perera CLA 2021-03-06 14:24:59 EST
Final result after fixing some bugs

Without Fix
Thread[ModalContext,6,main] -> execution time: 4897ms -IndexBasedHierarchyBuilder


With Fix
Thread[ModalContext,6,main] -> execution time: 673ms - IndexBasedHierarchyBuilder


@Andrey is it ok to push the gerrit for reviews right now ? Or should i wait till 4.19 is released and master is opened.
Comment 54 Andrey Loskutov CLA 2021-03-06 17:27:58 EST
(In reply to Gayan Perera from comment #53)
> @Andrey is it ok to push the gerrit for reviews right now ? Or should i wait
> till 4.19 is released and master is opened.

You can push gerrit at any time, it is merge what is not allowed yet.
Comment 55 Eclipse Genie CLA 2021-03-07 05:55:32 EST
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/177321
Comment 56 Stephan Herrmann CLA 2021-03-07 08:23:50 EST
(In reply to Gayan Perera from comment #50)
> So i think its really good if we can narrow down the indexes which we gonna
> search. Because in this case i only have the subtypes only in one jar file,
> which means if i new the index, i only want to search that.
> 
> When i say if i knew, i mean if we could locate indexes from package
> fragment usages,

I'm not an expert in JDT's indexing, but I recall that the indexer tries hard to avoid resolving type references. Therefor, I'm not sure that the indexer will see all relevant package references.

I saw you hook into acceptImport(), which is good, but not all type references necessarily use either qualified name or import. I don't have a counter example at hand, but obviously inheritance or nesting can make a type accessible without ever explicitly referring to its qualified name. So, can you argue that this will not cause your new index to miss necessary entries?

Functional interfaces already created the need to perform resolving, so that gap has already been filled - has it? I.e., do you capture (packages of) functional types of lambdas and method references?

Search encounters many situations, where a word based match via the index is only a potential match until MatchLocator resolved everything and confirms the match. Are all these cases covered by evaluating imports?
Comment 57 Gayan Perera CLA 2021-03-07 09:22:44 EST
(In reply to Stephan Herrmann from comment #56)
Good point Stephan, i think i need to help from everyone who are experienced in JDT to improve this further.

> 
> I saw you hook into acceptImport(), which is good, but not all type
> references necessarily use either qualified name or import. I don't have a
> counter example at hand, but obviously inheritance or nesting can make a
> type accessible without ever explicitly referring to its qualified name. So,
> can you argue that this will not cause your new index to miss necessary
> entries?

When referring a type, either it should be imported, or a start import or it should be referred as a qualified name. What i found out is,
- When processing binary class files, all references are FQNs, such as field types, super classes, super interfaces etc.

- When processing source files, until the CU is resolved we don't know the qualified names. But unless a type from java.lang, all others should be either FQNs or they should be imported. In a nested situation it will be the same. For example if we have class x.y.z.A.B extending x.y.z.A, then that index will added as it contains x.y.z since i collect package name of a CU as a qualified as well. Please let me know if i'm missing some aspect here.


> 
> Functional interfaces already created the need to perform resolving, so that
> gap has already been filled - has it? I.e., do you capture (packages of)
> functional types of lambdas and method references?

- I think i need to add support for lambda's, i will update the code.
 
> Search encounters many situations, where a word based match via the index is
> only a potential match until MatchLocator resolved everything and confirms
> the match. Are all these cases covered by evaluating imports?

The new index will only narrow down the number of indexes that you need to perform the word base search. Based on the focus type package, if package is not present the search will search on all indexes, this is same for java.lang package as well.
But here if i missing something please let me know.
Comment 58 Gayan Perera CLA 2021-03-07 09:37:18 EST
@Stephan the functional types are indexed as well. Added a unit test for lambdas.
Comment 59 Stephan Herrmann CLA 2021-03-07 11:26:47 EST
(In reply to Gayan Perera from comment #57)
> (In reply to Stephan Herrmann from comment #56)
> Good point Stephan, i think i need to help from everyone who are experienced
> in JDT to improve this further.
> 
> > 
> > I saw you hook into acceptImport(), which is good, but not all type
> > references necessarily use either qualified name or import. I don't have a
> > counter example at hand, but obviously inheritance or nesting can make a
> > type accessible without ever explicitly referring to its qualified name. So,
> > can you argue that this will not cause your new index to miss necessary
> > entries?
> 
> When referring a type, either it should be imported, or a start import or it
> should be referred as a qualified name. What i found out is,
> - When processing binary class files, all references are FQNs, such as field
> types, super classes, super interfaces etc.

correct.

> - When processing source files, until the CU is resolved we don't know the
> qualified names. But unless a type from java.lang, all others should be
> either FQNs or they should be imported. In a nested situation it will be the
> same. For example if we have class x.y.z.A.B extending x.y.z.A, then that
> index will added as it contains x.y.z since i collect package name of a CU
> as a qualified as well. Please let me know if i'm missing some aspect here.

Let me try to be mean: 

in package p1:
   class COuter {
        protected class Inner { ... }
        ...
   }
in package p2:
   class Middle extends COuter {}
in package p3:
   class C extends p2.Middle {
        Inner f;
   }

p3.C contains a reference to p1.COuter.Inner but never mentionds package p1.

> > Functional interfaces already created the need to perform resolving, so that
> > gap has already been filled - has it? I.e., do you capture (packages of)
> > functional types of lambdas and method references?
> 
> - I think i need to add support for lambda's, i will update the code.

OK, I hope the way how lambdas are indexed will point to you a solution for this issue.

 
> > Search encounters many situations, where a word based match via the index is
> > only a potential match until MatchLocator resolved everything and confirms
> > the match. Are all these cases covered by evaluating imports?
> 
> The new index will only narrow down the number of indexes that you need to
> perform the word base search. Based on the focus type package, if package is
> not present the search will search on all indexes, this is same for
> java.lang package as well.
> But here if i missing something please let me know.

See above: if all packages are in different indexes then searching for references to p1.COuter.Inner will not know that it needs to search the index for p3 - which would be a bug. Am I missing anything?
Comment 60 Gayan Perera CLA 2021-03-07 12:36:30 EST
@Stephan, i see your point. I guess then we need to make sure that if there are any types (super, superinterfaces, return types, parameter types, field types etc) not available in imports, then we do a CU resolve and use the FQNs for indexing.

Do you see any other way ?
Comment 61 Gayan Perera CLA 2021-03-08 03:59:25 EST
If someone can let me know if my above suggestion is a good way forward i can try to implement it and see how it goes.
Comment 62 Stephan Herrmann CLA 2021-03-08 08:58:06 EST
(In reply to Gayan Perera from comment #60)
> @Stephan, i see your point. I guess then we need to make sure that if there
> are any types (super, superinterfaces, return types, parameter types, field
> types etc) not available in imports, then we do a CU resolve and use the
> FQNs for indexing.
> 
> Do you see any other way ?

I basically agree. Still asking for some clarifications:

How exactly do you plan to determine "available in imports"?
(1) Will wildcard imports make a difference?
(2) Will you detect 'partial imports' like "import a.B" then reference "B.C"?
(3) Could the strategy be tricked, still, due to shadowing? I.e., you believe an import to be used, where in fact it is not?

I think we agree that resolving must be kept at a minimum to avoid performance degradation of the indexing operation.

I assume you have a plan (similar to functional types?) how resolved information can be used for your purposes.


More globally speaking: can we assume the additional meta-index to be purely optional? Asking specifically because of the pre-built-indexes feature.
Comment 63 Gayan Perera CLA 2021-03-08 11:55:01 EST
(In reply to Stephan Herrmann from comment #62)
> 
> I basically agree. Still asking for some clarifications:
> 
> How exactly do you plan to determine "available in imports"?
> (1) Will wildcard imports make a difference?
No we cannot check if a given types comes from a whild card until we resolve.

> (2) Will you detect 'partial imports' like "import a.B" then reference "B.C"?
Here if we think about Type "C", type "C" it self will not have a import. So we will try to resolve.

> (3) Could the strategy be tricked, still, due to shadowing? I.e., you
> believe an import to be used, where in fact it is not?
Yes this we need to make sure with the strategy we use to identify if resolution is needed.

> I think we agree that resolving must be kept at a minimum to avoid
> performance degradation of the indexing operation.
> 
> I assume you have a plan (similar to functional types?) how resolved
> information can be used for your purposes.
Yes, what i have in mind right now is, while visiting each type, method, field in source indexer, see if we have a relevant import for each type. if we find a type which is not part of imports we will mark to current CU to be resolved and then indexed. One downside is here we will resolve the CU in case if we have a FQN reference as well. But such a chance might be really low.
 
> 
> More globally speaking: can we assume the additional meta-index to be purely
> optional? Asking specifically because of the pre-built-indexes feature.
May be we could always include the pre-build-index names as part of the meta.index query results. Thats will solve the pre-build-index issue given that the pre-build-indexes usage will be minimum.
Comment 64 Gayan Perera CLA 2021-03-08 12:08:58 EST
I checked about pre-built-indexes, but what i found out is that, when we make a request the rebuild indexes, by deleting the index files, the pre-built-indexes will also get indexed again. So when the meta-index(i like this word) is not available we request to rebuild the indexes, so it should ideally process the pre-built indexes as well right ?
Comment 65 Gayan Perera CLA 2021-03-09 04:15:13 EST
Also further testing with some existing workspace shows that forcing the index rebuild seems to not update the new meta index correctly. I will dig more into it and see why that is happening. 

May be without been aggressive, i might be able to only filter the indexes which are available in the meta index and include rest of the index names without doing any filtering. That might not require a expensive index rebuild by default and solve the pre build indexes as well. 

But so we have a scenario where we partially update a index without rebuilding it ?
@Stephan and @Andrey
Comment 66 Gayan Perera CLA 2021-03-09 13:36:20 EST
Made the solution not to be aggressive on rebuilding indexes, so basically when indexes get rebuilt the performance will get better and better now.
Comment 67 Gayan Perera CLA 2021-03-12 05:06:41 EST
Resolving the CU seems to effect the index time greater. So i have a question in general. Today we only resolve if we find functional expressions. So going forward new projects will use lambdas and specially projects who use rx java libs will use lambda heavily. So at that point our indexing will degrade right ? @Manoj @Vikas @Stephan do we have a plan for that ? Or I’m I thinking unrealistic?
Comment 68 Gayan Perera CLA 2021-03-13 08:49:33 EST
We have another issue, When source documents are updated, the indexing only happens to that particular file only. So with that we are unable to update this meta index. It would be great if the original index can hold all the package qualifications as well so that once a index is rebuild we can query the up to date package qualifications of all documents inside that index and update the meta index. But this will increase the normal index size.

WDYT @Stephan, @Julian, @Vikas, @Manoj ?
Comment 69 Julian Honnen CLA 2021-03-15 03:41:00 EDT
I don't quite understood what exactly you index, but would the following work?

You have a type hierarchy C extends B extends A.
Now you change B to extend X instead. Can your index track that C no longer extends A or does that not matter?
Comment 70 Gayan Perera CLA 2021-03-15 05:06:20 EDT
@Julian this new index will maintain packages usages of each index. For example

java.util - 123765.index, 675489.index

So for example the type you are searching is in java.util you only need to search on those two indexes. 

Thats the high level idea. 

Now i have added to update this new index based on dependency changes and file changes as well. So if X in your question from different package. Yes the index will be updated. 

I have also added a fixed for managed indexes as well. So basically if a certain index is not part of my new index i will always include them in the result.
Comment 71 Gayan Perera CLA 2021-03-15 05:07:02 EDT
Please check the new IndexManagerTests class you will get an idea from that @Julian.
Comment 72 Julian Honnen CLA 2021-03-15 05:32:23 EDT
Created attachment 285838 [details]
hierarchy change test

I've attached a testcase with the scenario I had in mind.

But it seems that Q2 is never indexed in the first place. Is that correct?

Q1<E> extends java.util.ArrayList<E>
Q2<E> extends Q1<E>

My expectation would be that the new index includes both Q1 and Q2, but it only contains two index files even if Q2 doesn't exist.
Comment 73 Gayan Perera CLA 2021-03-15 07:17:11 EDT
Q1 and Q2 both are in same project you will have only 2 indexes. And in this new index you will not find Q1 and Q2 but rather the index names. Since you are importing java.util.ArrayList your project will be always part of the results.
Comment 74 Gayan Perera CLA 2021-03-15 07:18:02 EDT
Basically this new index will only narrow down the indexes to search based pn package usages. Rest is done by the pattern search.
Comment 75 Julian Honnen CLA 2021-03-15 07:48:09 EDT
Created attachment 285840 [details]
failing test case for hierarchy change

I've updated the testcase, it fails as I expected.

So a hierarchy change in X would have to reindex all subtypes of X.
Comment 76 Gayan Perera CLA 2021-03-15 08:07:00 EDT
@Julian this new index only knows about packages. It doesn’t know about hierarchy. So when you change Q3 that index will not be assigned to java.util since there is not direct java.util usage. Its part of the IndexHiearchyBuilder that find each level. But since we don’t index default packages, IHB will use all indexes when searching subclasses of a class which is in default package.
Comment 77 Julian Honnen CLA 2021-03-15 08:21:03 EDT
(In reply to Gayan Perera from comment #76)
> @Julian this new index only knows about packages. It doesn’t know about
> hierarchy. So when you change Q3 that index will not be assigned to
> java.util since there is not direct java.util usage. Its part of the
> IndexHiearchyBuilder that find each level. But since we don’t index default
> packages, IHB will use all indexes when searching subclasses of a class
> which is in default package.
I'm not thinking of type hierarchy searches in particular, but usages of this new index in general.

So if we try to apply this index to a "find references of java.util.ArrayList.someMethod" search, we would miss usages in Q3 because we never indexed that Q3 now has indirect access to java.util.*
I know that regular searches don't use this new index yet, but if we pay such a heavy cost at indexing time, that should at least theoratically be possible.
Comment 78 Gayan Perera CLA 2021-03-15 08:55:33 EDT
Good point Julian. My intention was to use this search for other search operations as well. But I didn’t gave much thought until now. Do you have any different suggestion for this new index on what we should index ?

And in your example about ArrayList.someMethod means if you call the super method right ?
Comment 79 Julian Honnen CLA 2021-03-15 09:44:03 EDT
(In reply to Gayan Perera from comment #78)
> And in your example about ArrayList.someMethod means if you call the super
> method right ?
That example is a bit contrived, but the following would be more realistic:

abstract class R1 {
  public abstract void test();
}
abstract class R2 {
  public abstract void test();
}


public abstract class A extends R1 { }
public class B extends A {
  @Override public void test() { }
}

Search for R1.test should find the overridden method in B.

If we then change A to extend R2 instead, the index must be updated so that a search for R2.test finds the method.
Comment 80 Gayan Perera CLA 2021-03-15 10:15:23 EDT
I think what you are looking for is call hierarchy right. Because there is no reference to R1.test in your sub classes. I tried with a unmodified eclipse as well. For example what you are saying is correct you should get all sub types of List classes when search for List.add right ?
Comment 81 Julian Honnen CLA 2021-03-15 10:48:19 EDT
(In reply to Gayan Perera from comment #80)
> I think what you are looking for is call hierarchy right. Because there is
> no reference to R1.test in your sub classes. I tried with a unmodified
> eclipse as well. For example what you are saying is correct you should get
> all sub types of List classes when search for List.add right ?
No, I meant a declaration search for R1/R2.test (i.e. Ctrl+G) which should find the overridden method in B.


BTW: why does Ctrl+G find overridden methods, but Java Search dialog limited to declarations does not?
Comment 82 Gayan Perera CLA 2021-03-15 11:39:45 EDT
May be a different bug how matched results are resolved. But lets see how this indexing can be improved if declarations suppose to find overriden ones as well.
Comment 83 Gayan Perera CLA 2021-03-15 14:17:03 EDT
@Manoj @Vikas @Stephan is there any good reason we have IndexingParser which creates a inconsistence AST tree to reduce memory usage ?

For our new index implementation we would like to have imports correctly resolved and in future make it more accurate we might want all qualified references to be accurate.

For now i will enable correct import statements to fix the index time issue.
Comment 84 Stephan Herrmann CLA 2021-03-15 14:26:36 EDT
(In reply to Gayan Perera from comment #83)
> @Manoj @Vikas @Stephan is there any good reason we have IndexingParser which
> creates a inconsistence AST tree to reduce memory usage ?

What kind of inconsistency are you referring to?
Comment 85 Gayan Perera CLA 2021-03-16 02:36:14 EDT
(In reply to Stephan Herrmann from comment #84)
> (In reply to Gayan Perera from comment #83)
> > @Manoj @Vikas @Stephan is there any good reason we have IndexingParser which
> > creates a inconsistence AST tree to reduce memory usage ?
> 
> What kind of inconsistency are you referring to?

For example the IndexParser use 3 single i stances for ImportRef, QaulifiedNameRef and SingleNameRef. This cause the AST tree to have the last value all over the place for each those ref types. So the AST cannot be used for anything other than getting notified by the requester. But for imports it doesn’t work as well since imports are notified after finishing the parsing. The name refs are notified as Type and Unknown references while parsing.
Comment 86 Stephan Herrmann CLA 2021-03-16 16:19:40 EDT
(In reply to Gayan Perera from comment #85)
> (In reply to Stephan Herrmann from comment #84)
> > (In reply to Gayan Perera from comment #83)
> > > @Manoj @Vikas @Stephan is there any good reason we have IndexingParser which
> > > creates a inconsistence AST tree to reduce memory usage ?
> > 
> > What kind of inconsistency are you referring to?
> 
> For example the IndexParser use 3 single i stances for ImportRef,
> QaulifiedNameRef and SingleNameRef.

You're referring to "constant" fields in IndexParser, ok.

> This cause the AST tree to have the last
> value all over the place for each those ref types.

Right, I'm beginning to see what you mean. 
In fact the purpose of IndexParser is *not* to create valid AST but just to collect information into the index - on the fly: Any information is only kept as long as necessary.

I'd even say the result is illegal as an AST: it's not a tree since the three constant nodes are shared in many locations.

> So the AST cannot be used
> for anything other than getting notified by the requester.

correct

> But for imports
> it doesn’t work as well since imports are notified after finishing the
> parsing. The name refs are notified as Type and Unknown references while
> parsing.

Where do you see things concerning imports happening after parsing?

Right now I'm actually puzzled about the (originally) empty implementation of SourceIndexerRequestor.acceptImport(..). I see you're adding an implementation right here, which sounds like a good approach. If needed this will allow you to maintain your on list of import names locally in the requestor, no? Unfortunately you will not know, how many segments of the import name comprise the package, what's a type, potentially a member ...

What else is missing?


Answer to the original question: yes, indexing significantly "cripples" the compiler to do much less work, even already during parsing. Since {name,type}References are the leaves in the AST, I assume they would otherwise form the majority of all AST nodes, so not creating a new one each time assumebly takes a lot of pressure off memory / garbage collector.
Comment 87 Gayan Perera CLA 2021-03-17 04:28:36 EDT
(In reply to Stephan Herrmann from comment #86)
 
> Where do you see things concerning imports happening after parsing?

While parsing the imports they are recorded as type references. This loose some information about imports such if they are star imports or FQ imports. 

The collected import reference in the AST are notified after building the AST using the SourceIndexerRequestor.acceptImport. So basically if i had 10 import statements i will receive the last import 10 times instead of individual imports.  

> Right now I'm actually puzzled about the (originally) empty implementation
> of SourceIndexerRequestor.acceptImport(..). I see you're adding an
> implementation right here, which sounds like a good approach. If needed this
> will allow you to maintain your on list of import names locally in the
> requestor, no? Unfortunately you will not know, how many segments of the
> import name comprise the package, what's a type, potentially a member ...
> 
> What else is missing?
The information i need is here. But as I mentioned above i get the last import in source code repeatedly to the number of imports statements i have in the file. 
> 
> Answer to the original question: yes, indexing significantly "cripples" the
> compiler to do much less work, even already during parsing. Since
> {name,type}References are the leaves in the AST, I assume they would
> otherwise form the majority of all AST nodes, so not creating a new one each
> time assumebly takes a lot of pressure off memory / garbage collector.
But compared to the search improvement we will get having correct information, can we parse the source once and reuse the AST for typebinding resolution as well if we need to index resolved document. This might benefit some projects who are heavily using lambdas as well.
Comment 88 Gayan Perera CLA 2021-03-22 14:48:16 EDT
My latest patch contains the preparation for addressing @Julian's failing test scenarios for searching for declarations. The idea is keep two types of qualifications,

type_qualifications : These are qualification of type usages such as super class, super interface qualifications. This will be used when building type hierarchy.

member_qualifications: These are qualifications from method, field invocations and access references , method parameters, method return types, field types and resolving overriden method qualifications.

I think trying to resolve this information will not impact on index type alot since i have NameEnviroment caching inplace and diet resolving in the source indexer.

@Julian @Stephan @Vikas let me know if you have any input for improving this approach.
Comment 89 Gayan Perera CLA 2021-03-22 17:16:44 EDT
I checked a bit more and it seems if we want to support other searches with this new index, we need to 

- resolve the full CU in first resolution in SourceIndexer, i mean without using the IndexParser. This will enable us to read all type reference package informations such as from FQ references without imports.

- using the above resolved CU we need to resolve typebindings to capture overriding methods from MethodBindings. But i haven't inspected if the MethodBinding contains information of the original method that it is overriding.

I really need your input on this @Julian @Stephan @Vikas.
Comment 90 Gayan Perera CLA 2021-03-22 17:30:44 EDT
And for binary indexes this might be more harder since we don’t have bindings. So i was thinking should we consider going this extra step since

- only type hierarchy search to a recursive search. And the new index reduced the number of index searched in each iteration. 

- other search operations are linear and we have parallel search for searching indexes parallel which should improve the search performance. Using new index should improve it more, but trying to capture required information has a cost. 

WDYT?
Comment 91 Stephan Herrmann CLA 2021-03-22 18:05:54 EDT
Gayan, after briefly scanning the change, it feels the code is not perfectly in line with the discussion here in the bug. Since this is a fundamental change at the core of JDT, I think it's time for some design documentation: please explain at some detail what is your strategy and where in the code do we find which part?

Perhaps a wiki page is better suited than bugzilla comments, which cannot be updated as the design evolves.

Generally, I'm a bit worried about all this talk about resolving. Generally, the index is designed to work without any resolved information. It's the task for the MatchLocator to filter potential matches (syntax / word based) to answer only those that are semantically correct. Before we go that road of resolving, we need a very solid reason, why resolving is unavoidable (for functional types we have a reason).

Thinking out of the box: *if* we would think that resolving should naturally be part of indexing, than indexing should probably be tied much closer to building. That's where all sources are resolved. Just you would break all people working without "build automatically".


As for caching NameEnvironments: these can be big structures. Can you convince us that this will not create a heap problem?


One more thought: I don't think it would hurt much, if the identification of relevant indexes is not precise: finding one or two indexes, which later prove to be irrelevant shouldn't cost too much. That would be in line with the existing design: start from an over-approximation and narrow it down later. Maybe this can help when handling source references which can have many shapes: even if you find a qualified name, it doesn't necessarily mean "package.Type". E.g.: when the indexer finds "Outer.Inner", it could just stupidly add "Outer" as a potential referenced package. What are the odds that s.o. will search for something in a *package* "Outer"? And if so, we will simply not find any references in that particular index. So?
Comment 92 Gayan Perera CLA 2021-03-23 04:07:06 EDT
(In reply to Stephan Herrmann from comment #91)
> Gayan, after briefly scanning the change, it feels the code is not perfectly
> in line with the discussion here in the bug. Since this is a fundamental
> change at the core of JDT, I think it's time for some design documentation:
> please explain at some detail what is your strategy and where in the code do
> we find which part?
> 
> Perhaps a wiki page is better suited than bugzilla comments, which cannot be
> updated as the design evolves.
Thats a good idea, How should I create a Wiki page and where should I create ?

> 
> Generally, I'm a bit worried about all this talk about resolving. Generally,

The resolving came into discussion with the comment https://bugs.eclipse.org/bugs/show_bug.cgi?id=570078#c59
 and then with the https://bugs.eclipse.org/bugs/show_bug.cgi?id=570078#c79
But I think we can skip the second one since it won't do a recursive search when finding declarations.


> the index is designed to work without any resolved information. It's the
> task for the MatchLocator to filter potential matches (syntax / word based)
> to answer only those that are semantically correct. Before we go that road
> of resolving, we need a very solid reason, why resolving is unavoidable (for
> functional types we have a reason).
> 
> Thinking out of the box: *if* we would think that resolving should naturally
> be part of indexing, than indexing should probably be tied much closer to
> building. That's where all sources are resolved. Just you would break all
> people working without "build automatically".
> 

Good point Stephan, trying resolve always might be overwhelming since we will end up resolving twice in compiler and indexer.



> 
> As for caching NameEnvironments: these can be big structures. Can you
> convince us that this will not create a heap problem?
> 

Well today if you have X number of source files which contains Lambdas in the same project, we end up creating X number of NameEnviroments in short period of time. When comparing the profile snapshots, the NameEnviroment creation takes 35% of the CPU time. And the curren caching is something very close to a LRU cache using SoftReferences and cleaning up at the end of the indexing and project changes/deletions. So rather than creating NameEnviroment X number of time with a high cost, having a single instance might be efficient for the GC as well. From my testing I didn't saw any significant increase in heap.
 


> One more thought: I don't think it would hurt much, if the identification of
> relevant indexes is not precise: finding one or two indexes, which later
> prove to be irrelevant shouldn't cost too much. That would be in line with
> the existing design: start from an over-approximation and narrow it down
> later. Maybe this can help when handling source references which can have
> many shapes: even if you find a qualified name, it doesn't necessarily mean
> "package.Type". E.g.: when the indexer finds "Outer.Inner", it could just
> stupidly add "Outer" as a potential referenced package. What are the odds
> that s.o. will search for something in a *package* "Outer"? And if so, we
> will simply not find any references in that particular index. So?

Yes, right now we only need resolution if we have an inner class usage which is the current type hierarchy without an explicit import like in https://bugs.eclipse.org/bugs/show_bug.cgi?id=570078#c59

And about "Outer.Inner", this exactly how I have implemented now, I just check for '.' and treat it as qualified and take the qualifier. I explained it in the gerrit comment as well.
Comment 93 Stephan Herrmann CLA 2021-03-23 06:57:46 EDT
(In reply to Gayan Perera from comment #92)
> (In reply to Stephan Herrmann from comment #91)
> > Gayan, after briefly scanning the change, it feels the code is not perfectly
> > in line with the discussion here in the bug. Since this is a fundamental
> > change at the core of JDT, I think it's time for some design documentation:
> > please explain at some detail what is your strategy and where in the code do
> > we find which part?
> > 
> > Perhaps a wiki page is better suited than bugzilla comments, which cannot be
> > updated as the design evolves.
> Thats a good idea, How should I create a Wiki page and where should I create
> ?

What about a sub page of https://wiki.eclipse.org/JDT_Core_Programmer_Guide ?

You'll mark it as work in progress and there you go!

> > As for caching NameEnvironments: these can be big structures. Can you
> > convince us that this will not create a heap problem?
> > 
> 
> Well today if you have X number of source files which contains Lambdas in
> the same project, we end up creating X number of NameEnviroments in short
> period of time. When comparing the profile snapshots, the NameEnviroment
> creation takes 35% of the CPU time.

If that caching is specific to indexing of files containing lambdas, then I'm confident the wiki page will convince us of your design :) Do you actually need more than one NameEnvironment at a time?
Comment 94 Gayan Perera CLA 2021-03-23 08:33:55 EDT
(In reply to Stephan Herrmann from comment #93)
> (In reply to Gayan Perera from comment #92)
> > (In reply to Stephan Herrmann from comment #91)
> > > Gayan, after briefly scanning the change, it feels the code is not perfectly
> > > in line with the discussion here in the bug. Since this is a fundamental
> > > change at the core of JDT, I think it's time for some design documentation:
> > > please explain at some detail what is your strategy and where in the code do
> > > we find which part?
> > > 
> > > Perhaps a wiki page is better suited than bugzilla comments, which cannot be
> > > updated as the design evolves.
> > Thats a good idea, How should I create a Wiki page and where should I create
> > ?
> 
> What about a sub page of https://wiki.eclipse.org/JDT_Core_Programmer_Guide ?
> 
> You'll mark it as work in progress and there you go!

I need some help how to start, I mean is this page part of a git repo ? I have never added any content in JDT wiki :).

> 
> > > As for caching NameEnvironments: these can be big structures. Can you
> > > convince us that this will not create a heap problem?
> > > 
> > 
> > Well today if you have X number of source files which contains Lambdas in
> > the same project, we end up creating X number of NameEnviroments in short
> > period of time. When comparing the profile snapshots, the NameEnviroment
> > creation takes 35% of the CPU time.
> 
> If that caching is specific to indexing of files containing lambdas, then
> I'm confident the wiki page will convince us of your design :) Do you
> actually need more than one NameEnvironment at a time?

Well I see your point, today each project is processed one by one, so may be the cache should only keep one NameEnviroment at a given time. So we could simplify this to keep only one NameEnviroment like a Entry with the related project and perform all cleanup as we have done right now. And we can change this later if we decide to parallel the indexing. Shall I proceed with that approach ?
Comment 95 Stephan Herrmann CLA 2021-03-23 09:16:31 EDT
(In reply to Gayan Perera from comment #94)
> (In reply to Stephan Herrmann from comment #93)
> > (In reply to Gayan Perera from comment #92)
> > > (In reply to Stephan Herrmann from comment #91)
> > > > Gayan, after briefly scanning the change, it feels the code is not perfectly
> > > > in line with the discussion here in the bug. Since this is a fundamental
> > > > change at the core of JDT, I think it's time for some design documentation:
> > > > please explain at some detail what is your strategy and where in the code do
> > > > we find which part?
> > > > 
> > > > Perhaps a wiki page is better suited than bugzilla comments, which cannot be
> > > > updated as the design evolves.
> > > Thats a good idea, How should I create a Wiki page and where should I create
> > > ?
> > 
> > What about a sub page of https://wiki.eclipse.org/JDT_Core_Programmer_Guide ?
> > 
> > You'll mark it as work in progress and there you go!
> 
> I need some help how to start, I mean is this page part of a git repo ? I
> have never added any content in JDT wiki :).

It's a stock MediaWiki installation, you may have heard about wikipedia using it :)

To give you a jump start I created an empty page for you: https://wiki.eclipse.org/JDT_Core_Programmer_Guide/MetaIndex and ensured it's listed as a subpage of its parent and in the "JDT" category.

Just make sure you're logged in to eclipse.org, click Edit and there you go. There are links for help on editing, help for formatting is a few clicks away: https://www.mediawiki.org/wiki/Help:Formatting - previewing is your friend.

BTW, mediawiki has it's own version control, much simpler than git actually.
Comment 96 Gayan Perera CLA 2021-03-23 13:25:07 EDT
> It's a stock MediaWiki installation, you may have heard about wikipedia
> using it :)
> 
> To give you a jump start I created an empty page for you:
> https://wiki.eclipse.org/JDT_Core_Programmer_Guide/MetaIndex and ensured
> it's listed as a subpage of its parent and in the "JDT" category.
> 
> Just make sure you're logged in to eclipse.org, click Edit and there you go.
> There are links for help on editing, help for formatting is a few clicks
> away: https://www.mediawiki.org/wiki/Help:Formatting - previewing is your
> friend.
> 
> BTW, mediawiki has it's own version control, much simpler than git actually.

Thanks Stephan
Comment 97 Gayan Perera CLA 2021-03-23 14:58:32 EDT
Updated the wiki Stephan
Comment 98 Stephan Herrmann CLA 2021-03-23 17:01:57 EDT
(In reply to Gayan Perera from comment #97)
> Updated the wiki Stephan

I don't see any content, did you press Save?
Comment 99 Gayan Perera CLA 2021-03-24 02:22:14 EDT
(In reply to Stephan Herrmann from comment #98)
> (In reply to Gayan Perera from comment #97)
> > Updated the wiki Stephan
> 
> I don't see any content, did you press Save?

Yes it said the content was sent to moderatoon. Even I can’t see the content after saving. I can only see when i press edit.
Comment 100 Gayan Perera CLA 2021-03-24 07:06:33 EDT
@Stephan do you think we should change the cache implementation to keep single environment at any given time or do you need more time to provide feedback on it ?
Comment 101 Stephan Herrmann CLA 2021-03-29 16:27:16 EDT
(In reply to Gayan Perera from comment #97)
> Updated the wiki Stephan

Meanwhile the content has passed moderation. I dropped a few comments / questions on its talk/discussion page.
Comment 102 Sebastian Zarnekow CLA 2021-03-30 02:03:59 EDT
Gayan, I read through the provided docs and given the dance with the binding resolution during indexing: Could we completely avoid that if we'd make the MetaIndex based on simple type names rather than package names?

To my understanding, the current approach is to build up a structure like

package -> index

java.util -> id123.index
java.awt -> id137.index
com.acme.container -> id4242.index
com.acme.util -> id777.index

for a source file

package com.acme.container;
import java.util.List;
import java.awt.Component;
abstract class MyIndexedList<T extends Component> implements List<T> {
    static class IndexKey extends com.acme.util.AbstractKey {..}
}

This approach requires (in some cases) type resolution for source files.

Would it make sense to build this based on the simple names instead to avoid the need for binding resolution?

simpleName -> index

List -> 123.index, id137.index, id4242.index
MyIndexedList -> id4242.index
Component -> id137.index
IndexKey -> id4242.index
AbstractKey -> id123.index, id777.index

From my experience, while not being impossible, simple name conflicts are still very rare. As a selection criteria this may be good enough. The upside of this would be, that simple names do appear in the source code. For the example about the Middle / Inner from the docs, no resolution would be necessary at all. The downside of this is apparently that we'll put more keys into the MetaIndex since generally there exist more distinct simple names than package names.

One could even go a step further and index all name fragments in the meta index and select the correct index files based on the intersection for the entire compound name:

java -> id123.index, id137.index
util -> id123.index
awt -> ide137.index
List -> id123.index, id137.index

Now when looking for java.awt.List, we'd ask the meta index for `java`, `awt`, and `List` and only query the intersection of the index files which would allow us to drop the id123.index in this case from the index search.

Just dumping a thought here that most likely was already considered, but I couldn't easily find a trace here in the ticket where this idea was discussed.
Comment 103 Gayan Perera CLA 2021-03-30 02:59:46 EDT
@Sebastian thanks for sharing your ideas they are really valuable. 

I didn’t thought of indexing the fragments as you explained. 

But i did think of using simple names. 

One problem that i see in both these approaches is that then this meta index will be a huge type reference index, that was the reason i did not pursue on the classname -> index mapping approach. 

But i think i can give it a try with a minor modification to the code. Of course we might find false positive i indexes if we use simple names.  But the fragment approach might help to further lower the number of false positives from binary indexes. 

I can give it a try. I would like to have a perf test to measure this improvement. Does anyone have any suggestion how i can do that ?
Comment 104 Stephan Herrmann CLA 2021-03-30 05:22:05 EDT
Thanks Sebastian, for your suggestions.

(In reply to Gayan Perera from comment #103)
> One problem that i see in both these approaches is that then this meta index
> will be a huge type reference index, that was the reason i did not pursue on
> the classname -> index mapping approach. 

We don't know if that's a problem, do we? IMHO It's one of the things worth measuring in real-life experiments: Run a modified indexer on a big, real workspace and report the size of the meta index. 

> But i think i can give it a try with a minor modification to the code. Of
> course we might find false positive i indexes if we use simple names.

I can only again encourage you to accept "false positives" from the meta index. If the meta index has an accuracy as low as 50%, it could still improve performance by 50%, which is a huge gain already! The overall goal is to set up a pipeline of cheap filters that successively reduce the number of potential matches from huge and fuzzy to small and exact.

Only false negatives are not acceptable at all (which could possibly kill the approach using fragment unions - again due to inheritance and such).

> I can give it a try. I would like to have a perf test to measure this
> improvement. Does anyone have any suggestion how i can do that ?

I think we're at a point to appreciate the multiple facets of "performance" in this field:
* how long does indexing take?
* how big will the meta index be?
* how long does a query against the meta index take?
* what do we gain by using the meta index (fraction of indexes we need to query out of total number of indexes)

Measuring total time of certain searches in a big workspace will summarize several of the above, but I believe we get a much better picture if we have more detailed figures. Those will hopefully allow us to make some prediction about how it behaves in different scenarios, not just the one workspace used for measurements.
Comment 105 Gayan Perera CLA 2021-03-30 06:43:36 EDT
@Stephan and @Sebastian i will try to take some detail measurements first with the current solution and then i will implement the new idea and do the same measurements to see if we see a big difference. 

About caching the name environment, i guess we can move to a different issue if we choose not to do additional source resolutions in this issue. Because caching name environment seems to reduce the indexing time on workspace which uses lambda expressions.
Comment 106 Sebastian Zarnekow CLA 2021-03-30 07:37:09 EDT
(In reply to Gayan Perera from comment #105)
> About caching the name environment, i guess we can move to a different issue
> if we choose not to do additional source resolutions in this issue. Because
> caching name environment seems to reduce the indexing time on workspace
> which uses lambda expressions.

I agree - this should give a nice boost and can probably land without the meta index being fully ironed out.

> Only false negatives are not acceptable at all (which could possibly kill the approach using fragment unions - again due to inheritance and such).

True - the fragment approach would only work if we'd do queries for nested types differently from queries for top-level types. 

I'd really like to see the results of an experiment purely based on simple names.
Comment 107 Gayan Perera CLA 2021-03-30 08:29:40 EDT
@Sebastian will push a different gerrit using same base with simple names in meta index. I have a feeling that if we go for fragments we might need to spend more time in meta index search. So lets try with the simple name first. Now i feel that false positives compared to packages vs simple names should be almost same if we look at the bigger picture of this meta index (not only for subtypes)
Comment 108 Gayan Perera CLA 2021-03-30 16:29:41 EDT
switching to SimpleName and searching for Runnable subtypes in STS4 repo workspace was as follows

Master:
execution time: 5578ms - IndexBasedHierarchyBuilder


Simple Name:
execution time: 6253ms - IndexBasedHierarchyBuilder

Package Fragment:
execution time: 1278ms - IndexBasedHierarchyBuilder


It seems the simple name seems to give worst results. But i will do some sampling tomorrow to see where the most time is spent.
Comment 109 Sebastian Zarnekow CLA 2021-03-30 16:38:16 EDT
Interesting. Are there other types in that same workspace with the simple name Runnable“? If not (or only very few) it’s hard to imagine where the differences should come from.
Comment 110 Gayan Perera CLA 2021-03-31 01:15:01 EDT
This is spring tooling 4 workspace. There are lot of subtypes of Runnable and the tree depth is also around 3-4 levels. There is no big difference in index filtering. The simple name solution is picking 5 more indexes in many iterations than package fragment solution. My doubt is its the meta index search which takes time. I will do some sampling today and publish the results.
Comment 111 Gayan Perera CLA 2021-03-31 12:20:31 EDT
Its the metaindex that takes most of the time

org.eclipse.jdt.internal.core.search.indexing.IndexManager.findMatchingIndexNames ()	6 128 ms (45,1 %)	6 128 ms (47,7 %)
- org.eclipse.jdt.internal.core.index.Index.query ()	5 140 ms (37,8 %)	5 140 ms (40 %)
- org.eclipse.jdt.internal.core.index.Index.queryDocumentNames ()	485 ms (3,6 %)	485 ms (3,8 %)
Comment 112 Gayan Perera CLA 2021-04-01 10:31:02 EDT
Hi all @Stephan @Vikas @Sebastian what do you think based on my latest findings. Should we continue with the package fragment approach ?
Comment 113 Sebastian Zarnekow CLA 2021-04-01 10:46:47 EDT
(In reply to Gayan Perera from comment #112)
> Hi all @Stephan @Vikas @Sebastian what do you think based on my latest
> findings. Should we continue with the package fragment approach ?

I'd like to understand better where the slowdown comes from.
If it can only be explained be the larger number of entries in the Meta index, that would be sad.

Why am I so very surprised:
Based on the assumption, that simple name conflicts are rare, the number of index queries that are necessary with the variant based on simple names should be smaller than the number of index queries based on packages.

Let's say we use `java.lang.Runnable`. There are likely not too many other types with the simple name `Runnable` but quite a bunch of other types from `java.lang`. The first query of the MetaIndex based on simple names should yield fewer entries compared to the query based on package fragments.
Driving this further: The next cascade is than based on the simple names of all direct subtypes of Runnable. So likely almost all intentional matches. These will have only very few other subtypes (based on the assumption that people rarely inherit from a concrete Runnable). Based on packages, there will be many more next index candidates since usages of other types from the same package that contains the ConcreteRunnable are more likely. So here we ask with fewer package names, but I'd expect more subsequent index queries.

Into the bargain: What is the cost of the indexing itself if we need fewer resolutions when we base the index on simple names?

To provide more actionable thoughts:
- So you see the same results with both approaches?
- How many steps of index queries are performed? How many indices are queried per nesting level?
- How long does each query of the MetaIndex take? How are these queries submitted? 
- How many entries does the MetaIndex have in both variants?

I think we need a few of these metrics to make an informed decision and draw conclusion for the most promising path forward.
Comment 114 Gayan Perera CLA 2021-04-02 10:29:10 EDT
@Sebastian this is some data analysis i did from verbose output using panda library in python

pcts are percentiles


== SimpleName search ============================
total meta index search time: 4592ms
meta index search pcts (50,80,100)ms: [ 2.  2. 23.]
search iteration index selection pcts (50,80,100)ms: [ 17.  18. 462.]
search iteration total indexes pcts (50,80,100)ms: [578. 578. 578.]
search iteration time pcts (50,80,100)ms: [  2.   3. 108.]
search iteration cumulative time pcts (50,80,100)ms: [  0.   1. 543.]
iteration size: 15894
total time: 6495ms
=================================================
== Package Fragement search =====================
total meta index search time: 485ms
meta index search pcts (50,80,100)ms: [ 1.  1. 30.]
search iteration index selection pcts (50,80,100)ms: [ 18.  29. 553.]
search iteration total indexes pcts (50,80,100)ms: [578. 578. 578.]
search iteration time pcts (50,80,100)ms: [  2.   2. 108.]
search iteration cumulative time pcts (50,80,100)ms: [  1.   3. 693.]
iteration size: 3618
total time: 1452ms
=================================================
== Master search ================================
search iteration time pcts (50,80,100)ms: [  2.   2. 124.]
search iteration cumulative time pcts (50,80,100)ms: [  6.   8. 794.]
iteration size: 7947
total time: 5152ms
=================================================

Seems like the SimpleName search iterations are higher than master. I will check why. And it seems package fragment might have some missing results since it runs a lower number of iterations.

The SimpleName seems to be performing better if the number of iterations are same as master. I will check that
Comment 115 Gayan Perera CLA 2021-04-03 09:20:55 EDT
(In reply to Gayan Perera from comment #114)
> @Sebastian this is some data analysis i did from verbose output using panda
> library in python
> 
> pcts are percentiles
> 
> 
> == SimpleName search ============================
> total meta index search time: 4592ms
> meta index search pcts (50,80,100)ms: [ 2.  2. 23.]
> search iteration index selection pcts (50,80,100)ms: [ 17.  18. 462.]
> search iteration total indexes pcts (50,80,100)ms: [578. 578. 578.]
> search iteration time pcts (50,80,100)ms: [  2.   3. 108.]
> search iteration cumulative time pcts (50,80,100)ms: [  0.   1. 543.]
> iteration size: 15894
> total time: 6495ms
> =================================================
> == Package Fragement search =====================
> total meta index search time: 485ms
> meta index search pcts (50,80,100)ms: [ 1.  1. 30.]
> search iteration index selection pcts (50,80,100)ms: [ 18.  29. 553.]
> search iteration total indexes pcts (50,80,100)ms: [578. 578. 578.]
> search iteration time pcts (50,80,100)ms: [  2.   2. 108.]
> search iteration cumulative time pcts (50,80,100)ms: [  1.   3. 693.]
> iteration size: 3618
> total time: 1452ms
> =================================================
> == Master search ================================
> search iteration time pcts (50,80,100)ms: [  2.   2. 124.]
> search iteration cumulative time pcts (50,80,100)ms: [  6.   8. 794.]
> iteration size: 7947
> total time: 5152ms
> =================================================
> 
> Seems like the SimpleName search iterations are higher than master. I will
> check why. And it seems package fragment might have some missing results
> since it runs a lower number of iterations.
> 
> The SimpleName seems to be performing better if the number of iterations are
> same as master. I will check that


Following are corrected stats

== SimpleName search ============================
total meta index search time: 4592ms
meta index search time(ms) pcts (50,80,100): [ 2.  2. 23.]
search iteration index selection pcts (50,80,100): [ 17.  18. 462.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  2.   3. 108.]
search iteration cumulative time(ms) pcts (50,80,100): [  0.   1. 543.]
iteration size: 2649
total time: 6495ms
=================================================
== Package Fragement search =====================
total meta index search time: 485ms
meta index search time(ms) pcts (50,80,100): [ 1.  1. 30.]
search iteration index selection pcts (50,80,100): [ 18.  29. 553.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  2.   2. 108.]
search iteration cumulative time(ms) pcts (50,80,100): [  1.   3. 693.]
iteration size: 603
total time: 1452ms
=================================================
== Master search ================================
search iteration time(ms) pcts (50,80,100): [  2.   2. 124.]
search iteration cumulative time(ms) pcts (50,80,100): [  6.   8. 794.]
iteration size: 2649
total time: 5152ms
=================================================

Here we can see the PackageFragment search seems not running the full number of cycles, which mean it might be missing data in this huge tree structure.

if we take master and simplename, they do run the same number of iterations, but the meta index search seems to be high which cause additional overhead over the savings we get from narrowing indexes.
Comment 116 Gayan Perera CLA 2021-04-03 12:28:30 EDT
Did some fine tuning on the meta index search. Previously it was using Index.isMatch which consumed lot of time. By making the search a EXACT & CASE_SENSITIVE i saved few seconds. This search seems to touch lot of indexes in each iteration, you can see it from the index selection percentiles. I will also do a search on another workspace which has lot of indexes, but the subtypes are on the same project.

Following is after improvement above on the same workspace as previous results.

== Improved search ================================
total meta index search time: 1214ms
meta index search time(ms) pcts (50,80,100): [0. 1. 9.]
search iteration index selection pcts (50,80,100): [ 17.  18. 462.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  1.   1. 126.]
search iteration cumulative time(ms) pcts (50,80,100): [  0.   1. 714.]
iteration size: 2649
total time: 3456ms
=================================================
Comment 117 Gayan Perera CLA 2021-04-03 13:28:51 EDT
Further improved by caching the all document name query during the type hierarchy search

== Improved search ================================
total meta index search time: 531ms
meta index search time(ms) pcts (50,80,100): [0. 0. 9.]
search iteration index selection pcts (50,80,100): [ 17.  18. 462.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  1.   1. 127.]
search iteration cumulative time(ms) pcts (50,80,100): [  0.   1. 628.]
iteration size: 2649
total time: 2571ms
Comment 118 Gayan Perera CLA 2021-04-03 14:02:03 EDT
Further improved

== Improved search ================================
total meta index search time: 91ms
meta index search time(ms) pcts (50,80,100): [0. 0. 1.]
search iteration index selection pcts (50,80,100): [ 17.  18. 462.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  0.   1. 126.]
search iteration cumulative time(ms) pcts (50,80,100): [  0.   1. 491.]
iteration size: 2649
total time: 2064ms
=================================================
Comment 119 Eclipse Genie CLA 2021-04-03 14:07:37 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/178816
Comment 120 Gayan Perera CLA 2021-04-03 14:08:42 EDT
@Sebastian @Julian @Vikas please check my new gerrit.

@Jualian i think now we can use this meta index for other search operations as well. You can give it a try.
Comment 121 Sebastian Zarnekow CLA 2021-04-05 09:27:44 EDT
Hey Gayan,

I see that the build was successful. Very nice, excellent job!

Are these still the up-to-date numbers (master vs meta-index based)?
Can you confirm, that the results of the meta-index based query are correct and the same as on master? 

(In reply to Gayan Perera from comment #114)
> == Master search ================================
> [..]
> iteration size: 7947
> total time: 5152ms
> =================================================

(In reply to Gayan Perera from comment #118)
> == Improved search ================================
> [..]
> iteration size: 2649
> total time: 2064ms
> =================================================

I've quickly read the code of the most recent Gerrit which is based on the SimpleNames in the meta-index.

As far as I understand it, we have (at least) two use cases for the meta-index. First, we want to reduce the number of files that are taken into account for all kinds of exact searches for type occurrences in the index. Secondly, the hierarchy search, which is special since it requires multiple searches for each revealed subtype of the root type in the hierarchy. At a first glance, the hierarchy case is equivalent to running the first case in a loop, but it is somewhat different still.

There is a somewhat important distinction to be made though: For arbitrary index searches, we need to reveal all locations of the type that is requested. For the hierarchy search, we only need to take a subset of occurrences into account. Namely the ones that appear in the extends or implements clause (basically super types).

If I understood the change-set correctly, all type occurrences are indexed by their SimpleName in the meta-index. Since "all type occurrences" is a superset of "occurrences in extends or implements clauses", the hierarchy case visits still unnecessary index files.

What if we change the meta-indexing scheme slightly to be still be based on SimpleName, but with a grain of additional information:
Instead of indexing the plain SimpleName as is, we could add an additional character which would be illegal in a simple Java identifier, e.g. '.' to each and every indexed occurrence. If it is an occurrence in an extends or implements clause, we add a second char though, e.g. '$'. Examples:

class MyRunnable implements Runnable {}

Instead of `Runnable`, we index `Runnable.$` (amongst other entries).

class Runner {
  void runNow(Runnable r);
}

Instead of `Runnable`, we index `Runnable.`  (amongst other entries).

Why all this:
The hierarchy case could query the meta index for a more specific subset of occurrences to reduce the number of relevant indexes further.
Other index-searches can use an exact prefix search to find all occurrences:
Meta-query exact prefix for `Runnable.§` vs `Runnable.`.

I suspect that a case sensitive prefix query is almost as fast as an exact query in this case (to be measured). If it is slightly slower, I assume that the number of index files to be scanned for the next layer in the hierarchy will be considerably smaller and therefore it should be beneficial to encode the occurrence in the index key as a suffix to the simple name.

What do you think? Does that makes sense? I'd expect that we'll see a larger boost compared to the search on master with this encoding while still having a very simple indexing logic itself.
Comment 122 Gayan Perera CLA 2021-04-05 12:18:41 EDT
Here is the final stats with latest patch

== SimpleName search ============================
total meta index search time: 91ms
meta index search time(ms) pcts (50,80,100): [0. 0. 1.]
search iteration index selection pcts (50,80,100): [ 17.  18. 462.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  0.   1. 126.]
search iteration cumulative time(ms) pcts (50,80,100): [  0.   1. 491.]
iteration size: 2649
total time: 2064ms
=================================================
== Package Fragement search =====================
total meta index search time: 485ms
meta index search time(ms) pcts (50,80,100): [ 1.  1. 30.]
search iteration index selection pcts (50,80,100): [ 18.  29. 553.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  2.   2. 108.]
search iteration cumulative time(ms) pcts (50,80,100): [  1.   3. 693.]
iteration size: 603
total time: 1452ms
=================================================
== Master search ================================
search iteration time(ms) pcts (50,80,100): [  2.   2. 124.]
search iteration cumulative time(ms) pcts (50,80,100): [  6.   8. 794.]
iteration size: 2649
total time: 5152ms
=================================================
Comment 123 Gayan Perera CLA 2021-04-05 12:24:29 EDT
(In reply to Sebastian Zarnekow from comment #121)
> 
> What if we change the meta-indexing scheme slightly to be still be based on
> SimpleName, but with a grain of additional information:
> Instead of indexing the plain SimpleName as is, we could add an additional
> character which would be illegal in a simple Java identifier, e.g. '.' to
> each and every indexed occurrence. If it is an occurrence in an extends or
> implements clause, we add a second char though, e.g. '$'. Examples:
> 
> 
> What do you think? Does that makes sense? I'd expect that we'll see a larger
> boost compared to the search on master with this encoding while still having
> a very simple indexing logic itself.

Hey @Sebastian,

Yes this was something i had planned for my previous patch, but i like your suggestion. May be instead of using prefixes in qualifications, we can have two categories,
- SUPER_QUALIFICATIONS : this contains extends and implements usages
- REFERENCE_QUALIFICATIONS : this contains field, method (return and parameter types), type parameters etc.

This way the index search will be much efficient since we will only search a single category table.

WDYT ?
Comment 124 Sebastian Zarnekow CLA 2021-04-05 12:32:15 EDT
(In reply to Gayan Perera from comment #123)
> Hey @Sebastian,
> 
> Yes this was something i had planned for my previous patch, but i like your
> suggestion. May be instead of using prefixes in qualifications, we can have
> two categories,
> - SUPER_QUALIFICATIONS : this contains extends and implements usages
> - REFERENCE_QUALIFICATIONS : this contains field, method (return and
> parameter types), type parameters etc.
> 

I thought that it would be great if we would find the SUPER_QUALIFICATIONS and the 
REFERENCE_QUALIFICATIONS together with a single query if we'd use the prefixes. But it was only an idea - might be, that the exact queries are way faster in which case it would be beneficial to have two categories.

Searching for REFERENCE_QUALIFICATIONS certainly has to reveal also the SUPER_QUALIFICATIONS.

> This way the index search will be much efficient since we will only search a
> single category table.

I wasn't aware the a case sensitive prefix search would be much less efficient.

In any case: certainly worth a try.
Comment 125 Gayan Perera CLA 2021-04-05 14:53:45 EDT
The case sensitive search with prefix will be same as what we have now. But since we will run through all type to find the matches in a single category table, the meta index search time will be same as today + we need to do some type naming mangling at index time and search type (on qualifier).

if we use categories, we can narrow the number of keys that the meta index needs to search based on the category we are interested and we don't need to play with type names.

I will modify the patch tomorrow and may be add some tests for other type of search operations to see if they are also working as expected with meta index when a qualifier is given. The down side is we need to make the qualifier composite like CATEGORY:QUALIFIER, if the composite qualifier doesn't match we can ignore the qualifier and log.
Comment 126 Gayan Perera CLA 2021-04-07 05:42:10 EDT
@Sebastian @Vikas @Stephan the simple name index gives better results for more unique class names. But if you have class names like Value, List, Tag, Collection etc the hierarchy build time is really high, because we end up hitting lot of matches for these type of names, specially in j2ee projects where you have xmlbinding, hibernate, spring etc. 

Using package fragment approach provided consistent response times regards the type name is unique or not. 

I actually search in a private workspace of mine where it took 12sec for simple name and 1 sec for package fragment based searches. 

With the name environment caching and more and more projects switching to use lambdas do you think source resolution becomes a big problem ? We could of course do the resolution as the last resort if we can figure out from the imports and current package we are in.

The same private workspace in intellij provide the same type hierarchy in less that 1 sec. which is very similar to package fragment solution we have. 

WDYT ?
Comment 127 Sebastian Zarnekow CLA 2021-04-07 06:00:19 EDT
(In reply to Gayan Perera from comment #126)
> I actually search in a private workspace of mine where it took 12sec for
> simple name and 1 sec for package fragment based searches. 


Are these times based on the current patch or based on different qualifiers for super-references and arbitrary references?
Comment 128 Gayan Perera CLA 2021-04-07 06:23:57 EDT
(In reply to Sebastian Zarnekow from comment #127)
> (In reply to Gayan Perera from comment #126)
> > I actually search in a private workspace of mine where it took 12sec for
> > simple name and 1 sec for package fragment based searches. 
> 
> 
> Are these times based on the current patch or based on different qualifiers
> for super-references and arbitrary references?

Based on super vs arbitrary qualifiers. The main problem is indexbasedhierarchybuilder spends time on finding subtypes for classes which we don’t want, but they do match the search because they have same simple name.
Comment 129 Sebastian Zarnekow CLA 2021-04-07 06:38:50 EDT
(In reply to Gayan Perera from comment #128)
> (In reply to Sebastian Zarnekow from comment #127)
> > (In reply to Gayan Perera from comment #126)
> > > I actually search in a private workspace of mine where it took 12sec for
> > > simple name and 1 sec for package fragment based searches. 
> > 
> > 
> > Are these times based on the current patch or based on different qualifiers
> > for super-references and arbitrary references?
> 
> Based on super vs arbitrary qualifiers. The main problem is
> indexbasedhierarchybuilder spends time on finding subtypes for classes which
> we don’t want, but they do match the search because they have same simple
> name.

I wonder how this can lead to a 1200% runtime cost.
Assuming you are looking for subtypes of java.util.List:
- First query by simple name reveals all index files that have subtypes of java.awt.List and java.util.List. So we have "some" unnecessary indexes to scan but those would not lead to more wrong indexes to be inspected on the next hierarchy layer. 

Can you describe the shape of your type hierarchy? Nesting depth? Number of simple name collisions per type / layer? Total number of valid hits?

So far it's hard to make an educated guess on the reasons why it is so much slower.
Comment 130 Gayan Perera CLA 2021-04-07 08:00:37 EDT
(In reply to Sebastian Zarnekow from comment #129)
> (In reply to Gayan Perera from comment #128)
> > (In reply to Sebastian Zarnekow from comment #127)
> I wonder how this can lead to a 1200% runtime cost.
> Assuming you are looking for subtypes of java.util.List:
> - First query by simple name reveals all index files that have subtypes of
> java.awt.List and java.util.List. So we have "some" unnecessary indexes to
> scan but those would not lead to more wrong indexes to be inspected on the
> next hierarchy layer. 
> 
> Can you describe the shape of your type hierarchy? Nesting depth? Number of
> simple name collisions per type / layer? Total number of valid hits?
> 
> So far it's hard to make an educated guess on the reasons why it is so much
> slower.

Its looks like this

The type i'm searching for it a.b.c.Value and the hierarchy for that looks like this

a.b.c.Value
|-- a.b.c.ValueA
            |-- a.b.c.ValueAImpl
|-- a.b.c.ValueB
            |-- a.b.c.ValueBImpl
|-- a.b.c.ValueC
            |-- a.b.c.ValueCImpl
|-- a.b.c.ValueImpl
            |-- a.b.c.ValueAImpl
            |-- a.b.c.ValueBImpl
            |-- a.b.c.ValueCImpl


Now when we look for simple name type in works space which match "Value" I get nearly 70 classes + interfaces. And some of these classes has their own hierarchies as well.

So as you can see since we don't use a qualification but only simple name, the hierarchy expansion can further expand with unwanted types when they have more and more subtypes.

But if we used package fragment, I will be only searching on indexes where a.b.c is used. And in a situation like mind I will be only searching on the index for my project only where the expansion will be happening.


Hope you understand why it takes more time with the simple name
Comment 131 Sebastian Zarnekow CLA 2021-04-07 08:46:04 EDT
(In reply to Gayan Perera from comment #130)
> a.b.c.Value
> |-- a.b.c.ValueA
>             |-- a.b.c.ValueAImpl
> |-- a.b.c.ValueB
>             |-- a.b.c.ValueBImpl
> |-- a.b.c.ValueC
>             |-- a.b.c.ValueCImpl
> |-- a.b.c.ValueImpl
>             |-- a.b.c.ValueAImpl
>             |-- a.b.c.ValueBImpl
>             |-- a.b.c.ValueCImpl
> 
> 
> Now when we look for simple name type in works space which match "Value" I
> get nearly 70 classes + interfaces. And some of these classes has their own
> hierarchies as well.

Does that mean that you have 70+ types that do have the exact simple name `Value` and that do have subtypes across the entire "population" of indices available? Or do you perform a prefix search instead of an exact search? Or is it the case that you have multiple `ValueA`, `ValueB`, `ValueAImpl` and the number of searched indexes is so big because of that?
Comment 132 Gayan Perera CLA 2021-04-07 09:03:39 EDT
(In reply to Sebastian Zarnekow from comment #131)
> Does that mean that you have 70+ types that do have the exact simple name
> `Value` and that do have subtypes across the entire "population" of indices
> available? Or do you perform a prefix search instead of an exact search? Or
> is it the case that you have multiple `ValueA`, `ValueB`, `ValueAImpl` and
> the number of searched indexes is so big because of that?


Yes in my workspace i have 70+ types which has the exact name Value, most of them are coming from jackson, xstream, apache.commons, hibernate etc.

Now the initial search will first try to find all indexes which has the simple name of "Value" which will endup with indexes

- jackson
- apache.common
- xstream
- hibernate
- myproject

and some more if you have subtypes of such 70+ Value classes

Then the SubType search will search for types which has "Value" as the super class or interface.

This will give class such as

- ValueABC from hibernate index
- ValueTYX from jackson
- ValueA, ValueB, ValueC, ValueImpl from myproject

Then the subtype search will use all these types, ValueABC, ValueTYX, ValueA, ValueB, ValueC, ValueImpl to find their subtypes.

Now you see how it grows right ?
Comment 133 Sebastian Zarnekow CLA 2021-04-07 09:50:27 EDT
(In reply to Gayan Perera from comment #132)
> This will give class such as
> 
> - ValueABC from hibernate index
> - ValueTYX from jackson
> - ValueA, ValueB, ValueC, ValueImpl from myproject
> 
> Then the subtype search will use all these types, ValueABC, ValueTYX,
> ValueA, ValueB, ValueC, ValueImpl to find their subtypes.
> 
> Now you see how it grows right ?

If you are searching for subtypes of myProject.Value - why does it continue to query for hibernates ValueABC if that is not a subtype of myProject.Value?
Comment 134 Gayan Perera CLA 2021-04-07 09:55:20 EDT
(In reply to Sebastian Zarnekow from comment #133)
> (In reply to Gayan Perera from comment #132)
> > This will give class such as
> > 
> > - ValueABC from hibernate index
> > - ValueTYX from jackson
> > - ValueA, ValueB, ValueC, ValueImpl from myproject
> > 
> > Then the subtype search will use all these types, ValueABC, ValueTYX,
> > ValueA, ValueB, ValueC, ValueImpl to find their subtypes.
> > 
> > Now you see how it grows right ?
> 
> If you are searching for subtypes of myProject.Value - why does it continue
> to query for hibernates ValueABC if that is not a subtype of myProject.Value?

That is because the TypeHierarchy search use Workspacescope for searching. And event if we use ProjectScope, if those hibernate etc is my dependencies it ends up searching those types as well.

The reason why is search for other types is because the type resolution is done after the pattern matching. So basically even we find 1000 odd types in the pattern search job, all those are reduce based on focus type resolution which is done after the pattern search.

Basically JDT search first do a pattern search to find potential matches and then do a type resolution on them to narrow is down to accurate matches.
Comment 135 Sebastian Zarnekow CLA 2021-04-07 10:29:46 EDT
(In reply to Gayan Perera from comment #134)
> (In reply to Sebastian Zarnekow from comment #133)
> > (In reply to Gayan Perera from comment #132)
> > > This will give class such as
> > > 
> > > - ValueABC from hibernate index
> > > - ValueTYX from jackson
> > > - ValueA, ValueB, ValueC, ValueImpl from myproject
> > > 
> > > Then the subtype search will use all these types, ValueABC, ValueTYX,
> > > ValueA, ValueB, ValueC, ValueImpl to find their subtypes.
> > > 
> > > Now you see how it grows right ?
> > 
> > If you are searching for subtypes of myProject.Value - why does it continue
> > to query for hibernates ValueABC if that is not a subtype of myProject.Value?
> 
> That is because the TypeHierarchy search use Workspacescope for searching.
> And event if we use ProjectScope, if those hibernate etc is my dependencies
> it ends up searching those types as well.
> 
> The reason why is search for other types is because the type resolution is
> done after the pattern matching. So basically even we find 1000 odd types in
> the pattern search job, all those are reduce based on focus type resolution
> which is done after the pattern search.
> 
> Basically JDT search first do a pattern search to find potential matches and
> then do a type resolution on them to narrow is down to accurate matches.

Oh, that's sad.

Thank you for explaining this. I was living under the assumption that there would be a processing pipeline like -> candidates -> filter by resolution -> more candidates from the next layer. But in fact it is first creating a big result set to filter that one further. Understood.
Comment 136 Gayan Perera CLA 2021-04-08 04:03:23 EDT
@Sebastian @Vikas @Stephan should we drop the simple name solution and continue on package fragment solution. I have added support to track information other than super types to expand this meta index for other kind of search. I have not committed the code yet. 

But when i tried indexing with the cached environment it seems that resolving source when needed is not effecting index time.
Comment 137 Sebastian Zarnekow CLA 2021-04-08 04:26:56 EDT
(In reply to Gayan Perera from comment #136)
> @Sebastian @Vikas @Stephan should we drop the simple name solution and
> continue on package fragment solution.

Probably yes.

>  I have added support to track
> information other than super types to expand this meta index for other kind
> of search. I have not committed the code yet. 
> 
> But when i tried indexing with the cached environment it seems that
> resolving source when needed is not effecting index time.

What about the other extreme and *always* resolve during indexing to put the fully qualified name instead of the package fragment into the meta index?
Comment 138 Sebastian Zarnekow CLA 2021-04-08 04:29:08 EDT
(In reply to Sebastian Zarnekow from comment #137)
> What about the other extreme and *always* resolve during indexing to put the
> fully qualified name instead of the package fragment into the meta index?

Or we could work with a combination of both: Use the package fragments (with resolution only if strictly necessary) and the simple names, too. That way we should get more precise results from the meta index, too.

Just dumping ideas here since the final numbers are still not as great as I had hoped for.
Comment 139 Stephan Herrmann CLA 2021-04-08 04:59:15 EDT
Re simple names: I wouldn't yet rule out this approach completely. We've seen a worst case where it doesn't help much, but that doesn't mean there's no benefit.

Re resolving: to make an educated decision we'd then need measurements of indexing times with / without resolving.

If relevant we could consider yet another combination: when indexes are built due to a user request, use current method without resolving and mark the index as unresolved. When the IDE is idle (above some threshold) rebuild indexes with resolving - one by one.

Or, could the meta index "borrow" information from org.eclipse.jdt.internal.core.builder.State and org.eclipse.jdt.internal.core.builder.ReferenceCollection:
* when is that information available?
* would that information help?
* how can it be accessed?
  * use it as a shortcut only when available - existing solution as fallback?
Comment 140 Gayan Perera CLA 2021-04-08 06:32:33 EDT
(In reply to Sebastian Zarnekow from comment #137)
> (In reply to Gayan Perera from comment #136)
> > @Sebastian @Vikas @Stephan should we drop the simple name solution and
> > continue on package fragment solution.
> 
> Probably yes.
> 
> >  I have added support to track
> > information other than super types to expand this meta index for other kind
> > of search. I have not committed the code yet. 
> > 
> > But when i tried indexing with the cached environment it seems that
> > resolving source when needed is not effecting index time.
> 
> What about the other extreme and *always* resolve during indexing to put the
> fully qualified name instead of the package fragment into the meta index?

One downside using FQN in metaindex is that we will have more finegrain index entries than using package fragments and simple names.
Comment 141 Gayan Perera CLA 2021-04-08 06:34:48 EDT
(In reply to Sebastian Zarnekow from comment #138)
> (In reply to Sebastian Zarnekow from comment #137)
> > What about the other extreme and *always* resolve during indexing to put the
> > fully qualified name instead of the package fragment into the meta index?
> 
> Or we could work with a combination of both: Use the package fragments (with
> resolution only if strictly necessary) and the simple names, too. That way
> we should get more precise results from the meta index, too.
> 
> Just dumping ideas here since the final numbers are still not as great as I
> had hoped for.

Sure we can see how we can use both, because for binary indexes we always have a qualified references, this problem is there only for source indexes. So may be we could treat these two differently.

But before that i would like to produce some stats about indexing with and with resolution to see how much of a difference we are looking at. Because if its like 1-2 sec difference i don't think it might be a big issue.
Comment 142 Gayan Perera CLA 2021-04-08 06:37:01 EDT
(In reply to Stephan Herrmann from comment #139)
> Re simple names: I wouldn't yet rule out this approach completely. We've
> seen a worst case where it doesn't help much, but that doesn't mean there's
> no benefit.
> 
> Re resolving: to make an educated decision we'd then need measurements of
> indexing times with / without resolving.
> 
> If relevant we could consider yet another combination: when indexes are
> built due to a user request, use current method without resolving and mark
> the index as unresolved. When the IDE is idle (above some threshold) rebuild
> indexes with resolving - one by one.

This will complex the IDE indexing mechanism and the functionality based on the indexes as well isn't it Stephan ?

> 
> Or, could the meta index "borrow" information from
> org.eclipse.jdt.internal.core.builder.State and
> org.eclipse.jdt.internal.core.builder.ReferenceCollection:
> * when is that information available?
> * would that information help?
> * how can it be accessed?
>   * use it as a shortcut only when available - existing solution as fallback?

But will the builder state will hold information for all source projects ? i assume it will only keep information about the recent built project right ?
Comment 143 Stephan Herrmann CLA 2021-04-08 09:29:33 EDT
(In reply to Gayan Perera from comment #142)
> (In reply to Stephan Herrmann from comment #139)
> > Or, could the meta index "borrow" information from
> > org.eclipse.jdt.internal.core.builder.State and
> > org.eclipse.jdt.internal.core.builder.ReferenceCollection:
> > * when is that information available?
> > * would that information help?
> > * how can it be accessed?
> >   * use it as a shortcut only when available - existing solution as fallback?
> 
> But will the builder state will hold information for all source projects ? i
> assume it will only keep information about the recent built project right ?

I just dug out a VERY interesting read for you: bug 102279 
Please scan the discussion in the bug and have a look at commit https://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=30b862416cba759ca39b029e0f535018291e11dc
Comment 144 Gayan Perera CLA 2021-04-08 09:34:18 EDT
(In reply to Stephan Herrmann from comment #143) 
> I just dug out a VERY interesting read for you: bug 102279 
> Please scan the discussion in the bug and have a look at commit
> https://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/
> ?id=30b862416cba759ca39b029e0f535018291e11dc

Sure @Stephan i will have a look
Comment 145 Gayan Perera CLA 2021-04-08 09:36:42 EDT
I ran both master and package fragment branch of my solution to measure indexing time. I captured index time by using visualvm sampling with 20ms rate. 

Master indexing: 352730ms
My branch indexing: 250633ms

My branch contains resolutions with named environment caching.
Comment 146 Stephan Herrmann CLA 2021-04-08 09:55:02 EDT
(In reply to Gayan Perera from comment #145)
> I ran both master and package fragment branch of my solution to measure
> indexing time. I captured index time by using visualvm sampling with 20ms
> rate. 
> 
> Master indexing: 352730ms
> My branch indexing: 250633ms
> 
> My branch contains resolutions with named environment caching.

So, in both cases numbers are way too high.

It's hard to believe that indexing with resolving should be faster than indexing without resolving (where for the latter case NameEnvironments should rarely be used). For that reason a bit more background would help us understand your measurement: 
* How did you invoke indexing? 
* Was the workspace warmed up (e.g., by performing the same operation several times)?
* What is the impact of name environment caching (compare your branch with vs without NE caching - what about master plus NE caching)?
Comment 147 Stephan Herrmann CLA 2021-04-08 10:03:28 EDT
(In reply to Stephan Herrmann from comment #143)
> I just dug out a VERY interesting read for you: bug 102279 
> Please scan the discussion in the bug and have a look at commit
> https://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/
> ?id=30b862416cba759ca39b029e0f535018291e11dc

Since all this is about searching, here is the pretty short search path towards the above:

Given: dependency information is maintained in org.eclipse.jdt.internal.core.builder.State

1) Search for references to the above class

2) Locate the single search hit in source folder "search"

3) Team > Show Revision Information

Voila: bug 102279 :)

Unfortunatly, English doesn't have a succinct equivalent of the German "Hilfe zur Selbsthilfe" :)
Comment 148 Andrey Loskutov CLA 2021-04-08 10:27:09 EDT
(In reply to Stephan Herrmann from comment #147)
> Unfortunatly, English doesn't have a succinct equivalent of the German
> "Hilfe zur Selbsthilfe" :)

But Russian: "Спасение утопающих - дело рук самих утопающих" :-)
Comment 149 Gayan Perera CLA 2021-04-08 10:30:06 EDT
(In reply to Stephan Herrmann from comment #146)
> (In reply to Gayan Perera from comment #145)
> > I ran both master and package fragment branch of my solution to measure
> > indexing time. I captured index time by using visualvm sampling with 20ms
> > rate. 
> > 
> > Master indexing: 352730ms
> > My branch indexing: 250633ms
> > 
> > My branch contains resolutions with named environment caching.
> 
> So, in both cases numbers are way too high.
> 
> It's hard to believe that indexing with resolving should be faster than
> indexing without resolving (where for the latter case NameEnvironments
> should rarely be used). For that reason a bit more background would help us
> understand your measurement: 
> * How did you invoke indexing? 
> * Was the workspace warmed up (e.g., by performing the same operation
> several times)?
> * What is the impact of name environment caching (compare your branch with
> vs without NE caching - what about master plus NE caching)?


This is the normal time for my workspace since i have large multi module projects with lot of dependencies. 

I invoke indexing from preference window by rebuild index. 

I do rebuild as soon as eclipse is started so its a cold eclipse instance. 

Same with without name environment caching is : 986019ms

Sampling time with 20ms rate in visualvm
Comment 150 Stephan Herrmann CLA 2021-04-08 11:39:15 EDT
(In reply to Andrey Loskutov from comment #148)
> (In reply to Stephan Herrmann from comment #147)
> > Unfortunatly, English doesn't have a succinct equivalent of the German
> > "Hilfe zur Selbsthilfe" :)
> 
> But Russian: "Спасение утопающих - дело рук самих утопающих" :-)

That's not exactly what I meant :)
Comment 151 Gayan Perera CLA 2021-04-08 12:23:59 EDT
More information about how much cost on indexResolvedDocument is

master : 220596ms
package_fragment_no_caching : 396591ms
package_fragment_with_caching : 31864ms


So you see the named environment caching is helping a lot on indexing resolved documents. 

I'm not sure if we really want to go for more complex solutions for this or depend on some like build state which is not reliable according the org.eclipse.jdt.internal.core.search.IndexSelector.canSeeFocus(IJavaElement, JavaProject, char[][][]).

If we got in that direction we will get bugs like "time to time the type heirarchy build time fluctuate vastly"

Because in my workspace which i have given the above stats,

TH build times looks like this:

== SimpleName search ============================
total meta index search time: 222ms
meta index search time(ms) pcts (50,80,100): [0. 0. 2.]
search iteration index selection pcts (50,80,100): [ 16.  17. 132.]
search iteration total indexes pcts (50,80,100): [580. 580. 580.]
search iteration time(ms) pcts (50,80,100): [ 0.  1. 68.]
search iteration cumulative time(ms) pcts (50,80,100): [  0.   0. 258.]
iteration size: 21872
total time: 13229ms
=================================================
== Package Fragement search =====================
total meta index search time: 5ms
meta index search time(ms) pcts (50,80,100): [0. 0. 1.]
search iteration index selection pcts (50,80,100): [18. 19. 19.]
search iteration total indexes pcts (50,80,100): [572. 572. 572.]
search iteration time(ms) pcts (50,80,100): [1. 1. 6.]
search iteration cumulative time(ms) pcts (50,80,100): [0.  0.6 9. ]
iteration size: 33
total time: 269ms
=================================================
== Master search ================================
search iteration time(ms) pcts (50,80,100): [  1.   1. 127.]
search iteration cumulative time(ms) pcts (50,80,100): [  6.   7. 858.]
iteration size: 21872
total time: 28997ms
=================================================

You can clearly see the difference right ? Both SimpleName and Master seems to run same number if iterations due to they find unwanted types which they drop later as i have described to Sebastian earlier. The simple name has a reduce time because the number indexes it search in each iteration are less.

But see the difference in package fragment search. it always runs with nearly constant amount of indexes and the iterations are very closer to the number of classes i find in the hierarchy. And the time i see is same as in IJ and Eclipse take much much less time in indexing compared to IJ for the same workspace.

Let me know what you think so I can continue on once path. For me i think the package fragment approach gives better results and less index times with more accurate results.
Comment 152 Stephan Herrmann CLA 2021-04-08 13:00:45 EDT
(In reply to Gayan Perera from comment #151)
> More information about how much cost on indexResolvedDocument is
> 
> master : 220596ms
> package_fragment_no_caching : 396591ms
> package_fragment_with_caching : 31864ms
> 
> 
> So you see the named environment caching is helping a lot on indexing
> resolved documents. 

I see, and I think there was agreement that NE caching is worthwhile on its own right. Did you file a separate bug for that? I'd prefer to focus on one improvement at a time. If we bring NE caching to master first, that topic would be covered once and for all. 

OTOH I'm not 100% sure what the above figures mean. 
- this is about indexing not searching, right?
- is master actually resolving a huge number of CUs, or did you force resolving somehow?
- (what would be the result on master with NE caching?)

> I'm not sure if we really want to go for more complex solutions for this or
> depend on some like build state which is not reliable according the
> org.eclipse.jdt.internal.core.search.IndexSelector.canSeeFocus(IJavaElement,
> JavaProject, char[][][]).

If unreliable state is the reason for not choosing this approach, then we should have a closer look at what kind of unreliability this is. People didn't see it as a reason against the solution in bug 102279.
The complexity could actually be quite low, given that we have a precedent. Read: Perhaps the State could fulfill the role of the meta index without much ado.


> Because in my workspace which i have given the above stats,
> 
> TH build times looks like this:

numbers look nice, just keep in mind, the solution has to be good for all use cases, not just your use case which I consider extreme (as mentioned above).

And to make sure we're on the same page: in this experiment "package fragment" means you are using a meta index of package fragments built from resolved compilation units? If resolved, why say package *fragment*? Isn't it complete package names? (and IPackageFragment is a different concept, still :) ).
Comment 153 Gayan Perera CLA 2021-04-08 13:34:23 EDT
(In reply to Stephan Herrmann from comment #152)
> (In reply to Gayan Perera from comment #151)
> > More information about how much cost on indexResolvedDocument is
> > 
> > master : 220596ms
> > package_fragment_no_caching : 396591ms
> > package_fragment_with_caching : 31864ms
> > 
> > 
> > So you see the named environment caching is helping a lot on indexing
> > resolved documents. 
> 
> I see, and I think there was agreement that NE caching is worthwhile on its
> own right. Did you file a separate bug for that? I'd prefer to focus on one
> improvement at a time. If we bring NE caching to master first, that topic
> would be covered once and for all.
No i have not reported a separate bug for it yet to merge into master.
 
 
> OTOH I'm not 100% sure what the above figures mean. 
> - this is about indexing not searching, right?
> - is master actually resolving a huge number of CUs, or did you force
> resolving somehow?
> - (what would be the result on master with NE caching?)
I have not added NE on master to see the difference, but you can get an rough idea about it from the above state. My workspace has lambdas a lot, so it does lot of resolving. 

> 
> > I'm not sure if we really want to go for more complex solutions for this or
> > depend on some like build state which is not reliable according the
> > org.eclipse.jdt.internal.core.search.IndexSelector.canSeeFocus(IJavaElement,
> > JavaProject, char[][][]).
> 
> If unreliable state is the reason for not choosing this approach, then we
> should have a closer look at what kind of unreliability this is. People
> didn't see it as a reason against the solution in bug 102279.
> The complexity could actually be quite low, given that we have a precedent.
> Read: Perhaps the State could fulfill the role of the meta index without
> much ado.

if we use this approach we endup running this on each iteration to find the indexes to narrow down based on dependencies, isn't it much costly than the using a proper meta index. And canSeeFocus is based on IJavaElement, and during each iteration of hierarchy builder we don't have JavaElements for found types in each level to perform this again and again.

I mean by unreliable from the code it self, canBeSeen is evaluated only if build state is present. otherwise it just returns PROJECT_CAN_SEE_FOCUS as a fallback. So if the project is not built after opening a workspace, we end up is searching lot of indexes unwanted indexes. i can see from my results it self the IndexSelector has 580 total indexes after filtering through canBeSeen which should be less if it was working as we thought it should be.
 
Thread[ModalContext,6,main] -> selected 26 indexes out of total indexes 580 after qualify filtering - org.eclipse.jdt.internal.core.search.IndexSelector@114fdca4

the 580 is the indexes that IndexSelector selected after running running through org.eclipse.jdt.internal.core.search.IndexSelector.initializeIndexLocations()







> 
> 
> > Because in my workspace which i have given the above stats,
> > 
> > TH build times looks like this:
> 
> numbers look nice, just keep in mind, the solution has to be good for all
> use cases, not just your use case which I consider extreme (as mentioned
> above).
> 
> And to make sure we're on the same page: in this experiment "package
> fragment" means you are using a meta index of package fragments built from
> resolved compilation units? If resolved, why say package *fragment*? Isn't
> it complete package names? (and IPackageFragment is a different concept,
> still :) ).
Comment 154 Gayan Perera CLA 2021-04-08 13:38:21 EDT
> 
> 
> > Because in my workspace which i have given the above stats,
> > 
> > TH build times looks like this:
> 
> numbers look nice, just keep in mind, the solution has to be good for all
> use cases, not just your use case which I consider extreme (as mentioned
> above).

My workspace i used is a type hierarchy which is only in a single project index. I also have posted data from a difference workspace which gives results trying to build a hierarchy for java.lang.Runnable which touches lot of source and binary indexes. i can post them as well.




> 
> And to make sure we're on the same page: in this experiment "package
> fragment" means you are using a meta index of package fragments built from
> resolved compilation units? If resolved, why say package *fragment*? Isn't
> it complete package names? (and IPackageFragment is a different concept,
> still :) ).

Yes it's complete package names. Thanks for pointing out :).
Comment 155 Gayan Perera CLA 2021-04-08 14:43:14 EDT
I looked at more closer to package meta index solution, it seems there are some scenarios i have not covered on the resolution aspect. For example deeply nested anon classes in methods. if such class required resolution we need to traverse the whole resolved CU and each statement to be sure we capture all.

Thanks @Stephan for always opening my eyes by being so hard to please :). Its really good you ask those questions, the reason is you have more gutt experience in JDT codebase that you know something is fishy.

I will try to see if we can narrow down our simple name solution with more predicates if possible.
Comment 156 Gayan Perera CLA 2021-04-11 08:06:56 EDT
@Stephan and @Sebastian i found an issue while trying have qualified names for binary types and simple names for source types. But i problem with that approach is to resolve qualified type reference for lambda expressions in binary types. Is there a way we can resolve methods returning lambda types and variables which are initialized with lambda types ? 

We can also use both super and normal ref reference for super type search. But then we will most of the time search for both types of references. Which brings us the question do we need to split the meta index reference type as super and normal.
Comment 157 Gayan Perera CLA 2021-04-11 09:35:18 EDT
(In reply to Gayan Perera from comment #156)
> @Stephan and @Sebastian i found an issue while trying have qualified names
> for binary types and simple names for source types. But i problem with that
> approach is to resolve qualified type reference for lambda expressions in
> binary types. Is there a way we can resolve methods returning lambda types
> and variables which are initialized with lambda types ? 
> 
> We can also use both super and normal ref reference for super type search.
> But then we will most of the time search for both types of references. Which
> brings us the question do we need to split the meta index reference type as
> super and normal.

Did some more digging and found out lambda expressions doesn’t come as part of super type reference search at all in binary indexes. Because for lambda there are no super-refs generation. So we don’t need to support it in meta index either for now. Will report a seperate issue for that to digg in more of the possibility.
Comment 158 Gayan Perera CLA 2021-04-12 14:06:08 EDT
@Stephan and @Sebastian the latest patchset on simple name based solution stats looks as follows

== Master search ================================
search iteration time(ms) pcts (50,80,100): [  2.   2. 124.]
search iteration cumulative time(ms) pcts (50,80,100): [  6.   8. 794.]
iteration size: 2649
total time: 5152ms
=================================================

== Previous SimpleName search ============================
total meta index search time: 91ms
meta index search time(ms) pcts (50,80,100): [0. 0. 1.]
search iteration index selection pcts (50,80,100): [ 17.  18. 462.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  0.   1. 126.]
search iteration cumulative time(ms) pcts (50,80,100): [  0.   1. 491.]
iteration size: 2649
total time: 2064ms
=================================================

== Improved SimpleName  search ================================
total meta index search time: 57ms
meta index search time(ms) pcts (50,80,100): [0. 0. 1.]
search iteration index selection pcts (50,80,100): [ 17.  18. 182.]
search iteration total indexes pcts (50,80,100): [578. 578. 578.]
search iteration time(ms) pcts (50,80,100): [  1.   1. 100.]
search iteration cumulative time(ms) pcts (50,80,100): [  0.   1. 451.]
iteration size: 760
total time: 1153ms
=================================================
Comment 159 Gayan Perera CLA 2021-04-13 05:46:05 EDT
Checked the latest patch with my other workspace where the whole subtype tree was in one project. And the latest change gives me instant results which is around 298ms.
Comment 160 Sebastian Zarnekow CLA 2021-04-13 05:50:58 EDT
(In reply to Gayan Perera from comment #159)
> Checked the latest patch with my other workspace where the whole subtype
> tree was in one project. And the latest change gives me instant results
> which is around 298ms.

Nice nice!

Do you still see potential for the workspace with the large number of simple name conflicts?

I can imagine that it will greatly improve performance if the flow for the type hierarchy would be adjusted such that intermediate layers are resolved and only the matching subtypes are used further along to find the next level in the hierarchy. But that'd be likely an own ticket.
Comment 161 Sebastian Zarnekow CLA 2021-04-13 05:54:04 EDT
@Gayan
Would you please be so kind and align the content of the wiki page with your latest findings and briefly outline the differences between the various attempts that you made.

Meanwhile I plan to read through the code and provide feedback.
Comment 162 Gayan Perera CLA 2021-04-13 05:55:12 EDT
(In reply to Sebastian Zarnekow from comment #160)
> 
> Nice nice!
> 
> Do you still see potential for the workspace with the large number of simple
> name conflicts?
> 
> I can imagine that it will greatly improve performance if the flow for the
> type hierarchy would be adjusted such that intermediate layers are resolved
> and only the matching subtypes are used further along to find the next level
> in the hierarchy. But that'd be likely an own ticket.

The last result was in the workspace with name conflicts. What i did was, i use qualified names for binary indexes and simple name for source indexes. In workspace the number of binary indexes are higher most of the time compared to number of source indexes. So this should give better results most of the time. 

Please check the patch and let me know your feedback.
Comment 163 Gayan Perera CLA 2021-04-13 05:56:16 EDT
(In reply to Sebastian Zarnekow from comment #161)
> @Gayan
> Would you please be so kind and align the content of the wiki page with your
> latest findings and briefly outline the differences between the various
> attempts that you made.
> 
> Meanwhile I plan to read through the code and provide feedback.

Sure i will do it in the evening with latest solution and how we ended up there.
Comment 164 Gayan Perera CLA 2021-04-16 14:50:54 EDT
@Sebastian @Stephan @Vikas i have update the wiki page as well, please look at the patch and let me know if something more needs to be changed, i have done the changes that Stebastian as pointed out.
Comment 165 Sebastian Zarnekow CLA 2021-04-28 08:37:27 EDT
(In reply to Gayan Perera from comment #164)
> @Sebastian @Stephan @Vikas i have update the wiki page as well, please look
> at the patch and let me know if something more needs to be changed, i have
> done the changes that Stebastian as pointed out.

Sorry for radio silence the last days. I got trapped in other work.
I plan to try the patch later this week / beginning next week. I'll keep you posted.
Comment 166 Gayan Perera CLA 2021-05-01 14:52:33 EDT
Created attachment 286287 [details]
Stat Scrips Python
Comment 167 Gayan Perera CLA 2021-05-01 14:54:45 EDT
(In reply to Gayan Perera from comment #166)
> Created attachment 286287 [details]
> Stat Scrips Python

@Sebastian you can use the attached python scripts to analyze the verbose output of the 

debug/indexmanager/ and debug/indexmanager/advance
Comment 168 Gayan Perera CLA 2021-05-18 14:15:38 EDT
@Sebastian @Stephan @Manoj @Vikas i have added this to 4.21-M1 so that we can merge this to master for further testing since the change is in the heart of JDT Indexing/Searching.
Comment 170 Eclipse Genie CLA 2021-06-12 03:00:41 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/181858
Comment 172 Andrey Loskutov CLA 2021-06-12 05:25:36 EDT
(In reply to Eclipse Genie from comment #171)
> Gerrit change https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/181858 was
> merged to [master].
> Commit:
> http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/
> ?id=c2d6207a81bedea241579f33a6bc55967920dbaf

That should fix releng error about broken javadoc, see https://download.eclipse.org/eclipse/downloads/drops4/I20210611-2000/testresults/html/org.eclipse.releng.tests_ep421I-unit-cen64-gtk3-java11_linux.gtk.x86_64_11.html
Comment 173 Gayan Perera CLA 2021-06-12 05:54:43 EDT
Thanks @Andrey
Comment 174 Gayan Perera CLA 2021-06-12 05:55:30 EDT
@Vikas should we change the issue status ? Its still in NEW status
Comment 175 Andrey Loskutov CLA 2021-06-12 06:41:06 EDT
(In reply to Gayan Perera from comment #174)
> @Vikas should we change the issue status ? Its still in NEW status

I typically wait till next SDK nightly build test results and resolve if there are no new related regressions and I see that the new feature/bug fix works for me. Javadoc is fixed now, so you can resolve tomorrow if the new functionality works as expected and without regressions in the IDE.
Comment 176 Andrey Loskutov CLA 2021-06-12 15:11:27 EDT
Please check DNF tests, at least one from them seem to be dorectly related to this bug.

https://download.eclipse.org/eclipse/downloads/drops4/I20210611-2000/testResults.php

https://download.eclipse.org/eclipse/downloads/drops4/I20210611-2000/testresults/ep421I-unit-cen64-gtk3-java16_linux.gtk.x86_64_16/org.eclipse.ui.tests.navigator.NavigatorTestSuite.txt

Herewe wait on monitor in main thread:

java.lang.Exception: ThreadDump for thread "main"
	at java.base@16.0.1/java.lang.Object.wait(Native Method)
	at java.base@16.0.1/java.lang.Object.wait(Object.java:320)
	at org.eclipse.jdt.internal.core.search.indexing.ReadWriteMonitor.enterWrite(ReadWriteMonitor.java:49)
	at org.eclipse.jdt.internal.core.search.indexing.IndexManager.updateMetaIndex(IndexManager.java:1489)
	at org.eclipse.jdt.internal.core.search.indexing.IndexManager.removeIndex(IndexManager.java:860)
	at org.eclipse.jdt.internal.core.JavaModelManager$PerProjectInfo.forgetExternalTimestampsAndIndexes(JavaModelManager.java:1319)
	at org.eclipse.jdt.internal.core.JavaModelManager.removePerProjectInfo(JavaModelManager.java:4286)
	at org.eclipse.jdt.internal.core.DeltaProcessor.checkProjectsAndClasspathChanges(DeltaProcessor.java:503)
	at org.eclipse.jdt.internal.core.DeltaProcessor.checkProjectsAndClasspathChanges(DeltaProcessor.java:591)
	at org.eclipse.jdt.internal.core.DeltaProcessor.resourceChanged(DeltaProcessor.java:2112)
	at org.eclipse.jdt.internal.core.DeltaProcessingState.resourceChanged(DeltaProcessingState.java:478)
	at org.eclipse.core.internal.events.NotificationManager$1.run(NotificationManager.java:305)
	at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
	at org.eclipse.core.internal.events.NotificationManager.notify(NotificationManager.java:295)
	at org.eclipse.core.internal.events.NotificationManager.broadcastChanges(NotificationManager.java:158)
	at org.eclipse.core.internal.resources.Workspace.broadcastPostChange(Workspace.java:381)
	at org.eclipse.core.internal.resources.Workspace.endOperation(Workspace.java:1503)
	at org.eclipse.core.internal.resources.Resource.delete(Resource.java:791)
	at org.eclipse.core.internal.resources.Project.delete(Project.java:326)
	at org.eclipse.ui.tests.harness.util.FileUtil.delete(FileUtil.java:82)
	at org.eclipse.ui.tests.navigator.NavigatorTestBase.clearAll(NavigatorTestBase.java:283)
	at org.eclipse.ui.tests.navigator.NavigatorTestBase.tearDown(NavigatorTestBase.java:270)

Similar stack also here:

https://download.eclipse.org/eclipse/downloads/drops4/I20210611-2000/testresults/ep421I-unit-cen64-gtk3-java16_linux.gtk.x86_64_16/org.eclipse.pde.ui.tests.AllPDETests.txt

java.lang.Exception: ThreadDump for thread "main"
	at java.base@16.0.1/java.lang.Object.wait(Native Method)
	at java.base@16.0.1/java.lang.Object.wait(Object.java:320)
	at org.eclipse.jdt.internal.core.search.indexing.ReadWriteMonitor.enterWrite(ReadWriteMonitor.java:49)
	at org.eclipse.jdt.internal.core.search.indexing.IndexManager.updateMetaIndex(IndexManager.java:1489)
	at org.eclipse.jdt.internal.core.search.indexing.IndexManager.removeIndex(IndexManager.java:860)
	at org.eclipse.jdt.internal.core.JavaModelManager$PerProjectInfo.forgetExternalTimestampsAndIndexes(JavaModelManager.java:1319)
	at org.eclipse.jdt.internal.core.JavaModelManager.removePerProjectInfo(JavaModelManager.java:4286)
	at org.eclipse.jdt.internal.core.DeltaProcessor.checkProjectsAndClasspathChanges(DeltaProcessor.java:503)
	at org.eclipse.jdt.internal.core.DeltaProcessor.checkProjectsAndClasspathChanges(DeltaProcessor.java:591)
	at org.eclipse.jdt.internal.core.DeltaProcessor.resourceChanged(DeltaProcessor.java:2112)
	at org.eclipse.jdt.internal.core.DeltaProcessingState.resourceChanged(DeltaProcessingState.java:478)
	at org.eclipse.core.internal.events.NotificationManager$1.run(NotificationManager.java:305)
	at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
	at org.eclipse.core.internal.events.NotificationManager.notify(NotificationManager.java:295)
	at org.eclipse.core.internal.events.NotificationManager.broadcastChanges(NotificationManager.java:158)
	at org.eclipse.core.internal.resources.Workspace.broadcastPostChange(Workspace.java:381)
	at org.eclipse.core.internal.resources.Workspace.endOperation(Workspace.java:1503)
	at org.eclipse.core.internal.resources.Resource.delete(Resource.java:791)
	at org.eclipse.core.internal.resources.Project.delete(Project.java:326)
	at org.eclipse.pde.ui.tests.ee.ExecutionEnvironmentTests.deleteProject(ExecutionEnvironmentTests.java:51)
	at org.eclipse.pde.ui.tests.ee.ExecutionEnvironmentTests.testJava4Environment(ExecutionEnvironmentTests.java:184)
Comment 177 Andrey Loskutov CLA 2021-06-12 15:44:10 EDT
Created attachment 286580 [details]
pde deadlock
Comment 178 Andrey Loskutov CLA 2021-06-12 15:44:42 EDT
Created attachment 286581 [details]
navigator deadlock
Comment 179 Gayan Perera CLA 2021-06-13 04:51:28 EDT
Should i create a new gerrit for these deadlocks in this issue ?
Comment 180 Andrey Loskutov CLA 2021-06-13 05:50:07 EDT
(In reply to Gayan Perera from comment #179)
> Should i create a new gerrit for these deadlocks in this issue ?

Sure, we need a fix ASAP, tests today again deadlocked.
Comment 181 Gayan Perera CLA 2021-06-13 07:12:35 EDT
(In reply to Andrey Loskutov from comment #180)
> (In reply to Gayan Perera from comment #179)
> > Should i create a new gerrit for these deadlocks in this issue ?
> 
> Sure, we need a fix ASAP, tests today again deadlocked.

Working on a fix right now. Will be able to push late evening.
Comment 182 Gayan Perera CLA 2021-06-13 07:15:16 EDT
(In reply to Gayan Perera from comment #181)
> (In reply to Andrey Loskutov from comment #180)
> > (In reply to Gayan Perera from comment #179)
> > > Should i create a new gerrit for these deadlocks in this issue ?
> > 
> > Sure, we need a fix ASAP, tests today again deadlocked.
> 
> Working on a fix right now. Will be able to push late evening.

Is there away i can reproduce this deadlock locally? Tried running the pde test but it ran fine. But I’m working on a potential fix which will move all meta index operations to indexer thread to avoid race conditions.
Comment 183 Andrey Loskutov CLA 2021-06-13 10:54:38 EDT
(In reply to Gayan Perera from comment #181)
> (In reply to Andrey Loskutov from comment #180)
> > (In reply to Gayan Perera from comment #179)
> > > Should i create a new gerrit for these deadlocks in this issue ?
> > 
> > Sure, we need a fix ASAP, tests today again deadlocked.
> 
> Working on a fix right now. Will be able to push late evening.

Thanks. I've created dedicated bug 574171 for this, please use that for commit message.
Comment 184 Andrey Loskutov CLA 2021-06-13 11:05:41 EDT
@Gayan: I've also created bug 574172 for the new test failure which is most likely regression from this bug, please check that too.
Comment 185 Eclipse Genie CLA 2021-06-13 13:41:27 EDT
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/181874
Comment 187 Vikas Chandra CLA 2021-06-14 05:09:28 EDT
(In reply to Andrey Loskutov from comment #172)
> (In reply to Eclipse Genie from comment #171)
> > Gerrit change https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/181858 was
> > merged to [master].
> > Commit:
> > http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/
> > ?id=c2d6207a81bedea241579f33a6bc55967920dbaf
> 
> That should fix releng error about broken javadoc, see
> https://download.eclipse.org/eclipse/downloads/drops4/I20210611-2000/
> testresults/html/org.eclipse.releng.tests_ep421I-unit-cen64-gtk3-
> java11_linux.gtk.x86_64_11.html

I think we should change "Malformed javadoc comment" to 'Error' in every project.
I have created Bug 574183 for this. Also early M1 is a good time for this ( which we are in currently).
Comment 188 Manoj N Palat CLA 2021-07-09 04:13:14 EDT
Bulk move out of 4.21 M1; Owner/QA please readjust the target
Comment 189 Andrey Loskutov CLA 2021-07-12 01:44:15 EDT
We should be now fine, the new meta index is enabled now, no tests deadlock anymore. Thanks Gayan.
Comment 190 Andrey Loskutov CLA 2021-07-12 15:16:18 EDT
I believe we should have bumped index version for JDT, to trigger re-indexing for existing workspaces.

I've just had an issue that I was not able to find implementors for org.eclipse.jdt.core.ICodeAssist.codeSelect(int, int) method in my main wokrspace on the new SDK build, and right after I've re-created JDT index that was found.
Comment 191 Andrey Loskutov CLA 2021-07-12 15:29:30 EDT
(In reply to Andrey Loskutov from comment #190)
> I believe we should have bumped index version for JDT, to trigger
> re-indexing for existing workspaces.
> 
> I've just had an issue that I was not able to find implementors for
> org.eclipse.jdt.core.ICodeAssist.codeSelect(int, int) method in my main
> wokrspace on the new SDK build, and right after I've re-created JDT index
> that was found.

Stephan, Manoy: not sure how to do this. For workspace build state we have   org.eclipse.jdt.internal.core.builder.State.VERSION increment - but what is the right way to trigger re-indexing of everything *once*? Or is this one universal?
Comment 192 Stephan Herrmann CLA 2021-07-12 19:13:13 EDT
(In reply to Andrey Loskutov from comment #191)
> (In reply to Andrey Loskutov from comment #190)
> > I believe we should have bumped index version for JDT, to trigger
> > re-indexing for existing workspaces.
> > 
> > I've just had an issue that I was not able to find implementors for
> > org.eclipse.jdt.core.ICodeAssist.codeSelect(int, int) method in my main
> > wokrspace on the new SDK build, and right after I've re-created JDT index
> > that was found.
> 
> Stephan, Manoy: not sure how to do this. For workspace build state we have  
> org.eclipse.jdt.internal.core.builder.State.VERSION increment - but what is
> the right way to trigger re-indexing of everything *once*? Or is this one
> universal?

builder.State is just for building.

For search, Manoj is the guru, but I guess that org.eclipse.jdt.internal.core.index.DiskIndex.SIGNATURE might be what you are looking for?
Comment 193 Gayan Perera CLA 2021-07-13 02:12:24 EDT
(In reply to Andrey Loskutov from comment #190)
> I believe we should have bumped index version for JDT, to trigger
> re-indexing for existing workspaces.
> 
> I've just had an issue that I was not able to find implementors for
> org.eclipse.jdt.core.ICodeAssist.codeSelect(int, int) method in my main
> wokrspace on the new SDK build, and right after I've re-created JDT index
> that was found.

The code is written in a way that if the meta index is not present or if a particular index is not captured in meta index, then those indexes are included in the search. 

Andrey could it be that you had a partially built meta index with previous changes ? May be we can try to re init the workspace with meta idex disabled and build all indexes and then restart with meta index enable to see how it works.
Comment 194 Gayan Perera CLA 2021-07-13 02:13:45 EDT
And the implementation search I don’t think it goes through type hierarchy builder right ? If that so the meta index will not get involved at all.
Comment 195 Andrey Loskutov CLA 2021-07-13 05:55:54 EDT
(In reply to Gayan Perera from comment #193)
> The code is written in a way that if the meta index is not present or if a
> particular index is not captured in meta index, then those indexes are
> included in the search. 

OK.

> Andrey could it be that you had a partially built meta index with previous
> changes ?

Not sure what is needed for that, but yes, I used my main workspace that I use for SDK work with almost every nightly SDK build.

(In reply to Gayan Perera from comment #194)
> And the implementation search I don’t think it goes through type hierarchy
> builder right ? If that so the meta index will not get involved at all.

I have no idea.

OK, let close this one if there is no need for index update. I assume my use case with workspace that was used with every nightly SDK build is not very common :-)
Comment 196 Gayan Perera CLA 2021-07-13 06:30:18 EDT
I will do a test to verify this Andrey and report a issue if we need to act on it.
Comment 197 Andrey Loskutov CLA 2021-07-13 11:37:54 EDT
*** Bug 574817 has been marked as a duplicate of this bug. ***
Comment 198 Al Bundy CLA 2021-07-14 05:59:52 EDT
In addition to #574817


I've tested the I-build with a copy of my workspace.

But it takes extrem long to index my projects.
But at least with the I build I see the index-info in the statusbar.
-> my biggest projects containing ~90000 files...

after ctrl+T was pressed and the index is running I can do nothing in eclipse.
At the beginning I've seen a red button in the statusbar to cancel the index or the type-hierarchy.
After I switched to another application an back to eclipse the button was gone while the index was still running and the hierarchie still not oben.

I'm not sure if the button was removed or is just overlayed from "Create type hierarchy on java.lang.Cloneable..."
Comment 199 Gayan Perera CLA 2021-07-14 06:20:02 EDT
I think you are seeing different problems here. Shall we focus on slow Type Hierarchy first. Can you let the indexing run as I mentioned earlier? 90000 is bit large workspace. So it might take some time. But that can be measured later comparing with a index build on same workspace with a previous release. 

So shall we first let the index build to run and see whether you see an improvement on Type Hierarchy?
Comment 200 Al Bundy CLA 2021-07-14 08:06:36 EDT
I think in my case this was due to non indexed files or because the of large workspace and because of the already mentioned freeze.

With an "empty" workspace it feels that 4.21 is as fast as 4.20.

What I did:
- create new workspace for both versions (because my workspace needs too long for this test)
- create new java-project
- create new class
- import java.io.Serializable
- select Serializable and press ctrl+T

after ctrl+T was pressed a progress in statusbar is shown.
-> this needs 5 seconds to complete
after this 5 seconds it needs 5 seconds more until the hierarchy is open.

In this 10 seconds I can do nothing in eclipse.
If I press the red button in the statusbar (left of the progress) to cancel the current operation I can work with eclipse again.
-> But this works only in the first 5 seconds of course :-)

If you swithch to another app while the progress is shown and the button is visible both are vanished if you come back to eclipse.
-> in my case it's enough click on the windows task-bar and after that on the eclipse statusbar.
Comment 201 Gayan Perera CLA 2021-07-14 08:57:55 EDT
@AI i see you concern from a UX point of view. But for that could you report a separate issue with a screen recording if possible. 

But with your large workspace could you try to open TH after indexing is finished to see if 4.21 is improved. Keep in mind you need to let the indexing to finish in your first try. Then restart eclipse and try the TH, with this bug fix it should be fast to open the TH.
Comment 202 Al Bundy CLA 2021-07-15 09:52:16 EDT
I cannot say if 4.21 is improved because I don't want to do this test on 4.20 again. :-)

What I can say with my test:

- copied my workspace with all projects
- create new class
- import a class
- select the class and press ctrl+T

In 4.21 I can see the progress - but this takes still very long.
--> but I can cancel the job (if the button is visible).

after the progress is finished the TH is opened very quickly.

Depending on the class I've opened the TH is slower or faster.

e.g. java.lang.Clonable takes very long where java.util.concurrent.Executor is relativly fast.
TH of Cloneable is displayed without results but I think this is because of the big workspace and a lot of cloneables.
Comment 203 Al Bundy CLA 2021-07-15 09:59:06 EDT
for the issue with the cancel-button I've created a new report: https://bugs.eclipse.org/bugs/show_bug.cgi?id=574862
Comment 204 Gayan Perera CLA 2021-07-15 15:17:43 EDT
(In reply to Al Bundy from comment #202)
> I cannot say if 4.21 is improved because I don't want to do this test on
> 4.20 again. :-)
> 
> What I can say with my test:
> 
> - copied my workspace with all projects
> - create new class
> - import a class
> - select the class and press ctrl+T
> 
> In 4.21 I can see the progress - but this takes still very long.
> --> but I can cancel the job (if the button is visible).
> 
> after the progress is finished the TH is opened very quickly.
> 
> Depending on the class I've opened the TH is slower or faster.
> 
> e.g. java.lang.Clonable takes very long where java.util.concurrent.Executor
> is relativly fast.
> TH of Cloneable is displayed without results but I think this is because of
> the big workspace and a lot of cloneables.

I think you can just copy your .metadata folder in your workspace to some where safe and just open the same workspace with 4.20 and 4.21 and do index rebuilds. I do it all a lot for testing.
Comment 205 Gayan Perera CLA 2021-07-15 15:22:07 EDT
(In reply to Andrey Loskutov from comment #195)
> (In reply to Gayan Perera from comment #193)
> > The code is written in a way that if the meta index is not present or if a
> > particular index is not captured in meta index, then those indexes are
> > included in the search. 
> 
> OK.
> 
> > Andrey could it be that you had a partially built meta index with previous
> > changes ?
> 
> Not sure what is needed for that, but yes, I used my main workspace that I
> use for SDK work with almost every nightly SDK build.
> 
> (In reply to Gayan Perera from comment #194)
> > And the implementation search I don’t think it goes through type hierarchy
> > builder right ? If that so the meta index will not get involved at all.
> 
> I have no idea.
> 
> OK, let close this one if there is no need for index update. I assume my use
> case with workspace that was used with every nightly SDK build is not very
> common :-)

I think I see the issue you got. For example, say you open a workspace which doesn't has a meta index and then you edit few classes. The delta update will cause index changes which will partially update the meta index for that project or jar.

Now when you trigger a TH , the IndexManager thinks this jar or project is covered by meta index. So yes I think we need to either find a way to handle this, 

- like if we see that a index is not part of meta index where it has received some updates we could trigger a rebuild on that index only by removing that index.

- We could also change the index versions and make the whole workspace rebuild.

I prefer the option one, but need to check the code to see how complex it will be.
Comment 206 Jay Arthanareeswaran CLA 2021-08-26 05:45:42 EDT
As reported by Gayan, marking the bug as verified for 4.21 RC1.