497513 – [newindex] Rework the JDT model cache to work better with the new index

Bug 497513 - [newindex] Rework the JDT model cache to work better with the new index

Summary: [newindex] Rework the JDT model cache to work better with the new index

Status:	CLOSED WONTFIX

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core (show other bugs)
Version:	4.6
Hardware:	PC Linux

Importance:	P3 normal with 1 vote (vote)
Target Milestone:	---
Assignee:	JDT-Core-Inbox
QA Contact:

URL:
Whiteboard:	stalebug
Keywords:

Depends on:	481796
Blocks:
	Show dependency tree

Reported:	2016-07-07 13:21 EDT by Stefan Xenos
Modified:	2023-03-17 16:15 EDT (History)
CC List:	7 users (show)

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Stefan Xenos

2016-07-07 13:21:09 EDT

This bug relates to the new JDT index being worked on in bug 481796.


THE PROBLEM:
------------

To get optimal performance from the new index, we need to use different access patterns since the bottlenecks are in a different place. With the old (existing) code, opening a new class for the first time was slow and the IBinaryType implementation used a lot of memory. For this reason, the cache worked like this:

1. The first time a type is referenced (or any of its contents), recursively walk up the model and build all the ancestor model nodes.
2. When the node for a class is hit, load the class file and then do a depth-first traversal of the class's contents, inserting them into the cache. (I'll refer to this as the "eager top-down approach").
3. Children of a class file (methods, variables, and whatnot) don't contain any code for populating their own cache entries. They just rely on the fact that their container would have already inserted their data into the cache in step 2.

Step 2 is a problem for the new index.

With the new index, referencing a class for the first time is very cheap and so is the implementation of IBinaryType. However, there is an additional (but small) cost for each additional node read from the index. If you only read the bits you need as you need them, the index is much faster than reading the class files... but if you exhaustively read all the contents of a class from the index, you lose most of that speed-up.

Currently, the newindex branch contains code to suppress the depth-first traversal of in step 2 (and the cache itself) whenever reading data from the index. This addresses the speed problem but has been causing test failures (and presumably API breakage) because methods don't find their data in the index in step 3 and don't currently have the ability to construct that data if its missing. So clearly this has to change.


PROPOSAL:
---------

I propose the following algorithm for constructing (and caching) class contents:

1. The first time a class is referenced, it is loaded from disk using ClassFileReader and a model is built just for the class. No depth-first traversal occurs and nothing is inserted into the existing cache.
2. The ClassFileReader is inserted into a special queue for the indexer. The indexer will give top priority for indexing those classes and it will use the existing ClassFileReader rather than reading from disk if it exists.
3. Children of a class will construct their data lazily on first access, based on their parent's IBinaryType. If the class was unreferenced, it will get a newly-created ClassFileReader and step 1 kicks off. If the class file was previously referenced but hasn't been indexed yet, it gets the existing ClassFileReader from the queue. If the class was already indexed, it gets an IBinaryType from the index.
4. After the indexer processes one of the ClassFileReaders that were created in step 1, it frees up the memory used by the ClassFileReader. Subsequent accesses to that IBinaryType will be served by the index itself.


This has several nice properties:
- Class files won't be read more than once (either by the model and indexer, or by multiple model accesses).
- There will be no code replication between the code that needs to construct the model lazily bottom-up and the code that would construct it eagerly top-down.
- The model should perform well both for classes which haven't been indexed yet and for classes that are present in the index.
- It should address many of the test failures that are currently occurring on the newindex branch.


ALTERNATIVES CONSIDERED:
------------------------

We also considered a couple other approaches:

1. Construct the model lazily and bottom-up, and use a soft reference cache for the ClassFileReaders to prevent redundant reading of class files. Ignore the cache in JavaModelManager. This would address the test failures, would have no code replication, and would perform reasonably well... but the performance wouldn't be optimal. Each class file might be read twice (by the indexer and model), and memory wouldn't be cleaned up by the soft reference cache as quickly as it could.

2. Include code for both lazy bottom-up model construction (for data coming from the index) and eager top-down construction (for data coming from ClassFileReader). This should have better memory characteristics than 1 since it could free up the memory for ClassFileReaders sooner, but would have a lot of logic in two places which would be an easy place for bugs to hide. It would still parse each class file twice.

Comment 1 Lars Vogel

2016-08-17 06:04:34 EDT

Can the JDT code team answer Stefans request? This issue seems to block the work for the important change in Bug 481796. 

Our company supports one large customer which considers evaluating other IDEs belong of the long indexing time (>40 min) and I have hopes that the solution of 481796 will help that he stays with Eclipse.

Comment 2 Stefan Xenos

2016-08-17 12:50:41 EDT

Re: comment 1

Actually, I'm not really blocked on comments from the rest of the JDT core team in order to fix this (unless someone objects to the proposal, that is). I'm just swamped with work. I was planning to do this myself as soon as I've sorted out the rest of the unit test failures on the newindex branch (there are currently 27 remaining failures)

Comment 3 Stefan Xenos

2016-10-04 13:53:14 EDT

Still an issue.

Comment 4 Jay Arthanareeswaran

2016-10-28 02:46:59 EDT

Bulk change, moving out all bugs that couldn't make it to M3.

Comment 5 Stephan Herrmann

2017-01-22 18:36:09 EST

Is this still planned for Oxygen?

Comment 6 Manoj N Palat

2017-01-25 05:17:06 EST

(In reply to Stephan Herrmann from comment #5)
> Is this still planned for Oxygen?

@Stefan: moving to a general target of 4.7 - please move out if not planned for oxygen

Comment 7 Stephan Herrmann

2017-01-25 15:31:42 EST

Did I mention that this would probably make life much easier around bug 466299?
:)

Comment 8 Stefan Xenos

2017-01-25 15:39:44 EST

Still hoping to get to this in Oxygen, but better incremental indexing and adoption in search will come first so it's no sure thing.

If anyone would care to help, that would be lovely... :-)

Comment 9 Manoj N Palat

2018-04-16 09:38:46 EDT

bulk move out of 4.8

Comment 10 Stefan Xenos

2018-05-10 13:33:24 EDT

Note: if the new index isn't performing as expected, this is probably why.

Comment 11 Eclipse Genie

2021-11-24 12:51:21 EST

This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.

Comment 12 Simeon Andreev

2023-03-17 16:15:47 EDT

New index was disabled per default (bug 515496) and later on removed (bug 572978).