Bug 171834 - incremental index of header invalidates indexed information from dependents
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-core
Version: 4.0
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact: Jonah Graham CLA
Reported: 2007-01-26 13:11 EST by Andrew Ferguson CLA
Modified: 2020-09-04 15:20 EDT
Description Andrew Ferguson CLA 2007-01-26 13:11:13 EST
If you have

header.h:
   class E {};

references.cpp:
   #include "header.h"
   E var;

and manually edit header.h so that it becomes

header.h:
   enum E {A,B,C};

then the generated resource delta will cause header.h to be reindexed, but not the dependent "references.cpp". As a result, the binding for "var" will still point to the PDOMCPPClassType for "class E {};".

For correctness, I think we'd need to either
 (a) reindex all files that include header.h (I guess transitively)
or maybe
 (b) reindex all files that reference a binding defined/declared in header.h
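
Option (a) could be sketched roughly as below. This is only an illustration, not CDT code: the class IncluderClosure and the map "includedBy" (file name to the files that directly include it) are hypothetical stand-ins for whatever include information the index actually stores.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of CDT: walks a reverse include graph.
final class IncluderClosure {
    private IncluderClosure() {}

    /**
     * includedBy maps a file to the files that directly #include it.
     * Returns every file that includes 'changed' directly or transitively.
     */
    static Set<String> transitiveIncluders(Map<String, List<String>> includedBy, String changed) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(changed);
        while (!work.isEmpty()) {
            String file = work.remove();
            for (String includer : includedBy.getOrDefault(file, List.of())) {
                // add() returns false for files already seen, which also stops include cycles.
                if (result.add(includer)) {
                    work.add(includer);
                }
            }
        }
        // The changed file is reindexed anyway; drop it if a cycle led back to it.
        result.remove(changed);
        return result;
    }
}
```

For the example above, a graph where header.h is included by references.cpp would yield just references.cpp as the extra file to reindex.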

Note this is in the opposite direction to the optimisation used in the fast indexer (a reindex of references.cpp does not need a reindex of header.h).

I'm not sure what the right thing to do here is, but I am logging this as a known issue.
Comment 1 Andrew Ferguson CLA 2007-01-26 13:19:54 EST
(I've added a failure test case in IndexBugsTests)
Comment 2 Markus Schorn CLA 2007-01-29 03:52:34 EST
Automatically indexing all dependent source files leads to heavy indexing. The computation of what has to be indexed isn't cheap either. That's why I'd rather stay away from doing this.
A way to improve user experience is to consider reindexing the underlying file of an editor when it is activated.
Comment 3 Andrew Ferguson CLA 2007-02-06 10:02:19 EST
I'm wondering whether there is some heuristic we could use, as I think that in a good proportion of cases reindexing might be feasible.

i.e. I'd assert there is a big difference between
 (a) editing a header that's included by everything in an SDK, e.g. win32api.h
 (b) and editing a header local to a user's application

and that we might get away with a reindex of the sub-tree in (b).

Someone suggested to me that we could use the dependency graph already stored in the PDOM and invent a metric based on that (or possibly even just count how many times the header is included transitively, capping the count at an arbitrary point). Additionally, if we could summarize the size of a file (name count?), that could be taken into account in the metric.
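
A minimal sketch of the capped-count idea, under the assumption that the include relation is available as a map from a file to its direct includers. The names ReindexBudget, countIncluders and shouldReindexDependents are made up for illustration, and the cap is an arbitrary positive number the caller picks. The file-size part of the metric is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical metric, not part of CDT.
final class ReindexBudget {
    private ReindexBudget() {}

    /**
     * Counts the files that include 'header' directly or transitively,
     * giving up and returning 'cap' as soon as that many have been found.
     */
    static int countIncluders(Map<String, List<String>> includedBy, String header, int cap) {
        Set<String> seen = new HashSet<>();
        seen.add(header);
        Deque<String> work = new ArrayDeque<>();
        work.add(header);
        int count = 0;
        while (!work.isEmpty()) {
            String file = work.remove();
            for (String includer : includedBy.getOrDefault(file, List.of())) {
                if (seen.add(includer)) {
                    if (++count >= cap) {
                        return cap; // widely included header: stop walking the graph
                    }
                    work.add(includer);
                }
            }
        }
        return count;
    }

    /** Dependents are reindexed only when the header stays below the cap. */
    static boolean shouldReindexDependents(Map<String, List<String>> includedBy, String header, int cap) {
        return countIncluders(includedBy, header, cap) < cap;
    }
}
```

Stopping at the cap keeps the cost of the check bounded for headers of kind (a), while headers of kind (b) are walked completely.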
Comment 4 Markus Schorn CLA 2007-11-26 05:10:54 EST
I don't have a complete solution for this. However, I intend to implement a heuristic to cover some common cases. These can be described by two conditions:

Cond 1: a change in the header requires editing all the files that make use of the changed declaration.
Cond 2: in case you have more than 5 files making use of the declaration, you'll change the header first.

Cond 1 is certainly false in many cases, which is the main limitation of the following heuristic:

(1) The indexer can easily track a list of the 5 most recently changed files.
(2) Whenever a header is changed, the indexer can update the 5 most recently changed
    files together with the header.
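
Steps (1) and (2) could look roughly like the following. This is a sketch only and does not claim to match the code that went into CDT: the class RecentChanges, the method filesToUpdate and the file-suffix check in isHeader are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical tracker, not part of CDT.
final class RecentChanges {
    private final int capacity;
    // Insertion order doubles as recency order: oldest first, newest last.
    private final LinkedHashSet<String> recent = new LinkedHashSet<>();

    RecentChanges(int capacity) {
        this.capacity = capacity;
    }

    /** Records a change and returns the files that should be updated because of it. */
    List<String> filesToUpdate(String changedFile) {
        List<String> update = new ArrayList<>();
        update.add(changedFile);
        if (isHeader(changedFile)) {
            // (2) a header changed: update the recently changed files along with it
            for (String file : recent) {
                if (!file.equals(changedFile)) {
                    update.add(file);
                }
            }
        }
        // (1) move the changed file to the newest position and evict the oldest entry
        recent.remove(changedFile);
        recent.add(changedFile);
        if (recent.size() > capacity) {
            Iterator<String> oldest = recent.iterator();
            oldest.next();
            oldest.remove();
        }
        return update;
    }

    // Assumption for this sketch: headers are recognised by file suffix only.
    static boolean isHeader(String name) {
        return name.endsWith(".h") || name.endsWith(".hpp");
    }
}
```

With a capacity of 5, editing references.cpp and then header.h would schedule both files when the header change arrives, which is the case described in the original report.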
Comment 5 Markus Schorn CLA 2007-11-27 04:32:50 EST
The heuristic is in place; IndexBugsTests.test171834() and IndexUpdateTests.testChangingSourceBeforeHeader_Bug171834() are passing. I don't claim that the bug is fixed.

Improved in 5.0 > 20071127.
Comment 6 John Liu CLA 2011-05-12 13:34:45 EDT
(In reply to comment #5)
> The heuristics is in place, IndexBugsTests.test171834() and
> IndexUpdateTests.testChangingSourceBeforeHeader_Bug171834() are passing. I
> don't claim that the bug is fixed.
> Improved in 5.0 > 20071127.

Hi, Markus:

Are there any updates to this bug fix? I found a side-effect problem caused by the current heuristic fix. The problem scenario is as follows:

There are two source directories, A and B, which both contain source files with duplicate file names. For example, directory A contains sameName.cpp and sameName.h, and directory B also contains sameName.cpp and sameName.h. We create a C++ project Pa against directory A, and Pa gets indexed. When we then create another project Pb against directory B, I found that the files sameName.cpp and sameName.h under Pa get their index updated during the creation of Pb, even though there is no change within Pa.

The same problem also happens if I create an empty project, e.g. Pc, and then copy the files from Pa and paste them into Pc: all files under Pa get their index updated after the files are pasted.

I set a breakpoint on the line that calls "addLastRecentlyUsed(changeMap);" in the class CModelListener and found that the unchanged project Pa is added to changeMap by this function. From the comment in the code I found this Bugzilla number, so I guess this problem is a side effect of this bug fix.

When the project contains a large number of these duplicate-name files, this could be a very serious performance problem due to the unnecessary indexing work.
Comment 7 Markus Schorn CLA 2011-05-13 02:06:45 EDT
(In reply to comment #6)
I don't think that the heuristics can introduce a performance issue. Do you see one?
Comment 8 John Liu CLA 2011-05-13 10:56:45 EDT
(In reply to comment #7)
> (In reply to comment #6)
> I don't think that the heuristics can introduce a performance issue. Do you see
> one?

In my example, there are unnecessary index updates to the files under project Pa in two scenarios (project creation and copy/paste of files). These unnecessary index updates could slow down the operation. If there are lots of files under the project, or the files are large, this could have a big impact on performance.
Comment 9 Markus Schorn CLA 2011-05-16 03:49:18 EDT
(In reply to comment #8)
> In my example, there are unnecessary index updating to the files under the
> project Pa in two scenarios(project creation and copy/paste files). These
> unnecessary index updating could slow down the operation. If there are lots of
> files under the project or the files are large, then this could be a big impact
> to the performance.
That's a conjecture of yours. As far as I can recall, the additional updates are limited to 5 or 10 files. Again, do you actually see a performance problem?
Comment 10 John Liu CLA 2011-05-16 11:00:20 EDT
(In reply to comment #9)
> (In reply to comment #8)
> > In my example, there are unnecessary index updating to the files under the
> > project Pa in two scenarios(project creation and copy/paste files). These
> > unnecessary index updating could slow down the operation. If there are lots of
> > files under the project or the files are large, then this could be a big impact
> > to the performance.
> That's a conjecture of yours. As far as I can recall the additional updates are
> limited to 5 or 10 files. Again, do you actually see a performance problem?

Oh, I didn't know there is a limit on the number of additional updates; I thought all files under the additional project would be updated. In that case the problem I was concerned about is no longer valid. Thanks for clarifying this.