Bug 51638 - Indexing speed needs to be increased

Status:	RESOLVED DUPLICATE of bug 59468

Alias:	None

Product:	CDT
Classification:	Tools
Component:	cdt-core
Version:	2.0
Hardware:	PC Windows XP

Importance:	P3 major
Target Milestone:	---
Assignee:	Bogdan Gheorghe
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2004-02-11 10:05 EST by Bogdan Gheorghe
Modified:	2004-06-21 11:16 EDT
CC List:	2 users

See Also:

Description Bogdan Gheorghe

2004-02-11 10:05:02 EST

As a result of making the indexer more fault tolerant, we now run the parser in
a separate thread. This contains any potential parser blow-ups, but as a result
the overall indexing job takes much longer.

For 2.0, we need the best of both worlds:

i) ensure that all parser exceptions and errors are handled
ii) ensure that indexing performance doesn't take a hit
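
One way to approach both goals at once is to keep the parser on a separate
thread but reuse one long-lived worker for the whole job. The Java sketch below
illustrates that idea only. It assumes the slowdown comes from per-file thread
overhead, which this report does not confirm, and it is not CDT's actual
implementation. `ContainedIndexer`, `indexAll`, and the `parser` function are
hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch; none of these names are CDT APIs.
final class ContainedIndexer {
    // file -> symbols reported by the parser
    final Map<String, List<String>> index = new LinkedHashMap<>();
    // file -> exception or error thrown while parsing that file
    final Map<String, Throwable> problems = new LinkedHashMap<>();

    void indexAll(List<String> files, Function<String, List<String>> parser) {
        // One worker thread is created once and reused for every file.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            for (String file : files) {
                Callable<List<String>> task = () -> parser.apply(file);
                try {
                    index.put(file, worker.submit(task).get());
                } catch (ExecutionException e) {
                    // The Future captures anything the parser threw, including
                    // Errors, so one bad file does not abort the whole job.
                    problems.put(file, e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // indexing was cancelled
                }
            }
        } finally {
            worker.shutdown();
        }
    }
}
```

A caller would pass the real parser as the `parser` function and report the
entries in `problems` as markers; files that parsed cleanly end up in `index`.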

Comment 1 Nikolay Metchev

2004-06-02 10:37:51 EDT

I am working on a project that has about 2000 files, and with CDT M9 + Eclipse
RC1 the indexer causes Eclipse's memory size to grow to 400 MB and uses 100% of
my CPU. After about 20 minutes I had to kill Eclipse because nothing much was
happening and my hard disk was being thrashed excessively.
Is it not possible to run the indexer in an incremental manner?
That is, whenever a file is opened for editing, it gets indexed and cached
(plus any of its includes).

Perhaps even make this an option so that for small projects indexing does 
occur for all files in the background!

Comment 2 Bogdan Gheorghe

2004-06-02 11:01:52 EDT

The indexer does work in an incremental manner - once the initial index is 
complete - and it runs on its own background thread. I'm assuming you are 
using a standard make project? Are all the paths set up? We have a new tool to 
help diagnose potential problems:

i) Index problem reporting: this puts a marker on problems reported by the
parser (such as missing includes). Go to project properties->C/C++
Indexer->Enable C/C++ Index problem reporting

ii) Narrow the scope of the index (valid if you are using standard make 
projects): Go to C/C++ Projects Paths and click on the source tab. Select
which folders/files you want included in the index. (Default is entire 
project).

Comment 3 Nikolay Metchev

2004-06-02 11:24:26 EDT

The project I am working on is compiled using the Microsoft Visual C++
compiler, so I only want to use Eclipse as my editor (I don't think it can be
configured to use the Microsoft compiler yet).

I started using your new tool and noticed that I hadn't set any of my paths. 

But once I put all the Microsoft SDKs, Platform SDKs and all the Teamphone
source on the project path, the indexer began to index really slowly and to
exhibit the characteristics outlined in comment #1.

Is it not possible for the initial indexing to be incremental as well? I tried
cancelling the background task, but if I do that I don't think any files get
reindexed EVER!

1. I had a file which was #including X, and the indexer was warning me that it
couldn't find it.
2. I added X to the include path of the project.
3. I cancelled background indexing.
4. The warning doesn't go away even if I close and reopen the
file/project/Eclipse.
5. If I touch the file then the indexer runs and gets rid of the warning. This 
is not a good solution because our build process is incremental and any files 
that get touched will get recompiled needlessly.
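
The steps above suggest that staleness is decided from the file's timestamp
alone. A minimal Java sketch of an alternative check follows, in which a file
also counts as stale when the project's include paths have changed since it was
last indexed, so no `touch` is needed. This is an illustration under that
assumption, not a description of how the CDT indexer decides; `IndexStaleness`,
`markIndexed`, and `needsReindex` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are CDT APIs.
final class IndexStaleness {
    // What the indexer saw the last time it indexed a file.
    private static final class Stamp {
        final long lastModified;
        final List<String> includePaths;

        Stamp(long lastModified, List<String> includePaths) {
            this.lastModified = lastModified;
            // Defensive copy so later edits to the caller's list are detected.
            this.includePaths = new ArrayList<>(includePaths);
        }
    }

    private final Map<String, Stamp> stamps = new HashMap<>();

    void markIndexed(String file, long lastModified, List<String> includePaths) {
        stamps.put(file, new Stamp(lastModified, includePaths));
    }

    // Stale if never indexed, if the file changed on disk, or if the include
    // paths (order matters for header lookup) differ from the recorded ones.
    boolean needsReindex(String file, long lastModified, List<String> includePaths) {
        Stamp s = stamps.get(file);
        return s == null
                || s.lastModified != lastModified
                || !s.includePaths.equals(includePaths);
    }
}
```

With a check like this, adding X to the include path would mark the file stale
without changing its timestamp, so an incremental build would not recompile it.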

Comment 4 Thomas Fletcher

2004-06-02 21:19:03 EDT

The other issue, that Bogdan didn't mention, is that indexing is used for
a large number of operations so it is not practical to only do it once your
file is open.  The C/C++ indexer should not be confused with a simple
text indexer.

Developing an incremental indexer that persists its results as it goes, and
that has enough know-how to present conditional information, is non-trivial.
(The indexer is already incremental and runs in the background, as Bogdan
mentioned, but last time I looked you needed to give it a full run over the
source before it stored its results.) This is part of the area of work for
the DOM.

Comment 5 Nikolay Metchev

2004-06-03 04:04:23 EDT

I was afraid of that. The other thing I am afraid of is that not much can be 
done to improve performance (both in terms of CPU and memory) of the initial 
indexing that you say needs to happen once. 
The interesting thing is that if my include paths are all wrong (nonexistent),
then indexing occurs pretty quickly. However, once the include paths are
corrected, performance degrades extremely quickly.

Comment 6 Thomas Fletcher

2004-06-03 08:27:15 EDT

If your include paths are not set up properly, then you aren't
really indexing the true context of the source.  The parser is
the equivalent of a preprocessor/compiler that does no compiling.
If you don't have include paths set up properly, then that source
is "in error" and you won't get any results for it.  But it will
be fast =;-(

Comment 7 Nikolay Metchev

2004-06-03 08:29:57 EDT

The question remains: is it theoretically possible to improve performance so
that it scales to big projects?

Comment 8 Thomas Fletcher

2004-06-03 08:41:43 EDT

Yes ... without a doubt.  There are a number of strategies being investigated
(Bogdan/JohnC/DougS can talk more about the details) but these include:
- Changing the indexing "mode" so that even with changes in include paths
and defines the index doesn't have to be re-generated for all files (which
it does today).  This is tangentially related to the C/C++ DOM discussion
and doesn't get away from the fact that you still have to index everything
once at some point in time.
- Improve performance in the parser.  The initial kick here is to make the
parser work, then work fast.  John C. and David D. have made some _huge_
changes from CDT 1.2 to 2.0 which address correctness, speed, and
memory footprint.
- Do a better job of limiting the scope.  This has to do with having proper
configurations and settings and there has been work to do autodetection
in this area (Vlad) as well as more limiting constraints on what is source
and what is not using the path entry settings (Alain/Dave I.).
- Do a better job of partially persisting the index out of memory.  I
_think_ that this is where much of the memory bloat is coming from currently
but I don't have any hard facts to back that up.  This is an open area which
I don't think Bogdan is working on currently (Bogdan?)
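
To make the last strategy concrete, here is a minimal Java sketch of partially
persisting an index: only a bounded number of per-file entries stay in memory,
and the least recently used ones are moved out. It is an illustration, not the
CDT index format. `SpillingIndex` is a hypothetical name, and the `spilled` map
is only a stand-in for on-disk storage.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are CDT APIs.
final class SpillingIndex {
    private final int maxInMemory;
    // Stand-in for on-disk storage; a real indexer would write files here.
    private final Map<String, List<String>> spilled = new HashMap<>();
    // Access-ordered map: the eldest entry is the least recently used one.
    private final LinkedHashMap<String, List<String>> hot;

    SpillingIndex(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        this.hot = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                if (size() > SpillingIndex.this.maxInMemory) {
                    spilled.put(eldest.getKey(), eldest.getValue());
                    return true; // drop it from memory
                }
                return false;
            }
        };
    }

    void put(String file, List<String> symbols) {
        spilled.remove(file); // the new entry supersedes any spilled copy
        hot.put(file, symbols);
    }

    // Returns the symbols for a file, reloading them if they were spilled.
    List<String> get(String file) {
        List<String> symbols = hot.get(file);
        if (symbols == null) {
            symbols = spilled.remove(file);
            if (symbols != null) {
                hot.put(file, symbols); // may spill another entry in turn
            }
        }
        return symbols;
    }

    int inMemoryCount() {
        return hot.size();
    }
}
```

The trade-off in a design like this is extra I/O on lookups that miss the
in-memory set, in exchange for a memory footprint that no longer grows with
project size.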

As you can see, lots of work is going on to make things better.  If you are
interested in helping directly with the development, talk to Bogdan.
These large source benchmarks are invaluable for us.

Incidentally, last time I checked our indexing speed was on par with Visual
Slick Edit for large projects ... the difference was that Slick Edit was in 
your face about the progress rather than doing all of the work in the 
background.

Comment 9 Nikolay Metchev

2004-06-03 09:33:33 EDT

Thanks Thomas for the reply.
I am afraid that I haven't got any spare time to actually do coding. 
However I do tend to use the latest versions and test them and report any bugs 
I find. 
Sometimes, if the bug is small enough, I can even try to debug the workspace
myself so I can better report the error, but that's about it.
Keep up the good work and I look forward to future enhancements.

Comment 10 Nikolay Metchev

2004-06-04 07:19:22 EDT

Thomas,
Regarding the items you outlined in comment 8: do they have corresponding
Bugzilla items? Is there a web page which details the plans for them?

Comment 11 David Daoust

2004-06-07 17:19:04 EDT

There is also the option of turning off the index totally for your project.  
You will lose the ability to refactor and search (other than file search), but 
you will be able to use CDT for editing and content assist (as well as the 
usual compile-debug cycle.)  This is obviously not the way that we are pushing 
CDT, but it can get you started until we fix our issues.

Comment 12 David Daoust

2004-06-21 11:16:29 EDT

I am going to mark this defect as a duplicate of the "Magic performance 
placeholder" defect 59468  --  I am trying to get a clean view of the defect 
list, and there are a number of defects that have the same cause (and solution).

*** This bug has been marked as a duplicate of 59468 ***