Re: [cdt-dev] Parallelization of indexer

Hi Volker!
Ad 1)
I am not interested in the discussion whether one should use a monolithic project or should split up the source into multiple projects. Both ways are a valid way of using CDT.

Ad 2) 
We have a specific handling of project references in place (the index of a referenced project is reused). One can challenge this approach (I am not very convinced of it myself), and the performance issue would be a way to start this challenge. However, we cannot simply change the behavior of CDT without a discussion and some analysis on the matter, and as long as we use this approach your patch has to honor it.

Ad 3) and 4)
As you realized yourself in 4), parallelization on file level would solve 2), because then you can index the referenced project before the dependent one and still work in parallel on file level at all times. As always, there are pros and cons to each approach.
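
As a rough illustration of the idea in "Ad 3) and 4)", the following Java sketch indexes projects one after another in dependency order while the files of each project are processed in parallel. It is a sketch under stated assumptions, not CDT code: the class FileLevelIndexer, the method indexFile, and the string-based "index entries" are hypothetical stand-ins, and the caller is assumed to pass the projects already sorted so that referenced projects come first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names exist in CDT.
final class FileLevelIndexer {

    // Stand-in for parsing one file; returns a fake index entry.
    static String indexFile(String project, String file) {
        return project + "/" + file;
    }

    // Projects are visited in the map's iteration order, which the caller
    // is assumed to have sorted so that referenced projects come first.
    static List<String> indexInOrder(
            LinkedHashMap<String, List<String>> projectsInDependencyOrder, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> entries = new ArrayList<>();
        try {
            for (Map.Entry<String, List<String>> project : projectsInDependencyOrder.entrySet()) {
                List<Callable<String>> tasks = new ArrayList<>();
                for (String file : project.getValue()) {
                    tasks.add(() -> indexFile(project.getKey(), file));
                }
                // invokeAll blocks until every file of this project is done,
                // so the next project never starts before its dependencies finish.
                for (Future<String> result : pool.invokeAll(tasks)) {
                    entries.add(result.get());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return entries;
    }
}
```

The barrier after each project is what preserves the ordering requirement. It is also where cores can sit idle when a project has only a few files left to index.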

Ad 1)
Right, the discussion on parallelization on file-level vs. project level can be done in parallel. 

Ad 2)
You have two options: Either you make your patch honor project references, or you work towards changing CDT such that it ignores project references. For the latter a bugzilla on the performance issue may be your starting point. 

Markus.

-----Original Message-----
From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Volker Diesel
Sent: Monday, November 28, 2011 21:26
To: cdt-dev@xxxxxxxxxxx
Subject: [cdt-dev] Parallelization of indexer

Hi Markus.

Yes, this discussion has been somewhat frustrating and I still do not agree with your comments (and the comments of others about that topic) for several reasons.
1) Everyone involved in this discussion seems to believe that it is possible to set up one monolithic Eclipse project for a large-scale, real-life C++ project, that this is the "normal" use case for CDT, and that parallelization of indexer jobs across Eclipse projects therefore doesn't help much. I already questioned that opinion in b#351659, because I cannot see how you would set up one monolithic Eclipse project if the sources require, e.g., different sets of #defines or different include paths (and source files in real-life projects normally do require this). You won't get correct indexer results in that case. But maybe I missed some point...
2) In fact, my patch does not honour project references, but I explicitly asked whether that would be a real problem, and your reply was that the only impact is potential replication of some of the symbols in multiple project indices. Therefore, my understanding was that there is no "real" issue if the parallelized indexer does not take project references into account.
3) I cannot see how parallelization on file level instead of project level can resolve issue 2). If indexer job #1 indexes a file from project A and indexer job #2 indexes a file from project B, then you are back at the same point... Should these jobs honour references between projects A and B? I cannot see any difference.
4) The only solution to problem 3) would be to limit parallel indexer jobs (on file level) to the set of files of one and the same project. In that case, there will be at least three other issues... First, when there are many projects with only a few files, parallelization will be poor. Second, at the end of the process of indexing each project, parallelization will be poor, because there are more free CPU cores than there are files left to index in that single project. Third (and most important), all these parallel jobs on the files of one single project will run into lock contention on the project's index write lock. I already faced a similar issue with my patch and had to change some of the locking code to achieve enough throughput/CPU utilization with my approach.
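
To make the third issue in 4) concrete, here is a minimal Java sketch of one generic way to reduce contention on a single write lock: each worker parses a batch of files without holding the lock and takes the write lock only once per batch. This is not the locking change made in the patch, which the thread does not describe. BatchedIndexWriter, parse, and the map-based "index" are hypothetical stand-ins and not CDT APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a map guarded by one lock stands in for a project index.
final class BatchedIndexWriter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, List<String>> index = new HashMap<>(); // file -> symbols
    private int writeLockAcquisitions = 0;

    // Stand-in for the expensive parsing step.
    static List<String> parse(String file) {
        return Arrays.asList(file + "::a", file + "::b");
    }

    void indexAll(List<String> files, int threads, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < files.size(); i += batchSize) {
                List<String> batch = files.subList(i, Math.min(i + batchSize, files.size()));
                pending.add(pool.submit(() -> {
                    // Parse the whole batch with no lock held.
                    Map<String, List<String>> local = new HashMap<>();
                    for (String file : batch) {
                        local.put(file, parse(file));
                    }
                    // One short critical section per batch instead of one per file.
                    lock.writeLock().lock();
                    try {
                        index.putAll(local);
                        writeLockAcquisitions++;
                    } finally {
                        lock.writeLock().unlock();
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface worker failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return index.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    int writeLockAcquisitions() {
        lock.readLock().lock();
        try {
            return writeLockAcquisitions;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With 10 files and a batch size of 4, the write lock is taken three times rather than ten. Larger batches mean fewer acquisitions but coarser work units, which makes the end-of-project idle-core problem from the second issue worse.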

Therefore, from my point of view...
1) The discussion about parallelization across projects vs. parallelization across source files is independent of the question of honouring project dependencies. Both approaches need either a fix for the performance issues mentioned in b#351659 or a decision not to honour project dependencies while indexing.
2) And yes, of course I could open another bugzilla about that performance issue, but what would that help? I already mentioned that issue, I captured jprof profiling information and attached the profiler data to b#351659, and I asked for someone to look into that data. Nothing happened. Why should that change if I opened a second bugzilla and attached the same profiler data again?

Kind regards.
Volker



-----Original Message-----
Date: Mon, 28 Nov 2011 06:40:34 +0000
From: "Schorn, Markus" <Markus.Schorn@xxxxxxxxxxxxx>
To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
Subject: Re: [cdt-dev] Parallelization of indexer
Message-ID:
<30D36C1BA62C5F4892C482E607D5E77E1FA57921@xxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

Hi Volker!
I can understand your frustration; however, there is an issue with the patch as provided in bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=351659. My view on the matter is the following:

(1) Your patch does not deal with indexing dependent projects. Currently the index of a project is reused by a dependent project. This requires the dependent project to be indexed after its dependencies. Your patch ignores this requirement.
You have identified that indexing with dependencies does introduce a performance issue. This needs further investigation and may lead us to changing the indexer, such that it ignores the project references. However, before we have made such a decision, your patch cannot be applied.

(2) The approach of parallelizing indexing on project level does not help for large projects. I do agree with Sergey, that it would be more rewarding to make parallelization work on file-level.
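
The ordering requirement in (1) can be sketched as follows. This is an illustrative Java example, not CDT code: IndexOrder, waves, and the project names in the usage note are hypothetical. It groups projects into "waves" so that every project only references projects from earlier waves. Projects inside one wave do not depend on each other and could, under that assumption, be indexed concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: project references are modelled as plain strings.
final class IndexOrder {

    // refs maps each project to the set of projects it references.
    // The input map is not modified.
    static List<List<String>> waves(Map<String, Set<String>> refs) {
        // Copy into sorted collections so the result is deterministic.
        Map<String, Set<String>> pending = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : refs.entrySet()) {
            pending.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            for (String dep : e.getValue()) {
                // Referenced projects without their own entry have no references.
                pending.computeIfAbsent(dep, k -> new TreeSet<>());
            }
        }

        List<List<String>> result = new ArrayList<>();
        while (!pending.isEmpty()) {
            // A project is ready once all projects it references are indexed.
            List<String> wave = new ArrayList<>();
            for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
                if (e.getValue().isEmpty()) {
                    wave.add(e.getKey());
                }
            }
            if (wave.isEmpty()) {
                throw new IllegalStateException("cyclic project references: " + pending.keySet());
            }
            for (String done : wave) {
                pending.remove(done);
            }
            for (Set<String> deps : pending.values()) {
                deps.removeAll(wave);
            }
            result.add(wave);
        }
        return result;
    }
}
```

For example, if "app" references "libA" and "libB", and both of those reference "core", the waves are [core], [libA, libB], [app]. Only the middle wave offers any project-level parallelism here, which is one way to see why deep reference chains limit what parallelizing across projects can achieve.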


To move forward on the issue, I encourage you to open a new bug on the performance issue of dependent projects. We need a discussion on that, and only if we drop the requirement of reusing a project's index in its dependent projects can we go back to considering your patch.

In parallel, it makes sense to think about parallelization on file level. Thinking long-term, this is the more promising approach, and it would find more traction.

Markus.


-----Original Message-----
From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Volker Diesel
Sent: Friday, November 25, 2011 23:34
To: CDT Dev
Subject: [cdt-dev] (no subject)

Hello, everybody.
There was some discussion about C/C++ indexer parallelization some months ago, and (initially) most people agreed that this would be a great feature.
There is a patch in place that brings down full C/C++ indexing time from 4 hrs to 20 mins in our project (see bug#351659).
This patch has now been used in our team (200+ people, 10+ million lines of C/C++ code) without any issue for several months.
I provided a git patch for CDT master.
I provided a git patch for CDT 8.
I have not received any answer to my latest questions in the above-mentioned bugzilla for months.
I wonder if anyone out there in CDT DEV is still interested in that topic?
I wonder how such an enhancement will finally find its way into any CDT codeline, and what else I can do to bring this feature into an official CDT release?
If no one at CDT DEV is interested in that topic any longer, please let me know. In that case I would simply close that useless bugzilla.
Thanks and kind regards.
Volker
_______________________________________________
cdt-dev mailing list
cdt-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/cdt-dev



