
Re: [cdt-dev] Parallelization of indexer

Hi Greg,
There is still support for multiple indexers; however, the UI for that does not show up unless you supply an alternative indexer. Unlike in the earlier days of CDT, it is no longer simple to provide an alternative indexer that comes close to the one built into CDT. Therefore the usual path for a new indexer feature is to make it part of the existing indexer. Clearly, such a feature can be made dependent on a preference setting.

Whatever feature goes into CDT causes bug reports, and when it comes to dealing with those issues, the enthusiasm of contributors and committers alike is limited. I have quite a list of annoying 'experimental' features in CDT that simply don't work correctly (and probably never will). I do think it is a good idea to discuss and analyze the impact of new features before putting them into CDT.

In the given case the new feature (parallelizing the indexer across projects) is simply incomplete in that it does not respect that there needs to be some order in indexing projects. It is not really difficult to implement the missing piece.
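
For illustration only, here is a minimal Java sketch of what "some order in indexing projects" could look like while still running independent projects in parallel. It is not the patch from bug 351659 and does not use the CDT API: the class `OrderedParallelIndexing`, the method `indexProject`, and the project names are hypothetical, and the project references are assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OrderedParallelIndexing {

    /** Hypothetical stand-in for indexing one project; it only records completion order. */
    static void indexProject(String project, List<String> completed) {
        completed.add(project);
    }

    /**
     * Indexes all projects on a thread pool. A project starts only after every
     * project it references has finished. Assumes the reference graph is acyclic.
     *
     * @param references maps each project to the projects it references
     * @return the order in which projects finished indexing
     */
    static List<String> indexAll(Map<String, List<String>> references, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> completed = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> jobs = new HashMap<>();
        try {
            for (String project : references.keySet()) {
                schedule(project, references, jobs, pool, completed);
            }
            // Wait until every scheduled project has been indexed.
            CompletableFuture.allOf(jobs.values().toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(completed);
    }

    private static CompletableFuture<Void> schedule(
            String project,
            Map<String, List<String>> references,
            Map<String, CompletableFuture<Void>> jobs,
            Executor pool,
            List<String> completed) {
        CompletableFuture<Void> existing = jobs.get(project);
        if (existing != null) {
            return existing; // already scheduled via another dependent project
        }
        // Schedule referenced projects first and collect their futures.
        List<CompletableFuture<Void>> dependencies = new ArrayList<>();
        for (String referenced : references.getOrDefault(project, List.of())) {
            dependencies.add(schedule(referenced, references, jobs, pool, completed));
        }
        // The job is submitted to the pool only once all dependencies are done,
        // so no worker thread is ever blocked waiting for another project.
        CompletableFuture<Void> job = CompletableFuture
                .allOf(dependencies.toArray(new CompletableFuture[0]))
                .thenRunAsync(() -> indexProject(project, completed), pool);
        jobs.put(project, job);
        return job;
    }
}
```

With references such as `app -> [lib, util]` and `lib -> [util]`, this sketch finishes `util` before `lib` and `lib` before `app`, while projects that do not depend on each other may still run concurrently.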

Another path is to get rid of reusing indexes from referenced projects. The effect of this would be the duplication of index information about files used from multiple dependent projects. While this makes indexing easier, it puts the burden on the clients working with the index data, because they have to deal with the redundant information. As written before, I am not convinced that reusing those indexes is important, and we may want to change the indexer not to reuse them.
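
To make the "burden on the clients" concrete, here is a small Java sketch of a client merging results from several per-project indexes that each contain their own copy of a shared header. This is an assumption-laden illustration, not CDT code: the `Declaration` class, its fields, and the choice of file, offset, and name as the identity of a declaration are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RedundantIndexResults {

    /** Hypothetical, simplified view of one declaration found in a project index. */
    static final class Declaration {
        final String name;
        final String file;
        final int offset;

        Declaration(String name, String file, int offset) {
            this.name = name;
            this.file = file;
            this.offset = offset;
        }

        /** Identity used for de-duplication: same file, same offset, same name. */
        String key() {
            return file + "\n" + offset + "\n" + name;
        }
    }

    /**
     * Merges the results of several per-project indexes and drops entries
     * that describe the same declaration, keeping the first occurrence.
     */
    static List<Declaration> mergeWithoutDuplicates(List<List<Declaration>> perProjectResults) {
        Map<String, Declaration> unique = new LinkedHashMap<>();
        for (List<Declaration> results : perProjectResults) {
            for (Declaration declaration : results) {
                unique.putIfAbsent(declaration.key(), declaration);
            }
        }
        return new ArrayList<>(unique.values());
    }
}
```

Every client that queries more than one index would need a step like this (or an equivalent) once the indexes stop sharing information about commonly included files.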

We may also end up with another preference setting, that allows for turning off reusing indexes from other projects. The parallelization would always work, but would work better when reusing indexes is turned off. 

Markus.


-----Original Message-----
From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Greg Watson
Sent: Tuesday, November 29, 2011 14:51
To: CDT General developers list.
Subject: Re: [cdt-dev] Parallelization of indexer

Hi,

Would it be possible to add this as an experimental indexer that could be enabled through the preferences? There used to be support for multiple indexers, but this seems to have been removed in CDT 8, presumably to avoid confusion. From the user's perspective, what's important is the speed and accuracy of the indexer. From the discussion, it sounds like the new indexer improves speed but reduces accuracy for some types of projects. I think users would be willing to give this a try if it was easy to enable/disable.

Regards,
Greg

On Nov 29, 2011, at 5:17 AM, Schorn, Markus wrote:

> Hi Volker!
> Ad 1)
> I am not interested in the discussion whether one should use a monolithic project or should split up the source into multiple projects. Both ways are a valid way of using CDT.
> 
> Ad 2) 
> We have a specific handling of project references in place (index of referenced project is reused). One can challenge this approach (I am not very convinced of this approach). The performance issue would be a way to start this challenge. However, we cannot simply change the behavior of CDT without a discussion and some analysis on the matter, and as long as we use this approach your patch has to honor it.
> 
> Ad 3) and 4)
> As you realized yourself in 4), parallelization on file level would solve 2), because then you can index the referenced project before the dependent one, and at all times you would do that in parallel on file level. As always, there are pros and cons to each approach.
> 
> Ad 1)
> Right, the discussion on parallelization on file-level vs. project level can be done in parallel. 
> 
> Ad 2)
> You have two options: Either you make your patch honor project references, or you work towards changing CDT such that it ignores project references. For the latter a bugzilla on the performance issue may be your starting point. 
> 
> Markus.
> 
> -----Original Message-----
> From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Volker Diesel
> Sent: Monday, November 28, 2011 21:26
> To: cdt-dev@xxxxxxxxxxx
> Subject: [cdt-dev] Parallelization of indexer
> 
> Hi Markus.
> 
> Yes, this discussion has been somewhat frustrating and I still do not agree with your comments (and the comments of others about that topic) for several reasons.
> 1) Everyone involved in this discussion seems to believe that it is possible to set up one monolithic Eclipse project for a large-scale, real-life C++ project, that this is the "normal" use case for CDT, and that parallelization of indexer jobs across Eclipse projects therefore doesn't help much. I already questioned that opinion in b#351659, because I cannot see how you would set up one monolithic Eclipse project if the sources require, e.g., different sets of #defines or different include paths (and source files in real-life projects normally do require this). You won't get correct indexer results in that case. But maybe I missed some point...
> 2) In fact, my patch does not honour project references, but I explicitly asked, if that would be a real problem, and your reply was, that the only impact is potential replication of some of the symbols in multiple project indices. Therefore, my understanding was that there is no "real" issue, if parallelized indexer does not take project references into account.
> 3) I cannot see, how parallelization on file level instead of project level can resolve issue 2). If indexer job#1 indexes a file from project A and indexer job#2 indexes a file from project B, then you are back at the same point... Should these jobs honour references between project A and B? I cannot see any difference.
> 4) The only solution to problem 3) would be to limit parallel indexer jobs (on file level) to the set of files of one and the same project. In that case, there will be at least three other issues... First, when there are many projects with only a few files, parallelization will be poor. Second, at the end of the process of indexing each project, parallelization will be poor, because there are more potentially free CPU cores than there are files left to index in that single project. Third (and most important), all these parallel jobs on the files of one single project will run into lock contention on the project's index write lock. I already faced a similar issue with my patch and had to change some of the locking code to achieve enough throughput/CPU utilization with my approach.
> 
> Therefore, from my point of view...
> 1) The discussion about parallelization across projects vs. parallelization across source files is independent of the question of honouring project dependencies. Both approaches need either a fix for the performance issues mentioned in b#351659 or a decision not to honour project dependencies while indexing.
> 2) And yes, of course I could open another bugzilla about that performance issue, but what would that help? I already mentioned that issue, I captured jprof profiling information and attached the profiler data to b#351659, and I asked for someone to look into that data. Nothing happened. Why should that change if I opened a second bugzilla and attached the same profiler data again?
> 
> Kind regards.
> Volker
> 
> 
> 
> -----Original Message-----
> Date: Mon, 28 Nov 2011 06:40:34 +0000
> From: "Schorn, Markus" <Markus.Schorn@xxxxxxxxxxxxx>
> To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
> Subject: Re: [cdt-dev] Parallelization of indexer
> Message-ID:
> <30D36C1BA62C5F4892C482E607D5E77E1FA57921@xxxxxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="us-ascii"
> 
> Hi Volker!
> I can understand your frustration, however there is an issue with the patch as provided in bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=351659. My view on the matter is the following:
> 
> (1) Your patch does not deal with indexing dependent projects. Currently the index of a project is reused by a dependent project. This requires the dependent project to be indexed after its dependencies. Your patch ignores this requirement.
> You have identified that indexing with dependencies does introduce a performance issue. This needs further investigation and may lead us to changing the indexer, such that it ignores the project references. However, before we have made such a decision, your patch cannot be applied.
> 
> (2) The approach of parallelizing indexing on project level does not help for large projects. I do agree with Sergey, that it would be more rewarding to make parallelization work on file-level.
> 
> 
> To move forward on the issue, I encourage you to open a new bug on the performance issue of dependent projects. We need a discussion on that, and only if we drop the requirement of reusing the index of a dependent project can we go back to considering your patch.
> 
> In parallel, it makes sense to think about parallelization on file level. Thinking long-term, this is the more promising approach, and it would find more traction.
> 
> Markus.
> 
> 
> -----Original Message-----
> From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Volker Diesel
> Sent: Friday, November 25, 2011 23:34
> To: CDT Dev
> Subject: [cdt-dev] (no subject)
> 
> Hello, everybody.
> There was some discussion about C/C++ indexer parallelization some months ago, and (initially) most people agreed that this would be a great feature.
> There is a patch in place that brings down full C/C++ indexing time from 4 hrs to 20 mins in our project (see bug#351659).
> This patch has now been used in our team (200+ people, 10+ million lines of C/C++ code) without any issue for several months.
> I provided a git patch for CDT master.
> I provided a git patch for CDT 8.
> I have not received any answer to my latest questions in the above-mentioned bugzilla for months.
> I wonder if anyone out there in CDT DEV is still interested in that topic?
> I wonder how such an enhancement will finally find its way into any CDT codeline, and what else I can do to bring this feature into an official CDT release?
> If no one at CDT DEV is interested in that topic any longer, please let me know. In that case I would simply close that useless bugzilla.
> Thanks and kind regards.
> Volker
> _______________________________________________
> cdt-dev mailing list
> cdt-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/cdt-dev
> 
> 
> 


