Re: [cdt-dev] Parallelization of indexer

From: "Schaefer, Doug" <Doug.Schaefer@xxxxxxxxxxxxx>
Date: Wed, 30 Nov 2011 00:16:59 +0000
Actually, to be honest, your response was totally inappropriate. This is a community built on respect. Markus is a well-respected member of the CDT community, and if he has concerns, we usually listen. We'll need to take a careful look at your proposal and validate what you are saying before proceeding further.

Doug.

> -----Original Message-----
> From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx]
> On Behalf Of Volker Diesel
> Sent: Tuesday, November 29, 2011 6:26 PM
> To: cdt-dev@xxxxxxxxxxx
> Subject: [cdt-dev] Parallelization of indexer
> 
> Sorry, but could anyone @cdt-dev PLEASE stop this kind of "WRONG
> INFORMATION POSTING" ASAP!!!
> 
> Markus today claims...
> >>>
> In the given case the new feature (parallelizing the indexer across projects) is
> simply incomplete in that it does not respect that there needs to be some
> order in indexing projects.
> <<<
> 
> THERE IS NO INCOMPLETENESS IN MY APPROACH!!!
> 
> In fact, my indexer approach DOES NOT honour project dependencies. This
> is because Markus explicitly told me that this is not an issue!!!
> I EXPLICITLY ASKED MARKUS (before I implemented my patch) if that would
> be an issue, and MARKUS EXPLICITLY ANSWERED with "NO"!!! See the history of
> this mail thread and see b#351659.
> 
> Now suddenly, an incompleteness seems to have appeared in Markus' mind,
> and I would be glad to know what kind of incompleteness it is that
> (according to his own statements) didn't exist six months ago!!!
> 
> AND ONCE AGAIN... WE HAVE BEEN USING MY PATCH IN OUR TEAM FOR
> MONTHS... AND THERE IS NO "INCOMPLETENESS" OR "FUNCTIONAL
> DIFFERENCE" BETWEEN THE INDEX GENERATED BY MY PATCH AND THE INDEX
> GENERATED BY OFFICIAL CDT8 (FROM AN END USER'S POINT OF VIEW)!!!
> 
> 
> 
> 
> -----Original Message-----
> From: cdt-dev-request@xxxxxxxxxxx
> Sent: Nov 29, 2011 11:26:12 PM
> To: cdt-dev@xxxxxxxxxxx
> Subject: cdt-dev Digest, Vol 81, Issue 35
> 
> Send cdt-dev mailing list submissions to cdt-dev@xxxxxxxxxxx
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> https://dev.eclipse.org/mailman/listinfo/cdt-dev
> or, via email, send a message with subject or body 'help' to
> cdt-dev-request@xxxxxxxxxxx
> 
> You can reach the person managing the list at cdt-dev-owner@xxxxxxxxxxx
> 
> When replying, please edit your Subject line so it is more specific than "Re:
> Contents of cdt-dev digest..."
> 
> 
> Today's Topics:
> 
> 1. Re: CDT DSF-GDB (Marc Khouzam)
> 2. Re: Parallelization of indexer (Greg Watson)
> 3. Parallelization of indexer (Volker Diesel)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Tue, 29 Nov 2011 13:10:54 -0500
> From: Marc Khouzam <marc.khouzam@xxxxxxxxxxxx>
> To: "'subhashchandranv@xxxxxxxxxxxxxxx'"
> <subhashchandranv@xxxxxxxxxxxxxxx>, "'CDT General developers list.'"
> <cdt-dev@xxxxxxxxxxx>
> Subject: Re: [cdt-dev] CDT DSF-GDB
> Message-ID:
> <F7CE05678329534C957159168FA70DEC578CBC2B95@EUSAACMS0703.eamcs.ericsson.se>
> 
> Content-Type: text/plain; charset="us-ascii"
> 
> > -----Original Message-----
> > From: cdt-dev-bounces@xxxxxxxxxxx
> > [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of subhashchandranv
> > Sent: Friday, November 18, 2011 3:38 AM
> > To: cdt-dev@xxxxxxxxxxx
> > Subject: [cdt-dev] CDT DSF-GDB
> >
> > Hello,
> 
> Hi,
> 
> Sorry for the delay, I was on a business trip.
> 
> > I've been working on understanding DSF for the past month by going
> > through the example plugins, which include the Timers Example and PDA:
> > http://help.eclipse.org/indigo/index.jsp?topic=/org.eclipse.cdt.doc.isv/guide/dsf/intro/dsf_programming_intro.html
> >
> > I believe DSF-GDB is where DSF has been implemented, and I saw it
> > working in a flash video: http://live.eclipse.org/node/568.
> >
> > I tried to reproduce the same in my Eclipse Indigo, and it did not
> > work.
> >
> > I learned that "org.eclipse.dd.mi", "org.eclipse.dd.gdb", and
> > "org.eclipse.dd.gdb.ui" have been renamed to "org.eclipse.cdt.dsf" and
> > "org.eclipse.cdt.dsf.ui" respectively,
> > but I couldn't find the renamed versions of the following two plugins:
> >
> >
> > * org.eclipse.dd.gdb.launch
> > * org.eclipse.dd.gdb.launch.ui
> >
> > Please let me know the necessary plugins of DSF-GDB to check out from
> > CVS, so that I can build them in my Eclipse Indigo to learn better
> > about DSF.
> 
> We no longer use CVS, instead we use Git.
> http://wiki.eclipse.org/Getting_started_with_CDT_development
> 
> The plugins from the DD project have been combined into four main plugins
> (not including tests or examples):
> 
> DSF:
> org.eclipse.cdt.dsf
> org.eclipse.cdt.dsf.ui
> DSF-GDB:
> org.eclipse.cdt.dsf.gdb
> org.eclipse.cdt.dsf.gdb.ui
> 
> > I'm also curious to know if there's any other project in eclipse where
> > DSF is implemented. If yes, please help me by providing the plugin
> > links.
> 
> EDC also uses DSF. It is part of its own Git repository:
> org.eclipse.cdt.edc.git
> 
> Marc
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Tue, 29 Nov 2011 14:02:59 -0500
> From: Greg Watson <g.watson@xxxxxxxxxxxx>
> To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
> Subject: Re: [cdt-dev] Parallelization of indexer
> Message-ID: <541F1601-DAA5-4871-9669-16518B541363@xxxxxxxxxxxx>
> Content-Type: text/plain; charset=iso-8859-1
> 
> Hi Markus,
> 
> Sounds reasonable to me. I hope that we see the new indexer fully
> implemented at some point.
> 
> Cheers,
> Greg
> 
> On Nov 29, 2011, at 9:57 AM, Schorn, Markus wrote:
> 
> > Hi Greg,
> > There is still support for multiple indexers; however, the UI for that does
> > not show up as long as you do not supply an alternative indexer. Unlike in the
> > earlier days of CDT, it is no longer simple to provide an alternative indexer
> > that can come close to the one that is built into CDT. Therefore the usual path
> > for a new indexer feature is to make it part of the existing indexer. Clearly
> > such a feature can be made dependent on a preference setting.
> >
> > Whatever feature goes into CDT causes bug reports, and when it comes to
> > dealing with issues, the enthusiasm of contributors and also committers is
> > limited. I have quite a list of annoying 'experimental' features in CDT that
> > simply don't work correctly (and probably never will). I do think it is a good
> > idea to discuss and analyze the impact of new features before putting them
> > into CDT.
> >
> > In the given case the new feature (parallelizing the indexer across projects)
> > is simply incomplete in that it does not respect that there needs to be some
> > order in indexing projects. It is not really difficult to implement the missing
> > piece.
> >
> > Another path is to get rid of reusing indexes from referenced projects. The
> > effect of this would be the duplication of index information about files used
> > from multiple dependent projects. While this makes indexing easier, it puts
> > the burden on the clients working with the index data because they have to
> > deal with the redundant information. As written before, I am not convinced
> > that reusing those indexes is important, and we may want to change the
> > indexer not to reuse them.
> >
> > We may also end up with another preference setting that allows for
> > turning off reusing indexes from other projects. The parallelization would
> > always work, but would work better when reusing indexes is turned off.
> >
> > Markus.
> >
> >
> > -----Original Message-----
> > From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx]
> > On Behalf Of Greg Watson
> > Sent: Tuesday, November 29, 2011 14:51
> > To: CDT General developers list.
> > Subject: Re: [cdt-dev] Parallelization of indexer
> >
> > Hi,
> >
> > Would it be possible to add this as an experimental indexer that could be
> > enabled through the preferences? There used to be support for multiple
> > indexers, but this seems to have been removed in CDT 8, presumably to
> > avoid confusion. From the user's perspective, what's important is the speed
> > and accuracy of the indexer. From the discussion, it sounds like the new
> > indexer improves speed but reduces accuracy for some types of projects. I
> > think users would be willing to give this a try if it was easy to enable/disable.
> >
> > Regards,
> > Greg
> >
> > On Nov 29, 2011, at 5:17 AM, Schorn, Markus wrote:
> >
> >> Hi Volker!
> >> Ad 1)
> >> I am not interested in the discussion whether one should use a monolithic
> >> project or should split up the source into multiple projects. Both are
> >> valid ways of using CDT.
> >>
> >> Ad 2)
> >> We have a specific handling of project references in place (the index of a
> >> referenced project is reused). One can challenge this approach (I am not very
> >> convinced of it myself). The performance issue would be a way to start
> >> this challenge. However, we cannot simply change the behavior of CDT
> >> without a discussion and some analysis on the matter, and as long as we use
> >> this approach your patch has to honor it.
> >>
> >> Ad 3) and 4)
> >> As you realized yourself in 4), parallelization on file level would solve 2),
> >> because then you can index the referenced project before the dependent
> >> one, and at all times you would do that in parallel on file level. As always,
> >> there are pros and cons to each approach.
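
To make the scheme in "Ad 3) and 4)" concrete, here is a minimal sketch in plain Java. It is not CDT code: it assumes the projects are already listed in dependency order (referenced projects first), and indexFile, indexInOrder, and the class name are hypothetical stand-ins invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerProjectFileParallelism {

    // Hypothetical stand-in for indexing one file; returns a label for the finished work.
    static String indexFile(String project, String file) {
        return project + "/" + file;
    }

    // Projects are handled one after another in the iteration order of the map
    // (assumed to be dependency order); the files of each project run concurrently.
    static List<String> indexInOrder(Map<String, List<String>> filesByProject, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<String> completed = new ArrayList<>();
        try {
            for (Map.Entry<String, List<String>> entry : filesByProject.entrySet()) {
                String project = entry.getKey();
                List<Callable<String>> tasks = new ArrayList<>();
                for (String file : entry.getValue()) {
                    tasks.add(() -> indexFile(project, file));
                }
                // invokeAll returns only after every file of this project is done,
                // so the next (dependent) project never starts early.
                for (Future<String> done : pool.invokeAll(tasks)) {
                    completed.add(done.get());
                }
            }
        } finally {
            pool.shutdown();
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> workspace = new LinkedHashMap<>();
        workspace.put("lib", List.of("a.c", "b.c")); // referenced project first
        workspace.put("app", List.of("main.c"));     // dependent project second
        System.out.println(indexInOrder(workspace, 4));
    }
}
```

The per-project barrier is what keeps the ordering simple here. It is also where worker threads sit idle whenever a project has fewer files left than there are workers.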
> >>
> >> Ad 1)
> >> Right, the discussion on parallelization on file level vs. project level can be
> >> done in parallel.
> >>
> >> Ad 2)
> >> You have two options: Either you make your patch honor project
> >> references, or you work towards changing CDT such that it ignores project
> >> references. For the latter a bugzilla on the performance issue may be your
> >> starting point.
> >>
> >> Markus.
> >>
> >> -----Original Message-----
> >> From: cdt-dev-bounces@xxxxxxxxxxx
> >> [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Volker Diesel
> >> Sent: Monday, November 28, 2011 21:26
> >> To: cdt-dev@xxxxxxxxxxx
> >> Subject: [cdt-dev] Parallelization of indexer
> >>
> >> Hi Markus.
> >>
> >> Yes, this discussion has been somewhat frustrating, and I still do not agree
> >> with your comments (and the comments of others about that topic) for
> >> several reasons.
> >> 1) Everyone involved in this discussion seems to believe that it is possible
> >> to set up one monolithic Eclipse project for a large-scale, real-life C++
> >> project, that this is the "normal" use case for CDT, and that
> >> parallelization of indexer jobs across Eclipse projects therefore doesn't help
> >> much. I already questioned that opinion in b#351659, because I cannot see
> >> how you would set up one monolithic Eclipse project if the sources require
> >> e.g. different sets of #defines or different include paths (and source files in
> >> real-life projects normally do require this). You won't get correct indexer
> >> results in that case. But maybe I missed some point...
> >> 2) In fact, my patch does not honour project references, but I explicitly
> >> asked if that would be a real problem, and your reply was that the only
> >> impact is potential replication of some of the symbols in multiple project
> >> indices. Therefore, my understanding was that there is no "real" issue if the
> >> parallelized indexer does not take project references into account.
> >> 3) I cannot see how parallelization on file level instead of project level can
> >> resolve issue 2). If indexer job#1 indexes a file from project A and indexer
> >> job#2 indexes a file from project B, then you are back at the same point...
> >> Should these jobs honour references between project A and B? I cannot see
> >> any difference.
> >> 4) The only solution to problem 3) would be to limit parallel indexer jobs (on
> >> file level) to the set of files of one and the same project. In that case, there
> >> will be at least three other issues... First, when there are many projects with
> >> only a few files, parallelization will be poor. Second, at the end of the process
> >> of indexing each project, parallelization will be poor, because there are more
> >> potentially free CPU cores than there are files left to index in that single
> >> project. Third (and most important), all these parallel jobs on the files of one
> >> single project will run into lock contention on the project's index write lock. I
> >> already faced a similar issue with my patch and had to change some of the
> >> locking code to achieve enough throughput/CPU utilization with my approach.
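
The third issue in 4), contention on a single index write lock, can be illustrated with a generic pattern: do the expensive parsing without holding the lock, then commit a whole batch of results under one short lock acquisition. The sketch below is plain Java under stated assumptions. It is neither CDT's index API nor the locking change made in the patch (the thread does not describe that change); BatchedIndexWriter, parse, and commit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class BatchedIndexWriter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Integer> index = new HashMap<>(); // toy index: file -> value
    private int commits = 0; // number of write-lock acquisitions

    // Hypothetical "parse" step: the expensive part, done WITHOUT the write lock.
    static int parse(String file) {
        return file.length();
    }

    // Merge one batch of results while holding the write lock only briefly.
    void commit(Map<String, Integer> batch) {
        lock.writeLock().lock();
        try {
            index.putAll(batch);
            commits++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return index.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    int commits() {
        lock.readLock().lock();
        try {
            return commits;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Splits the files into batches; each worker parses a batch locally, then commits once.
    static BatchedIndexWriter indexFiles(List<String> files, int workers, int batchSize) {
        BatchedIndexWriter writer = new BatchedIndexWriter();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> tasks = new ArrayList<>();
        for (int start = 0; start < files.size(); start += batchSize) {
            List<String> chunk = files.subList(start, Math.min(start + batchSize, files.size()));
            tasks.add(pool.submit(() -> {
                Map<String, Integer> local = new HashMap<>();
                for (String file : chunk) {
                    local.put(file, parse(file));
                }
                writer.commit(local);
            }));
        }
        try {
            for (Future<?> task : tasks) {
                task.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return writer;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add("file" + i + ".c");
        }
        BatchedIndexWriter writer = indexFiles(files, 4, 16);
        System.out.println(writer.size() + " files, " + writer.commits() + " lock acquisitions");
    }
}
```

With 100 files and a batch size of 16, the write lock is taken 7 times instead of 100. Whether batching of this kind would help a real index depends on how much work must happen under the lock, which this toy example does not model.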
> >>
> >> Therefore, from my point of view...
> >> 1) The discussion about parallelization across projects vs. parallelization
> >> across source files is independent of the question of honouring project
> >> dependencies. Both approaches need either a fix for the performance issues
> >> mentioned in b#351659 or a decision not to honour project dependencies
> >> while indexing.
> >> 2) And yes, of course I could open another bugzilla about that
> >> performance issue, but how would that help? I already mentioned that
> >> issue, I captured jprof profiling information and attached the profiler data to
> >> b#351659, and I asked for someone to look into that data. Nothing happened.
> >> Why should that change if I opened a second bugzilla and attached the same
> >> profiler data again?
> >>
> >> Kind regards.
> >> Volker
> >>
> >>
> >>
> >> -----Original Message-----
> >> Date: Mon, 28 Nov 2011 06:40:34 +0000
> >> From: "Schorn, Markus" <Markus.Schorn@xxxxxxxxxxxxx>
> >> To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
> >> Subject: Re: [cdt-dev] Parallelization of indexer
> >> Message-ID:
> >> <30D36C1BA62C5F4892C482E607D5E77E1FA57921@ALA-MBB.corp.ad.wrs.com>
> >> Content-Type: text/plain; charset="us-ascii"
> >>
> >> Hi Volker!
> >> I can understand your frustration; however, there is an issue with the
> >> patch as provided in bug
> >> https://bugs.eclipse.org/bugs/show_bug.cgi?id=351659. My view on the
> >> matter is the following:
> >>
> >> (1) Your patch does not deal with indexing dependent projects. Currently
> >> the index of a project is reused by a dependent project. This requires the
> >> dependent project to be indexed after its dependencies. Your patch ignores
> >> this requirement.
> >> You have identified that indexing with dependencies does introduce a
> >> performance issue. This needs further investigation and may lead us to
> >> changing the indexer such that it ignores the project references. However,
> >> before we have made such a decision, your patch cannot be applied.
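
The ordering requirement in (1) does not by itself rule out project-level parallelism: projects that do not reference each other can still run concurrently, as long as each dependent project starts only after the projects it references have finished. A minimal sketch in plain Java follows, assuming the project references are available as a simple map. The names (OrderedParallelIndexing, indexProject, indexAll) are hypothetical; this is not CDT code and not the patch from bug 351659.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OrderedParallelIndexing {

    // Hypothetical stand-in for indexing one project; records the completion order.
    static void indexProject(String name, List<String> log) {
        synchronized (log) {
            log.add(name);
        }
    }

    // references maps each project to the projects it references (its dependencies).
    static List<String> indexAll(Map<String, List<String>> references, ExecutorService pool) {
        List<String> log = new ArrayList<>();
        Map<String, CompletableFuture<Void>> jobs = new HashMap<>();
        for (String project : references.keySet()) {
            schedule(project, references, jobs, pool, log, new HashSet<>());
        }
        CompletableFuture.allOf(jobs.values().toArray(new CompletableFuture[0])).join();
        return log;
    }

    // Creates (once) the job for a project, chained behind the jobs of its dependencies.
    private static CompletableFuture<Void> schedule(String project,
            Map<String, List<String>> references, Map<String, CompletableFuture<Void>> jobs,
            ExecutorService pool, List<String> log, Set<String> inProgress) {
        CompletableFuture<Void> existing = jobs.get(project);
        if (existing != null) {
            return existing;
        }
        if (!inProgress.add(project)) {
            throw new IllegalStateException("cyclic project reference: " + project);
        }
        List<CompletableFuture<Void>> deps = new ArrayList<>();
        for (String dep : references.getOrDefault(project, List.of())) {
            deps.add(schedule(dep, references, jobs, pool, log, inProgress));
        }
        // The project's job runs on the pool only after all dependency jobs completed.
        CompletableFuture<Void> job = CompletableFuture
                .allOf(deps.toArray(new CompletableFuture[0]))
                .thenRunAsync(() -> indexProject(project, log), pool);
        inProgress.remove(project);
        jobs.put(project, job);
        return job;
    }

    public static void main(String[] args) {
        Map<String, List<String>> references = new LinkedHashMap<>();
        references.put("app", List.of("libA", "libB"));
        references.put("libA", List.of("core"));
        references.put("libB", List.of("core"));
        references.put("core", List.of());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // "core" always comes first and "app" last; libA and libB may appear in either order.
            System.out.println(indexAll(references, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

In this example libA and libB are indexed concurrently, while a long dependency chain would still serialize. That limitation is one reason the file-level discussion in this thread remains relevant.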
> >>
> >> (2) The approach of parallelizing indexing on project level does not help
> >> for large projects. I do agree with Sergey that it would be more rewarding to
> >> make parallelization work on file level.
> >>
> >>
> >> To move forward on the issue, I encourage you to open a new bug on the
> >> performance issue of dependent projects. We need a discussion on that, and
> >> only if we drop the requirement of reusing the index of a dependent project
> >> can we go back to considering your patch.
> >>
> >> In parallel it makes sense to think about parallelization on file level.
> >> Thinking long-term, this is the more promising approach, and it
> >> would find more traction.
> >>
> >> Markus.
> >>
> >>
> >> -----Original Message-----
> >> From: cdt-dev-bounces@xxxxxxxxxxx
> >> [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Volker Diesel
> >> Sent: Friday, November 25, 2011 23:34
> >> To: CDT Dev
> >> Subject: [cdt-dev] (no subject)
> >>
> >> Hello, everybody.
> >> There was some discussion about C/C++ indexer parallelization
> >> some months ago, and (initially) most people agreed that this would be a
> >> great feature.
> >> There is a patch in place that brings down full C/C++ indexing time from
> >> 4 hrs to 20 mins in our project (see bug#351659).
> >> This patch has now been used in our team (200+ people, 10+ million lines of
> >> C/C++ code) without any issue for several months.
> >> I provided a git patch for CDT master.
> >> I provided a git patch for CDT 8.
> >> I have not received any answer to my latest questions in the
> >> above-mentioned bugzilla for months.
> >> I wonder if anyone out there in CDT DEV is still interested in that topic?
> >> I wonder how such an enhancement will finally find its way into any CDT
> >> codeline, and what else I can do to bring this feature into an official CDT
> >> release?
> >> If no one at CDT DEV is interested in that topic any longer, please let me
> >> know. In that case I would simply close that useless bugzilla.
> >> Thanks and kind regards.
> >> Volker
> >> _______________________________________________
> >> cdt-dev mailing list
> >> cdt-dev@xxxxxxxxxxx
> >> https://dev.eclipse.org/mailman/listinfo/cdt-dev
> >>
> >>
> >>
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Tue, 29 Nov 2011 23:26:09 +0100 (CET)
> From: "Volker Diesel" <volker.diesel@xxxxxx>
> To: cdt-dev@xxxxxxxxxxx
> Subject: [cdt-dev] Parallelization of indexer
> Message-ID:
> <1080985477.5434381.1322605569215.JavaMail.fmail@mwmweb012>
> Content-Type: text/plain; charset="UTF-8"
> 
> 1) About functionality of parallel indexer (Greg)
> No! My parallel indexer patch does NOT introduce any functional limitations
> from an end user's point of view. If you search the index generated by my
> patch, it will give you exactly the same results as the index generated by
> original CDT8 (at least that's what we experienced in our team during the
> last half year or so). The only thing an end user notices is that index
> generation is one or two orders of magnitude faster than with the original
> CDT8 indexer (depending on how many CDT projects you have in your
> workspace, and depending, of course, on your hardware). And indexes might
> consume more disk space (depending on your project configuration).
> 
> 2) About plugging in alternative indexers (Greg)
> As far as I can tell from my (poor) knowledge of CDT, options for easily
> plugging in alternative indexers have (unfortunately) been removed from
> CDT. According to the documentation, there is an extension to do so, but I
> could not find any code that implements that extension. If there were such
> an easy extension, I wouldn't need to go through all these discussions. I
> could simply publish my own indexer extension, and anyone who likes it
> could use it... free market, so to say :-)
> Unfortunately, CDT doesn't offer this (at least as far as I can tell).
> 
> 3) About monolithic project setups
> I do NOT consider this a minor topic or not worth discussing, because if there
> is NO WAY to set up such a monolithic project, then the discussion about
> parallelization on file level vs. parallelization on project
> level can be stopped immediately! If there are situations where multiple
> CDT projects MUST be configured, we need a parallelization approach that
> honours the fact of multiple (and maybe many) projects appropriately.
> So I am asking anyone @cdt-dev once again to explain if and how it is
> possible to set up one monolithic CDT project if you have different source
> files that require e.g. different sets of #defines or different include paths
> (and if you expect correct indexer search results).
> 
> 4) About parallelization on file level
> I don't like to be misquoted; therefore, once again and for clarification, what
> I really meant to say with point 4) of my last posting...
> a) Parallel indexing on file level DOES NOT SOLVE ANY SINGLE PROBLEM that
> has not already been solved with my patch!
> b) The problem of honouring project dependencies needs to be solved, no
> matter if parallelization is done on file level or on project level. And once this
> is solved, this solution is as valid for file-level parallelization as it is for my
> already implemented project-level approach!
> c) If (b) is "hacked" by only parallelizing indexing of files in ONE SINGLE project,
> THIS WILL NOT BE A SOLUTION, but will instead INTRODUCE EVEN MORE
> COMPLICATED performance and parallelization issues (THIS IS WHAT I
> CLEARLY SAID IN MY LAST POSTING). I mentioned e.g. lock contention on
> the index write lock as one issue, which can only be solved by quite complex
> refactoring of index locking code... anyone out there to do that job within
> this decade???
> So, point 4) of my last posting is A CLEAR STATEMENT AGAINST parallelization
> on file level (because this approach does not solve any problem that hasn't
> already been solved with my patch and only introduces a bulk of new
> parallelization issues and bottlenecks) and should therefore please no longer
> be misused as an argument to GO FOR parallelization on file level (at least as
> long as no solutions for the problems mentioned above are explicitly given).
> Thanks.
> 
> 5) If it helps to kick off cdt-dev administration, I will open a new bugzilla
> about the indexer, project references, and related performance issues,
> copy-paste the problem description (already available for months) from here to
> there, and re-attach the jperf files (already available for months) from here to
> there... and will then once again ask someone @cdt-dev to PLEASE, PLEASE,
> PLEASE have a look at these performance issues, because I do not know
> enough about CDT to tell what's wrong there, and because the REAL
> technical issue does not appear or disappear simply because a new bugzilla is
> opened or not opened!
> 
> 
> 
> 
> -----Original Message-----
> From: cdt-dev-request@xxxxxxxxxxx
> Sent: Nov 29, 2011 6:00:06 PM
> To: cdt-dev@xxxxxxxxxxx
> Subject: cdt-dev Digest, Vol 81, Issue 34
> 
> 
> 
> Today's Topics:
> 
> 1. Re: Parallelization of indexer (Greg Watson)
> 2. Re: Parallelization of indexer (Schorn, Markus)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Tue, 29 Nov 2011 08:51:09 -0500
> From: Greg Watson <g.watson@xxxxxxxxxxxx>
> To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
> Subject: Re: [cdt-dev] Parallelization of indexer
> Message-ID: <5F0F2E77-6A61-4D6C-9E81-AA9249B5520E@xxxxxxxxxxxx>
> Content-Type: text/plain; charset=iso-8859-1
> 
> Hi,
> 
> Would it be possible to add this as an experimental indexer that could be
> enabled through the preferences? There used to be support for multiple
> indexers, but this seems to have been removed in CDT 8, presumably to
> avoid confusion. From the user's perspective, what's important is the speed
> and accuracy of the indexer. From the discussion, it sounds like the new
> indexer improves speed but reduces accuracy for some types of projects. I
> think users would be willing to give this a try if it was easy to enable/disable.
> 
> Regards,
> Greg
> 
> On Nov 29, 2011, at 5:17 AM, Schorn, Markus wrote:
> 
> > Hi Volker!
> > Ad 1)
> > I am not interested in the discussion whether one should use a monolithic
> project or should split up the source into multiple projects. Both ways are a
> valid way of using CDT.
> >
> > Ad 2)
> > We have a specific handling of project references in place (index of
> referenced project is reused). One can challenge this approach (I am not very
> convinced of this approach). The performance issue would be a way to start
> this challenge. However, we cannot simply change the behavior of CDT
> without a discussion and some analysis on the matter, and as long as we use
> this approach your patch has to honor it.
> >
> > Ad 3) and 4)
> > As you realized yourself in 4), parallelization on file level would solve 2),
> because than you can index the reference project before the dependent
> one and at every time you would do that in parallel on file-level. As always,
> there are pros and cons to each approach.
> >
> > Ad 1)
> > Right, the discussion on parallelization on file-level vs. project level can be
> done in parallel.
> >
> > Ad 2)
> > You have two options: Either you make your patch honor project
> references, or you work towards changing CDT such that it ignores project
> references. For the latter a bugzilla on the performance issue may be your
> starting point.
> >
> > Markus.
> >
> > -----Original Message-----
> > From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-
> bounces@xxxxxxxxxxx]
> > On Behalf Of Volker Diesel
> > Sent: Monday, November 28, 2011 21:26
> > To: cdt-dev@xxxxxxxxxxx
> > Subject: [cdt-dev] Parallelization of indexer
> >
> > Hi Markus.
> >
> > Yes, this discussion has been somewhat frustrating and I still do not agree
> with your comments (and the comments of others about that topic) for
> several reasons.
> > 1) Everyone involved in this discussion seems to believe, that it is possible
> to setup one monolithic Eclipse project for a large-scale and real-life C++
> project, and that this is the "normal" use-case for CDT, and that
> parallelization of indexer jobs across Eclipse projects therefore doesn't help
> much. I already questioned that opinion in b#351659, because I cannot see,
> how you would setup one monolithic Eclipse project, if the sources require
> e.g. different sets of #define's or different include pathes (and source files in
> real-life projects normally do require this). You won't get correct indexer
> results in that case. But maybe, I missed some point...
> > 2) In fact, my patch does not honour project references, but I explicitly
> asked, if that would be a real problem, and your reply was, that the only
> impact is potential replication of some of the symbols in multiple project
> indices. Therefore, my understanding was that there is no "real" issue, if
> parallelized indexer does not take project references into account.
> > 3) I cannot see, how parallelization on file level instead of project level can
> resolve issue 2). If indexer job#1 indexes a file from project A and indexer
> job#2 indexes a file from project B, then you are back at the same point...
> Should these jobs honour references between project A and B? I cannot see
> any difference.
> > 4) Only solution to problem 3) would be to limit parallel indexer jobs (on file
> level) to the set of files of one and the same project. In that case, there will
> be at least three other issues... First, when there are many projects with only
> a few files, parallelization will be poor. Second, at the end of the process of
> indexing each project, parallization will be poor, because there are more
> potential free CPU cores than there are files left to index in that single
> project. Third (and most important) all these parallel jobs on the files of one
> single project will run into lock contention on the project's index write lock. I
> already faced a similar issue with my patch and had to change some of the
> locking code to achieve enough throuput/CPU utilization with my approach.
> >
> > Therefore, from my point of view...
> > 1) Discussion about paralellization across projects vs. parallelization across
> source files is independent of the question of honouring project
> dependencies. Both approaches need either a fix for the performance issues
> mentioned in b#351659 or a decission not to honour project dependencies
> while indexing.
> > 2) And yes, of course I could open another bugzilla about that performance
> issue, but what would that help? I already mentioned that issue, I captured
> jprof profiling information and attached the profiler data to b#351659, and I
> asked for someone to look into that data. Nothing happend. Why should that
> change, if I opened a second bugzilla and attached the same profiler data
> again?
> >
> > Kind regards.
> > Volker
> >
> >
> >
> > -----Urspr?ngliche Nachricht-----
> > Date: Mon, 28 Nov 2011 06:40:34 +0000
> > From: "Schorn, Markus" <Markus.Schorn@xxxxxxxxxxxxx>
> > To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
> > Subject: Re: [cdt-dev] Parallelization of indexer
> > Message-ID:
> > <30D36C1BA62C5F4892C482E607D5E77E1FA57921@ALA-
> MBB.corp.ad.wrs.com>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Hi Volker!
> > I can understand your frustration, however there is an issue with the patch
> as provided in bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=351659.
> My view on the matter is the following:
> >
> > (1) Your patch does not deal with indexing dependent projects. Currently
> the index of a project is reused by a dependent project. This requires the
> dependent project to be indexed after its dependencies. Your patch ignores
> this requirement.
> > You have identified that indexing with dependencies does introduce a
> performance issue. This needs further investigation and may lead us to
> changing the indexer, such that it ignores the project references. However,
> before we have made such a decision, your patch cannot be applied.
> >
> > (2) The approach of parallelizing indexing on project level does not help for
> large projects. I do agree with Sergey, that it would be more rewarding to
> make parallelization work on file-level.
> >
> >
> > To move forward on the issue, I encourage you to open a new bug on the
> performance issue of dependent projects. We need a discussion on that and
> only if we drop the requirement of reusing the index of a dependent project
> we can go back to consider your patch.
> >
> > In parallel it makes sense to think about parallelization of file-level. Because
> thinking long-term, this is the more promising approach the approach would
> find more traction.
> >
> > Markus.
> >
> >
> > -----Original Message-----
> > From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-
> bounces@xxxxxxxxxxx]
> > On Behalf Of Volker Diesel
> > Sent: Friday, November 25, 2011 23:34
> > To: CDT Dev
> > Subject: [cdt-dev] (no subject)
> >
> > Hello, everybody.
> > There was some discussion about C/C++ indexer parallelization
> some months ago and (initially) most people agreed that this would be a
> great feature.
> > There is a patch in place that brings down full C/C++ indexing time from
> 4 hours to 20 minutes in our project (see bug#351659).
> > This patch has now been used in our team (200+ people, 10+ million lines of
> C/C++ code) without any issue for several months.
> > I provided a git patch for CDT master.
> > I provided a git patch for CDT 8.
> > I have not received any answer to my latest questions in the
> above-mentioned bugzilla for months.
> > I wonder if anyone out there in CDT DEV is still interested in that topic?
> > I wonder how such an enhancement will finally find its way into any CDT
> codeline and what else I can do to bring this feature into an official CDT release?
> > If no one at CDT DEV is interested in that topic any longer, please let me
> know. In that case I would simply close that useless bugzilla.
> > Thanks and kind regards.
> > Volker
> > _______________________________________________
> > cdt-dev mailing list
> > cdt-dev@xxxxxxxxxxx
> > https://dev.eclipse.org/mailman/listinfo/cdt-dev
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Tue, 29 Nov 2011 14:57:26 +0000
> From: "Schorn, Markus" <Markus.Schorn@xxxxxxxxxxxxx>
> To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
> Subject: Re: [cdt-dev] Parallelization of indexer
> Message-ID:
> <30D36C1BA62C5F4892C482E607D5E77E1FA57C81@ALA-
> MBB.corp.ad.wrs.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Hi Greg,
> There is still support for multiple indexers; however, the UI for that does not
> show up as long as you do not supply an alternative indexer. Unlike in the
> earlier days of CDT, it is no longer simple to provide an alternative indexer
> that can come close to the one that is built into CDT. Therefore the usual path
> for a new indexer feature is to make it part of the existing indexer. Clearly
> such a feature can be made dependent on a preference setting.
> 
> Whatever feature goes into CDT causes bug reports and when it comes to
> dealing with issues the enthusiasm of contributors and also committers is
> limited. I have quite a list of annoying 'experimental' features in CDT that
> simply don't work correctly (and probably never will). I do think it is a good
> idea to discuss and analyze the impact of new features before putting them
> into CDT.
> 
> In the given case the new feature (parallelizing the indexer across projects) is
> simply incomplete in that it does not respect that there needs to be some
> order in indexing projects. It is not really difficult to implement the missing
> piece.
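
[Editorial sketch, not part of the original message.] The ordering requirement described above can be illustrated as follows: a project starts indexing only after every project it references has finished, while unrelated projects run in parallel. This is a minimal sketch under stated assumptions, not the CDT implementation and not the patch from bug 351659. The class `OrderedParallelIndexing`, the method `indexAll`, and the `indexOne` callback are hypothetical names, projects are plain strings, and the reference graph is assumed to be acyclic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: parallel indexing across projects that still honors
// project references. Assumes the reference graph has no cycles.
final class OrderedParallelIndexing {

    /** Runs indexOne for each project, after all projects it references are done. */
    static void indexAll(Map<String, List<String>> references,
                         int threads,
                         Consumer<String> indexOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, CompletableFuture<Void>> tasks = new HashMap<>();
        try {
            for (String project : references.keySet()) {
                schedule(project, references, tasks, pool, indexOne);
            }
            // Wait until every project has been indexed.
            CompletableFuture.allOf(tasks.values().toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }

    // Creates (once) the task for a project, chained behind the tasks of its references.
    private static CompletableFuture<Void> schedule(String project,
                                                    Map<String, List<String>> references,
                                                    Map<String, CompletableFuture<Void>> tasks,
                                                    Executor pool,
                                                    Consumer<String> indexOne) {
        CompletableFuture<Void> existing = tasks.get(project);
        if (existing != null) {
            return existing;
        }
        CompletableFuture<?>[] prerequisites = references
                .getOrDefault(project, List.of())
                .stream()
                .map(ref -> schedule(ref, references, tasks, pool, indexOne))
                .toArray(CompletableFuture[]::new);
        // The project's own indexing starts only when all referenced projects are complete.
        CompletableFuture<Void> task = CompletableFuture
                .allOf(prerequisites)
                .thenRunAsync(() -> indexOne.accept(project), pool);
        tasks.put(project, task);
        return task;
    }
}
```

With references such as app -> {lib, util} and lib -> {util}, `util` is indexed first, then `lib`, then `app`; projects without a reference path between them may run concurrently. How much parallelism survives depends on the shape of the reference graph, which is the trade-off discussed in this thread.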
> 
> Another path is to get rid of reusing indexes from referenced projects. The
> effect of this would be the duplication of index-information about files used
> from multiple dependent projects. While this makes indexing easier, it puts
> the burden on the clients working with the index-data because they have to
> deal with the redundant information. As written before, I am not convinced
> that it is important and we may want to change the indexer not to reuse
> those indexes.
> 
> We may also end up with another preference setting, that allows for turning
> off reusing indexes from other projects. The parallelization would always
> work, but would work better when reusing indexes is turned off.
> 
> Markus.
> 
> 
> -----Original Message-----
> From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx]
> On Behalf Of Greg Watson
> Sent: Tuesday, November 29, 2011 14:51
> To: CDT General developers list.
> Subject: Re: [cdt-dev] Parallelization of indexer
> 
> Hi,
> 
> Would it be possible to add this as an experimental indexer that could be
> enabled through the preferences? There used to be support for multiple
> indexers, but this seems to have been removed in CDT 8, presumably to
> avoid confusion. From the user's perspective, what's important is the speed
> and accuracy of the indexer. From the discussion, it sounds like the new
> indexer improves speed but reduces accuracy for some types of projects. I
> think users would be willing to give this a try if it was easy to enable/disable.
> 
> Regards,
> Greg
> 
> On Nov 29, 2011, at 5:17 AM, Schorn, Markus wrote:
> 
> > Hi Volker!
> > Ad 1)
> > I am not interested in the discussion whether one should use a monolithic
> project or should split up the source into multiple projects. Both ways are a
> valid way of using CDT.
> >
> > Ad 2)
> > We have a specific handling of project references in place (index of
> referenced project is reused). One can challenge this approach (I am not very
> convinced of this approach). The performance issue would be a way to start
> this challenge. However, we cannot simply change the behavior of CDT
> without a discussion and some analysis on the matter, and as long as we use
> this approach your patch has to honor it.
> >
> > Ad 3) and 4)
> > As you realized yourself in 4), parallelization on file level would solve 2),
> because then you can index the referenced project before the dependent
> one, and at each step you would do that in parallel on file level. As always,
> there are pros and cons to each approach.
> >
> > Ad 1)
> > Right, the discussion on parallelization on file-level vs. project level can be
> done in parallel.
> >
> > Ad 2)
> > You have two options: Either you make your patch honor project
> references, or you work towards changing CDT such that it ignores project
> references. For the latter a bugzilla on the performance issue may be your
> starting point.
> >
> > Markus.
> >
> > -----Original Message-----
> > From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-
> bounces@xxxxxxxxxxx]
> > On Behalf Of Volker Diesel
> > Sent: Monday, November 28, 2011 21:26
> > To: cdt-dev@xxxxxxxxxxx
> > Subject: [cdt-dev] Parallelization of indexer
> >
> > Hi Markus.
> >
> > Yes, this discussion has been somewhat frustrating and I still do not agree
> with your comments (and the comments of others about that topic) for
> several reasons.
> > 1) Everyone involved in this discussion seems to believe that it is possible
> to set up one monolithic Eclipse project for a large-scale and real-life C++
> project, and that this is the "normal" use-case for CDT, and that
> parallelization of indexer jobs across Eclipse projects therefore doesn't help
> much. I already questioned that opinion in b#351659, because I cannot see
> how you would set up one monolithic Eclipse project if the sources require
> e.g. different sets of #defines or different include paths (and source files in
> real-life projects normally do require this). You won't get correct indexer
> results in that case. But maybe I missed some point...
> > 2) In fact, my patch does not honour project references, but I explicitly
> asked, if that would be a real problem, and your reply was, that the only
> impact is potential replication of some of the symbols in multiple project
> indices. Therefore, my understanding was that there is no "real" issue, if
> parallelized indexer does not take project references into account.
> > 3) I cannot see, how parallelization on file level instead of project level can
> resolve issue 2). If indexer job#1 indexes a file from project A and indexer
> job#2 indexes a file from project B, then you are back at the same point...
> Should these jobs honour references between project A and B? I cannot see
> any difference.
> > 4) The only solution to problem 3) would be to limit parallel indexer jobs (on file
> level) to the set of files of one and the same project. In that case, there will
> be at least three other issues... First, when there are many projects with only
> a few files, parallelization will be poor. Second, at the end of the process of
> indexing each project, parallelization will be poor, because there are more
> potential free CPU cores than there are files left to index in that single
> project. Third (and most important), all these parallel jobs on the files of one
> single project will run into lock contention on the project's index write lock. I
> already faced a similar issue with my patch and had to change some of the
> locking code to achieve enough throughput/CPU utilization with my approach.
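
[Editorial sketch, not part of the original message.] The lock contention concern in point 4) can be illustrated with a common mitigation: each worker parses a batch of files without holding any lock and takes the shared write lock only once per batch to publish its results. This is a hedged sketch and not a description of what the patch in bug 351659 changed. The class `BatchedIndexWriter`, its methods, and the `parse` stand-in are hypothetical, and a plain `ReentrantLock` stands in for the project's index write lock.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: file-level parallel indexing of ONE project where the
// shared write lock is taken once per batch instead of once per file.
final class BatchedIndexWriter {
    private final ReentrantLock writeLock = new ReentrantLock(); // stand-in for the index write lock
    private final Map<String, Integer> index = new HashMap<>();  // file name -> fake "symbol count"
    private int lockAcquisitions = 0;                            // guarded by writeLock

    void indexFiles(List<String> files, int threads, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int start = 0; start < files.size(); start += batchSize) {
            List<String> batch = files.subList(start, Math.min(start + batchSize, files.size()));
            pool.execute(() -> {
                // Expensive part: runs without the lock, so workers do not block each other.
                Map<String, Integer> local = new HashMap<>();
                for (String file : batch) {
                    local.put(file, parse(file));
                }
                // Cheap part: one short critical section per batch.
                writeLock.lock();
                try {
                    index.putAll(local);
                    lockAcquisitions++;
                } finally {
                    writeLock.unlock();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Placeholder for parsing a translation unit; here it just returns the name length.
    private static int parse(String file) {
        return file.length();
    }

    int size() {
        writeLock.lock();
        try { return index.size(); } finally { writeLock.unlock(); }
    }

    int lockAcquisitions() {
        writeLock.lock();
        try { return lockAcquisitions; } finally { writeLock.unlock(); }
    }
}
```

With 10 files and a batch size of 4, the lock is taken 3 times rather than 10. Larger batches reduce contention further but make the tail of the run less balanced, which relates to the second issue raised in point 4).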
> >
> > Therefore, from my point of view...
> > 1) Discussion about parallelization across projects vs. parallelization across
> source files is independent of the question of honouring project
> dependencies. Both approaches need either a fix for the performance issues
> mentioned in b#351659 or a decision not to honour project dependencies
> while indexing.
> > 2) And yes, of course I could open another bugzilla about that performance
> issue, but how would that help? I already mentioned that issue, I captured
> jprof profiling information and attached the profiler data to b#351659, and I
> asked for someone to look into that data. Nothing happened. Why should that
> change if I opened a second bugzilla and attached the same profiler data
> again?
> >
> > Kind regards.
> > Volker
> >
> >
> >