Bug 315539 - Make it easier to add other languages to CDT
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-core
Version: 6.0
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Project Inbox
QA Contact: Jonah Graham
URL:
Whiteboard:
Keywords:
Depends on: 315540 315541 162806 315542
Blocks: 68083
Reported: 2010-06-03 04:44 EDT by Alex Blewitt
Modified: 2020-09-04 15:22 EDT
CC List: 3 users

See Also:


Description Alex Blewitt 2010-06-03 04:44:58 EDT
Build Identifier: 

There are a number of assumptions and restrictions in adding a new language to CDT, not the least of which involves changes to core code. This bug can act as a parent of those issues.

Reproducible: Always
Comment 1 Alex Blewitt 2010-06-03 05:00:56 EDT
Doug is working on CDTAntlr to try and help add other languages to CDT

http://code.google.com/a/eclipselabs.org/p/cdtantlr/
Comment 2 Alex Blewitt 2010-06-09 02:34:50 EDT
Comment from bug 315540 comment 5 (Doug Schaefer)

My point is that [adding an additional language like] Objective-C could work because 
you can tweak the existing parsers to add it in.

But if you have a new language that requires a new parser, then it's too much
work, if not impossible, to add it in. The CDT AST infrastructure is very tied
to the parsing style we used. And I don't expect other people to write parsers
that way.

So while I had hoped that we created the CDT DOM (AST, Binding, and Index) so
that it can be used by other languages, the optimizations we've done over the
years broke that, if it was ever properly built to begin with.

So for other new languages, we need to create a new multi-language framework to
support them. Xtext could be that but you need the full power of ANTLR to pull
off common programming languages. Maybe they'll get there.
Comment 3 Alex Blewitt 2010-06-09 02:35:30 EDT
Comment from bug 315540 comment 6

I think we need to be a lot more clear about what it means to have CDT be
extensible for a new language. 

Firstly I don't think parser extensibility is even close to being the hardest
part. I was able to make the LR parser extensible by simply providing reusable
grammar files and action classes. Parsing produces an AST from some text, fine,
now what do you do with that AST? That's the important question.

I'm talking about stuff like the binding resolution algorithms which are
incredibly complex. They basically encode most of the semantic rules of C/C++.
How do you make something like that extensible? 


I see two levels...

One way would be to plug in at a fine grained level. This might be useful for
small language extensions like UPC which tend to only provide a few new things
on top of C. If I could just extend a parser with a few grammar rules, add a
few semantic rules, a handful of AST nodes etc. 

Unfortunately I don't think this would work without extensive architectural and
algorithmic changes to CDT. And introducing API at that level makes evolving
the CDT core much harder.

UPC does barely work though. The UPC parser represents UPC constructs in a way
that is digestible to CDT by "reducing" the new language features to ones CDT
already supports. For example the UPC forall loop extends a regular C for loop,
which isn't actually correct but makes it work. That's about as far as I got,
anything else was big trouble. I even gave up trying to get "shared int" to
show up in the outline view. Another example is that the editor help system
doesn't even support using a content type other than C/C++. My point is that
sweeping changes across the core and UI would be needed to make this work,
including introducing tons of API. Is there even demand for such a framework? I
think it's a non-starter. I could have just added UPC support directly to the
core and probably would have gotten a lot farther.
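
To make the "reducing" approach concrete, here is a minimal sketch in Java. All type names (ForStatement, UpcForallStatement, OutlineLabeler) are hypothetical stand-ins invented for illustration and are not CDT's actual AST classes. It assumes only that the extension node subclasses the node type the core already understands, as the comment describes for the forall loop.

```java
// Hypothetical, simplified AST types for illustration; none of these are real CDT classes.
class ForStatement {
    final String init;
    final String condition;
    final String iteration;

    ForStatement(String init, String condition, String iteration) {
        this.init = init;
        this.condition = condition;
        this.iteration = iteration;
    }
}

// The extension node subclasses the plain for loop so that existing tooling accepts it.
class UpcForallStatement extends ForStatement {
    final String affinity; // the fourth clause of upc_forall, unknown to the core

    UpcForallStatement(String init, String condition, String iteration, String affinity) {
        super(init, condition, iteration);
        this.affinity = affinity;
    }
}

// Stands in for any core feature (outline, search, refactoring) written against ForStatement only.
class OutlineLabeler {
    static String label(ForStatement loop) {
        return "for (" + loop.init + "; " + loop.condition + "; " + loop.iteration + ")";
    }
}

class ReductionDemo {
    public static void main(String[] args) {
        ForStatement loop = new UpcForallStatement("i = 0", "i < n", "i++", "&a[i]");
        // The labeler works, but the affinity expression never reaches the user.
        System.out.println(OutlineLabeler.label(loop));
    }
}
```

The existing feature keeps working because the new node is a for loop as far as the type system is concerned, but everything specific to the extension (here the affinity clause) is invisible to it. That is the trade-off described above: it "makes it work" without being correct.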


The other level would be very high level. Basically you provide an almost
complete standalone solution that plugs in only at specific points like
ILanguage. You provide a complete parser with binding resolution, AST, index
linkage etc. Many parts of CDT could be reusable, specifically the
preprocessor/lexer which already buys you a lot. But still there's a ton left
to do. 
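
As a rough sketch of what plugging in at one coarse-grained point could look like, the Java below registers a complete language implementation against a content type. LanguagePlugin, SyntaxTree, LanguageRegistry and ToyLanguage are hypothetical names made up for this example. This is not the real ILanguage interface, whose methods are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a coarse-grained extension point; NOT the real ILanguage API.
interface LanguagePlugin {
    String id();
    SyntaxTree parse(String source);
}

// Hypothetical placeholder for whatever AST the plug-in produces.
final class SyntaxTree {
    final String languageId;
    final int tokenCount;

    SyntaxTree(String languageId, int tokenCount) {
        this.languageId = languageId;
        this.tokenCount = tokenCount;
    }
}

// The host only maps content types to plug-ins; it knows nothing about the language itself.
final class LanguageRegistry {
    private final Map<String, LanguagePlugin> byContentType = new HashMap<>();

    void register(String contentTypeId, LanguagePlugin plugin) {
        byContentType.put(contentTypeId, plugin);
    }

    Optional<LanguagePlugin> lookup(String contentTypeId) {
        return Optional.ofNullable(byContentType.get(contentTypeId));
    }
}

// A toy language whose "parser" just counts whitespace-separated tokens.
final class ToyLanguage implements LanguagePlugin {
    @Override
    public String id() {
        return "toy";
    }

    @Override
    public SyntaxTree parse(String source) {
        String trimmed = source.trim();
        int tokens = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return new SyntaxTree(id(), tokens);
    }
}
```

The registry side is trivial. Everything the comment lists as "a ton left to do" (binding resolution, index linkage, a real AST) would have to live inside the plug-in's implementation, which is why this level amounts to an almost standalone solution.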

And then how do you extend the UI? For the sake of argument say there's some
editor trick you could do in Obj-C that doesn't apply to C/C++, how do you
extend the existing editor with that? How do you make the call hierarchy
extensible enough to support all kinds of fancy stuff that can't be predicted,
like multi-methods? Do you have to provide your own editor and type/call hierarchy?
All this stuff needs to be thought through.


My opinion is that the tooling needs to know a lot about the language in order
to provide powerful and useful features. So you're writing at least half an IDE
from scratch to add a new language to CDT, if not more.

That's why I think Markus is on point. The best approach would be to add
Objective-C directly to the CDT core as a third officially supported language.
CDT is open source, anyone can provide patches, participate in the community,
and work towards becoming a committer. I don't see why doing it this way makes
CDT closed or hostile. And having ParserLanguage be an enum forces anyone truly
serious about supporting a new language to work closely with the CDT community
to do it.
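
The effect of an enum-typed language selector can be sketched in Java as follows. CoreLanguage and KeywordTable are hypothetical names, and the one-word keyword checks are deliberately simplistic. The sketch is not copied from ParserLanguage and only assumes that core code switches on an enum.

```java
// Hypothetical closed language set; illustrates an enum-typed selector in general,
// not the actual contents of CDT's ParserLanguage.
enum CoreLanguage { C, CPP }

final class KeywordTable {
    // Core code that switches on the enum: every new language means editing this method.
    static boolean isKeyword(CoreLanguage language, String word) {
        switch (language) {
            case C:
                return word.equals("restrict"); // toy check, one C-only keyword
            case CPP:
                return word.equals("class");    // toy check, one C++-only keyword
            default:
                throw new IllegalStateException("unhandled language: " + language);
        }
    }
}

class ClosedEnumDemo {
    public static void main(String[] args) {
        // A plug-in cannot add a constant at runtime; the set is fixed when the core is compiled.
        System.out.println(CoreLanguage.values().length);
        System.out.println(KeywordTable.isKeyword(CoreLanguage.C, "restrict"));
    }
}
```

Because the set of constants is fixed at compile time, a third language needs a change to the enum and to every switch over it. Those changes can only land as patches to the core, which is the forcing function described above.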