RE: [cdt-dev] decoupled preprocessor

Hi Markus,

I think it is possible to do this, but it might take a bit of work because
there are a few differences between the way the two preprocessors work.

First of all, you would need a separate lexer to generate the tokens. You
could use the C99 lexer, but it doesn't support any GCC extensions yet
(although this is one of the next things on my plate). Or you could generate
a new lexer using LPG, or just handwrite one; it's not that hard.
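A minimal sketch of what a handwritten lexer could look like, in Java to match CDT. The names Tok and SimpleLexer are hypothetical; this is not the C99 lexer or CDT's token API. It handles only identifiers, numbers, single-character punctuators, whitespace and comments. String literals, multi-character operators and GCC extensions are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type (not CDT's IToken).
final class Tok {
    final String kind;   // "ident", "number", or "punct"
    final String image;  // exact source text of the token
    final int offset;    // character offset of the token in the source

    Tok(String kind, String image, int offset) {
        this.kind = kind;
        this.image = image;
        this.offset = offset;
    }
}

final class SimpleLexer {
    // Splits C-like source into tokens, skipping whitespace and both comment
    // styles so that later phases never see comments.
    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith("//", i)) {
                while (i < n && src.charAt(i) != '\n') i++;    // line comment
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;                    // block comment
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < n && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                out.add(new Tok("ident", src.substring(start, i), start));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < n && Character.isLetterOrDigit(src.charAt(i))) i++;
                out.add(new Tok("number", src.substring(start, i), start));
            } else {
                // Single-character punctuators only in this sketch.
                out.add(new Tok("punct", String.valueOf(c), i));
                i++;
            }
        }
        return out;
    }
}
```

Because comments are dropped at this stage, a preprocessor that consumes this output never has to handle them.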

Secondly, the preprocessor is written as its own phase, so it runs to
completion and generates a big list of tokens before the parser starts. The
DOM scanner, on the other hand, generates tokens on the fly via fetchToken().
However, it might actually make more sense to run the DOM parsers on a fully
processed list of tokens. Right now the DOM parser builds a linked list of
tokens as fetchToken() is called in order to support LL(*) and backtracking,
and this has caused bugs in the past (e.g. infinite loops in the parser).
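To illustrate why a fully processed token list can simplify backtracking, here is a sketch that assumes the preprocessor has already produced every token up front. TokenCursor and BacktrackDemo are hypothetical names. With the whole list in memory, backtracking reduces to saving and restoring an index rather than maintaining a linked list built during fetchToken().

```java
import java.util.List;

// Hypothetical cursor over a token list that the preprocessor has already
// produced in full. Backtracking is just saving and restoring an index.
final class TokenCursor {
    private final List<String> tokens;
    private int pos = 0;

    TokenCursor(List<String> tokens) { this.tokens = tokens; }

    // Returns the k-th token of lookahead (1-based), or "<EOF>" past the end.
    String la(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    void consume() { if (pos < tokens.size()) pos++; }

    int mark() { return pos; }

    void rewind(int marker) { pos = marker; }
}

final class BacktrackDemo {
    // Tries to parse "ident ( ident )" as a call; on failure it rewinds so a
    // different alternative can be attempted from the same position.
    static boolean tryCall(TokenCursor c) {
        int m = c.mark();
        if (isIdent(c.la(1)) && c.la(2).equals("(")) {
            c.consume();
            c.consume();
            if (isIdent(c.la(1)) && c.la(2).equals(")")) {
                c.consume();
                c.consume();
                return true;
            }
        }
        c.rewind(m);
        return false;
    }

    static boolean isIdent(String s) {
        return !s.isEmpty() && Character.isJavaIdentifierStart(s.charAt(0));
    }
}
```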

So, yes I think it is possible, but if we do this we should take some time
to fully plan out our approach.

I also want to mention that I will be attempting an LPG-based C++ parser
soon. C++ is a whole new beast compared to C99, and I have to admit I'm not
sure how well LR will handle it; maybe ANTLR is the better choice for parsing
C++, but I'm going to attempt it anyway. Having an extensible C++ parser
would be nice: for one thing, when the next C++ standard (C++0x) is
completed, we could just add it as an extension to the existing parser.


Mike Kucera
Software Developer
IBM CDT Team, Toronto
mkucera@xxxxxxxxxx

From: "Schorn, Markus" <Markus.Schorn@windriver.com>
Sent by: cdt-dev-bounces@eclipse.org
Date: 06/20/2007 02:57 AM
To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
cc:
Subject: RE: [cdt-dev] decoupled preprocessor
Please respond to: "CDT General developers list." <cdt-dev@eclipse.org>

Mike,
is there a chance that we can use your decoupled preprocessor for the
current C and C++ parsers? The DOM scanner really is a nightmare to
maintain.

Markus.

> -----Original Message-----
> From: cdt-dev-bounces@xxxxxxxxxxx
> [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Mike Kucera
> Sent: Tuesday, June 19, 2007 23:40
> To: CDT General developers list.
> Subject: RE: [cdt-dev] decoupled preprocessor
>
> It looks like you are planning to do preprocessing on the raw character
> stream and then feed the result to your ANTLR lexer.
>
> The C99 preprocessor works differently: it processes a token stream, not a
> character stream. It creates a CodeReader for each include, passes it to
> the lexer, and expects a token stream as the result. It then adds the token
> stream to its own input and continues processing.
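A rough sketch of the include handling described above, under loud assumptions: a toy lexer that splits on whitespace and treats `#include` as a single token, and a Map of file contents standing in for CodeReader. IncludeExpander is a hypothetical name, not the C99 preprocessor's class. The sketch has no include guards, so a recursive include would loop forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical token-stream preprocessor: #include lexes the included file
// and pushes its tokens onto the front of the remaining input.
final class IncludeExpander {
    private final Map<String, String> files;   // stand-in for CodeReader

    IncludeExpander(Map<String, String> files) { this.files = files; }

    // Toy lexer: whitespace-separated words are tokens.
    static List<String> lex(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    List<String> preprocess(String mainFile) {
        Deque<String> input = new ArrayDeque<>(lex(files.get(mainFile)));
        List<String> output = new ArrayList<>();
        while (!input.isEmpty()) {
            String t = input.pollFirst();
            if (t.equals("#include") && !input.isEmpty()) {
                String name = input.pollFirst();
                List<String> included = lex(files.getOrDefault(name, ""));
                // Push in reverse so the included tokens are read in order.
                for (int i = included.size() - 1; i >= 0; i--) {
                    input.addFirst(included.get(i));
                }
            } else {
                output.add(t);
            }
        }
        return output;
    }
}
```

Nested includes fall out naturally: the included tokens are rescanned like any other input, so an `#include` inside them is expanded in turn.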
>
> I don't know which approach makes more sense with ANTLR. With LPG I was
> able to separate the lexer and parser and stick the preprocessor in between.
>
> I believe that doing lexing before preprocessing makes the preprocessing
> phase much easier to write and maintain. For example, the C99 preprocessor
> doesn't need to deal with comments, which, judging from bug reports, have
> caused many issues in the DOM scanner. The code is also cleaner because it
> processes a token stream instead of a raw character stream (for example,
> compare Macro.invoke() to BaseScanner.expandFunctionStyleMacro()).
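As an illustration of macro expansion on tokens, here is a sketch of object-like macro replacement only (no function-like macros, `#` or `##`). ObjectMacroExpander is a hypothetical name and this is not Macro.invoke(). The set of active macros implements the C rule that a macro name is not re-expanded inside its own replacement, and no comment handling is needed because the lexer has already removed comments.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical object-like macro expander operating on tokens.
final class ObjectMacroExpander {
    private final Map<String, List<String>> macros;   // name -> replacement tokens

    ObjectMacroExpander(Map<String, List<String>> macros) { this.macros = macros; }

    List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            expandToken(t, new HashSet<>(), out);
        }
        return out;
    }

    // 'active' holds the macros currently being expanded, which stops
    // self-referential macros such as "#define X X" from recursing forever.
    private void expandToken(String t, Set<String> active, List<String> out) {
        List<String> body = macros.get(t);
        if (body == null || active.contains(t)) {
            out.add(t);
            return;
        }
        active.add(t);
        for (String b : body) {
            expandToken(b, active, out);   // rescan the replacement
        }
        active.remove(t);
    }
}
```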
>
> Also, if you return raw characters from the preprocessor, how will you
> calculate the offsets on the AST nodes? The offsets are normally contained
> in the tokens.
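A sketch of how token offsets can yield AST node locations: a node's offset is its first token's offset, and its length runs to the end of its last token. OffsetToken and NodeLocation are hypothetical names, not CDT classes. Tokens produced by macro expansion, which have no direct location in the file, are ignored here.

```java
import java.util.List;

// Hypothetical token that remembers where it came from in the file.
final class OffsetToken {
    final String image;
    final int offset;   // offset of the token's first character

    OffsetToken(String image, int offset) {
        this.image = image;
        this.offset = offset;
    }

    int endOffset() { return offset + image.length(); }
}

final class NodeLocation {
    final int offset;
    final int length;

    NodeLocation(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }

    // A node covering tokens[first..last] starts at the first token and ends
    // where the last token ends, including any whitespace in between.
    static NodeLocation of(List<OffsetToken> tokens, int first, int last) {
        int start = tokens.get(first).offset;
        int end = tokens.get(last).endOffset();
        return new NodeLocation(start, end - start);
    }
}
```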
>
> > But if you already have everything we've done
> > there, then that might be the better approach.
>
> Well, I hope so :) It's pretty new and I'm still working out the bugs. It
> does have a few features the DOM scanner doesn't, like support for
> trigraphs.
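For reference, trigraph replacement (translation phase 1 in the C standard) can be sketched as a single left-to-right pass over the characters. The Trigraphs class is hypothetical and is not meant to show how the C99 preprocessor implements it; in a lex-first design it would naturally sit in front of, or inside, the lexer.

```java
import java.util.Map;

// Replaces each of the nine "??x" trigraph sequences with its
// single-character equivalent in one left-to-right pass.
final class Trigraphs {
    private static final Map<Character, Character> MAP = Map.of(
            '=', '#', '(', '[', '/', '\\', ')', ']', '\'', '^',
            '<', '{', '!', '|', '>', '}', '-', '~');

    static String replace(String src) {
        StringBuilder sb = new StringBuilder(src.length());
        int i = 0;
        while (i < src.length()) {
            if (i + 2 < src.length() && src.charAt(i) == '?' && src.charAt(i + 1) == '?'
                    && MAP.containsKey(src.charAt(i + 2))) {
                sb.append(MAP.get(src.charAt(i + 2)));
                i += 3;
            } else {
                sb.append(src.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }
}
```

Note that `???=` becomes `?#`: the first `?` is not part of a trigraph, so it is copied and the scan continues.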
>
> I hope you do decide to give it a try. I'll decouple it soon.
>
>
> Mike Kucera
> Software Developer
> IBM CDT Team, Toronto
> mkucera@xxxxxxxxxx
>
> From: Doug Schaefer <DSchaefer@xxxxxxm>
> Sent by: cdt-dev-bounces@eclipse.org
> Date: 06/19/2007 03:53 PM
> To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
> cc:
> Subject: RE: [cdt-dev] decoupled preprocessor
> Please respond to: "CDT General developers list." <cdt-dev@eclipse.org>
>
> Yes, it is definitely something I'll need. I'll need to take a look at what
> you've done. ANTLR uses its own character stream interface to feed
> characters to the lexer. It provides implementations that can pull
> characters out of Readers and InputStreams. I will likely want to create a
> new one that doesn't try to load everything into a char[] at startup like
> the built-in ones do. We can then hook that up to the preprocessor.
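A minimal sketch of a character source that reads from a Reader on demand rather than loading everything into a char[] at startup. LazyCharSource is a hypothetical name; it does not implement ANTLR's character stream interface, and the mark/rewind support such an interface would require is omitted.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Reads characters lazily, keeping only a small circular lookahead window.
final class LazyCharSource {
    static final int EOF = -1;

    private final Reader reader;
    private final char[] window;   // circular lookahead buffer
    private int start = 0;         // index of the current character in window
    private int count = 0;         // number of buffered characters
    private boolean eof = false;

    LazyCharSource(Reader reader, int maxLookahead) {
        this.reader = reader;
        this.window = new char[maxLookahead];
    }

    // Returns the k-th character of lookahead (1-based), or EOF.
    int la(int k) {
        if (k < 1 || k > window.length) throw new IllegalArgumentException("k=" + k);
        while (count < k && !eof) {
            int c = read();
            if (c < 0) {
                eof = true;
                break;
            }
            window[(start + count) % window.length] = (char) c;
            count++;
        }
        return k <= count ? (int) window[(start + k - 1) % window.length] : EOF;
    }

    // Advances past the current character, if any.
    void consume() {
        if (la(1) == EOF) return;
        start = (start + 1) % window.length;
        count--;
    }

    private int read() {
        try {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Memory use is bounded by the lookahead size rather than the file size, which is the property the built-in streams lack.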
>
> I'm not sure how you built yours, but the easiest path I can see is to take
> our current scanner, replace nextToken with getChar, and strip out anything
> that creates a token. But if you already have everything we've done there,
> then that might be the better approach.
>
> Anyway, another shiny object flew by called CDT user docs, so I'll get back
> to ANTLR in a few days :).
>
> Cheers,
> Doug Schaefer, QNX Software Systems
> Eclipse CDT Project Lead, http://cdtdoug.blogspot.com
>
>
> > -----Original Message-----
> > From: cdt-dev-bounces@xxxxxxxxxxx
> > [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Mike Kucera
> > Sent: Tuesday, June 19, 2007 3:43 PM
> > To: CDT General developers list.
> > Subject: [cdt-dev] decoupled preprocessor
> >
> >
> > Hi Doug,
> >
> > I take it from your latest blog post that you are going to need a
> > preprocessor for your ANTLR C++ experiment. I was planning on decoupling
> > the preprocessor that I wrote for the C99 parser so that it can be used
> > with any parser. If you are interested in picking this up, when would you
> > need it?
> >
> > Mike Kucera
> > Software Developer
> > IBM CDT Team, Toronto
> > mkucera@xxxxxxxxxx
> >
> > _______________________________________________
> > cdt-dev mailing list
> > cdt-dev@xxxxxxxxxxx
> > https://dev.eclipse.org/mailman/listinfo/cdt-dev