Re: [cdt-dev] LR parser and token generation

I think it's more correct to say that at the moment we're not working on major feature enhancements on them. We have been continually working on the correctness and performance of these parsers, and we have a dedicated resource (John) devoted to them. There were some pretty drastic performance improvements that occurred late last year.

===========================
Chris Recoskie
Team Lead, IBM CDT and RDT
IBM Toronto

From: Mike Kucera/Toronto/IBM@IBMCA
To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
Cc: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>, cdt-dev-bounces@xxxxxxxxxxx
Date: 03/14/2011 10:22 AM
Subject: Re: [cdt-dev] LR parser and token generation
Sent by: cdt-dev-bounces@xxxxxxxxxxx




Nobody is working on the LR parser at the moment. It's been idle for some time now. Any contributions are welcome.

Mike Kucera
Rational Multicore Tooling
IBM Toronto Lab
mkucera@xxxxxxxxxx


From: Mike Wrighton <mike.wrighton@xxxxxxxxxxxxxx>
To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
Date: 03/11/2011 08:13 AM
Subject: Re: [cdt-dev] LR parser and token generation
Sent by: cdt-dev-bounces@xxxxxxxxxxx




Hi,


I'm facing the same issue at the moment (I need to add new token types). Were there any extensions added to allow custom lexers since the last post? If not, this might be something we can look at doing and patch back.


Cheers,
Mike


On 4 March 2010 22:54, Mike Kucera <mkucera@xxxxxxxxxx> wrote:
      Being able to specify new tokens other than keywords is a feature that the LR parser should have. Unfortunately I never got around to implementing it because a) I didn't have time and b) I didn't really need it. But now that the LR parser seems to be catching some momentum it should be added.

      It looks like the Lexer class returns single unrecognized characters as tokens of type tOTHER_CHARACTER, which are then filtered out by the preprocessor. I think this actually makes sense because it makes it easier for the parser to recover from syntax errors. I guess you could allow these tokens to make it to the parser, but then you would only be getting one character at a time and you would have to be diligent to filter out the ones you don't want.
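The trade-off described above can be sketched in a few lines. This is a toy model under stated assumptions: `OtherCharacterFilterSketch`, its `Kind` enum, its `Token` class, and both method names are hypothetical stand-ins, not the CDT Lexer or CPreprocessor. The first method mirrors the behaviour described (unrecognized characters are dropped before the parser sees them); the second shows what "being diligent" might look like if single characters were allowed through.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins only; these are not the CDT Lexer or CPreprocessor classes.
final class OtherCharacterFilterSketch {
    enum Kind { IDENTIFIER, OTHER_CHARACTER }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Behaviour described above: unrecognized characters never reach the parser. */
    static List<Token> dropOtherCharacters(List<Token> tokens) {
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (t.kind != Kind.OTHER_CHARACTER) {
                kept.add(t);
            }
        }
        return kept;
    }

    /** Alternative: forward only the single characters an extension asked for. */
    static List<Token> keepAllowedCharacters(List<Token> tokens, String allowed) {
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (t.kind != Kind.OTHER_CHARACTER || allowed.contains(t.text)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

Even in the second variant each unrecognized character is still its own token, so a multi-character operator would have to be reassembled by the parser or the token map.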

      Ideally you should be able to specify new token patterns via the scanner extension configuration. Alternatively, a language could provide its own lexer, with the LR parser supplying a reusable lexer grammar that you could then extend. (An older version of the LR parser actually had a lexer grammar, but I abandoned it when Markus wrote the new preprocessor; it could probably still be found in CVS.) Unfortunately I don't have time to work on this enhancement at the moment. Outside contributions would definitely be appreciated.




      Mike Kucera
      Software Developer
      Eclipse CDT/PTP
      IBM Toronto

      mkucera@xxxxxxxxxx

      From: "Mario Pierro" <Mario.Pierro@xxxxxxx>
      To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
      Date: 03/03/2010 05:15 PM
      Subject: RE: [cdt-dev] LR parser and token generation





      Hello Mike,

      Thanks for the explanation. The @ sign now arrives as an identifier to the token mapper class and is translated to the proper LPG lexer token. The ScannerExtensionConfiguration used by the LR parser must be copied over to the new project just to have supportAtSignInIdentifiers() return true – the original class in the LR parser cannot be extended, and I have seen that it adds some macros.

      But still, this is a workaround for this specific case. I cannot say if the ‘@’ char example is the only one, or if the Lexer could also prevent other character sequences from being sent to the parser.

      Are there any plans to unify lexer and parser in the future? I feel that being forced to use a different lexer partly defeats the purpose of the LR parser project, which I really like. Consider that once the token issue was fixed it took me 15 minutes to add the required rules to the C99 grammar and get my custom extension working properly. I still have to figure out how to patch the PDOM C99 parser to do the same thing, and even if I did it would be more difficult to keep the parser updated, and to ensure adherence to a specific set of rules.

      An idea could be to have the Lexer play nice towards different parsers, by ensuring that every character in the input is passed to the parser as a token – possibly using a generic “unrecognized element” token. The token mapper class in the LR plugin would then perform all the further recognition, without the need of configuring the preprocessor in a specific way.

      Would this be a viable approach?

      /Mario

      From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Mike Kucera
      Sent: 3 March 2010 17:17
      To: CDT General developers list.
      Cc: CDT General developers list.; cdt-dev-bounces@xxxxxxxxxxx
      Subject: Re: [cdt-dev] LR parser and token generation

      Your understanding of the situation is exactly correct. I think normally with LPG you would provide a grammar for both the lexer and the parser parts. But in our situation we have a preprocessor sitting between the lexer and the parser which complicates things terribly. So instead the LR parser reuses the lexer/preprocessor from the CDT core. This is also necessary because the CPreprocessor class has a lot of critical functionality, but it does make adding new tokens other than keywords difficult. In the worst case you might have to provide a patch to the core to add support for the new token.

      Since the LR parser is not using a lexer generated by LPG there needs to be a token map that maps the tokens from the core to the tokens that LPG requires.

      If all you need to support is the @ sign then you may be in luck. The CDT lexer has an option to support @ in identifiers. If this option is turned on, the @ sign alone should be returned as an identifier token, which you can then intercept and turn into the LPG token type that you want.
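A minimal sketch of that workaround, assuming a toy lexer rather than the real one: `AtSignSketch`, its `lex` and `mapKind` methods, and the `AT_SIGN` kind are all hypothetical names chosen for illustration, and the boolean flag only imitates the effect of the option mentioned above. With the flag off, `@` falls out as an "other character"; with it on, a lone `@` is an identifier that the token-map step can re-label.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names are the CDT Lexer or LPG APIs.
final class AtSignSketch {
    enum Kind { IDENTIFIER, OTHER_CHARACTER, AT_SIGN }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Splits input into identifier tokens and single "other" characters. */
    static List<Token> lex(String input, boolean atSignInIdentifiers) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (isIdentifierPart(c, atSignInIdentifiers)) {
                int start = i;
                while (i < input.length()
                        && isIdentifierPart(input.charAt(i), atSignInIdentifiers)) {
                    i++;
                }
                tokens.add(new Token(Kind.IDENTIFIER, input.substring(start, i)));
            } else {
                tokens.add(new Token(Kind.OTHER_CHARACTER, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    private static boolean isIdentifierPart(char c, boolean atSignInIdentifiers) {
        return Character.isLetterOrDigit(c) || c == '_' || (atSignInIdentifiers && c == '@');
    }

    /** Token-map step: a lone "@" identifier becomes the parser's own token kind. */
    static Kind mapKind(Token token) {
        if (token.kind == Kind.IDENTIFIER && token.text.equals("@")) {
            return Kind.AT_SIGN;
        }
        return token.kind;
    }
}
```

In this toy model `@foo` lexes as a single identifier once the flag is on, so the interception only catches an `@` that stands alone; a grammar that writes the sign directly against a name would need the token map to split the text as well.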


      Mike Kucera
      Software Developer
      Eclipse CDT/PTP
      IBM Toronto

      mkucera@xxxxxxxxxx

      From: "Mario Pierro" <Mario.Pierro@xxxxxxx>
      To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
      Date: 03/03/2010 09:32 AM
      Subject: [cdt-dev] LR parser and token generation





      Hello,

      Another question on LR parser customization...

      I am trying to add some custom extensions to the C99 language as
      specified in the LR parser plugin. The extensions require both
      additional keywords and additional grammar rules.

      My ILanguage implementation extends the C99Language class, and provides
      the custom C99Parser via its getParser() method. Additional keywords are
      added via a custom ICLanguageKeywords implementation (as described in
      http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg15788.html) which
      extends CLanguageKeywords and adds the new ones.

      From what I understood, my custom parser will process tokens which have
      been produced by the CPreprocessor / Lexer classes - as the PDOM parser
      does - and use a customized version of the DOMToC99TokenMap class to map
      the preprocessor tokens (IToken interface) to the tokens in the
      generated C99Parsersym class.

      So if the parser defines new tokens, the CPreprocessor needs to know
      about them as well. If I got it right, this can be done by having the
      language class supply an implementation of
      IScannerExtensionConfiguration, which associates the extended keywords
      to token ids in the IExtensionToken interface in its addKeyword(char[],
      int) method.

      Alternatively, the lexer can ignore the extensions altogether, and the
      customized DOMToC99TokenMap class can determine if e.g. an "identifier"
      token supplied by the lexer is actually an "extended keyword" token in
      the parser.
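The second option (leave the lexer alone and reclassify in the token map) might look roughly like the sketch below. Everything in it is an assumption made for illustration: `CoreKind`, `ParserKind`, and `ExtendedKeywordTokenMap` are hypothetical stand-ins for the preprocessor token kinds, the generated parser symbols, and a customized DOMToC99TokenMap, and the keyword `__interrupt` in the usage note is made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins; these are NOT the CDT or LPG types.
enum CoreKind { IDENTIFIER, INTEGER, SEMICOLON }

enum ParserKind { IDENTIFIER, INTEGER, SEMICOLON, EXTENDED_KEYWORD }

final class ExtendedKeywordTokenMap {
    private final Set<String> extendedKeywords;

    ExtendedKeywordTokenMap(String... keywords) {
        this.extendedKeywords = new HashSet<>(Arrays.asList(keywords));
    }

    /** Maps a lexer token (kind plus source text) to the kind the parser expects. */
    ParserKind mapKind(CoreKind kind, String text) {
        switch (kind) {
            case IDENTIFIER:
                // The lexer knows nothing about the extension, so the keyword
                // arrives as a plain identifier and is reclassified here.
                return extendedKeywords.contains(text)
                        ? ParserKind.EXTENDED_KEYWORD
                        : ParserKind.IDENTIFIER;
            case INTEGER:
                return ParserKind.INTEGER;
            case SEMICOLON:
                return ParserKind.SEMICOLON;
            default:
                throw new IllegalArgumentException("unmapped kind: " + kind);
        }
    }
}
```

For example, `new ExtendedKeywordTokenMap("__interrupt").mapKind(CoreKind.IDENTIFIER, "__interrupt")` yields `ParserKind.EXTENDED_KEYWORD`, while any other identifier text stays `ParserKind.IDENTIFIER`. This only helps for extensions the lexer already tokenizes as identifiers.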

      A customized LR parser will thus be dependent on the tokens generated by
      the preprocessor, no matter what its grammar specifies. Circumventing
      this might be difficult: some characters might never be recognized, as
      the Lexer might not be generating any token at all (e.g. the '@' char).

      I would like to use the same grammar for the lexer and the parser, so
      that the token set is the same.
      Is this possible? Am I getting something terribly wrong here?

      Thank you for your help!

      /Mario


      _______________________________________________
      cdt-dev mailing list

      cdt-dev@xxxxxxxxxxx
      https://dev.eclipse.org/mailman/listinfo/cdt-dev