Re: [xtext-dev] Dealing with variable length tokens (Hollerith)


From: Lieven Lemiengre <lieven.lemiengre@xxxxxxxxxx>
Date: Thu, 30 Jun 2016 17:58:12 +0200

Hi Kasper,

I have some experience with this: I customized an Xtext lexer for our commercial product.

Overriding the mTokens method is the right way to do this, but I do it a little differently:

    override mTokens() throws RecognitionException {
        if (!SpecialTokensHandler.handle(input, state)) {
            super.mTokens()
        }
    }

Detecting and emitting the 'HOLLERITH' token can be combined into a single step, as the handle call above does.

We do this because you also have to override the lexer used for content assist. Look for the generated subclass of org.eclipse.xtext.ui.editor.contentassist.antlr.internal.Lexer; in that lexer you have to override the same method, and a shared handler lets both lexers reuse the same logic.

To make sure that the token types of the runtime lexer and the UI lexer are the same, you also have to change your mwe2 file: use parser.antlr.ex.rt.AntlrGeneratorFragment and parser.antlr.ex.ca.ContentAssistParserGeneratorFragment instead of the usual generator fragments.
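A minimal Java sketch of such a shared handler follows. It is not our actual code: HollerithHandler, LookaheadInput and StringInput are hypothetical names, and LookaheadInput is only a stand-in for the two methods of ANTLR 3's CharStream (LA and consume) that the logic needs, so the example can be tried without the ANTLR runtime. A real handler would take the lexer's input and state, and would also set state.type and state.channel as in the quoted code below.

```java
/** Stand-in for the part of ANTLR 3's CharStream used here (assumption: LA and consume suffice). */
interface LookaheadInput {
    int LA(int i);   // character i positions ahead (1-based), or -1 at end of input
    void consume();  // advance by one character
}

/** String-backed input, only for trying the handler outside a lexer. */
final class StringInput implements LookaheadInput {
    private final String text;
    private int pos = 0;

    StringInput(String text) { this.text = text; }

    public int LA(int i) {
        int p = pos + i - 1;
        return p < text.length() ? text.charAt(p) : -1;
    }

    public void consume() { pos++; }

    int position() { return pos; }
}

/** Hypothetical shared handler that both lexers could delegate to. */
final class HollerithHandler {
    private HollerithHandler() {}

    /**
     * Detects and consumes a Hollerith string (<count>H<chars>) in one pass.
     * Returns true if a token was consumed; on false the input is untouched.
     */
    static boolean handle(LookaheadInput input) {
        int index = 1;
        int length = 0;
        int cur = input.LA(index);
        // Read the decimal character count using lookahead only.
        while (cur >= '0' && cur <= '9') {
            if (length > (Integer.MAX_VALUE - 9) / 10) {
                return false; // count too large to be a real token
            }
            length = length * 10 + (cur - '0');
            cur = input.LA(++index);
        }
        // Need at least one digit, immediately followed by 'H'.
        if (index == 1 || cur != 'H') {
            return false;
        }
        // The last payload character sits at lookahead index + length.
        if (input.LA(index + length) == -1) {
            return false; // payload is cut off by end of input
        }
        // Consume digits, the 'H', and exactly 'length' payload characters.
        int total = index + length;
        for (int k = 0; k < total; k++) {
            input.consume();
        }
        return true;
    }
}
```

With this shape, each lexer's mTokens() only needs the single if shown above, and both lexers stay in sync because the Hollerith logic lives in one place.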

kind regards,

Lieven

2016-06-30 17:13 GMT+02:00 kaspergam <kaspergam@xxxxxxxxxxxxx>:

I recently asked about parsing IGES files with Xtext; the specification includes Hollerith strings. Such a string consists of an integer giving the number of characters, followed by an 'H' and then the string itself. To parse these tokens, you recommended that I use a custom lexer. I was able to get decent parsing to work using this approach, but I am curious whether the way I am lexing is suboptimal or not recommended.
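To make the format concrete, here is a small standalone Java sketch of decoding one Hollerith string from plain text. HollerithDecoder and its methods are hypothetical names used only for illustration; nothing here comes from Xtext, ANTLR, or the IGES tooling. It shows that the token length depends on the numeric prefix, so characters that normally act as delimiters can appear inside the payload.

```java
/** Hypothetical helper: decodes Hollerith strings of the form <count>H<chars>. */
final class HollerithDecoder {
    private HollerithDecoder() {}

    /**
     * Returns the offset just past the Hollerith string starting at start,
     * or -1 if the text there is not a complete Hollerith string.
     */
    static int endOffset(String text, int start) {
        int i = start;
        long count = 0;
        // Read the decimal character count.
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            count = count * 10 + (text.charAt(i) - '0');
            if (count > Integer.MAX_VALUE) {
                return -1; // count too large to be meaningful
            }
            i++;
        }
        // Need at least one digit, then the 'H' marker.
        if (i == start || i >= text.length() || text.charAt(i) != 'H') {
            return -1;
        }
        long end = (i + 1) + count;
        // The payload must fit in the remaining text.
        return end <= text.length() ? (int) end : -1;
    }

    /** Returns the characters after the 'H', or null if nothing decodes at start. */
    static String payload(String text, int start) {
        int end = endOffset(text, start);
        if (end < 0) {
            return null;
        }
        // Only digits precede the marker, so the first 'H' from start is the marker.
        int marker = text.indexOf('H', start);
        return text.substring(marker + 1, end);
    }
}
```

For example, payload("9Hmy String,1.0", 0) yields my String and stops before the comma, while payload("5Ha,b;c", 0) yields a,b;c with the comma and semicolon included.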

To handle a token like 9Hmy String, for example, I added a terminal rule in my grammar called HOLLERITH with this definition:

terminal HOLLERITH:
    INT 'H' . ;

and then created a new CustomIGESLexer that extends the generated InternalIGESLexer. I then overrode the mTokens() method to check for these Hollerith strings first, before letting the internal lexer handle any other token. I was wondering whether this is a good approach, because I do not want to write a completely separate lexer; I just want to provide custom lexing for the Hollerith strings. The code is something like this:

public void mTokens() throws RecognitionException {
    if (isHollerith()) {
        myRULE_HOLLERITH();
    } else {
        super.mTokens();
    }
}

private void myRULE_HOLLERITH() throws RecognitionException {
    try {
        int _type = RULE_HOLLERITH;
        int _channel = DEFAULT_TOKEN_CHANNEL;

        //... get the token, match the characters with match()

        state.type = _type;
        state.channel = _channel;
    } finally {
    }
}

I tried to mimic the style of the internal lexer when creating the custom rules. The isHollerith() method just checks for an int followed immediately by an 'H':

    private boolean isHollerith() {
        int index = 1;
        int cur = input.LA(index);
        // See if an int starts the string
        while (cur >= '0' && cur <= '9') {
            index++;
            cur = input.LA(index);
        }
        // Followed by an 'H'
        return index > 1 && cur == 'H';
    }

This might be a terrible way to customize the lexer rules, but it works for now.

Thank you,

Kasper Gammeltoft
Oak Ridge National Lab,
Computer Science & Mathematics Division
Computer Science Research Group

