[xtext-dev] Grammar Mixins and Lexer rules

Hi,

During the M4 meeting we had a discussion on grammar mixins.
I've written down our results and added some additional details.
Please read, ask questions, and provide feedback.

Sven


Grammar Mixins

We’ve identified two major reasons why and how one wants to reuse existing grammars.
Reuse of Existing Languages
Some parts of languages are very well understood and flexible enough to be reused in many languages. Examples of such aspects are expressions and namespaces. We want to ship these aspects as libraries. This has two major advantages:
Users don’t have to implement the same stuff over and over again.
Such commonly reused language parts may have already been learned by a lot of potential DSL Users.

Tailoring Existing Languages
One might also want to change a given language by adding new concepts and changing the syntax, while reusing as much existing code as possible.
Imagine a language for describing domain models, consisting of core concepts like Entity, ValueObject, DataType, Attributes and References.
If the existing DSL is more or less what you need but you want to add additional concepts (e.g. a property ‘abstract’ for Entities), it should be possible to do so by just stating how and where the new concept fits into the existing language (i.e. the meta model and the concrete syntax).
This is much like the UML2 profile concept, with the difference that it is also possible to change the concrete syntax.

Concepts
We’ve been thinking of three concepts enabling the reuse of existing grammars.

Let’s say we have two existing languages (xtext.Namespaces and xtext.Expressions) we want to mix into our DSL (MyDSL). One could write something like the following.
(The concrete semantics are explained below. This example is a bit complicated, but it is also intended to motivate the whole concept of grammar mix-in.)

// imports the meta model used in xtext.Expressions
// same with namespace meta model

foo::MyDSL with xtext::Namespaces as ns, xtext::Expressions as expr{
  
  override ns::PackageContents :
    Entity | Datatype;

  override TypeDeclaration :
    Entity | Datatype;

  Entity : "entity" name=ID …;
  Datatype : "type" name=ID …;

  Operation : returnType=[expr::TypeRef] name=ID "(" params+=Param … ")"
     expression=Block;  /* Block is a rule from xtext.Expressions */

 

  // we redefine the syntax for ListLiterals
  override ListLiteral returns expr::ListLiteral : "{"
}

The meta model imports are only needed if we explicitly refer to their types. If they are only referenced from mixed-in grammars, we don't need to import them.

A language definition starts with its name, followed by any number of mixed-in grammars.
The parser rules are defined in the following block, enclosed in curly braces.

The semantics of grammar mix-in using the keyword ‘with’ are as follows.
From right to left (in the example this is Expressions, Namespaces, MyDSL), the parser rules get included with their full names. The full name is computed by prepending the alias specified using the keyword ‘as’ to the local name. It is not necessary to apply an alias to a mixed-in language.

Example:

language A {
  foo::bar::Stuff : "hi" name=ID otherRule=another::Stuff;
  another::Stuff : isFriendly?="moin";
}

language B with A {
  override foo::bar::Stuff : ….;
}

This would also be possible:

language B with A as a {
  foo::bar::Stuff : ….;
  override a::foo::bar::Stuff : ….;
}

Cross references to local rules also get the respective alias applied, so if A gets an alias ‘a’, the call to “another::Stuff” in “foo::bar::Stuff” will be renamed to “a::another::Stuff”. The result is a set of parser rules with unique local names.
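
To make the renaming concrete, here is a minimal Python sketch of the alias step. It is only an illustration, not a proposed implementation: the dictionary representation of a grammar and the function name mix_in are assumptions made up for this example, and rule bodies are treated as plain text.

```python
import re

# Hypothetical in-memory form of language A: rule name -> rule body (plain text).
GRAMMAR_A = {
    "foo::bar::Stuff": '"hi" name=ID otherRule=another::Stuff',
    "another::Stuff": 'isFriendly?="moin"',
}


def mix_in(rules, alias=None):
    """Return a copy of `rules` with `alias::` prepended to every rule name
    and to every call of one of those rules inside a rule body."""
    if alias is None:
        # No alias given: the rules keep their names unchanged.
        return dict(rules)
    # Longest names first, so a shorter name cannot win over a longer one.
    names = sorted(rules, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w:])(" + "|".join(re.escape(n) for n in names) + r")(?![\w:])"
    )
    return {
        f"{alias}::{name}": pattern.sub(lambda m: f"{alias}::{m.group(1)}", body)
        for name, body in rules.items()
    }


mixed = mix_in(GRAMMAR_A, alias="a")
for name, body in mixed.items():
    print(name, ":", body)
```

Calls to names that are not parser rules of A (such as ID) are left untouched, since only the rule names of the mixed-in language are rewritten.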

The return type of an overridden rule must be either the same or a more specific type (i.e. covariance).
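
The covariance constraint could be checked roughly as in the following Python sketch. The supertype table and the type name expr::Literal are hypothetical and only serve the example; in practice the hierarchy would come from the meta model.

```python
# Hypothetical supertype table: type name -> direct supertype (None for roots).
SUPERTYPE = {
    "expr::Literal": None,
    "expr::ListLiteral": "expr::Literal",
}


def is_covariant(overriding_type, overridden_type):
    """True if overriding_type equals overridden_type or is a subtype of it."""
    current = overriding_type
    while current is not None:
        if current == overridden_type:
            return True
        # Walk one step up the (assumed single-inheritance) hierarchy.
        current = SUPERTYPE.get(current)
    return False


# An override may narrow the return type, but must not widen it.
print(is_covariant("expr::ListLiteral", "expr::Literal"))
print(is_covariant("expr::Literal", "expr::ListLiteral"))
```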

TODO : Define how this affects the inference of ecore models.

Super (later)
With just the override feature we have to copy and paste whole rules, and are not able to reuse the existing rule by just prepending or appending some information to it.

With a super call we would be able to do something like this:

language A {
  Foo : "foo" name=ID;
}

language B with A {
  override Foo : 
    special="special" super;
}

It would semantically act like an include (and would maybe also be implemented that way).
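
If super is implemented as an include, the expansion might look like this Python sketch. The function name expand_super and the text-based treatment of rule bodies are assumptions for illustration only. A real implementation would work on the parsed grammar, so that the word super inside a keyword string is not touched.

```python
import re


def expand_super(overriding_body, overridden_body):
    """Replace each standalone `super` token with the overridden rule body,
    wrapped in parentheses so that alternatives in it stay grouped."""
    return re.sub(r"\bsuper\b", lambda m: f"({overridden_body})", overriding_body)


# Foo from language A, and the override of Foo from language B.
expanded = expand_super('special="special" super', '"foo" name=ID')
print(expanded)
```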

Fragments (later)
Fragments are not only useful in the context of grammar mixins, but in this context they would allow for more fine-grained overriding of smaller chunks (i.e. fragments) of a mixed-in language.

Example:

language A {
  Foo : Documented "foo" name=ID;
  Bar : Documented "bar" name=ID;

  fragment Documented : doc=DOCUMENTATION;
}

language B with A {
  override Documented : 
     doc=SPECIALDOC (annotations+=Annotations)*;
  Annotations : ….
}

We’ve decided that, although fragments seem to be very valuable, we want to add them later.

Abstract Rules (later)
It might be good to allow abstract rules, i.e. rules without implementation:

Example: 

language A {
  Foo : Documented "foo" name=ID;
  Bar : Documented "bar" name=ID;

  abstract fragment Documented myType::Documented;
}

language B with A {
  override Documented : 
     doc=SPECIALDOC (annotations+=Annotations)*;
  Annotations : ….
}

Lexer rules
We can’t mix in the lexer rules like we do with parser rules, because they are matched in a specific sequence. If, for instance, expr::ID has the same implementation as ns::ID, it depends on the order of inclusion which one gets matched by the lexer.
We think it’s best to handle the differences between lexer and parser rules more explicitly, so the user won’t face surprising behavior. Therefore I propose that lexer and parser rules each get their own section in the grammar language.
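
The ordering problem can be shown with a small Python sketch of a first-match lexer. The identifier pattern and the function name first_match are assumptions for this example; the point is only that two rules with the same implementation are told apart by nothing but their position.

```python
import re


def first_match(rules, text):
    """Return the name of the first rule, in list order, that matches all of `text`."""
    for name, pattern in rules:
        if re.fullmatch(pattern, text):
            return name
    return None


ID_PATTERN = r"[A-Za-z_]\w*"  # assumed implementation shared by both ID rules

expr_first = [("expr::ID", ID_PATTERN), ("ns::ID", ID_PATTERN)]
ns_first = list(reversed(expr_first))

# Same input, same rules, different include order, different token type.
print(first_match(expr_first, "foo"))
print(first_match(ns_first, "foo"))
```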

The lexer section starts with the ‘lexer’ keyword right after the optional parser rule block.

MyLang {
} lexer {
  include xtext::Builtin
}

Lexer rules are included at a specific point in the lexer section. If you omit the lexer section, the xtext::Builtin lexer is automatically included. So the following means exactly the same as the example above:

MyLang {}

Also, if you don’t have any parser rules, you don’t need to write the curly braces. This means that the language from above could be written like so:

MyLang

Although this would be syntactically ok, there might be some semantic constraints forcing you to either add some parser or lexer information. But considering the xtext::Builtin language, which doesn’t yet have any parser rules, it would be possible to write it like so (omitting the parser rule section):

xtext::Builtin lexer {
   ID : "…";
   …
   ANY : ".*";
}

Ranges
As opposed to how we want to mix in existing parser rules, for lexer rules we need more fine-grained control over how and where (i.e. in what order) each lexer rule gets included. Therefore we’d like to propose an include construct, which allows selecting single rules, subsets or all of the available lexer rules defined in another language definition.
Which rules are included is specified using a range construct.

Examples:

my::FooParser {
   // … parser rules
} lexer {

   include xtext::Builtin[..<ANY] // first until and excluding ANY

   // my DOUBLE rule
   DOUBLE returns ecore::EDouble : "…."; 

   include xtext::Builtin[ANY] // include the ANY rule only
}

or

} lexer {

   STUFF : "….";

   include xtext::Builtin // everything , shortcut for xtext::Builtin[..]
}


or

} lexer {

   // from start until excluding ID plus FOO plus excluding INT until end
   include xtext::Builtin[..<ID,FOO,INT>..] 
}

or

} lexer {
   include xtext::Builtin[
     ID,
     FOO,
     INT
   ] // includes only ID, FOO, INT in the given order
}
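
The range examples above could be resolved against an ordered rule list as in the following Python sketch. The contents and order of BUILTIN are made up for the example (only ID, FOO, INT and ANY are named above), and the function name resolve_include is hypothetical. It assumes that a leading '>' excludes the left endpoint, a trailing '<' excludes the right endpoint, and that a missing endpoint means the start or end of the list.

```python
import re

# Hypothetical content and order of the xtext::Builtin lexer rules.
BUILTIN = ["WS", "ID", "FOO", "INT", "STRING", "ANY"]

# Optional left name, one of '..', '>..', '>..<', '..<', optional right name.
RANGE = re.compile(r"^(\w+)?(>?\.\.<?)(\w+)?$")


def resolve_include(available, spec=".."):
    """Return the rule names selected by `spec`, in the order they get included."""
    selected = []
    for item in (part.strip() for part in spec.split(",")):
        match = RANGE.match(item)
        if match is None:
            # A direct rule reference such as 'FOO'.
            if item not in available:
                raise ValueError(f"unknown lexer rule: {item}")
            selected.append(item)
            continue
        left, op, right = match.groups()
        start = available.index(left) if left else 0
        end = available.index(right) if right else len(available) - 1
        # Assumption: exclusion markers only apply to a named endpoint.
        if left and op.startswith(">"):
            start += 1
        if right and op.endswith("<"):
            end -= 1
        selected.extend(available[start:end + 1])
    return selected


print(resolve_include(BUILTIN, "..<ANY"))
print(resolve_include(BUILTIN, "..<ID,FOO,INT>.."))
```

With this list, "..<ANY" selects everything before ANY, and "..<ID,FOO,INT>.." selects the rules before ID, then FOO, then the rules after INT.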

The syntax for the include is:

Include: 
  "include" QUALIFIED_NAME ('[' RuleRef (',' RuleRef)* ']')?;

RuleRef :
   Range | DirectRuleRef;

Range :
  DirectRuleRef? ('..'|'>..'|'>..<'|'..<') DirectRuleRef?;

DirectRuleRef :
  ID;
