Re: [Dltk-dev] heredoc scanner help

hello all -

  after a bunch of trial and error (mostly error), i think i've come up w/ a solution for this, but i wanted to see if anyone saw any major potential issues.

  bruno: after messing around with this, i believe the fact that D also supports an identifier as the heredoc delimiter means you will also have to use this approach.

  i was able to create a heredoc partition when a document is first opened by overriding the 'nextToken()' method in my subclass of RuleBasedPartitionScanner. the heredoc rule i created returned a HereDocToken subclass (although i believe this is no longer necessary) to tell the partition scanner that the start of a heredoc was seen and that it needed to buffer tokens until the terminator was seen. the scanner builds the buffer, tracking the offsets of the created tokens. when the tokens are removed from the buffer, the offsets maintained by the scanner are adjusted so the partition offsets are correct.
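  a minimal, framework-free sketch of the buffering idea, assuming simplified stand-ins: SimpleToken, TokenSource, ListSource, BufferingScanner and the "heredoc_start"/"heredoc_end" type names are all hypothetical, not JFace API. in a real RuleBasedPartitionScanner subclass the same logic would sit in an overridden nextToken(), with getTokenOffset()/getTokenLength() reporting the range of the token just taken from the buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a scanned token: content type plus document range.
// (JFace's IToken carries no offsets; the real scanner tracks them itself.)
final class SimpleToken {
    final String type;
    final int offset;
    final int length;

    SimpleToken(String type, int offset, int length) {
        this.type = type;
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical source of tokens, standing in for the scanner's rule evaluation.
interface TokenSource {
    SimpleToken next(); // null at end of input
}

// List-backed source, used here only for demonstration.
final class ListSource implements TokenSource {
    private final Iterator<SimpleToken> it;

    ListSource(List<SimpleToken> tokens) {
        this.it = tokens.iterator();
    }

    @Override
    public SimpleToken next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Once a heredoc start is seen, read ahead until the terminator, remember each
// token's offsets, then hand the tokens out one at a time, restoring the recorded
// offset/length as each one leaves the buffer.
final class BufferingScanner {
    static final String HEREDOC_START = "heredoc_start"; // hypothetical type names
    static final String HEREDOC_END = "heredoc_end";

    private final TokenSource source;
    private final Deque<SimpleToken> buffer = new ArrayDeque<>();
    private int tokenOffset = -1;
    private int tokenLength = 0;

    BufferingScanner(TokenSource source) {
        this.source = source;
    }

    SimpleToken nextToken() {
        if (buffer.isEmpty()) {
            SimpleToken t = source.next();
            if (t == null) {
                return null; // end of input
            }
            if (!HEREDOC_START.equals(t.type)) {
                return emit(t); // ordinary token: no buffering needed
            }
            // Heredoc start: buffer everything up to and including the terminator
            // (or up to end of input if the heredoc is unterminated).
            buffer.add(t);
            SimpleToken u;
            while ((u = source.next()) != null) {
                buffer.add(u);
                if (HEREDOC_END.equals(u.type)) {
                    break;
                }
            }
        }
        return emit(buffer.poll());
    }

    int getTokenOffset() {
        return tokenOffset;
    }

    int getTokenLength() {
        return tokenLength;
    }

    // Make the scanner report the range recorded when this token was scanned.
    private SimpleToken emit(SimpleToken t) {
        tokenOffset = t.offset;
        tokenLength = t.length;
        return t;
    }
}
```

  tokens still come out in document order; the read-ahead only exists so the heredoc start can be paired with its terminator, and restoring the recorded offsets keeps each partition where it was scanned.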

  i discovered that if i inserted characters before the start of the heredoc (<<), the partitioning would get all messed up b/c the partitioner would stop processing once it reached the initial HereDocToken, since it matched an existing partition - this is line 372 of the FastPartitioner that i mentioned 50 emails ago :) - i resolved this by overriding

    setPartialRange(IDocument, int, int, String, int)

  to clear the token buffer before it starts any processing, which seems to have addressed that problem.
  this is all fine and good, and it removed my need to track any kind of state, until i made edits to the body of the heredoc...
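  roughly what the setPartialRange override amounts to, as a hypothetical sketch: HereDocAwareScanner is an illustrative name, a String stands in for IDocument, and buffered tokens are plain content-type strings. the only point it shows is ordering: the read-ahead buffer is cleared before any other setup.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a String stands in for IDocument and buffered tokens are plain
// content-type strings. The parameters mirror
// IPartitionTokenScanner.setPartialRange(IDocument, int, int, String, int).
class HereDocAwareScanner {
    // Tokens read ahead while looking for a heredoc terminator.
    final Deque<String> tokenBuffer = new ArrayDeque<>();

    private String resumeContentType;
    private int resumePartitionOffset = -1;

    void setPartialRange(String document, int offset, int length,
                         String contentType, int partitionOffset) {
        // The partitioner is restarting the scan after an edit, so anything buffered
        // by an earlier scan describes stale text: drop it before any other setup.
        tokenBuffer.clear();
        // A real RuleBasedPartitionScanner subclass would call
        // super.setPartialRange(document, offset, length, contentType, partitionOffset) here.
        this.resumeContentType = contentType;
        this.resumePartitionOffset = partitionOffset;
    }

    boolean hasBufferedTokens() {
        return !tokenBuffer.isEmpty();
    }

    String resumeContentType() {
        return resumeContentType;
    }

    int resumePartitionOffset() {
        return resumePartitionOffset;
    }
}
```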

  when that happens, the heredoc rule resumes, but b/c it has no idea where it should terminate, the partition gets calculated incorrectly. a similar situation occurs when edits are made after the terminator (details aren't necessary here).

  so - it appears that i'm back to the original problem of needing to track information about the heredoc, namely what is its terminator and what is its ending offset in the document.

  after giving this some thought, i had the idea to tack this information on the end of the content type string that is returned by the getData() call on the token object. initially this didn't work b/c the partition type was no longer recognized, but after i overrode:


  to strip off the delimiter, the partitioning and coloring started working again.
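  a sketch of packing the terminator and end offset into the content type and stripping them back off. the HereDocContentType helper and the ':' separator are hypothetical; the assumption is that the separator never appears in a plain content type.

```java
// Hypothetical encoding of heredoc state in a partition's content type:
//   <baseType> SEP <terminator> SEP <endOffset>
// SEP is an assumption: any character that never appears in a base content type works.
final class HereDocContentType {
    static final char SEP = ':';

    private HereDocContentType() {}

    static String encode(String baseType, String terminator, int endOffset) {
        return baseType + SEP + terminator + SEP + endOffset;
    }

    // Strip the extra data so the rest of the framework sees the plain content type.
    static String baseType(String data) {
        int first = data.indexOf(SEP);
        return first < 0 ? data : data.substring(0, first);
    }

    // The terminator sits between the first and last separators, so a terminator
    // that itself contains SEP (e.g. a quoted Perl terminator) still round-trips.
    static String terminator(String data) {
        int first = data.indexOf(SEP);
        int last = data.lastIndexOf(SEP);
        return (first < 0 || first == last) ? null : data.substring(first + 1, last);
    }

    static int endOffset(String data) {
        int first = data.indexOf(SEP);
        int last = data.lastIndexOf(SEP);
        if (first < 0 || first == last) {
            return -1; // plain content type, no heredoc data attached
        }
        return Integer.parseInt(data.substring(last + 1));
    }
}
```

  e.g. encode("__perl_heredoc", "EOF", 42) gives "__perl_heredoc:EOF:42", and baseType() on that gives back "__perl_heredoc".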

  the first big question is, does anyone see anything wrong w/ tracking the terminator and offset via the data that is returned from the call to getData()? this does not seem to affect anything as long as that information is stripped off before anything else uses it and the above methods are the only ones i've encountered that do.

  it just occurred to me that it might be possible to just track that information inside the partition scanner for the life of the document but that might be difficult.

  the other question: is there anything wrong w/ requiring an explicit HereDocRule subclass to be set on (or provided by) the partition scanner instead of it being defined in the list of rules?

  1) i don't want to cut and paste the implementation of RuleBasedPartitionScanner.nextToken() just b/c i need to change the way this line of code works:

    if (fContentType.equals(token.getData())) {

  that line of code will always fail b/c the 'success' token returned from the heredoc rule will never contain any terminator/offset information, and rather than do that cut and paste, i felt the scanner could just do an explicit check against the heredoc rule before falling back to cycling through the list.

  2) if a resume is occurring, the rule will need to be told what the terminator/offset are so it can scan properly.
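  the dispatch order from points 1) and 2) could look roughly like this. Rule, HereDocRule, HereDocFirstScanner and their methods are hypothetical simplifications (rules return a content type string instead of an IToken), not the RuleBasedPartitionScanner API.

```java
import java.util.List;

// Hypothetical rule shape: the content type matched at pos, or null if no match.
interface Rule {
    String evaluate(String text, int pos);
}

// Hypothetical heredoc rule that can also be resumed mid-heredoc with the terminator
// and end offset recorded earlier, since it cannot infer them from the resume point.
interface HereDocRule extends Rule {
    String resume(String text, int pos, String terminator, int endOffset);
}

// Consult the heredoc rule explicitly first, then fall back to cycling through the
// ordinary rules, instead of copying the whole nextToken() implementation.
final class HereDocFirstScanner {
    private final HereDocRule hereDocRule;
    private final List<Rule> rules;

    HereDocFirstScanner(HereDocRule hereDocRule, List<Rule> rules) {
        this.hereDocRule = hereDocRule;
        this.rules = rules;
    }

    // resumeTerminator is non-null only when scanning restarts inside a heredoc.
    String next(String text, int pos, String resumeTerminator, int resumeEndOffset) {
        if (resumeTerminator != null) {
            return hereDocRule.resume(text, pos, resumeTerminator, resumeEndOffset);
        }
        String type = hereDocRule.evaluate(text, pos);
        if (type != null) {
            return type;
        }
        for (Rule rule : rules) {
            type = rule.evaluate(text, pos);
            if (type != null) {
                return type;
            }
        }
        return null; // nothing matched: caller treats this as default content
    }
}
```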


On Wed, May 16, 2012 at 9:20 PM, Jae Gangemi <jgangemi@xxxxxxxxx> wrote:

  ha! i think i figured out the answer :)

  the easiest thing to do will be to just build a token buffer if/when heredoc is encountered and then resume normal scanning once it's empty.

  i'll report back once i try this out.

On Wed, May 16, 2012 at 8:53 PM, Jae Gangemi <jgangemi@xxxxxxxxx> wrote:

  actually, on further thought, i want this to be in its own partition b/c then i can assign a specific 'color' scanner to it that offers me more flexibility/options.

  it'd be nice if there were a way to pull offsets out of the ModuleDeclaration to supplement rule-based parsing... is that even possible?

On Wed, May 16, 2012 at 8:40 PM, Jae Gangemi <jgangemi@xxxxxxxxx> wrote:
  started looking into this and have a very basic implementation kind of working, but i've hit the first snag and wanted to see what others thought about how to overcome it...

  i've taken the route of trying to make heredoc its own partition, so this:

  yields this document structure:

partition type: __perl_heredoc, offset: 0, length: 5
partition type: __dftl_partition_content_type, offset: 5, length: 1
partition type: __perl_heredoc, offset: 6, length: 3

  in order to do this, i start tracking that i've seen a heredoc token in my partition scanner and once i see that the next character is going to be a newline, the scanner will start consuming each line until it sees the terminator, at which point it ends the partition and resets the state.
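  a sketch of the line-consuming step, assuming the plain Perl forms <<EOF, <<"EOF" and <<'EOF' with '\n' line endings (indented <<~EOF heredocs and other variants are not handled). PerlHereDocScan, terminatorAt and bodyEnd are hypothetical names.

```java
// Hypothetical helpers for plain Perl heredocs (<<EOF, <<"EOF", <<'EOF'), '\n' line endings.
final class PerlHereDocScan {
    private PerlHereDocScan() {}

    // Returns the terminator if a heredoc operator starts at pos, otherwise null.
    static String terminatorAt(String text, int pos) {
        if (!text.startsWith("<<", pos)) {
            return null;
        }
        int i = pos + 2;
        int n = text.length();
        if (i < n && (text.charAt(i) == '"' || text.charAt(i) == '\'')) {
            int close = text.indexOf(text.charAt(i), i + 1);
            return close < 0 ? null : text.substring(i + 1, close);
        }
        if (i < n && (Character.isLetter(text.charAt(i)) || text.charAt(i) == '_')) {
            int j = i + 1;
            while (j < n && (Character.isLetterOrDigit(text.charAt(j)) || text.charAt(j) == '_')) {
                j++;
            }
            return text.substring(i, j);
        }
        return null; // e.g. "1 <<1" is a left shift, not a heredoc
    }

    // bodyStart is the offset just after the newline ending the line that holds <<EOF.
    // Consumes whole lines until one equals the terminator and returns the offset just
    // past the terminator text; an unterminated heredoc runs to the end of the text.
    static int bodyEnd(String text, int bodyStart, String terminator) {
        int lineStart = bodyStart;
        while (lineStart < text.length()) {
            int nl = text.indexOf('\n', lineStart);
            int lineEnd = nl < 0 ? text.length() : nl;
            if (lineEnd - lineStart == terminator.length()
                    && text.startsWith(terminator, lineStart)) {
                return lineEnd;
            }
            if (nl < 0) {
                break;
            }
            lineStart = nl + 1;
        }
        return text.length();
    }
}
```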

  the problem i am having is this: if i add a newline to the document before the start of <<EOF, i hit this block of code starting at line 371 in the FastPartitioner:

    // if position already exists and we have scanned at least the
    // area covered by the event, we are done
    if (fDocument.containsPosition(fPositionCategory, start, length)) {
        if (lastScannedPosition >= e.getOffset() + newLength)
            return createRegion();
        ++ first;
    } else {

  and the scanning stops and my partition scanner is left thinking that the next time 'nextToken()' is invoked, it's in heredoc mode.

  i really don't want to have to make a copy of the FastPartitioner just to add some way to 'reset' my partition scanner, so what other options exist?

  is creating a partition for this just the wrong way to go? eventually i'd like to be able to offer a folding option for heredoc, but i believe i could also accomplish that by having it represented in the AST.

  i haven't tried going through the code scanner yet to see if it's possible that way - but i am worried that i will encounter the same 'state' problem i saw with the partition scanner - but perhaps not.

  either way - if anyone has anything to contribute, i'd love to hear it! :)

On Wed, May 16, 2012 at 2:28 PM, Jae Gangemi <jgangemi@xxxxxxxxx> wrote:

  in the 2nd case you're going to have to write your own rule, but that still should be much simpler to handle b/c the ';' appears on the terminator line, not after the heredoc start.

  your rule would have to check for q" followed by some character; if it sees a char after the ", read in all the chars up until the newline, save that as your terminator, and keep reading lines until you hit the terminator.

  subclassing the pattern/multi-line rule to do this should get you what you want.
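  a rough sketch of the core such a rule would need for D's identifier-delimited strings, assuming '\n' line endings and ASCII-style identifiers. per the D spec, the identifier must be followed immediately by a newline, and the string ends at a line that starts with the same identifier followed by ". DHereDocScan and scan are hypothetical names; wiring this into a pattern/multi-line rule subclass is left out.

```java
// Hypothetical scanner core for D identifier-delimited strings: q"IDENT ... IDENT"
final class DHereDocScan {
    private DHereDocScan() {}

    // Returns the offset just past the closing IDENT" if an identifier-delimited
    // string starts at pos, text.length() if it is unterminated, or -1 if none.
    static int scan(String text, int pos) {
        if (!text.startsWith("q\"", pos)) {
            return -1;
        }
        int identStart = pos + 2;
        int nl = text.indexOf('\n', identStart);
        if (nl < 0) {
            return -1;
        }
        String ident = text.substring(identStart, nl);
        if (!isIdentifier(ident)) {
            return -1; // e.g. q"(...)" is a bracket-delimited string instead
        }
        String closing = ident + "\"";
        int lineStart = nl + 1;
        while (lineStart < text.length()) {
            // The closing identifier must start at the beginning of a line.
            if (text.startsWith(closing, lineStart)) {
                return lineStart + closing.length();
            }
            int next = text.indexOf('\n', lineStart);
            if (next < 0) {
                break;
            }
            lineStart = next + 1;
        }
        return text.length();
    }

    // Simplified identifier check (assumption: letters, digits and '_' only).
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        char c0 = s.charAt(0);
        if (!(Character.isLetter(c0) || c0 == '_')) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_')) {
                return false;
            }
        }
        return true;
    }
}
```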

On Wed, May 16, 2012 at 2:21 PM, Bruno Medeiros <bruno.do.medeiros@xxxxxxxxx> wrote:

On Wed, May 16, 2012 at 6:40 PM, Jae Gangemi <jgangemi@xxxxxxxxx> wrote:

  actually, i just went and re-read the wiki page, shouldn't this be simple for D?

  you'd just need multiple multi-line rules that look something like this (not sure if there is an escape char):

    new MultiLineRule("q(", ")\n", token, (char) 0, true);
    new MultiLineRule("q{", "}\n", token, (char) 0, true);


That should be corrected to:
  new MultiLineRule("q\"(", ")\"", token, (char) 0, true);
but anyway: yes and no. That would work for the heredoc with "a delimiter character (any of () <> {} or [])", which is the first case shown in the wiki, but it wouldn't work for the second case, with a delimiter identifier string:
int main() {
    string list = q"IDENT
1. Item One
2. Item Two
3. Item Three
IDENT";
    writef( list );
}
which is more like the Ruby and Perl heredoc.

Bruno Medeiros

dltk-dev mailing list