The Scanner does not report the whitespace at the end of its input. Instead, the whitespace is added to the EOF token. If I explicitly ask for whitespace, this behavior seems wrong (comments are reported correctly).

Run the following test case:

import org.eclipse.jdt.core.compiler.ITerminalSymbols;
import org.eclipse.jdt.core.compiler.InvalidInputException;
import org.eclipse.jdt.internal.compiler.parser.Scanner;

public class ScannerTest {
    public static void main(String[] args) {
        String input = "package com.ibm.itp.compare.ui; //foo \r\n";
        Scanner scanner = new Scanner(true, true); // returns comments & whitespace
        scanner.setSource(input.toCharArray());
        try {
            for (;;) {
                int t = scanner.getNextToken();
                switch (t) {
                    case ITerminalSymbols.TokenNameEOF:
                        System.out.println("EOF");
                        return;
                    case ITerminalSymbols.TokenNameWHITESPACE:
                        System.out.println("WS");
                        break;
                    case ITerminalSymbols.TokenNameCOMMENT_LINE:
                        System.out.println("COMMENT");
                        break;
                    case ITerminalSymbols.TokenNameSEMICOLON:
                        System.out.println("SEMICOLON");
                        break;
                    default:
                        System.out.println("Token: " + t);
                        break;
                }
            }
        } catch (InvalidInputException ex) {
            // Report scan errors instead of silently swallowing them.
            ex.printStackTrace();
        }
    }
}
If you have an input like "package com.ibm.itp.compare.ui; //foo \r\n", I would say the right tokens at the end are a line comment token followed by EOF. The \r\n should be part of the line comment. If this is not the case, we have a bug, but the bug is not that we don't report whitespace.
For the original input "package com.ibm.itp.compare.ui; //foo \r\n", the length of the line comment token is 6; that is, the trailing space is included but the CR LF is not.

For the input "package com.ibm.itp.compare.ui; ", the last two tokens reported by getNextToken() are SEMICOLON (with length 1) and EOF (which, incidentally, also has length 1). So whitespace at the end of the input is not reported.
Do you expect the line separator to be reported as part of the line comment or as a white space?
I assume that line separators are not reported as separate tokens for multi-line comments. Consequently, I would expect similar behavior for single-line comments: the line separator should be part of the line comment.
So for a source like "package com.ibm.itp.compare.ui; // toto \r\n", you expect:

PACKAGE
WS
Token: 5 Token: 6 Token: 5 Token: 6 Token: 5 Token: 6 Token: 5 Token: 6 Token: 5
SEMICOLON
WS
COMMENT // no WS between COMMENT and EOF
EOF

And with "package com.ibm.itp.compare.ui; \r\n":

PACKAGE
WS
Token: 5 Token: 6 Token: 5 Token: 6 Token: 5 Token: 6 Token: 5 Token: 6 Token: 5
SEMICOLON
WS // this WS includes the line separator.
EOF

We have a side effect in the code formatter when the line separator is part of the line comment. I need to investigate this further before I can release anything.
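To make the expected token boundaries concrete, here is a minimal, self-contained sketch of a toy tokenizer. It is not the JDT Scanner and does not use its API: the class name TrailingWhitespaceSketch, the tokenize method, and the token kind names (OTHER, WS, COMMENT, SEMICOLON, EOF) are all made up for illustration. It assumes the behavior discussed above, namely that a line comment includes its line separator and that trailing whitespace is reported as its own WS token before a zero-length EOF.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer for illustration only; NOT the JDT Scanner.
final class TrailingWhitespaceSketch {

    /** Returns one "KIND:length" entry per token, ending with "EOF:0". */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int n = source.length();
        int pos = 0;
        while (pos < n) {
            int start = pos;
            char c = source.charAt(pos);
            String kind;
            if (Character.isWhitespace(c)) {
                // A whitespace run is one token, even at the very end of the input.
                while (pos < n && Character.isWhitespace(source.charAt(pos))) {
                    pos++;
                }
                kind = "WS";
            } else if (source.startsWith("//", pos)) {
                // Consume the comment text up to the line separator...
                while (pos < n && source.charAt(pos) != '\r' && source.charAt(pos) != '\n') {
                    pos++;
                }
                // ...and make the separator (\r\n, \r or \n) part of the comment.
                if (pos < n && source.charAt(pos) == '\r') {
                    pos++;
                }
                if (pos < n && source.charAt(pos) == '\n') {
                    pos++;
                }
                kind = "COMMENT";
            } else if (c == ';') {
                pos++;
                kind = "SEMICOLON";
            } else {
                // Everything else is lumped into one OTHER token per run.
                while (pos < n
                        && !Character.isWhitespace(source.charAt(pos))
                        && source.charAt(pos) != ';'
                        && !source.startsWith("//", pos)) {
                    pos++;
                }
                kind = "OTHER";
            }
            tokens.add(kind + ":" + (pos - start));
        }
        tokens.add("EOF:0"); // EOF carries no characters of its own
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("package com.ibm.itp.compare.ui; //foo \r\n"));
        System.out.println(tokenize("package com.ibm.itp.compare.ui; \r\n"));
    }
}
```

With these assumptions, the comment token in the first input covers "//foo \r\n", so no WS token appears between COMMENT and EOF. In the second input, the trailing " \r\n" is reported as a single WS token instead of being folded into EOF.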
Defer
Created attachment 2086 [details] patch for the scanner
I can fix it and patch the formatter to get rid of the side effect.
Fixed and released in 2.1 stream. Regression tests added.
Verified.