Bug 13907 - Scanner does not report whitespace tokens at end of input
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 2.0
Hardware: PC Windows 2000
Importance: P3 normal
Target Milestone: 2.1 M3
Assignee: Olivier Thomann CLA
Reported: 2002-04-16 12:07 EDT by Andre Weinand CLA
Modified: 2002-11-14 06:35 EST

Attachments
patch for the scanner (17.28 KB, text/plain)
2002-10-01 14:58 EDT, Olivier Thomann CLA

Description Andre Weinand CLA 2002-04-16 12:07:24 EDT
The Scanner does not report the whitespace at the end of its input.
Instead, the whitespace is added to the EOF token. Since I explicitly ask for
whitespace tokens, this behavior seems wrong (comments are reported correctly).

Run the following testcase:

import org.eclipse.jdt.core.compiler.InvalidInputException;
import org.eclipse.jdt.internal.compiler.parser.Scanner;
import org.eclipse.jdt.core.compiler.ITerminalSymbols;


public class ScannerTest {

	public static void main(String[] args) {
		
		String input= "package com.ibm.itp.compare.ui; //foo \r\n";
		int l= input.length();
		
		Scanner scanner= new Scanner(true, true); // returns comments & whitespace
		char[] chars= new char[l];
		input.getChars(0, l, chars, 0);
		scanner.setSource(chars);
		try {
			for (;;) {
				int t= 0;
				switch (t= scanner.getNextToken()) {
				case ITerminalSymbols.TokenNameEOF:
					System.out.println("EOF");
					return;
				case ITerminalSymbols.TokenNameWHITESPACE:
					System.out.println("WS");
					break;			
				case ITerminalSymbols.TokenNameCOMMENT_LINE:
					System.out.println("COMMENT");
					break;
				case ITerminalSymbols.TokenNameSEMICOLON:
					System.out.println("SEMICOLON");
					break;
				default:
					System.out.println("Token: " + t);
					break;
				}
			}
		} catch (InvalidInputException ex) {
			// do not swallow scanner errors silently
			System.out.println("Invalid input: " + ex.getMessage());
		}
	}
}
Comment 1 Olivier Thomann CLA 2002-04-16 13:58:51 EDT
If you have an input like:
"package com.ibm.itp.compare.ui; //foo \r\n"
I would say that the right tokens at the end are a line comment token followed by EOF.
The \r\n should be part of the line comment. If this is not the case, we have a bug,
but the bug is not that we don't report whitespace.
Comment 2 Andre Weinand CLA 2002-04-17 04:02:08 EDT
For the original input
   "package com.ibm.itp.compare.ui; //foo \r\n"
the length of the line comment token is 6; that is, the trailing space is included
but the \r\n is not.

For this input
   "package com.ibm.itp.compare.ui;     "
the last two tokens reported by getNextToken() are a SEMICOLON (with length 1)
and EOF (BTW, also with length 1).
So whitespace at the end of input is not reported.
Comment 3 Olivier Thomann CLA 2002-04-30 11:32:13 EDT
Do you expect the line separator to be reported as part of the line comment or as a white space?
Comment 4 Andre Weinand CLA 2002-04-30 11:53:35 EDT
I assume that line separators are not reported separately for multi-line comments.
Consequently, I would expect similar behavior for single-line comments: the line
separator is reported as part of the line comment.
Comment 5 Olivier Thomann CLA 2002-04-30 12:05:10 EDT
So for a source like:
"package com.ibm.itp.compare.ui; // toto \r\n"
you expect:
PACKAGE
WS
Token: 5
Token: 6
Token: 5
Token: 6
Token: 5
Token: 6
Token: 5
Token: 6
Token: 5
SEMICOLON
WS
COMMENT    // no WS between COMMENT and EOF
EOF

And with "package com.ibm.itp.compare.ui;     \r\n":
PACKAGE
WS
Token: 5
Token: 6
Token: 5
Token: 6
Token: 5
Token: 6
Token: 5
Token: 6
Token: 5
SEMICOLON
WS         // this WS includes the line separator
EOF

We have a side-effect in the code formatter when the line separator is part of the
line comment. I need to investigate this further before I can release anything.
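
For illustration only, the two expected token streams above can be reproduced with a small standalone tokenizer. This is a sketch under stated assumptions, not the JDT Scanner and not the attached patch: the class name TrailingWhitespaceSketch, the tokenize method and the string labels are made up for the example, and IDENTIFIER and DOT stand in for the "Token: 5" and "Token: 6" lines, which by their position in the input appear to be the identifier and dot tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tokenizer, NOT org.eclipse.jdt.internal.compiler.parser.Scanner.
// It only models the token order discussed in this bug.
final class TrailingWhitespaceSketch {

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = source.length();
        while (i < n) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                // Consume the whole whitespace run (line separators included)
                // and report it, even when the run ends at the end of input.
                while (i < n && Character.isWhitespace(source.charAt(i))) {
                    i++;
                }
                tokens.add("WS");
            } else if (source.startsWith("//", i)) {
                // The line comment runs to the end of the line and, in this
                // sketch, also swallows the line separator (\r, \n or \r\n).
                while (i < n && source.charAt(i) != '\r' && source.charAt(i) != '\n') {
                    i++;
                }
                if (i < n && source.charAt(i) == '\r') {
                    i++;
                }
                if (i < n && source.charAt(i) == '\n') {
                    i++;
                }
                tokens.add("COMMENT");
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < n && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                String word = source.substring(start, i);
                tokens.add(word.equals("package") ? "PACKAGE" : "IDENTIFIER");
            } else if (c == ';') {
                tokens.add("SEMICOLON");
                i++;
            } else if (c == '.') {
                tokens.add("DOT");
                i++;
            } else {
                tokens.add("OTHER");
                i++;
            }
        }
        // EOF is always a separate, final token and never absorbs whitespace.
        tokens.add("EOF");
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("package com.ibm.itp.compare.ui; // toto \r\n"));
        System.out.println(tokenize("package com.ibm.itp.compare.ui;     \r\n"));
    }
}
```

With this sketch, the first input ends in SEMICOLON, WS, COMMENT, EOF and the second in SEMICOLON, WS, EOF. The same holds for the input from comment 2 that ends in spaces without a line separator.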
Comment 6 Philipe Mulet CLA 2002-06-11 08:29:06 EDT
Defer
Comment 7 Olivier Thomann CLA 2002-10-01 14:58:30 EDT
Created attachment 2086
patch for the scanner
Comment 8 Olivier Thomann CLA 2002-10-22 12:06:21 EDT
I can fix it and patch the formatter to get rid of the side-effect.
Comment 9 Olivier Thomann CLA 2002-10-22 14:46:47 EDT
Fixed and released in 2.1 stream.
Regression tests added.
Comment 10 David Audel CLA 2002-11-14 06:35:52 EST
Verified.