Bug 43437 - Scanner does not like string literals
Summary: Scanner does not like string literals
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.0   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.0 M4   Edit
Assignee: Olivier Thomann CLA
QA Contact:
URL:
Whiteboard:
Keywords: api
Depends on:
Blocks:
 
Reported: 2003-09-22 11:42 EDT by Martin Aeschlimann CLA
Modified: 2003-10-14 08:00 EDT (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Aeschlimann CLA 2003-09-22 11:42:39 EDT
20030922

1. Create a scanner and initalize it with the string "\"hello\""
2. Read two tokens: You get a TokenNameStringLiteral and a
  InvalidInputException: Unterminated_String
I'd expect to get
 - TokenNameStringLiteral and a
 - TokenNameEOF
Comment 1 Olivier Thomann CLA 2003-09-22 13:38:28 EDT
Could you please provide the way you create your scanner? I cannot reproduce
using the following code and the latest HEAD contents:

String source =	"\"hello\"";
IScanner scanner = ToolFactory.createScanner(true, true, true, true);
scanner.setSource(source.toCharArray());
int token = 0;
try {
	token = scanner.getNextToken();
	assertEquals("Wrong token type", ITerminalSymbols.TokenNameStringLiteral, token);
	token = scanner.getNextToken();
	assertEquals("Wrong token type", ITerminalSymbols.TokenNameEOF, token);
} catch (InvalidInputException e) {
	assertTrue(false);
}
Comment 2 Olivier Thomann CLA 2003-09-22 13:42:34 EDT
Add CC.
Comment 3 Martin Aeschlimann CLA 2003-09-22 13:47:57 EDT
I have a test case, but can't release right now (still other thing broken).
The scanner is set up with
	IScanner scanner= ToolFactory.createScanner(false, false, false, false);
	scanner.setSource(str.toCharArray());
	scanner.resetTo(0, str.length());
The string is
		StringBuffer buf= new StringBuffer();
		buf.append("\"Hello\" ");
		String contents= buf.toString();

Hope that helps.
Will add the test name here as soon as I release.
Comment 4 Olivier Thomann CLA 2003-09-22 14:14:15 EDT
I changed my test for:
StringBuffer buf= new StringBuffer();
buf.append("\"Hello\" ");
String str = buf.toString();
IScanner scanner= ToolFactory.createScanner(false, false, false, false);
scanner.setSource(str.toCharArray());
scanner.resetTo(0, str.length());
int token = 0;
try {
	token = scanner.getNextToken();
	assertEquals("Wrong token type", 
ITerminalSymbols.TokenNameStringLiteral, token);
	token = scanner.getNextToken();
	assertEquals("Wrong token type", ITerminalSymbols.TokenNameEOF, token);
} catch (InvalidInputException e) {
	assertTrue(false);
}

I cannot reproduce the problem. Need your complete test for further 
investigation.
Comment 5 Martin Aeschlimann CLA 2003-09-22 14:37:34 EDT
In CodeFormatterUtil change
  OLD_FUNC and BUG_43437
to false and run CodeFormatterUtilTest.testCatchStringLiteral
Comment 6 Olivier Thomann CLA 2003-09-22 16:39:01 EDT
I will investigate this one.
Comment 7 Olivier Thomann CLA 2003-09-23 08:19:37 EDT
Reproduced with your test.
I am investigating.
Comment 8 Olivier Thomann CLA 2003-09-23 08:38:29 EDT
The problem is due to the wrong value for the end position.
Comment 9 Olivier Thomann CLA 2003-09-23 08:54:30 EDT
A workaround is to remove the call to resetTo in this method in class
CodeFormatterUtil.

private static IScanner getTokenScanner(String str) {
	IScanner scanner= ToolFactory.createScanner(true, false, false, false);
	scanner.setSource(str.toCharArray());
	return scanner;
}

This works. Using resetTo with str.length() sets the eofPosition too far. This
is something we might want to clean up.
Comment 10 Olivier Thomann CLA 2003-09-23 11:06:52 EDT
This will be considered as a specification bug, because client code might rely
on this bug and use lenght - 1.
The javadoc has been changed to:
	/**
	 * Reposition the scanner on some portion of the original source. The given
endPosition is the last valid position.
	 * Beyond this position, the scanner will answer EOF tokens
(<code>ITerminalSymbols.TokenNameEOF</code>).
	 * 
	 * @param startPosition the given start position
	 * @param endPosition the given end position
	 */

So if you want to reset from 0 to source lenght the right way to call it is:
resetTo(0, s.length - 1);

If you update your test by changing the resetTo invocation, it works. You could
simply omit it, since your are considering the whole string. setSource is enough.
Fixed and released in HEAD.
Comment 11 Martin Aeschlimann CLA 2003-09-23 11:39:48 EDT
The old comment was really unclear. I'm used to exclusive ends, as this is more 
the standard in eclipse. Unfortunatly jdt.core uses the inclusive notion, but 
of course this is too late to change.
From our 11 uses of reset are 8 asuming the exclusive length. I guess other 
clients may have the same bug.
So what I suggest if that you could add a test
'if (endPos >= str.length) { endPos= str.length - 1 }
That would fix most of the usages.
Comment 12 Olivier Thomann CLA 2003-09-23 11:55:56 EDT
This could be a workaround to support src.length. We'd like to change it, but 
in this case it is simply too late.
Comment 13 Martin Aeschlimann CLA 2003-09-23 12:13:32 EDT
Don't understand that. Why is it too late?
The suggestion is to change the Javadoc as you suggested, but to gently handle 
the case reset(0, str.length) by saving str.length - 1 instead of str.length.

BTW, if you're strict, the Javadoc change is a breaking change, as you can see 
in our code. Wouldn't it be better to deprecate the old method and have a new 
one (best with offset, length)?
Comment 14 Olivier Thomann CLA 2003-09-23 12:21:00 EDT
I released your suggestion. I will talk with Philippe for the deprecation.
Comment 15 Philipe Mulet CLA 2003-09-23 13:26:53 EDT
I don't like the workaround, it makes the code tolerate two different scenarii.
This can be treated as a spec issue only, and we are guaranteed that our 
clients won't be broken since this is what our implementation does already.

Only the spec should be touched to match our implementation.
Comment 16 Olivier Thomann CLA 2003-09-23 13:36:23 EDT
Removed workaround and simply change the spec.
Comment 17 Martin Aeschlimann CLA 2003-09-24 03:48:05 EDT
fixed our code that passed in exclusive end positions > 20030924
Comment 18 David Audel CLA 2003-10-14 08:00:51 EDT
Verified.