43437 – Scanner does not like string literals

Bug 43437 - Scanner does not like string literals

Summary: Scanner does not like string literals

Status:	VERIFIED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core (show other bugs)
Version:	3.0
Hardware:	PC Windows XP

Importance:	P3 normal (vote)
Target Milestone:	3.0 M4
Assignee:	Olivier Thomann
QA Contact:

URL:
Whiteboard:
Keywords:	api

Depends on:
Blocks:

Reported:	2003-09-22 11:42 EDT by Martin Aeschlimann
Modified:	2003-10-14 08:00 EDT (History)
CC List:	0 users

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Martin Aeschlimann

2003-09-22 11:42:39 EDT

20030922

1. Create a scanner and initalize it with the string "\"hello\""
2. Read two tokens: You get a TokenNameStringLiteral and a
  InvalidInputException: Unterminated_String
I'd expect to get
 - TokenNameStringLiteral and a
 - TokenNameEOF

Comment 1 Olivier Thomann

2003-09-22 13:38:28 EDT

Could you please provide the way you create your scanner? I cannot reproduce
using the following code and the latest HEAD contents:

String source =	"\"hello\"";
IScanner scanner = ToolFactory.createScanner(true, true, true, true);
scanner.setSource(source.toCharArray());
int token = 0;
try {
	token = scanner.getNextToken();
	assertEquals("Wrong token type", ITerminalSymbols.TokenNameStringLiteral, token);
	token = scanner.getNextToken();
	assertEquals("Wrong token type", ITerminalSymbols.TokenNameEOF, token);
} catch (InvalidInputException e) {
	assertTrue(false);
}

Comment 2 Olivier Thomann

2003-09-22 13:42:34 EDT

Add CC.

Comment 3 Martin Aeschlimann

2003-09-22 13:47:57 EDT

I have a test case, but can't release right now (still other thing broken).
The scanner is set up with
	IScanner scanner= ToolFactory.createScanner(false, false, false, false);
	scanner.setSource(str.toCharArray());
	scanner.resetTo(0, str.length());
The string is
		StringBuffer buf= new StringBuffer();
		buf.append("\"Hello\" ");
		String contents= buf.toString();

Hope that helps.
Will add the test name here as soon as I release.

Comment 4 Olivier Thomann

2003-09-22 14:14:15 EDT

I changed my test for:
StringBuffer buf= new StringBuffer();
buf.append("\"Hello\" ");
String str = buf.toString();
IScanner scanner= ToolFactory.createScanner(false, false, false, false);
scanner.setSource(str.toCharArray());
scanner.resetTo(0, str.length());
int token = 0;
try {
	token = scanner.getNextToken();
	assertEquals("Wrong token type", 
ITerminalSymbols.TokenNameStringLiteral, token);
	token = scanner.getNextToken();
	assertEquals("Wrong token type", ITerminalSymbols.TokenNameEOF, token);
} catch (InvalidInputException e) {
	assertTrue(false);
}

I cannot reproduce the problem. Need your complete test for further 
investigation.

Comment 5 Martin Aeschlimann

2003-09-22 14:37:34 EDT

In CodeFormatterUtil change
  OLD_FUNC and BUG_43437
to false and run CodeFormatterUtilTest.testCatchStringLiteral

Comment 6 Olivier Thomann

2003-09-22 16:39:01 EDT

I will investigate this one.

Comment 7 Olivier Thomann

2003-09-23 08:19:37 EDT

Reproduced with your test.
I am investigating.

Comment 8 Olivier Thomann

2003-09-23 08:38:29 EDT

The problem is due to the wrong value for the end position.

Comment 9 Olivier Thomann

2003-09-23 08:54:30 EDT

A workaround is to remove the call to resetTo in this method in class
CodeFormatterUtil.

private static IScanner getTokenScanner(String str) {
	IScanner scanner= ToolFactory.createScanner(true, false, false, false);
	scanner.setSource(str.toCharArray());
	return scanner;
}

This works. Using resetTo with str.length() sets the eofPosition too far. This
is something we might want to clean up.

Comment 10 Olivier Thomann

2003-09-23 11:06:52 EDT

This will be considered as a specification bug, because client code might rely
on this bug and use lenght - 1.
The javadoc has been changed to:
	/**
	 * Reposition the scanner on some portion of the original source. The given
endPosition is the last valid position.
	 * Beyond this position, the scanner will answer EOF tokens
(<code>ITerminalSymbols.TokenNameEOF</code>).
	 * 
	 * @param startPosition the given start position
	 * @param endPosition the given end position
	 */

So if you want to reset from 0 to source lenght the right way to call it is:
resetTo(0, s.length - 1);

If you update your test by changing the resetTo invocation, it works. You could
simply omit it, since your are considering the whole string. setSource is enough.
Fixed and released in HEAD.

Comment 11 Martin Aeschlimann

2003-09-23 11:39:48 EDT

The old comment was really unclear. I'm used to exclusive ends, as this is more 
the standard in eclipse. Unfortunatly jdt.core uses the inclusive notion, but 
of course this is too late to change.
From our 11 uses of reset are 8 asuming the exclusive length. I guess other 
clients may have the same bug.
So what I suggest if that you could add a test
'if (endPos >= str.length) { endPos= str.length - 1 }
That would fix most of the usages.

Comment 12 Olivier Thomann

2003-09-23 11:55:56 EDT

This could be a workaround to support src.length. We'd like to change it, but 
in this case it is simply too late.

Comment 13 Martin Aeschlimann

2003-09-23 12:13:32 EDT

Don't understand that. Why is it too late?
The suggestion is to change the Javadoc as you suggested, but to gently handle 
the case reset(0, str.length) by saving str.length - 1 instead of str.length.

BTW, if you're strict, the Javadoc change is a breaking change, as you can see 
in our code. Wouldn't it be better to deprecate the old method and have a new 
one (best with offset, length)?

Comment 14 Olivier Thomann

2003-09-23 12:21:00 EDT

I released your suggestion. I will talk with Philippe for the deprecation.

Comment 15 Philipe Mulet

2003-09-23 13:26:53 EDT

I don't like the workaround, it makes the code tolerate two different scenarii.
This can be treated as a spec issue only, and we are guaranteed that our 
clients won't be broken since this is what our implementation does already.

Only the spec should be touched to match our implementation.

Comment 16 Olivier Thomann

2003-09-23 13:36:23 EDT

Removed workaround and simply change the spec.

Comment 17 Martin Aeschlimann

2003-09-24 03:48:05 EDT

fixed our code that passed in exclusive end positions > 20030924

Comment 18 David Audel

2003-10-14 08:00:51 EDT

Verified.