| Summary: | ECJ accepts illegal unicode escape sequences | | |
|---|---|---|---|
| Product: | [Eclipse Project] JDT | Reporter: | Andreas Kohn <andreas.kohn> |
| Component: | Core | Assignee: | Olivier Thomann <Olivier_Thomann> |
| Status: | VERIFIED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | amj87.iitr, kazm, Olivier_Thomann, stephan202 |
| Version: | 3.7 | | |
| Target Milestone: | 3.8 M2 | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |

Description
Andreas Kohn
2011-09-05 13:50:54 EDT
Created attachment 202771
Input file

Attach the actual input file.

The bug here (in my understanding) is that JLS explicitly says which characters are allowed for a unicode escape:

---
http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.3

3.3 Unicode Escapes

Implementations first recognize Unicode escapes in their input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) with the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters:

    UnicodeInputCharacter:
        UnicodeEscape
        RawInputCharacter

    UnicodeEscape:
        \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

    UnicodeMarker:
        u
        UnicodeMarker u

    RawInputCharacter:
        any Unicode character

    HexDigit: one of
        0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

The \, u, and hexadecimal digits here are all ASCII characters.
---

Could you please attach the test case in binary format? When I tried to compile it, I got:

    c:\tests_sources>java -jar ecj-head.jar Test.java
    ----------
    1. ERROR in Test.java (at line 32)
        public static final String ERROR = "\u000Ôà½";
                                            ^^^^^^
    Invalid unicode
    ----------
    1 problem (1 error)

Are you using a specific encoding?

Created attachment 202780
Input file (binary)

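
For reference, the JLS 3.3 grammar quoted above can be sketched as a small decoder. This is only an illustration under stated assumptions, not ECJ's scanner code: the class and method names (`UnicodeEscapeSketch`, `hexDigit`, `decodeEscape`) are hypothetical, and the sketch ignores the JLS rule about how many backslashes precede the escape.

```java
public class UnicodeEscapeSketch {

    /** Value 0..15 of an ASCII hex digit, or -1 for any other character. */
    public static int hexDigit(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Decodes one escape starting at index pos of src.
     * Returns the UTF-16 code unit, or -1 if the text is not a valid escape.
     */
    public static int decodeEscape(CharSequence src, int pos) {
        int i = pos;
        if (i >= src.length() || src.charAt(i) != '\\') return -1;
        i++;
        // UnicodeMarker: one or more 'u' characters.
        int markerStart = i;
        while (i < src.length() && src.charAt(i) == 'u') i++;
        if (i == markerStart) return -1;
        // Exactly four ASCII hex digits must follow the marker.
        if (i + 4 > src.length()) return -1;
        int value = 0;
        for (int k = 0; k < 4; k++) {
            int d = hexDigit(src.charAt(i + k));
            if (d < 0) return -1;
            value = (value << 4) | d;
        }
        return value;
    }

    public static void main(String[] args) {
        // The doubled backslash keeps javac from treating these as escapes.
        System.out.println(decodeEscape("\\u0041", 0));                  // 65
        System.out.println(decodeEscape("\\uuu0041", 0));                // 65
        // Three zeros followed by U+216B, as in the attached test case.
        System.out.println(decodeEscape("\\u000" + (char) 0x216B, 0));   // -1
    }
}
```

With a check like this, the string from the attachment is rejected because U+216B is not one of the 22 ASCII characters listed under HexDigit.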
The file was UTF-8 encoded, which is the default on my system (LANG=en_US.utf8 is set in the environment, and Eclipse etc. are configured to use it).
The interesting character in the file is U+216B ROMAN NUMERAL TWELVE (which is a digit according to Character#isDigit() with the value 12)
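
A minimal sketch of why this character can slip through a lenient hex check. It assumes the check was a range test on something like `Character.getNumericValue`, which is an assumption; the attached patch is not reproduced here. The class `RomanTwelveDemo` and the helper `isAsciiHexDigit` are hypothetical names. Note that on a standard JDK `Character.isDigit` reports false for U+216B (it is a letter number, not a decimal digit); the value 12 comes from `Character.getNumericValue`.

```java
public class RomanTwelveDemo {

    /** True only for the 22 ASCII characters the JLS lists as HexDigit. */
    public static boolean isAsciiHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        char romanTwelve = (char) 0x216B;     // ROMAN NUMERAL TWELVE
        char fullwidthOne = (char) 0xFF11;    // FULLWIDTH DIGIT ONE

        // getNumericValue yields 12, which passes a naive 0..15 range test.
        System.out.println(Character.getNumericValue(romanTwelve)); // 12
        System.out.println(Character.isDigit(romanTwelve));         // false
        System.out.println(Character.digit(romanTwelve, 16));       // -1

        // Character.digit is not ASCII-only either: it accepts fullwidth digits.
        System.out.println(Character.digit(fullwidthOne, 16));      // 1

        // An explicit ASCII range check rejects both characters.
        System.out.println(isAsciiHexDigit(romanTwelve));           // false
        System.out.println(isAsciiHexDigit(fullwidthOne));          // false
    }
}
```

Since neither `Character.getNumericValue` nor `Character.digit` is restricted to ASCII, an explicit range comparison is the simplest way to match the JLS wording.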

Reproduced. I needed to pass -encoding UTF-8 to reproduce the issue. Fix is trivial.

Created attachment 202994
Proposed fix

Released for 3.8M2.

Verified for 3.8M2 using org.eclipse.jdt.core_3.8.0.v_C09.jar