Build Identifier: I20110803-1800

The following snippet compiles with ECJ (Eclipse Compiler for Java(TM) 0.C02, 3.8.0 M1, Copyright IBM Corp 2000, 2011. All rights reserved.), and leads to an error with Oracle (javac 1.7.0_02-ea):

public class Test {
    public static final String ERROR = "\u000Ⅻ";
}

Verbose output of Oracle javac:
---
[parsing started RegularFileObject[src/Test.java]]
src/Test.java:32: error: illegal unicode escape
    public static final String ERROR = "\u000Ⅻ";
    ^
[parsing completed 18ms]
[total 45ms]
1 error
---

Verbose output of ECJ:
---
[parsing src/Test.java - #1/1]
[reading java/lang/Object.class]
[analyzing src/Test.java - #1/1]
[reading java/lang/String.class]
[writing Test.class - #1]
[completed src/Test.java - #1/1]
[1 unit compiled]
[1 .class file generated]
---

Reproducible: Always
Created attachment 202771: Input file

Attach the actual input file.

The bug here (in my understanding) is that the JLS explicitly says which characters are allowed in a Unicode escape:
---
http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.3

3.3 Unicode Escapes

Implementations first recognize Unicode escapes in their input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) with the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters:

UnicodeInputCharacter:
    UnicodeEscape
    RawInputCharacter

UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

UnicodeMarker:
    u
    UnicodeMarker u

RawInputCharacter:
    any Unicode character

HexDigit: one of
    0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

The \, u, and hexadecimal digits here are all ASCII characters.
---
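To make the quoted grammar concrete, here is a minimal sketch of a check that accepts only the ASCII hex digits listed under HexDigit. The class and method names are hypothetical and are not taken from ECJ or javac. As a simplification, the sketch does not check whether the leading backslash is itself escaped.

```java
final class UnicodeEscapeCheck {

    /** True only for the 22 ASCII characters the JLS lists under HexDigit. */
    public static boolean isJlsHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    /**
     * Hypothetical helper: does a UnicodeEscape start at the given offset?
     * That is a backslash, one or more 'u' markers, then four hex digits.
     * Simplification: does not check whether the backslash is itself escaped.
     */
    public static boolean isUnicodeEscapeAt(String s, int offset) {
        if (offset < 0 || offset >= s.length() || s.charAt(offset) != '\\') {
            return false;
        }
        int i = offset + 1;
        int markers = 0;
        while (i < s.length() && s.charAt(i) == 'u') {
            i++;
            markers++;
        }
        if (markers == 0 || i + 4 > s.length()) {
            return false;
        }
        for (int k = 0; k < 4; k++) {
            if (!isJlsHexDigit(s.charAt(i + k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char romanTwelve = (char) 0x216B; // U+216B ROMAN NUMERAL TWELVE
        String valid = "\\u0041";                   // backslash, u, 0041
        String fromReport = "\\u000" + romanTwelve; // last "digit" is not ASCII
        System.out.println(isUnicodeEscapeAt(valid, 0));      // true
        System.out.println(isUnicodeEscapeAt(fromReport, 0)); // false
    }
}
```

Under this reading of the grammar, the sequence from the report is not a Unicode escape, which matches the javac behaviour shown above.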
Could you please attach the test case in binary format? When I tried to compile it, I got:

c:\tests_sources>java -jar ecj-head.jar Test.java
----------
1. ERROR in Test.java (at line 32)
	public static final String ERROR = "\u000Ôà½";
	^^^^^^
Invalid unicode
----------
1 problem (1 error)

Are you using a specific encoding?
Created attachment 202780: Input file (binary)

The file was UTF-8 encoded, which is the default on my system (LANG=en_US.utf8 is set in the environment, and Eclipse etc. are configured to use it). The interesting character in the file is U+216B ROMAN NUMERAL TWELVE, which has the numeric value 12 according to Character#getNumericValue().
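The numeric value of U+216B can be checked directly against the standard library. The sketch below compares `Character.getNumericValue`, `Character.digit` with radix 16, and a hypothetical ASCII-only helper. It only illustrates why a check based on a character's numeric value may accept this character as a hex digit; it makes no claim about the code ECJ actually used.

```java
final class RomanNumeralTwelve {

    /** Hypothetical helper: ASCII-only hex digit test, as the JLS requires. */
    public static boolean isAsciiHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        char c = (char) 0x216B; // U+216B ROMAN NUMERAL TWELVE

        // The library reports a numeric value that falls inside 0..15.
        System.out.println(Character.getNumericValue(c)); // 12

        // Character.digit does not treat it as a hexadecimal digit.
        System.out.println(Character.digit(c, 16)); // -1

        // The ASCII-only test rejects it as well.
        System.out.println(isAsciiHexDigit(c)); // false
    }
}
```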
Reproduced. I needed to pass -encoding UTF-8 to reproduce the issue. Fix is trivial.
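A small sketch of why the encoding option matters here: in UTF-8, U+216B is stored as three bytes, so a compiler reading the file with a single-byte charset sees three unrelated characters after `\u000` instead of one. ISO-8859-1 is used below only as a stand-in for a non-UTF-8 default; the actual default on the machine above is not stated in this report. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class EncodingMismatch {

    /** Hex dump of the UTF-8 bytes of a single char, e.g. "E2 85 AB". */
    public static String utf8Hex(char c) {
        byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the UTF-8 bytes of c as ISO-8859-1 (the assumed wrong charset). */
    public static String misread(char c) {
        byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        char romanTwelve = (char) 0x216B;
        System.out.println(utf8Hex(romanTwelve));          // E2 85 AB
        System.out.println(misread(romanTwelve).length()); // 3
    }
}
```

With the wrong charset the compiler never sees U+216B at all, which is consistent with the plain "Invalid unicode" error reported before `-encoding UTF-8` was passed.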
Created attachment 202994: Proposed fix
Released for 3.8M2.
Verified for 3.8M2 using org.eclipse.jdt.core_3.8.0.v_C09.jar