Bug 356746 - ECJ accepts illegal unicode escape sequences
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.7
Hardware: PC Linux
Importance: P3 normal
Target Milestone: 3.8 M2
Assignee: Olivier Thomann
 
Reported: 2011-09-05 13:50 EDT by Andreas Kohn
Modified: 2011-09-12 17:31 EDT

Attachments
Input file (1.09 KB, text/x-java)
2011-09-05 13:57 EDT, Andreas Kohn
Input file (binary) (1.09 KB, application/octet-stream)
2011-09-06 02:09 EDT, Andreas Kohn
Proposed fix (22.32 KB, patch)
2011-09-08 10:19 EDT, Olivier Thomann

Description Andreas Kohn 2011-09-05 13:50:54 EDT
Build Identifier: I20110803-1800

The following snippet compiles with ECJ (Eclipse Compiler for Java(TM) 0.C02, 3.8.0 M1, Copyright IBM Corp 2000, 2011. All rights reserved.) but is rejected with an error by Oracle javac (1.7.0_02-ea):

public class Test {
	public static final String ERROR = "\u000Ⅻ";
}


Verbose output of Oracle javac:
---
[parsing started RegularFileObject[src/Test.java]]
src/Test.java:32: error: illegal unicode escape
	public static final String ERROR = "\u000Ⅻ";
	                                         ^
[parsing completed 18ms]
[total 45ms]
1 error
---

Verbose output of ECJ:
---
[parsing    src/Test.java - #1/1]
[reading    java/lang/Object.class]
[analyzing  src/Test.java - #1/1]
[reading    java/lang/String.class]
[writing    Test.class - #1]
[completed  src/Test.java - #1/1]
[1 unit compiled]
[1 .class file generated]
---

Reproducible: Always
Comment 1 Andreas Kohn 2011-09-05 13:57:45 EDT
Created attachment 202771
Input file

Attaching the actual input file.

The bug here (in my understanding) is that the JLS explicitly specifies which characters are allowed in a Unicode escape:

--- http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.3
3.3 Unicode Escapes
Implementations first recognize Unicode escapes in their input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) with the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters:


    UnicodeInputCharacter:
            UnicodeEscape
            RawInputCharacter

    UnicodeEscape:
            \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

    UnicodeMarker:
            u
            UnicodeMarker u

    RawInputCharacter:
            any Unicode character

    HexDigit: one of
            0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

The \, u, and hexadecimal digits here are all ASCII characters.
---
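
The grammar quoted above can be read as a small checker. What follows is only a minimal sketch in Java, not the scanner code of ECJ or javac. The names UnicodeEscapeCheck, isAsciiHexDigit and isUnicodeEscapeAt are hypothetical, and the sketch deliberately ignores the JLS rule about how many backslashes precede the escape.

```java
public class UnicodeEscapeCheck {
    // True only for the 22 ASCII characters listed under HexDigit in JLS 3.3.
    static boolean isAsciiHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    // True if a well-formed UnicodeEscape starts at index pos of src:
    // one backslash, one or more 'u' markers, then exactly four ASCII hex digits.
    static boolean isUnicodeEscapeAt(CharSequence src, int pos) {
        int i = pos;
        if (i >= src.length() || src.charAt(i) != '\\') {
            return false;
        }
        i++;
        int markers = 0;
        while (i < src.length() && src.charAt(i) == 'u') {
            i++;
            markers++;
        }
        if (markers == 0 || i + 4 > src.length()) {
            return false;
        }
        for (int k = 0; k < 4; k++) {
            if (!isAsciiHexDigit(src.charAt(i + k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char romanTwelve = (char) 0x216B; // ROMAN NUMERAL TWELVE from the test case
        System.out.println(isUnicodeEscapeAt("\\u000C", 0));            // true
        System.out.println(isUnicodeEscapeAt("\\uu000C", 0));           // true
        System.out.println(isUnicodeEscapeAt("\\u000" + romanTwelve, 0)); // false
    }
}
```

Under this reading, the escape in Test.java fails at the fourth digit position, which matches the javac diagnostic.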
Comment 2 Olivier Thomann 2011-09-05 14:40:40 EDT
Could you please attach the test case in binary format?
When I tried to compile it, I got:
c:\tests_sources>java -jar ecj-head.jar Test.java
----------
1. ERROR in Test.java (at line 32)
        public static final String ERROR = "\u000Ôà½";
                                            ^^^^^^
Invalid unicode
----------
1 problem (1 error)

Are you using a specific encoding?
Comment 3 Andreas Kohn 2011-09-06 02:09:00 EDT
Created attachment 202780
Input file (binary)

The file was UTF-8 encoded, which is the default on my system (LANG=en_US.utf8 is set in the environment, and Eclipse etc. are configured to use it).

The interesting character in the file is U+216B ROMAN NUMERAL TWELVE (which has the numeric value 12 according to Character#getNumericValue()).
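
To make the point about the value 12 concrete: as far as I can tell it is Character.getNumericValue that reports 12 for U+216B, while Character.isDigit only accepts characters of general category Nd. The sketch below contrasts a lenient check based on the numeric value with a strict ASCII check. It is an illustration only; HexDigitCheck, lenientHexDigit and strictHexDigit are hypothetical names, and the lenient variant is an assumption about how such a character could slip through, not a copy of ECJ's code.

```java
public class HexDigitCheck {
    // Lenient (assumed, illustrative): any character whose Unicode numeric
    // value lies in 0..15 is treated as a hexadecimal digit.
    static boolean lenientHexDigit(char c) {
        int v = Character.getNumericValue(c);
        return v >= 0 && v <= 15;
    }

    // Strict: only the ASCII characters 0-9, a-f and A-F, as JLS 3.3 requires.
    static boolean strictHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        char romanTwelve = (char) 0x216B; // ROMAN NUMERAL TWELVE
        System.out.println(Character.getNumericValue(romanTwelve)); // 12
        System.out.println(Character.isDigit(romanTwelve));         // false
        System.out.println(lenientHexDigit(romanTwelve));           // true
        System.out.println(strictHexDigit(romanTwelve));            // false
    }
}
```

Since 12 falls inside 0..15, a check written in the lenient style would read the escape as if its last digit were C.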
Comment 4 Olivier Thomann 2011-09-06 08:18:52 EDT
Reproduced. I needed to pass -encoding UTF-8 to reproduce the issue.
Fix is trivial.
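
One plausible reading of why -encoding UTF-8 mattered: U+216B is three bytes in UTF-8 (E2 85 AB). If the file is decoded with a single-byte charset, the escape is followed by a different character, which may explain the "Invalid unicode" error in comment 2. The sketch below uses ISO-8859-1 purely as a stand-in, since the default encoding in effect on that machine is not stated in this report. EncodingDemo and decode are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // UTF-8 encoding of U+216B ROMAN NUMERAL TWELVE.
    static final byte[] ROMAN_TWELVE_UTF8 = {(byte) 0xE2, (byte) 0x85, (byte) 0xAB};

    // Decodes the same three bytes with the given charset.
    static String decode(Charset cs) {
        return new String(ROMAN_TWELVE_UTF8, cs);
    }

    public static void main(String[] args) {
        String asUtf8 = decode(StandardCharsets.UTF_8);
        String asLatin1 = decode(StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 3
        // First character the scanner would see after the three zeros:
        System.out.printf("U+%04X%n", (int) asUtf8.charAt(0));   // U+216B
        System.out.printf("U+%04X%n", (int) asLatin1.charAt(0)); // U+00E2
    }
}
```

With the single-byte decoding, the fourth digit position holds U+00E2, which is not a hexadecimal digit under either a strict or a numeric-value check, so the problem only shows up once the file is read as UTF-8.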
Comment 5 Olivier Thomann 2011-09-08 10:19:44 EDT
Created attachment 202994
Proposed fix
Comment 6 Olivier Thomann 2011-09-08 10:19:59 EDT
Released for 3.8M2.
Comment 7 Ayushman Jain 2011-09-12 17:31:26 EDT
Verified for 3.8M2 using org.eclipse.jdt.core_3.8.0.v_C09.jar