Build Identifier: 20101111-1638

When I use the '・' (U+30FB) character as part of a Java identifier, the Eclipse Java editor marks it with an "Invalid Character" error. I checked the value of Character.isJavaIdentifierPart('・') on Oracle JDK6u24 and confirmed that it returns true. I believe the Eclipse incremental compiler has a bug.

Reproducible: Always

Steps to Reproduce:
1. Create New Java Project.
2. Set Project encoding as UTF-8.
3. Create New Java class.
4. Add a field named "Test・Test"
like
//---
private String Test・Test;
//---
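For reference, the check described in the report can be run as a small standalone program. This is only a sketch: the class name IdentifierPartCheck and the helper isValidIdentifier are hypothetical and not part of the attached test case, and the helper looks at individual chars only (it ignores keywords and supplementary code points). The result for U+30FB depends on the JDK the program runs on, so no output is assumed here.

```java
public class IdentifierPartCheck {

    /**
     * Hypothetical helper: returns true when every char of the name is legal
     * at its position in a Java identifier. Keywords are not checked.
     */
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char middleDot = '\u30FB'; // KATAKANA MIDDLE DOT, the character from the report

        // Print the JDK version, because the answer is version dependent.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("isJavaIdentifierPart(U+30FB) = "
                + Character.isJavaIdentifierPart(middleDot));
        System.out.println("field name from the report is valid = "
                + isValidIdentifier("Test" + middleDot + "Test"));
    }
}
```

Running the same class on two different JDKs shows directly whether the two runtimes disagree about the character.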
Created attachment 190107 [details]
Screenshot editor shows Invalid Character error
Created attachment 190108 [details]
Screenshot of related error ?
Maybe the reproducing process I wrote can't produce the problem. I attached screenshots. (It seems the editor recognizes '・' (U+30FB) as '.' (U+002E)...)

Very curiously, the error indication disappears when I do the following:
1. Open the class file in the Java editor.
2. Select all (Ctrl+A).
3. Cut all (Ctrl+X).
4. Paste it (Ctrl+V).
5. Save it (Ctrl+S).
(In reply to comment #0)
> Steps to Reproduce:
> 1. Create New Java Project.
> 2. Set Project encoding as UTF-8.
> 3. Create New Java class.
> 4. Add a field named "Test・Test"
> like
> //---
> private String Test・Test;
> //---

I couldn't reproduce using these steps.

(In reply to comment #3)
> Maybe the reproducing process I wrote can't produce the problem.

Yup. Seems so. Can you please attach a test case where you can consistently reproduce this problem? The screenshots aren't of much help. Thanks!
Created attachment 190110 [details]
Reproducing class

Please copy this file into the default package of a UTF-8 project.
Your class doesn't compile using javac 1.6 or 1.7 with -encoding UTF-8. What do you get when you run the following?
javac TPA.java -encoding UTF-8
Created attachment 190149 [details]
Compiled results

Sorry, maybe I misunderstood...

I tried compiling the class with both JDK6 and JDK7.
The results are included in the attached zip.
The JDK6 compile generates a class file,
but the JDK7 compile aborts with errors.

I run my Eclipse on JDK7's JVM,
so this problem may be caused by JDK7, not by Eclipse.

I'll check the JDK7 information.
I could successfully import TPA_UTF8.java into a Java project with its encoding set to UTF-8. It doesn't work if I use only the batch compiler with -encoding UTF-8. I am investigating. Would it be possible for you to provide a version of TPA_UTF8 that uses Unicode escape notation (\u....) for the Japanese characters?
Created attachment 190156 [details]
\u notation class file

I converted the class file using the native2ascii command.
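A rough sketch of the kind of conversion native2ascii performs, for anyone without the tool at hand. The class UnicodeEscaper and the method escapeNonAscii are hypothetical names; the sketch only rewrites chars outside the ASCII range as backslash-u escapes and does not handle file encodings the way the real tool does.

```java
public class UnicodeEscaper {

    /**
     * Hypothetical helper: replaces every char above the ASCII range with a
     * backslash-u escape made of four hex digits. ASCII chars are copied as-is.
     */
    static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // Surrogate pairs end up as two consecutive escapes,
                // which is how Java source represents them as well.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The field declaration from the original report.
        System.out.println(escapeNonAscii("private String Test\u30FBTest;"));
    }
}
```

An escaped source file takes the file encoding out of the picture, so any remaining error comes from how the compiler classifies the character itself.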
(In reply to comment #7)
> Created attachment 190149 [details]
> Compiled results
>
> Sorry, maybe I misunderstood...
>
> I tried compiling the class with both JDK6 and JDK7.
> The results are included in the attached zip.
> The JDK6 compile generates a class file,
> but the JDK7 compile aborts with errors.
>
> I run my Eclipse on JDK7's JVM,
> so this problem may be caused by JDK7, not by Eclipse.
>
> I'll check the JDK7 information.

As I wrote in comment #7, I reported this matter to Bug Parade:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7024053

This problem seems to be caused by a JDK7 bug.
Thank you.
\u30FB is under the Other Punctuation category. Checking the doc of the method java.lang.Character.isJavaIdentifierPart(char), it doesn't look like this character should be part of a Java identifier.
What makes you believe this is the case?

I checked the latest Unicode version and it is considered to be in the "Po" category, which matches the one from Java as well. Character.getType(..) returns 24.

According to this, I consider this more like a bug in JDK6.
(In reply to comment #11)
> \u30FB is under the Other Punctuation category. Checking the doc of the method
> java.lang.Character.isJavaIdentifierPart(char), it doesn't look like this
> character should be part of a Java identifier.
> What makes you believe this is the case?
>
> I checked the latest Unicode version and it is considered to be in the "Po"
> category, which matches the one from Java as well. Character.getType(..) returns
> 24.
>
> According to this, I consider this more like a bug in JDK6.

Yes, as you say, \u30fb is a Po/Pc character.
In JDK6, Character.getType('\u30fb') is 23 (Pc), and in JDK7 it returns 24 (Po).

I don't care about the type value itself, because what matters is the meaning of the character in natural language.

I think the behavior of the Java language should not change,
because the Java specification has not been changed.

I'll keep watching the progress of the Bug Parade entry.
(In reply to comment #12)
> I think the behavior of the Java language should not change,
> because the Java specification has not been changed.

This is wrong. JDK7 supports a newer version of the Unicode specification (6.0), so JDK7 might change according to the version of Unicode it supports.

In the Unicode character database, this character is under the Po category. I don't find it under the "Pc" category. If it were under "Pc", then it should be accepted, as it would be a connecting punctuation character.

Where did you find it under the "Pc" category?
Without any clarification how you found it under the Pc category, I'll close as INVALID.
Please check the following value in JDK6 and JDK7:
Character.getType('\u30fb')

In JDK6, it returns 23, which is Character.CONNECTOR_PUNCTUATION (Pc).
In JDK7, it returns 24, which is Character.OTHER_PUNCTUATION (Po).

I reported this problem to Bug Parade, and it was marked as a duplicate of another bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6990687

'\u30fb' was defined in Unicode 1.1 and has not changed in the following versions, so I expected the behavior would not change.

Anyway, according to the Bug Parade database this seems to be a JDK bug, so I think you may close this Bug 338623 entry.

Thanks.
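The comparison above can be scripted so that the same class is run on each JDK. This is a minimal sketch; CategoryCheck, categoryName and describe are hypothetical names. The underscore and the ASCII full stop are included as reference points for Pc and Po; the line printed for U+30FB is the one expected to differ between JDK6 and JDK7 according to this thread.

```java
public class CategoryCheck {

    /** Maps the two general categories discussed in this thread to readable names. */
    static String categoryName(int type) {
        switch (type) {
            case Character.CONNECTOR_PUNCTUATION:
                return "Pc (CONNECTOR_PUNCTUATION)";
            case Character.OTHER_PUNCTUATION:
                return "Po (OTHER_PUNCTUATION)";
            default:
                return "other (" + type + ")";
        }
    }

    /** Summarizes how the running JDK classifies a single char. */
    static String describe(char c) {
        int type = Character.getType(c);
        return String.format("U+%04X type=%d %s identifierPart=%b",
                (int) c, type, categoryName(type), Character.isJavaIdentifierPart(c));
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(describe('_'));      // reference point: connector punctuation
        System.out.println(describe('.'));      // reference point: other punctuation
        System.out.println(describe('\u30FB')); // KATAKANA MIDDLE DOT, result depends on the JDK
    }
}
```

The identifierPart column follows the category: a connector punctuation character is accepted inside an identifier, while an other punctuation character is not.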
I think JDK7 is right and JDK6 is wrong. In the Unicode database (6.0), this character is under the Po category. Closing as NOT_ECLIPSE.
Verified for 3.7M7