338623 – JavaEditor marks '・ (U+30FB) ' as Invalid Character though Character.isJavaIdentifierPart('・') == true

Status:	VERIFIED NOT_ECLIPSE

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core
Version:	3.7
Hardware:	All All

Importance:	P3 normal
Target Milestone:	3.7 M7
Assignee:	Olivier Thomann
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2011-03-02 01:45 EST by Missing name
Modified:	2011-04-25 04:27 EDT
CC List:	3 users

See Also:

Attachments
Screenshot editor shows Invalid Character error (13.90 KB, image/png) 2011-03-02 02:17 EST, Missing name	no flags	Details
Screenshot of related error ? (31.65 KB, image/png) 2011-03-02 02:18 EST, Missing name	no flags	Details
Reproducing class (6.10 KB, application/octet-stream) 2011-03-02 03:43 EST, Missing name	no flags	Details
Compiled results (12.36 KB, application/x-zip-compressed) 2011-03-02 09:52 EST, Missing name	no flags	Details
\u notation class file (8.30 KB, text/plain) 2011-03-02 10:20 EST, Missing name	no flags	Details

Description Missing name

2011-03-02 01:45:41 EST

Build Identifier: Build id: 20101111-1638

When I use the character '・' (U+30FB) as part of a Java identifier,
the Eclipse Java editor marks it with an "Invalid Character" error.

I checked the value of Character.isJavaIdentifierPart('・') on Oracle JDK 6u24
and confirmed that it returns true.

I believe the Eclipse incremental compiler has a bug.

Reproducible: Always

Steps to Reproduce:
1. Create New Java Project.
2. Set Project encoding as UTF-8.
3. Create New Java class.
4. Add a field named "Test・Test" 
 like 
//---
private String Test・Test;
//---

Comment 1 Missing name

2011-03-02 02:17:32 EST

Created attachment 190107 [details]
Screenshot editor shows Invalid Character error

Comment 2 Missing name

2011-03-02 02:18:35 EST

Created attachment 190108 [details]
Screenshot of related error ?

Comment 3 Missing name

2011-03-02 02:30:08 EST

Maybe the steps I wrote do not reproduce the problem.

I attached screenshots.
(It seems the editor recognizes '・' (U+30FB) as '.' (U+002E)...)

Curiously, the error indication disappears
when I do the following:

1. Open the class file by Java Editor.
2. Select all (Ctrl+A)
3. Cut all (Ctrl+X)
4. Paste it (Ctrl+V)
5. Save it (Ctrl+S)

Comment 4 Ayushman Jain

2011-03-02 03:29:08 EST

(In reply to comment #0)
> Steps to Reproduce:
> 1. Create New Java Project.
> 2. Set Project encoding as UTF-8.
> 3. Create New Java class.
> 4. Add a field named "Test・Test" 
>  like 
> //---
> private String Test・Test;
> //---

I couldn't reproduce using these steps.

(In reply to comment #3)
> Maybe reproducing process I wrote can't produce the problem.

Yup. Seems so. Can you please attach a test case where you can consistently reproduce this problem? The screenshots aren't of much help. Thanks!

Comment 5 Missing name

2011-03-02 03:43:30 EST

Created attachment 190110 [details]
Reproducing class

Please copy this file into the default package of a project whose encoding is UTF-8.

Comment 6 Olivier Thomann

2011-03-02 08:53:22 EST

Your class doesn't compile using javac 1.6 or 1.7 with -encoding UTF-8.
What do you get when you do:
javac TPA.java -encoding UTF-8
?

Comment 7 Missing name

2011-03-02 09:52:37 EST

Created attachment 190149 [details]
Compiled results

Sorry, maybe I misunderstood the situation.

I tried compiling the class with both JDK6 and JDK7.
The results are included in the attached zip.
The JDK6 compile generates a class file,
but the JDK7 compile aborts with errors.

I run Eclipse on a JDK7 JVM,
so this problem may be caused by JDK7, not by Eclipse.

I'll look for information on the JDK7 side.

Comment 8 Olivier Thomann

2011-03-02 10:08:55 EST

I could successfully import TPA_UTF8.java inside a java project with its encoding set to UTF-8.
It doesn't work if I use the batch compiler only with -encoding UTF-8.
I am investigating.

Would it be possible for you to provide a version of TPA_UTF8 that uses Unicode escape notation (\u....) for the Japanese characters?

Comment 9 Missing name

2011-03-02 10:20:58 EST

Created attachment 190156 [details]
\u notation class file

I converted the class's source file using the native2ascii command.
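
For readers without native2ascii at hand, the conversion can be approximated with a few lines of Java. This is only a sketch of the idea, not the tool's actual implementation: the class name UnicodeEscaper and the method toUnicodeEscapes are hypothetical, and the lowercase four-digit hex format is an assumption about the desired output.

```java
class UnicodeEscaper {
    /**
     * Replaces every non-ASCII UTF-16 code unit with a backslash-u escape
     * (four lowercase hex digits). ASCII characters are copied unchanged.
     */
    static String toUnicodeEscapes(String source) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // The doubled backslash yields one literal backslash in the output.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The field declaration from the description, with U+30FB in the name.
        System.out.println(toUnicodeEscapes("private String Test\u30fbTest;"));
    }
}
```

Working on UTF-16 code units means a supplementary character is written as two escapes (its surrogate pair), which is how Java source files represent it.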

Comment 10 Missing name

2011-03-07 06:56:27 EST

(In reply to comment #7)
> Created attachment 190149 [details]
> Compiled results
> 
> Sorry maybe I had wrong recognition...
> 
> I tried compile the class by JDK6 and JDK7 both.
> Results included in the attached zip.
> JDK6 compile process can generate class file,
> but JDK7 compile process abort with errors.
> 
>  I run my eclipse on JDK7's JVM,
> this problem maybe caused by JDK7 not by eclipse.
> 
> I'll check JDK7 information.


Following up on comment #7,
I reported this matter to Bug Parade:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7024053

This problem seems to be caused by a JDK7 bug.

Thank you.

Comment 11 Olivier Thomann

2011-03-07 09:28:21 EST

\u30FB is under the Other Punctuation category. Checking the doc of the method:
java.lang.Character.isJavaIdentifierPart(char) it doesn't look like this character should be part of a java identifier.
What makes you believe this is the case ?

I checked the latest Unicode version and it is considered to be in the "Po" category which matches the one from Java as well. Character.getType(..) returns 24.

According to this, I consider this more like a bug in JDK6.
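
A minimal sketch for checking this on whatever JVM is at hand. The class name IdentifierPartCheck and the method report are hypothetical; only the java.lang.Character calls are real API. The result for U+30FB depends on the Unicode version the running JDK supports, so no expected value is given for it here.

```java
class IdentifierPartCheck {
    /** Builds a one-line report for a code point as seen by the running JVM. */
    static String report(int codePoint) {
        return String.format("U+%04X type=%d identifierPart=%b",
                codePoint,
                Character.getType(codePoint),
                Character.isJavaIdentifierPart(codePoint));
    }

    public static void main(String[] args) {
        // The JVM version matters because each JDK ships its own Unicode data.
        System.out.println("java.version=" + System.getProperty("java.version"));
        System.out.println(report(0x30FB)); // KATAKANA MIDDLE DOT, the character in question
        System.out.println(report('_'));    // LOW LINE, connector punctuation (type 23)
        System.out.println(report('.'));    // FULL STOP, other punctuation (type 24)
    }
}
```

The underscore and full stop lines serve as reference points: the first is accepted in identifiers, the second is not.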

Comment 12 Missing name

2011-03-07 10:18:58 EST

(In reply to comment #11)
> \u30FB is under the Other Punctuation category. Checking the doc of the method:
> java.lang.Character.isJavaIdentifierPart(char) it doesn't look like this
> character should be part of a java identifier.
> What makes you believe this is the case ?
> 
> I checked the latest Unicode version and it is considered to be in the "Po"
> category which matches the one from Java as well. Character.getType(..) returns
> 24.
> 
> According to this, I consider this more like a bug in JDK6.

Yes, as you say, \u30fb is a Po/Pc character.
In JDK6, Character.getType('\u30fb') is 23 (Pc).
In JDK7, it returns 24 (Po).

I am less concerned with the type value itself;
what matters is what the character means in natural language.

I think the behavior of the Java language should not change,
because the Java specification has not changed.

I'll keep watching the progress of the Bug Parade entry.

Comment 13 Olivier Thomann

2011-03-07 10:34:15 EST

(In reply to comment #12)
> I think the behavior of Java language should not change.
> Because Java specification has not been changed.
This is wrong. JDK7 is supporting a newer version of the Unicode specification (6.0). So JDK7 might change according to the version of the Unicode it supports.

In the Unicode Character Database, this character is under the Po category. I don't find it under the "Pc" category. If it were under "Pc", it should be accepted, since that is the connector punctuation category.

Where did you find it under the "Pc" category ?
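
The category rule can be sketched as follows, based on the categories listed in the Javadoc of Character.isJavaIdentifierPart. This is a simplification, not the JDK's implementation: it ignores the identifier-ignorable characters that the Javadoc also accepts, and the class and method names are hypothetical.

```java
class IdentifierCategoryRule {
    /**
     * Returns true if a character of the given general category (as returned
     * by Character.getType) may appear inside a Java identifier.
     * Simplified: identifier-ignorable characters are not considered.
     */
    static boolean categoryAllowsIdentifierPart(int type) {
        switch (type) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.LETTER_NUMBER:
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.CURRENCY_SYMBOL:
            case Character.CONNECTOR_PUNCTUATION:  // Pc, value 23
            case Character.COMBINING_SPACING_MARK:
            case Character.NON_SPACING_MARK:
                return true;
            default:
                return false;                      // includes Po, value 24
        }
    }

    public static void main(String[] args) {
        System.out.println("Pc allowed: "
                + categoryAllowsIdentifierPart(Character.CONNECTOR_PUNCTUATION));
        System.out.println("Po allowed: "
                + categoryAllowsIdentifierPart(Character.OTHER_PUNCTUATION));
    }
}
```

Under this rule, whether a character is accepted follows entirely from the category its JDK assigns, which is why a Pc versus Po difference changes the outcome.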

Comment 14 Olivier Thomann

2011-03-15 10:12:50 EDT

Without clarification of how you found it under the Pc category, I'll close this as INVALID.

Comment 15 Missing name

2011-03-15 23:41:40 EDT

Please check the following value in JDK6 and JDK7.

Character.getType('\u30fb')

In JDK6, it returns 23, which is Character.CONNECTOR_PUNCTUATION (Pc).
In JDK7, it returns 24, which is Character.OTHER_PUNCTUATION (Po).

I reported this problem to Bug Parade,
and it was marked as a duplicate of another bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6990687

'\u30fb' was defined in Unicode 1.1 and has not changed in later versions,
so I expected the behavior not to change.

Anyway, according to the Bug Parade database this seems to be a JDK bug,
so I think you may close this Bug 338623 entry.

Thanks.

Comment 16 Olivier Thomann

2011-03-16 08:53:08 EDT

I think JDK7 is right and JDK6 is wrong.
In the unicode database (6.0), this character is under the Po category.

Closing as NOT_ECLIPSE.

Comment 17 Satyam Kandula

2011-04-25 03:04:58 EDT

Verified for 3.7M7