Bug 13463 - Folder with Japanese characters disappears
Summary: Folder with Japanese characters disappears
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 2.0
Hardware: PC Windows 2000
Importance: P3 normal
Target Milestone: 2.1 M5
Assignee: John Arthorne CLA
QA Contact:
URL:
Whiteboard:
Keywords: nl
Duplicates: 21190 29584 38794
Depends on:
Blocks: 9330
 
Reported: 2002-04-10 17:24 EDT by Jed Anderson CLA
Modified: 2003-06-11 18:00 EDT

Description Jed Anderson CLA 2002-04-10 17:24:21 EDT
Build: 20020409

I created a simple project named SA, in the Resource perspective (not a Java 
project).  I then made a folder in SA named "三" (<- Kanji character).  I did a 
refresh, and the folder disappeared.  Then I did another refresh and a file 
appeared named "三".

The machine I am working on is a standard English Win 2K box.

To set my machine up to use Japanese characters I went to the control panel 
regional settings.  I then set my locale to be Japanese, and added Japanese as a 
language my machine could handle.  I then switched to the Input Locales tab and 
set my keyboard layout to be Japanese Input System (MS-IME2000).
Comment 1 Jed Anderson CLA 2002-04-10 17:25:43 EDT
In addition to refresh from local not working, when I attempted to create a 
file A.java in the folder it did not work.
Comment 2 Tod Creasey CLA 2002-04-11 11:50:46 EDT
I have tried this out on a Japanese and an English install of Windows 2000. It 
is possible to create a Japanese folder on a Japanese machine, but it is not 
possible to create a Japanese folder on an English machine.

Moving to Core as this is an issue with Resources with foreign characters.
Comment 3 DJ Houghton CLA 2002-04-11 11:55:07 EDT
What VM are you using?
Comment 4 Tod Creasey CLA 2002-04-11 12:02:23 EDT
I used two VMs for this test, with the same result:

The 1.3.1 GM Hursley VM and the 1.4 Sun JVM
Comment 5 Jed Anderson CLA 2002-04-11 14:41:23 EDT
I was using:

C:\jdk1.4.1\bin>java -version
java version "1.4.1-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-beta-b02)
Java HotSpot(TM) Client VM (build 1.4.1-beta-b02, mixed mode)

I will try it on 1.4.0.
Comment 6 Jed Anderson CLA 2002-04-11 14:44:31 EDT
I get the same behaviour on 

C:\jdk1.4\bin>java -version
java version "1.4.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-rc-b91)
Java HotSpot(TM) Client VM (build 1.4.0-rc-b91, mixed mode)
Comment 7 John Arthorne CLA 2002-04-12 10:20:07 EDT
This is the same problem I ran into yesterday.  We are using the default 
platform encoding to convert filenames into byte[] for our native method.  When 
you take an English OS and change the language to Japanese, I suspect that the 
default encoding cannot handle non-Latin characters (because it is still an 
English OS).  Unfortunately, I don't know how to write this encoding function so 
that it will work on such a setup.

One interesting thing to try is to delete the Core DLL in the 
org.eclipse.core.resources plugin.  This will then revert to using java.io 
functions, which should work.
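
As a rough illustration of the encoding problem described above, the following sketch round-trips a file name through a byte encoding, which is what a byte-based native call effectively sees. The class and method names are hypothetical, and ISO-8859-1 is only an assumed stand-in for the default encoding of an English machine; this is not the Core code itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FilenameEncodingDemo {
    /** Encodes a name to bytes and decodes it back using the same charset. */
    static String roundTrip(String name, Charset charset) {
        // Characters the charset cannot represent are replaced with '?'.
        byte[] bytes = name.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String folder = "\u4e09"; // the Kanji folder name from the report

        // Assumption: ISO-8859-1 stands in for an English default encoding.
        String latin = roundTrip(folder, StandardCharsets.ISO_8859_1);
        System.out.println("Survives ISO-8859-1: " + latin.equals(folder));

        // UTF-8 can represent the character, so the name survives.
        String utf8 = roundTrip(folder, StandardCharsets.UTF_8);
        System.out.println("Survives UTF-8: " + utf8.equals(folder));
    }
}
```

If the name does not survive the round trip, the bytes handed to a native function no longer identify the folder that exists on disk, which would be consistent with the folder "disappearing" on refresh.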
Comment 8 Jed Anderson CLA 2002-04-12 10:46:42 EDT
When I remove the dll & perform a refresh, the folder comes back as a folder 
with the correct name.

However...

Now I can't create a project.  Whenever I try I get a dialog saying

File /<project name>/.project is read-only.

Should I enter a separate bug for this?
Comment 9 John Arthorne CLA 2002-04-12 11:40:39 EDT
You could be experiencing bug 12769.  If you still have this problem in build 
20020411 or greater, please log a separate PR.  Core should function correctly 
without the DLL present.
Comment 10 John Arthorne CLA 2002-07-25 14:13:45 EDT
*** Bug 21190 has been marked as a duplicate of this bug. ***
Comment 11 Jed Anderson CLA 2003-01-23 12:31:33 EST
Additional news on this bug.

If you remove the core natives, this behaviour does not occur.  Therefore, I
suspect that the natives are not equipped to handle DBCS.
Comment 12 Jed Anderson CLA 2003-01-28 14:49:03 EST
I have debugged through this case.  When the natives fail they return zero for 
the file stat.  CoreFileSystemLibrary.getStat(String) is not checking for the 
zero return value.  By adding the check, you can drop through to the pure Java 
file checks and thereby rescue the native failure.  

I suggest we change CoreFileSystemLibrary.getStat(String) to the following 
method declaration.  I have tested this fix on this test case and it solves the 
problem, but since this bug is deep in the bowels of the core, I wouldn't feel 
comfortable seeing it released without further testing.  Do you have more NL 
testing facilities?

Finally, I have raised the severity of this bug to critical.  I believe this is 
prudent because this failure makes it impossible to have folders with Japanese 
characters in them.

public static long getStat(String fileName) {
	if (hasNatives) {
		long stat = internalGetStat(Convert.toPlatformBytes(fileName));
		// a zero stat may mean the native call failed, so only trust non-zero
		// results and otherwise fall through to the pure Java checks below
		if (stat != 0) {
			return stat;
		}
	}

	// inlined (no native) implementation
	File target = new File(fileName);
	long result = target.lastModified();
	if (result == 0) // non-existing
		return result;
	result |= STAT_VALID;
	if (target.isDirectory())
		result |= STAT_FOLDER;
	if (!target.canWrite())
		result |= STAT_READ_ONLY;
	return result;
}
Comment 13 Tod Creasey CLA 2003-01-28 14:53:02 EST
If you have XP or 2000 you have sufficient facilities to test this. Just go to 
Regional and Language Options and change Regional Options to Japanese.
Comment 14 John Arthorne CLA 2003-01-28 15:21:59 EST
This bug doesn't prevent you from using Japanese characters on a Japanese
machine.  It prevents using Japanese characters on an English machine, or more
generally, from using characters that don't belong to the default character
encoding for that machine, as detected by Java.  On a Japanese install of
Windows, this bug doesn't occur.  That's why the severity was not marked as
critical.

Your workaround is a good idea, but it's not a fix.  It's just handling failure
of the native more gracefully.  I have no problem with applying that change, but
the real fix would be to make our native handle this case.  I believe Rafael is
investigating that in relation to bug 29584.
Comment 15 Jed Anderson CLA 2003-01-28 16:49:23 EST
As far as I know I am running on a Japanese box.  It is Win2k installed from a
Japanese installation disk of Win2k (not from an English version).  All (most)
of the menus are in Japanese.  Is there some way to determine if it is a "true"
Japanese box?

I completely agree that the real fix for this is to fix the natives.
Comment 16 John Arthorne CLA 2003-01-28 17:10:33 EST
Another thing to check is to ensure you're creating a folder name using the
Japanese character set.  For example, creating a folder with a German character
(such as U umlaut), on a Japanese machine, will cause the same refresh problem. 
Also, what build of Eclipse and what VM are you using?
Comment 17 Jed Anderson CLA 2003-01-28 17:25:25 EST
I have reproduced the state in which the natives do _not_ fail.  Hopefully this 
insight will lead to a better understanding of how this all works.

It was not enough for me to simply switch to the Japanese region.  I also had 
to set the default language to be Japanese.  Since the copy of Windows I am 
running on was installed from a Japanese disk, it had set the Japanese region. 
Ironically, the default language was still "Western Europe and United States".  
Now that I have changed that, the natives do not fail.

Any insight into why changing the default language would cause our natives to 
fail/work?
Comment 18 Jed Anderson CLA 2003-01-28 17:32:33 EST
More info:

build: 20030128
jdk: sun jdk1.4.1 beta 11

Also, the characters I am using to name files have been (recently) the 
characters that Windows puts as the filenames for new files/folders.  I am 
assuming that these characters are Japanese.  I have also (in the past) used a 
character picker that comes with windows to pick characters from the Japanese 
character set.
Comment 19 John Arthorne CLA 2003-01-28 17:34:46 EST
Java detects the default (char to byte) encoding for us.  Presumably this
reaches into the OS and asks it for the encoding.  This would appear to return a
different result when you change language options.  The whole problem is that
we're using the wrong encoding.  The question is, what encoding do we need to
use, and how can we figure that out?
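
One way to see which encoding Java detected, and whether a given name fits in it, is sketched below. The class and method names are hypothetical. `Charset.defaultCharset()` is available in current Java versions (it was not part of the 1.3/1.4 JREs discussed here), and the result depends on the JVM and OS settings in use.

```java
import java.nio.charset.Charset;

class DefaultEncodingCheck {
    /** Returns true if every character of the name can be encoded in the charset. */
    static boolean isEncodable(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // The encoding Java detected for this machine.
        Charset detected = Charset.defaultCharset();
        System.out.println("Default encoding: " + detected.name());

        // The Kanji folder name from the report and a German U umlaut.
        System.out.println("Kanji encodable: " + isEncodable("\u4e09", detected));
        System.out.println("Umlaut encodable: " + isEncodable("\u00fc", detected));
    }
}
```

A name for which `isEncodable` returns false is one that a byte conversion using that encoding cannot represent faithfully.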
Comment 20 Tod Creasey CLA 2003-01-29 07:53:14 EST
It sounds like you are: if your Start menu is in Japanese, then you are running 
a Japanese install.

If you change your regional setting you change your Locale.
Comment 21 DJ Houghton CLA 2003-02-04 17:49:56 EST
A new DLL was created, and I verified that this now works.
Released to HEAD.
Comment 22 DJ Houghton CLA 2003-02-04 17:50:59 EST
*** Bug 29584 has been marked as a duplicate of this bug. ***
Comment 23 Rafael Chaves CLA 2003-02-05 11:02:34 EST
It is worth mentioning that the fix generally solves the problem of supporting 
files/folders with names containing characters not supported by the default 
platform encoding, but it requires:
- Windows NT 4, 2K, XP
- 1.4-level JRE
Comment 24 Jed Anderson CLA 2003-02-05 12:58:50 EST
Follow up information:

When I changed the default language, the default encoding returned by the
following statement did not change.

new java.io.InputStreamReader(new java.io.ByteArrayInputStream(new
byte[0])).getEncoding();

This means that windows is treating the bytes differently when the default
language is set to a DBCS language.

Rafael, why does this fix pre-req JDK 1.4?
Comment 25 Rafael Chaves CLA 2003-02-05 13:45:08 EST
Testing with both Sun/IBM JRE 1.3, I noticed that java.io.File#listFiles 
produces wrong results when listing the contents of a directory that contains 
files/directories whose names contain characters not supported in the default 
encoding (wrong names, java.io.File#exists() returns false).
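
A small diagnostic along these lines can show whether a given JRE exhibits the symptom. This is only a sketch with hypothetical names, not the test that was actually run: it lists a directory and collects the entries that java.io.File cannot find again by the name it just returned.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class ListingCheck {
    /** Returns the listed children of dir for which exists() returns false. */
    static List<File> unresolvableChildren(File dir) {
        List<File> broken = new ArrayList<>();
        File[] children = dir.listFiles(); // null if dir cannot be listed
        if (children == null) {
            return broken;
        }
        for (File child : children) {
            if (!child.exists()) {
                broken.add(child);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Directory to inspect; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : unresolvableChildren(dir)) {
            System.out.println("Listed but not found: " + f.getPath());
        }
    }
}
```

Note that other conditions, such as a broken symbolic link or a file deleted between the listing and the check, can also make `exists()` return false, so a reported entry is a hint rather than proof of an encoding problem.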

Comment 26 Rafael Chaves CLA 2003-02-06 14:30:06 EST
I could verify that this fix now works also on the new Sun JDK/JRE 1.3.1_07.

From JavaSoft's bug parade:

java.io: Cannot create files with full Unicode names (Win32/NT) 
http://developer.java.sun.com/developer/bugParade/bugs/4185525.html
Comment 27 Masayuki Fuse CLA 2003-02-10 01:13:06 EST
At DJ's request, I attempted verification with M5 on my Japanese Windows 2000 
by changing the system locale to English and inputting Japanese characters.
This problem was reproducible on IBM Java 1.3.1-SR3, but not on Sun Java 
1.3.1-07 or IBM Java 1.4.0.
 
Comment 28 Rafael Chaves CLA 2003-06-11 18:00:25 EDT
*** Bug 38794 has been marked as a duplicate of this bug. ***