[news.eclipse.platform] Re: Character encoding problem (on Windows)

Thanks for your reply Alex. Actually, I was referring to text that was hard-coded in the .java file. For example:

	myLabel.setText("TEXT IN UTF8 ENCODING");

I have found the solution and am posting it here for other people's reference.

The Sun JDK/JRE on my machine correctly detects UTF8 as the default encoding. This is why, at runtime, the following expression returns UTF8:

	System.getProperty("file.encoding")

The encoding at runtime is detected by Sun's JDK, so it is correct. If I compile using Sun's javac, the text is OK at runtime. The javac command has an -encoding option to specify the encoding of the source files, though I don't need it because the detection works fine.
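For anyone who wants to check what their own runtime detects, here is a minimal sketch. The class name ShowDefaultEncoding and the helper method are made up for illustration; the two lookups are standard JDK calls. The values printed depend entirely on the machine and JVM, so no output is shown.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {
    // Returns the "file.encoding" system property, or "(not set)" if it is absent.
    static String fileEncodingProperty() {
        return System.getProperty("file.encoding", "(not set)");
    }

    public static void main(String[] args) {
        // What the runtime reports as its default encoding.
        System.out.println("file.encoding  = " + fileEncodingProperty());
        // The charset the JVM actually uses when none is specified.
        System.out.println("defaultCharset = " + Charset.defaultCharset().name());
    }
}
```

Note that this only tells you what the runtime uses. It says nothing about which encoding the compiler assumed when it read the .java file.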

Eclipse, however, uses its own compiler (not Sun's javac), which somehow does not default to UTF8 (at least on my system). The Eclipse compiler therefore reads the .java source using a different encoding, and the string literals end up garbled in the .class files it produces. At runtime, one will still see the "file.encoding" property as UTF8 since this is what Sun's runtime detects, but the strings in the .class file were already decoded with the wrong encoding at compile time, so it is too late.
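To illustrate the kind of damage involved, here is a small sketch that decodes UTF-8 bytes with the wrong charset. It assumes ISO-8859-1 as the wrong charset purely for the demonstration (I do not know which encoding Eclipse picked on the system described above), and it assumes Java 7 or later for StandardCharsets. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the text as UTF-8 bytes, then decodes those bytes as ISO-8859-1.
    // This imitates a compiler reading a UTF-8 source file with the wrong charset.
    // ISO-8859-1 is an assumption chosen for the demo, not the charset from the post.
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "cafe" with an accented last letter, written as an escape so this file is pure ASCII.
        String original = "caf\u00e9";
        String garbled = misread(original);
        System.out.println(original.length()); // 4
        System.out.println(garbled.length());  // 5, the accented letter's two UTF-8 bytes became two chars
        System.out.println(original.equals(garbled)); // false
    }
}
```

Once a literal has been misread like this and stored in the .class file, nothing the runtime does with "file.encoding" can bring the original characters back.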

To fix the problem, one must tell the Eclipse compiler to use UTF8 for both source (.java) and output (.class) files. This is done as follows:

1) Go to Window->Preferences->General->Content Types
2) Select "Text" in the tree list at the top.
3) Specify "UTF8" in the text box at the bottom labeled "Default encoding".
4) Click on the "Update" button.
5) Select "Java Class File" in the tree list at the top.
6) Repeat (3) and (4)
7) Click OK to save preferences.
8) Clean the project so that it is rebuilt (Project->Clean...)

You may want to adjust step (2): expand "Text" and select "Java Source File" instead, so that you can still use other encodings for non-Java text files (such as HTML) according to your needs.

Alex Blewitt wrote:

The .properties files aren't treated as UTF-8 when they're loaded. Instead, they're "escaped ASCII", with \u1234 representing the Unicode character U+1234. You can use the 'native2ascii' converter to generate these files, though I'm not sure what you need to do to tell it that the input is UTF-8.

That's basically because the properties files are loaded with the java.util.Properties object, which uses this crappy format instead of something sensible; and hence, internationalised messages must also use the Crappy Format(TM).
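As a rough illustration of the escaped format, here is a sketch that escapes non-ASCII characters by hand and then reads the result back with java.util.Properties. The class and method names are hypothetical, and escapeNonAscii is only an approximation of what native2ascii does, not its actual implementation. On the open question above: the JDK's native2ascii tool accepts an -encoding option naming the input encoding.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class EscapedPropertiesDemo {
    // Replaces every char above 0x7F with a backslash, the letter u, and four hex digits.
    // Works one UTF-16 code unit at a time.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Parses properties text and returns the value stored under the given key.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // accented last letter, written as an escape
        String line = "greeting=" + escapeNonAscii(original);
        System.out.println(line); // pure ASCII text with the accented letter escaped
        // Properties decodes the escape back into the original character.
        System.out.println(original.equals(loadValue(line, "greeting"))); // true
    }
}
```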

Alex.