Re: [equinox-dev] Equinox and UTF-8

Hi Holger,
To get more consistent results, use String.getBytes("UTF8"). The no-argument getBytes() method uses the platform's default encoding. I've read that Windows has different default encodings for GUI and console applications. If that is true, it might explain why you see different outputs.
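A minimal sketch of that suggestion, assuming Java 6 or later (for String.getBytes(Charset)); the class and method names are hypothetical. "UTF8" is an accepted alias of the UTF-8 charset, so the explicitly encoded bytes are the same however the JVM was launched, while the no-argument getBytes() follows whatever the default happens to be.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class ExplicitCharsetDemo {

    // Encodes with an explicitly named charset; the result does not depend
    // on the JVM's default encoding.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(Charset.forName("UTF8"));
    }

    public static void main(String[] args) {
        // Unicode escape for the paragraph sign, so the literal itself does not
        // depend on the encoding the source file is compiled with.
        String data = "\u00A7";

        // No-argument getBytes() uses the default charset, which can differ per launch.
        System.out.println("default charset : " + Charset.defaultCharset());
        System.out.println("getBytes()      : " + Arrays.toString(data.getBytes()));
        // Always prints [-62, -89]
        System.out.println("UTF8 explicitly : " + Arrays.toString(encodeUtf8(data)));
    }
}
```

On Java 7 and later, java.nio.charset.StandardCharsets.UTF_8 can replace the Charset.forName lookup.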

Hmm, now that I think about it, I need to check something in my own code... :-)


Holger Mense <mail@xxxxxxxxxxxxxxx>
Sent by: equinox-dev-bounces@xxxxxxxxxxx

07/09/2008 02:35 PM

Please respond to
Equinox development mailing list <equinox-dev@xxxxxxxxxxx>

[equinox-dev] Equinox and UTF-8


I am struggling with the UTF-8 encoding of strings while using Equinox 3.4
and am looking for some help.

I have the following example code, which converts a string into bytes
in two different ways: first by casting each char to byte, then with
String.getBytes(). The code is used inside a bundle.

=== cut ===
       String data = "§";                  // paragraph sign (U+00A7)
       byte[] dataBytes = data.getBytes();  // uses the JVM's default charset
       System.out.println(data+" length() = "+data.length());

       // 1) cast each char to byte (keeps only the low 8 bits)
       System.out.print(data+" cast to byte = ");
       for (int i=0; i<data.length(); i++)
           System.out.print((byte)data.charAt(i)+" ");

       // 2) bytes produced by getBytes()
       System.out.print("\r\n"+data+" getBytes() = ");
       for (int i=0; i<dataBytes.length; i++)
           System.out.print(dataBytes[i]+" ");
       System.out.println();

=== cut ===
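The two approaches in this snippet are not equivalent, which a small sketch can show (class and method names are hypothetical). A Java char is a 16-bit UTF-16 code unit, so `(byte)data.charAt(i)` keeps only its low 8 bits. For characters up to U+00FF that happens to match ISO-8859-1, not UTF-8; for anything higher it silently loses information.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class CastVersusEncoding {

    // A Java char is a 16-bit UTF-16 code unit; casting it to byte keeps only
    // the low 8 bits. That is truncation, not a character encoding.
    static byte[] castEachChar(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        // Paragraph sign U+00A7 and euro sign U+20AC
        String[] samples = {"\u00A7", "\u20AC"};
        for (String s : samples) {
            System.out.println("cast : " + Arrays.toString(castEachChar(s)));
            System.out.println("UTF-8: " + Arrays.toString(s.getBytes(utf8)));
        }
        // Prints [-89], [-62, -89], then [-84], [-30, -126, -84]
    }
}
```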

Running this inside Eclipse as part of an OSGi framework prints:

=== cut ===
§ length() = 1
§ cast to byte = -89
§ getBytes() = -62 -89
=== cut ===

The result is the same when the code is started as a Java application
inside Eclipse.

When running the same code inside the Equinox framework from a command
shell with the following command line:

# java -Dfile.encoding=UTF-8 -Dosgi.bundles=reference\:file\:com.example.utf8_1.0.0.jar@start -jar org.eclipse.osgi_3.4.0.v20080605-1900.jar -console -clean

it prints

=== cut ===
+é-º length() = 2
+é-º cast to byte = -62 -89
+é-º getBytes() = -61 -126 -62 -89
=== cut ===
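One explanation consistent with these numbers (an assumption, not confirmed in this thread): the string literal was already two characters long before getBytes() ran, because the source file was saved as UTF-8 but the bundle was compiled with a different source encoding (for example ISO-8859-1 or Cp1252 instead of `javac -encoding UTF-8`). The sketch below simulates that mis-decoding; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class MisdecodedLiteral {

    static final Charset UTF8 = Charset.forName("UTF-8");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Assumption: the source file holds UTF-8 bytes, but the compiler decodes
    // them as ISO-8859-1 (Cp1252 gives the same result for these two bytes).
    static String misdecode(String original) {
        byte[] sourceBytes = original.getBytes(UTF8);  // paragraph sign -> C2 A7
        return new String(sourceBytes, LATIN1);        // -> U+00C2, U+00A7
    }

    public static void main(String[] args) {
        String literal = misdecode("\u00A7");
        System.out.println("length() = " + literal.length());

        StringBuilder cast = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            cast.append((byte) literal.charAt(i)).append(' ');
        }
        System.out.println("cast to byte = " + cast.toString().trim());
        System.out.println("UTF-8 bytes = " + Arrays.toString(literal.getBytes(UTF8)));
        // length() = 2, cast to byte = -62 -89, UTF-8 bytes = [-61, -126, -62, -89]
    }
}
```

The simulated values match the Equinox command-shell output exactly, which points at how the literal was compiled into the exported jar rather than at getBytes() itself.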

Running the code from a command shell as a normal Java application
prints:

=== cut ===
-º length() = 1
-º cast to byte = -89
-º getBytes() = -62 -89
=== cut ===

In all cases I am trying to encode the paragraph sign (§). The console
output is garbled because the command shell lacks UTF-8 support.

What is the cause of the different results? What is the proper way to
always get the same results, regardless of whether the code runs inside
or outside Eclipse? What must be done to get full UTF-8 support by
default when using getBytes() and similar methods?

Software used: Windows XP, Eclipse 3.4, Equinox 3.4

Thanks for your help,

Holger Mense

Holger Mense                                http://www.holger-mense.de

equinox-dev mailing list
