[hyades-dev] More info on Java UTF-8


From: "Nguyen, Hoang M" <hoang.m.nguyen@xxxxxxxxx>
Date: Wed, 18 Aug 2004 15:47:24 -0700

Hello all,

If we want to adopt the Java UTF-8 form, we may want to consider adopting its data structure as well.

Here is the specification of the UTF-8 data structure in the Java Virtual Machine (JVM):

http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#7963

- a 2-byte length,
- followed by the UTF-8 byte stream;
- the length does not count a terminating null character (the string is not null-terminated).
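
The layout above can be sketched with DataOutputStream.writeUTF, which writes the same length-prefixed form (the class-file structure additionally starts with a one-byte tag, which writeUTF does not emit). This is only a minimal illustration: the class name ModifiedUtf8Layout and its helper methods are made up for this example and are not part of the attached programs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the original attachments.
final class ModifiedUtf8Layout {

    // Encodes s with DataOutputStream.writeUTF and returns the raw bytes.
    static byte[] encode(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(buf);
            dos.writeUTF(s);
            dos.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the unsigned 2-byte, big-endian length prefix.
    static int lengthPrefix(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    // Decodes bytes produced by encode() back into a String.
    static String decode(byte[] encoded) {
        try {
            return new DataInputStream(new ByteArrayInputStream(encoded)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = new String(new char[] {'A', '\u0000', 'B'});
        byte[] out = encode(s);
        // 'A' and 'B' take one byte each, the null character takes two.
        System.out.println("length prefix: " + lengthPrefix(out));
        // The prefix itself adds two bytes; nothing follows the payload.
        System.out.println("total bytes:   " + out.length);
        System.out.println("round trip ok: " + s.equals(decode(out)));
    }
}
```

For the three-character string used here, the prefix reads 4 and the whole buffer is 6 bytes long, so the length covers only the encoded payload and no terminator is appended.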

In addition, I have verified that Java's standard UTF-8 encoder handles the null character as well, translating it to a single null byte.

Please see attached programs.

We can discuss more and get some resolution on this issue in our weekly meeting.

Regards,

This demo program shows that Java can write a standard UTF-8 file in which each null character is translated to a single null byte.

import java.io.*;

public class MyUTF8Output
{
    public static void main(String args[])
    {
        // 'A', NUL, 'B', U+0080, 'C', NUL
        char[] msg = {'A', '\u0000', 'B', '\u0080', 'C', '\u0000'};

        try
        {
            String s = new String(msg);
            FileOutputStream fos = new FileOutputStream("myoutput.txt");
            // The standard UTF-8 encoder writes each NUL as a single 0x00 byte.
            OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
            osw.write(s);
            osw.close();   // flushes, then closes the underlying stream
            System.out.println("See \"myoutput.txt\" file.");
        }
        catch (IOException e)
        {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }
}

This demo program shows that the Java UTF-8 format is:

- a 2-byte length of the UTF-8 buffer

- followed by the buffer itself, in which each null character is mapped to two bytes (0xC0 0x80)

import java.io.*;

public class MyUTF8Conversion
{
    public static void main(String args[])
    {
        // 'A', NUL, 'B', U+0080, 'C', NUL
        char[] msg = {'A', '\u0000', 'B', '\u0080', 'C', '\u0000'};

        try
        {
            String s = new String(msg);
            FileOutputStream fos = new FileOutputStream("myoutput2.txt");
            DataOutputStream dos = new DataOutputStream(fos);
            // writeUTF emits a 2-byte length, then the Java UTF-8 bytes.
            dos.writeUTF(s);
            dos.close();   // flushes, then closes the underlying stream
            System.out.println("See \"myoutput2.txt\" file.");
        }
        catch (IOException e)
        {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }
}
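
To compare the two outputs without opening the files in a hex viewer, the same message can be encoded both ways in memory and printed as hex. This is a sketch rather than one of the attached programs: the class name Utf8Comparison and its helpers are hypothetical, and String.getBytes with StandardCharsets.UTF_8 stands in for the OutputStreamWriter used in the first demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original attachments.
final class Utf8Comparison {

    // Same message as in the two demo programs.
    static final String MSG =
        new String(new char[] {'A', '\u0000', 'B', '\u0080', 'C', '\u0000'});

    // Standard UTF-8, as produced by the first demo's encoder.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java UTF-8 with a 2-byte length prefix, as written by writeUTF.
    static byte[] javaUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(buf);
            dos.writeUTF(s);
            dos.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated, two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("standard UTF-8: " + hex(standardUtf8(MSG)));
        System.out.println("Java UTF-8:     " + hex(javaUtf8(MSG)));
    }
}
```

The standard encoding comes out as 41 00 42 C2 80 43 00 (7 bytes), while writeUTF produces 00 09 41 C0 80 42 C2 80 43 C0 80: a length of 9 followed by 9 bytes in which each null character occupies two bytes.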

Follow-Ups:
- Re: [hyades-dev] More info on Java UTF-8
  - From: Allan K Pratt
