RE: [hyades-dev] UTF-8 in HCE Data Exchange Format Proposal



From: Michael.Norman@xxxxxxxxxxxxx
Date: Tue, 31 Aug 2004 16:10:07 +0100

-----Original Message-----
From: hoang.m.nguyen [mailto:hoang.m.nguyen@xxxxxxxxx]
Sent: 31 August 2004 13:54
To: hyades-dev
Subject: [hyades-dev] UTF-8 in HCE Data Exchange Format Proposal

Hello all,

In our last HCE team meeting we discussed UTF-8 usage and arrived at the 
following proposal:

When a value of the string data type is exchanged between the client and 
the HCE, it will be encoded in the following format:

- one byte indicating the size of the string length field: 2, 4 or 8 bytes

- the length field itself (2, 4 or 8 bytes)

- the UTF-8 byte stream, with no embedded null byte and no terminating 
null byte
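A minimal Python sketch of the three-part layout, for illustration only. The proposal does not say which byte order the length field uses or whether it counts bytes or characters; this sketch assumes big-endian (network) order and a count of UTF-8 bytes, and `encode_string` is a hypothetical name.

```python
import struct

# Hypothetical mapping from the indicator byte (2, 4 or 8) to a struct
# format code. Big-endian ('>') byte order is an ASSUMPTION.
_LENGTH_FORMATS = {2: ">H", 4: ">I", 8: ">Q"}


def encode_string(text: str, width: int) -> bytes:
    """Encode text as: indicator byte, length field, UTF-8 bytes.

    ASSUMPTION: the length field counts UTF-8 bytes, not characters.
    """
    if width not in _LENGTH_FORMATS:
        raise ValueError("length field size must be 2, 4 or 8")
    payload = text.encode("utf-8")
    if b"\x00" in payload:
        raise ValueError("embedded null bytes are not allowed")
    if len(payload) >= 1 << (8 * width):
        raise ValueError("string too long for a %d-byte length field" % width)
    header = bytes([width]) + struct.pack(_LENGTH_FORMATS[width], len(payload))
    # No terminating null byte is appended after the payload.
    return header + payload
```

For example, `encode_string("héllo", 2)` yields `b'\x02\x00\x06h\xc3\xa9llo'`: "é" takes two bytes in UTF-8, so under the byte-count assumption a 5-character string carries a length of 6.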

Here are some important points about this proposal:

*	This applies to the string data type only. Other data types (int, 
float, double, etc.) are not affected.

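*	No embedded null byte in the string.

If the data requires one, multiple strings should be sent instead.

Because a null-terminated C string would otherwise be cut short at the 
first embedded null, this lets C/C++ programs convert the data to their 
own string type without additional checking.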
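UTF-8 never uses the byte 0x00 inside a multi-byte sequence, so an encoded string contains a null byte only if the text contains the character U+0000. A sender that must carry such data could split it into several strings, as suggested above. The helper below is a hypothetical sketch of that step; the proposal does not say how a receiver would rejoin the pieces.

```python
from typing import List


def split_for_exchange(text: str) -> List[str]:
    """Split text at U+0000 so that no piece contains an embedded null byte.

    Hypothetical helper: since UTF-8 encodes only U+0000 as the byte 0x00,
    removing that character guarantees null-free UTF-8 for every piece.
    """
    return text.split("\x00")
```

Note that consecutive null characters produce empty pieces, which would then be exchanged as zero-length strings.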

*	No null byte at the end.

Since the length field at the beginning already gives the size, senders 
should not be forced to append a terminator just for the convenience of 
C programs; doing so would be very unnatural for Java programs.

The expectation is that most Java or C++ programs will probably convert 
the data to String objects before manipulating it anyway.
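On the receiving side the length field alone marks where the string ends, so no terminator is needed; a C reader that wants a null-terminated copy would add one itself after copying the bytes. A minimal Python decoding sketch, assuming big-endian length fields that count UTF-8 bytes (neither is fixed by the proposal); `decode_string` is a hypothetical name.

```python
import struct
from typing import Tuple

# ASSUMPTION: big-endian length fields; the proposal does not fix byte order.
_LENGTH_FORMATS = {2: ">H", 4: ">I", 8: ">Q"}


def decode_string(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one string starting at offset; return (text, next_offset).

    ASSUMPTION: the length field counts UTF-8 bytes, not characters.
    """
    width = buf[offset]
    if width not in _LENGTH_FORMATS:
        raise ValueError("invalid length field size: %d" % width)
    (length,) = struct.unpack_from(_LENGTH_FORMATS[width], buf, offset + 1)
    start = offset + 1 + width
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string data")
    payload = buf[start:end]
    if b"\x00" in payload:
        raise ValueError("embedded null bytes are not allowed")
    # The length says where the string stops; no terminator is read.
    return payload.decode("utf-8"), end
```

Because each call returns the offset just past the string, several strings can be read back to back from one buffer.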

*	The leading length-indicator byte keeps the encoding compact while 
still allowing it to scale:

- most string lengths will fit in a 2-byte length field (I expect about 
99% of them)

- a 4-byte length is also possible (e.g. Allan's environment variable)

- an 8-byte length is conceivable with 64-bit memory addressing (as 
Frank de Jong pointed out)
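The indicator byte lets a sender pick the smallest length field that fits. A hypothetical chooser is sketched below, assuming the length is an unsigned count of UTF-8 bytes; the limits then follow from unsigned 2-, 4- and 8-byte integers (65,535, 4,294,967,295 and 2**64 - 1 bytes). Under this rule a short string costs 3 bytes of header (1 + 2) instead of the 9 bytes (1 + 8) that a fixed 8-byte field would need.

```python
def length_field_size(byte_count: int) -> int:
    """Return the smallest allowed length-field size (2, 4 or 8) for byte_count.

    ASSUMPTION: the length is an unsigned count of UTF-8 bytes.
    """
    if byte_count < 0:
        raise ValueError("length cannot be negative")
    for width in (2, 4, 8):
        if byte_count < 1 << (8 * width):
            return width
    raise ValueError("length does not fit in an 8-byte field")


def header_overhead(text: str) -> int:
    """Bytes spent on the indicator byte plus the length field."""
    return 1 + length_field_size(len(text.encode("utf-8")))
```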

As always, all input and comments are welcome and appreciated.

Regards,

Attachment: Eclipse Test Performance Project - Overview for LinuxWorld 8-5-04.ppt
Description: MS-Powerpoint presentation
