RE: [hyades-dev] UTF-8 in HCE Data Exchange Format Proposal
-----Original Message-----
From: hoang.m.nguyen [mailto:hoang.m.nguyen@xxxxxxxxx]
Sent: 31 August 2004 13:54
To: hyades-dev
Subject: [hyades-dev] UTF-8 in HCE Data Exchange Format Proposal
Hello all,
In our last HCE team meeting we discussed UTF-8 usage and came to the
following proposal:
If a string is exchanged between the client and the HCE, it will be in
the following format:
- one byte indicating the size of the string length field: 2, 4
  or 8 bytes
- the actual length field (2, 4 or 8 bytes)
- the UTF-8 byte stream, with no embedded null byte and no
  terminating null byte
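To make the layout above concrete, here is a minimal sketch of an encoder and decoder. Several details are assumptions on my part, since the proposal does not state them: the indicator byte is taken to hold the literal value 2, 4 or 8; the length field is taken to be an unsigned big-endian integer counting UTF-8 bytes (not characters); and the names encode_string and decode_string are hypothetical, not part of any HCE API.

```python
import struct

# Assumption: length field is unsigned and big-endian (network byte order).
# Maps the width of the length field in bytes to a struct format string.
_WIDTH_FORMATS = {2: ">H", 4: ">I", 8: ">Q"}


def encode_string(text):
    """Encode text as: 1 indicator byte, length field, UTF-8 bytes."""
    payload = text.encode("utf-8")
    if b"\x00" in payload:
        raise ValueError("embedded null bytes are not allowed")
    n = len(payload)  # assumption: the length counts bytes, not characters
    # Pick the smallest length field that can hold the byte count.
    if n <= 0xFFFF:
        width = 2
    elif n <= 0xFFFFFFFF:
        width = 4
    else:
        width = 8
    # Assumption: the indicator byte stores the width itself (2, 4 or 8).
    return bytes([width]) + struct.pack(_WIDTH_FORMATS[width], n) + payload


def decode_string(buf, offset=0):
    """Decode one string at offset; return (text, offset of next field)."""
    width = buf[offset]
    fmt = _WIDTH_FORMATS.get(width)
    if fmt is None:
        raise ValueError("length size indicator must be 2, 4 or 8")
    (n,) = struct.unpack_from(fmt, buf, offset + 1)
    start = offset + 1 + width
    end = start + n
    if end > len(buf):
        raise ValueError("buffer is shorter than the declared length")
    payload = buf[start:end]
    if b"\x00" in payload:
        raise ValueError("embedded null bytes are not allowed")
    return payload.decode("utf-8"), end
```

Under these assumptions a short string costs three bytes of framing (one indicator byte plus a 2-byte length), and the decoder returns the offset of the next field so that several values can be read back to back from one buffer.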
Here are some important points about this proposal:
* This applies to the string data type only. Other data types (int,
float, double, etc.) are not affected.
* No embedded null byte in the string.
If embedded nulls are required, multiple strings should be created instead.
This should help C/C++ programs convert the data to their own string
type without additional checking.
* No null byte at the end.
Since there is a length field at the beginning, the format should not be
forced to add a null byte at the end (just for the convenience of C
programs).
Doing so would be very unnatural for Java programs.
The expectation is that most Java or C++ programs will probably
convert the data to String objects before manipulating it anyway.
* The first byte (the length size indicator) is there to keep the
encoding compact while remaining scalable:
- most string lengths will fit in a 2-byte length field (my
expectation is 99%)
- a 4-byte length is also possible (e.g. Allan's environment
variable)
- an 8-byte length is conceivable with the latest 64-bit memory
addressing (as Frank de Jong pointed out)
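As an illustration of the "multiple strings instead of embedded nulls" point, the sketch below splits a value at each null character before it is sent. The helper name split_on_nulls is hypothetical, and keeping empty pieces (so the receiver can rebuild the original by joining on a null) is my own choice; the proposal does not say how the pieces should be reassembled.

```python
from typing import List


def split_on_nulls(text: str) -> List[str]:
    """Split text at every U+0000 so that no piece contains a null.

    Empty pieces are kept (an assumption), which makes the split
    reversible with "\\x00".join(pieces).
    """
    return text.split("\x00")


# Example: a null-separated block of settings becomes separate strings,
# each of which would be sent as its own length-prefixed string.
pieces = split_on_nulls("PATH=/bin\x00HOME=/root")
```

Because U+0000 is the only character whose UTF-8 encoding contains a zero byte, pieces produced this way never carry a null byte on the wire, so a C/C++ receiver can copy each one into a buffer and append its own terminator.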
As always, all input and comments are welcome and appreciated.
Regards,
Attachment:
Eclipse Test Performance Project - Overview for LinuxWorld 8-5-04.ppt
Description: MS-Powerpoint presentation