[hyades-dev] UTF-8 in HCE Data Exchange Format Proposal

 

Hello all,

 

At our last HCE team meeting we discussed UTF-8 usage and arrived at the following proposal:

 

When a string is exchanged between the client and the HCE, it will be in the following format:

-          one byte indicates the size of the string length field: 2, 4 or 8 bytes

-          the actual length field (2, 4 or 8 bytes)

-          UTF-8 byte stream with no embedded null byte and no terminating null byte
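
The three-part layout above can be sketched as follows. This is only an illustration, not part of the proposal: the function names are hypothetical, and two points the proposal does not state are assumed here, namely that the length field is an unsigned integer in big-endian (network) byte order and that it counts UTF-8 bytes rather than characters.

```python
import struct

# Assumption: unsigned, big-endian length fields. The proposal does not
# specify a byte order or signedness.
_LENGTH_FORMATS = {2: ">H", 4: ">I", 8: ">Q"}


def encode_string(text, length_size=2):
    """Return: size byte + length field + UTF-8 bytes (no terminator)."""
    if length_size not in _LENGTH_FORMATS:
        raise ValueError("length_size must be 2, 4 or 8")
    payload = text.encode("utf-8")
    if b"\x00" in payload:
        raise ValueError("embedded null bytes are not allowed")
    # struct.pack raises struct.error if the length does not fit the field.
    length_field = struct.pack(_LENGTH_FORMATS[length_size], len(payload))
    return bytes([length_size]) + length_field + payload


def decode_string(data, offset=0):
    """Return (text, next_offset) for the string starting at offset."""
    length_size = data[offset]
    if length_size not in _LENGTH_FORMATS:
        raise ValueError("invalid length-size byte: %d" % length_size)
    (length,) = struct.unpack_from(_LENGTH_FORMATS[length_size], data, offset + 1)
    start = offset + 1 + length_size
    end = start + length
    if end > len(data):
        raise ValueError("buffer is shorter than the declared string length")
    return data[start:end].decode("utf-8"), end
```

Because the decoder returns the offset of the next field, several strings can be read back to back from one buffer without any terminator between them.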

 

Here are some important points about this proposal:

 

  • This applies to the string data type only. Other data types (int, float, double, etc.) are not affected.

 

  • No embedded null byte in the string.

If the data would require embedded null bytes, multiple strings should be created instead.

This should help C/C++ programs convert the data to their own string types without additional checking.

 

  • No null byte at the end.

Since there is already a length field at the beginning, senders should not be forced to add a null byte at the end just for the convenience of C programs.

It will be very unnatural for Java programs to do so.

 

The expectation is that most Java or C++ programs will convert the data to String objects before manipulating it anyway.

 

  • The leading length-indicator byte is there for compactness and scalability:

-          most string lengths will fit in a 2-byte length field (I expect about 99% of them)

-          a 4-byte length is also possible (e.g. Allan’s environment variable)

-          8-byte length is conceivable with the latest 64-bit memory addressing (as Frank de Jong pointed out)
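
As a rough illustration of how a sender might pick the value of the length-indicator byte, the sketch below chooses the smallest field that can hold a given byte count. The function name is hypothetical, and it assumes the length field is unsigned and counts UTF-8 bytes, neither of which the proposal states explicitly.

```python
def choose_length_size(byte_count):
    """Return the smallest allowed length-field size (2, 4 or 8 bytes)."""
    if byte_count < 0:
        raise ValueError("byte_count must not be negative")
    # Assumption: unsigned length fields, so an n-byte field holds
    # values up to 2**(8 * n) - 1.
    for size in (2, 4, 8):
        if byte_count < 2 ** (8 * size):
            return size
    raise ValueError("string is too long for an 8-byte length field")
```

Under these assumptions a string of up to 65,535 bytes costs 3 bytes of overhead (indicator plus 2-byte length), and only longer strings pay for the wider fields.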

 

As always, all inputs and comments are welcome and appreciated.

 

Regards,

 

 

