[hyades-dev] UTF-8 in HCE Data Exchange Format Proposal

Hello all,

At our last HCE team meeting we discussed UTF-8 usage and arrived at the following proposal:

When string data is exchanged between the client and the HCE, it will be in the following format:

- one byte indicates the size of the string length field: 2, 4 or 8 bytes

- the actual length field (2, 4 or 8 bytes)

- UTF-8 byte stream with no embedded null byte and no terminating null byte
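
To make the layout concrete, here is a minimal sketch of the framing in Python. Several details are assumptions on my part, since the proposal does not state them: the size byte holds the literal value 2, 4 or 8; the length field is an unsigned big-endian integer; and the length counts UTF-8 bytes rather than characters. The names encode_string and decode_string are hypothetical.

```python
# Sketch of the proposed string framing. Assumptions (not in the proposal):
# the size byte holds the literal value 2, 4 or 8; the length field is an
# unsigned big-endian integer counting UTF-8 bytes, not characters.

VALID_SIZES = (2, 4, 8)


def encode_string(text, size=2):
    """Frame text as: size byte, length field, UTF-8 bytes (no nulls)."""
    if size not in VALID_SIZES:
        raise ValueError("length field must be 2, 4 or 8 bytes")
    payload = text.encode("utf-8")
    if b"\x00" in payload:
        raise ValueError("embedded null bytes are not allowed")
    if len(payload) >= 1 << (8 * size):
        raise ValueError("string too long for the chosen length field")
    # No terminating null is appended; the length field marks the end.
    return bytes([size]) + len(payload).to_bytes(size, "big") + payload


def decode_string(buf):
    """Return (decoded string, number of bytes consumed from buf)."""
    if not buf:
        raise ValueError("empty buffer")
    size = buf[0]
    if size not in VALID_SIZES:
        raise ValueError("bad length-field size")
    if len(buf) < 1 + size:
        raise ValueError("truncated length field")
    length = int.from_bytes(buf[1:1 + size], "big")
    end = 1 + size + length
    if len(buf) < end:
        raise ValueError("truncated string data")
    return buf[1 + size:end].decode("utf-8"), end
```

Under these assumptions, encode_string("hi") produces the five bytes 02 00 02 68 69, and decode_string on those bytes returns ("hi", 5).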

Here are some important points about this proposal:

This applies to the string data type only. Other data types (int, float, double, etc.) are not affected.

If embedded null bytes are required, multiple strings should be created instead.

This should help C/C++ programs convert the data to their own string representation without additional checking.

Since there is a length field at the beginning, the sender should not be forced to add a terminating null at the end (just for the convenience of C programs).

It will be very unnatural for Java programs to do so.

The expectation is that most Java or C++ programs will probably convert it to String objects before manipulating it anyway.

- most string lengths will be within 2-byte length size (my expectation is at 99%)

- 4-byte length is also possible (e.g. Allan’s environment variable)

- 8-byte length is conceivable with the latest 64-bit memory addressing (as Frank de Jong pointed out)
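
The thresholds behind these three cases can be sketched as follows. This assumes the length field is unsigned and counts UTF-8 bytes, which the proposal does not state explicitly; pick_length_size is a hypothetical helper name.

```python
# Sketch: choose the smallest length-field size (2, 4 or 8 bytes) that can
# hold a given payload length. Assumes an unsigned length field counting
# UTF-8 bytes; neither point is specified in the proposal.

def pick_length_size(nbytes):
    """Return 2, 4 or 8 for a payload of nbytes UTF-8 bytes."""
    if nbytes < 0:
        raise ValueError("length cannot be negative")
    if nbytes <= 0xFFFF:                 # up to 65,535 bytes
        return 2
    if nbytes <= 0xFFFFFFFF:             # up to 4,294,967,295 bytes
        return 4
    if nbytes <= 0xFFFFFFFFFFFFFFFF:     # up to 2**64 - 1 bytes
        return 8
    raise ValueError("payload too large for an 8-byte length field")
```

With a 2-byte field the framing overhead is 3 bytes per string (size byte plus length field), so the common case stays compact while the 4- and 8-byte fields remain available for the rare large value.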

As always, all inputs and comments are welcome and appreciated.

Regards,

Follow-Ups:
- Re: [hyades-dev] UTF-8 in HCE Data Exchange Format Proposal
  - From: Allan K Pratt