Hello all,
At our last HCE team meeting we discussed UTF-8 usage and arrived at
the following proposal.
Whenever string data is exchanged between the client and the HCE,
it will be in the following format:
- one byte giving the size of the string length field: 2, 4 or 8 bytes
- the length field itself (2, 4 or 8 bytes)
- the UTF-8 byte stream, with no embedded null byte and no
  terminating null byte
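To make the layout concrete, here is a minimal sketch of how a sender might build such a string. It is only an illustration: the function name encode_string is hypothetical, and it assumes several things the proposal does not yet state, namely that the indicator byte holds the literal value 2, 4 or 8, that the length field is big-endian, and that the length counts UTF-8 bytes rather than characters.

```python
import struct

# Assumed encodings for each permitted length-field width.
# Big-endian byte order is an assumption; the proposal does not specify one.
_LENGTH_FORMATS = {2: ">H", 4: ">I", 8: ">Q"}


def encode_string(text: str) -> bytes:
    """Encode text as: indicator byte, length field, UTF-8 bytes (hypothetical helper)."""
    payload = text.encode("utf-8")
    if b"\x00" in payload:
        # The proposal forbids embedded null bytes.
        raise ValueError("embedded null bytes are not allowed")

    # Pick the smallest length field that can hold the byte count.
    for width in (2, 4, 8):
        if len(payload) < (1 << (8 * width)):
            break
    else:
        raise ValueError("string too long for an 8-byte length field")

    # Assumption: the indicator byte carries the literal value 2, 4 or 8.
    header = bytes([width]) + struct.pack(_LENGTH_FORMATS[width], len(payload))
    # No terminating null byte is appended.
    return header + payload
```

Under these assumptions, a two-character ASCII string costs three bytes of header (one indicator byte plus a 2-byte length), and the 4-byte form is used only once the data exceeds 65535 bytes.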
Here are some important points about this proposal:
- This applies to the string data type only. Other data
types (int, float, double, etc.) are not affected.
- No embedded null byte in the string.
  If embedded nulls are required, multiple strings should be created
  instead. This should help C/C++ programs convert the data to their
  own strings without additional checking.
  Since there is a length field at the beginning, senders should not
  be forced to add a terminating null at the end as well (just for
  the convenience of C programs). It would be very unnatural for Java
  programs to do so. The expectation is that most Java or C++ programs
  will probably convert the data to String objects before manipulating
  it anyway.
- The leading length-indicator byte is there for efficiency and
  scalability:
  - most string lengths will fit in a 2-byte length field
    (my expectation is 99%)
  - a 4-byte length is also possible (e.g. Allan’s
    environmental variable)
  - an 8-byte length is conceivable with the latest 64-bit
    memory addressing (as Frank de Jong pointed out)
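For the receiving side, the following is a rough sketch of how a reader could use the indicator byte and length field to extract a string without scanning for a terminator. The function name decode_string is hypothetical, and the sketch assumes that the indicator byte holds the literal value 2, 4 or 8, that the length field is big-endian, and that the length counts UTF-8 bytes; none of these is fixed by the proposal yet.

```python
import struct

# Assumed encodings for each permitted length-field width.
# Big-endian byte order is an assumption; the proposal does not specify one.
_LENGTH_FORMATS = {2: ">H", 4: ">I", 8: ">Q"}


def decode_string(buf: bytes, offset: int = 0):
    """Read one string starting at offset; return (text, next_offset)."""
    if offset >= len(buf):
        raise ValueError("no data at offset")

    # Assumption: the indicator byte carries the literal value 2, 4 or 8.
    width = buf[offset]
    fmt = _LENGTH_FORMATS.get(width)
    if fmt is None:
        raise ValueError("length-field size must be 2, 4 or 8")

    start = offset + 1 + width
    if start > len(buf):
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(fmt, buf, offset + 1)

    end = start + length
    if end > len(buf):
        raise ValueError("truncated string data")

    payload = buf[start:end]
    if b"\x00" in payload:
        # The proposal forbids embedded null bytes.
        raise ValueError("embedded null bytes are not allowed")
    return payload.decode("utf-8"), end
```

Because the function returns the offset just past the string, several strings packed back to back can be read by feeding that offset into the next call.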
As always, all inputs and comments are welcome and
appreciated.
Regards,