[hyades-dev] Re: More info on Java UTF-8
All,
I'm new to the hyades-dev list, so please bear with me if I say anything
stupid.
Regarding the encoding of strings, I find that the way the Java
serialization protocol handles it makes a lot of sense; see:
http://java.sun.com/j2se/1.4.2/docs/guide/serialization/spec/protocol.html#wp8101
The main idea is that every string starts with a single header byte
indicating whether it is a short string (up to 64k) or a long string
(longer than 64k). Depending on the header byte, the following 2 or
8 (!) bytes encode the length, counted in bytes of the encoded (modified
UTF-8) form. You might think a 64-bit length field is a bit overdone,
but since long strings are by definition longer than 64k, spending
8 bytes on the length won't make a difference, and there is ample room
for expansion.
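As a rough illustration of the header byte and length field described
above, here is a minimal Java sketch. The class name StringHeaderDemo and
its helper methods are made up for this example; only the java.io classes
and the ObjectStreamConstants values come from the JDK. It assumes the
string is written on its own with ObjectOutputStream.writeObject, so the
string record directly follows the 4-byte stream header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.util.Arrays;

// Hypothetical demo class, not part of any Hyades code.
class StringHeaderDemo {

    // Serializes a single String and returns the raw stream bytes.
    public static byte[] serialize(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(s);
        }
        return buf.toByteArray();
    }

    // Reads the length field of the string record at the start of the stream.
    public static long encodedLength(byte[] stream) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        in.skipBytes(4); // stream magic (2 bytes) + version (2 bytes)
        byte type = in.readByte();
        if (type == ObjectStreamConstants.TC_STRING) {
            return in.readUnsignedShort(); // short string: 2-byte length
        }
        if (type == ObjectStreamConstants.TC_LONGSTRING) {
            return in.readLong();          // long string: 8-byte length
        }
        throw new IOException("not a string record, type byte: " + type);
    }

    // Builds a string of n 'a' characters (1 byte each in modified UTF-8).
    public static String repeatA(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        byte[] shortStream = serialize("abc");
        byte[] longStream = serialize(repeatA(70000));
        // Byte 4 is the header byte of the string record.
        System.out.printf("0x%02X %d%n", shortStream[4], encodedLength(shortStream)); // 0x74 3
        System.out.printf("0x%02X %d%n", longStream[4], encodedLength(longStream));   // 0x7C 70000
    }
}
```

The short string costs 3 bytes of overhead (header byte plus 2-byte
length) and the long one 9 bytes, which is negligible next to 70000 bytes
of payload.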
The good thing about using this format is that Java already ships the
code to generate the output: java.io.ObjectOutputStream writes any
java.io.Serializable object, String included, in this format.
Regards,
Frank de Jong
Zyntax, Amsterdam