[hyades-dev] Re: More info on Java UTF-8

All,

I'm new to the hyades-dev list, so please bear with me if I say anything
stupid.

Regarding the encoding of strings, I find that the way the Java
serialization protocol handles it makes a lot of sense; see:

http://java.sun.com/j2se/1.4.2/docs/guide/serialization/spec/protocol.html#wp8101

The main idea is that a string also has a single header byte indicating
whether it is a short string (TC_STRING, encoded form of at most 65,535
bytes, i.e. under 64k) or a long string (TC_LONGSTRING, longer than
that). Depending on the header byte, the following 2 or 8 (!) bytes
encode the length. You might think a 64-bit length field is a bit
overdone, but since long strings are by definition longer than 64k, the
6 extra bytes won't make a difference, and there is ample room for
expansion.
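To make the scheme concrete, here is a minimal hand-rolled sketch of the
same short/long header idea using java.nio.ByteBuffer. The tag values are
taken from java.io.ObjectStreamConstants; the class and method names are
made up for illustration. Note the assumption that the payload is standard
UTF-8: Java serialization itself writes *modified* UTF-8, which encodes
U+0000 and supplementary characters differently.

```java
import java.io.ObjectStreamConstants;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder illustrating the 1-byte tag + 2/8-byte length layout.
public class StringHeaderEncoder {

    /** Encodes s as: tag byte, then a 2- or 8-byte big-endian length, then UTF-8 bytes. */
    public static byte[] encode(String s) {
        // Assumption: standard UTF-8, not Java's modified UTF-8.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        boolean isShort = utf8.length <= 0xFFFF;
        // ByteBuffer is big-endian by default, matching the serialization spec.
        ByteBuffer buf = ByteBuffer.allocate(1 + (isShort ? 2 : 8) + utf8.length);
        if (isShort) {
            buf.put(ObjectStreamConstants.TC_STRING);      // 0x74
            buf.putShort((short) utf8.length);             // read back as unsigned 16-bit
        } else {
            buf.put(ObjectStreamConstants.TC_LONGSTRING);  // 0x7C
            buf.putLong(utf8.length);                      // 64-bit length
        }
        buf.put(utf8);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] small = encode("hello");
        byte[] large = encode("x".repeat(70_000));
        System.out.printf("short: tag=0x%02X total=%d bytes%n", small[0], small.length);
        System.out.printf("long:  tag=0x%02X total=%d bytes%n", large[0], large.length);
    }
}
```

The overhead is 3 bytes for a short string and 9 bytes for a long one,
which is negligible once a string is already past 64k.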

The good thing about using this format is that Java already has a
ready-made way to generate it: java.io.ObjectOutputStream writes a
java.lang.String (which implements java.io.Serializable) in exactly this
form.
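A quick way to check this is to serialize a String with
ObjectOutputStream and inspect the raw bytes. In the sketch below (the
class name is made up), the first 4 bytes are the stream header (magic
0xACED, version 5), followed by the string's tag byte and its length
field. Writing the same String instance twice would produce a
back-reference (TC_REFERENCE) the second time instead of a new copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows the bytes ObjectOutputStream emits for a String.
public class SerializedStringDemo {

    /** Serializes s with ObjectOutputStream and returns the raw stream bytes. */
    public static byte[] serialize(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(s); // emitted as TC_STRING or TC_LONGSTRING
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray(); // stream is closed, so all bytes are flushed
    }

    public static void main(String[] args) {
        byte[] small = serialize("hello");
        byte[] large = serialize("x".repeat(70_000));
        // Byte 4 is the tag; bytes 5..6 are the 16-bit length of a short string.
        int shortLen = ((small[5] & 0xFF) << 8) | (small[6] & 0xFF);
        System.out.printf("short: tag=0x%02X length=%d%n", small[4], shortLen);
        System.out.printf("long:  tag=0x%02X total=%d bytes%n", large[4], large.length);
    }
}
```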

Regards,

Frank de Jong
Zyntax, Amsterdam
