So let's look at all the buffering and copying involved in sending a TextFrame that will be deflated. (I note the lazy conversion of String to bytes has been reverted. No big deal, as I think it was of marginal use in its current form.)
- TextFrame gets the bytes of the text with a call to String#getBytes(Charset), which:
  - calls StringCoding with the internal char array of the string
  - depending on the SecurityManager, the char array might be copied?
  - allocates a byte[] of len * max bytes per char, which is 3 for UTF-8 (a surrogate pair is 2 chars and encodes to 4 bytes)
  - encodes the chars into the byte array (copy 1, transform 1)
  - copies the bytes from the allocated byte array into an exact-size byte array (copy 2)
- We deflate the payload:
  - uses BufferUtil.toArray, which copies to a new byte[len] (copy 3)
  - allocates a compressed byte[]
  - compresses into that byte[] (copy 4, transform 2)
- We batch the frame:
  - copy to the aggregate direct buffer (copy 5, transform 3)
So we do 5 copies of the data, 3 of which transform the data as it is copied (counting the move to kernel space as a transform).
I think one copy can easily be removed: the toArray call in the deflater is unnecessary in most cases, because the buffer's backing array is directly accessible.
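To make the toArray point concrete, here is a minimal sketch of feeding a ByteBuffer to a plain `java.util.zip.Deflater` without the intermediate copy. The class and method names (`DeflateWithoutCopy`, `setInput`, `compress`) are hypothetical and this is not the Jetty extension code; it only assumes the payload is usually a heap buffer whose array can be reached via `hasArray()`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper, not Jetty code: avoids the toArray copy when possible.
class DeflateWithoutCopy {
    /** Hands the payload to the Deflater, copying only if no backing array is accessible. */
    static void setInput(Deflater deflater, ByteBuffer payload) {
        if (payload.hasArray()) {
            // Heap buffer: pass the backing array directly (no copy).
            deflater.setInput(payload.array(),
                              payload.arrayOffset() + payload.position(),
                              payload.remaining());
        } else {
            // Direct or read-only buffer: fall back to one copy.
            byte[] copy = new byte[payload.remaining()];
            payload.duplicate().get(copy);
            deflater.setInput(copy, 0, copy.length);
        }
    }

    /** Compresses the remaining bytes of the payload; the payload position is left untouched. */
    static byte[] compress(ByteBuffer payload) {
        Deflater deflater = new Deflater();
        try {
            setInput(deflater, payload);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap("hello hello hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(compress(payload).length + " compressed bytes");
    }
}
```

On Java 11 and later, `Deflater.setInput(ByteBuffer)` may make even the fallback branch unnecessary, though I have not checked how that interacts with the existing extension code.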
Code modularity fights against easy reductions in copying, but let's consider the best possible result if we ignore modularity. If we could delay converting the String to bytes until the deflater wants those bytes, then we could generate UTF-8 directly into a fixed-size reusable uncompressed buffer in the deflater:
- TextFrame keeps just a reference to the String.
- Deflate has special handling for TextFrames:
  - allocates a fixed-size reusable uncompressed buffer
  - allocates a reusable fixed-size + overhead compressed buffer
  - while at least 4 bytes of space remain in the uncompressed buffer, iterates over the characters in the string using charAt(), converting each character to UTF-8 directly into the uncompressed buffer (copy 1, transform 1)
  - compresses the data (copy 2, transform 2)
- Batch the frame:
  - copy to the aggregate direct buffer (copy 3, transform 3)
So I think it is possible to achieve the minimum of 3 copies for the 3 transforms needed to send a text frame, but it does need special handling for TextFrames in the extension - or potentially this could be moved into AbstractFrame with an abstraction of payload iteration with copyTo semantics.
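A rough sketch of the first two steps of that scheme, under stated assumptions: `StringDeflater`, `putUtf8` and the 1024-byte buffer sizes are made up for illustration, the `ByteArrayOutputStream` merely stands in for the aggregate buffer, and it uses a default `java.util.zip.Deflater` rather than whatever configuration the extension really uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical sketch: encode a String to UTF-8 in fixed-size chunks and deflate each chunk.
class StringDeflater {
    private final byte[] raw = new byte[1024];    // reusable uncompressed buffer
    private final byte[] packed = new byte[1024]; // reusable compressed buffer

    byte[] deflate(String text) {
        Deflater deflater = new Deflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for the aggregate buffer
        try {
            int i = 0;
            int length = text.length();
            while (i < length) {
                int filled = 0;
                // Encode while a worst-case 4-byte sequence still fits (copy 1, transform 1).
                while (i < length && raw.length - filled >= 4) {
                    char c = text.charAt(i++);
                    int cp = c;
                    if (Character.isHighSurrogate(c) && i < length
                            && Character.isLowSurrogate(text.charAt(i))) {
                        cp = Character.toCodePoint(c, text.charAt(i++));
                    } else if (Character.isSurrogate(c)) {
                        cp = '?'; // unpaired surrogate: substitute a replacement character
                    }
                    filled = putUtf8(cp, raw, filled);
                }
                // Compress the chunk (copy 2, transform 2); consume it all before reusing raw.
                deflater.setInput(raw, 0, filled);
                while (!deflater.needsInput()) {
                    out.write(packed, 0, deflater.deflate(packed));
                }
            }
            deflater.finish();
            while (!deflater.finished()) {
                out.write(packed, 0, deflater.deflate(packed));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Writes one code point as UTF-8 at dst[pos] and returns the new position. */
    private static int putUtf8(int cp, byte[] dst, int pos) {
        if (cp < 0x80) {
            dst[pos++] = (byte) cp;
        } else if (cp < 0x800) {
            dst[pos++] = (byte) (0xC0 | (cp >> 6));
            dst[pos++] = (byte) (0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            dst[pos++] = (byte) (0xE0 | (cp >> 12));
            dst[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            dst[pos++] = (byte) (0x80 | (cp & 0x3F));
        } else {
            dst[pos++] = (byte) (0xF0 | (cp >> 18));
            dst[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
            dst[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            dst[pos++] = (byte) (0x80 | (cp & 0x3F));
        }
        return pos;
    }

    public static void main(String[] args) {
        byte[] compressed = new StringDeflater().deflate("hello hello hello websocket");
        System.out.println(compressed.length + " compressed bytes");
    }
}
```

The point of the sketch is only that the String is never materialised as a full byte[]: each chunk goes from chars to UTF-8 to deflated output through two reusable buffers.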
However, we would need to evaluate whether iteratively calling charAt() is efficient. It may be better to just burn the memory and fetch the char array from the String (toCharArray() makes a copy). Pity there is no access to the internal String char[].
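One possible middle ground between charAt() and a full toCharArray() copy is `String.getChars`, which copies a slice of the string into a caller-supplied reusable char[]. The sketch below is hypothetical (`ChunkedChars` and `utf8Length` are illustrative names) and only counts UTF-8 bytes to show the iteration pattern; it assumes well-formed surrogate pairs, and whether it actually beats charAt() would need measuring.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: walk a String through a small reusable char[] instead of charAt().
class ChunkedChars {
    /** Returns the UTF-8 encoded length of text, assuming all surrogates are correctly paired. */
    static int utf8Length(String text, char[] chunk) {
        int bytes = 0;
        for (int start = 0; start < text.length(); start += chunk.length) {
            int end = Math.min(text.length(), start + chunk.length);
            text.getChars(start, end, chunk, 0); // bulk copy of one slice, no allocation
            for (int i = 0; i < end - start; i++) {
                char c = chunk[i];
                if (c < 0x80) {
                    bytes += 1;
                } else if (c < 0x800) {
                    bytes += 2;
                } else if (Character.isSurrogate(c)) {
                    bytes += 2; // each half of a pair accounts for 2 of its 4 bytes
                } else {
                    bytes += 3;
                }
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String text = "hello websocket";
        char[] reusable = new char[8];
        System.out.println(utf8Length(text, reusable) + " UTF-8 bytes");
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length + " bytes via getBytes");
    }
}
```

This still costs one extra char copy per chunk, so it only helps if the bulk copy is cheaper than the per-character bounds checks in charAt().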
Also, the last copy might not be a transform to kernel memory if SSL or HTTP/2 is involved, but I think we have enough unencrypted usage of WebSocket now to justify optimising this case, and 1 non-transforming copy is better than 2 or 3.
cheers