Re: [jetty-dev] Websocket buffers and async modes



I just looked at how hard it would be to encode UTF-8 into a preallocated, reusable byte buffer. It looks pretty simple, and I have written the method below for StringUtil.
The issue I see is that this is hard to abstract into AbstractFrame, because the index is in terms of chars rather than bytes, but perhaps that could be hidden behind an opaque iterator?


    /* ------------------------------------------------------------ */
    /** Encode a string as UTF-8 to a ByteBuffer
     * @param s The characters to encode
     * @param offset The offset of the characters to encode
     * @param len The length of characters to encode
     * @param out The ByteBuffer to encode to. Bytes will be written between the limit and the capacity, and the limit will be updated to reflect the bytes encoded.
     * @return The number of characters actually encoded, which may be less than len if fewer than 4 bytes of space remain in the out buffer. Unpaired surrogates are encoded as '?'.
     */
    public static int encodeUTF8to(CharSequence s, int offset, int len, ByteBuffer out)
    {
        /*
         bits    sequence           Byte 1   Byte 2   Byte 3   Byte 4
          7      U+0000  U+007F     0xxxxxxx
         11      U+0080  U+07FF     110xxxxx 10xxxxxx
         16      U+0800  U+FFFF     1110xxxx 10xxxxxx 10xxxxxx
         21      U+10000 U+10FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        */
        // Only array-backed (heap) buffers are supported
        if (!out.hasArray())
            throw new IllegalArgumentException("out must be backed by an accessible array");
        byte[] e=out.array();
        // o must not pass eEnd, so that a full 4-byte sequence always fits
        int eEnd=out.capacity()+out.arrayOffset()-4;
        int o=out.limit()+out.arrayOffset();
        int sEnd=offset+len;
        int i=offset;
        for (;i<sEnd && o<=eEnd;i++)
        {
            char c=s.charAt(i);

            if (c<0x80)
            {
                // 1 byte: ASCII
                e[o++]=(byte)c;
            }
            else if (c<0x800)
            {
                // 2 bytes
                e[o++] = (byte)(0xc0 | (c >> 6));
                e[o++] = (byte)(0x80 | (c & 0x3f));
            }
            else if (Character.isHighSurrogate(c) && i+1<sEnd && Character.isLowSurrogate(s.charAt(i+1)))
            {
                // 4 bytes: a surrogate pair inside the range consumes two chars
                int code=Character.toCodePoint(c,s.charAt(++i));
                e[o++] = (byte)(0xf0 | (code >> 18));
                e[o++] = (byte)(0x80 | ((code >> 12) & 0x3f));
                e[o++] = (byte)(0x80 | ((code >> 6) & 0x3f));
                e[o++] = (byte)(0x80 | (code & 0x3f));
            }
            else if (Character.isSurrogate(c))
            {
                // Unpaired surrogate (or a pair split by the range end): not valid UTF-8
                e[o++]='?';
            }
            else
            {
                // 3 bytes: the rest of the BMP
                e[o++] = (byte)(0xe0 | (c >> 12));
                e[o++] = (byte)(0x80 | ((c >> 6) & 0x3f));
                e[o++] = (byte)(0x80 | (c & 0x3f));
            }
        }

        out.limit(o-out.arrayOffset());
        return i-offset;
    }
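
For comparison, the JDK already ships a resumable encoder: CharsetEncoder.encode(CharBuffer, ByteBuffer, boolean) writes into a caller-supplied buffer and returns CoderResult.OVERFLOW when that buffer is full, while the CharBuffer position plays the role of the opaque iterator. The sketch below is a minimal illustration, not Jetty code: the class and method names are made up, the ByteArrayOutputStream merely stands in for whatever consumes each chunk, and unlike the method above it reports unpaired surrogates as errors. Whether it beats a hand-written charAt() loop would need measuring.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class IncrementalUtf8 {
    /**
     * Encodes text to UTF-8 through one small reusable ByteBuffer.
     * The CharBuffer remembers which char to resume from after each OVERFLOW.
     */
    static byte[] encodeInChunks(CharSequence text, int chunkSize) {
        if (chunkSize < 4)
            throw new IllegalArgumentException("chunk must hold a 4-byte sequence");
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(chunkSize);          // reused for every chunk
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the next stage
        while (true) {
            CoderResult result = encoder.encode(in, out, true);
            if (result.isError())
                throw new IllegalArgumentException("cannot encode: " + result);
            drain(out, sink);
            if (result.isUnderflow())
                break; // all input consumed; otherwise it was OVERFLOW, so go round again
        }
        encoder.flush(out); // the CharsetEncoder contract requires a final flush
        drain(out, sink);
        return sink.toByteArray();
    }

    /** Hands the encoded bytes to the sink and empties the buffer for reuse. */
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), out.arrayOffset() + out.position(), out.remaining());
        out.clear();
    }
}
```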


On 10 January 2014 12:44, Greg Wilkins <gregw@xxxxxxxxxxx> wrote:

So let's look at all the buffering and copying involved in sending a TextFrame that will be deflated. (I note that the lazy conversion of String to bytes has been reverted; no big deal, as I think it was of marginal use in its current form.)

  1. TextFrames get the bytes of the text with a call to String.getBytes(Charset), which:
    • calls StringCoding with the internal char array of the String
    • may copy the char array, depending on the SecurityManager?
    • allocates a byte[] of len * max bytes per char, which is 3 for the JDK's UTF-8 encoder (a surrogate pair is 2 chars encoding to 4 bytes)
    • encodes the chars into that byte array (copy 1, transform 1)
    • copies the bytes from that array into an exact-size byte array (copy 2)
  2. We deflate the payload:
    • uses BufferUtil.toArray, which copies into a new byte[len] (copy 3)
    • allocates a byte[] for the compressed data
    • compresses into that byte[] (copy 4, transform 2)
  3. We batch the frame:
    • copies into the aggregate direct buffer (copy 5, transform 3)

So we make 5 copies of the data, 3 of which transform the data as it is copied (counting the move towards kernel space as a transform).
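
To make the copy count concrete, here is a rough model of the current path using plain JDK calls (not the actual Jetty classes). It assumes raw DEFLATE via java.util.zip.Deflater with nowrap=true and finish() per message, and it ignores permessage-deflate details such as sync flushes and context takeover. The ByteArrayOutputStream used to collect the compressed output adds internal copies of its own, which the comments do not count.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class CurrentSendPath {
    /** Mirrors the three steps above; comments map each stage to the copy it represents. */
    static ByteBuffer send(String text) {
        // 1. String.getBytes: encode into an oversized array, then trim (copies 1 and 2, inside the JDK)
        ByteBuffer payload = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));

        // 2a. toArray-style copy of the payload before deflating (copy 3)
        byte[] input = new byte[payload.remaining()];
        payload.duplicate().get(input);

        // 2b. compress into newly allocated storage (copy 4, transform 2)
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished())
            compressed.write(chunk, 0, deflater.deflate(chunk));
        deflater.end();
        byte[] deflated = compressed.toByteArray();

        // 3. batch into an aggregate direct buffer (copy 5, transform 3)
        ByteBuffer aggregate = ByteBuffer.allocateDirect(deflated.length);
        aggregate.put(deflated);
        aggregate.flip();
        return aggregate;
    }
}
```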

I think one copy can easily be removed, as the toArray call in the deflater is unnecessary in the common case where the buffer's backing array is directly accessible.
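
A minimal sketch of avoiding that toArray copy, assuming the deflater is a plain java.util.zip.Deflater: when the payload is a heap buffer with an accessible array, its region can be handed to Deflater.setInput(byte[], int, int) directly, and only direct or read-only buffers fall back to a copy. The helper name is hypothetical. Note that the Deflater keeps a reference to the array, so the payload must not be modified until the deflater has consumed it.

```java
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

final class DeflaterInput {
    /**
     * Feeds the remaining bytes of payload to the deflater without moving its position.
     * Returns true if a copy had to be made.
     */
    static boolean setInput(Deflater deflater, ByteBuffer payload) {
        if (payload.hasArray()) {
            // No copy: point the deflater at the backing array region
            deflater.setInput(payload.array(), payload.arrayOffset() + payload.position(), payload.remaining());
            return false;
        }
        // Direct or read-only buffer: fall back to a toArray-style copy
        byte[] copy = new byte[payload.remaining()];
        payload.duplicate().get(copy);
        deflater.setInput(copy);
        return true;
    }
}
```

On Java 11 and later, Deflater.setInput(ByteBuffer) also accepts a buffer directly, including a direct one.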

Code modularity fights against easy reductions in copying, but let's consider the best result we could get if we didn't have to consider modularity. If we could delay converting the string to bytes until the deflater wants those bytes, then we could generate UTF-8 directly into a fixed-size, reusable uncompressed buffer in the deflater:

  1. TextFrame keeps just a reference to the String.
  2. Deflate has special handling for TextFrames:
    • allocates a fixed-size, reusable uncompressed buffer
    • allocates a reusable compressed buffer of that fixed size plus overhead
    • while at least 4 bytes of space remain in the uncompressed buffer, iterates over the characters of the string using charAt(), converting each character to UTF-8 directly into the uncompressed buffer (copy 1, transform 1)
    • compresses the data (copy 2, transform 2)
  3. We batch the frame:
    • copies into the aggregate direct buffer (copy 3, transform 3)
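
The proposed path above could look roughly like this. It is a sketch under several assumptions: the class name is hypothetical, each message is deflated as one complete raw DEFLATE stream (finish() per message) rather than with permessage-deflate's flush semantics, and the ByteArrayOutputStream sink stands in for the batching stage that makes copy 3. The key constraint is that the reusable uncompressed buffer may only be refilled once the Deflater reports needsInput().

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Proposed path: the frame keeps only its String; the deflate stage encodes it chunk by chunk. */
final class StreamingTextDeflater {
    private final byte[] uncompressed; // fixed-size, reused for every chunk
    private final byte[] compressed;   // reused output buffer: fixed size plus some overhead
    private final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);

    StreamingTextDeflater(int size) {
        if (size < 4)
            throw new IllegalArgumentException("need room for a 4-byte UTF-8 sequence");
        uncompressed = new byte[size];
        compressed = new byte[size + 64];
    }

    /** Deflates text into sink, which stands in for the batching stage (copy 3). */
    void deflate(String text, ByteArrayOutputStream sink) {
        deflater.reset();
        int i = 0;
        while (i < text.length()) {
            int o = 0;
            // Copy 1, transform 1: charAt() straight to UTF-8 while at least 4 bytes of space remain
            while (i < text.length() && uncompressed.length - o >= 4) {
                char c = text.charAt(i++);
                if (c < 0x80) {
                    uncompressed[o++] = (byte) c;
                } else if (c < 0x800) {
                    uncompressed[o++] = (byte) (0xc0 | (c >> 6));
                    uncompressed[o++] = (byte) (0x80 | (c & 0x3f));
                } else if (Character.isHighSurrogate(c) && i < text.length()
                        && Character.isLowSurrogate(text.charAt(i))) {
                    int cp = Character.toCodePoint(c, text.charAt(i++));
                    uncompressed[o++] = (byte) (0xf0 | (cp >> 18));
                    uncompressed[o++] = (byte) (0x80 | ((cp >> 12) & 0x3f));
                    uncompressed[o++] = (byte) (0x80 | ((cp >> 6) & 0x3f));
                    uncompressed[o++] = (byte) (0x80 | (cp & 0x3f));
                } else if (Character.isSurrogate(c)) {
                    uncompressed[o++] = (byte) '?'; // unpaired surrogate, as String.getBytes does
                } else {
                    uncompressed[o++] = (byte) (0xe0 | (c >> 12));
                    uncompressed[o++] = (byte) (0x80 | ((c >> 6) & 0x3f));
                    uncompressed[o++] = (byte) (0x80 | (c & 0x3f));
                }
            }
            // Copy 2, transform 2: the deflater must consume this chunk before the buffer is reused
            deflater.setInput(uncompressed, 0, o);
            while (!deflater.needsInput())
                sink.write(compressed, 0, deflater.deflate(compressed));
        }
        deflater.finish();
        while (!deflater.finished())
            sink.write(compressed, 0, deflater.deflate(compressed));
    }

    void close() {
        deflater.end();
    }
}
```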

So I think it is possible to reach the minimum of 3 copies for the 3 transforms needed to send a text frame, but it does need special handling for TextFrames in the extension; alternatively, this could be moved into AbstractFrame as an abstraction of payload iteration with copyTo semantics.
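
One way to picture the payload-iteration abstraction with copyTo semantics is an interface that each frame type implements. Everything here (FramePayload, BinaryPayload, TextPayload) is a hypothetical illustration, not an existing Jetty API. The text implementation keeps its char index internally, which sidesteps the char-versus-byte index problem raised at the top of this thread, and it never splits a code point across calls.

```java
import java.nio.ByteBuffer;

/** Hypothetical abstraction: a frame copies its payload into a caller-owned buffer, resumably. */
interface FramePayload {
    /** Copies as much as fits into out; returns true once the whole payload has been copied. */
    boolean copyTo(ByteBuffer out);
}

/** Binary payload: plain bulk copies from the wrapped buffer. */
final class BinaryPayload implements FramePayload {
    private final ByteBuffer data;

    BinaryPayload(ByteBuffer data) {
        this.data = data.slice(); // private view, so progress does not move the caller's buffer
    }

    @Override
    public boolean copyTo(ByteBuffer out) {
        int n = Math.min(out.remaining(), data.remaining());
        ByteBuffer chunk = data.duplicate();
        chunk.limit(chunk.position() + n);
        out.put(chunk);
        data.position(data.position() + n);
        return !data.hasRemaining();
    }
}

/** Text payload: keeps the String plus a char index, so callers never see char offsets. */
final class TextPayload implements FramePayload {
    private final String text;
    private int index; // next char to encode

    TextPayload(String text) {
        this.text = text;
    }

    @Override
    public boolean copyTo(ByteBuffer out) {
        while (index < text.length()) {
            int cp = text.codePointAt(index);
            int chars = Character.charCount(cp);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE)
                cp = '?'; // unpaired surrogate
            int needed = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (out.remaining() < needed)
                return false; // out is full; resume from index on the next call
            if (needed == 1) {
                out.put((byte) cp);
            } else if (needed == 2) {
                out.put((byte) (0xc0 | (cp >> 6)));
                out.put((byte) (0x80 | (cp & 0x3f)));
            } else if (needed == 3) {
                out.put((byte) (0xe0 | (cp >> 12)));
                out.put((byte) (0x80 | ((cp >> 6) & 0x3f)));
                out.put((byte) (0x80 | (cp & 0x3f)));
            } else {
                out.put((byte) (0xf0 | (cp >> 18)));
                out.put((byte) (0x80 | ((cp >> 12) & 0x3f)));
                out.put((byte) (0x80 | ((cp >> 6) & 0x3f)));
                out.put((byte) (0x80 | (cp & 0x3f)));
            }
            index += chars;
        }
        return true;
    }
}
```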

However, we would need to evaluate whether iteratively calling charAt() is efficient. It may be better to just burn the memory and fetch a char array from the String... a pity there is no access to the String's internal char[].
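
A middle ground between per-character charAt() and copying the whole String with toCharArray() is String.getChars(int, int, char[], int), which copies a slice into a small reusable char[]. The sketch below (class and method names are hypothetical) walks a String both ways and returns a checksum so the loop cannot be optimised away. Its report() timing is crude; a harness such as JMH would be needed for trustworthy numbers.

```java
final class CharAccess {
    /** Per-char access through String.charAt(). */
    static long viaCharAt(String s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++)
            sum += s.charAt(i);
        return sum;
    }

    /** Chunked access: String.getChars() copies a slice into a small reusable char[]. */
    static long viaGetChars(String s, char[] scratch) {
        if (scratch.length == 0)
            throw new IllegalArgumentException("scratch must not be empty");
        long sum = 0;
        for (int start = 0; start < s.length(); start += scratch.length) {
            int end = Math.min(s.length(), start + scratch.length);
            s.getChars(start, end, scratch, 0); // copies chars [start, end) into scratch[0..]
            for (int i = 0; i < end - start; i++)
                sum += scratch[i];
        }
        return sum;
    }

    /** Crude timing only: JIT warm-up and GC make these numbers indicative at best. */
    static void report(String s, int rounds) {
        char[] scratch = new char[1024];
        long check = 0;
        long t0 = System.nanoTime();
        for (int r = 0; r < rounds; r++) check += viaCharAt(s);
        long t1 = System.nanoTime();
        for (int r = 0; r < rounds; r++) check += viaGetChars(s, scratch);
        long t2 = System.nanoTime();
        System.out.println("charAt: " + (t1 - t0) / rounds + " ns, getChars: "
                + (t2 - t1) / rounds + " ns (checksum " + check + ")");
    }
}
```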

Also, the last copy might not be a transform to kernel memory if SSL or HTTP/2 is involved, but I think there is now enough unencrypted websocket usage to justify optimising this case, and one non-transforming copy is better than two or three.

cheers

On 10 January 2014 12:00, Greg Wilkins <gregw@xxxxxxxxxxx> wrote:



On 10 January 2014 11:17, Bruno D. Rodrigues <bruno.rodrigues@xxxxxxxxx> wrote:

I’m sorry for jumping in, but this is not 100% true ;)

We want you to jump in! That's why we posted here!
 
The aggregation does indeed happen for write(byte[]) but not for write(ByteBuffer). I sent an email to the list some time ago, and I was almost sure I had opened a bug on bugs.eclipse, but now I can’t find it. It’s late now and I’ll search again tomorrow, but basically a write(ByteBuffer) will never aggregate, which is why my local copy has this patch: https://gist.github.com/davipt/7155109 Please advise how to give better feedback on this kind of situation. I’ll search my archives again and open a new issue on Bugzilla if it indeed never went through :(

I believe I have already addressed this issue in 9.1.1: passed buffers may now be aggregated, sliced and written, or just written.

Either way, the point remains valid that if we choose to copy passed buffers, then we have to deal with very large buffers. Copying these in chunks is not helpful, so I think we should write them directly and not call their callbacks until they are written, i.e. momentarily revert to non-batching mode when a very large frame is to be sent.
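
The policy described above (batch small frames, but write a very large frame directly and only complete its callback once written) might be sketched as follows. The class, the Transport interface and the blocking write are all assumptions made to keep the example short; a real endpoint write would be asynchronous and would complete the callback from its own completion handler.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical batching policy: small frames are copied into a direct aggregate buffer,
 * while a frame too large for it is written directly instead of being copied in chunks.
 */
final class BatchingPolicy {
    /** Stand-in for the endpoint; assumed here to block until the buffer is written. */
    interface Transport {
        void write(ByteBuffer buffer);
    }

    private final ByteBuffer aggregate;
    private final Transport transport;

    BatchingPolicy(int aggregateSize, Transport transport) {
        this.aggregate = ByteBuffer.allocateDirect(aggregateSize);
        this.transport = transport;
    }

    void send(ByteBuffer frame, Runnable callback) {
        if (frame.remaining() > aggregate.remaining())
            flush(); // write what is batched first, so frames stay in order
        if (frame.remaining() <= aggregate.remaining()) {
            aggregate.put(frame); // copied: the caller's buffer is free again,
            callback.run();       // so its callback can complete now
        } else {
            // Momentarily non-batching: no chunked copy, and the callback
            // completes only after the frame itself has been written.
            transport.write(frame);
            callback.run();
        }
    }

    void flush() {
        aggregate.flip();
        if (aggregate.hasRemaining())
            transport.write(aggregate);
        aggregate.clear();
    }
}
```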

cheers

_______________________________________________
jetty-dev mailing list
jetty-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/jetty-dev




--
Greg Wilkins <gregw@xxxxxxxxxxx>
http://www.webtide.com
Developer advice and support from the Jetty & CometD experts.
Intalio, the modern way to build business applications.