Re: [jetty-users] Understanding behavior of org.eclipse.jetty.util.UrlEncode.decodeUtf8To

Can you provide some examples?

What precise bytes / values do you consider an incomplete UTF8 sequence?
Please include an example of what you consider an incomplete UTF8 sequence "in the middle" and another example as to the problem at the last part of the sequence.


--
Joakim Erdfelt <joakim@xxxxxxxxxxx>
Expert advice, services and support from the Jetty & CometD experts


On Fri, Mar 21, 2014 at 3:44 PM, Ugo Scaiella <scaiella@xxxxxxxxxxxxx> wrote:
I don't understand the behavior of the org.eclipse.jetty.util.UrlEncoded.decodeUtf8To methods. Maybe I'm missing something, but IMHO they behave inconsistently when request data is not correctly encoded. I'm currently using v9.1.0 (I cannot see any change in the latest v9.1.3), with UTF8 as the charset for decoding request data.

The strange behaviors I noticed are:

A) when parsing query string parameters
A.1) if the last value of the query string is an incomplete UTF8 sequence, the value is added to the map with the last character replaced by Utf8Appendable.REPLACEMENT (in my opinion this is the correct behavior)
A.2) if a token (i.e. a value or a key) in the middle of the query string is an incomplete UTF8 sequence, that token is ignored completely and is never added to the map. You only get a warn-level log message.

B) when parsing a form-urlencoded body of a POST or PUT request
B.1) if the last value of the post data is an incomplete UTF8 sequence, a Utf8Appendable.NotUtf8Exception is raised and bubbles up to, for instance, Request.getParameter(). And that is a RuntimeException.
B.2) if a token (i.e. a value or a key) in the middle of the body is an incomplete UTF8 sequence, that token is ignored, just like point (A.2) above.
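
For illustration, one reading of "incomplete UTF8 sequence" is a multi-byte lead byte with its continuation byte missing, e.g. `a=%C3&b=1` (in the middle) or `a=1&b=ab%C3` (at the end). The sketch below is not Jetty code: the class and method names are hypothetical, and it uses the JDK's CharsetDecoder only to show which byte sequences a strict UTF-8 decoder rejects.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Jetty.
class IncompleteUtf8Check {

    /** Returns true only if the bytes form a complete, well-formed UTF-8 sequence. */
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // A truncated sequence at the end of input is reported as malformed.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] complete = {(byte) 0xC3, (byte) 0xA9};  // %C3%A9, the letter e with acute accent
        byte[] truncated = {'a', 'b', (byte) 0xC3};    // ab%C3, lead byte without its continuation
        System.out.println(isWellFormedUtf8(complete));   // true
        System.out.println(isWellFormedUtf8(truncated));  // false
    }
}
```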

I think that there are several issues in the two overloaded methods org.eclipse.jetty.util.UrlEncoded.decodeUtf8To.
The UrlEncoded class has two decodeUtf8To overloads: the first accepts a byte array as its first parameter, while the second takes an InputStream. The first is used in scenario (A) and the second in scenario (B).

Both of them use a Utf8StringBuilder to temporarily store the token currently being parsed. But when the token is converted into a String we always call buffer.toString(), which can throw that exception if the bytes are not a valid UTF8 sequence.
In (A.2) and (B.2), that call is inside a try-catch, but the catch block does nothing, so the buffer is not reset and the value is not added to the map.
In (B.1), the call to toString() is outside the try-catch, so the exception bubbles up.
Scenario (A.1) is fine, because in that case (and only there) we use buffer.toReplacedString(), which has a much safer behavior: if the last character is not a valid UTF8 sequence, Utf8Appendable.REPLACEMENT is appended, the exception is logged (but not thrown) and the resulting string is returned.
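
The difference between a throwing conversion and a replacing one can be sketched with the JDK alone. This is only an analogy under stated assumptions: strict() and replacing() are hypothetical names standing in for the toString() and toReplacedString() behavior described above, and none of this is Jetty's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the two conversion styles; not Jetty code.
class StrictVsReplacingDecode {

    /** Strict style: malformed or truncated input raises an exception. */
    static String strict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Replacing style: malformed input becomes U+FFFD and a string is always returned. */
    static String replacing(byte[] bytes) {
        // new String(byte[], Charset) substitutes the replacement character for bad input.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] truncated = {'a', 'b', (byte) 0xC3};  // ab%C3
        System.out.println(replacing(truncated).equals("ab\uFFFD"));  // true
        try {
            strict(truncated);
        } catch (CharacterCodingException e) {
            System.out.println("strict decode rejected the input");
        }
    }
}
```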

IMHO, this is the correct behavior, so in the org.eclipse.jetty.util.UrlEncoded.decodeUtf8To methods we should replace the Utf8StringBuilder.toString calls with Utf8StringBuilder.toReplacedString.
Or am I missing something?
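
To make the expected outcome concrete, here is a minimal sketch of a parser that applies the replacing conversion to every token, so a badly encoded token in the middle is kept instead of dropped. It is a simplified stand-in and not a patch to decodeUtf8To: LenientQueryParser and its methods are hypothetical, and it assumes the raw input contains only ASCII characters.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the proposed behavior; not Jetty code.
class LenientQueryParser {

    /** Splits on '&' and '=', decoding every key and value leniently. */
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    /** Percent-decodes a token to bytes, then decodes as UTF-8 with replacement. */
    static String decode(String token) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean hasTwoMore = i + 2 < token.length();
            int hi = hasTwoMore ? Character.digit(token.charAt(i + 1), 16) : -1;
            int lo = hasTwoMore ? Character.digit(token.charAt(i + 2), 16) : -1;
            if (c == '%' && hi >= 0 && lo >= 0) {
                out.write((hi << 4) | lo);  // one decoded byte
                i += 2;
            } else if (c == '+') {
                out.write(' ');
            } else {
                out.write(c);  // assumes ASCII input; other chars would be truncated
            }
        }
        // Malformed or truncated sequences become U+FFFD instead of throwing.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The badly encoded token in the middle is kept, with U+FFFD as its value.
        Map<String, String> params = parse("a=%C3&b=1");
        System.out.println(params.size());  // 2
        System.out.println(params.get("b"));  // 1
    }
}
```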

-- Ugo
