Community
Participate
Working Groups
Created attachment 254336 [details] Patch, test case, logs and captures When re-connecting to the same server and port using SSL, Paho prevents the default Java SSL session resumption mechanism, i.e. an abbreviated SSL handshake for subsequent connections. Being able to resume the old SSL session is particularly important for slow (GPRS) connections with limited network traffic (some MBi/month). The issue is related to how Paho needs to stop the receiver thread and how it reads the socket input stream. Paho uses a blocking read of the socket input stream with a very long (infinite?) socket timeout. Due to this, in order to stop the receiver thread it has to forcibly close the socket while the thread is blocked on the read. This causes the read to throw a SocketException which has the effect to also invalidate the SSL session. Since the session has been invalidated, the next connection will perform a full SSL handshake wasting precious GPRS traffic. Please see the attachment for the proposed patch, test case, SSL debug logs and Wireshark captures. Please note that the proposed fix, despite it's just a single line change and should not cause any harm, might work only on some JVMs/OSs (we use Oracle Java SE 7 on Linux OS). The right fix would imply rewriting how Paho reads the input stream using short socket timeouts.
Hi. The attachment doesn't look right. Is it supposed to be a zip file?
Comment on attachment 254336 [details] Patch, test case, logs and captures Removed
Created attachment 254338 [details] Patch, test case, logs and captures
Sorry, The proposed patch does not work. The call to shutdownInput() actually throws an UnsupportedOperationException since is not supported by an SSLSocket. The exception is actually caught by a block which ignores it. Please ignore the proposed patch in the attachment. The analysis is OK though.
Created attachment 254479 [details] Implement socket read timeout Hi, This patch tries to fix the issue by implementing a 1 second socket read timeout. This allows the receiver thread to check if it is asked to exit and allows to avoid closing the socket while the thread is reading from the socket input stream. It looks ok to me (all tests pass). The 1 second timeout might be questionable since it causes the read to throw a SocketTimeoutException every second. It is now hardcoded in the TCPNetworkModule but must be known by CommsReceiver (see WARNING comment). The timeout should be probably configurable by the application.
I've been talking to one of our Java experts who doesn't think that the closing of the socket in the way that the Paho client does it is the cause of the SSL session not being able to be resumed. I haven't found any documentation about SSL session resumption in Java, yet. Do you know of any? The introduction of the socket read timeout does introduce a variable which I'm uncomfortable about adding unless we're sure this is the best or only approach.
Hi, Thanks for the feedback. > I've been talking to one of our Java experts who doesn't think that the closing of the socket in the way that the Paho client does it is the cause of the SSL session not being able to be resumed. The Java SSL log (obtained with -Djavax.net.debug=all) is in the original attachment. This also includes the Wireshark captures. You should be able to recreate the issue with the provided test case (in the attachment), at least on Linux and the Oracle Java SE 7u45 64 bit. > I haven't found any documentation about SSL session resumption in Java, yet. Do you know of any? Unfortunately I don't. Experimentally I observed that every time the socket is closed while a read is in progress, a SocketException is thrown (documented in the Javadoc) and the session is invalidated (see attached logs). > The introduction of the socket read timeout does introduce a variable which I'm uncomfortable about adding unless we're sure this is the best or only approach. I agree. Unfortunately I could not find any other method and the patched version seems to work fine so far. No need to say that the abbreviated SSL hadshake allows to save some kBi that on a GPRS link with 1MBi/month of traffic is not negligible (on a GPRS link connection losses can occur quite often). Please try the test case on the original and patched version and see the difference with Wireshark. Regards, Cristiano
Created attachment 255674 [details] Always modify the socket read timeout during the SSL handshake After enabling the socket read timeout, on slow cellular links, the SSL connection handshake timeouts due to the following code of org.eclipse.paho.client.mqttv3.internal.SSLNetworkModule: if ( soTimeout == 0 ) { // RTC 765: Set a timeout to avoid the SSL handshake being blocked indefinitely socket.setSoTimeout(this.handshakeTimeoutSecs*1000); } As the comment suggests, the code was intended to timeout the SSL handshake when the default socket timeout (soTimeout) was 0 (infinite). Since my patch sets a 1 second socket read timeout, the above condition is false and the SSL connection handshake will timeout on slow links. This new patch supersedes the old one by skipping the above check for soTimeout and always modifies the timeout during the SSL handshake. The patch is cumulative and includes the previous patch "254479: Implement socket read timeout" plus what described here.
Thanks Cristiano, for all your work on this one. I'd like to think there is a different, perhaps better, solution to the having a one second timeout on the read, and am going to ask on the mailing list to see if there are any other suggestions or advice.
Hello, This patch has been in production for the last two releases of our ESF framework. It seems to work fine. Note that some of our gateways are connected over GPRS cellular links often with unstable connectivity. What about submitting the patch as a pull request to the Paho github repository for others to review it? Regards, Cristiano
(In reply to Cristiano De Alti from comment #10) > What about submitting the patch as a pull request to the Paho github > repository for others to review it? That sounds like a very reasonable approach :) Cristiano, are you familiar with how Gerrit works for submitting your PR?
I think that's reasonable too. Even though I'm still somewhat uncomfortable about making this change, I've not found any alternative suggestion that works so far. I guess if we make the change and try it out in our tests, then we'll get some idea of whether it's going to cause any problems for existing applications. I'm going to assign this to James to implement. Cristiano, could you please sign the Eclipse CLA? https://eclipse.org/legal/CLA.php Additionally, a contributing guide is here: http://git.eclipse.org/c/paho/org.eclipse.paho.mqtt.java.git/about/ but I'm thinking we can take the patch, if you sign the CLA. Thanks!
Thanks, I've signed the CLA. Does this mean I don't need to submit the PR through Gerrit and the attached patch will do? Regards, Cristiano
Hello, Can anyone confirm this issue? Does the submitted patch work for you? Regards, Cristiano
Hi Cristiano, I'll take a look at the patch & test cases today. Just for reference, we've just migrated from the Eclipse git and Bugzilla to GitHub. But are waiting for these bugs to be made read only. The GitHub Issue for this bug is here: https://github.com/eclipse/paho.mqtt.java/issues/28