Bug 469947 - Paho prevents the default Java SSL session resumption
Summary: Paho prevents the default Java SSL session resumption
Status: ASSIGNED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Paho (show other bugs)
Version: unspecified   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: James Sutton CLA
QA Contact: Ian Craggs CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-11 09:56 EDT by Cristiano De Alti CLA
Modified: 2016-02-05 11:16 EST (History)
3 users (show)

See Also:


Attachments
Patch, test case, logs and captures (417.05 KB, text/plain)
2015-06-11 09:56 EDT, Cristiano De Alti CLA
no flags Details
Patch, test case, logs and captures (215.37 KB, application/octet-stream)
2015-06-11 10:23 EDT, Cristiano De Alti CLA
no flags Details
Implement socket read timeout (9.16 KB, patch)
2015-06-16 11:47 EDT, Cristiano De Alti CLA
no flags Details | Diff
Always modify the socket read timeout during the SSL handshake (10.22 KB, patch)
2015-08-06 10:29 EDT, Cristiano De Alti CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Cristiano De Alti CLA 2015-06-11 09:56:54 EDT
Created attachment 254336 [details]
Patch, test case, logs and captures

When re-connecting to the same server and port using SSL, Paho prevents the default Java SSL session resumption mechanism, i.e. an abbreviated SSL handshake for subsequent connections.

Being able to resume the old SSL session is particularly important for slow (GPRS) connections with limited network traffic (some MBi/month).

The issue is related to how Paho needs to stop the receiver thread and how it reads the socket input stream.

Paho uses a blocking read of the socket input stream with a very long (infinite?) socket timeout.

Due to this, in order to stop the receiver thread it has to forcibly close the socket while the thread is blocked on the read.

This causes the read to throw a SocketException which has the effect to also invalidate the SSL session.

Since the session has been invalidated, the next connection will perform a full SSL handshake wasting precious GPRS traffic.

Please see the attachment for the proposed patch, test case, SSL debug logs and Wireshark captures.

Please note that the proposed fix, despite it's just a single line change and should not cause any harm, might work only on some JVMs/OSs (we use Oracle Java SE 7 on Linux OS).

The right fix would imply rewriting how Paho reads the input stream using short socket timeouts.
Comment 1 Ian Craggs CLA 2015-06-11 10:11:43 EDT
Hi.  The attachment doesn't look right.  Is it supposed to be a zip file?
Comment 2 Cristiano De Alti CLA 2015-06-11 10:21:58 EDT
Comment on attachment 254336 [details]
Patch, test case, logs and captures

Removed
Comment 3 Cristiano De Alti CLA 2015-06-11 10:23:47 EDT
Created attachment 254338 [details]
Patch, test case, logs and captures
Comment 4 Cristiano De Alti CLA 2015-06-12 06:46:38 EDT
Sorry,
  The proposed patch does not work.
The call to shutdownInput() actually throws an UnsupportedOperationException since is not supported by an SSLSocket.

The exception is actually caught by a block which ignores it.

Please ignore the proposed patch in the attachment.

The analysis is OK though.
Comment 5 Cristiano De Alti CLA 2015-06-16 11:47:55 EDT
Created attachment 254479 [details]
Implement socket read timeout

Hi,

This patch tries to fix the issue by implementing a 1 second socket read timeout.
This allows the receiver thread to check if it is asked to exit and allows to avoid closing the socket while the thread is reading from the socket input stream.

It looks ok to me (all tests pass).

The 1 second timeout might be questionable since it causes the read to throw a SocketTimeoutException every second.
It is now hardcoded in the TCPNetworkModule but must be known by CommsReceiver (see WARNING comment).

The timeout should be probably configurable by the application.
Comment 6 Ian Craggs CLA 2015-06-22 05:53:14 EDT
I've been talking to one of our Java experts who doesn't think that the closing of the socket in the way that the Paho client does it is the cause of the SSL session not being able to be resumed.

I haven't found any documentation about SSL session resumption in Java, yet. Do you know of any?

The introduction of the socket read timeout does introduce a variable which I'm uncomfortable about adding unless we're sure this is the best or only approach.
Comment 7 Cristiano De Alti CLA 2015-06-22 09:55:48 EDT
Hi,
  Thanks for the feedback.

> I've been talking to one of our Java experts who doesn't think that the closing of the socket in the way that the Paho client does it is the cause of the SSL session not being able to be resumed.

The Java SSL log (obtained with -Djavax.net.debug=all) is in the original attachment.

This also includes the Wireshark captures.

You should be able to recreate the issue with the provided test case (in the attachment), at least on Linux and the Oracle Java SE 7u45 64 bit.

> I haven't found any documentation about SSL session resumption in Java, yet. Do you know of any?

Unfortunately I don't. Experimentally I observed that every time the socket is closed while a read is in progress, a SocketException is thrown (documented in the Javadoc) and the session is invalidated (see attached logs).

> The introduction of the socket read timeout does introduce a variable which I'm uncomfortable about adding unless we're sure this is the best or only approach.

I agree. Unfortunately I could not find any other method and the patched version seems to work fine so far.

No need to say that the abbreviated SSL hadshake allows to save some kBi that on a GPRS link with 1MBi/month of traffic is not negligible (on a GPRS link connection losses can occur quite often).

Please try the test case on the original and patched version and see the difference with Wireshark.

Regards,
  Cristiano
Comment 8 Cristiano De Alti CLA 2015-08-06 10:29:30 EDT
Created attachment 255674 [details]
Always modify the socket read timeout during the SSL handshake

After enabling the socket read timeout, on slow cellular links, the SSL connection handshake timeouts due to the following code of org.eclipse.paho.client.mqttv3.internal.SSLNetworkModule:

if ( soTimeout == 0 ) {
    // RTC 765: Set a timeout to avoid the SSL handshake being blocked indefinitely
    socket.setSoTimeout(this.handshakeTimeoutSecs*1000);
}

As the comment suggests, the code was intended to timeout the SSL handshake when the default socket timeout (soTimeout) was 0 (infinite).

Since my patch sets a 1 second socket read timeout, the above condition is false and the SSL connection handshake will timeout on slow links.

This new patch supersedes the old one by skipping the above check for soTimeout and always modifies the timeout during the SSL handshake.

The patch is cumulative and includes the previous patch "254479: Implement socket read timeout" plus what described here.
Comment 9 Ian Craggs CLA 2015-08-31 16:42:54 EDT
Thanks Cristiano, for all your work on this one.  I'd like to think there is a different, perhaps better, solution to the having a one second timeout on the read, and am going to ask on the mailing list to see if there are any other suggestions or advice.
Comment 10 Cristiano De Alti CLA 2015-10-02 10:03:37 EDT
Hello,
  This patch has been in production for the last two releases of our ESF framework.
It seems to work fine.
Note that some of our gateways are connected over GPRS cellular links often with unstable connectivity.
What about submitting the patch as a pull request to the Paho github repository for others to review it?

Regards,
  Cristiano
Comment 11 Benjamin Cabé CLA 2015-10-15 14:17:57 EDT
(In reply to Cristiano De Alti from comment #10)

> What about submitting the patch as a pull request to the Paho github
> repository for others to review it?


That sounds like a very reasonable approach :) Cristiano, are you familiar with how Gerrit works for submitting your PR?
Comment 12 Ian Craggs CLA 2015-10-16 06:23:16 EDT
I think that's reasonable too.

Even though I'm still somewhat uncomfortable about making this change, I've not found any alternative suggestion that works so far.  I guess if we make the change and try it out in our tests, then we'll get some idea of whether it's going to cause any problems for existing applications.

I'm going to assign this to James to implement.

Cristiano, could you please sign the Eclipse CLA?  
https://eclipse.org/legal/CLA.php

Additionally, a contributing guide is here:
http://git.eclipse.org/c/paho/org.eclipse.paho.mqtt.java.git/about/
but I'm thinking we can take the patch, if you sign the CLA.

Thanks!
Comment 13 Cristiano De Alti CLA 2015-10-16 10:19:23 EDT
Thanks,
  I've signed the CLA.
Does this mean I don't need to submit the PR through Gerrit and the attached patch will do?

Regards,
  Cristiano
Comment 14 Cristiano De Alti CLA 2016-02-04 12:56:01 EST
Hello,
  Can anyone confirm this issue?
Does the submitted patch work for you?

Regards,
  Cristiano
Comment 15 James Sutton CLA 2016-02-05 05:16:42 EST
Hi Cristiano,
I'll take a look at the patch & test cases today. Just for reference, we've just migrated from the Eclipse git and Bugzilla to GitHub. But are waiting for these bugs to be made read only. The GitHub Issue for this bug is here: https://github.com/eclipse/paho.mqtt.java/issues/28