Bug 462474 - CommsReceiver disconnecting on unknown PUBREC
Status: UNCONFIRMED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Paho
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Bin Zhang CLA
QA Contact: Ian Craggs CLA
Reported: 2015-03-18 10:57 EDT by Martijn Stellinga CLA
Modified: 2016-02-05 11:16 EST

Description Martijn Stellinga CLA 2015-03-18 10:57:11 EDT
Whenever the CommsReceiver class receives an MqttAck message, it looks up the token of the originally published message and, if no token is found, throws an exception (CommsReceiver line 123).

Unfortunately, we have a production system where our application crashed, after which our MQTT broker (mosquitto) sent a PUBREC for a message that the application no longer knows about.
Because the CommsReceiver throws an exception, it disconnects.
Since the PUBREC is never acknowledged, Mosquitto resends it every time the application tries to connect. We can only solve this by clearing the Mosquitto message database, which means we lose any other valid messages that are still queued.

It seems it would be more robust if the CommsReceiver acknowledged PUBREC messages even when it has no token for them, and logged a warning.
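
A minimal sketch of the behaviour proposed above, for illustration only. It is not Paho code: the class LenientPubRecSketch, its in-flight map, and the string-based "sent" and "warnings" lists are hypothetical stand-ins for the client's token store, outgoing packet queue, and logger. It assumes that acknowledging a PUBREC means replying with a PUBREL carrying the same packet ID, as the MQTT QoS 2 flow requires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a receiver that tolerates a PUBREC with no matching token. */
final class LenientPubRecSketch {
    // Stand-in for the token store: packet ID -> topic of the in-flight QoS 2 publish.
    private final Map<Integer, String> inFlight = new HashMap<>();
    // Stand-ins for the outgoing packet queue and the logger.
    final List<String> sent = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    void trackPublish(int packetId, String topic) {
        inFlight.put(packetId, topic);
    }

    /** Always answers with PUBREL; an unknown packet ID only produces a warning. */
    void onPubRec(int packetId) {
        if (!inFlight.containsKey(packetId)) {
            warnings.add("PUBREC for unknown packet id " + packetId + ", acknowledging anyway");
        }
        sent.add("PUBREL " + packetId);
    }

    public static void main(String[] args) {
        LenientPubRecSketch receiver = new LenientPubRecSketch();
        receiver.trackPublish(1, "some/topic");
        receiver.onPubRec(1);   // known ID: PUBREL, no warning
        receiver.onPubRec(42);  // unknown ID: PUBREL plus a warning, no exception
        System.out.println(receiver.sent);
        System.out.println(receiver.warnings);
    }
}
```

With this shape the connection stays up and the broker can finish the handshake and drop the stale PUBREC. Whether that is safe for the real client depends on how its persistence recovers after a crash.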
Comment 1 Bin Zhang CLA 2015-03-19 03:09:00 EDT
Per my understanding, the use case is:

Client publishes a message with QoS2, and server receives this message and replies with PUBREC, but client can never acknowledge this PUBREC because it cannot find a stored PUBLISH with the same packet ID.  

I think the client shouldn't discard the message until it has received the PUBREC, so at this point the client is still the message owner. It should consider the message not to have arrived at the server, and should resend the PUBLISH. This probably means the client data store is corrupted. I also don't understand why the server would keep sending the PUBREC unless it receives another PUBLISH from the client, because ownership of the message hasn't been transferred to the server yet.

And yes, I agree that it would be more robust to just acknowledge the PUBREC in this case,
but I'm not sure it's a good idea. At the very least, I think we need to make sure the client store does not lose any QoS 1 and 2 messages even if the client crashes.

cc Ian, WDYT?
Comment 2 Maarten van Schouwenburg CLA 2015-11-02 11:00:45 EST
I may have found a situation where this can occur, without the client data store being corrupted.

If the Paho client sends a PUBLISH with the dup flag set at the exact moment the server sends the PUBREC, the server (mosquitto) will respond to the second message too, which results in 2 PUBREC messages being sent.

If we just ignore the fact that this one is unknown and simply call clientState.notifyReceivedAck((MqttAck)message); we may get another duplicate message on PUBCOMP.

excerpt from mosquitto.log:

1446478939: Received PUBLISH from backend1 (d0, q2, r0, m49638, '/v1/account/xxxxx/devices/xxxxxx/status', ... (103 bytes))
1446478939: Sending PUBREC to backend1 (Mid: 49638)
1446478959: Received PUBLISH from backend1 (d1, q2, r0, m49638, '/v1/account/xxxxx/devices/xxxxxx/status', ... (103 bytes))
1446478959: Sending PUBREC to backend1 (Mid: 49638)
1446478959: Sending PUBREC to backend1 (Mid: 49638)
1446478972: Received PUBREL from backend1 (Mid: 49638)
1446478972: Sending PUBCOMP to backend1 (Mid: 49638)
1446478973: Received PUBREL from backend1 (Mid: 49638)
1446478973: Sending PUBCOMP to backend1 (Mid: 49638)
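
One way to picture the duplicate handling discussed in this comment is a small sender-side state table that treats repeated PUBREC and PUBCOMP packets as harmless. This is a sketch under assumptions, not the Paho implementation: DuplicateAckSketch, its State enum, and the string lists are hypothetical, and it simply replays two PUBRECs followed by two PUBCOMPs for one packet ID, as in the tail of the log above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sender-side QoS 2 state table that tolerates duplicate acks. */
final class DuplicateAckSketch {
    enum State { AWAITING_PUBREC, AWAITING_PUBCOMP }

    private final Map<Integer, State> outbound = new HashMap<>();
    // Stand-ins for the outgoing packet queue and the logger.
    final List<String> sent = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    void publishQos2(int packetId) {
        outbound.put(packetId, State.AWAITING_PUBREC);
        sent.add("PUBLISH " + packetId);
    }

    /** Every PUBREC is answered with PUBREL; duplicates and unknown IDs only warn. */
    void onPubRec(int packetId) {
        State state = outbound.get(packetId);
        if (state == null) {
            warnings.add("PUBREC for unknown id " + packetId);
        } else if (state == State.AWAITING_PUBCOMP) {
            warnings.add("duplicate PUBREC for id " + packetId);
        } else {
            outbound.put(packetId, State.AWAITING_PUBCOMP);
        }
        sent.add("PUBREL " + packetId);
    }

    /** The first PUBCOMP completes the flow; a later one is ignored with a warning. */
    void onPubComp(int packetId) {
        if (outbound.remove(packetId) == null) {
            warnings.add("PUBCOMP for unknown id " + packetId + " ignored");
        }
    }

    public static void main(String[] args) {
        DuplicateAckSketch client = new DuplicateAckSketch();
        client.publishQos2(49638);
        client.onPubRec(49638);
        client.onPubRec(49638);   // duplicate PUBREC
        client.onPubComp(49638);
        client.onPubComp(49638);  // duplicate PUBCOMP
        System.out.println(client.sent);
        System.out.println(client.warnings);
    }
}
```

In this model the second PUBREC triggers a second PUBREL, which matches the two PUBREL/PUBCOMP pairs in the log, and the second PUBCOMP is dropped instead of raising an exception.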
Comment 3 James Sutton CLA 2016-02-05 06:09:35 EST
Migrated to GitHub Issue: https://github.com/eclipse/paho.mqtt.java/issues/27