[mosquitto-dev] libmosquitto fails to do backoff

Hello guys,

When using mosquitto_loop_forever, if the broker closes the connection while libmosquitto is actively sending data, the backoff never happens. This can cause a very fast connect/disconnect loop that can bring down both parties. More specifically, the problem shows up with QoS > 0, where the message gets buffered and retried right after reconnecting.

I managed to produce a simple piece of code that reproduces it.

Here is the C portion:
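#include <mosquitto.h>

int main(void) {
    mosquitto_lib_init();
    struct mosquitto *m = mosquitto_new("test_id", false, NULL);

    mosquitto_connect(m, "localhost", 1885, 60);
    /* QoS 1, so the message stays queued and is retried after each reconnect */
    mosquitto_publish(m, NULL, "test", 3, "hey", 1, false);
    /* 1 ms timeout per loop iteration; reconnects automatically on disconnect */
    mosquitto_loop_forever(m, 1, 1);
}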

I also wrote a fake "broker" that only accepts the connection, sends a CONNACK, and then drops it. My real scenario is not this crude, but this version makes it predictable, and the result is the same. Here is the code:

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(('', 1885))
    s.listen(1)

    while True:
        conn, _ = s.accept()
        with conn:
            print("connected")
            conn.send(b"\x20\x02\x01\x00") # CONNACK

With these two pieces of code I was able to bisect and pinpoint commit a3ebeff9d732458a4dac7513fac10a52a97cf4d1 as the one that broke the library (sometime between 1.6.9 and 1.6.10).

The issue seems to be caused by interruptible_sleep calling select on a socket, mosq->sockpairR, that is being written to at the same time that the connection drops. The select returns immediately because the socket is readable, so the backoff sleep is skipped and a new connection starts right away.

I confirmed that this is the place with a quick hack that empties mosq->sockpairR right before the select call. This made the problem go away.
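
To make the idea concrete, here is a minimal standalone sketch of that workaround. It is not the libmosquitto code: make_wakeup_pair, drain_pending and backoff_sleep are hypothetical names made up for this illustration, and I am assuming a plain AF_UNIX socketpair with a non-blocking read end stands in for mosq->sockpairR. It only shows how a stale byte on the wakeup socket cuts the sleep short, and how draining before select restores the full delay.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: create a socket pair whose read end (fds[0]) is
 * non-blocking, standing in for the library's wakeup socket. */
int make_wakeup_pair(int fds[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;
    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}

/* Discard every byte currently queued on a non-blocking descriptor.
 * Returns the number of bytes thrown away. */
long drain_pending(int fd)
{
    char buf[64];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n < 0 && errno == EINTR) continue;
        break; /* 0: peer closed; EAGAIN/EWOULDBLOCK: nothing left to read */
    }
    return total;
}

/* Sleep for up to timeout_ms, waking early if wake_fd becomes readable.
 * Returns 1 if woken early, 0 if the full timeout elapsed, -1 on error.
 * With drain_first set, stale wakeup bytes are discarded before waiting. */
int backoff_sleep(int wake_fd, int timeout_ms, int drain_first)
{
    fd_set rfds;
    struct timeval tv;

    if (drain_first) drain_pending(wake_fd);

    FD_ZERO(&rfds);
    FD_SET(wake_fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(wake_fd + 1, &rfds, NULL, NULL, &tv);
    return rc > 0 ? 1 : rc;
}
```

Writing one byte to fds[1] and then calling backoff_sleep(fds[0], 50, 0) returns right away, while backoff_sleep(fds[0], 50, 1) waits out the full 50 ms. The obvious downside is that draining can also swallow a legitimate wakeup written just before the sleep, which is one reason I am wary of treating this as a proper fix.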

I'm not sure how to proceed from here. I feel this code is quite delicate to touch, and while fixing this bug I might introduce another one.

If someone knows how to fix it, or can at least provide a suggestion, please let me know.

Regards,
Abilio


