[mosquitto-dev] Quadratic performance loss in main loop + proof of concept fix

Hi,

One of our Mosquitto servers has about 25k clients, with almost no messages. Still, it's permanently at 90-100% CPU. Profiling with `perf top` suggests that most of that time is spent at [1], iterating over all clients at the start of each main loop iteration. That iteration seems to exist for two reasons:

1) To deal with keep-alive expiration.

2) To send out all queued messages caused by socket traffic. For instance, when someone publishes and loop_handle_reads_writes() is called, some or many clients will get queued messages because they are subscribed to the published topics.

The high CPU usage seems to be caused by the clients' keep-alive pings, sent every 60s. 25k clients sending keep-alives amounts to about 416 pings per second (when distributed perfectly). If each ping triggers a main loop iteration, and each iteration walks all 25k clients, HASH_ITER iterates about 10.4 million times per second (see *note). On a 3 GHz machine, this leaves roughly 300 CPU cycles for each iteration, and then normal work still has to be done. This scales quadratically: more clients means both more keep-alives and more clients to walk per keep-alive. Doubling the number of clients quadruples the HASH_ITER work at [1].

*Note: epoll_wait does sometimes return more than one fd with activity, so it's not exactly one iteration per keep-alive, but I've noticed that even when DoSing the process, the number of fds returned per call stays very low. That's why the main loop iterates so often.
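
The estimate above can be reproduced with a few lines of C. This is only a back-of-the-envelope model, not Mosquitto code: the function names are made up for illustration, and it assumes exactly one main loop iteration per keep-alive ping and one full pass over all clients per iteration.

```c
#include <stdint.h>

/* Keep-alive pings arriving per second, assuming they are spread evenly
 * over the keep-alive interval. Integer division rounds down. */
uint64_t pings_per_second(uint64_t clients, uint64_t keepalive_s)
{
    return clients / keepalive_s;
}

/* Client contexts visited per second, assuming every ping wakes the main
 * loop once and every wake-up walks all clients. */
uint64_t client_visits_per_second(uint64_t clients, uint64_t keepalive_s)
{
    return pings_per_second(clients, keepalive_s) * clients;
}

/* CPU cycles available per visited client if the whole core did nothing
 * but walk the client table. */
uint64_t cycles_per_visit(uint64_t cpu_hz, uint64_t clients, uint64_t keepalive_s)
{
    return cpu_hz / client_visits_per_second(clients, keepalive_s);
}
```

With 25000 clients and a 60s keep-alive this gives 416 pings and 10.4 million visits per second, or 288 cycles per visit at 3 GHz. With 50000 clients the visit count rises to about 41.6 million, roughly four times as many.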

I wrote some proof of concept code [2] (best viewed with a whitespace-ignoring diff) that only does the keep-alive checking on occasion, and puts the client contexts that actually have work pending in a list, so that we only have to iterate over those. In other words, the code inside the HASH_ITER has been split up into two blocks.
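
To illustrate the idea without the surrounding broker code, here is a minimal stand-alone sketch. It is not the code from [2] and does not use Mosquitto's real structures: `struct client`, `struct broker`, `mark_pending()`, `flush_pending()` and `check_keepalives()` are all hypothetical names, a plain array stands in for the client hash table, and the 1.5x keep-alive grace period is taken from the MQTT specification rather than from loop.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct client {
    time_t last_msg_in;          /* when we last heard from this client */
    unsigned keepalive_s;        /* keep-alive in seconds, 0 = disabled */
    bool expired;                /* set by the periodic keep-alive pass */
    bool has_queued_out;         /* outgoing messages are waiting */
    bool on_pending_list;        /* already linked into broker.pending */
    struct client *next_pending; /* intrusive singly linked list */
};

struct broker {
    struct client *clients;      /* stand-in for the client hash table */
    size_t n_clients;
    struct client *pending;      /* only clients that need a write */
    time_t last_keepalive_check;
    unsigned check_interval_s;   /* how often the full pass may run */
};

/* Block 1: called when a message is queued for a client. O(1). */
void mark_pending(struct broker *b, struct client *c)
{
    c->has_queued_out = true;
    if (c->on_pending_list)
        return;                  /* avoid linking the same client twice */
    c->on_pending_list = true;
    c->next_pending = b->pending;
    b->pending = c;
}

/* Runs every main loop iteration, but only touches affected clients.
 * Returns the number of clients visited. */
size_t flush_pending(struct broker *b)
{
    size_t visited = 0;
    struct client *c = b->pending;
    b->pending = NULL;
    while (c != NULL) {
        struct client *next = c->next_pending;
        c->next_pending = NULL;
        c->on_pending_list = false;
        c->has_queued_out = false;   /* stand-in for the real socket write */
        visited++;
        c = next;
    }
    return visited;
}

/* Block 2: the full pass over all clients, rate-limited to once per
 * check_interval_s. Returns the number of clients visited (0 if skipped). */
size_t check_keepalives(struct broker *b, time_t now)
{
    if (now - b->last_keepalive_check < (time_t)b->check_interval_s)
        return 0;
    b->last_keepalive_check = now;
    for (size_t i = 0; i < b->n_clients; i++) {
        struct client *c = &b->clients[i];
        /* MQTT allows 1.5 times the keep-alive before disconnecting. */
        if (c->keepalive_s != 0 &&
            now - c->last_msg_in > (time_t)(c->keepalive_s * 3 / 2))
            c->expired = true;
    }
    return b->n_clients;
}
```

In this sketch the per-iteration cost depends on the number of clients with queued messages, while the full walk happens at most once per `check_interval_s`, no matter how often the main loop wakes up.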

Not fully knowing the Mosquitto code base, I'd like to hear if this is a good approach, and whether I may have missed something.

Regards,

Wiebe


[1] https://github.com/eclipse/mosquitto/blob/3a13205e5e7635fab23bc26ca3898fbfdeec4836/src/loop.c#L362
[2] https://github.com/halfgaar/mosquitto/commit/0f0f6cfd3423db11581085eefed4a6af337dba75

