Re: [mosquitto-dev] Quadratic performance loss in main loop + proof of concept fix

----- Original Message -----
> From: "Wiebe Cazemier" <wiebe@xxxxxxxxxxxx>
> To: "General development discussions for the mosquitto project" <mosquitto-dev@xxxxxxxxxxx>
> Sent: Saturday, 26 September, 2020 15:03:56
> Subject: [mosquitto-dev] Quadratic performance loss in main loop + proof of	concept fix
>
> Hi,
> 
> One of our Mosquitto servers has about 25k clients, with almost no messages.
> Still, it's permanently at 90-100% CPU. Profiling with `perf top` suggests that
> most of that time is spent at [1], iterating over all clients at the start of
> each main loop iteration. This loop seems to be there for two reasons:
> 
> 1) To deal with keep-alive expiration.
> 
> 2) To send out all messages queued as a result of socket traffic. For instance,
> when someone publishes and loop_handle_reads_writes() is called, some or many
> clients will get queued messages because they are subscribed to the published
> topic.
> 
> The high CPU usage seems to be caused by the clients' keep-alives, sent every
> 60s. 25k clients sending keep-alives is about 416 pings every second (when
> distributed perfectly), which means that HASH_ITER iterates about 10.4 million
> times per second (see *note). On a 3 GHz machine, this leaves roughly 300 CPU
> cycles for each iteration, and then normal work still has to be done. And this
> scales quadratically: more clients means more keep-alives and more iterations
> per keep-alive. Doubling the number of clients quadruples the HASH_ITER
> iterations at [1].
> 
> *Note: epoll_wait does sometimes return more than one fd with activity, so it's
> not exactly one main loop iteration per keep-alive, but I've noticed that even
> when DoSing the process, the number of fds per call stays very low. That's why
> the main loop iterates so often.
> 
> I wrote some proof-of-concept code [2] (best viewed with a whitespace-ignoring
> diff) that only does the keep-alive checking on occasion, and puts the client
> contexts that are actually affected in a list, so that we only have to iterate
> over those. In other words, the code inside the HASH_ITER has been split into
> two blocks.
> 
> Not fully knowing the Mosquitto code base, I'd like to hear if this is a good
> approach, and whether I may have missed something.
> 
> Regards,
> 
> Wiebe
> 
> 
> [1]
> https://github.com/eclipse/mosquitto/blob/3a13205e5e7635fab23bc26ca3898fbfdeec4836/src/loop.c#L362
> [2]
> https://github.com/halfgaar/mosquitto/commit/0f0f6cfd3423db11581085eefed4a6af337dba75
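
The numbers in the quoted message can be re-derived with a few lines of C. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes evenly spread pings, one main loop pass per ping, and a full scan of all clients on every pass.

```c
#include <assert.h>
#include <stdio.h>

/* Keep-alive pings arriving per second, assuming they are spread evenly. */
static double pings_per_second(double clients, double keepalive_s)
{
    assert(keepalive_s > 0.0);
    return clients / keepalive_s;
}

/* Per-client loop iterations per second, assuming (worst case) one main
 * loop pass per ping and a full scan of all clients on every pass. */
static double iterations_per_second(double clients, double keepalive_s)
{
    return pings_per_second(clients, keepalive_s) * clients;
}

/* CPU cycles available to each iteration on one core running at cpu_hz. */
static double cycles_per_iteration(double clients, double keepalive_s,
                                   double cpu_hz)
{
    return cpu_hz / iterations_per_second(clients, keepalive_s);
}

/* Print the estimate for one client count. */
static void print_estimate(double clients, double keepalive_s, double cpu_hz)
{
    printf("clients=%.0f pings/s=%.1f iterations/s=%.0f cycles/iteration=%.0f\n",
           clients,
           pings_per_second(clients, keepalive_s),
           iterations_per_second(clients, keepalive_s),
           cycles_per_iteration(clients, keepalive_s, cpu_hz));
}
```

Calling print_estimate(25000, 60, 3e9) and then print_estimate(50000, 60, 3e9) from a main() shows the iteration count growing by a factor of four while the client count only doubles, because the client count appears twice in the product.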

As a follow-up: I worked on it some more to make it complete, and the tests now pass. My branch 'wiebe-optimize-event-loop' [1] has the code. It saves time by not iterating over all clients on every main loop iteration, and by only checking keep-alives occasionally.
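
To make the idea concrete, here is a minimal sketch of the two-block split. It is not the code on the branch: all struct and function names are hypothetical, a plain array and a linked list stand in for Mosquitto's uthash table, and the flush step only clears a flag where the real broker would write packets to the socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified client context (not Mosquitto's struct mosquitto). */
struct client {
    long last_msg_in;            /* time of the last packet received, in seconds */
    unsigned keepalive;          /* keep-alive interval in seconds, 0 = disabled */
    int has_queued_output;       /* set when a message is queued for this client */
    int on_pending_list;         /* guards against inserting the client twice */
    int disconnected;
    struct client *next_pending;
};

/* Hypothetical broker state. */
struct broker {
    struct client *clients;      /* all clients (stand-in for the hash table) */
    size_t client_count;
    struct client *pending_head; /* only clients that have queued output */
    long last_keepalive_check;
    unsigned keepalive_check_interval; /* seconds between full scans */
};

/* Called whenever a message is queued for c. O(1). */
static void broker_mark_pending(struct broker *b, struct client *c)
{
    assert(b != NULL && c != NULL);
    c->has_queued_output = 1;
    if (c->on_pending_list) return;
    c->on_pending_list = 1;
    c->next_pending = b->pending_head;
    b->pending_head = c;
}

/* Block 1: the full scan, run at most once per keepalive_check_interval.
 * Returns the number of clients examined (0 if the scan was skipped). */
static size_t broker_check_keepalives(struct broker *b, long now)
{
    if (now - b->last_keepalive_check < (long)b->keepalive_check_interval)
        return 0;
    b->last_keepalive_check = now;
    for (size_t i = 0; i < b->client_count; i++) {
        struct client *c = &b->clients[i];
        /* MQTT gives a client one and a half times its keep-alive interval. */
        if (c->keepalive && now - c->last_msg_in > (long)(c->keepalive * 3 / 2))
            c->disconnected = 1;
    }
    return b->client_count;
}

/* Block 2: run on every main loop pass, but touches only pending clients.
 * Returns the number of clients visited. */
static size_t broker_flush_pending(struct broker *b)
{
    size_t visited = 0;
    struct client *c = b->pending_head;
    b->pending_head = NULL;
    while (c != NULL) {
        struct client *next = c->next_pending;
        c->next_pending = NULL;
        c->on_pending_list = 0;
        if (!c->disconnected)
            c->has_queued_output = 0; /* stand-in for the actual socket write */
        visited++;
        c = next;
    }
    return visited;
}
```

With this shape, a main loop pass triggered by a single ping costs time proportional to the number of clients that actually have output waiting, and the full scan cost is paid only once per check interval. The price is that a keep-alive timeout may be noticed up to one check interval late.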

I did some benchmarks; see the attachment. I used my MqttLoadSimulator [2] for them.

As you can see, the load caused by idle clients in particular is reduced significantly, but there is also an improvement when only active clients are connected.

The code is based on master, and I see that in develop at least some of the event loop code pertaining to keep-alives has changed. The idea can of course be redone on the develop branch. Still, I'd like to know whether this change is perhaps a bad idea, or whether it conflicts with other development plans.

Regards,

Wiebe



[1] https://github.com/halfgaar/mosquitto/tree/wiebe-optimize-event-loop
[2] https://github.com/halfgaar/MqttLoadSimulator






Attachment: Mosquitto-benchmarks.png
Description: PNG image

