Re: [mosquitto-dev] Storing all Messages in Database - Avoiding Single Subscriber `#` Anti-pattern

Hi Karl,

On Wed, Sep 7, 2016 at 8:21 PM, Karl Palsson <karlp@xxxxxxxxxxxx> wrote:
> Yes. that's it. My point is that if you have a bridge
> configuration that results in all data being available
> everywhere, then you've already made the bottleneck. Replacing a
> second localhost process (let's call it a "subscriber" for fun)
> with extra code in the main process will surely give you an
> increase in performance, but you still have a single point that
> you're expecting to process every message. Farm it out you say?
> Have your in daemon code fire it off to multiple workers? Great
> idea! Hang on, that's just like having multiple subscribers....
> on each of the nodes...

You did not watch the video again :). The point is not multiple
workers or the raw power of the backend client; the point is message
routing within the cluster and load balancing on the northbound
interface of the broker.

In the scenario I propose there is no message routing within the
cluster, so the middle node does not have all the messages flying
through it. Every broker node in the bridge takes an equal share of
the load and pumps its own messages directly into the database,
without going through a middle node.
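
To make the proposed topology concrete, here is a minimal Python
sketch. It assumes a hypothetical per-node `on_publish` persistence
hook (Mosquitto does not expose a hook under this name; it is purely
illustrative), and an in-memory SQLite table stands in for the
database. Each node writes only the messages its own clients publish,
so no node relays another node's traffic.

```python
import sqlite3


class BrokerNode:
    """Hypothetical broker node that persists its own clients' publishes."""

    def __init__(self, name, db):
        self.name = name
        self.db = db  # shared connection, standing in for the real database

    def on_publish(self, topic, payload):
        # Hypothetical hook: called when a client connected to THIS node
        # publishes. The node writes straight to the database; nothing is
        # bridged to a middle node first.
        self.db.execute(
            "INSERT INTO messages (node, topic, payload) VALUES (?, ?, ?)",
            (self.name, topic, payload),
        )


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (node TEXT, topic TEXT, payload BLOB)")

nodes = {name: BrokerNode(name, db) for name in ("A", "B", "C")}
nodes["A"].on_publish("sensors/1/temp", b"21.5")
nodes["B"].on_publish("sensors/2/temp", b"19.0")
nodes["C"].on_publish("sensors/3/temp", b"23.1")
nodes["C"].on_publish("sensors/3/hum", b"40")

# Every message is stored, and each row was written by the node that received it.
for node, count in db.execute(
    "SELECT node, COUNT(*) FROM messages GROUP BY node ORDER BY node"
):
    print(node, count)
```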

The VerneMQ people agree that this is a problem, even though Erlang
is much easier to cluster than C:
https://github.com/erlio/vernemq/issues/197

> What I'm suggesting is that if you want to save all your
> messages, save them on the front mqtt nodes

Umm... I did not get this... My MQTT broker provides the MQTT API
that devices connect to. I cannot tell devices to connect to an
intermediate proxy that leaks data to the database before forwarding
it to the MQTT broker. And what would this proxy be anyway, another
MQTT broker?

> don't aggregate them
> all into one place and then complain that it's too many to
> process in one place.

I am not trying to aggregate everything in one place; that is how
MQTT works. I do not have a mechanism to subscribe, on each node, to
all the messages of only that particular node (one could try splitting
the topics, but even with this solution there is still no guarantee
that all the backend clients will not connect to the same node).

> Calling it a "#" wildcard subscriber or a
> code patch to do it in the same process doesn't change that
> anywhere near as meaningfully as simply not aggregating it all in
> the first place.

Using `#` is not scalable in a cluster. It is perfectly fine to use
it in scenarios where you have only one broker (in fact, it is exactly
what should be done there), and I would say it is even OK for low
loads. But you have to understand that it is not a scalable solution.
The video I sent is from the 2lemetry guys, and Amazon acquired
2lemetry (this is where the Amazon AWS IoT service comes from), so be
sure that they would not have gone through all the hassle with Kafka
and ZooKeeper if a `#` subscription were scalable.

>>
>> So, to resolve this I do not want my DB backend client to be an
>> MQTT subscriber (especially not on the `#` topic, which makes a
>> firehose single-point-of-failure node), but rather either some
>> kind of TCP client that takes message data from within the MQTT
>> nodes internally or, most probably, a Kafka queue subscriber,
>> while the MQTT brokers (each of them) publish their messages to
>> Kafka.
>
> Or this, sure. But while it's no longer "a subscriber to #" it's
> now "a queue subscriber to all" so you still have the
> "bottleneck" that someone has to listen to everything.

No, it is a bunch of queues, one per node in the cluster. Each node
participates equally in pumping messages into its own queue, so the
load is spread equally among the MQTT nodes. How I then pick up
messages from the queues with my backend client is another matter,
and it is a really simple and well-known technique (Kafka pub/sub, for
example, or RabbitMQ, or whatever). But my goal was to spread the load
over the **MQTT brokers**, and that is what is hard to do. With `#`,
one MQTT broker takes **all the load**, and this whole investigation
is about how to avoid that scenario!
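
A minimal Python sketch of the "one queue per node" layout, with
`queue.Queue` standing in for a per-node Kafka topic or RabbitMQ queue
(an assumption for illustration only; `node_queues`, `broker_publish`
and `drain_all` are hypothetical names). Each broker enqueues only the
messages its own clients publish, and the backend client polls all
the queues.

```python
from queue import Queue, Empty

# One queue per broker node (stand-in for a per-node Kafka topic or RabbitMQ queue).
node_queues = {name: Queue() for name in ("A", "B", "C")}


def broker_publish(node, topic, payload):
    # Each broker pushes only its own clients' messages into its own queue,
    # so no broker relays traffic on behalf of the others.
    node_queues[node].put((node, topic, payload))


def drain_all(queues):
    """Backend client: poll every node's queue in turn until all are empty."""
    received = []
    pending = True
    while pending:
        pending = False
        for q in queues.values():
            try:
                received.append(q.get_nowait())
                pending = True  # got something, so make another pass
            except Empty:
                pass
    return received


broker_publish("A", "sensors/1/temp", "21.5")
broker_publish("B", "sensors/2/temp", "19.0")
broker_publish("C", "sensors/3/temp", "23.1")
broker_publish("A", "sensors/1/hum", "40")

messages = drain_all(node_queues)
print(len(messages))  # 4
```

The consumer side is deliberately naive (round-robin polling in one
process); in practice that part is handled by the queueing system,
which is exactly why it is the "easy" half of the problem.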

> I guess my concern is simply that you seem to be implying that
> subscribing to # is bad, but you still want to subscribe to
> everything, you just don't want it to be a "subscriber" and that
> seems like an artificial distinction.

I do want all the messages, but I do not want them all to be routed
through only one MQTT broker; I want all the MQTT brokers in the
cluster to participate equally in pumping out these messages.

A
|
v
B -> client
^
|
C

is different than

A -> queue \
B -> queue  --->client
C -> queue /
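
The difference between the two diagrams can be put in numbers with a
deliberately simplified Python model. It assumes each message is
counted once when a broker receives it and once when the broker sends
it on, and it ignores QoS, retained messages and bridge topic filters.
With three nodes each receiving 100 publishes, the `#` hub B does three
times the work of its peers, while per-node queues keep every broker
at the same load.

```python
from collections import Counter


def hash_subscriber_load(published, hub="B"):
    """Topology 1: every other node bridges its traffic to `hub`, where a
    single client subscribes to '#'. Counts messages in + out per broker."""
    load = Counter()
    total = sum(published.values())
    for node, n in published.items():
        load[node] += n          # in: publishes from the node's own clients
        if node != hub:
            load[node] += n      # out: forwarded over the bridge to the hub
            load[hub] += n       # in: the hub receives the bridged copy
    load[hub] += total           # out: the hub delivers everything to the '#' client
    return load


def per_node_queue_load(published):
    """Topology 2: each node pushes its own traffic to its own queue."""
    load = Counter()
    for node, n in published.items():
        load[node] += 2 * n      # in from clients + out to the node's own queue
    return load


published = {"A": 100, "B": 100, "C": 100}
print(dict(hash_subscriber_load(published)))  # {'A': 200, 'B': 600, 'C': 200}
print(dict(per_node_queue_load(published)))   # {'A': 200, 'B': 200, 'C': 200}
```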


BR,
Drasko

