Re: [mosquitto-dev] Storing all Messages in Database - Avoiding Single Subscriber `#` Anti-pattern

Drasko DRASKOVIC <drasko.draskovic@xxxxxxxxx> wrote:
> Hi Karl,
> 
> On Tue, Sep 6, 2016 at 8:03 PM, Karl Palsson
> <karlp@xxxxxxxxxxxx> wrote:
> >
> > Drasko DRASKOVIC <drasko.draskovic@xxxxxxxxx> wrote:
> >> It is strange though that in the official examples I find
> >> exactly the Single Subscriber anti-pattern:
> >> https://github.com/eclipse/mosquitto/blob/master/examples/mysql_log/mysql_log.c,
> >> line:
> >
> > Says who? Who says it's an antipattern? You want to save all the
> > messages in some "other" way, but you don't want to get all the
> > messages via even a local network connection (which is pretty
> > heavily optimized in most OSs)?
> 
> You probably missed my first e-mail on this subject, where I
> posted the link where I explained the problem in more detail:
> https://groups.google.com/forum/#!topic/rabbitmq-users/KVMNkAsW-ac.
> I did not want to repeat it all, but basically you want to look
> at this video: https://www.youtube.com/watch?v=VoTclkxSago (the
> fun part starts at the 11th minute), and read this article:
> http://www.hivemq.com/blog/mqtt-sql-database (look at the chapter
> "Isn’t the wildcard subscriber some kind of bottleneck?")

You're correct, I absolutely didn't read it, but I did understand
what you were getting at. I just failed at explaining my point.
I'll try and clarify :)

> 
> >
> > If your bridging configuration results in you creating hotspots,
> > that's a problem with how you're setting up your bridging and
> > clustering, not that you need some "special" way of writing out
> > files that's not called subscribing.
> 
> No matter how you configure the bridge, your backend database
> client will connect to only one host. And it will subscribe to
> the `#` topic, which forces all other nodes in the bridge to
> send all their messages to this node so that they can be
> pushed to the backend client.
> 
> Let's say that you have 3 nodes in the bridge, and your db
> client subscribes on Node 2 (to the `#` topic). Even if a client
> connected to Node 1 publishes something on topic XY, wanting
> to send a message to, for example, some subscriber on
> Node 3, this message will also have to go to Node 2. And so
> on, and so on. So when you make a bridge of 100 nodes, you will
> still have just one node (Node 2 in this case) that will be under
> heavy load (i.e. you will not be able to spread the load over
> all 100 nodes).
> 
> The only way to spread the load is to let each of the 100 nodes
> send its internal messages directly to the database backend
> client (rather than route them through Node 2). But this is not
> possible if your client is an MQTT subscriber (as it will be
> connected to only one node).

Yes, that's it. My point is that if you have a bridge
configuration that results in all data being available
everywhere, then you've already made the bottleneck. Replacing a
second localhost process (let's call it a "subscriber" for fun)
with extra code in the main process will surely give you an
increase in performance, but you still have a single point that
you're expecting to process every message. Farm it out, you say?
Have your in-daemon code fire it off to multiple workers? Great
idea! Hang on, that's just like having multiple subscribers...
on each of the nodes...
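
As a rough illustration of the "farm it out" idea, here is a minimal
Python sketch that splits messages across several workers by topic.
It is not mosquitto code: the function names `assign_worker` and
`share_out`, the topic names, and the choice of CRC32 as a stable hash
are all assumptions made for the example. Each worker ends up with
only part of the stream, but the shares always add up to the full
message count, so the total work has not gone anywhere.

```python
import zlib


def assign_worker(topic, num_workers):
    """Pick a worker for a topic using a stable hash (CRC32, an assumption)."""
    return zlib.crc32(topic.encode("utf-8")) % num_workers


def share_out(topics, num_workers):
    """Return how many messages each worker would have to store."""
    shares = [0] * num_workers
    for topic in topics:
        shares[assign_worker(topic, num_workers)] += 1
    return shares


# Hypothetical workload: 1000 messages spread over 50 made-up topics.
topics = [f"sensors/{i % 50}/temperature" for i in range(1000)]
shares = share_out(topics, 4)

# The per-worker counts depend on the hash; their sum is always 1000.
print(shares, sum(shares))
```

Whether those workers are threads inside the broker or separate
subscriber processes, the partitioning logic looks the same.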

What I'm suggesting is that if you want to save all your
messages, save them on the front MQTT nodes. Don't aggregate them
all into one place and then complain that it's too many to
process in one place. Whether you call it a "#" wildcard
subscriber or patch the broker to do it in the same process
matters far less than simply not aggregating it all in the first
place.
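
To put rough numbers on the three-node scenario quoted above, here is
a small Python simulation. It is only a sketch: `simulate` is a
hypothetical helper, the 300-message workload is made up, and it
counts only the messages each node must handle for storage (ordinary
subscriber routing is ignored). It compares a single "#" subscriber
on Node 2 with each node storing what its own clients publish.

```python
def simulate(num_nodes, publishes, sink_node=None):
    """Count the messages each broker node must handle for storage.

    publishes: list of node numbers (1-based) where each message arrived.
    sink_node: node hosting the single '#' subscriber, or None when every
               node stores the messages published by its own clients.
    """
    load = {node: 0 for node in range(1, num_nodes + 1)}
    for origin in publishes:
        if sink_node is None:
            load[origin] += 1       # stored on the node where it arrived
        else:
            load[sink_node] += 1    # forwarded to the node with the sink
    return load


# Assumed workload: 300 messages published evenly across three nodes.
publishes = [1 + i % 3 for i in range(300)]

central = simulate(3, publishes, sink_node=2)
local = simulate(3, publishes)

print(central)  # {1: 0, 2: 300, 3: 0}
print(local)    # {1: 100, 2: 100, 3: 100}
```

In the first case Node 2 handles all 300 messages no matter how many
nodes are added; in the second each node handles only its own share.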

> 
> So, to resolve this I do not want my DB backend client to be
> an MQTT subscriber (especially not on the `#` topic, which makes
> a firehose single-point-of-failure node) but rather some kind
> of TCP client that takes message data from within the MQTT
> nodes internally, or, most probably, a Kafka queue subscriber,
> with the MQTT brokers publishing their messages to Kafka (each
> of them).

Or this, sure. But while it's no longer "a subscriber to #" it's
now "a queue subscriber to all" so you still have the
"bottleneck" that someone has to listen to everything.

I guess my concern is simply that you seem to be implying that
subscribing to # is bad, but you still want to subscribe to
everything. You just don't want it to be a "subscriber", and that
seems like an artificial distinction.

Sincerely,
Karl P


