Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
RE: [smila-dev] Message Resequencer :: change to Agent Interface

hi,

 

thx for ur comments and  i will try to answer your concerns. I also will update the wiki according to this discussion to reflect the status quo. b/c I learned quite a bit after I posted the initial draft and this hasn’t been worked in yet.

 

> use case

the problem with the current design is a follows:

a)      we put all messages from an agent into the same Q, if they be ADDs or DELETEs no matter what. now, a queue as the name suggests, will deliver the messages as they are in sequence to a consumer. however, since we have >1 consumers -- our listeners -- there is no way we can tell in what order they are consumed, especially since we use selectors as well. (the JMS API spec makes no stipulation in that regard, because normally a Q is used in p2p connections and not 1:N. cant find the link  for it ATM, but if I do, I will post it gain)

b)      the listeners will feed the messages into their respective pipelines. these pipelines have different processing times, e.g. processing an ADD record usually takes longer than for a DEL.

c)       imagine we have an agent that produces

1.       ADD

2.       DELETE for the same resource R

the result will be likely like so: since the ADD executes longer than the DELETE the delete is likely to be executed before and the ADD adds the deleted resources to the index when in fact it is deleted in the source.

> problem with doing that in the buffer:

I really cannot see how we can solve it there in some non-messy way. maybe I'm just too thick and u have to enlighten me on the subject or u just have a really cool idea. then please post and I stop what I'm doing

 

the only way I can see this working is, if we were to remember all the IDs/hashes for all the messages we sent out and then intercept processing of those messages that get updated before they are committed to the index. such a scheme implies the following mechanics:

a)      connectivity will need to be informed of the committal of R to the index, to know that processing of that message has completed.

b)      connectivity needs to know where and how to intercept processing of the message.
BTW: in a clustered scenario this will be quite the challenge, as u will have to duplicate some of the processing flow in the processing logic as well. timeliness of intercepting and the added overhead will be quite something, to sort out!

 

I just thought of this additional scenario:

what if we have >1 processing targets? e.g. different indexes?

·         where we have one index to hold the result of a certain processing while having another to hold a different processing result, e.g. more advanced processing

·         or we just use 2 different index technologies, wanting to compare them or to have a fallback strategy

 

in these cases we would need to track several processing branches and states in connectivity.

 

> non cluster due to memory map

yes, that is right but I never said anything about this map being solely in RAM. I just used the word Map w/o tying this to the impl. kind. as we have it in DI: we can use persistent or just RAM whatever our setup warrants.

but then, I don’t think this will be much of a problem anyhow in a cluster scenario:

 

the core idea behind my suggestion is that the resequencer (rseq) will reorder the messages just before they are given to the indexing service (or any other processing target) which can then process the messages in proper order.

in most cases there will be just one node hosting the index service because it seldom makes sense to have different nodes feeding the same index:

·         often this is not supported by the technology

·         it incurs overhead as the requests need to by synchronized

 

but even if it were like so: with the current scheme we could still put the resequencer in front of the services putting the records into a Q in proper order, from which the service then take in random order.

 

to elaborate more on the mechanics:

 

a)      agent adds a seq# to all records that defines the total order of the records produced by this agent for a given data source.
(this is the change I was referring to)

b)      the router sends all messages to

a.       Queue Q0 and

b.      with a 2nd send task to another Qrseq_notify

c)       the processing pipelines will listen on Q0 normally

a.       do their work but w/o adding the records to the index

b.      send the result to Qrseq_in

d)      the rseq  will be the sole listener on

a.       Qrseq_notify.
all record's arriving here, will be put into the map (actually only the hashes) to know what was sent out by the router and is in processing. the message record are discarded.

b.      Qrseq_in.
all messages in this Q have completed processing and can be put into the index, independently if this is a DEL or ADD operation. (note: DELs can usually be sent here directly)

                                                               i.      on arrival of a record the rseq will check in his map if that hash is present and has a  higher sequence # than the current record.

1.       yes: discard the current operation on the record to be performed
(additionally we could also implement here some logic that actively removes entries from message queues that have that very hash. however, that is just a means to save processing time and is not a must)

2.       no: execute operation

 

 

some more notes:

·         when the agent is modified to produce the seq# we have no problem and can always reproduce the order in which the records where sent.  the change should be very simple to be done.
if we don’t want to change the agent interface/implementation then we could also accomplish the same thing by putting a sequencer into the Q0 that adds that seq# but then we cannot use buffering anymore.

·         the map of the rseq needs to be cleared of stale hashes by configuration a timeout that is not exceeded by a processing time, e.g. for those messages that end up in the deadletter Q

·         the rseq is impl'ed as a service and is called with 2 diff. invocations, one with the "notify" and the other with "process" hence it is called from pipelines to fit in the SMIAL landscape of things.

·         the approach is cluster safe and independent from the processing target

 

> not so easy to config

yes. but id dont have a better solution for this yet.

we just need to educate the people on this or make more specific solutions that are built into the current components, which is not so good IMO.

 

may I quote:

There is always an easy solution to every human problem—neat, plausible, and wrong.

 

http://www.bartleby.com/73/1736.html

 

 

Kind regards

Thomas Menzel @ brox IT-Solutions GmbH

 

From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Igor.Novakovic@xxxxxxxxxxx
Sent: Donnerstag, 24. September 2009 19:14
To: smila-dev@xxxxxxxxxxx
Subject: AW: [smila-dev] Message Resequencer :: change to Agent Interface

 

Hi Tom,

 

I share Daniel’s opinion on both issues.

Before you start programming (I see that you’ve already opened a dedicated branch in repository for the resequencer), please let’s discuss the problem and do some conceptual work.

 

BTW: What is the use case that you’re trying to cover with your resequencer?

 

Regards

Igor

 

Von: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] Im Auftrag von Daniel.Stucky@xxxxxxxxxxx
Gesendet: Donnerstag, 24. September 2009 17:31
An: smila-dev@xxxxxxxxxxx
Betreff: AW: [smila-dev] Message Resequencer :: change to Agent Interface

 

Hi Tom,

 

I see two drawbacks of your proposed solution(s):

1)      it will only work on one machine. In a distributed environment it is not guaranteed, that a Resequencer will get all the relevant messages concerning one Record. And as each Resequencer has it’s own map of Ids and sequence numbers two competing operations would not be recognized as such. The map has to be shared across all Resequencer instances (e.g. by using another Queue, or a database).

 

The initial idea of the Buffer component in Connectivity was to filter out and resolve competing operations before they enter the “system”, that is before they are processed. Of course this Buffer would also have to share its internal state across all instances (At the moment a Agent/Crawler is bound to one instance of Connectivity, so this distribution is not relevant, yet). In either case, the processing is left totally untouched by introducing the Buffer component., which leads me to my second issue:

 

2)      The workflow has to be adapted to architecture changes. So in order to benefit from the Resequencer business logic, workflows have to be designed in special ways (first do some processing, second store thee processed data). I think this is hard to grasp by users. BTW: would the actual storing be configurable, I mean will the Resequencer execute a BPEL pipeline or is the LuceneIndexing hardcoded ? The latter is of course no valid scenario, we have to be flexible in this regard, as users may want to store their data in arbitrary stores/indexes/whatsoever

 

 

Perhaps you could elaborate about your concerns with our initial Buffer idea ?

 

Bye,

Daniel

 

 

Von: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] Im Auftrag von Thomas Menzel
Gesendet: Dienstag, 22. September 2009 07:28
An: Smila project developer mailing list
Betreff: RE: [smila-dev] Message Resequencer :: change to Agent Interface

 

oops,

 

http://wiki.eclipse.org/SMILA/Specifications/ProcessingMessageResequencer

 

PS: all along writing this draft I had this mail open but still managed to forget to add link….

 

Kind regards

Thomas Menzel @ brox IT-Solutions GmbH

 

From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Igor.Novakovic@xxxxxxxxxxx
Sent: Montag, 21. September 2009 19:04
To: smila-dev@xxxxxxxxxxx
Subject: AW: [smila-dev] Message Resequencer :: change to Agent Interface

 

Hi Thomas,

 

Could you please provide us the link to your specification draft?

 

Cheers

Igor

 

 

Von: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] Im Auftrag von Thomas Menzel
Gesendet: Montag, 21. September 2009 18:28
An: Smila project developer mailing list
Betreff: [smila-dev] Message Resequencer :: change to Agent Interface

 

Hi,

 

I wrote a specification draft for this change. plz feel free to comment.

 

 

in order for this to implement I will need to change the interface of the agent.

 

Kind regards

Thomas Menzel @ brox IT-Solutions GmbH

 

From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Thomas Menzel
Sent: Montag, 21. September 2009 14:00
To: Smila project developer mailing list
Subject: [smila-dev] FYI :: new feature :: Message Resequencer

 

hi folks,

 

just wanted to announce and inform you that I will be working on the problem that messages don’t get out of sync when there are changes in close succession

 

this change will be tracked thru the bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=289995

 

 

Kind regards

Thomas Menzel @ brox IT-Solutions GmbH

 


Back to the top