
AW: [smila-dev] RE: FYI :: new feature :: Message Resequencer

Hi Tom,


> a) Parallel workflows should be totally legitimate and not illegal.
> 
> Imagine that you want to process a record in two completely different
> ways ending up in diff indexes. Why should we force it to run in serial
> fashion if the customer provides the computing power?
There are two "flavors" of this use case:
1) The "illegal" one:
The user defines two pipelines, each containing several preprocessing
pipelets and an indexing service at the end.
This case is "illegal" because
	a) We must assume that each preprocessing pipelet updates the record.
	b) If those two pipelines run simultaneously, the record would be
	   arbitrarily updated by _all_ the pipelets and no predictable
	   result/workflow can be guaranteed.

2) The "legal" one:
If the user really wants to store the record in two or more different
indexes than all he has to do is to construct the pipeline that does
some (complex) preprocessing (by merging the two pipelines) and at the
end of the new pipeline simply fork it with two index writer pipelets.
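
Just to make that concrete, here is a rough sketch in plain Java (the
Pipelet interface, the Record class and the pipeline class are made up
for illustration, this is not our actual pipeline API): the shared
preprocessing runs exactly once and the fork only happens at the very
end, when the record is final.

import java.util.List;

// Hypothetical pipelet interface, for illustration only.
interface Pipelet {
    Record process(Record record);
}

// Hypothetical record stub.
final class Record {
    final String id;
    String content;
    Record(String id, String content) { this.id = id; this.content = content; }
}

// One merged pipeline: serial preprocessing, fork only at the very end.
final class ForkedIndexPipeline {
    private final List<Pipelet> preprocessing;
    private final List<Pipelet> indexWriters; // e.g. one writer per target index

    ForkedIndexPipeline(List<Pipelet> preprocessing, List<Pipelet> indexWriters) {
        this.preprocessing = preprocessing;
        this.indexWriters = indexWriters;
    }

    void execute(Record record) {
        // serial preprocessing: each pipelet may update the record
        for (Pipelet pipelet : preprocessing) {
            record = pipelet.process(record);
        }
        // fork: the fully preprocessed record is written to every index
        for (Pipelet writer : indexWriters) {
            writer.process(record);
        }
    }
}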



> Also the router explicitly allows several Send tasks in its config,
> which we would have to take out.
IMO we should take it out. It just causes problems by "seducing" the
user into this pitfall.


 
> b) following your discussion it seems to me that you slowly approach the
> idea where you need to register first and unregister at the end, albeit
> you use the terms (un)lock and move the functionality into existing
> components.
Yes, the proposed changes would affect existing components such as the
blackboard, connectivity and the listener.
But there is one important difference between locking and registering:
By locking the record we also prevent it from being changed simultaneously
in the preprocessing part of the process.
By registering operations you would at best preserve their order, but you
would still be unable to prevent their parallel execution and therefore
arbitrary updates of the record.
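
To illustrate the locking side, here is a minimal sketch (plain Java; the
class and method names are assumptions, not the existing blackboard API):
while a record ID is locked, no second workflow can start working on it,
which is exactly what a pure registration of operations cannot guarantee.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of per-record locking, assuming the blackboard (or a
// small helper next to it) keeps the set of records currently in flight.
final class RecordLockRegistry {
    private final Set<String> lockedRecordIds = ConcurrentHashMap.newKeySet();

    /** Returns true if the record was free and is now locked by the caller. */
    boolean tryLock(String recordId) {
        return lockedRecordIds.add(recordId);
    }

    /** Releases the record so that buffered updates may be (re)processed. */
    void unlock(String recordId) {
        lockedRecordIds.remove(recordId);
    }

    boolean isLocked(String recordId) {
        return lockedRecordIds.contains(recordId);
    }
}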



> To solve this, I have to agree with you that we need a buffer that allows
> us to queue and consolidate subsequent PRs as long as the item in question
> is being processed.
Exactly!
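
Just to spell out what "queue and consolidate" could mean in the simplest
case (a sketch only, not a proposal for the concrete data structure):
subsequent PRs for a record that is still in flight supersede each other,
so only the latest pending PR per record ID is kept and replayed once the
record is released.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a consolidating buffer: at most one pending PR per record ID.
final class PendingRequestBuffer {

    // hypothetical PR payload; in reality this would be the queue message
    static final class ProcessingRequest {
        final String recordId;
        final long timestamp;
        ProcessingRequest(String recordId, long timestamp) {
            this.recordId = recordId;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, ProcessingRequest> pending = new ConcurrentHashMap<>();

    /** Later PRs for the same record simply replace earlier ones. */
    void buffer(ProcessingRequest request) {
        pending.put(request.recordId, request);
    }

    /** Called once the record is released; returns the consolidated PR, if any. */
    ProcessingRequest takePending(String recordId) {
        return pending.remove(recordId);
    }
}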

 
 
> New idea:
> 
> An idea that I had (but not thought thru yet) was to have such a buffer
> in connectivity myself but I don't want to delay all PRs by a fixed
> amount of time. instead I want to have pipelets just before calling the
> PT to signal to connectivity that processing has completed
I like your idea of not having the buffer operate at fixed intervals.
I would only like to suggest another implementation:
Instead of extending the buffer with callbacks and flooding it with a lot
of information it is not interested in (remember: only a very small
portion of records is changed again within short time periods), we could
use the "record locking" concept, so that the buffer proactively queries
the blackboard to find out whether a record is "ready" for reprocessing.
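
A rough sketch of what I mean (plain Java; the isLocked query and the
resubmit callback are assumptions standing in for the blackboard/lock
registry and the normal PR path): only the few buffered PRs are checked
against the lock state and pushed back into processing once their record
is released, while all other PRs pass through connectivity with no delay
at all.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch: the buffer asks for the lock state instead of receiving callbacks.
final class PollingBuffer {
    private final Map<String, Object> pendingRequests = new ConcurrentHashMap<>();
    private final Predicate<String> isLocked;  // assumed query against the blackboard
    private final Consumer<Object> resubmit;   // hands the PR back to the workflow
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    PollingBuffer(Predicate<String> isLocked, Consumer<Object> resubmit) {
        this.isLocked = isLocked;
        this.resubmit = resubmit;
        // only the buffered PRs are re-checked; everything else is never delayed
        scheduler.scheduleWithFixedDelay(this::drainReleased, 1, 1, TimeUnit.SECONDS);
    }

    void buffer(String recordId, Object processingRequest) {
        pendingRequests.put(recordId, processingRequest);
    }

    private void drainReleased() {
        for (Map.Entry<String, Object> entry : pendingRequests.entrySet()) {
            String recordId = entry.getKey();
            if (!isLocked.test(recordId)) {
                Object request = pendingRequests.remove(recordId);
                if (request != null) {
                    resubmit.accept(request);
                }
            }
        }
    }
}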


> Thought: since we use an MQ anyhow we just could open up another Q to
> send such messages back.
That is in principle the same idea as the one I've just proposed, only
that the buffer would not query the blackboard but the queue. If this is
easier to implement, I'll support it!
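
For that variant, a sketch with plain JMS (the queue name and the message
property are made up, and our actual broker setup may differ): the buffer
listens on a second queue for "record done" notifications and releases the
matching pending PR.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;
import javax.jms.Session;

// Sketch: the buffer listens on a second queue for completion messages
// instead of querying the blackboard.
final class CompletionQueueListener implements MessageListener {

    // pending PRs keyed by record ID (placeholder for the real buffer)
    private final Map<String, Object> pendingRequests = new ConcurrentHashMap<>();

    CompletionQueueListener(ConnectionFactory factory) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // hypothetical name of the notification queue
        Queue doneQueue = session.createQueue("smila.records.done");
        session.createConsumer(doneQueue).setMessageListener(this);
        connection.start();
    }

    void buffer(String recordId, Object processingRequest) {
        pendingRequests.put(recordId, processingRequest);
    }

    @Override
    public void onMessage(Message message) {
        try {
            // hypothetical message property carrying the finished record's ID
            String recordId = message.getStringProperty("recordId");
            if (recordId == null) {
                return; // not a completion notification we understand
            }
            Object pending = pendingRequests.remove(recordId);
            if (pending != null) {
                // resubmit the consolidated PR to the normal processing queue here
            }
        } catch (JMSException e) {
            // log and keep the PR buffered until the next notification arrives
        }
    }
}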


> New idea2:
> Take the core of juergen's idea and instead of opening up a buffer, map
> or a Q in addition to the recordstore, place additional information
> associated with the record not as part of it, so that it is not shared.
Sorry, but I do not understand what you mean.
Can you please rephrase your statement?

Regards
Igor

