RE: [smila-dev] RE: FYI :: new feature :: Message Resequencer

Hi Juergen,

Thanks for the effort to explain it, but I'm still not sure I really got it.

Let me therefore rephrase your idea in my own words so that you can see whether I understand it correctly:

The core idea is to have two attributes (or annotations) in the record that signal to a pipelet that a newer version is present if their values differ. The pipelet can then skip processing by omitting the record's ID from the returned Id[].

Correct?
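To make the restated idea concrete, here is a minimal Java sketch of such a checking pipelet. It is a sketch under stated assumptions, not the SMILA pipelet API. It assumes records are plain attribute maps keyed by a String ID, and the class, method, and field names are hypothetical. The pipelet keeps only the IDs whose "documentVersion" equals their "messageVersion" and returns the reduced array. The `invalidated` set stands in for the blackboard invalidation Juergen suggests below.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: records are plain attribute maps, not SMILA Record objects.
class VersionCheckPipeletSketch {
    static final String DOC_VERSION = "documentVersion";
    static final String MSG_VERSION = "messageVersion";

    // IDs whose blackboard record would be invalidated (stand-in for a real blackboard call).
    final Set<String> invalidated = new HashSet<>();

    // Returns the reduced ID array: a record is kept only if both version attributes match.
    String[] process(Map<String, Map<String, Object>> blackboard, String[] recordIds) {
        List<String> kept = new ArrayList<>();
        for (String id : recordIds) {
            Map<String, Object> record = blackboard.get(id);
            if (record != null && Objects.equals(record.get(DOC_VERSION), record.get(MSG_VERSION))) {
                kept.add(id);
            } else {
                // Outdated message (or missing record): skip processing and mark for invalidation.
                invalidated.add(id);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Blackboard after syncing an outdated message: stored version 2, message version 1.
        Map<String, Object> stale = Map.of(DOC_VERSION, 2, MSG_VERSION, 1);
        // Blackboard after syncing the current message: both versions are 2.
        Map<String, Object> current = Map.of(DOC_VERSION, 2, MSG_VERSION, 2);

        String[] ids = {"source:42"};
        String[] afterStale = new VersionCheckPipeletSketch().process(Map.of("source:42", stale), ids);
        String[] afterCurrent = new VersionCheckPipeletSketch().process(Map.of("source:42", current), ids);
        System.out.println(afterStale.length + " " + afterCurrent.length); // prints "0 1"
    }
}
```

Each listener gets a fresh pipelet instance here, so the `invalidated` set only reflects the records of one call.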

My understanding so far is:
1. we have a shared record;
2. we have JMS messages containing the serialized record, or at least parts of it, plus some JMS properties;
3. when a message is taken from the queue, its record is parsed and synced, so that the blackboard/record storage (BB/RS) always has the same picture as the pipeline;
4. the syncing then leads to differing version numbers, which can be detected;
5. the rest as above ...
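Steps 3 and 4 can be illustrated with a small Java sketch. It assumes the sync simply copies the stored record and then lets the message's attributes overwrite it. This is a simplification: the map-based records and the method name are hypothetical, not SMILA's blackboard API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "load from record storage, then sync the message record".
class SyncSketch {
    // Copies the stored attributes onto a fresh blackboard record, then lets every
    // attribute carried by the queue message overwrite the stored value.
    static Map<String, Object> loadAndSync(Map<String, Object> stored, Map<String, Object> messageRecord) {
        Map<String, Object> blackboard = new HashMap<>(stored);
        blackboard.putAll(messageRecord);
        return blackboard;
    }

    public static void main(String[] args) {
        // Record storage already holds version 2 of source:42.
        Map<String, Object> stored = Map.of("ID", "source:42", "documentVersion", 2, "messageVersion", 2);
        // The older queue message still carries messageVersion 1.
        Map<String, Object> oldMessage = Map.of("ID", "source:42", "messageVersion", 1);
        Map<String, Object> bb = loadAndSync(stored, oldMessage);
        System.out.println(bb.get("documentVersion") + " vs " + bb.get("messageVersion")); // prints "2 vs 1"
    }
}
```

After the sync, the mismatch between the stored "documentVersion" and the message's "messageVersion" is exactly what a checking pipelet would detect.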

For this to work, I think the following is assumed:
It is illegal to include the documentVersion in the JMS message, as that would void the detection mechanism; only the router is allowed to set it.
Correct?
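Following Juergen's walk-through quoted below, the router step could be sketched in Java as follows. The agent/crawler supplies both version values. The router stores the complete record and puts only the ID and "messageVersion" into the queue message. A map stands in for RecordStorage, and the class and method names are hypothetical, not the SMILA router API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the router step: not the SMILA router or RecordStorage API.
class RouterSketch {
    // Stand-in for record storage: record ID -> all stored attributes.
    final Map<String, Map<String, Object>> recordStorage = new HashMap<>();

    // Stores the complete record (overwriting an older version) and returns the filtered
    // record for the queue message: only the ID and "messageVersion", never "documentVersion".
    Map<String, Object> route(String id, Map<String, Object> attributes) {
        recordStorage.put(id, new HashMap<>(attributes));
        Map<String, Object> messageRecord = new HashMap<>();
        messageRecord.put("ID", id);
        messageRecord.put("messageVersion", attributes.get("messageVersion"));
        return messageRecord;
    }

    public static void main(String[] args) {
        RouterSketch router = new RouterSketch();
        Map<String, Object> messageA = router.route("source:42", Map.of("documentVersion", 1, "messageVersion", 1));
        Map<String, Object> messageB = router.route("source:42", Map.of("documentVersion", 2, "messageVersion", 2));
        System.out.println("A carries messageVersion " + messageA.get("messageVersion")
                + ", B carries " + messageB.get("messageVersion"));
        // Storage only holds the latest version after the second call.
        System.out.println(router.recordStorage.get("source:42").get("documentVersion")); // prints 2
    }
}
```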

How does this solution work in a concurrent situation where processing is spread over several pipelines and thus several JMS messages pointing to the same record? 

Don't we run into concurrency problems when a listener L1 reads V1 and syncs it, but a concurrent listener L2 does the same just after, before L1's pipelet makes the check?

(I think Daniel was referring to this too) 


Kind regards
Thomas Menzel @ brox IT-Solutions GmbH


> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Juergen.Schumacher@xxxxxxxxxxx
> Sent: Wednesday, October 7, 2009 17:00
> To: smila-dev@xxxxxxxxxxx
> Subject: RE: [smila-dev] RE: FYI :: new feature :: Message Resequencer
> 
> > -----Original Message-----
> > From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Thomas Menzel
> > Sent: Wednesday, October 07, 2009 2:57 PM
> > To: Smila project developer mailing list
> > Subject: RE: [smila-dev] RE: FYI :: new feature :: Message Resequencer
> >
> [...]
> >
> > What I didn't quite get was:
> > > - One of these attributes is written by the router to the record in the
> > > queue; the other one must only be stored in record storage. It's just
> > > configuration.
> >
> > And
> >
> > > - Then a simple pipelet at the start of a pipeline can filter out those
> > > records for which these attribute values are not equal (invalidate
> > > record on blackboard and do not return its ID in the pipelet result):
> >
> > Please explain this further: what exactly you mean and how it is going
> > to work.
> 
> Of course. The agent/crawler sends a record with two attributes
> containing a "version":
> 
> Record
> - ID source:42
> - Attr "documentVersion" = 1
> - Attr "messageVersion" = 1
> 
> If such a "version" value cannot be read from the source, the
> agent/crawler must generate one.
> This complete record is written to RecordStorage (actually it is
> sufficient to store ID and the
> "documentVersion" attribute, but our record storage currently does not
> support record filtering).
> A queue message is created that contains a filtered version of the
> record with ID and the
> "messageVersion" attribute:
> 
> Queue Message A
>   Record
>   - ID source:42
>   - Attr "messageVersion" = 1
> 
> Now an agent again sends this record with another version:
> 
> Record
> - ID source:42
> - Attr "documentVersion" = 2
> - Attr "messageVersion" = 2
> 
> The router overwrites the version 1 in the record storage with the new
> one and generates a message containing:
> 
> Queue Message B
>   Record
>   - ID source:42
>   - Attr "messageVersion" = 2
> 
> Now the message A is received by a listener. The record is loaded from
> record storage to the blackboard:
> 
> Blackboard Record:
> - ID source:42
> - Attr "documentVersion" = 2
> - Attr "messageVersion" = 2 (if stored in record storage)
> 
> then the message record is synced to the blackboard which overwrites
> the "messageVersion" attribute:
> 
> Blackboard Record:
> - ID source:42
> - Attr "documentVersion" = 2
> - Attr "messageVersion" = 1
> 
> and a pipelet can recognize that the message does not match the stored
> document and therefore should not be processed.
> This should be possible by removing the record ID from the list of IDs
> given to the pipelet and returning the reduced
> list. For safety the blackboard record should also be invalidated so
> the "messageVersion" attribute is not changed in
> record storage.
> 
> Later (or even at the same time), the message B is received by another
> listener, and after syncing the message record
> to the blackboard we have:
> 
> Blackboard Record:
> - ID source:42
> - Attr "documentVersion" = 2
> - Attr "messageVersion" = 2
> 
> This is OK: the checking pipelet returns the record ID and
> processing continues.
> I hope that makes it a bit clearer.
> 
> Cheers,
> Juergen.
> 
> _______________________________________________
> smila-dev mailing list
> smila-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/smila-dev
> 

