RE: [smila-dev] RE: FYI :: new feature :: Message Resequencer

> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Thomas Menzel
> Sent: Wednesday, October 07, 2009 2:57 PM
> To: Smila project developer mailing list
> Subject: RE: [smila-dev] RE: FYI :: new feature :: Message Resequencer
> 
[...]
>
> What I didn't quite get was:
> > - One of these attributes is written by the router to the record in the
> > queue; the other one only needs to be stored in record storage. It's just configuration.
> 
> And
> 
> > - Then a simple pipelet at the start of a pipeline can filter out those
> > records for which these attribute values are not equal (invalidate record on blackboard and
> > do not return its ID in the pipelet result):
> 
> Please explain further what you mean by this and how it is going to
> work.

Of course. The agent/crawler sends a record with two attributes containing a "version":

Record
- ID source:42
- Attr "documentVersion" = 1
- Attr "messageVersion" = 1

If such a "version" value cannot be read from the source, the agent/crawler must generate one.
The complete record is written to RecordStorage (storing only the ID and the "documentVersion"
attribute would actually be sufficient, but our record storage currently does not support record
filtering). The router then creates a queue message that contains a filtered copy of the record
with only the ID and the "messageVersion" attribute:

Queue Message A
  Record
  - ID source:42
  - Attr "messageVersion" = 1
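
As a rough sketch of this routing step: plain Python dicts stand in for records, a dict for the record storage and a list for the queue. The names `route`, `storage`, `queue` and `message_attrs` are made up for illustration and are not SMILA APIs.

```python
def route(record, storage, queue, message_attrs=("messageVersion",)):
    """Store the full record and enqueue a filtered copy of it."""
    # Write the complete record, overwriting any earlier version with the same ID.
    storage[record["id"]] = {"id": record["id"], "attrs": dict(record["attrs"])}
    # The queue message carries only the ID and the configured attributes.
    message = {
        "id": record["id"],
        "attrs": {k: v for k, v in record["attrs"].items() if k in message_attrs},
    }
    queue.append(message)
    return message


storage, queue = {}, []
route({"id": "source:42",
       "attrs": {"documentVersion": 1, "messageVersion": 1}}, storage, queue)
```

Which attribute goes into the message and which stays only in storage is just the `message_attrs` setting here, matching the "it's just configuration" remark.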

Now an agent sends this record again with a new version:

Record
- ID source:42
- Attr "documentVersion" = 2
- Attr "messageVersion" = 2

The router overwrites version 1 in the record storage with the new one and generates a message containing:

Queue Message B
  Record
  - ID source:42
  - Attr "messageVersion" = 2

Now message A is received by a listener. The record is loaded from record storage onto the blackboard:

Blackboard Record:
- ID source:42
- Attr "documentVersion" = 2 
- Attr "messageVersion" = 2 (if stored in record storage)

Then the message record is synced to the blackboard, which overwrites the "messageVersion" attribute:

Blackboard Record:
- ID source:42
- Attr "documentVersion" = 2
- Attr "messageVersion" = 1
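
The load-and-sync step can be sketched roughly like this, again with plain dicts instead of real records. `load_and_sync` is a hypothetical helper, not an existing SMILA function; it assumes that syncing simply lets the message attributes win over the stored ones.

```python
def load_and_sync(message, storage):
    """Load the stored record, then overlay the attributes from the message."""
    stored = storage[message["id"]]
    # Copy so that the stored record itself is not modified.
    blackboard_record = {"id": stored["id"], "attrs": dict(stored["attrs"])}
    # Attributes from the queue message overwrite the stored values.
    blackboard_record["attrs"].update(message["attrs"])
    return blackboard_record


# Storage already holds version 2, but the older message A arrives first.
storage = {"source:42": {"id": "source:42",
                         "attrs": {"documentVersion": 2, "messageVersion": 2}}}
message_a = {"id": "source:42", "attrs": {"messageVersion": 1}}
on_blackboard = load_and_sync(message_a, storage)
```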

Now a pipelet can recognize that the message does not match the stored document ("documentVersion" is 2,
"messageVersion" is 1) and therefore should not be processed. This should be possible by removing the record ID
from the list of IDs given to the pipelet and returning the reduced list. For safety, the blackboard record should
also be invalidated so that the "messageVersion" attribute is not changed in record storage.
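
A minimal sketch of such a checking pipelet, assuming the blackboard can be modeled as a dict of attribute maps plus a set of invalidated IDs, and that both attributes are present on every record. `version_check` and the blackboard layout are illustrative only; the real pipelet would use the SMILA blackboard API instead.

```python
def version_check(record_ids, blackboard):
    """Return only the IDs whose message version matches the stored document version."""
    valid = []
    for record_id in record_ids:
        attrs = blackboard["records"][record_id]
        if attrs["documentVersion"] == attrs["messageVersion"]:
            valid.append(record_id)
        else:
            # Outdated message: invalidate so nothing is written back to storage.
            blackboard["invalidated"].add(record_id)
    return valid


blackboard = {
    "records": {
        "source:42": {"documentVersion": 2, "messageVersion": 1},  # stale message
        "source:43": {"documentVersion": 5, "messageVersion": 5},  # current message
    },
    "invalidated": set(),
}
remaining = version_check(["source:42", "source:43"], blackboard)
```

Here `remaining` contains only "source:43", and "source:42" ends up in the invalidated set.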

Later (or even at the same time), message B is received by another listener. After syncing the message record
to the blackboard we have:

Blackboard Record:
- ID source:42
- Attr "documentVersion" = 2
- Attr "messageVersion" = 2

This is OK: the versions match, so the checking pipelet returns the record ID and processing continues.
I hope that makes it a bit clearer.

Cheers,
Juergen.