Re: [smila-dev] RE: FYI :: new feature :: Message Resequencer

+1

Cheers
Igor



> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On
> Behalf Of Juergen.Schumacher@xxxxxxxxxxx
> Sent: Wednesday, 7 October 2009 12:24
> To: smila-dev@xxxxxxxxxxx
> Subject: RE: [smila-dev] RE: FYI :: new feature :: Message Resequencer
> 
> Hi,
> 
> > BTW: I would be very happy if other team members would join this IMO
> > important discussion. Guys, please participate!
> 
> Sorry, this discussion started during my vacation, and I'm currently buried
> in other work, so I had some trouble catching up. I still don't think I
> completely understand the solution proposed on the wiki page. But while
> reading it, a different (but probably similar) solution came to mind that
> could probably work without extending APIs or setting up additional queues:
> 
> - An agent/crawler could set two attributes (or annotations?) with the same
>   value that somehow identifies the event, e.g. the last-modified timestamp
>   for documents from a file system, or the document version for documents
>   coming from some real CMS. Or even just a string composed of an
>   agent/crawler UUID plus some simple counter value. If the data source
>   delivers document metadata that can be used for this, it's just
>   configuration. For other data sources, an agent/crawler would have to
>   generate something.
> - One of these attributes is written by the router to the record in the
>   queue; the other one must only be stored in record storage. It's just
>   configuration.
> - Then a simple pipelet at the start of a pipeline can filter out those
>   records for which these attribute values are not equal (invalidate the
>   record on the blackboard and do not return its ID in the pipelet result).
>   If the values are not equal, it must be because another event has been
>   generated for this document, which has changed the "version attribute" in
>   the record storage but not in the currently processed event. So the
>   current event is obsolete and can be discarded.
> 
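
To make the filtering step above concrete, here is a minimal sketch in Python. It is not SMILA code: `QueueMessage`, `filter_obsolete`, and the plain dict standing in for record storage are hypothetical names chosen for illustration, and a real pipelet would work against the blackboard instead.

```python
from dataclasses import dataclass


@dataclass
class QueueMessage:
    """Hypothetical stand-in for a record taken from the queue."""
    record_id: str
    version: str  # copy of the "version attribute" written by the router


def filter_obsolete(messages, record_storage):
    """Return the IDs of records whose queued version still matches storage.

    record_storage is assumed to map record ID -> latest "version attribute".
    A mismatch means a newer event has overwritten the stored value, so the
    queued event is obsolete and its ID is not returned.
    """
    current_ids = []
    for msg in messages:
        if record_storage.get(msg.record_id) == msg.version:
            current_ids.append(msg.record_id)
    return current_ids


# Assumed example data: doc-1 was changed twice, so its first event is stale.
record_storage = {"doc-1": "v2", "doc-2": "v1"}
messages = [
    QueueMessage("doc-1", "v1"),  # obsolete: storage already holds v2
    QueueMessage("doc-1", "v2"),  # current
    QueueMessage("doc-2", "v1"),  # current
]
surviving = filter_obsolete(messages, record_storage)
```

In this sketch only the second event for `doc-1` and the event for `doc-2` pass the filter. The dict holds nothing but ID and version, which is all the comparison needs.
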
> Yes, this solution only works when a record storage is active, but all
> other solutions also need some additional storage, such as additional
> queues. If one is concerned about the resource requirements, it would even
> be possible to create a simple record storage implementation that stores
> only the document ID and "version attribute" in a small database table, and
> to send all other document metadata in the queue message.
> 
> What do you think about this?
> I may not always be able to answer immediately in this discussion, but
> eventually I will (-;
> 
> Cheers,
> Juergen.
> _______________________________________________
> smila-dev mailing list
> smila-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/smila-dev

