Before we go deeper into the technical discussion, I would like to clarify
a few things first:
1. The use case (or issue) we are talking about can be briefly defined as:
“The execution order of operations on _one_ particular record _does_ matter.”
2. We want to be highly scalable. This implies that, in general, we have
more than one queue consumer. (A single queue consumer is only a special
case.)
3. It is not only important to execute operations in the right order (ADD
and then DELETE). Even more important is _not to execute superfluous
operations_ at all: do not add something if it is going to be deleted right
after that!
4. We want to buffer the user’s actions for at least a _couple of minutes_.
5. Some people reading this discussion may ask themselves “How important is
this use case at all?”, so let’s rate it:
   · Occurrence: very rare
      i. This issue _does not_ occur when crawling a data source. (Crawling
         is the most common use case.)
      ii. This issue only occurs if the data source is monitored by an agent
          _and_ the user performs an “ADD” (or “UPDATE”) followed almost
          instantly by a “DELETE” operation on the same document (the data
          set that is later represented as a record in SMILA). The chances
          of this happening are very low.
   · Awareness: quite high
      i. We have been aware of this issue for more than a year. We discussed
         it and (after a short analysis) agreed that it should be addressed
         in the connectivity module by a component called “Buffer”.
   · Relevance: low
      i. Since this issue occurs very rarely, its relevance can generally be
         rated as “low”.
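
To make points 1, 3 and 4 a bit more concrete, here is a minimal sketch of
what such a buffer could do. It is only an illustration in Python, not the
actual SMILA “Buffer” component: the class name `CoalescingBuffer`, its
methods `submit` and `drain`, and the 120-second default are all assumptions
made for this example, and record payloads are left out entirely.

```python
import time


class CoalescingBuffer:
    """Hypothetical sketch: hold operations per record for a while and
    drop the ones that turn out to be superfluous."""

    def __init__(self, hold_seconds=120.0, clock=time.monotonic):
        self.hold_seconds = hold_seconds  # assumed "couple of minutes"
        self.clock = clock                # injectable so it can be tested
        # record_id -> (pending operation, arrival time of the first one)
        self.pending = {}

    def submit(self, record_id, operation):
        """Register an ADD, UPDATE or DELETE for one record."""
        now = self.clock()
        if record_id not in self.pending:
            self.pending[record_id] = (operation, now)
            return
        previous, first_seen = self.pending[record_id]
        if previous == "ADD" and operation == "DELETE":
            # The ADD never left the buffer, so both operations cancel out.
            del self.pending[record_id]
        elif previous == "ADD" and operation == "UPDATE":
            # Downstream the record is still new, so it stays an ADD.
            pass
        else:
            # Simplification: otherwise the most recent operation replaces
            # the pending one.
            self.pending[record_id] = (operation, first_seen)

    def drain(self):
        """Return (record_id, operation) pairs held long enough, in
        arrival order, and remove them from the buffer."""
        now = self.clock()
        ready = [
            (record_id, operation)
            for record_id, (operation, first_seen) in self.pending.items()
            if now - first_seen >= self.hold_seconds
        ]
        for record_id, _ in ready:
            del self.pending[record_id]
        return ready
```

With a two-minute hold, an ADD followed shortly by a DELETE for the same
record leaves nothing to drain, so no queue consumer ever sees either
operation. Because at most one operation per record leaves the buffer, the
ordering question from point 1 does not arise for the consumers in point 2
within that window.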
Can we agree on these statements?
If not, please correct me or add what is missing.
Cheers
Igor