
Re: AW: [smila-dev] Controlling Tasks Order Concept

Hi guys,

Just had a long discussion with Marius by Skype and want to summarize.

There may be two types of solution, based on one key statement.
This statement can be briefly summarized in one question:

When a Record object is passed into the "Processor", does it contain the complete Record data, or may the data be partial?

Partial data is best explained with the following example.

Two agents collect data from database tables for one Record:

table [person] (id, name) - trigger on update linked with Agent A
table [person_address] (id, person_id, address) - trigger on update linked with Agent B

Agents A and B collect table changes and send them to processing; both of them collect data for the same object, "Person". In this case each Record contains only partial data for the Person.
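To make this concrete, here is a minimal sketch (plain Java; the Record class below is a simplified stand-in, not the real SMILA Record API):

import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a SMILA Record: just an ID plus attributes.
final class Record {
    final String id;
    final Map<String, Object> attributes = new HashMap<>();

    Record(String id) { this.id = id; }
}

public class PartialRecordExample {
    public static void main(String[] args) {
        // Agent A fires on an update of [person] and only knows the name.
        Record fromAgentA = new Record("person:42");
        fromAgentA.attributes.put("name", "John Doe");

        // Agent B fires on an update of [person_address] and only knows the address.
        Record fromAgentB = new Record("person:42");
        fromAgentB.attributes.put("address", "1 Main Street");

        // Both records share the same ID, but each carries only a fragment of
        // the complete "Person" object; neither can simply replace the other.
        System.out.println(fromAgentA.attributes); // {name=John Doe}
        System.out.println(fromAgentB.attributes); // {address=1 Main Street}
    }
}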

I'm not sure that support for partial records is required.

If it's not required, and a Record always contains complete data, then it is possible to use a timestamp for rejecting old records.
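A minimal sketch of such a timestamp check (the filter class and method names are made up, not existing SMILA code):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Rejects Records that are older than the newest one already seen for the
// same ID. This only works if every Record carries the *complete* data.
public class TimestampFilter {
    private final ConcurrentMap<String, Long> lastSeen = new ConcurrentHashMap<>();

    /** Returns true if the record should be processed, false if it is stale. */
    public boolean accept(String recordId, long timestamp) {
        while (true) {
            Long previous = lastSeen.get(recordId);
            if (previous == null) {
                // First record for this ID wins the race to register itself.
                if (lastSeen.putIfAbsent(recordId, timestamp) == null) return true;
            } else if (timestamp <= previous) {
                return false; // an equal or newer version was already accepted
            } else if (lastSeen.replace(recordId, previous, timestamp)) {
                return true;
            }
            // CAS failed because of a concurrent update; retry.
        }
    }
}

A Listener would call accept(...) with the Record's crawl timestamp before processing and simply drop the Record if it returns false.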

Otherwise, records for one ID have to be processed synchronously, one by one. Organizing locks for synchronous one-by-one processing will be a performance blocker and may cause deadlocks on Records. And, imho, almost all benefits of asynchronous MQ processing will be lost.
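For comparison, forced one-by-one processing per ID would look roughly like the following sketch (plain Java, not real Listener code; class and method names are made up):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Forces one-by-one processing per record ID. Every Listener thread that
// picks a Record with the same ID blocks here until the previous one is
// done, which serializes exactly the work the Queue was meant to parallelize.
public class PerIdSerializer {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void process(String recordId, Runnable step) {
        ReentrantLock lock = locks.computeIfAbsent(recordId, id -> new ReentrantLock());
        lock.lock();
        try {
            step.run(); // one processing step; the Record may re-enter the Queue afterwards
        } finally {
            lock.unlock();
        }
    }
}

Note that the locks map also grows without bound, and if a processing step ever needs a second Record's lock, two threads can acquire the locks in opposite order and deadlock, which is exactly the risk mentioned above.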

Any ideas, opinions?

--
Regards, Ivan

Ivan Churkin wrote:
Hi Folks,

Many thanks Daniel, Allan and Marius for feedbacks.

I will try to explain the problem in detail.

DeltaIndexingManager blocks concurrent data-source usage, but that does not solve the problem.

Basically, the problem relates to the cooperation of two main modules.

The first of them is
"Record Producer" = "Crawler" + "Crawler Controller" + "Delta Indexing"

The second is
"Record Processor"  = "Router" + "Listener" + "BPEL engine"

"Producer" blocks concurrent usage of data-source by delta-index, so it is synchronous relating data-source IMO, this blocking works only for Crawlers, but it should be changed when Agents will be added.
A good sample of Agent is database trigger. It's not good to blocks it.

"Processor" is absolutely asynchronous. Basically, it works with some big Record dump. It process records by configured Rules. Processing time may be quite long and it may consist of many steps, when Record put again and again in Queue after each operation.

Even in Crawler-only mode, a situation can easily occur where the "Producer" synchronously crawls the data source twice before the "Processor" has started processing the Records. After that, different Listener threads may catch Records with the same ID from the queue (from different crawls), and they will try to process them asynchronously.

BTW: after the second crawl, the Record in the Blackboard cache will be replaced by the latest one, but in the queue two processes will have been started for it. And I cannot imagine what may happen in the end :(.
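To illustrate, here is a toy simulation of that interleaving (plain Java, not SMILA code): two crawls enqueue a Record with the same ID, and two Listener threads take and process them concurrently.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simulates the described race: two crawls enqueue a Record with the same
// ID, and two Listener threads pick them up and process them concurrently.
public class DoubleCrawlRace {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.put("person:42 (crawl 1)");
        queue.put("person:42 (crawl 2)"); // second crawl finished before processing began

        Runnable listener = () -> {
            try {
                String record = queue.take();
                // Both threads now work on the same logical Record at the same
                // time; whichever finishes last wins, and intermediate states mix.
                System.out.println(Thread.currentThread().getName() + " processing " + record);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(listener, "listener-1");
        Thread t2 = new Thread(listener, "listener-2");
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}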


Regarding the Buffer and adding support and checks of a processing status for each ID: that forces synchronization by ID.
I'm afraid it may hurt performance and create deadlock problems.


--
Regards, Ivan


Daniel.Stucky@xxxxxxxxxxx wrote:
Hi Ivan,

in the existing concepts the so-called Buffer of the ConnectivityManager (see http://wiki.eclipse.org/SMILA/Project_Concepts/Connectivity#Buffer_.28P2.29) was meant to deal with these problems.

Some more thoughts:
- do we really want to allow concurrent usage of agents and crawlers on the same data source? If so, we also have to adapt the current usage of DeltaIndexingManager, as it blocks concurrent usage.

- I agree that there are scenarios where race conditions occur, but I also claim that these are special cases that do not happen all the time. So in my eyes the standard use case has to be optimized for performance, while these special cases have to be optimized for robustness. The handling of these special cases should have no (or as little as possible) impact on the standard cases.

- asynchronous processing of different records is OK, asynchronous processing of the same record is NOT OK (it may lead to corrupt data)

- this is a highly complex functionality, and I think we have to discuss it in greater detail. We should list the use cases and how we expect SMILA to handle them. Then we can discuss a technical solution.

- I also think that we need some mechanism to identify that the processing of a record has finished, either successfully or not (it may then be moved to a dead-letter queue). E.g. it may be needed if events should be triggered after processing (see the sketch below).
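A minimal sketch of what such a completion check could look like (class and callback names are just placeholders, not an existing SMILA API):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;

// Tracks how many processing steps are still pending per record and fires a
// callback once the count drops to zero (success) or a step reports failure.
// Calls to stepStarted/stepSucceeded must be balanced per step.
public class CompletionTracker {
    private final ConcurrentMap<String, Integer> pending = new ConcurrentHashMap<>();
    private final Consumer<String> onFinished; // e.g. trigger follow-up events
    private final Consumer<String> onFailed;   // e.g. move to a dead-letter queue

    public CompletionTracker(Consumer<String> onFinished, Consumer<String> onFailed) {
        this.onFinished = onFinished;
        this.onFailed = onFailed;
    }

    public void stepStarted(String recordId) {
        pending.merge(recordId, 1, Integer::sum);
    }

    public void stepSucceeded(String recordId) {
        // merge() removes the entry when the remapping function returns null
        Integer left = pending.merge(recordId, -1, (a, b) -> a + b == 0 ? null : a + b);
        if (left == null) onFinished.accept(recordId);
    }

    public void stepFailed(String recordId) {
        pending.remove(recordId);
        onFailed.accept(recordId);
    }
}

The Router could call stepStarted() each time it re-queues a Record and the Listener stepSucceeded() after each finished step; when the pending count reaches zero, the record is done and follow-up events can be triggered.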

Bye,
Daniel


-----Original Message-----
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-
bounces@xxxxxxxxxxx] On Behalf Of Ivan Churkin
Sent: Thursday, October 9, 2008 15:15
To: Smila project developer mailing list
Subject: Re: [smila-dev] Controlling Tasks Order Concept

Hey guys,

Give me some feedback, please ;)
This is a very significant architecture problem...
Right now the problem is not visible because we only start one Crawler manually.
It will become very pressing when Agents are added.

The page contains only my ideas for a solution. Unfortunately,
documenting every case will cost time.
If my explanations were not good and complete documentation is
required, please let me know.
