[smila-dev] CrawlerController - ConnectivityManager interaction

Hi all,

I am a little unsatisfied with the way the CrawlerController interacts with the ConnectivityManager and the internal Router. As the API is designed, the CrawlerController receives feedback for each invocation of the method add(), and internally the ConnectivityManager receives feedback for each invocation of route().

Adding records to the Storages via the Blackboard may be a time-consuming operation, and we have to wait for it to complete before we can insert a message into the Queue. This is currently done in a simple loop: all callers are blocked until every record has been added to the Queue (or the attempt has failed) and the return value of each method has been generated.
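To make the current behaviour concrete, here is a minimal sketch of a synchronous add(). It is not the SMILA API: the class SyncAddSketch, the Status enum, records modelled as String IDs and the Blackboard/Storages write modelled as a Consumer<String> are all hypothetical. The point is that the caller only gets its per-record list back after every slow storage call and every enqueue has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch (not the SMILA API): a synchronous add() that blocks its caller.
public class SyncAddSketch {

    /** Per-record feedback returned to the caller (hypothetical). */
    public enum Status { QUEUED, FAILED }

    private final Consumer<String> storage; // stands in for the Blackboard/Storages write
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public SyncAddSketch(Consumer<String> storage) {
        this.storage = storage;
    }

    /** Stores each record, then enqueues it; returns only after all records are processed. */
    public List<Status> add(List<String> recordIds) {
        List<Status> result = new ArrayList<>();
        for (String id : recordIds) {
            try {
                storage.accept(id);        // potentially slow: the caller waits here
                queue.add(id);             // the message may only be queued after storing
                result.add(Status.QUEUED);
            } catch (RuntimeException e) {
                result.add(Status.FAILED); // the failure is reported back to the caller
            }
        }
        return result;
    }

    public int queued() {
        return queue.size();
    }

    public static void main(String[] args) {
        SyncAddSketch cm = new SyncAddSketch(id -> {
            if (id.startsWith("bad")) throw new IllegalStateException("storage error for " + id);
        });
        System.out.println(cm.add(List.of("doc1", "bad2", "doc3"))); // [QUEUED, FAILED, QUEUED]
        System.out.println(cm.queued());                             // 2
    }
}
```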

Do we really need the return values of the methods add() and route()? I think we should strive for more asynchronous processing of incoming records in the ConnectivityManager to increase throughput. I don’t think clients of the ConnectivityManager need this kind of feedback. Errors on single records are still logged in the ConnectivityManager and could also be made available (to some extent) via JMX.
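A minimal sketch of what dropping the return value could look like, again with hypothetical names (AsyncAddSketch, String record IDs, a Consumer<String> standing in for the Blackboard write). add() hands each record to a thread pool and returns immediately; failures are only logged and counted. The AtomicLong counter is the kind of value one could expose as a JMX attribute, but registering an MBean is left out here.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical sketch (not the SMILA API): add() returns immediately, errors are logged and counted.
public class AsyncAddSketch {
    private static final Logger LOG = Logger.getLogger(AsyncAddSketch.class.getName());

    private final Consumer<String> storage; // stands in for the Blackboard/Storages write
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;
    private final AtomicLong failed = new AtomicLong(); // candidate for a JMX attribute

    public AsyncAddSketch(Consumer<String> storage, int threads) {
        this.storage = storage;
        this.workers = Executors.newFixedThreadPool(threads);
    }

    /** Submits the records for background processing; gives no per-record feedback. */
    public void add(List<String> recordIds) {
        for (String id : recordIds) {
            workers.execute(() -> {
                try {
                    storage.accept(id);
                    queue.add(id);
                } catch (RuntimeException e) {
                    failed.incrementAndGet();
                    LOG.warning("could not add record " + id + ": " + e.getMessage());
                }
            });
        }
    }

    public long failedCount() { return failed.get(); }

    public int queued() { return queue.size(); }

    /** Waits for pending work, e.g. at the end of a crawl run. */
    public void shutdown() {
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) {
        AsyncAddSketch cm = new AsyncAddSketch(id -> {
            if (id.startsWith("bad")) throw new IllegalStateException("storage error");
        }, 4);
        cm.add(List.of("doc1", "bad2", "doc3")); // returns without waiting for storage
        cm.shutdown();
        System.out.println(cm.queued() + " queued, " + cm.failedCount() + " failed"); // 2 queued, 1 failed
    }
}
```

The trade-off is that the crawler can no longer react to a single failed record; it only sees aggregate counts after the fact.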

Another option would be to use multiple threads in the CrawlerController (currently there is only one), but that could make crawler implementations more difficult.
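For comparison, a sketch of the multithreaded-controller alternative, under hypothetical names (MultiThreadedControllerSketch, SharedConnectivity, crawlers modelled as Iterator<String>). add() stays blocking, but each crawler runs on its own pool thread, so one slow storage call only stalls that crawler. Each crawler is touched by a single thread; the shared connectivity object, however, must be thread-safe, which is where the extra complexity comes in.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not the SMILA API): one thread per crawler, shared blocking add().
public class MultiThreadedControllerSketch {

    /** Hypothetical stand-in for a thread-safe, blocking ConnectivityManager. */
    public static final class SharedConnectivity {
        private final AtomicInteger added = new AtomicInteger();

        public boolean add(String recordId) {
            // a real implementation would write to the Blackboard and then to the Queue here
            added.incrementAndGet();
            return true;
        }

        public int added() { return added.get(); }
    }

    /** Drains each crawler on its own pool thread and returns the number of successful add() calls. */
    public static int runCrawlers(List<Iterator<String>> crawlers, SharedConnectivity cm, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Iterator<String> crawler : crawlers) {
                results.add(pool.submit(() -> {
                    int ok = 0;
                    while (crawler.hasNext()) {
                        if (cm.add(crawler.next())) ok++; // blocks only this crawler's thread
                    }
                    return ok;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for crawlers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("crawler failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        SharedConnectivity cm = new SharedConnectivity();
        List<Iterator<String>> crawlers = List.of(
                List.of("fs/a", "fs/b").iterator(),
                List.of("web/1", "web/2", "web/3").iterator());
        System.out.println(runCrawlers(crawlers, cm, 2)); // 5
    }
}
```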

Any thoughts or comments?

Bye,

Daniel

