[smila-dev] RE: CrawlerController - ConnectivityManager interaction

From: Thomas Menzel <tmenzel@xxxxxxx>
Date: Mon, 17 Aug 2009 13:39:37 +0200

Hi Daniel,

I like that you are thinking about making things faster there.

I was also wondering about switching the DI (delta indexing) checks to a batch-oriented process, as you did with the router and listeners. At the moment each record is checked and added to the DB individually, and I could imagine that processing sets of N records at a time would be faster.

But I don't know the code, so I can't tell whether it is feasible.
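A minimal sketch of the batching idea, assuming a hypothetical DeltaIndexStore whose checkAndUpdate() call handles a whole set of records in one DB round trip. None of these names are SMILA's actual delta indexing API; the in-memory store only counts round trips so that the effect of choosing N is visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical DB access that checks and updates many records in one round trip. */
interface DeltaIndexStore {
    /** Stores the given hashes and returns the IDs that are new or whose hash changed. */
    List<String> checkAndUpdate(Map<String, String> idToHash);
}

/** In-memory stand-in for the DB; counts round trips to show the effect of batching. */
class InMemoryDeltaIndexStore implements DeltaIndexStore {
    private final Map<String, String> stored = new HashMap<>();
    int roundTrips = 0;

    @Override
    public List<String> checkAndUpdate(Map<String, String> idToHash) {
        roundTrips++;
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> e : idToHash.entrySet()) {
            String previous = stored.put(e.getKey(), e.getValue());
            if (!e.getValue().equals(previous)) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}

/** Buffers delta indexing checks and sends them in sets of N instead of one at a time. */
class BatchingDeltaChecker {
    private final DeltaIndexStore store;
    private final int batchSize;
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final List<String> changed = new ArrayList<>();

    BatchingDeltaChecker(DeltaIndexStore store, int batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    void check(String id, String hash) {
        pending.put(id, hash);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends the buffered checks; must also be called once when the crawl ends. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        changed.addAll(store.checkAndUpdate(new LinkedHashMap<>(pending)));
        pending.clear();
    }

    /** IDs reported as new or changed so far. */
    List<String> changedIds() {
        return changed;
    }
}

class BatchingDemo {
    public static void main(String[] args) {
        InMemoryDeltaIndexStore store = new InMemoryDeltaIndexStore();
        BatchingDeltaChecker checker = new BatchingDeltaChecker(store, 100);
        for (int i = 0; i < 250; i++) {
            checker.check("rec-" + i, "hash-" + i);
        }
        checker.flush();
        // prints "3 round trips, 250 changed"
        System.out.println(store.roundTrips + " round trips, " + checker.changedIds().size() + " changed");
    }
}
```

With N = 100, checking 250 records costs 3 round trips instead of 250. The trade-off is that flush() has to be called at the end of a crawl, otherwise the last partial set is never checked.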

Kind regards

Thomas Menzel @ brox IT-Solutions GmbH

From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Daniel.Stucky@xxxxxxxxxxx
Sent: Monday, 17 August 2009 11:27
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] CrawlerController - ConnectivityManager interaction

Hi all,

I am a little dissatisfied with the way the CrawlerController interacts with the ConnectivityManager and its internal Router. As the API is designed, the CrawlerController gets feedback for each invocation of the method add(), and internally the ConnectivityManager gets feedback for each invocation of route().

Adding records to the storages via the Blackboard may be a time-consuming operation, and we have to wait for it to complete before we can insert a message into the Queue. This is currently done in a simple loop, and all callers are blocked until every record has been added to the Queue (or the attempt has failed) and each method has generated its return value.
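To make the current behaviour concrete, here is a simplified sketch of the blocking pattern described above. RecordStorage, AddResult and SynchronousConnectivity are hypothetical stand-ins for the Blackboard storage and the ConnectivityManager, not SMILA classes; the point is only that add() cannot return until every record has been stored and queued, or has failed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Per-record feedback as returned to the caller (illustrative). */
enum AddResult { QUEUED, FAILED }

/** Hypothetical stand-in for storing a record via the Blackboard; may be slow or fail. */
interface RecordStorage {
    void store(String recordId) throws Exception;
}

/** Simplified model of the current synchronous add(): one record after another. */
class SynchronousConnectivity {
    private final RecordStorage storage;
    private final Queue<String> queue = new ArrayDeque<>();

    SynchronousConnectivity(RecordStorage storage) {
        this.storage = storage;
    }

    /** Returns only after every record was stored and queued, or failed. */
    List<AddResult> add(List<String> recordIds) {
        List<AddResult> results = new ArrayList<>();
        for (String id : recordIds) {
            try {
                storage.store(id);   // the caller waits here for each record
                queue.add(id);       // a message is queued only after storing succeeded
                results.add(AddResult.QUEUED);
            } catch (Exception e) {
                results.add(AddResult.FAILED);
            }
        }
        return results;
    }

    int queuedCount() {
        return queue.size();
    }
}

class SynchronousDemo {
    public static void main(String[] args) {
        SynchronousConnectivity cm = new SynchronousConnectivity(id -> {
            if (id.startsWith("bad")) {
                throw new Exception("storage failed for " + id);
            }
        });
        // prints [QUEUED, FAILED, QUEUED]
        System.out.println(cm.add(Arrays.asList("a", "bad-1", "b")));
    }
}
```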

Do we really need the return values of the methods add() and route()? I think we should strive for more asynchronous processing of incoming records in the ConnectivityManager to increase throughput. I don't think clients of the ConnectivityManager need this kind of feedback. Errors on single records are still logged in the ConnectivityManager and could also be made available (to some extent) via JMX.
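One possible shape of the asynchronous alternative, again as a hypothetical sketch rather than SMILA code: add() returns nothing and hands each record to a worker pool, per-record errors are logged, and a failure counter (getFailedCount()) stands in for what could be exposed as a JMX attribute. The SlowStorage interface and the pool size are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/** Hypothetical storage call, as in the synchronous case. */
interface SlowStorage {
    void store(String recordId) throws Exception;
}

/** Fire-and-forget variant: add() has no return value and does not block on storage. */
class AsyncConnectivity {
    private static final Logger LOG = Logger.getLogger(AsyncConnectivity.class.getName());

    private final SlowStorage storage;
    private final ExecutorService workers;
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicLong failed = new AtomicLong();

    AsyncConnectivity(SlowStorage storage, int threads) {
        this.storage = storage;
        this.workers = Executors.newFixedThreadPool(threads);
    }

    /** Returns immediately; storing and queueing happen on the worker threads. */
    void add(List<String> recordIds) {
        for (String id : recordIds) {
            workers.execute(() -> {
                try {
                    storage.store(id);
                    queue.add(id);
                } catch (Exception e) {
                    failed.incrementAndGet(); // aggregate figure instead of a per-call result
                    LOG.warning("Could not add record " + id + ": " + e.getMessage());
                }
            });
        }
    }

    /** Candidate attribute for monitoring (e.g. via JMX) instead of per-record feedback. */
    long getFailedCount() {
        return failed.get();
    }

    int queuedCount() {
        return queue.size();
    }

    /** Stops accepting work and waits for pending records; returns false on timeout. */
    boolean shutdown(long timeoutSeconds) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class AsyncDemo {
    public static void main(String[] args) {
        AsyncConnectivity cm = new AsyncConnectivity(id -> {
            if (id.startsWith("bad")) {
                throw new Exception("storage failed");
            }
        }, 4);
        cm.add(Arrays.asList("a", "bad-1", "b", "bad-2", "c")); // does not wait for storage
        cm.shutdown(10);
        // prints "3 queued, 2 failed"
        System.out.println(cm.queuedCount() + " queued, " + cm.getFailedCount() + " failed");
    }
}
```

The caller is no longer blocked by slow storage, but it also gives up per-record feedback and ordering: records may reach the queue in a different order than they were passed to add().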

Another option could be to use multithreading in the CrawlerController (currently there is only one thread), but that could make crawler implementations more difficult.
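If the multithreading went into the CrawlerController instead, one way to keep crawler implementations simple is a producer/consumer split: the crawler is still iterated on a single thread, and only the add() calls run in parallel. The sketch below assumes the crawler can be viewed as an Iterator and add() as a Consumer; ParallelCrawlerController and both simplifications are hypothetical, not the real SMILA interfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;
import java.util.logging.Logger;

/** Crawler stays single-threaded; only the add() calls run on several worker threads. */
class ParallelCrawlerController<R> {
    private static final Logger LOG = Logger.getLogger(ParallelCrawlerController.class.getName());

    /** Wraps a record, or marks the end of the crawl when end is true. */
    private static final class Item<T> {
        final T record;
        final boolean end;

        Item(T record, boolean end) {
            this.record = record;
            this.end = end;
        }
    }

    private final int workerCount;
    private final int capacity;

    ParallelCrawlerController(int workerCount, int capacity) {
        this.workerCount = workerCount;
        this.capacity = capacity;
    }

    void run(Iterator<R> crawler, Consumer<R> add) throws InterruptedException {
        BlockingQueue<Item<R>> handoff = new ArrayBlockingQueue<>(capacity);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Item<R> item = handoff.take();
                        if (item.end) {
                            return;
                        }
                        try {
                            add.accept(item.record);
                        } catch (RuntimeException e) {
                            // keep the worker alive; per-record errors are only logged
                            LOG.warning("add() failed: " + e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
        // The crawler is only ever touched by this thread, so it need not be thread-safe.
        while (crawler.hasNext()) {
            handoff.put(new Item<>(crawler.next(), false)); // blocks while the workers are behind
        }
        for (int i = 0; i < workerCount; i++) {
            handoff.put(new Item<>(null, true)); // one end marker per worker
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}

class ParallelDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(i);
        }
        ConcurrentLinkedQueue<Integer> added = new ConcurrentLinkedQueue<>();
        new ParallelCrawlerController<Integer>(4, 16).run(records.iterator(), added::add);
        System.out.println(added.size() + " records added"); // prints "1000 records added"
    }
}
```

The bounded hand-off queue makes the crawler thread block when the workers fall behind, so memory use stays bounded while existing single-threaded crawlers can be reused unchanged.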

Any thoughts or comments?

Bye,

Daniel