Re: [smila-user] Job modes for crawler

From: Andreas Schank <andreas.schank@xxxxxxxxxxxxx>
Date: Fri, 9 Mar 2012 08:38:28 +0100

Hi Nick,

The filesystem crawler job doesn’t work like that. It simply crawls the file system and forwards new or changed files to the bulkbuilder. There is no component that monitors the filesystem for changes and reacts to them.

You’d have to regularly trigger the crawl job (in runOnce mode) to react to changes in the filesystem.
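A minimal sketch of such a regular trigger, written in Python under stated assumptions: it assumes the SMILA job manager REST API is reachable at `http://localhost:8080/smila/jobmanager/jobs` and that a POST to `<jobName>/` with the body `{"mode": "runOnce"}` starts one job run. Check both against the SMILA documentation for your version. The job name `crawlFilesystem` is a placeholder for your own crawl job, and the helper names are hypothetical. An OS scheduler such as cron would serve equally well.

```python
import json
import time
import urllib.request
from typing import Callable

# ASSUMPTION: SMILA's job manager REST API lives at this URL and starts a job
# run on a POST to <base>/<jobName>/ with a JSON body selecting the run mode.
SMILA_BASE_URL = "http://localhost:8080/smila/jobmanager/jobs"
JOB_NAME = "crawlFilesystem"  # hypothetical job name; use your own crawl job


def build_start_request(base_url: str, job_name: str) -> urllib.request.Request:
    """Build a POST request that starts one job run in runOnce mode."""
    body = json.dumps({"mode": "runOnce"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{job_name}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_periodically(
    trigger: Callable[[], None],
    interval_seconds: float,
    runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `trigger` `runs` times, sleeping between calls; return the count."""
    count = 0
    for i in range(runs):
        trigger()
        count += 1
        if i < runs - 1:  # no need to sleep after the last run
            sleep(interval_seconds)
    return count


def start_crawl() -> None:
    """Start a fresh runOnce crawl; the delta check decides what is forwarded."""
    request = build_start_request(SMILA_BASE_URL, JOB_NAME)
    with urllib.request.urlopen(request) as response:
        print("job run started, HTTP status:", response.status)
```

For example, `trigger_periodically(start_crawl, interval_seconds=600, runs=6)` would start a new runOnce crawl every ten minutes for an hour. Keeping the HTTP call separate from the loop lets the loop be exercised without a running SMILA instance.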

The filesystem crawler will then crawl all files, but only new or changed files will be forwarded, and files that are no longer present will be deleted (unless you clear the delta store information for your source).
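To make the delta behaviour concrete, here is a toy model in Python. It illustrates the idea only and is not SMILA's implementation: the delta store is modelled as a plain dict from file path to a fingerprint (a modification timestamp here), and the names `compute_delta` and `apply_run` are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy model of delta importing, NOT SMILA's implementation.
# delta_store: path -> fingerprint recorded during the previous crawl run.
# crawled:     path -> fingerprint seen during the current crawl run.


def compute_delta(
    delta_store: Dict[str, float], crawled: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (to_forward, to_delete) for one crawl run."""
    # New files are missing from the store; changed files have a new fingerprint.
    to_forward = sorted(
        path for path, stamp in crawled.items() if delta_store.get(path) != stamp
    )
    # Files recorded earlier but not found in this crawl are deleted.
    to_delete = sorted(path for path in delta_store if path not in crawled)
    return to_forward, to_delete


def apply_run(
    delta_store: Dict[str, float], crawled: Dict[str, float]
) -> Tuple[List[str], List[str], Dict[str, float]]:
    """Compute the delta and return the store as it looks after the run."""
    to_forward, to_delete = compute_delta(delta_store, crawled)
    new_store = dict(crawled)  # the store now reflects what this run saw
    return to_forward, to_delete, new_store
```

Passing an empty dict as the delta store models a cleared delta store: every crawled file is then forwarded again and nothing is deleted.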

Also be sure not to set the parameter “deltaImportStrategy” to anything other than “full” if you want to use the delta-importing feature described above.

The job run modes have nothing to do with how filesystem changes are detected. Be sure to always use “runOnce” for crawl jobs; otherwise nothing will happen at all, because only in runOnce mode is the initial task created that triggers the crawl.
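The following toy model in Python illustrates why a crawl job does nothing in “standard” mode. It is a simplification, not SMILA code: it assumes the crawl workflow's start worker (given the hypothetical name `fileCrawler` below) has no input bucket, so in standard mode nothing ever creates a task for it, whereas runOnce mode creates one initial task when the job run starts. The classes `Worker` and `JobRun` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy model (NOT SMILA code) of task creation in the two job run modes.


@dataclass
class Worker:
    name: str
    input_bucket: Optional[str]  # None: no input bucket, like a crawler start worker


@dataclass
class JobRun:
    mode: str  # "runOnce" or "standard"
    start_worker: Worker
    tasks: List[str] = field(default_factory=list)

    def start(self) -> None:
        if self.mode == "runOnce":
            # runOnce: an initial task is created, which kicks off the crawl.
            self.tasks.append(f"initial task for {self.start_worker.name}")
        # standard: no initial task; the run only waits for input objects.

    def on_object_added(self, bucket: str) -> None:
        # In standard mode, tasks are only created for objects arriving in the
        # start worker's input bucket, which a crawler start worker lacks.
        if self.mode == "standard" and bucket == self.start_worker.input_bucket:
            self.tasks.append(f"task for {self.start_worker.name} on {bucket}")
```

With `Worker("fileCrawler", None)`, a standard-mode run never gets a task no matter which objects arrive, which matches the symptom described in the question below.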

You can find more information on the file system crawler, the delta-importing strategy, and how import jobs work at http://wiki.eclipse.org/SMILA/Documentation#Importing.

Bye

Andreas

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On behalf of Nick
Sent: Thursday, March 8, 2012 17:21
To: Smila project user mailing list
Subject: [smila-user] Job modes for crawler

Hi,

I'm trying to configure the filesystem crawler job so that it crawls a directory and, once it has crawled all the files present, does not finish but stays alive, waiting for new files that might be added to the directory.
In "runOnce" mode, I crawl all the initial files in the directory, but when I add a new file, the crawler does not react.
I have tried changing the workflow mode from "runOnce" to "standard", but (whatever taskgenerator mode I assign to the worker) the crawler job does not crawl any files.

What is the correct configuration for my use case?

Thank you,

Nicolò Aquilini
