
Re: [smila-user] performance degredation with the new processing

So the crawler finishes in 1 minute, but processing takes 40 minutes, and there are no tasks in "todo" and only 1 task in "inprogress"?
That would mean that all records were added to a single bulk and processed sequentially within that one bulk, so there is no parallel processing at all.
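To make that reasoning concrete, here is a rough back-of-the-envelope model. It is a hypothetical sketch, not SMILA code: it assumes each bulk becomes one task, a task processes its records one after another, and at most `workers` tasks run in parallel. The 56 ms per record is simply the 42 minutes for 45k files reported further down in the thread divided by 45,000; the worker count of 4 is made up.

```python
import math


def estimate_wall_time_ms(num_records: int, records_per_bulk: int,
                          ms_per_record: int, workers: int) -> int:
    """Upper-bound wall-clock time (ms) to process num_records.

    Hypothetical model: records are split into bulks of records_per_bulk,
    each bulk is one task processed sequentially, and at most `workers`
    tasks run at the same time. A partially filled last bulk is costed
    as a full one, hence "upper bound".
    """
    num_bulks = math.ceil(num_records / records_per_bulk)
    bulk_time = records_per_bulk * ms_per_record  # time for one full bulk
    waves = math.ceil(num_bulks / workers)        # rounds of parallel tasks
    return waves * bulk_time


# One bulk holding everything: the worker count makes no difference.
print(estimate_wall_time_ms(45_000, 45_000, 56, 4) / 60_000)  # minutes
# The same records in bulks of 1,000, spread over 4 workers.
print(estimate_wall_time_ms(45_000, 1_000, 56, 4) / 60_000)   # minutes
```

Under these assumptions the single bulk takes the full 42 minutes no matter how many workers exist, while 45 smaller bulks on 4 workers finish in about 11 minutes.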

Maybe you can reduce the bulkbuilder's time/size limits by adding the "bulkLimitTime" and/or "bulkLimitSize" parameters to the job, so that we can be sure multiple bulks are created? See http://wiki.eclipse.org/SMILA/Documentation/Bulkbuilder#Configuration.
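The effect of lowering those limits can be illustrated with a simplified bulk-cutting rule. This is a hypothetical sketch, not SMILA's actual bulkbuilder logic: a new bulk is started when adding a record would push the current bulk past `limit_size` bytes, or when the record arrives `limit_time` seconds or more after the current bulk was opened. The record sizes (2,000 bytes) and the even arrival over 60 seconds are invented to mirror "45k files in a minute"; the real parameter units and semantics are described on the wiki page above.

```python
from typing import List, Sequence, Tuple


def build_bulks(records: Sequence[Tuple[float, int]],
                limit_size: int,
                limit_time: float) -> List[int]:
    """Return the number of records in each bulk.

    records: (arrival_time_seconds, size_bytes) pairs in arrival order.
    Hypothetical rule for illustration, not taken from SMILA's code.
    """
    bulks: List[int] = []
    count, size, start = 0, 0, 0.0
    for arrival, rec_size in records:
        # Close the current bulk if this record would exceed either limit.
        if count > 0 and (size + rec_size > limit_size
                          or arrival - start >= limit_time):
            bulks.append(count)
            count, size = 0, 0
        if count == 0:
            start = arrival  # the bulk's clock starts with its first record
        count += 1
        size += rec_size
    if count > 0:
        bulks.append(count)  # flush the last, possibly partial, bulk
    return bulks


# 45,000 records of 2,000 bytes each, arriving evenly over 60 seconds.
records = [(i * 60 / 45000, 2000) for i in range(45000)]
print(len(build_bulks(records, 10**9, 3600)))      # generous limits
print(len(build_bulks(records, 2_000_000, 3600)))  # 2 MB size limit
print(len(build_bulks(records, 10**9, 10)))        # 10 s time limit
```

With generous limits everything lands in one bulk; a 2 MB size limit yields 45 bulks of 1,000 records, and a 10 second time limit yields 6 bulks of 7,500, each of which could then be processed by a separate worker.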

If this doesn't help, could you post the result of /smila/jobmanager/jobs/<job-name>/<job-run-id> after the job has finished? Maybe we can see something there.
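For reference, a minimal way to fetch that job run description from a script might look like the following. It assumes SMILA's HTTP server is reachable at `http://localhost:8080` and that the endpoint returns JSON; both are assumptions to adjust for your installation. The job name and run id in the usage line are placeholders, not real values.

```python
import json
import urllib.parse
import urllib.request

# Assumption: SMILA's HTTP endpoint; change host/port for your setup.
BASE_URL = "http://localhost:8080"


def job_run_url(job_name: str, run_id: str, base_url: str = BASE_URL) -> str:
    """Build /smila/jobmanager/jobs/<job-name>/<job-run-id>, URL-escaped."""
    return (f"{base_url}/smila/jobmanager/jobs/"
            f"{urllib.parse.quote(job_name, safe='')}/"
            f"{urllib.parse.quote(run_id, safe='')}")


def fetch_job_run(job_name: str, run_id: str, base_url: str = BASE_URL) -> dict:
    """GET the job run description and parse it (assumed to be JSON)."""
    with urllib.request.urlopen(job_run_url(job_name, run_id, base_url),
                                timeout=10) as resp:
        return json.load(resp)


# Usage (placeholder names, requires a running SMILA instance):
#   print(json.dumps(fetch_job_run("myJob", "<job-run-id>"), indent=2))
```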

Cheers,
Juergen.



-----Original Message-----
From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Thomas Menzel
Sent: Thursday, October 13, 2011 11:12 AM
To: Smila project user mailing list
Subject: Re: [smila-user] performance degredation with the new processing

Hi,

So I replaced the object store with an in-memory implementation. While this improved the crawler's finishing time (it was done with 45k files in a minute, just as it was when using AMQ), it did little to improve the overall processing time, which is still at 42 min. This is consistent with the TODO list still being short.

You wrote:
> But if the tasks are created too slowly, scaleUp cannot help anyway.
So what factors control this? And how can I speed it up?


