Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [platform-dev] How to best use JobGroups in a recursive task?

Hi Mickael,

I think a job per recursive step is too fine-grained here. I would suggest something like this:

- A single thread-safe queue instance of directories to process
- Each work pulls a directory from the queue, processes that directory, and inserts children into the queue.
- Workers continue until queue is empty.
- Workers would need to hold queue mutex until children are added to avoid premature worker death (lock queue, pop dir, push children, unlock queue, process dir)
- Have a fixed size pool of workers processing the queue, tuned based on performance measurements to see what is optimal size (probably near to number of cores on the machine)

John




From:        Mickael Istria <mistria@xxxxxxxxxx>
To:        platform-dev@xxxxxxxxxxx
Date:        04/20/2015 11:17 AM
Subject:        [platform-dev] How to best use JobGroups in a recursive task?
Sent by:        platform-dev-bounces@xxxxxxxxxxx




Hi all,

In the context of
https://wiki.eclipse.org/E4/UI/Smart_Import , the main operation that crawls the filesystem to identify patterns and then configure Eclipse projects.
I believe that, when on a directory, crawling and configuring direct children can be treated in parallel. I was thinking of using JobGroups to enable parallelism for this task, and more specifically, I was thinking that I could simply make this loop parallel:
http://git.eclipse.org/c/e4/org.eclipse.e4.ui.git/tree/bundles/org.eclipse.e4.ui.importer/src/org/eclipse/ui/internal/wizards/datatransfer/OpenFolderCommand.java#n158 . It's the one that iterates on children.
However, since it's in a recursive operation, I don't believe I can assign 1 job for each iteration, or it will assign all thread to "parent" tasks, without leaving some for children. Example:

Imagine this filesystem structure:
* P
** P1
*** P1a
*** P1b
** P2
*** P2a
*** P2b
** P3
*** P3a
....
And that I use 2 threads to throttle my jobs. Then those 2 threads will be assigned to the job responsible of crawling P1 and P2, and when trying to work on P3, P1a, P1b, P2a or P2b, the new job for this branch will wait for a new thread forever.
I'm not a parallelism expert, so that may be a dumb question, but how can I take benefit from throttling in a recursive algorithm?
A naive implementation would be to only parallelize the execution for the 1st-level branches, so that P1 // P2 // P3, and keep the subdirectories on the same thread, but this is obviously sub-optimal in case P1 and P2 finish quite fast, sub-jobs of P3 should be allowed to use the additional thread.

Cheers,

--
Mickael Istria
Eclipse developer at
JBoss, by Red Hat
My blog - My Tweets_______________________________________________
platform-dev mailing list
platform-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/platform-dev

Back to the top