Bug 509557 - Add deadlock protection and inheritance of scheduling rules to Job.join
Summary: Add deadlock protection and inheritance of scheduling rules to Job.join
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime (show other bugs)
Version: 4.6   Edit
Hardware: PC Linux
: P3 enhancement (vote)
Target Milestone: ---   Edit
Assignee: platform-runtime-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-20 22:53 EST by Stefan Xenos CLA
Modified: 2017-06-23 03:05 EDT (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Xenos CLA 2016-12-20 22:53:23 EST
Currently, Job.join has two serious problems:

Consider the two jobs which I'll call Builder and Indexer. Builder holds a scheduling rule R while invoking join on job Indexer.

1. It circumvents the deadlock protection of the rest of the Jobs framework by creating a bunch of implicit locks that the rest of the deadlock protection system is unaware of. Specifically, if the Indexer job tries to obtain a scheduling rule that conflicts with R, it can deadlock and the job manager won't detect it.

2. It doesn't permit the Indexer job from using R, when doing so would be perfectly safe. The fact that Builder holds R and is blocked on Indexer guarantees that no other threads could possibly make use of R. Currently this deadlocks but it should really succeed.

I'd like to propose the following algorithm:

A1. When Builder joins Indexer, the call to join with throw an exception if Builder and Indexer hold scheduling rules and they conflict. The logic for checking for conflicting rules would be exactly the same as if the rules currently held by Indexer were currently free and Builder was attempting to acquire them by a call to beginRule(...)

A2. If Indexer obtains any additional rules, we check the rules currently held by Indexer for conflicts like we do now... but we additionally perform the same check on every thread which has currently joined Indexer. If any of the threads that have currently joined Indexer would - independently - conflict with the newly-obtained rule, the attempt to obtain the rule will fail. That is, both the thread that obtains the rule and all of the threads that have joined it should have been able to independently obtain the newly acquired rule via a call to beginRule. If any of the threads currently hold a rule that contains the new rule, Indexer is immediately granted access to that rule without blocking.

A3. If Builder joins Indexer with a timeout and the timeout expires, if Indexer is currently using any rules contained in the rules owned by Builder then Builder continues to block until all such rules are released by Indexer. If Indexer holds no such rules, Builder stops blocking immediately after the timeout period as it does now.

That pretty much covers the rules in general. Now for some use-cases. Let's compare the current behavior with the proposed behavior:


U1. Builder holds no rules, joins Indexer which obtains R

This succeeds with or without the modified join


U2. Builder holds R, joins Indexer which obtains R

With the current behavior this deadlocks. With the modified join, the operation succeeds.


U3. Builder hold a rule that contains R, joins Indexer which obtains R

With the current behavior this deadlocks. With the modified join, the operation succeeds.


U4. Builder holds a rule that conflicts with R, joins Indexer before Indexer obtains R

With the current behavior this deadlocks. With the modified join, Indexer will throw an exception from beginRule - preferably with a message containing additional information about the stack trace for Builder.


U5. Builder holds a rule that conflicts with R, joins Indexer after Indexer obtains R

With the current behavior this deadlocks. With the modified join, Builder will throw an exception from the call to join - preferably with a message containing additional information about the stack trace from Indexer.


U6. Builder holds a rule that conflicts with R, joins Indexer after Indexer releases R

With the current behavior this succeeds. With the modified join this succeeds. Unfortunately, in this case success is undesirable and an exception would be preferred. However, this behavior is no worse with the proposed algorithm than with the current behavior.


Example:

Another way to think of the proposed change to join(): we'd be making join behave exactly the same as code invoked synchronously from within the same thread. In other words, the following would be equivalent wrt inheritance of scheduling rules:

// Example 1
Job someJob = Job.create("someJob", monitor -> doSomething(monitor));
someJob.schedule();
someJob.join(myMonitor);

// Example 2
doSomething(myMonitor);

With the modified join, any combination of rules (obtained by the caller and the doSomething method) that would have worked in example 2 will also work in example 1. The difference is that there may be some failure cases that fail intermittently in example 1 but fail unconditionally in example 2... but an intermittent exception is still an improvement over an intermittent deadlock.
Comment 1 Stefan Xenos CLA 2016-12-20 22:56:18 EST
This shouldn't break any code that doesn't currently deadlock, since the only cases where behavior changes are those cases that currently pose a deadlock risk.