Bug 206664 - JobManager.beginRule() should allow addition of non-contained rules
Summary: JobManager.beginRule() should allow addition of non-contained rules
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Runtime
Version: 3.3.1
Hardware: All
OS: All
Importance: P3 enhancement
Target Milestone: ---
Assignee: platform-runtime-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-17 14:05 EDT by Kevin Bracey CLA
Modified: 2019-09-06 16:04 EDT
CC List: 1 user

See Also:


Attachments

Description Kevin Bracey CLA 2007-10-17 14:05:13 EDT
At present, beginRule() only allows a rule that is contained within the current outer beginRule(). This seems to be unnecessarily restrictive.

Consider, for example, a system that has a number of jobs. If they have no scheduling rule, then they can all run concurrently. If they happen to make a call that manipulates resources, then the resource-accessing parts of the code can call IWorkspace.run or beginRule() to obtain a resource lock, and then release it. It all works well.

But if the system wants to fit those same jobs with a mutex, life gets difficult. The Eclipse FAQ suggests a simple scheduling mutex:

class Mutex implements ISchedulingRule {
    public boolean contains(ISchedulingRule rule) {
        return this == rule;
    }
    public boolean isConflicting(ISchedulingRule rule) {
        return this == rule;
    }
}

Such a mutex is pretty useless in practice, though. If the same jobs are scheduled with this mutex, then the jobs can no longer make the calls that manipulate resources: the code that calls beginRule() will throw an exception, because the resource rules are not contained within the mutex.
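To make the failure concrete, here is a self-contained sketch. The ISchedulingRule interface and ResourceRule class are simplified local stand-ins for the real Eclipse types, and beginRuleCheck is a hypothetical reduction of the containment validation that JobManager performs, not the actual implementation:

```java
// Minimal local stand-in for org.eclipse.core.runtime.jobs.ISchedulingRule
interface ISchedulingRule {
    boolean contains(ISchedulingRule rule);
    boolean isConflicting(ISchedulingRule rule);
}

// The FAQ mutex: contains and conflicts with only itself
class Mutex implements ISchedulingRule {
    public boolean contains(ISchedulingRule rule) { return this == rule; }
    public boolean isConflicting(ISchedulingRule rule) { return this == rule; }
}

// A stand-in for an unrelated resource rule (hypothetical, for illustration)
class ResourceRule implements ISchedulingRule {
    public boolean contains(ISchedulingRule rule) { return this == rule; }
    public boolean isConflicting(ISchedulingRule rule) { return this == rule; }
}

public class Demo {
    // Hypothetical reduction of the check beginRule() makes on a nested rule
    static void beginRuleCheck(ISchedulingRule current, ISchedulingRule inner) {
        if (current != null && !current.contains(inner))
            throw new IllegalArgumentException(
                "new rule is not contained within the current rule");
    }

    public static void main(String[] args) {
        ISchedulingRule mutex = new Mutex();
        ISchedulingRule resource = new ResourceRule();
        try {
            // job holds the mutex; resource-accessing code tries a resource rule
            beginRuleCheck(mutex, resource);
            System.out.println("allowed");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }
    }
}
```

Running this prints "rejected": with the FAQ mutex as the outer rule, any attempt to begin an unrelated resource rule fails the containment check.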

Is there any logic to this? By allowing beginRule() in a job with no rule, you are allowing a job's lock scope to extend during run from null to a specified rule (and it will block at beginRule while waiting for other jobs to prevent conflicts). Thus there's no absolute requirement that a Job acquires all its locks up-front.

So why should a job with the simple mutex lock above not be able to extend its lock scope during run in the same way? If the new beginRule isn't contained (and doesn't conflict?), then the job's rule should be set to MultiRule.combine(oldRule, newRule), blocking for newRule if necessary.

Clearly the implementation of the rule stack becomes harder, but there would be greatly increased flexibility in scheduling rule definition.

Or is there some reason why such a scheme cannot work?
Comment 1 Kevin Bracey CLA 2007-10-18 07:00:08 EDT
Okay, I've done a bit more digging, and realised that the restriction is to prevent potential deadlocks. But it's still unduly restrictive. One potential loosening that occurs to me is to allow rules to be specified as non-conflicting by an extension of ISchedulingRule.

Here's a possible approach, using ordered rules.

Add an interface ISchedulingRuleLevel that provides "int ruleLevel()". This returns a fixed scheduling level, constant for that rule. If an ISchedulingRule doesn't implement ISchedulingRuleLevel, then its level is 0.

The interface contract is:
1) a rule can only conflict with or contain another rule of the same level.
2) a rule can only begin if:
   a) there is no existing rule, or
   b) the new rule is contained by the old rule, or
   c) the new rule has a lower level than the old rule.
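A minimal sketch of that contract's admission test. The interface ISchedulingRuleLevel, the LeveledRule class, and the levelOf/canBegin helpers are the proposal's own names, not existing Eclipse API, and ISchedulingRule is a simplified local stand-in:

```java
// Simplified local stand-in for org.eclipse.core.runtime.jobs.ISchedulingRule
interface ISchedulingRule {
    boolean contains(ISchedulingRule rule);
    boolean isConflicting(ISchedulingRule rule);
}

// Proposed extension: a rule may declare a fixed ordering level
interface ISchedulingRuleLevel {
    int ruleLevel();
}

// A leaf rule at a given level (hypothetical, for illustration)
class LeveledRule implements ISchedulingRule, ISchedulingRuleLevel {
    private final int level;
    LeveledRule(int level) { this.level = level; }
    public boolean contains(ISchedulingRule rule) { return this == rule; }
    public boolean isConflicting(ISchedulingRule rule) { return this == rule; }
    public int ruleLevel() { return level; }
}

class RuleLevels {
    // Rules that don't implement ISchedulingRuleLevel default to level 0
    static int levelOf(ISchedulingRule rule) {
        return (rule instanceof ISchedulingRuleLevel)
                ? ((ISchedulingRuleLevel) rule).ruleLevel() : 0;
    }

    // The proposed beginRule() admission test
    static boolean canBegin(ISchedulingRule current, ISchedulingRule candidate) {
        if (current == null) return true;             // 2a: no existing rule
        if (current.contains(candidate)) return true; // 2b: contained by old rule
        return levelOf(candidate) < levelOf(current); // 2c: strictly lower level
    }
}
```

Under this check, a thread holding a level-100 mutex may begin a level-0 resource rule (case 2c), but not the other way around, and two un-contained rules at the same level still reject each other, preserving the existing hold-and-wait prohibition within a level.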

My mutex could then be implemented as:

class Mutex implements ISchedulingRule, ISchedulingRuleLevel {
    public boolean contains(ISchedulingRule rule) {
        return this == rule;
    }
    public boolean isConflicting(ISchedulingRule rule) {
        return this == rule;
    }
    public int ruleLevel() {
        return 100;  // level for resources is 0
    }
}

Then that high-level mutex could be used as a Job scheduling rule (in which case the Job could then lock resources while running), or could be used in beginRule() as long as no resource was currently locked. Deadlocks are not possible.

You could also define a low-level mutex which would have a low rule level, and that could be claimed while a resource was claimed, but no resource could be claimed while the mutex was claimed.

There need to be some refinements - particularly guidelines on levels. But I think level constants are a better approach than a comparator method, as it doesn't require up-front knowledge of other rules that may not have even been written yet.

Also, it will be complicated by the need to extend this scheme to MultiRules, which will also result from combining old and new rules on a job. A MultiRule will need a level range rather than just a single level. My first thought was that ruleLevel() should instead return a SortedSet<Integer>, but I've been put off that by the fact that Collections.singleton() returns a Set, not a SortedSet. Actually, a simple min-max level pair would suffice, but how to make that neat for normal rules which just want a single level?
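The min-max pair could be sketched like this (entirely hypothetical; today's MultiRule tracks no levels, and the LevelRange class is invented for illustration):

```java
// Hypothetical level range for a rule: a plain rule collapses to min == max,
// while a combined (MultiRule-style) rule spans the ranges of its children
class LevelRange {
    final int min, max;
    LevelRange(int level) { this(level, level); }  // normal single-level rule
    LevelRange(int min, int max) { this.min = min; this.max = max; }

    // Combining two rules merges their level ranges
    LevelRange combine(LevelRange other) {
        return new LevelRange(Math.min(min, other.min), Math.max(max, other.max));
    }
}
```

A combined rule could then be admitted only if its entire range sits strictly below the current rule's minimum level, which keeps case 2c well-defined for MultiRules.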

Comment 2 John Arthorne CLA 2007-11-29 17:46:48 EST
You are correct that the restriction exists in order to make deadlock between scheduling rules impossible. The space of scheduling rules forms a tree, where each rule is a node in the tree, and contains the scheduling rules for all lower levels. When you acquire a scheduling rule, you implicitly acquire the entire subtree of scheduling rules.

How does your proposed system guarantee mutual exclusion?  Say two threads are running concurrently, one with a rule at level 100, and another with a rule at level 101. It seems they are both allowed to concurrently acquire the same rule at level 0, thus destroying the mutual exclusion property of that rule.
Comment 3 Kevin Bracey CLA 2007-11-30 04:57:23 EST
The mutual exclusion rules in my proposal would work the same as they do currently. In your example, the second thread to claim the rule at level 0 would block, because its isConflicting() would say that it conflicts with the copy of itself already claimed. But the second thread would be able to claim a different, non-conflicting rule at level 0.

The levels are an extension to the current scheme, not a replacement. isConflicting() and contains() still work the same way.

At present, all rules have the same level, so the contract I gave above reduces to:

1) a rule can conflict with any other rule.
2) a rule can only begin if:
   a) there is no existing rule, or
   b) the new rule is contained by the old rule

Which is the current system. I'm just extending the contract to allow the new case 2(c) - being able to claim different /types/ of rules sequentially within a thread. But you still can only claim a rule if it doesn't conflict with any of the rules currently claimed in the system.

I'm not an expert in the field, but my online research suggests that this form of lock ordering is a well-known way of preventing deadlocks.

You still wouldn't be allowed to claim multiple resource rules sequentially (which are at the same level), as they would be unordered and could lead to deadlock. (Unless the new rules are contained, case 2b).

It could be that my use of the term "level" isn't a good idea, given that, as you say, the existing system can be considered as providing "levels" of locks within the resource tree.

My suggestion is just an ordering between different types of locks - a simple job semaphore would be ordered before all resource locks, so you would be allowed to claim it before claiming any resources, but not after. A low-level hardware-type semaphore would be ordered after all resource locks, so you would be allowed to claim it after locking resources, but not before.
Comment 4 John Arthorne CLA 2007-11-30 17:31:37 EST
Lock ordering works by eliminating the possibility of circular wait. The scheduling rule system isn't a lock ordering scheme - it instead eliminates the possibility of hold and wait - it's illegal to hold a rule while waiting for a rule.  What you are suggesting introduces the possibility of hold and wait.  Either system is fine for preventing deadlocks (eliminating circular wait or eliminating hold and wait), but I don't think it's possible to combine them. You'll end up either violating mutual exclusion, or you'll allow the possibility of deadlock.

Of historical interest, I did also attempt a locking scheme based on lock ordering - you can see a remnant of this in the fact that the implementation of ILock is called OrderedLock. The system breaks down in a purely extensible world like Eclipse. It's fine if a party knows about all the other locks it wants to acquire and knows their ranks, so it can choose a lock rank that is higher than them all. However, it fails when you need to co-exist with other locks you know nothing about. For example, JDT knows about the platform, so it can set its lock to be "1" while resources is "0". However, CVS also knows about resources, but doesn't know about JDT, so it also chooses "1". You can now have circular-wait deadlock between the two level "1" locks. If you force all locks to have a unique rank, then a thread owning either the JDT or CVS lock, and attempting to acquire the other one, would violate the lock ordering principle.
Comment 5 Kevin Bracey CLA 2007-12-04 05:57:29 EST
As far as I can tell, my scheme eliminates circular wait. You are only permitted to claim a new resource if its rank is strictly lower than the current claim. In the case of rules with equal rank, there is a hold-and-wait prohibition on that rank. I have yet to see an example of how this could fail.

Definition of a strict rank ordering is indeed the problem in this scheme. Obviously one would start with a wide spacing allowing people to "slot in". Two independently developed rules in different plug-ins may inadvertently end up with an undesirable ranking order relative to each other, but then again they wouldn't be using each other's locks, so it wouldn't necessarily be much of a problem, unless they're both exposing API that uses the locks internally.

Really, a plug-in just needs to choose ranks that are sensible relative to the underlying platform locks its threads would wish to acquire, either directly or through platform calls. Its ranks relative to other plug-ins are not that important.

I'd envisage a central "registry" in the platform where ranks in use are declared for documentation purposes.

I feel the situation would be better than the current one, where you cannot in practice use any rule of your own to make your jobs execute in sequence: the jobs will almost inevitably make platform calls which access resources, and thus will try to acquire resource locks - not currently allowed at all. With this scheme, you could choose a rank for your job rule higher than that of any lock acquired by the platform calls your jobs are likely to make. On reflection, I suppose for a job scheduling rule there could be a notional RANK_MAX=INT_MAX value that jobs could use, as the rule for starting a job is by definition the first rule claimed on that thread.

In relation to your final example, a deadlock would not arise, because if CVS and JDT locks were both rank 1, then a thread holding a CVS lock would not be permitted to acquire a JDT lock, and vice-versa. This is the same hold-and-wait prevention that already exists. The problem is no worse than the current scheme, where you indeed can't acquire a CVS lock then a JDT lock. But as long as the plug-in designers were prudent enough to choose different ranks then you'd now be able to do it, as long as you claimed them in the prescribed order.

We move from a situation where you can't claim any lock after any other lock, and where you're liable to find your code failing the moment you use a lock, due to the unknown locks used by the calls you make, to a situation where there's at least a certain amount of order and predictability about the lock interactions.

I appreciate that such a scheme adds complexity, but I feel the complexity is worthwhile compared to the gain. The deadlock prevention built into the system seems like a very harsh mistress - she says "no" at the slightest provocation, so many attempts to use straightforward locks turn out to be prohibited.

I just found a concise view on "hold-and-wait" on Wikipedia, with which I concur:  "The 'hold and wait' conditions may be removed by requiring processes to request all the resources they will need before starting up (or before embarking upon a particular set of operations); this advance knowledge is frequently difficult to satisfy and, in any case, is an inefficient use of resources." That is exactly the problem I've hit, and I find myself needing to claim the entire resource tree in my jobs just-in-case. And other people's code I've seen tends to do this too.

The proposed scheme would loosen up the strictures. Care would still be required - there would still be many things that would be prohibited - but it would be more feasible to chart a course through to the desired result.
Comment 6 Eclipse Webmaster CLA 2019-09-06 16:04:11 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.