Bug 354993 - running PluginModelManager.UpdateClasspathsJob causes race condition, locks
Summary: running PluginModelManager.UpdateClasspathsJob causes race condition, locks
Status: VERIFIED FIXED
Alias: None
Product: PDE
Classification: Eclipse Project
Component: UI (show other bugs)
Version: 4.0   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 4.3 M5   Edit
Assignee: Curtis Windatt CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy
: 407147 (view as bug list)
Depends on:
Blocks: 398235 399995
  Show dependency tree
 
Reported: 2011-08-17 14:30 EDT by Jeff Nevicosi CLA
Modified: 2013-10-04 15:45 EDT (History)
4 users (show)

See Also:


Attachments
Move setRule() to constructor (857 bytes, patch)
2013-02-11 13:31 EST, Ken Lee CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jeff Nevicosi CLA 2011-08-17 14:30:39 EDT
Build Identifier: 3.6.2.r362_v20110210  (also reproduced with pde.core plugin versions 3.6.3 and 3.7.1)

consider the following:

workspace.run((IWorkspaceRunnable)configRunnable, workspace.getRoot(), IWorkspace.AVOID_UPDATE, null);
workspace.run((IWorkspaceRunnable)buildRunnable, workspace.getRoot(), IWorkspace.AVOID_UPDATE, null);

where configRunnable modifies the Manifest's Required-Bundles (via calls to a given project's IBundleProjectDescription), and buildRunnable builds that project.

After the first run() call, UpdateClasspathsJob is scheduled (to finalize the update, via JavaCore.setClasspathContainer()).  However, since the workspace lock was released once the job was scheduled, not when it completed, there's a race condition, both buildRunnable and the operation within the scheduled job will require a lock.  Frequently, the build begins first & builds with a stale classpath.

Bug 276135 required this call to be scheduled, but this case isn't in the UI thread & we _require_ this job to be completed before we can proceed.

Recommendation:
1) Allow setting of some property to run JavaCore.setClasspathContainer() synchronously.

2) Worst case, if the Job was public, we could add a listener to the job manager to listen for its completion before proceeding.

Reproducible: Sometimes
Comment 1 Curtis Windatt CLA 2011-08-18 15:21:45 EDT
Not sure what the best solution will be here, but we should try and correct the ordering.
Comment 2 Jeff Nevicosi CLA 2012-11-05 10:27:06 EST
This seems to occur much more frequently adopting Eclipse 4.2.1.  Any chance this can be targetted to 4.2.2?
Comment 3 Curtis Windatt CLA 2012-11-05 12:09:04 EST
I don't currently have time to work on this.  Whether a fix was backported depends on the size and risk of the fix.
Comment 4 John Arthorne CLA 2012-11-15 11:09:54 EST
Two possible solutions come to mind:

1) The UpdateClasspathsJob could be given a scheduling rule, likely IWorkspaceRoot. Since it is scheduled first, no other job requiring a resource lock would be able to jump ahead of that job in the queue (such as a build job). The risk is that this locks too much and might reduce opportunities for concurrency, but concurrency during classpath updates doesn't sound like a safe thing to do anyway.

2) UpdateClasspathsJob could have a public job family object (see Job#belongsTo). That would allow an external consumer to join that job family, which would ensure you don't run concurrently. The downside is this requires some work for each client that may be scheduling build jobs to get it right.
Comment 5 Curtis Windatt CLA 2012-11-15 16:49:01 EST
(In reply to comment #4)
> Two possible solutions come to mind:
> 
> 1) The UpdateClasspathsJob could be given a scheduling rule, likely
> IWorkspaceRoot. Since it is scheduled first, no other job requiring a
> resource lock would be able to jump ahead of that job in the queue (such as
> a build job). The risk is that this locks too much and might reduce
> opportunities for concurrency, but concurrency during classpath updates
> doesn't sound like a safe thing to do anyway.

I think this would be a reasonable solution.  We don't want to run synchronously in case we are in the UI thread.  However, blocking any workspace jobs would be unlikely to interrupt the user.

We also have a list of the projects that are having their classpaths updated by the job.  The rule could be crafted based on this instead if we want to block fewer possible operations.
Comment 6 John Arthorne CLA 2012-11-16 16:48:18 EST
(In reply to comment #5)
> We also have a list of the projects that are having their classpaths updated
> by the job.  The rule could be crafted based on this instead if we want to
> block fewer possible operations.

You could try that but I don't know for sure that classpath updates never obtain rules outside that single project.. you could try and see if it works. I would lean towards just using root to be on the safe side - I can't imagine many workspace changes being safe while classpaths are being updated, and it's not a common operation.
Comment 7 Jeff Nevicosi CLA 2013-01-10 13:09:17 EST
Any update on this?  Would like to point out again that this occurs more frequently in our product based on 4.2 & may be a limitation preventing us from adopting it.
Comment 8 Curtis Windatt CLA 2013-01-10 17:05:04 EST
I'm just catching up on PDE UI bugs over the holidays.  

Fixed in master:
http://git.eclipse.org/c/pde/eclipse.pde.ui.git/commit/?id=e73ebd2a876f7553597dd809cf7aaa213531f6f4

It's already unusual for the classpath update to take significant time (though locking the UI for even a second feels brutal).  With this fix, if the user edits the manifest then immediately edits and saves again they'll get the popup saying user operation is waiting.

The fix is very small, but adding locks adds risk.  I forgot that Jeff had requested this to be backported and now 4.2.2 is fast approaching.

John, Dani, do either of you have an opinion on backporting this?
Comment 9 Dani Megert CLA 2013-01-11 03:44:58 EST
(In reply to comment #8)
> I'm just catching up on PDE UI bugs over the holidays.  
> 
> Fixed in master:
> http://git.eclipse.org/c/pde/eclipse.pde.ui.git/commit/
> ?id=e73ebd2a876f7553597dd809cf7aaa213531f6f4
> 
> It's already unusual for the classpath update to take significant time
> (though locking the UI for even a second feels brutal).  With this fix, if
> the user edits the manifest then immediately edits and saves again they'll
> get the popup saying user operation is waiting.
> 
> The fix is very small, but adding locks adds risk.  I forgot that Jeff had
> requested this to be backported and now 4.2.2 is fast approaching.
> 
> John, Dani, do either of you have an opinion on backporting this?

The fix looks right to me but it has the risk of introducing deadlocks. Given RC1 is already built and we even haven't run with this fix a single day, I don't want this to be in SR2. Having said that, I would accept that we add a system property which allows users and products to enable the fix in SR2.
Comment 10 Curtis Windatt CLA 2013-01-15 15:58:46 EST
Cloned for the 4.2.2 fix as Bug 398235.  For 3.8.2 the job rule is added only if a system property "pde.lockWorkspaceForClasspath" is set to true.

The rule is the default in 4.3.
Comment 11 Ken Lee CLA 2013-02-11 13:31:52 EST
Created attachment 226856 [details]
Move setRule() to constructor

Move setRule() to constructor
Comment 12 Ken Lee CLA 2013-02-11 13:32:31 EST
We're experiencing a problem (bug 400238) with the latest Eclipse Kepler M5 because of the fix of this bug.
In our Eclipse Scout project we use the same scheduling rule (ResourcesPlugin.getWorkspace().getRoot()) in a job that is responsible to create a Scout project, i.e. plugins and files are created during this process. This results in several plugin model events getting fired. 
Since the org.eclipse.pde.internal.core.PluginModelManager.UpdateClasspathsJob is scheduled to handle each event with the same rule, upon the first call to updateAffectedEntries() the scheduling rule is set on the instance of UpdateClasspathsJob and scheduled. 
Because of our running job with the same rule, the UpdateClasspathsJob gets into the state WAITING. If a second model event is handled, the same rule is set again on the same instance of UpdateClasspathsJob. In setRule() however, there's a constraint that the state of a job must be NONE, leading to an IllegalArgumentException.  

Since the UpdateClasspathsJob is a private class and the only instance not used outside the org.eclipse.pde.internal.core.PluginModelManager, it should be enough to move the setter of the scheduling rule inside the constructor (see attachment). In this case, the setRule() method is called only once.
This solves the problem in the Eclipse Scout project, but I'm not sure what kind of other side effects this could have.

Any other opinions?
Comment 13 Curtis Windatt CLA 2013-02-11 13:44:04 EST
(In reply to comment #12)
> Any other opinions?

Thanks for pointing out the problem.  We have already fixed the problem, but it was found too late for M5 inclusion.  See Bug 399995.
Comment 14 Curtis Windatt CLA 2013-05-03 10:10:06 EDT
*** Bug 407147 has been marked as a duplicate of this bug. ***
Comment 15 Jeff Nevicosi CLA 2013-10-04 15:45:52 EDT
Validated in 4.2.2