Bug 343256 - Workspace#buildInternal uses workspace root rule -> Saving while building is broken
Summary: Workspace#buildInternal uses workspace root rule -> Saving while building is ...
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: 3.7   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: 3.7 RC1   Edit
Assignee: James Blackburn CLA
QA Contact:
URL:
Whiteboard:
Keywords:
: 345596 (view as bug list)
Depends on:
Blocks: 331187 343569
  Show dependency tree
 
Reported: 2011-04-19 06:43 EDT by Jens Elmenthaler CLA
Modified: 2017-12-07 04:51 EST (History)
7 users (show)

See Also:
john.arthorne: review+


Attachments
jstack output (29.64 KB, text/plain)
2011-04-19 06:45 EDT, Jens Elmenthaler CLA
no flags Details
fix + tests (13.56 KB, patch)
2011-04-19 16:59 EDT, James Blackburn CLA
no flags Details | Diff
fix 2 + tests (20.57 KB, patch)
2011-04-20 08:30 EDT, James Blackburn CLA
no flags Details | Diff
alternative fix + test (19.93 KB, patch)
2011-04-20 18:19 EDT, James Blackburn CLA
no flags Details | Diff
minimal fix + tests (17.52 KB, patch)
2011-04-21 05:49 EDT, James Blackburn CLA
no flags Details | Diff
minimal fix + tests 2 (17.52 KB, patch)
2011-04-21 11:15 EDT, James Blackburn CLA
no flags Details | Diff
minimal fix + tests 3 (17.63 KB, patch)
2011-04-22 12:14 EDT, James Blackburn CLA
no flags Details | Diff
minimal fix + tests 4 (19.21 KB, patch)
2011-05-02 10:54 EDT, James Blackburn CLA
no flags Details | Diff
patch for commit (19.19 KB, patch)
2011-05-03 11:10 EDT, James Blackburn CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jens Elmenthaler CLA 2011-04-19 06:43:07 EDT
Build Identifier: HEAD

If saving a resource while a build is still running, the save operation is blocked until the build is finished. The UI blocks again as it did before Helios.


Reproducible: Always
Comment 1 Jens Elmenthaler CLA 2011-04-19 06:45:09 EDT
Created attachment 193567 [details]
jstack output
Comment 2 James Blackburn CLA 2011-04-19 06:55:58 EDT
Caused by BuildAction building project configuration using Workspace API which uses an overly course build rule...
Comment 3 James Blackburn CLA 2011-04-19 06:58:17 EDT
Should be a straightforward fix: create a multiRule of the rules required to build the configs in the build configuration array.
Comment 4 James Blackburn CLA 2011-04-19 08:24:15 EDT
This is essentially bug 331187.

AFAICS there are two ways of fixing this...

1) Create a MultiRule for all the build commands of all the build configurations being built.
   Advantage: most similar to the existing flow
   Disadvantage 1: this may degenerate to the WR rule frequently
   Disadvantage 2: resolving references, to discover the set of IBuildConfigs would need to happen before acquiring the build scheduling rule, so either we need to handle projects WS being changed concurrently with the computation, or lock / unlock the WS in quick succession during WS build.

2) Acquire the build scheduling rule immediately before running the build command in BuildManager#basicBuild.
   Advantage: very few lines of code change
   Advantage 2: runs each build command with just its required scheduling rules   => allowing more WS concurrency
   Disadvantage: allows more WS concurrency.

I like option 2 because it's the cleanest way to fix this.  And I'm trying to work out if there's any client impact...

From what I can see this doesn't change the accuracy/flow of the build resource deltas: pre + post build, and res change listener.   In particular: using a null scheduling rule between the PRE_BUILD and incProjBuild.build, and again after the end of the build before the POST_BUILD is fired, is equivalent to what we have today.  This is because Project#internalBuild already has a full prepareOperation(wrRule) and endOperation(wrRule) pair between the resource delta generation and running the build.

Note that as its late in 3.7, if we choose 2, we can continue to prepareOperation with the WS rule for Workspace#build(int) & the AutoBuildJob. So the semantics of those existing builds won't change at all...
Comment 5 James Blackburn CLA 2011-04-19 16:59:30 EDT
Created attachment 193622 [details]
fix + tests

Patch for the issue as per comment 4 option 2.

All existing tests continue to pass.

IWorkspace build and IProject build call BuildManager build with a null scheduling rule (WS lock held).  The correct scheduling rule for the builder is acquired immediately before invoking the build.

Project#internalBuild returns to how it was at v 1.150, which IMO is much tidier. Instead of wrapping PRE + POST_BUILD notification in an operation, the WS lock is held throughout.

BuildManager#basicBuild acquires the correct rule for the builder.  It does this having released the WS lock. And re-acquires the WS lock while the scheduling rule is still held.  This shouldn't affect the build delta, nor the build flow, as IProject#build already releases and reqquires the WSR rule and WS lock.

I've beefed up testTwoBuildersRunInOneBuild to assert that the correct scheduling rule is used for each builder.
Comment 6 James Blackburn CLA 2011-04-20 08:30:30 EDT
Created attachment 193687 [details]
fix 2 + tests

Bug fix:
The previous fix introduced a bug whereby changes could sneak in and not be included by the builder delta.

I've added added an additional test to exercise this window, and ensure this doesn't regress in the future...

Other:
Project#internalBuild was wrapped in a workspace.run during bug 198591. There doesn't seem to have been a good reason to do this. The preparedOperations will still be >1, so notification will be deferred, though not indefinitely. 
I've remove this, so #internalBuild is very much closer to how it was in CVS 1.150.

As before all tests pass.
Comment 7 James Blackburn CLA 2011-04-20 12:01:48 EDT
It would be good to get this into M7 and a n-build before then. John / Szymon if you have a chance to take a quick look over the patch, that would be appreciated.
Comment 8 John Arthorne CLA 2011-04-20 15:38:11 EDT
I'm trying to understand if this is a regression since Helios or a limitation we have always had. From what I can tell in HEAD, IProject#build uses a fine-grained rule, but IWorkspace#build uses the build rule (root). This doesn't seem to have changed in Indigo, or am I missing something?

I have briefly reviewed the patch. I agree with the change in principle and the approach looks reasonable. However this kind of change where we are making things more concurrent than before worries me - the problems tend to be subtle and don't always turn up right away. I would have liked to see a change like this much earlier in the dev cycle..
Comment 9 James Blackburn CLA 2011-04-20 15:57:32 EDT
(In reply to comment #8)
> I'm trying to understand if this is a regression since Helios or a limitation
> we have always had. 

The UI BuildAction used to call IProject#build and iteratively build referenced projects. It now uses IWorkspace#build(IBuildConfiguration[],...) and gets the workspace to build the references.  It's the workspace build that uses the WR rule.

> I have briefly reviewed the patch. I agree with the change in principle and the
> approach looks reasonable. However this kind of change where we are making
> things more concurrent than before worries me - the problems tend to be subtle
> and don't always turn up right away. I would have liked to see a change like
> this much earlier in the dev cycle..

Ok, that's fair.

This change will only affect manual build calls.  Auto-build still holds the WS rule.

The other option is to leave things as they are and enhance Workspace#build to create a multi-rule from the referenced configs.  The disadvantage is that this may frequently degenerate to the WR rule.
Comment 10 James Blackburn CLA 2011-04-20 18:19:26 EDT
Created attachment 193763 [details]
alternative fix + test

Start at an alternative fix which makes Workspace#buildInternal look more like Project#internalBuild.  A bunch of tests are broken - looks to do with broken builder handling...

Tests for the specific issue which tests the different mechanisms or running a WS build.
Comment 11 James Blackburn CLA 2011-04-20 18:39:07 EDT
(In reply to comment #10)
>A bunch of tests are broken - looks to do with broken
> builder handling... 

CoreException is logged and not propagated from BuildManager#getRule(...)

Also looks like builder lifecycle has changed, breaking a number of the BuilderTest(s). With this approach builders are instantiated at a different time.  During scheduling rule discovery, before the build is run.  This perturbs builder lifecycle breaking the BuilderTest. 



I was hoping to fix this without changing existing tests, but I'm not sure it's possible using comment 4 approach 1 (alternative fix)...

I'd rather to go fix 2 + tests. If we break any thing in the nightlies, or someone discovers an issue in M7, we can revert the change wholesale in 3.7 and take another crack at it in 3.8.  I don't think the alternative is necessarily less risky...
Comment 12 James Blackburn CLA 2011-04-21 05:49:45 EDT
Created attachment 193795 [details]
minimal fix + tests

A minimal fix which doesn't perturb the current scheduling rule flow except for the special case where a build configuration with relaxed scheduling rules is being built.

So specifically:
 - No change to IProject#build
 - No change to IWorkspace#build
 - IWorkspace#build(IBuildConfiguration[]) builds with relaxed rule if and only if the first build config has a non-Workspace scheduling rule (heuristic).

All tests continue to pass.  New tests added for the bug.
Comment 13 James Blackburn CLA 2011-04-21 11:15:03 EDT
Created attachment 193849 [details]
minimal fix + tests 2

Before running the build, and releasing the WS lock, do:
  workspace.newWorkingTree() rather than currentTree.immutable()
as we're in an operation and releasing the WS lock, we need to ensure that the tree remains mutable.
Comment 14 Szymon Brandys CLA 2011-04-22 11:44:15 EDT
This is unfortunate that we haven't noticed that earlier. It is not a regression in core.resources API, however because of changes in BuildAction we have a regression in behavior that may be noticed by CDT users using relaxed builders. Would be good to address it in 3.7.

The first comment I have is that we should use the most pessimistic rule, not the rule in first configuration
Workspace#buildInternal(IBuildConfiguration[] configs, int trigger, boolean buildReferences, IProgressMonitor monitor) line 447 (after patch is applied). At least in 3.7.
Comment 15 James Blackburn CLA 2011-04-22 12:14:03 EDT
Created attachment 193919 [details]
minimal fix + tests 3

As per Szymon's comment: Workspace#build falls back to WS rule if any of the passed in build configs requires it.
Comment 16 James Blackburn CLA 2011-05-02 10:54:14 EDT
Created attachment 194499 [details]
minimal fix + tests 4

Workspace#buildInternal needs to call PRE_BUILD + POST_BUILD with WR rule.  Holding the WS lock by itself will lead to deadlock if resources are modified, as acquiring rules must always precede acquiring the ws lock.
Comment 17 John Arthorne CLA 2011-05-03 10:26:30 EDT
I have reviewed the fix. It looks good. Release it whenever you are ready.

Hopefully there are real world scenarios in CDT where you can verify that the fix addresses the problem?

One small comment. I am always paranoid about mis-matching scheduling rules, since if that ever happened all locking would break. The pattern I try to use is:

final ISchedulingRule rule = ...;
try {
  prepareOperation(rule,...)
} finally {
  endOperation(rule,...);
}

This way we can be sure the rules always match. In your patch you have code like:

try {
  prepareOperation(getRuleFactory().buildRule(),...)
} finally {
  endOperation(getRuleFactory().buildRule(),...);
}

Now it so happens that the rule factory doesn't currently allow overriding the build rule, so this is safe. But I don't like the pattern because it might be copied elsewhere (other rules can change if the team provider changes). And who knows, maybe some day we would allow the rule factory to override this. So, in your patch I suggest creating a local:

final ISchedulingRule buildRule = getRuleFactory().buildRule();

And use that throughout the buildInternal method (five references to getRuleFactory().buildRule()).
Comment 18 James Blackburn CLA 2011-05-03 11:10:40 EDT
Created attachment 194597 [details]
patch for commit

Patch updated as per your comment, thanks for reviewing.  Attached in case there's need to quickly revert for some reason.

(In reply to comment #17)
> Hopefully there are real world scenarios in CDT where you can verify that the
> fix addresses the problem?

Yes, I've verified this fixes the build action based builder.

There are a few ways of running a build in CDT. For non-managed external builds, I had used the Makefile view (which calls IProject#build) allowing this regression to slip through the net...  

> One small comment. I am always paranoid about mis-matching scheduling rules,
> since if that ever happened all locking would break. The pattern I try to use
> is:

Fixed.

As an aside beginning and ending operations like this seems gross.  I think fixing bug 249951 -- not holding the WS lock while firing a resource change delta -- would fix this and the deadlock issues noted there.  I took a quick look and believe it shouldn't change the semantics of the resource delta at all.
Comment 19 James Blackburn CLA 2011-05-03 11:12:12 EDT
Committed
Comment 20 James Blackburn CLA 2011-05-12 09:40:53 EDT
*** Bug 345596 has been marked as a duplicate of this bug. ***