Bug 530649 - Allow workspace.build(...) to build independent projects in parallel
Summary: Allow workspace.build(...) to build independent projects in parallel
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: 4.7   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: 4.8 M6   Edit
Assignee: Mickael Istria CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy, performance
Depends on:
Blocks: 527212 531580
  Show dependency tree
 
Reported: 2018-02-02 06:10 EST by Mickael Istria CLA
Modified: 2021-01-05 13:33 EST (History)
8 users (show)

See Also:


Attachments
Screenshot (81.49 KB, image/png)
2018-02-19 04:04 EST, Karsten Thoms CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mickael Istria CLA 2018-02-02 06:10:49 EST
The workspace.build(...) method and related settings/preferences should allow for parallel builds of unrelated projects/build configuration.
Comment 1 Eclipse Genie CLA 2018-02-02 06:18:20 EST
New Gerrit change created: https://git.eclipse.org/r/116588
Comment 2 Eclipse Genie CLA 2018-02-07 06:40:38 EST
New Gerrit change created: https://git.eclipse.org/r/116843
Comment 3 Andrey Loskutov CLA 2018-02-08 11:04:09 EST
Can you please add some note about how the projects dependencies (or relationship) are calculated? I'm curious because this is very language specific.
Comment 4 Mickael Istria CLA 2018-02-08 11:52:53 EST
The references are computed in the same way of current BuildOrder is computed: it takes the reference from project.internalGetReferencedBuildConfigs(). The point is just that instead of using a build order (sequential array) the build methods would directly use the Graph of depedencies, that contain more necessary informations to sort out what can be built in parallel.
This commit shows the code that replace usage of the graph by usage of an array.
Comment 5 Andrey Loskutov CLA 2018-02-09 11:44:54 EST
Christian, can you please take a look on patches (from the XText point of view)?
Comment 6 Christian Dietrich CLA 2018-02-09 11:55:49 EST
i am not sure about this cc @karsten @sebastian
Comment 7 Christian Dietrich CLA 2018-02-09 11:57:22 EST
i have doubts that xtext can handle this (too many singletons)
Comment 8 Christian Dietrich CLA 2018-02-09 11:58:31 EST
created https://github.com/eclipse/xtext-eclipse/issues/558
Comment 9 Mickael Istria CLA 2018-02-09 12:09:00 EST
2 changes implement the 2 sides of graph-based scheduling:
* https://git.eclipse.org/r/116588 is about producing the graph and making it accessible
* https://git.eclipse.org/r/116843 is about consuming the graph by navigating independent vertexes/buildconfigurations it in parallel (when possible) and starting a (build) Job for each node we visit. It's mostly about moving tokens on a graph. The Jobs share the same JobGroup so they're throttled.

None of those change should modify current behavior per so, it just makes things available for...
https://git.eclipse.org/r/#/c/116843/ (which should be rebased on top of the 2 former ones) which pass the Graph from Workpsace to BuildManager using the new methods introduced, and makes the Workspace(Description) control the enablement and the throttling.


https://git.eclipse.org/r/116843
================================
wouldn't affect behavior at all on its own. It only makes a new method available in BuildManager to deal with parallel builds but nothing calls it. It's 100% safe to merge it as it.

https://git.eclipse.org/r/116588
================================
introduce a code change which makes Workspace deal with graph as 1st class citizen and ability to remove vertexes from the graph (ie exclude project/buildconfiguration) to create a diminished view of dependency graph, which should keep transitional dependencies.
The expectation, which I admit I'm not sure I mathematically validated, is that the build order generated of the trimmed graph is an acceptable build order compared to the current one (which is the trimmed build order of the full graph); I'm not sure whether this is true, but my "feeling" of graphs (yeah feeling isn't a mathematical concept) is that the result of build order from a trimmed graph is even more likely to be a better one that a trimmed build order from a full graph.
The patch keeps all the (good) test suite happy.
It IMO can be merged relatively safely, but we just need to be aware of that change so we can easily relate to it if we notice something strange in dependency order later.

https://git.eclipse.org/r/#/c/116843/
=====================================
Is the integration. Nothing crazy in it, but it just binds the two sides and introduce settings to make turn parallel builds on/off. The setting is important as I imagine we'd like to be defensive on such change (ie mark it experimental, turn it off by default) as we may have some builders which do wrong things and break when running in parallel.

Missing
=======
Adoption of parallel builds in AutoBuildJob (which happens to reimplement some build logic instead of calling workspace.build(...)
Comment 10 Mickael Istria CLA 2018-02-09 12:16:48 EST
(In reply to Christian Dietrich from comment #7)
> i have doubts that xtext can handle this (too many singletons)

Note that the proposal is (or at least should be) respectful of ISchedulingRule for IncrementProjectBuilder. So if some builder knows it cannot build in parallel with some other, it's still possible (and already recommended) for this build to set up the proper scheduling rule.
In such case, builders with conflicting rules wouldn't build in parallel and the graph processing should continue as best according to the number of remaining threads.
Comment 11 Mickael Istria CLA 2018-02-09 12:17:34 EST
(In reply to Mickael Istria from comment #10)
> Note that the proposal is (or at least should be) respectful of
> ISchedulingRule for IncrementProjectBuilder. So if some builder knows it
> cannot build in parallel with some other, it's still possible (and already
> recommended) for this build to set up the proper scheduling rule.
> In such case, builders with conflicting rules wouldn't build in parallel and
> the graph processing should continue as best according to the number of
> remaining threads.

While I'm writing this, I realize that this scenario does really require some automated tests and that the patch should be accepted without tests for this scenario.
Comment 12 Christian Dietrich CLA 2018-02-09 12:31:18 EST
does that means builders have to avtively opt out from parallel build? that would at least break old code tha was written agains oxygen or older wont it?
Comment 13 Mickael Istria CLA 2018-02-09 13:10:02 EST
(In reply to Christian Dietrich from comment #12)
> does that means builders have to avtively opt out from parallel build?

(Un?)Fortunately not. Builder that do not specify a scheduling rule all use the same workspace scheduling rule:
http://git.eclipse.org/c/platform/eclipse.platform.resources.git/tree/bundles/org.eclipse.core.resources/src/org/eclipse/core/resources/IncrementalProjectBuilder.java#n453 so they conflict one with the other, and only one can be run at once.
Builders that do have specified a scheduling rule may actually be more error-prone that others if they specified a scheduling rule which is incorrect or insufficient (ie not really taking into account the reasons why not parallelizing the builder with some other Job).

> that
> would at least break old code tha was written agains oxygen or older wont it?

For the initial question, I think it's fine (see above).
That said, I think the parallel build itself would be opt-in for some time, and it would be possible for any plugin to disable parallel builds if they really need to, using the WorkspaceDescription API. So even if it has pitfalls, it can be fully avoided.
In the very worst case scenario, parallel build is implemented and never almost used because we notice too many builders fail. And we can then start fixing the builders ;)
Comment 15 Eclipse Genie CLA 2018-02-12 14:28:41 EST
New Gerrit change created: https://git.eclipse.org/r/117209
Comment 17 Dani Megert CLA 2018-02-15 09:09:30 EST
Using the build which contains this change (I20180213-0125) building a project in my big workspace first takes forever (well over 5 minutes). During that time the workspace is blocked and the Progress view says "Build Project"/"Building Project..." but never starts to build as it cycles in computing the build order:

"Worker-14: Build Project" #59 prio=5 os_prio=0 tid=0x0000000053fad000 nid=0x1144 runnable [0x000000005f6fe000]
   java.lang.Thread.State: RUNNABLE
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
        at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1548)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.eclipse.core.internal.resources.ComputeProjectOrder.buildFilteredDigraph(ComputeProjectOrder.java:758)
        at org.eclipse.core.internal.resources.Workspace.computeProjectBuildConfigOrderGraph(Workspace.java:952)
        at org.eclipse.core.internal.resources.Workspace.buildInternal(Workspace.java:483)
        at org.eclipse.core.internal.resources.Workspace.build(Workspace.java:408)
        at org.eclipse.ui.actions.BuildAction$1.runInWorkspace(BuildAction.java:288)
        at org.eclipse.core.internal.resources.InternalWorkspaceJob.run(InternalWorkspaceJob.java:39)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:60)
        



then fails with OOMEE (including a dialog in the IDE):

!ENTRY org.eclipse.core.jobs 4 2 2018-02-15 11:57:19.855
!MESSAGE An internal error occurred during: "Build Project".
!STACK 0
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.eclipse.core.internal.resources.ComputeProjectOrder.lambda$3(ComputeProjectOrder.java:761)
	at org.eclipse.core.internal.resources.ComputeProjectOrder$$Lambda$492/271358561.accept(Unknown Source)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.eclipse.core.internal.resources.ComputeProjectOrder.lambda$2(ComputeProjectOrder.java:761)
	at org.eclipse.core.internal.resources.ComputeProjectOrder$$Lambda$491/1047828385.accept(Unknown Source)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.eclipse.core.internal.resources.ComputeProjectOrder.buildFilteredDigraph(ComputeProjectOrder.java:761)
	at org.eclipse.core.internal.resources.Workspace.computeProjectBuildConfigOrderGraph(Workspace.java:952)
	at org.eclipse.core.internal.resources.Workspace.buildInternal(Workspace.java:483)
	at org.eclipse.core.internal.resources.Workspace.build(Workspace.java:408)
	at org.eclipse.ui.actions.BuildAction$1.runInWorkspace(BuildAction.java:288)
	at org.eclipse.core.internal.resources.InternalWorkspaceJob.run(InternalWorkspaceJob.java:39)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:60)

!ENTRY org.eclipse.core.jobs 4 2 2018-02-15 12:01:37.117
!MESSAGE An internal error occurred during: "Build Project".
!STACK 0
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.HashMap.newNode(HashMap.java:1742)
	at java.util.HashMap.putVal(HashMap.java:630)
	at java.util.HashMap.put(HashMap.java:611)
	at java.util.HashSet.add(HashSet.java:219)
	at org.eclipse.core.internal.resources.ComputeProjectOrder.lambda$3(ComputeProjectOrder.java:761)
	at org.eclipse.core.internal.resources.ComputeProjectOrder$$Lambda$492/271358561.accept(Unknown Source)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.eclipse.core.internal.resources.ComputeProjectOrder.lambda$2(ComputeProjectOrder.java:761)
	at org.eclipse.core.internal.resources.ComputeProjectOrder$$Lambda$491/1047828385.accept(Unknown Source)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.eclipse.core.internal.resources.ComputeProjectOrder.buildFilteredDigraph(ComputeProjectOrder.java:761)
	at org.eclipse.core.internal.resources.Workspace.computeProjectBuildConfigOrderGraph(Workspace.java:952)
	at org.eclipse.core.internal.resources.Workspace.buildInternal(Workspace.java:483)
	at org.eclipse.core.internal.resources.Workspace.build(Workspace.java:408)
	at org.eclipse.ui.actions.BuildAction$1.runInWorkspace(BuildAction.java:288)
	at org.eclipse.core.internal.resources.InternalWorkspaceJob.run(InternalWorkspaceJob.java:39)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:60)


Unfortunately I can't reproduce in a small workspace and my big one is really big (320 projects).

Also, adding a new Java project with no dependencies suffers from the same issue!

Please take a look today. Otherwise I will revert the change for the next I-build.
Comment 18 Mickael Istria CLA 2018-02-15 09:17:45 EST
I'm looking at it right now.
Comment 19 Mickael Istria CLA 2018-02-15 11:06:26 EST
@Dani: I'm having hard time to reproduce it, I'm still trying. In the meantime, I have a few questions/comments:

> as it cycles in computing the build order:

Did you have the opportunity to see where it does cycle: is it always re-computing the build order, or is it that the (single) computation of build order is looping on itself?

> Unfortunately I can't reproduce in a small workspace and my big one is really big (320 projects).

Theorically, 320 projects creates a graph of 320 vertexes and in worst case, even assuming there are cycles 320*319=102080 edges. All that should take a couple of MB max cumulated.
However, the error seems to be about GC being too much requested. I hope I can reproduce it to figure out why so many objects are created in this case while I don't think the code should be exploding here...

Do you see multiple threads running the ComputeBuildOrder.buildFilteredGraph in parallel? That's not much expected and maybe this could be the cause.
Comment 20 Andrey Loskutov CLA 2018-02-15 11:09:42 EST
(In reply to Mickael Istria from comment #19)
> Theorically, 320 projects creates a graph of 320 vertexes and in worst case,
> even assuming there are cycles 320*319=102080 edges. All that should take a
> couple of MB max cumulated.
> However, the error seems to be about GC being too much requested.

Right. Dani, how big is your max heap size for Eclipse JVM? 1024 MB? Can you try to double that, just to verify that there is no endless loops or something like this?
Comment 21 Mickael Istria CLA 2018-02-15 11:22:58 EST
@Andrey: do you think that replacing streams by a good old for(...) iteration would help there?
Comment 22 Mickael Istria CLA 2018-02-15 11:34:31 EST
@Dani: Also, I imagine this same use-case (same 320 projects) prior that patch but with same memory settings do work well and doesn't block when computing project order?
Comment 23 Dani Megert CLA 2018-02-15 11:38:09 EST
(In reply to Mickael Istria from comment #22)
> @Dani: Also, I imagine this same use-case (same 320 projects) prior that
> patch but with same memory settings do work well and doesn't block when
> computing project order?

Yes, going back to I20180208-2000 fixes the issue.

I run with -Xmx2048M.
Comment 24 Mickael Istria CLA 2018-02-15 11:51:16 EST
Unfortunately, I cannot reproduce it:
1. I picked  I20180214-2000
2. Cloned the aggregator repo (clean build)
3. Disabled build automatically
4. Imported with File > Import > General > Existing Projects into Workspace
  as a result, I get 624 projects in my workspace
all that went smoothly, nothing seemed stuck neither in UI nor in Progress View, the whole operation took about a minute of "Refresh Workspace"
5. Re-enabled Build Automatically
  Build started, without blocking on computing build order, and completed in a reasonable time, with constant progress (I could see the ant script running).

@Dani: Do you think you could send me a snapshot of your workspace you're using for testing?
Comment 25 Dani Megert CLA 2018-02-15 12:06:54 EST
(In reply to Mickael Istria from comment #19)
> @Dani: I'm having hard time to reproduce it, I'm still trying. In the
> meantime, I have a few questions/comments:
> 
> > as it cycles in computing the build order:
> 
> Did you have the opportunity to see where it does cycle: is it always
> re-computing the build order, or is it that the (single) computation of
> build order is looping on itself?

ComputeProjectOrder#buildFilteredDigraph is only called once and then it hangs in the following loop:

for (Vertex<T> vertexToRemove : toRemove) {
...
}


> Do you see multiple threads running the ComputeBuildOrder.buildFilteredGraph
> in parallel? That's not much expected and maybe this could be the cause.

It's just one thread.


NOTE: The problem only happens if auto-build is disabled. If I clean a project when auto-build is enabled, it works fine.
Comment 26 Dani Megert CLA 2018-02-15 12:07:22 EST
(In reply to Mickael Istria from comment #24)
> @Dani: Do you think you could send me a snapshot of your workspace you're
> using for testing?

No.
Comment 27 Dani Megert CLA 2018-02-15 12:08:49 EST
(In reply to Mickael Istria from comment #24)
> Unfortunately, I cannot reproduce it:
> 1. I picked  I20180214-2000
> 2. Cloned the aggregator repo (clean build)
> 3. Disabled build automatically
> 4. Imported with File > Import > General > Existing Projects into Workspace
>   as a result, I get 624 projects in my workspace
> all that went smoothly, nothing seemed stuck neither in UI nor in Progress
> View, the whole operation took about a minute of "Refresh Workspace"
> 5. Re-enabled Build Automatically
>   Build started, without blocking on computing build order, and completed in
> a reasonable time, with constant progress (I could see the ant script
> running).

See my previous comment: it works when auto-build is on.
Comment 28 Mickael Istria CLA 2018-02-15 12:25:44 EST
Ok, thanks Dani. I managed to reproduce it by building only a single project.
Comment 29 Mickael Istria CLA 2018-02-15 12:51:20 EST
It seems like usage of streams and lambdas in that loop was triggering way too many hidden instantiations/GCs. I could reproduce the latency in debug mode, witness an OOM and a huge latency in an automated test, and then moved to a for loop and everything seemed better. Patch minus 10 seconds.
Comment 30 Eclipse Genie CLA 2018-02-15 12:52:39 EST
New Gerrit change created: https://git.eclipse.org/r/117460
Comment 31 Dani Megert CLA 2018-02-15 14:49:16 EST
(In reply to Eclipse Genie from comment #30)
> New Gerrit change created: https://git.eclipse.org/r/117460

Please commit this change or revert the other change.
Comment 33 Dani Megert CLA 2018-02-16 08:03:02 EST
(In reply to Eclipse Genie from comment #32)
> Gerrit change https://git.eclipse.org/r/117460 was merged to [master].
> Commit:
> http://git.eclipse.org/c/platform/eclipse.platform.resources.git/commit/?id=432036ddba905037b083ffddab4788995dc3fe70
> 

This fixed the problem. Thanks.


Do you have numbers that show the performance improvement of the change?
Comment 34 Mickael Istria CLA 2018-02-16 08:27:16 EST
(In reply to Dani Megert from comment #33)
> This fixed the problem. Thanks.

Thanks for reporting. I'm a bit disappointed by lambdas and streams here... 
 
> Do you have numbers that show the performance improvement of the change?

The patch fixing the issue turns a several minute failing operation (because of OOM) into a successful 140ms one on my machine.
Relying on filtered dependency graph instead of vertex order introduce a few milliseconds penalty because it's a relatively expensive operation (which I think we can optimize more and more, at least to cover probable cases such as "only one project"). But it's a requirement for ability to process project builds in parallel.
If we look at the master change at https://git.eclipse.org/r/#/c/116717/ , the test case (which implement somehow the best cases: independent branches in graph, not conflicting builder rules) show that in best conditions, we simply divide the overall build time by the number of threads allocated to it.
I imagine in some case of mature builders, a lot of optimizations are already involved that would make the result less remarkable. But for some others which typically do a lot of work inside the builders themselves (not triggering job), then the result should be substantial. I'll soon give a try to some big Maven or PDE projects and report some real-world metrics.
Comment 35 Eclipse Genie CLA 2018-02-16 08:39:11 EST
New Gerrit change created: https://git.eclipse.org/r/117525
Comment 36 Mickael Istria CLA 2018-02-16 09:43:51 EST
(In reply to Mickael Istria from comment #34)
> I imagine in some case of mature builders, a lot of optimizations are
> already involved that would make the result less remarkable. But for some
> others which typically do a lot of work inside the builders themselves (not
> triggering job), then the result should be substantial. I'll soon give a try
> to some big Maven or PDE projects and report some real-world metrics.

At the moment, the proposed code has a guard to prevent parallel builds to happen when one of the involved builder has a "non relaxed" scheduling rule. The thing is that at least the following builder do use or include the Workspace Root as scheduling rule and thus prevent the parallel builds to be triggered:
* org.eclipse.jdt.internal.core.builder.JavaBuilder
* org.eclipse.pde.internal.core.builders.ManifestConsistencyChecker
* org.eclipse.pde.internal.core.builders.ExtensionPointSchemaBuilder
* org.eclipse.core.internal.events.BuildManager$MissingBuilder

I'll have to investigate a bit more to check whether we can run those builds in parallel anyway, and if not, whether we can run a subset of the graph in parallel, or even better, whether we can make builders use narrower scheduling rules directly. .
Comment 37 Karsten Thoms CLA 2018-02-19 04:04:37 EST
Created attachment 272728 [details]
Screenshot

Not sure if this is related, but I am experiencing problems building projects. In attached screenshot you see that plugin "org.eclipse.xtend.ide" is not build since its prerequisite project "org.eclipse.xtend.core" was not build. But this is not true, the project was build already successfully. A rebuild of all workspace project helped. Could the change cause situation like this?
Comment 38 Dani Megert CLA 2018-02-19 05:43:29 EST
(In reply to Karsten Thoms from comment #37)
> Created attachment 272728 [details]
> Screenshot

Which build are you using?
Comment 39 Mickael Istria CLA 2018-02-19 07:44:13 EST
(In reply to Karsten Thoms from comment #37)
> Could the change cause situation like this?

While it shouldn't cause such situation (it shouldn't introduce any regression),
it's definitely a good thing to suspect this change in case you spot a regression.
I'd like to reproduce it. Which build were you using and what were the steps you used to provision your workspace? Is it the first time you spot such issue?
Comment 40 Karsten Thoms CLA 2018-02-19 08:05:33 EST
This is my Xtext developer IDE which is currently having org.eclipse.core.resources 3.13.0.v20180215-1926

Reproduction will be hard. I don't think there is a clear path. I just did an update of the Git repos and expected a clean rebuild of changed projects. You could use Oomph to provision an Xtext developer IDE.
Comment 41 Mickael Istria CLA 2018-02-20 08:50:40 EST
(In reply to Dani Megert from comment #33)
> Do you have numbers that show the performance improvement of the change?

As mentioned earlier, parallel build is currently disabled by default (ie only one build at a time and regular behavior) so it doesn't change any performance by default. And when applying it on regular SDK, it seems like almost all builder use a non-relaxed scheduling rule, so that parallel build is also disabled.

I tried it with the Eclipse aCute builder which basically just invokes a `dotnet build` command. I took care of having the builder specifying a null scheduling rule.
Building whole workspace with 25 .NET projects without parallel build (current IDE, or proposed patch with max simultaneous builds == 1) takes 5minutes.
Setting max simultaneous builds to 4, and building this same workspace (after cleanup) takes 2minutes and 45 seconds.
Then I retried the same thing the other way round and got same results.
So in this case for instance, parallel build saves 2m15s, which is ~43% of time (and probably about as much of power). While we could have expected the perf to be improved by 75% of 4 threads, it's actually not the case here because running multiple simultaneous `dotnet build` commands make all of them a bit slower. This is a behavior that can be reproduced on CLI.

So at this point:
* It's safe to include it as it's disabled by default
* It's safe to include it because most legacy builder wouldn't be able to trigger parallel builds anyway (that's something to work on afterwards)
* It provides a significant performance improvements for some real use-cases.
As a consequence, I'll try to give another round of cleanup of the provided patch and plan to merge this by 4.8.M6. Reviews are highly welcome!
Comment 44 Mickael Istria CLA 2018-02-23 05:47:11 EST
With https://git.eclipse.org/r/#/c/116717/ merged, this one can be considered as resolved from Resources perspective. A user who wants parallel build can enable them by setting the max number of simultaneous build to a value > 1 in workspace description.
By default, value is 1 and nothing changes.
If value > 1, then parallel builds *may* happen if some scheduling conditions are met.