Bug 126121 - Parallelize Java build process
Summary: Parallelize Java build process
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 4.3
Hardware: All All
Importance: P3 enhancement with 50 votes
Target Milestone: ---
Assignee: Platform-Resources-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: helpwanted
Depends on:
Blocks: 151053
 
Reported: 2006-02-01 19:34 EST by Jörg von Frantzius CLA
Modified: 2020-06-04 11:15 EDT (History)
CC List: 48 users

See Also:


Attachments
Stub to parallelize project build (19.71 KB, patch)
2011-04-29 17:47 EDT, Jens Kübler CLA
Patch core resources against 3.7 HEAD (27.77 KB, patch)
2011-07-04 04:00 EDT, Jens Kuebler CLA
Modified parallel build patch against Eclipse 4.x HEAD (48.53 KB, patch)
2013-07-08 11:33 EDT, Szabolcs Pota CLA

Description Jörg von Frantzius CLA 2006-02-01 19:34:19 EST
I am not sure whether this is possible at all, but it would be very nice if the build process made use of all available processors/cores to speed up the build.

With the advent of dual-core processors on the desktop, and multi-core processors in the years to come, it seems sensible in general to parallelize processing-intensive computations where possible. For Eclipse, the build process seems to me the first natural candidate.

Or is the build process already parallelized? It doesn't seem so to me.
Comment 1 John Arthorne CLA 2006-02-02 09:42:48 EST
Build already exploits multiple processors by running in a background thread, allowing the user to continue working while the build is running.  Multiple builders do not currently run simultaneously, but that would be a significant change for the builder implementations to adapt to. Currently builders can rely on having exclusive access to workspace resources while they are running.  However, I'll leave this enhancement open, since it would be possible to extend the builder framework to allow for parallel builds.  The Web Tools project has a category of read-only builders called validators, and they are in the process of making similar changes to allow validators to run concurrently:

http://www.eclipse.org/webtools/jst/components/j2ee/proposals/ValidatorFramework.xml 
Comment 2 Jörg von Frantzius CLA 2006-02-02 11:45:21 EST
Then what I actually mean seems to be that the *Java Builder* itself could be parallelized. Do you think that's possible?

I thought it possible to identify compilation units that are independent of each other, so they could be compiled in parallel. However, I'd guess that's not trivial at all. There probably is some kind of dependency tree, the branches of which could be worked on in parallel? Just my naive thinking maybe.
Comment 3 Andrey Loskutov CLA 2006-02-03 09:35:55 EST
Given comment 2, I propose either moving this bug to the JDT Core component, because this seems to be more of a JDT-internal RFC, or creating a separate RFC for the Java builder/compiler only.

@Oliver, Philippe - do you think it would be possible to parallelize the Java Builder/Compiler task? It should be possible to identify classes or packages that do not depend on each other and could be compiled "out of order".

Although file I/O is probably the main performance bottleneck, even on a multi-processor machine, I think that this could reduce build times for bigger workspaces.
Comment 4 John Arthorne CLA 2006-02-03 09:47:54 EST
Moving back to JDT as requested.  Feel free to enter a new RFE for making the generic build process parallel (although as I mentioned this is not likely to happen without a complete overhaul of the build infrastructure).
Comment 5 Philipe Mulet CLA 2006-02-03 10:05:16 EST
I was thinking unrelated projects could be built in parallel, which would give some degree of parallelization (and this is why I moved it to platform as a general build issue). Within the Java builder, it would mean splitting the build into smaller pieces. It is unclear how we could efficiently detect independence between the pieces and reuse our type system in a concurrent compile.
Comment 6 Jörg von Frantzius CLA 2006-02-03 17:59:40 EST
When working on a project, the Java builder needs to perform some kind of topological sorting of compile targets, I guess? I have no clue how it works, though...
Comment 7 Stephen Kurlow CLA 2006-06-09 17:04:02 EDT
This is a desperately needed feature now that multi-core CPUs are becoming more common...also with HT available too. Please make Eclipse able to fully utilise multiple CPUs to do builds when lots of source files are available for compilation. I have RAID0 for my disk subsystem and the CPUs are flat at 50% utilisation not able to push my disk subsystem anywhere near full utilisation! I want to halve my build process time!

Stephen
Comment 8 Axel Rauschmayer CLA 2006-06-16 01:59:36 EDT
Distributed builds would be nice as well. I know that XCode does it:

    http://www.apple.com/macosx/features/xcode/

It really makes sense to tap into all this unused computer power that is usually available within a network.
Comment 9 Andrey Loskutov CLA 2006-11-18 03:39:10 EST
FYI: CDT 4 already has this kind of builder: see bug 156872.
(On a 4-processor system the compile time dropped to 36.5%!)

I'm wondering how much effort is required to port it or create a similar fix in JDT?
Comment 10 Alex Blewitt CLA 2006-11-18 05:23:33 EST
As an aside, is there any reason why multiple projects couldn't be built at once, even without the JDT knowing about it? For example, if I have Project A->B and Project C->B, then Project B needs to be built first (obviously, and that happens at the moment) but is there any problem with building A and C concurrently?
Comment 11 Jörg von Frantzius CLA 2006-11-20 05:08:02 EST
(In reply to comment #3)
> Although file I/O is probably the main performance bottleneck, even on a
> multi-processor machine, I think that this could reduce build times for bigger
> workspaces.

By chance, right now I have to deal with a project generated from a complex XSD that has >1000 classes in it, i.e. 1000 classes in one package (and another 1000 each in two sub-packages). When I look at the CPU utilization, clearly only one processor is being used, and building takes very long. So there definitely are cases where file I/O is not the bottleneck.
Comment 12 Philipe Mulet CLA 2006-11-20 05:58:59 EST
Also see bug 151053 for similar request on pure compiler front. 
Note: the builder encapsulates compiler tasks.

The JDT compiler accumulates cached information along the compilation process, so dividing compilation into smaller chunks will reduce cache reuse and could worsen performance quite a bit (this needs to be balanced against the gains from parallelization).

I was thinking an easy goal would be to delegate classfile writing to a different thread, so the compiler would spend less time waiting for some of the I/O. However, this wouldn't address all issues.
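
A minimal sketch of the hand-off described above, not taken from the JDT code base: the producing thread keeps going while a single background thread performs the file writes. `AsyncClassFileWriter`, `writeAll`, and `demo` are hypothetical names, and the byte arrays stand in for real compiler output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class AsyncClassFileWriter {
    /**
     * Hands every (file name -> bytes) entry to one background writer thread,
     * then waits for all writes. Returns the number of files submitted.
     */
    static int writeAll(Path outputDir, Map<String, byte[]> compiled) {
        ExecutorService writer = Executors.newSingleThreadExecutor();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (Map.Entry<String, byte[]> unit : compiled.entrySet()) {
                // A real compiler would produce the bytes here and move on
                // to the next unit while the write happens elsewhere.
                pending.add(writer.submit(() -> {
                    try {
                        Files.write(outputDir.resolve(unit.getKey()), unit.getValue());
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // surfaces any I/O failure from the writer thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("class file write failed", e);
        } finally {
            writer.shutdown();
        }
        return pending.size();
    }

    /** Writes two dummy files to a temp directory and counts the files found there. */
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("classes");
            writeAll(dir, Map.of("A.class", new byte[] {1, 2}, "B.class", new byte[] {3}));
            try (Stream<Path> files = Files.list(dir)) {
                return (int) files.count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

As noted above, this only hides I/O latency; the compilation itself stays single-threaded.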

Ideally, the compiler should be able to consume its work queue in parallel, but this would require making its internals thread-safe, which is substantial work and could degrade performance in the single-processor use case.
Comment 13 Stephen Kurlow CLA 2006-11-20 09:27:14 EST
Hello Philippe,

As discussed offline I am interested in being involved in attempting to resolve this issue. I have never been involved in developing a feature of Eclipse in the past. Where do you suggest I begin?
Comment 14 Laurent Mirguet CLA 2007-06-14 04:48:11 EDT
This bug is currently assigned to JDT because the Java build does indeed have a big impact on many users. But I also think that there should be some framework improvements at the platform level to better support parallel builds. This would be of great help for all plug-in contributors.

Should we also open an issue in the platform module?
Comment 15 Frederic Fusier CLA 2008-04-04 04:36:31 EDT
*** Bug 142126 has been marked as a duplicate of this bug. ***
Comment 16 Paul Webster CLA 2008-04-04 07:52:44 EDT
*** Bug 225646 has been marked as a duplicate of this bug. ***
Comment 17 Philipe Mulet CLA 2008-04-14 12:21:05 EDT
Marking this as a 4.0 item, since it revolves around the idea to change the build manager to become multi-threaded.

For 3.4, we will provide multi-threading inside the Java compiler, which means each project will build faster, but projects will still be iterated in sequence, and classfiles will be written in the auto-build thread (to preserve backward compatibility with 3.x).
Comment 18 Jerome Lanneluc CLA 2008-04-15 07:37:55 EDT
Lowering priority since this is now targeted at 4.0
Comment 19 Chris Lee CLA 2009-08-11 16:21:31 EDT
I'd like to see this sort of thing as well - our workspaces tend to have in the range of 100 different Java projects open at once, and changing something in one of the root projects causes compiling to take a very long time.

There's a product called Incredibuild that can be used with Visual Studio to distribute the compilation among multiple machines in a network - there should be an Eclipse plugin or something to support similar behaviour.
Comment 20 Sebastian Dietrich CLA 2010-11-01 12:33:33 EDT
Any news on this topic?
Comment 21 Olivier Thomann CLA 2010-11-01 12:38:11 EDT
No. There is no work done in this area at the moment. Do you want to participate ?
Comment 22 Jens Kübler CLA 2011-04-29 17:47:34 EDT
Created attachment 194413
Stub to parallelize project build

Here is a patch against 3.6.1 that handles parallel scheduling of project builds. It should be considered merely a starting point, as it requires Java 1.5 and does not always complete properly due to deadlocks in scheduling rules that I have not understood yet. Maybe someone could shed some light on what needs to be done in this area.
Comment 23 James Blackburn CLA 2011-04-30 04:33:07 EDT
(In reply to comment #22)
> Created attachment 194413
> Stub to parallelize project build

Interesting patch.  I think this patch belongs against Platform / Workspace - as there's nothing JDT specific here.  This bug description is more about parallelizing the Java builder specifically.

I think this approach can work. However there will be a little bit of work required to ensure that all existing builders continue to work correctly and safely.  The Build infrastructure has changed a bit in 3.7.  If you could start your patch from current HEAD it would be easier to comment and review.
Comment 24 Jens Kuebler CLA 2011-07-04 04:00:05 EDT
Created attachment 199030
Patch core resources against 3.7 HEAD

Does not handle Build Configurations and needs Java 1.5 compliance
Comment 25 Jens Kuebler CLA 2011-07-04 04:20:56 EDT
I updated the previous patch, with no major improvements, to compile against 3.7 HEAD. The build configurations API introduced in 3.7 is currently not considered.
To test this patch, I wrote another plugin with a single action that calls buildParallel.

When compiling a large workspace (800+ plugins), the compile starts off in parallel, compiles an indeterminate number of projects, and then deadlocks because one of the threads does not seem to release the workspace lock that the other threads require in order to proceed. My current guess is that an exception is thrown somewhere, so the lock release is never called in this thread, but further investigation is required to sort out the problem.
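
If the hypothesis above is right (an exception skips the release), the usual remedy is to release the lock in a finally block. A minimal sketch with a plain java.util.concurrent lock follows; `LockRelease`, `workspaceLock`, and `buildGuarded` are hypothetical names, the lock is only a stand-in for the workspace lock, and this is not code from the patch.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockRelease {
    // Stand-in for the shared lock the worker threads compete for.
    static final ReentrantLock workspaceLock = new ReentrantLock();

    /**
     * Runs one build step while holding the lock. The finally block releases
     * the lock on every path, including when the step throws.
     * Returns true if the step completed, false if it threw a RuntimeException.
     */
    static boolean buildGuarded(Runnable buildStep) {
        workspaceLock.lock();
        try {
            buildStep.run();
            return true;
        } catch (RuntimeException e) {
            return false; // failure is reported; the lock is still released below
        } finally {
            workspaceLock.unlock();
        }
    }

    public static void main(String[] args) {
        boolean ok = buildGuarded(() -> {
            throw new IllegalStateException("builder failed");
        });
        System.out.println(ok + " " + workspaceLock.isLocked());
    }
}
```

In Eclipse code the same shape applies to scheduling rules: a `beginRule` call on the job manager is paired with `endRule` in a finally block.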

Furthermore I'm not quite sure if the workspace lock the threads compete for will become a bottleneck. 
However when looking closely at our use case with 800+ rather small plugin projects that need to be compiled, I doubt that further parallelizing the >>JDT compiler<< (not the builder API !) would yield any performance improvements as compiling is executed so fast that resource allocation probably becomes the bottleneck.
Comment 26 Szabolcs Pota CLA 2013-07-08 11:33:47 EDT
Created attachment 233219
Modified parallel build patch against Eclipse 4.x HEAD
Comment 27 Szabolcs Pota CLA 2013-07-08 11:37:52 EDT
> Created attachment 233219
> Modified parallel build patch against Eclipse 4.x HEAD

Hi,

Motivated by relatively slow Scala compile times, we took Jens's patch submitted here and wanted to activate the parallel builder with it. Though the patch served as a really good basis, we found that a number of concurrency issues were not completely worked out in BuildManager, so it often just made Eclipse hang. For this reason I made a number of modifications to the original patch so that it now works reliably under Eclipse 3.7+. I have also added a new IncrementalProjectBuilder.PARALLEL_CLEAN_BUILD type to support parallel clean and build of selected projects.

The modified patch has also been ported to Eclipse 4.x HEAD (this is attached).

We have been actively using this patch for a few months now on large Scala projects, where it gives us a significant performance improvement. On a 6-core machine (with 5 worker tasks) I managed to reduce the build time by more than 50%, depending on the level of parallelism in the dependency graph. Note that I also needed to submit a ScalaIDE patch so that it can support this parallel builder.
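
One way to read "depending on the level of parallelism in the dependency graph": with project-level scheduling, a build cannot finish faster than its longest dependency chain, nor faster than the total work divided by the number of workers. The sketch below computes the resulting upper bound on speedup for an acyclic graph. The class name, the cost units, and the three-project example are made up for illustration; they are not measurements from this bug.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpeedupBound {
    /** Cost of the longest dependency chain that ends at the given project. */
    static long chain(String project, Map<String, List<String>> deps,
                      Map<String, Long> cost, Map<String, Long> memo) {
        Long cached = memo.get(project);
        if (cached != null) {
            return cached;
        }
        long longestDep = 0;
        for (String dep : deps.getOrDefault(project, List.of())) {
            longestDep = Math.max(longestDep, chain(dep, deps, cost, memo));
        }
        long result = longestDep + cost.get(project);
        memo.put(project, result);
        return result;
    }

    /** Upper bound on speedup over a sequential build when using the given worker count. */
    static double maxSpeedup(Map<String, List<String>> deps, Map<String, Long> cost, int workers) {
        long total = 0;
        long critical = 0;
        Map<String, Long> memo = new HashMap<>();
        for (String project : cost.keySet()) {
            total += cost.get(project);
            critical = Math.max(critical, chain(project, deps, cost, memo));
        }
        // The build takes at least as long as the slower of the two limits.
        double bestTime = Math.max(critical, (double) total / workers);
        return total / bestTime;
    }

    public static void main(String[] args) {
        // A and C both depend on B; every project costs 10 units.
        Map<String, List<String>> deps = Map.of("A", List.of("B"), "C", List.of("B"));
        Map<String, Long> cost = Map.of("A", 10L, "B", 10L, "C", 10L);
        System.out.println(maxSpeedup(deps, cost, 5));
    }
}
```

For the example graph the chain B then A (or C) costs 20 of the 30 total units, so extra workers beyond two cannot push the bound above 1.5.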

So that other Eclipse users could benefit from using the parallel builder, Morgan Stanley would like to feed this back to the Eclipse community. We hope that it can soon appear in official Eclipse releases as maintaining this patch ourselves is quite cumbersome. All the changes are backwards compatible. 

There are still some issues that need to be worked out though:

 1) the patch is still just an API change on the internal builder framework. In its current format one needs a custom plugin to invoke it, usually with code like this:

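---
// Assumes this patch is applied: PARALLEL_FULL_BUILD is not part of stock Eclipse.
IProject[] projects = ...
// BuildConfiguration is presumably the internal implementation of IBuildConfiguration.
IBuildConfiguration[] buildConfigs = new IBuildConfiguration[projects.length];
for (int i = 0; i < buildConfigs.length; i++) {
  buildConfigs[i] = new BuildConfiguration(projects[i]);
}
// Third argument (buildReferences) = true: referenced configurations are built as well.
ResourcesPlugin.getWorkspace().build(buildConfigs, IncrementalProjectBuilder.PARALLEL_FULL_BUILD, true, monitor);
---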

 Some UI components could be added to JDT to support this out of the box, e.g. a check box on the Clean Projects dialog when one cleans all projects, or a new pop-up menu item when one right-clicks on a Working Set in the Project Explorer.

 2) Parallel build only works on full builds. It would be nice to make it work on incremental builds too.

 3) In this patch I have implemented a simple progress bar that displays aggregate information about the projects that are being built in parallel. It displays something like: "Building projects in parallel: [projectA, projectB, projectF]". The project names in the brackets change as worker threads start/finish compiling projects. The progress bar shows the overall progress. Unfortunately, with this approach the individual files being built are no longer shown.

This should be changed to something like a multi-level progress bar: the bar described above could be the parent, and child progress bars could show the build progress of individual projects. Just an idea of course; I am not sure how a parallel build could be displayed best.

Could we get some response on whether this is something that could be included in future Eclipse releases?

Regards,

Szabolcs

--

THE FOLLOWING DISCLAIMER APPLIES TO ALL SOFTWARE CODE AND OTHER MATERIALS CONTRIBUTED IN CONNECTION WITH THIS PROGRAM: 
THIS SOFTWARE IS LICENSED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OF NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THIS SOFTWARE MAY BE REDISTRIBUTED TO OTHERS ONLY BY EFFECTIVELY USING THIS OR ANOTHER EQUIVALENT DISCLAIMER IN ADDITION TO ANY OTHER REQUIRED LICENSE TERMS.
 
Copyright © 2013 Morgan Stanley
All rights reserved.
 
THIS PROGRAM IS SUBJECT TO THE TERMS OF THE ECLIPSE PUBLIC LICENSE – v 1.0, A COPY OF WHICH IS AVAILABLE FROM THE ECLIPSE FOUNDATION.
Comment 28 Jay Arthanareeswaran CLA 2013-07-09 01:56:43 EDT
I have a couple of questions:

1. I haven't looked at the patch well enough, but can you tell us what happens in case of dependencies between files being compiled across threads?

2. The patch doesn't have anything related to jdt/core. Perhaps we should move this to platform?
Comment 29 Dani Megert CLA 2013-07-09 02:42:36 EDT
(In reply to comment #28)
> I have a couple of questions:
> 
> 1. I haven't looked at the patch well enough, but can you tell us what
> happens in case of dependencies between files being compiled across threads?
> 
> 2. The patch doesn't have anything related to jdt/core. Perhaps we should
> move this to platform?

Note that the patch needs to be submitted under the EPL, not some other/additional license. You need to have a CLA and write in this bug report that you contribute under the CLA. For more details see
http://wiki.eclipse.org/Development_Resources/Handling_Git_Contributions#Bugzilla
Comment 30 Szabolcs Pota CLA 2013-07-09 05:13:19 EDT
> Note that the patch needs to be submitted under the EPL, not some
> other/additional license. You need to have a CLA and write in this bug
> report that you contribute under the CLA. For more details see
> http://wiki.eclipse.org/Development_Resources/Handling_Git_Contributions#Bugzilla

I have the approval from Legal to submit this patch under EPL. This is indicated in the header of each file. Our disclaimer just says that the patch is provided "as is" without any warranties and otherwise subject to terms of EPL. I have also signed the CLA as suggested above.

Are you ok with this?
Comment 31 Dani Megert CLA 2013-07-09 05:18:03 EDT
(In reply to comment #30)
> > Note that the patch needs to be submitted under the EPL, not some
> > other/additional license. You need to have a CLA and write in this bug
> > report that you contribute under the CLA. For more details see
> > http://wiki.eclipse.org/Development_Resources/Handling_Git_Contributions#Bugzilla
> 
> I have the approval from Legal to submit this patch under EPL. This is
> indicated in the header of each file. Our disclaimer just says that the
> patch is provided "as is" without any warranties and otherwise subject to
> terms of EPL. I have also signed the CLA as suggested above.
> 
> Are you ok with this?

Yup.
Comment 32 Szabolcs Pota CLA 2013-07-09 05:30:05 EDT
> I have a couple of questions:
> 
> 1. I haven't looked at the patch well enough, but can you tell us what
> happens in case of dependencies between files being compiled across threads?

It is not clear to me what you mean by "dependencies between files". Could you clarify that?

First the patch builds up a dependency graph with the projects as the graph nodes. It starts from root projects that have no dependencies (parents). Then the graph is traversed by removing compiled projects one by one. A project (node) becomes eligible for building when all of its dependencies (parent nodes) have been built and eliminated from the graph. Eligible nodes are continuously enqueued to a global queue that is served by a pool of worker threads. The process finishes when the graph has no more nodes and the queue is empty.

Each worker thread invokes the Builder of the current project node. It is essential that the Builder locks only its own project, not the entire workspace. This is why I needed to submit a patch to ScalaIDE: the Scala builder locked the complete workspace all the time.
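
The traversal described above can be sketched roughly as follows. This is an illustration of the general technique (a ready queue fed by dependency counts and served by a thread pool), not the code from the patch. `ParallelProjectScheduler`, `buildAll`, and `build` are hypothetical names; the sketch assumes the graph is acyclic and that every project, including roots, appears as a key in `deps`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelProjectScheduler {
    /** Placeholder for invoking the real builder of a project. */
    static String build(String project) {
        return project;
    }

    /**
     * deps maps each project to the projects it depends on (empty list for roots).
     * Returns the projects in the order their builds completed.
     */
    static List<String> buildAll(Map<String, List<String>> deps, int workers) {
        Map<String, Integer> unbuiltDeps = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unbuiltDeps.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);
        List<String> order = new ArrayList<>();
        try {
            int running = 0;
            for (String project : deps.keySet()) {
                if (unbuiltDeps.get(project) == 0) { // root: eligible immediately
                    done.submit(() -> build(project));
                    running++;
                }
            }
            while (running > 0) {
                String finished = done.take().get(); // blocks until any worker finishes
                running--;
                order.add(finished);
                for (String dependent : dependents.getOrDefault(finished, List.of())) {
                    // Decrement the count; at zero all dependencies are built.
                    if (unbuiltDeps.merge(dependent, -1, Integer::sum) == 0) {
                        done.submit(() -> build(dependent));
                        running++;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while building", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("project build failed", e);
        } finally {
            pool.shutdown();
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "B", List.of(), "A", List.of("B"), "C", List.of("B"), "D", List.of("A", "C"));
        System.out.println(buildAll(deps, 2));
    }
}
```

Only the coordinating thread touches the bookkeeping maps, so they need no synchronization; the workers do nothing but run `build`. In the example, B always completes first and D last, while A and C may finish in either order.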

> 
> 2. The patch doesn't have anything related to jdt/core. Perhaps we should
> move this to platform?

I have submitted the modified patch here as the original patch was submitted here too. The patch only affects the eclipse.platform.resources plugin so you could be right that probably platform is a better place.
Comment 33 Jay Arthanareeswaran CLA 2013-07-09 05:43:29 EDT
(In reply to comment #32)
> It is not clear to me what you mean by "dependencies between files".
> Could you clarify that?

Actually, never mind. I must admit I wasn't sure what to expect from the patch as I started with the assumption that the fix would address the JDT/Core compiler itself. Thanks for the explanation, though!

> I have submitted the modified patch here as the original patch was submitted
> here too. The patch only affects the eclipse.platform.resources plugin so
> you could be right that probably platform is a better place.

Moving to the correct component, so the rightful owners can look at the patch.
Comment 34 Stephan Herrmann CLA 2013-07-09 05:57:50 EDT
(In reply to comment #32)
> First the patch builds up a dependency graph with the projects being the
> graph nodes.

What about folder-level dependencies: Project A uses an output folder of
project B as a library. While project A won't trigger building those shared
classes, could there still be a race condition that goes under the radar
of your dependency graph?
Comment 35 Jens Kuebler CLA 2013-07-09 07:40:31 EDT
Could you indicate how folder dependencies can be expressed in the build configuration? I suppose you mean that there is an implicit dependency, in which case the single-threaded build cannot resolve it either. Or am I missing something?

btw: Thanks for taking the patch further down the road.
Comment 36 Piotr Tomiak CLA 2014-01-22 05:11:29 EST
I've just stumbled upon this bug and at first thought that there was no parallelization of the Java build process. There is actually bug #142126, which has been fixed for a long time. I think the title of this bug should be changed to "Parallelize Eclipse build process", as the current title might be a bit confusing.
Comment 37 Markus Duft CLA 2014-12-22 11:07:47 EST
We'd have a similar use case. We're using the IProject.build() API directly to drive our build using custom build orders, generators, etc. (don't ask ;)). Today I tried to get some more speed out of the build by trying to call .build() on multiple projects in parallel. While the actual algorithm to find out what goes together and what not is in place (based on Require-Bundle), the building in parallel brings no benefit at all (with current Eclipse 4.4.1).

I found this bug and, after reading it, profiled my Eclipse (using Java Mission Control). It seems that it hangs A LOT in synchronization - acquiring the workspace root scheduling rule.

Is there any chance to get the java builder to not lock the workspace? It seems that would be the only blocker for me right now. I (naively) tried to just change the IncrementalProjectBuilder.getRule method to return getProject() always, but that broke my build completely - lol. So it is obviously not that easy. Any hints?
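
For readers unfamiliar with scheduling rules: the default `IncrementalProjectBuilder.getRule` returns the workspace root, which is consistent with the contention described above. The sketch below is a simplified model of why that serializes builds, assuming resource rules conflict whenever one resource path is a prefix of the other. `RuleConflictModel` and `PathRule` are hypothetical stand-ins, not Eclipse classes, and the sketch does not explain why switching to the project rule broke the build.

```java
public class RuleConflictModel {
    /** Simplified stand-in for a resource-based scheduling rule, identified by a workspace path. */
    static final class PathRule {
        final String path;

        PathRule(String path) {
            this.path = path;
        }

        /** Two rules conflict when one path is a prefix of the other. */
        boolean isConflicting(PathRule other) {
            return isPrefix(path, other.path) || isPrefix(other.path, path);
        }

        private static boolean isPrefix(String prefix, String full) {
            if (prefix.equals("/")) {
                return true; // the workspace root is a prefix of every path
            }
            return full.equals(prefix) || full.startsWith(prefix + "/");
        }
    }

    public static void main(String[] args) {
        PathRule root = new PathRule("/");
        PathRule projectA = new PathRule("/A");
        PathRule projectC = new PathRule("/C");
        // Root rule versus any project rule: conflict, so the two cannot run together.
        System.out.println(root.isConflicting(projectA));
        // Two sibling project rules: no conflict, so they could run concurrently.
        System.out.println(projectA.isConflicting(projectC));
    }
}
```

Under this model, two builders that each hold the root rule always conflict, whereas builders holding the rules of sibling projects do not.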