Bug 258968 - ValidationOperation straining Eclipse Jobs infrastructure
Summary: ValidationOperation straining Eclipse Jobs infrastructure
Status: CLOSED FIXED
Alias: None
Product: WTP Common Tools
Classification: WebTools
Component: wst.validation
Version: 3.0.3
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.0.4
Assignee: Gary Karasiuk CLA
QA Contact: Chuck Bridgham CLA
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2008-12-16 11:43 EST by Randall Theobald CLA
Modified: 2009-01-14 08:22 EST
CC List: 4 users

See Also:
rstheo: review?


Attachments
Proposed patch (3.61 KB, patch)
2008-12-16 11:43 EST, Randall Theobald CLA
karasiuk: iplog+
rstheo: review?
patch with some minor changes (3.53 KB, patch)
2008-12-16 16:01 EST, Gary Karasiuk CLA
no flags

Description Randall Theobald CLA 2008-12-16 11:43:24 EST
Created attachment 120606
Proposed patch

plugin version: 1.2.3.v200811101716

I am a performance analyst for an adopting product. During analysis of our recent release (based on Eclipse 3.4.1), we observed heavy contention on the Eclipse Jobs infrastructure locks during builds of a medium-sized workload. Analysis was difficult because the results were intermittent, but I found a fix that reduces the contention considerably. While there are probably improvements to be made in the Eclipse Jobs infrastructure itself, clients should also be smart and not assume that using the infrastructure is free.

The class 

   org.eclipse.wst.validation.internal.operations.ValidationOperation

is very demanding on the Eclipse Jobs infrastructure. I found two issues.

(1) For my medium-sized workload, ValidationOperation schedules hundreds of ValidationOperation.ValidationLauncherJob instances early in the build cycle. These jobs have the WorkspaceRoot rule, so they are blocked until the build is done, and when each one runs, all it does is schedule its ValidatorJob. Keeping all of them in the Eclipse Jobs queues during the build is costly, because a lock must be obtained every time those queues are traversed. A simple solution is to use a single ValidationLauncherJob that holds a queue of jobs to schedule once it runs.

(2) For my medium-sized workload, at the end of the build, hundreds of ValidationLauncherJobs and ValidatorJobs are scheduled and run in parallel. Not only does this strain the Jobs infrastructure, but it also seems to expose timing issues within the validators and/or the models loaded during validation (see bugs 254923, 254934, 254924, 254920, 254916). A simple fix is to schedule only a fixed number of ValidationJobs per processor at a time. This reduces lock contention and drastically reduces the chance of hitting the random errors reported in those bugs.
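A minimal sketch of the single-launcher idea from issue (1), written against plain java.util.concurrent rather than the Eclipse Jobs API. The class and member names are hypothetical, the two Consumer sinks merely stand in for "schedule the launcher with the workspace-root rule" and "schedule a ValidatorJob", and this is not the code from the attached patch. The point it shows: however many validator jobs are submitted during the build, at most one launcher waits in the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical sketch: one launcher drains a queue instead of one launcher per job.
public class SingleValidationLauncher {
    // Validator jobs waiting for the launcher to run.
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
    // True while a launcher has been queued but has not started draining.
    private final AtomicBoolean launcherQueued = new AtomicBoolean(false);
    // Stand-in (assumption) for scheduling the launcher with the workspace-root rule.
    private final Consumer<Runnable> launcherSink;
    // Stand-in (assumption) for scheduling a ValidatorJob.
    private final Consumer<Runnable> validatorSink;

    SingleValidationLauncher(Consumer<Runnable> launcherSink, Consumer<Runnable> validatorSink) {
        this.launcherSink = launcherSink;
        this.validatorSink = validatorSink;
    }

    /** Queues a validator job and schedules a launcher only if none is already waiting. */
    void submit(Runnable validatorJob) {
        pending.add(validatorJob);
        if (launcherQueued.compareAndSet(false, true)) {
            launcherSink.accept(this::drain);
        }
    }

    /** Launcher body: hands every queued validator job to the validator sink. */
    private void drain() {
        // Reset the flag before draining so that a job submitted mid-drain is
        // either picked up by this loop or triggers a fresh launcher; none is lost.
        launcherQueued.set(false);
        Runnable job;
        while ((job = pending.poll()) != null) {
            validatorSink.accept(job);
        }
    }

    public static void main(String[] args) {
        List<Runnable> launchers = new ArrayList<>();
        List<Runnable> validators = new ArrayList<>();
        SingleValidationLauncher launcher =
                new SingleValidationLauncher(launchers::add, validators::add);
        for (int i = 0; i < 300; i++) {
            launcher.submit(() -> { });
        }
        // Prints "1 launcher(s) queued during the build"
        System.out.println(launchers.size() + " launcher(s) queued during the build");
        launchers.get(0).run(); // the build is over; the single launcher runs
        // Prints "300 validator jobs scheduled"
        System.out.println(validators.size() + " validator jobs scheduled");
    }
}
```

The compare-and-set flag is one simple way to guarantee "at most one waiting launcher" without holding a lock while the queue is drained.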

Using the provided patch, not only did my average subsequent build time for my medium-sized workload improve by 7.4% (8.3 sec), but my measurements were MUCH more consistent. I encountered only 1 of the errors mentioned above in 63 builds, compared with 41 errors in 63 builds without the patch.

The patch uses a property to determine how many jobs per processor may be scheduled to run concurrently. I guessed at a default of 3 and ran measurements for various values. Values of 1-10 performed about the same. Higher values seemed to lead to slightly higher CPU usage without necessarily finishing the build any sooner (i.e., the extra CPU went to contention). Values of 100 and 9999 led to much slower average times with higher CPU usage.
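A minimal sketch of the per-processor throttle from issue (2), driven by a system property with a default of 3 as described above. The property name is a placeholder (the report does not say what the patch calls it), the class is hypothetical, and a java.util.concurrent thread pool stands in for the Eclipse Jobs infrastructure. A Semaphore with `jobsPerProcessor * processors` permits blocks the submitter until a slot is free, so no more than that many jobs are ever handed off at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: limit concurrently scheduled validator jobs per processor.
public class ThrottledValidatorScheduler {
    // Placeholder property name (assumption): not the name used by the real patch.
    static final String JOBS_PER_PROCESSOR_PROPERTY = "example.validation.jobsPerProcessor";
    static final int DEFAULT_JOBS_PER_PROCESSOR = 3;

    /** Reads the per-processor limit; unset or non-numeric values fall back to 3. */
    static int jobsPerProcessor() {
        return Math.max(1, Integer.getInteger(JOBS_PER_PROCESSOR_PROPERTY, DEFAULT_JOBS_PER_PROCESSOR));
    }

    final int limit;
    private final Semaphore slots;
    private final ExecutorService pool;
    private final AtomicInteger running = new AtomicInteger();
    final AtomicInteger peak = new AtomicInteger();
    final AtomicInteger completed = new AtomicInteger();

    ThrottledValidatorScheduler(int processors) {
        limit = jobsPerProcessor() * Math.max(1, processors);
        slots = new Semaphore(limit);
        pool = Executors.newFixedThreadPool(limit);
    }

    /** Blocks the caller until one of the `limit` slots is free, then hands the job off. */
    void schedule(Runnable validatorJob) throws InterruptedException {
        slots.acquire();
        try {
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    validatorJob.run();
                    completed.incrementAndGet();
                } finally {
                    running.decrementAndGet();
                    slots.release();
                }
            });
        } catch (RuntimeException e) {
            slots.release(); // the pool rejected the job, so give its slot back
            throw e;
        }
    }

    void shutdownAndWait() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottledValidatorScheduler scheduler =
                new ThrottledValidatorScheduler(Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < 500; i++) {
            scheduler.schedule(() -> {
                try {
                    Thread.sleep(1); // pretend to validate something
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        scheduler.shutdownAndWait();
        System.out.println("limit=" + scheduler.limit + " peak=" + scheduler.peak.get()
                + " completed=" + scheduler.completed.get());
    }
}
```

Gating the submission, rather than blocking inside the workers, keeps the waiting jobs out of the pool's queue entirely, which mirrors the goal of keeping hundreds of jobs out of the Jobs queues. Running with `-Dexample.validation.jobsPerProcessor=5` would raise the limit; per the measurements above, values of 1-10 behaved similarly.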
Comment 1 Gary Karasiuk CLA 2008-12-16 16:01:09 EST
Created attachment 120639
patch with some minor changes
Comment 2 Gary Karasiuk CLA 2008-12-16 16:04:00 EST
I have released this to 3.0.4.

In testing with a large workspace (the shipping3 workspace), I saw my build time drop from 5.8 minutes to 3.6 minutes. Also, without the patch the number of threads climbed to over 400; with the patch it peaked at 72.
Comment 3 Gary Karasiuk CLA 2008-12-16 16:12:15 EST
FYI - This patch was for the old validation framework, so it affects only the validators that have not been ported to the new framework. The new framework already throttles the number of validator jobs.

I am going to do some experimenting to see if I can use the same technique that was used in the patch to increase the number of jobs (by a couple) that the new framework uses. 
Comment 4 Randall Theobald CLA 2008-12-16 16:14:02 EST
Thanks Gary, your workload proved a better advocate than mine.
Comment 5 Gary Karasiuk CLA 2009-01-05 06:26:30 EST
Patch released to 3.1
Comment 6 Gary Karasiuk CLA 2009-01-14 05:22:31 EST
I am in the process of cleaning up the Validation Framework defects. Could you please close this Bugzilla? If I don't hear back within 7 days, I will assume that everything is OK and will close it.
Comment 7 Randall Theobald CLA 2009-01-14 08:22:18 EST
Closed.