Created attachment 120606 [details] Proposed patch

Plugin version: 1.2.3.v200811101716

I am a performance analyst for an adopting product. During analysis of our recent release (based on Eclipse 3.4.1), we observed a lot of stress on the Eclipse Jobs infrastructure locks during builds of a medium-sized workload. Analysis was quite difficult because the results were intermittent, but I found a fix that brings down the contention considerably. While there are probably improvements to be made in the Eclipse Jobs infrastructure itself, clients should also be smart and not assume that using the infrastructure is free.

The class org.eclipse.wst.validation.internal.operations.ValidationOperation is very demanding on the Eclipse Jobs infrastructure. I found two issues.

(1) For my medium-sized workload, ValidationOperation ends up scheduling hundreds of ValidationOperation.ValidationLauncherJob instances early in the build cycle. These jobs have the WorkspaceRoot rule, meaning they are blocked until the build is done, and when each one finally runs, all it does is schedule the given ValidatorJob. Keeping all of them in the Eclipse Jobs queues during the build is costly, since locks must always be obtained to traverse those queues. A simple solution is to use only one ValidationLauncherJob that holds a queue of jobs to schedule once it runs.

(2) For my medium-sized workload, at the end of the build, hundreds of ValidationLauncherJobs and ValidatorJobs are scheduled and run in parallel. Not only does this strain the Jobs infrastructure, but it also seems to expose timing issues within the validators and/or the models loaded during validation (see bugs 254923, 254934, 254924, 254920, 254916). A simple fix is to schedule only a certain number of validation jobs per processor at a time. This reduces lock contention and drastically reduces the chances of hitting the random errors mentioned in the bugs listed above.
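To illustrate the idea in (1), here is a minimal sketch in plain Java, not the attached patch. It assumes a hypothetical BatchedLauncher class that stands in for the single ValidationLauncherJob: callers hand it work, it counts how many times a launcher would have to be submitted to the job queue, and runLauncher() plays the role of the launcher running after the build. None of these names come from the WTP code.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for one launcher job that holds a queue of pending work. */
class BatchedLauncher {
    // Work waiting for the launcher (stands in for the ValidatorJobs to schedule).
    private final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
    // True while a launcher is already waiting in the job queue.
    private final AtomicBoolean launcherQueued = new AtomicBoolean(false);
    // Counts how often a launcher had to be submitted (illustration only).
    private final AtomicInteger launcherSubmissions = new AtomicInteger();

    /** Records the work and submits a launcher only if none is waiting yet. */
    public void request(Runnable validatorWork) {
        pending.add(validatorWork);
        if (launcherQueued.compareAndSet(false, true)) {
            // In a real client, this is where the single launcher job would be scheduled.
            launcherSubmissions.incrementAndGet();
        }
    }

    /** Stands in for the launcher running after the build: drains the whole queue. */
    public int runLauncher() {
        // Clear the flag first so requests arriving during the drain submit a new launcher.
        launcherQueued.set(false);
        int started = 0;
        Runnable work;
        while ((work = pending.poll()) != null) {
            work.run();
            started++;
        }
        return started;
    }

    public int launcherSubmissions() {
        return launcherSubmissions.get();
    }
}
```

With this shape, hundreds of requests made during a build leave only one entry waiting in the job queue instead of one per request.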
Using the provided patch, not only did my average subsequent build time for my medium-sized workload improve by 7.4% (8.3 sec), but my measurements were also MUCH more consistent, and I encountered only 1 of the errors mentioned above in 63 builds, compared with 41 errors in 63 builds without the patch. The patch uses a property to determine the number of jobs per processor that are scheduled to run concurrently. I guessed at a default of 3 and ran measurements for various values. Values of 1-10 gave fairly similar results. A higher number seemed to lead to slightly higher CPU usage but did not necessarily finish the build any quicker, which suggests contention. Values of 100 and 9999 led to much slower average times with higher CPU usage.
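For readers who want to see the throttling idea in isolation, here is a rough sketch in plain Java using java.util.concurrent rather than the Eclipse Jobs API. It is not the patch itself. The class name ThrottledRunner and the property name example.validation.jobsPerProcessor are made up for illustration; only the default of 3 jobs per processor comes from the description above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: cap concurrent jobs at (jobs per processor) x (processors). */
class ThrottledRunner {
    // Assumed property name, for illustration only; the real patch may use another.
    static final String PROP = "example.validation.jobsPerProcessor";

    /** Computes the concurrency cap, defaulting to 3 jobs per processor. */
    public static int concurrencyLimit(int processors) {
        int perProcessor = Integer.getInteger(PROP, 3);
        return Math.max(1, perProcessor) * Math.max(1, processors);
    }

    /** Runs taskCount dummy tasks, at most limit at once; returns the observed peak. */
    public static int runAll(int taskCount, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < taskCount; i++) {
                permits.acquire(); // hold back scheduling until a slot is free
                pool.execute(() -> {
                    try {
                        int now = active.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(1); // placeholder for real validation work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                        permits.release(); // free the slot only after the task is done
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }
}
```

A caller could pass concurrencyLimit(Runtime.getRuntime().availableProcessors()) as the limit. Setting the property to a very large value effectively disables the throttle, which corresponds to the 9999 measurement above.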
Created attachment 120639 [details] patch with some minor changes
I have released this to 3.0.4. In testing with a large workspace (the shipping3 workspace), I saw my build time drop from 5.8 minutes to 3.6 minutes. Also, without the patch the number of threads climbed to over 400, but with the patch it only went to 72.
FYI - This patch was for the old validation framework, so it only affects the validators that have not been ported to the new framework. The new framework already throttles back the number of validator jobs. I am going to do some experimenting to see if I can use the same technique that was used in the patch to increase the number of jobs (by a couple) that the new framework uses.
Thanks Gary, your workload proved a better advocate than mine.
Patch released to 3.1
I am in the process of cleaning up the Validation Framework defects. Could you please close this Bugzilla? If I don't hear back within 7 days, I will assume that everything is OK and will close it.
Closed.