Build: Eclipse 3.2.1, WTP 1.5.1.

Background: Auto-build is a "polite" job. If it sees that another job with a resource scheduling rule is waiting to run, it aborts the build process to allow that job to run. The reasoning is that there is no point building if a job is waiting to modify the workspace, since the change could render the build effort wasted. In WTP 1.5, the WST validation builder was changed to fork background jobs for certain types of validators (bug 91563).

Now the fun begins. Here is the process flow I have been observing while debugging an unrelated problem:

1) A file in the workspace is modified by the user
2) Autobuild starts running
3) The validation builder forks a background job
4) Autobuild notices that a job is waiting, and aborts the autobuild
5) The validation job runs
6) Autobuild starts again

Thus the validation job and the autobuild job are thrashing against each other. Even worse, if any thread modifies the workspace between 4) and 6), the validation builder will run again. In theory at least, this could cause indefinite churn as the auto-build and background validation kick each other into action.

Note: I have *not* observed this indefinite churn myself, but even in a very trivial workspace with one project I have seen the autobuild interrupted several times before completing.
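To make the thrash concrete, the back-off in steps 2) through 6) can be mimicked with a toy scheduler. This is a deliberately simplified sketch: every name here (the queue, the counters, the dirty flag) is made up for illustration and is not the Eclipse job API.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the thrash described above: a "polite" build aborts
// whenever a forked validation job is waiting, and the validation job
// re-dirties the workspace (here, twice) before things settle down.
public class AutoBuildThrash {
    public static void main(String[] args) {
        Queue<String> waitingJobs = new ArrayDeque<>();
        int interruptedBuilds = 0;
        int validationRuns = 2;          // pretend validation touches the workspace twice
        boolean workspaceDirty = true;   // step 1: the user modified a file

        while (workspaceDirty) {
            workspaceDirty = false;          // step 2: autobuild starts
            waitingJobs.add("validation");   // step 3: builder forks a background job
            if (!waitingJobs.isEmpty()) {    // step 4: polite check - someone is waiting
                interruptedBuilds++;         //         so abort the build
                waitingJobs.poll();          // step 5: validation job runs...
                if (validationRuns-- > 0) {
                    workspaceDirty = true;   // step 6: ...and re-triggers autobuild
                }
            }
        }
        System.out.println("builds interrupted: " + interruptedBuilds);
    }
}
```

Running it shows the build being interrupted on every round until the simulated validation stops dirtying the workspace.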
Here is a stack trace of a validation job being scheduled from within a build (thus causing the build to abort):

Thread [Worker-2] (Suspended (breakpoint at line 349 in InternalJob))
	ValidatorJob(InternalJob).schedule(long) line: 349
	ValidatorJob(Job).schedule() line: 421
	EnabledIncrementalValidatorsOperation(ValidationOperation).launchValidatorJob(WorkbenchReporter, IValidatorJob, ValidatorMetaData, IWorkbenchContext, IFileDelta[]) line: 1762
	EnabledIncrementalValidatorsOperation(ValidationOperation).launchJobs(HashSet, WorkbenchReporter) line: 1629
	EnabledIncrementalValidatorsOperation(ValidationOperation).validate(WorkbenchReporter) line: 867
	EnabledIncrementalValidatorsOperation(ValidationOperation).run(IProgressMonitor) line: 642
	ValidationBuilder.build(int, Map, IProgressMonitor) line: 210
	BuildManager$2.run() line: 603
	SafeRunner.run(ISafeRunnable) line: 37
	BuildManager.basicBuild(int, IncrementalProjectBuilder, Map, MultiStatus, IProgressMonitor) line: 167
	BuildManager.basicBuild(IProject, int, ICommand[], MultiStatus, IProgressMonitor) line: 201
	BuildManager$1.run() line: 230
	SafeRunner.run(ISafeRunnable) line: 37
	BuildManager.basicBuild(IProject, int, MultiStatus, IProgressMonitor) line: 233
	BuildManager.basicBuildLoop(IProject[], IProject[], int, MultiStatus, IProgressMonitor) line: 252
	BuildManager.build(int, IProgressMonitor) line: 285
	AutoBuildJob.doBuild(IProgressMonitor) line: 149
	AutoBuildJob.run(IProgressMonitor) line: 216
	Worker.run() line: 58
This certainly sounds important! So, offhand, it seems one approach to a better design is that the validation framework (or each validator?) should be wise if not polite :) and if it sees a build in progress, it should not even schedule a new validation job ... I guess it could schedule a "validate later" job that would be scheduled for ... oh, 200 msecs? ... in the future. That job's responsibility would just be to check again whether it is time to validate. So, I'm pretty sure there is, but we should see if there is a "build job" (or family) that's API? I guess to be complete, the ValidationManager should also be a job listener, and if it "sees" a build job start, it should cancel any validation jobs it started.
> So, I'm pretty sure there is, but we should see if there is a "build job" (or
> family) that's API?

The API for this would be Platform.getJobManager().find(ResourcesPlugin.FAMILY_AUTO_BUILD);

> I guess to be complete, the ValidationManager should also be a job listener,
> and if it "sees" a build job start, then it should cancel any validation jobs it
> started.

I don't think this is necessary because the validation job owns a resource rule, and the build job cannot start while that rule is owned.

It seems with the current design, if there are 100 projects in my workspace that contain validators, then there will be 100 of these validation jobs queued up by the build process (unless I'm missing something). Another option would be to have the ValidationBuilder just contribute to a queue of pending ValidationOperations. Then, an IResourceChangeEvent.POST_BUILD event could schedule a single job to execute this queued set of validations. Certainly David's suggestion would be simpler as an interim fix, but in the 2.0 timeframe you may also want to consider further streamlining to avoid creating many jobs.
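For readers following along, the family lookup mentioned above might be used roughly like this. This is a sketch against the Eclipse 3.2-era job API; the surrounding control flow is an assumption, not code from the patch:

```java
// Sketch: check for, or wait on, jobs in the autobuild family.
Job[] buildJobs = Platform.getJobManager().find(ResourcesPlugin.FAMILY_AUTO_BUILD);
if (buildJobs.length > 0) {
    try {
        // Block the current (background) thread until autobuild finishes.
        Platform.getJobManager().join(ResourcesPlugin.FAMILY_AUTO_BUILD, null);
    } catch (InterruptedException e) {
        // ignored for the sketch
    }
}
```

Note that join() must not be called from a thread holding locks the build needs, or it will deadlock; that caveat applies to any real use of this pattern.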
I *have* seen infinite builds happening many times, when using a huge J2EE workspace on WTP 1.5.1, following exactly this pattern. This behavior has disruptive side-effects, taking away responsiveness from the entire machine, and is a critical defect for WTP. I have sometimes been able to break the cycle, but you have to click really fast for a long time in the progress view to cancel the jobs that do validation/building. Normal users would be lost. I vote a +1 for WTP 1.5.2 investigation
Slightly better than polling every 200ms would be to create and schedule a proxy job that schedules the actual validation job when autobuild is complete. Here is a generic job that takes another job, and schedules it when autobuild is not running:

public class AfterBuildJob extends Job {
	private static ISchedulingRule afterBuildRule = new ISchedulingRule() {
		public boolean contains(ISchedulingRule rule) {
			return rule == this;
		}
		public boolean isConflicting(ISchedulingRule rule) {
			return rule == this;
		}
	};
	private Job job;

	public AfterBuildJob(Job job) {
		super("Waiting for build");
		setSystem(true);
		setRule(afterBuildRule);
		this.job = job;
	}

	protected IStatus run(IProgressMonitor monitor) {
		waitForBuild();
		job.schedule();
		return Status.OK_STATUS;
	}

	protected void waitForBuild() {
		try {
			Platform.getJobManager().join(ResourcesPlugin.FAMILY_AUTO_BUILD, null);
		} catch (OperationCanceledException e) {
		} catch (InterruptedException e) {
		}
	}
}

The afterBuildRule is needed to make sure you don't have a hundred of these jobs running at once. It's still less than optimal because you end up with one job instance per validation operation.
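A hypothetical call site, to show where the proxy would slot in: instead of scheduling the validation job directly from the builder, it would be wrapped. createValidationJob() is an assumed factory, not real framework API:

```java
// Hypothetical usage of the AfterBuildJob proxy sketched above.
Job validationJob = createValidationJob();   // assumed factory, stands in for the real validator job
new AfterBuildJob(validationJob).schedule(); // proxy waits for autobuild, then schedules validation
```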
Hi John, thanks for these suggestions, but I'm still a little fuzzy on the rules here. The validators are started from a builder. I didn't realize we were not following the rules here... What I am understanding is that builders can't schedule a Job, but "possibly" could schedule a Job that waits for the build to finish... right? What is the reason again for autobuild aborting?
Wait - I re-read your initial comments... We do use scheduling rules in some validators to "prevent" changes from happening during the process... validators should never affect any resources... Sounds like this is bad practice, and we should "always" use a null scheduling rule?
Allowing other jobs to concurrently update the resources that the validators are validating will probably cause Undesirable Consequences.
Affiliation: IBM Release: WTP 1.5.2 Justification: Besides the nastiness described in Chris' comment, this bug was found by John A while helping us identify a behavioural problem in our adopter product. This bug appears to be the cause of a regression and breaks one of our headless applications that worked on WTP 1.0.x.
The validation job is a bit unusual in that it requires locking the whole workspace but doesn't actually modify anything. I think it is important that you have this lock because validators likely are not able to handle a workspace that is changing under its feet while validation is happening. So, I think you are using the correct scheduling rule here. The autobuild behaviour is designed under the assumption that jobs with resource scheduling rules are likely to modify the workspace. If a job is going to modify the workspace, then it's a wasted effort to continue building. For example, if a CVS update is waiting to run, it's much more efficient for the autobuild to back off, and the CVS update to go ahead first. Also, it's very important to allow the user to continue editing the workspace while autobuild is running. When the user makes a change, we need to silently restart the autobuild. What is a bit unusual in this case is that there is a job forked from the build job. I didn't expect this when I introduced the autobuild job in the first place (I thought by putting autobuild in a job I would prevent builders from needing to do so). For cases like this, there really needs to be a way for someone to flag a job so that it should not interrupt autobuild. I have entered platform bug 160024 for this. In the short term, I am suggesting we find a solution to solve this particular case in WTP.
An even simpler solution just occurred to me. The autobuild cancels by calling the Job.isBlocking() method, which specifies that it returns true if there is a *non-system* job being blocked. Therefore, it would suffice to create a system job from the validation builder, whose sole task is to launch the validation job:

public class ValidationLauncherJob extends Job {
	private Job validationJob;

	public ValidationLauncherJob(Job validationJob) {
		super("Waiting for build");
		setSystem(true);
		setRule(ResourcesPlugin.getWorkspace().getRoot());
		this.validationJob = validationJob;
	}

	protected IStatus run(IProgressMonitor monitor) {
		validationJob.schedule();
		return Status.OK_STATUS;
	}
}

Again, this is just a short-term workaround. A better solution would involve new support from the platform to specify when a job should interrupt autobuild, combined with a streamlining of the validators to avoid forking N validation jobs for N projects.
Thanks John, I was thinking of a similar solution... and I am quickly looking into making a small change that would batch up the n validator jobs per project to reduce the number of active Jobs... It's becoming more obvious that the overhead of the Jobs themselves is quickly outweighing the benefit.
Another dimension to this: I instrumented the SSE ModelManagerImpl, and its cache is basically rendered useless on larger workspaces. What we do is let each validator validate a long list of files. We have a race going on: each one works down its own list and gets models for each file on its list. In practice, the validators never sync up, and we constantly have to create the same expensive model over and over. I saw a big validation where 5,562 models were repeatedly created, and only 3 came from the cache. I added a timer-based model cache where models expire later, and it just won't scale, because of the non-synchronized races happening. Instead, there should be one master validator job that works over the delta set. For each file in the delta set, the master should provide each individual validator with the same file. A timer-based cache in SSE would work wonders then, because all validators would be synced up. Over 90% of the time in validation is spent in SSE model creation.
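The difference between the current racing design and the proposed master-job design can be sketched with hypothetical stand-in types (none of these names are real WTP or SSE API; createModel() stands in for expensive SSE model creation):

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of "one master validator job over the delta set" vs. the
// current design where each validator walks its own file list.
public class MasterValidatorSketch {
    static int modelsCreated = 0;

    // Stand-in for expensive SSE model creation.
    static String createModel(String file) { modelsCreated++; return "model:" + file; }

    public static void main(String[] args) {
        List<String> delta = List.of("a.jsp", "b.jsp", "c.jsp");
        List<Consumer<String>> validators =
                List.of(m -> {}, m -> {}, m -> {});   // three no-op validators

        // Current design: validators race down their own lists, so with
        // an ineffective cache each file's model is built once per validator.
        for (Consumer<String> v : validators)
            for (String file : delta)
                v.accept(createModel(file));
        System.out.println(modelsCreated);            // 3 validators x 3 files

        // Proposed design: the master walks the delta once and hands the
        // same model to every validator, so each model is built only once.
        modelsCreated = 0;
        for (String file : delta) {
            String model = createModel(file);
            for (Consumer<String> v : validators)
                v.accept(model);
        }
        System.out.println(modelsCreated);            // one per file
    }
}
```

The point is purely structural: inverting the loop order (files outer, validators inner) is what makes a short-lived model cache effective.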
(In reply to comment #9)
> The validation job is a bit unusual in that it requires locking the whole
> workspace but doesn't actually modify anything. I think it is important that
> you have this lock because validators likely are not able to handle a workspace
> that is changing under its feet while validation is happening. So, I think you
> are using the correct scheduling rule here.

Having to lock the entire workspace for an operation that is inherently read-only still seems wrong to me. It seems that the preferred approach would be to write validators that are defensive in nature. Normally you would expect a small number of user files (or even zero) to be changing, so if a few files need to be revalidated, that would be a small price to pay for not having to lock the entire workspace. It also seems to me that the core platform is becoming more and more like a database, and like a database perhaps it needs to support stable reads.
It may be overkill for some validators to lock the entire workspace, but that likely depends on the validator. Some validators could be checking for referential integrity across multiple files in multiple projects (think of a link checker for HTML), and would have difficulty handling concurrent changes. Other validators that act only on single files in isolation may not require any locking at all (workspace reads require no locks). One option would certainly be to have a null rule on the validate job, and leave it up to particular validator implementations to lock what they need, but that's likely a breaking change to the contract between validators and the validation framework. I.e., something to improve in 2.0, but it doesn't help to solve the immediate problems.
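The trade-off between per-validator rules and a workspace-root rule comes down to the conflict semantics of scheduling rules. The miniature below uses hypothetical stand-in types (not the real org.eclipse.core.runtime ISchedulingRule) to show why two project-scoped rules can run concurrently while the root rule serializes everything:

```java
// Miniature of scheduling-rule conflict semantics; ProjectRule and
// RootRule are made-up stand-ins for illustration only.
public class RuleSketch {
    interface Rule { boolean isConflicting(Rule other); }

    // A rule scoped to one named project: conflicts only with the root
    // or with a rule for the same project.
    record ProjectRule(String name) implements Rule {
        public boolean isConflicting(Rule other) {
            if (other instanceof RootRule) return true;
            return other instanceof ProjectRule p && p.name().equals(name);
        }
    }

    // The workspace root: conflicts with every resource rule.
    record RootRule() implements Rule {
        public boolean isConflicting(Rule other) { return true; }
    }

    public static void main(String[] args) {
        Rule p1 = new ProjectRule("ejb1");
        Rule p2 = new ProjectRule("ejb2");
        Rule root = new RootRule();
        System.out.println(p1.isConflicting(p2));   // different projects: could validate concurrently
        System.out.println(root.isConflicting(p1)); // root rule blocks everything
    }
}
```

A single-project validator could therefore take only its project's rule, while a cross-project link checker genuinely needs something broader.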
assigning 152 bug to specific owner. Please reassign if needed.
> assigning 152 bug to specific owner. Please reassign if needed. I really mean it this time :)
Hi John, I am currently debugging this issue. I implemented the ValidatorJobLauncher solution as you described. It solves the problem where AutoBuildJob is getting cancelled because EJB validation is waiting (now autobuild runs to completion before EJB validation starts). However, the issue I am seeing is this: when AutoBuildJob completes, the following happens.

In WorkerPool.endJob, manager.endJob(job, result, true) is called, which results in JobManager.endJob calling JobManager.changeState(). In JobManager.changeState(), the state of AutoBuildJob is set to Job.NONE, and startTime is set to -1 (indicating this job should not be scheduled again). Now the remaining jobs that are waiting (including validation) run, and at the end of each validator job, markers are created in a workspace runnable operation, which eventually calls autoBuildJob.build(false). In this method, int state = getState() is called, which returns 0, and a schedule is called.

I have a couple of questions here:
1. In AutoBuildJob.build(), the delay calculation does not take into account the startTime that has been set to -1. Why is this so?
2. In AutoBuildJob.schedule(delay), shouldSchedule() is called, which always returns true (without considering that startTime has been set to -1).

As a result, a new AutoBuildJob is getting scheduled at the end of EJB validation, which in turn triggers EJB validation, and the cycle continues infinitely. Your insights into this would be much appreciated... Thanks!
Hari, you're on the wrong track with your analysis of the startTime field - the field is used for different things depending on the state of the job. If the workspace has changed in any way (someone has done IWorkspace.run, or directly modified a resource), then the autobuild job will run. The autobuild job then does a computation to determine if a build is actually necessary. In this calculation it ignores changes to IMarkers (obviously markers will be generated by the validator job). If there are changes (added, removed, or changed resources), then it will perform a build. From the sound of it, the validator job is modifying the workspace, which is then triggering autobuild. If this never completes, it suggests to me that there is a validator and possibly also a builder that are not behaving incrementally - i.e., they are blindly making changes regardless of the incoming delta. That would be bad news...
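The incremental contract John describes (act only on the incoming delta, never modify resources unconditionally) is the convergence condition for the whole build/validate cycle. A toy model, with made-up names rather than the real builder API, makes the two outcomes visible:

```java
// Toy model of the incremental-builder contract: a builder that derives
// its changes from the incoming delta converges; one that blindly
// rewrites outputs re-dirties the workspace and retriggers autobuild
// forever (capped at 10 rounds here so the demo terminates).
public class IncrementalSketch {
    static int buildRounds(boolean honoursDelta) {
        boolean dirty = true;   // something changed, so autobuild fires
        int rounds = 0;
        while (dirty && rounds < 10) {
            rounds++;
            // A well-behaved builder makes changes only when the delta
            // demands them, so the next delta is empty; a misbehaving
            // one modifies resources on every invocation.
            dirty = !honoursDelta;
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(buildRounds(true));   // converges after one round
        System.out.println(buildRounds(false));  // hits the round cap: runaway rebuild
    }
}
```

This is exactly the diagnostic John suggests: if the cycle never completes, some participant is making changes that are not justified by its delta.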
John, how easy/hard is it to find out whether a given validator writes to the workspace? Aren't we supposed to launch the validators with a special workspace rule that prevents them from making changes to the workspace? The JSP syntax validator creates compilation units (and changes the classpath). It never saves the class file in the workspace, though. Could changing the classpath cause a rebuild?
I've been playing with the test case (WCS workspace) for a few hours, and it doesn't look like the validators are the source of the problem. I think it is the cyclic nature of the workspace that is causing autobuild to run multiple times. It does eventually complete building for me... You can see if validators are modifying the workspace by using a resource change listener (for example there is one in the core resource tools project that dumps delta information into a view in the workspace). As far as I know, validators are not launched with a rule that prevents them modifying the workspace - perhaps someone more familiar with validators can verify that. In any case, Hari's current fix looks like a definite improvement over the existing behaviour.
The never-ending build problem with that workspace happens when you have the EJB validator enabled and perform a clean on one project (I think the first one in their workspace), which then looks like it cascades into all the others. Chris and I have been able to reproduce it; in fact I ran it for more than a day and it had still not finished. With all validators disabled, the build does complete in 10 min.
Just to make sure, I once again stepped through the EJB validator code to ensure we are not modifying anything, and also checked whether any modifications are happening in the J2EEComponentClasspathUpdater (which is receiving a resourceChanged event), in addition to a bunch of other listeners. It appears that we are not making any resource changes before we call Workspace.run(), other than the marker creation for the validation errors. One thought: when the autoBuildJob is done, its state is set to 0 by the JobManager. Then, in autoBuildJob.build(), it is rescheduled if its state is still 0. Hence, can we change the state in the EJB validators to a state other than waiting and none, so that it doesn't start? Because if we can do that (and also use the current system job patch), then the cycle of Autobuild-EJB might be breakable...
> Because if we can do that (and also use the current system job patch) then the
> cycle of Autobuild-EJB might be breakable....

The autobuild job is always scheduled at the end of a workspace operation or workspace runnable. Within the autobuild job's run method, it only actually invokes builders if there are changes in the workspace that the builder depends on. So, stopping the autobuild job from being scheduled isn't the answer - the fix is to make sure the chain of events after autobuild completes doesn't cause workspace changes that in turn require another build.
> The neverending build problem with that workspace happens when you have the ejb
> validator enabled

Does it happen when *only* the EJB validator is enabled, or do you have all validators enabled? Still trying to reproduce...
Hi John, please find below the steps Raj has described in RATLC01129326 to reproduce this issue.

Steps: import the commerce workspace PI file, ftp://perfdata:perfdata@wsperf.torolab.ibm.com/commerce/radv7/migrated_workspace_brian.zip, following the instructions listed here before importing.

2. Create a brand new, empty workspace somewhere.
3. To be able to build the workspace in a more reasonable time, we'll need to suspend all validators. Under Window->Preferences->Validation, select the "suspend all validators" check box.
4. The LinksBuilder validator can take quite a while to complete its work, so it's a good idea to disable it. Under Window->Preferences->Web Tools->Links->Validation and Refactoring, uncheck the "Enable LinksBuilder" box.
5. Finally, some Java compiler build settings will need to be changed to allow the build to work. Under Window->Preferences->Java->Compiler->Building, set the following: uncheck "Treat configurable errors like fatal errors", uncheck "Abort build when build path errors occur", set "Incomplete build path" to Warning, and set "Circular dependencies" to Warning.
8. Import the Commerce workspace using the project interchange file import wizard; choose the file downloaded above.

After the import is done, close RAD and bring it back up. Enable only the EJB validator, perform Project -> Clean on the first EJB project in the workspace (Catalog*Product*Data), and open the progress view. Note that now building will go on forever and ever; I canceled after more than an hour.

To find out if validation had some influence on this problem, the next time I suspended all validators and tried the same scenario, and the entire operation was done in 5 min.
The cyclic build is a bug in JDT core, and is unrelated to validators. See bug 160550. I still recommend addressing the bug with validators interrupting the build, since this is a performance issue for any workspace.
Created attachment 51814 [details] Patch containing the system validator launcher implementation
The issue here is that when autobuild starts, it triggers the EJB validation. Since EJB validation is a non-system job, autobuild politely backs off and lets EJB validation run. Once the EJB validation runs, it triggers autobuild again, which triggers EJB validation, and the cycle continues for an indefinite period of time. This issue becomes highly repeatable and affects performance in large, complex workspaces that have cyclical dependencies. In order to fix this, the patch attached above creates a 'ValidatorLauncherJob', which is a system job. The only thing this job does is launch the actual validation job, and we now schedule the ValidatorLauncherJob instead of directly scheduling the validation job. Since autobuild does not stop when a system job is waiting, it now runs to completion without interruption by EJB validation, which is followed by the scheduling and execution of the EJB validation jobs.
Created attachment 51854 [details] This patch supersedes the previous one
I've been watching this bug since I too have a job (Server$ResourceChangeJob, which updates server status and triggers auto-publishing) that is triggered on resource changes and requires a resource lock. FWIW, it looks like I don't need a change since I got lucky and it is already a system job.
The changes are straightforward and each validator will get them for free. Approve.
Straightforward fix, and it doesn't change existing Jobs' behavior, other than waiting for the AutoBuild job to finish. Hari - can you externalize the message "Waiting for build"? I approve after this change...
One question. Were you planning on addressing the multiple validator jobs problem as well? Basically reduce the number of validation jobs which is one per validator per project down to just one...period?
Dan, it hasn't been proven this makes a large impact, and we don't want to make such a large change in behavior at this point in the cycle... We definitely should continue to "optimize" how validators are run, including batching "similar" validators based on content type, which would reduce file loading time.
> It hasn't been proven this makes a large impact, and we don't want to make such > a large change in behavior at this point in the cycle.... I definitely agree, but it's something to consider for 2.0. It would be beneficial to reduce the number of jobs, if only to reduce the clutter and flashing in the progress view when validators are running in a large workspace.
Created attachment 51968 [details] Patch incorporating Chuck's review comment to externalize string A summary of the problem and the solution put forth by the attached patch can be found in comment #28.
I just tried the patch posted here on a large workspace (Project > Clean All). I found one problem/behavior that I want to report.

Earlier, when I would perform a project clean, the build and the validation would go on side by side. Now, after this patch, I find the build goes from 0 to 100% (during which no validation occurs), and validation begins after that. This works really well. However, one thing to note: if I have the progress view open, then the UI hangs / CPU is at 100% for at least 2-3 minutes (this is at the stage when the build ends at 100% and validation has not begun yet). I am assuming it's because so many validation jobs are to be run that the progress view cannot report progress without taking a lot of CPU cycles. This was the code that the main thread was executing during the 2-3 minute phase:

Thread [main] (Suspended)
	OS.CreateWindowExW(int, char[], char[], int, int, int, int, int, int, int, int, CREATESTRUCT) line: not available [native method]
	OS.CreateWindowEx(int, TCHAR, TCHAR, int, int, int, int, int, int, int, int, CREATESTRUCT) line: 1903
	Label(Control).createHandle() line: 498
	Label.createHandle() line: 178
	Label(Control).createWidget() line: 523
	Label(Control).<init>(Composite, int) line: 98
	Label.<init>(Composite, int) line: 91
	ProgressInfoItem.createChildren() line: 202
	ProgressInfoItem.<init>(Composite, int, JobTreeElement) line: 186
	DetailedProgressViewer.createNewItem(JobTreeElement) line: 144
	DetailedProgressViewer.add(Object[]) line: 118
	DetailedProgressViewer.internalRefresh(Object) line: 329
	DetailedProgressViewer(StructuredViewer).internalRefresh(Object, boolean) line: 1211
	StructuredViewer$8.run() line: 1415
	DetailedProgressViewer(StructuredViewer).preservingSelection(Runnable) line: 1323
	DetailedProgressViewer(StructuredViewer).refresh(Object, boolean) line: 1413
	ProgressViewerContentProvider.refresh(Object[]) line: 137
	ProgressViewUpdater$1.runInUIThread(IProgressMonitor) line: 274
	UIJob$1.run() line: 94
	RunnableLock.run() line: 35
	UISynchronizer(Synchronizer).runAsyncMessages(boolean) line: 123
	Display.runAsyncMessages(boolean) line: 3325
	Display.readAndDispatch() line: 2971
	Workbench.runEventLoop(Window$IExceptionHandler, Display) line: 1914
	Workbench.runUI() line: 1878
	Workbench.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 419
	PlatformUI.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 149
	IDEApplication.run(Object) line: 95
	PlatformActivator$1.run(Object) line: 78
	EclipseAppLauncher.runApplication(Object) line: 92
	EclipseAppLauncher.start(Object) line: 68
	EclipseStarter.run(Object) line: 400
	EclipseStarter.run(String[], Runnable) line: 177
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25
	Method.invoke(Object, Object...) line: 585
	Main.invokeFramework(String[], URL[]) line: 336
	Main.basicRun(String[]) line: 280
	Main.run(String[]) line: 977
	Main.main(String[]) line: 952

After the 2-3 minutes of the UI being locked up / CPU at 100%, validation runs and I see everything end. Clearly there is something in the progress view that may be causing it. If I have the progress view closed, then everything works fine (i.e., the build ends and validation runs seamlessly).
Just want to make it clear that the patch worked; it's just that now, since all the validation happens at the end of the build, if you have the progress view open at that point in time, the main thread hangs for a few minutes trying to update the progress view.
Sorry to go back on my previous remark. (In reply to comment #38)
> Just want to make it clear that the patch worked, its just that now since all
> the validation happens at the end of the build, if you have the progress view
> open at that point of time, main thread hangs for few minutes in trying to
> update progress view

When I said the patch works, I meant it works in the sense that it makes validation kick in only after the build is done. However, in the commerce workspace you still see build run and then validation run in a never-ending cyclic manner. I am told by Hari Shankar that bug 160550 needs to be fixed to address this issue.
+1 for WTP 1.5.2. This is a very thoroughly discussed bug. However, it struck me that John's logic for politeness is actually completely backward for validators. The auto-build aborts because it thinks the validator will change a resource. However, if a builder is modifying the workspace, then the validator should actually abort, since the change may alter the validity of the workspace. John - it seems like the job scheduling system needs to have more information about the jobs, i.e. what they might modify, and what they might depend on. Then some dependency analysis could be used to schedule the jobs. (make does that :-)
+1 BTW, bug 151547 has been open for a while to run only one validation job (at a time). You can read it for my reasons, but I admit, "no hard proof" of increased threading and deadlock problems, so, am not saying it has to be fixed in 1.5.2. (And, as stated, there could be other problems that show up once we do that, such as one slow validator preventing any from finishing quickly, thus hurting some UI responsiveness?). Also, I agree with Arthur, validators should exhibit the same "polite" behavior as incremental builds ... if resources really are changed, the current validation cycle should be canceled, and wait until the next incremental build finishes. In other words ... much room for improvement :) But these fixes improve some obvious performance hits, so that's great to have in 1.5.2.
Regarding the single-job validation, bug 160941 shows how many jobs tend to choke the Progress View and substantially slow down validation at this point in time.
Hey guys, we are not disputing that we need to look at consolidating the excessive jobs problem, but focusing on the problem at hand (interrupting the autobuild job), this resolves the issue and is a safe fix at this point in the cycle. As David pointed out, we already have https://bugs.eclipse.org/bugs/show_bug.cgi?id=151547 for this separate issue.
+1
This is released for the 101706 WTP 1.5.1 and 2.0 builds.
Verified 101906.
verified by me also