Bug 159913 - [hotbug] [validation] Validation builder launches a job that causes build to cancel
Summary: [hotbug] [validation] Validation builder launches a job that causes build to ...
Status: CLOSED FIXED
Alias: None
Product: WTP Common Tools
Classification: WebTools
Component: wst.validation
Version: 1.5
Hardware: PC Windows 2000
Importance: P2 major
Target Milestone: 1.5.2 M152
Assignee: Hari Shankar CLA
QA Contact: Chuck Bridgham CLA
URL:
Whiteboard: PMC_approved
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-05 13:43 EDT by John Arthorne CLA
Modified: 2006-10-23 14:35 EDT
CC List: 9 users

See Also:


Attachments
Patch containing the system validator launcher implementation (3.01 KB, patch)
2006-10-11 17:39 EDT, Hari Shankar CLA
This patch supersedes the previous one (2.80 KB, patch)
2006-10-12 10:22 EDT, Hari Shankar CLA
Patch incorporating Chuck's review comment to externalize string (5.25 KB, patch)
2006-10-13 14:56 EDT, Hari Shankar CLA

Description John Arthorne CLA 2006-10-05 13:43:13 EDT
Build: Eclipse 3.2.1, WTP 1.5.1.

Background: Auto-build is a "polite" job.  If it sees that another job with a resource scheduling rule is waiting to run, it aborts the build process to allow that job to run. The reasoning is that there is no point building if there is a job waiting to modify the workspace, which may cause the build effort to be wasted.

In WTP 1.5, the WST validation builder was changed to fork background jobs for certain types of validators (bug 91563). Now the fun begins.  Here is the process flow I have been observing while debugging an unrelated problem:

1) A file in the workspace is modified by the user
2) Autobuild starts running
3) The validation builder forks a background job
4) Autobuild notices that a job is waiting, and aborts the autobuild
5) The validation job runs
6) Autobuild starts again

Thus the validation job and the autobuild job are thrashing against each other.  Even worse, if any thread modifies the workspace between 4) and 6), then the validation builder will run again.  In theory at least, this could cause indefinite churn as the auto-build and background validation kick each other into action.  Note: I have *not* observed this indefinite churn myself, but I have noticed even in a very trivial workspace with one project the autobuild being interrupted several times before completing.
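The flow in steps 1) to 6) can be illustrated with a toy model. This is a minimal sketch in plain Java with no Eclipse dependencies; the class and method names are hypothetical, and it assumes each project's builder forks exactly one job and that an interrupted build resumes with the next unbuilt project.

```java
/** Toy model of steps 1-6 above; plain Java, not Eclipse code. */
final class BuildThrashDemo {

    /**
     * Returns how many times the autobuild has to start before every
     * project has been built, when each project's builder forks one job.
     *
     * @param projects           number of projects whose builder forks a job
     * @param yieldsToForkedJobs true if a forked job interrupts the build
     */
    static int autobuildStarts(int projects, boolean yieldsToForkedJobs) {
        int starts = 0;
        int nextProject = 0; // first project that still needs building
        while (nextProject < projects) {
            starts++; // steps 2 and 6: autobuild starts (again)
            while (nextProject < projects) {
                nextProject++; // step 3: this project's builder runs and forks a job
                if (yieldsToForkedJobs) {
                    break; // step 4: the build aborts in favour of the waiting job
                }
            }
            // step 5: the forked jobs run here, before the next build attempt
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println("build yields to forked jobs: " + autobuildStarts(5, true) + " starts");
        System.out.println("build ignores forked jobs:   " + autobuildStarts(5, false) + " starts");
    }
}
```

In this model the number of build starts grows with the number of projects when the build yields, and stays at one when it does not.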

Here is a stack trace of a validation job being scheduled from within a build (thus causing the build to abort):

Thread [Worker-2] (Suspended (breakpoint at line 349 in InternalJob))	
	ValidatorJob(InternalJob).schedule(long) line: 349	
	ValidatorJob(Job).schedule() line: 421	
	EnabledIncrementalValidatorsOperation(ValidationOperation).launchValidatorJob(WorkbenchReporter, IValidatorJob, ValidatorMetaData, IWorkbenchContext, IFileDelta[]) line: 1762	
	EnabledIncrementalValidatorsOperation(ValidationOperation).launchJobs(HashSet, WorkbenchReporter) line: 1629	
	EnabledIncrementalValidatorsOperation(ValidationOperation).validate(WorkbenchReporter) line: 867	
	EnabledIncrementalValidatorsOperation(ValidationOperation).run(IProgressMonitor) line: 642	
	ValidationBuilder.build(int, Map, IProgressMonitor) line: 210	
	BuildManager$2.run() line: 603	
	SafeRunner.run(ISafeRunnable) line: 37	
	BuildManager.basicBuild(int, IncrementalProjectBuilder, Map, MultiStatus, IProgressMonitor) line: 167	
	BuildManager.basicBuild(IProject, int, ICommand[], MultiStatus, IProgressMonitor) line: 201	
	BuildManager$1.run() line: 230	
	SafeRunner.run(ISafeRunnable) line: 37	
	BuildManager.basicBuild(IProject, int, MultiStatus, IProgressMonitor) line: 233	
	BuildManager.basicBuildLoop(IProject[], IProject[], int, MultiStatus, IProgressMonitor) line: 252	
	BuildManager.build(int, IProgressMonitor) line: 285	
	AutoBuildJob.doBuild(IProgressMonitor) line: 149	
	AutoBuildJob.run(IProgressMonitor) line: 216	
	Worker.run() line: 58
Comment 1 David Williams CLA 2006-10-05 15:34:55 EDT
This certainly sounds important! 

So, offhand, it seems one approach to a better design is that the validation framework (or each validator?) should be wise, if not polite :) and if it sees a build is in progress, it should not even schedule a new validation job ... I guess it could schedule a "validate later" job that would be scheduled for ... oh, 200 msec? ... in the future. That job's responsibility would be just to check again whether it is time to validate.

So, I'm pretty sure there is, but we should see if there is a "build job" (or family) that's API? 

I guess to be complete, the ValidationManager should also be a job listener, and if it "sees" a build job start, then it should cancel any validation jobs it started. 



Comment 2 John Arthorne CLA 2006-10-05 16:31:30 EDT
> So, I'm pretty sure there is, but we should see if there is a "build job" (or
> family) that's API? 

The API for this would be Platform.getJobManager().find(ResourcesPlugin.FAMILY_AUTO_BUILD);

> I guess to be complete, the ValidationManager should also be a job listener,
> and if it "sees" a build job start, then it should cancel any validation jobs it
> started. 

I don't think this is necessary because the validation job owns a resource rule, and the build job cannot start while that rule is owned.

It seems with the current design, if there are 100 projects in my workspace that contain validators, then there will be 100 of these validation jobs queued up by the build process (unless I'm missing something).  Another option would be to have the ValidationBuilder just contribute to a queue of ValidationOperations that are pending.  Then, an IResourceChangeEvent.POST_BUILD event could schedule a single job to execute this queued set of validations.  Certainly David's suggestion would be simpler as an interim fix, but in the 2.0 timeframe you may also want to consider further streamlining to avoid creating many jobs.
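A minimal sketch of the queue idea, in plain Java with no Eclipse dependencies. The class name and methods are hypothetical; in a real implementation the builder would call enqueue() and a POST_BUILD resource change listener would schedule one job that calls drain().

```java
import java.util.ArrayList;
import java.util.List;

/** Collects validation operations during a build and runs them as one batch afterwards. */
final class PendingValidationQueue {
    private final List<Runnable> pending = new ArrayList<>();
    private int batchesRun = 0;

    /** Called by the builder for each project, instead of scheduling a job per project. */
    synchronized void enqueue(Runnable validationOperation) {
        pending.add(validationOperation);
    }

    /**
     * Called once after the build has finished. Runs everything queued so far
     * as a single batch and returns the number of operations that were run.
     */
    int drain() {
        List<Runnable> batch;
        synchronized (this) {
            if (pending.isEmpty()) {
                return 0;
            }
            batch = new ArrayList<>(pending);
            pending.clear();
            batchesRun++;
        }
        // Run outside the lock so builders can keep enqueueing for the next batch.
        for (Runnable operation : batch) {
            operation.run();
        }
        return batch.size();
    }

    synchronized int batchesRun() {
        return batchesRun;
    }

    public static void main(String[] args) {
        PendingValidationQueue queue = new PendingValidationQueue();
        for (int project = 0; project < 100; project++) {
            queue.enqueue(() -> { /* validate one project */ });
        }
        int ran = queue.drain();
        System.out.println(ran + " operations in " + queue.batchesRun() + " batch");
    }
}
```

With 100 projects this yields one batch rather than 100 separately scheduled jobs.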
Comment 3 Chris Laffra CLA 2006-10-05 16:44:35 EDT
I *have* seen infinite builds happening many times, when using a huge J2EE workspace on WTP 1.5.1, following exactly this pattern.

This behavior has disruptive side-effects, taking away responsiveness from the
entire machine, and is a critical defect for WTP. I have sometimes been able to break the cycle, but you have to click really fast for a long time in the progress view to cancel the jobs that do validation/building. Normal users would be lost.

I vote a +1 for WTP 1.5.2 investigation
Comment 4 John Arthorne CLA 2006-10-05 17:22:47 EDT
Slightly better than polling every 200ms would be to create and schedule a proxy job that schedules the actual validation job when autobuild is complete. Here is a generic job that takes another job, and schedules it when autobuild is not running:

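import org.eclipse.core.resources.ResourcesPlugin;
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.OperationCanceledException;
import org.eclipse.core.runtime.Platform;
import org.eclipse.core.runtime.Status;
import org.eclipse.core.runtime.jobs.ISchedulingRule;
import org.eclipse.core.runtime.jobs.Job;

public class AfterBuildJob extends Job {
	// Conflicts only with itself, so at most one AfterBuildJob runs at a time.
	private static final ISchedulingRule afterBuildRule = new ISchedulingRule() {
		public boolean contains(ISchedulingRule rule) {
			return rule == this;
		}
		public boolean isConflicting(ISchedulingRule rule) {
			return rule == this;
		}
	};
	// The real job to schedule once autobuild has finished.
	private Job job;
	public AfterBuildJob(Job job) {
		super("Waiting for build");
		setSystem(true);
		setRule(afterBuildRule);
		this.job = job;
	}
	protected IStatus run(IProgressMonitor monitor) {
		waitForBuild();
		job.schedule();
		return Status.OK_STATUS;
	}
	protected void waitForBuild() {
		try {
			// Blocks until no autobuild job is waiting or running.
			Platform.getJobManager().join(ResourcesPlugin.FAMILY_AUTO_BUILD, null);
		} catch (OperationCanceledException e) {
			// Ignored: the wrapped job is scheduled regardless.
		} catch (InterruptedException e) {
			// Ignored: the wrapped job is scheduled regardless.
		}
	}
}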

The afterBuildRule is needed to make sure you don't have a hundred of these jobs running at once.  It's still less than optimal because you end up with one job instance per validation operation.
Comment 5 Chuck Bridgham CLA 2006-10-06 10:06:41 EDT
Hi John,

Thanks for these suggestions, but I'm still a little fuzzy on rules here.

The validators are started from a builder.
I didn't realize we were not following the rules here... what I am
understanding, is builders can't schedule a Job, but "possibly" could schedule
a Job that waits for the build to finish....   right?

What is the reason again for Autobuild aborting?
Comment 6 Chuck Bridgham CLA 2006-10-06 10:09:48 EDT
Wait - I re-read your initial comments....

We do use scheduling rules in some validators to "prevent" change from happening during the process....    validators should never affect any resources...

Sounds like this is bad practice, and we should "always" use a null scheduling rule?
Comment 7 John Pitman CLA 2006-10-06 10:41:04 EDT
Allowing other jobs to concurrently update the resources that the validators are validating will probably cause Undesirable Consequences.
Comment 8 Tim deBoer CLA 2006-10-06 10:51:01 EDT
Affiliation: IBM
Release: WTP 1.5.2
Justification: Besides the nastiness described in Chris' comment, this bug was found by John A while helping us identify a behavioural problem in our adopter product. This bug appears to be the cause of a regression and breaks one of our headless applications that worked on WTP 1.0.x.
Comment 9 John Arthorne CLA 2006-10-06 10:55:50 EDT
The validation job is a bit unusual in that it requires locking the whole workspace but doesn't actually modify anything. I think it is important that you have this lock because validators likely are not able to handle a workspace that is changing under its feet while validation is happening. So, I think you are using the correct scheduling rule here.

The autobuild behaviour is designed under the assumption that jobs with resource scheduling rules are likely to modify the workspace.  If a job is going to modify the workspace, then it's a wasted effort to continue building.  For example, if a CVS update is waiting to run, it's much more efficient for the autobuild to back off, and the CVS update to go ahead first.  Also, it's very important to allow the user to continue editing the workspace while autobuild is running.  When the user makes a change, we need to silently restart the autobuild.

What is a bit unusual in this case is that there is a job forked from the build job. I didn't expect this when I introduced the autobuild job in the first place (I thought by putting autobuild in a job I would prevent builders from needing to do so). For cases like this, there really needs to be a way for someone to flag a job so that it should not interrupt autobuild.  I have entered platform bug 160024 for this.  In the short term, I am suggesting we find a solution to solve this particular case in WTP.
Comment 10 John Arthorne CLA 2006-10-06 11:11:21 EDT
An even simpler solution just occurred to me.  The autobuild cancels by calling the Job.isBlocking() method, which specifies that it returns true if there is a *non-system* job being blocked.  Therefore, it would suffice to create a system job from the validation builder, whose sole task is to launch the validation job:

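import org.eclipse.core.resources.ResourcesPlugin;
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.Status;
import org.eclipse.core.runtime.jobs.Job;

public class ValidationLauncherJob extends Job {
	private Job validationJob;
	public ValidationLauncherJob(Job validationJob) {
		super("Waiting for build");
		// A system job does not cause autobuild to back off.
		setSystem(true);
		// The workspace root rule makes this job wait until the build is done.
		setRule(ResourcesPlugin.getWorkspace().getRoot());
		this.validationJob= validationJob;
	}
	protected IStatus run(IProgressMonitor monitor) {
		validationJob.schedule();
		return Status.OK_STATUS;
	}
}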

Again, this is just a short term workaround.  A better solution would involve new support from platform to specify when a job should interrupt autobuild, combined with a streamlining of the validators to avoid forking N validation jobs for N projects.
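To illustrate why the system flag matters, here is a minimal sketch in plain Java of the check as described in this comment: the build yields only when a non-system job is being blocked. The WaitingJob type and buildShouldYield method are hypothetical stand-ins, not the platform implementation.

```java
import java.util.Arrays;
import java.util.List;

final class BlockingCheckDemo {

    /** Stand-in for a waiting job; only the properties that matter for the check. */
    static final class WaitingJob {
        final String name;
        final boolean system;          // corresponds to setSystem(true)
        final boolean blockedByBuild;  // its scheduling rule conflicts with the build

        WaitingJob(String name, boolean system, boolean blockedByBuild) {
            this.name = name;
            this.system = system;
            this.blockedByBuild = blockedByBuild;
        }
    }

    /** True only if some non-system job is being blocked by the running build. */
    static boolean buildShouldYield(List<WaitingJob> waiting) {
        for (WaitingJob job : waiting) {
            if (!job.system && job.blockedByBuild) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WaitingJob validation = new WaitingJob("validation", false, true);
        WaitingJob launcher = new WaitingJob("validation launcher", true, true);
        System.out.println("validation job waiting: yield = "
                + buildShouldYield(Arrays.asList(validation)));
        System.out.println("launcher job waiting:   yield = "
                + buildShouldYield(Arrays.asList(launcher)));
    }
}
```

Under this model, scheduling the system launcher job in place of the validation job leaves the build uninterrupted, and the validation job is only scheduled once the launcher gets to run.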
Comment 11 Chuck Bridgham CLA 2006-10-06 11:26:16 EDT
Thanks John,

I was thinking of a similar solution....   And quickly looking into making a small change that would batch up the n validator jobs per project to reduce the number of active jobs....    It's becoming more obvious that the overhead of the jobs themselves is quickly outweighing the benefit.
Comment 12 Chris Laffra CLA 2006-10-06 12:07:15 EDT
Another dimension to this:

I instrumented the SSE ModelManagerImpl and its cache is basically rendered useless on larger workspaces. What we do is let each validator validate a long list of files. We have a race going on. Each one works down its own list and gets models for each file on its list. In practice, the validators never sync up and we constantly have to create the same expensive model over and over. I saw a big validation where 5,562 models were repeatedly created, and only 3 came from the cache. I added a timer-based model cache where models expire later, and it just won't scale, because of the non-synchronized races happening.

Instead, there should be one master validator job that works over the delta set. For each file in the delta set, the master should provide each individual validator with the same file. A timer-based cache in SSE would work wonders then, because all validators would be synced up.

Over 90% of time in validation is spent in SSE model creation.
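The effect of iteration order on a small model cache can be sketched in plain Java. This is a simplified, sequential stand-in for the concurrent race described above; the ModelCache class, its LRU policy and all sizes are assumptions for illustration, not the SSE implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ModelCacheDemo {

    /** Small LRU cache that counts how many models had to be created. */
    static final class ModelCache {
        int created = 0;
        private final Map<String, Object> lru;

        ModelCache(final int capacity) {
            // accessOrder=true makes the map evict the least recently used entry.
            this.lru = new LinkedHashMap<String, Object>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                    return size() > capacity;
                }
            };
        }

        Object getModel(String file) {
            Object model = lru.get(file);
            if (model == null) {
                model = new Object(); // stands in for the expensive model creation
                created++;
                lru.put(file, model);
            }
            return model;
        }
    }

    /** Each validator walks the whole file list on its own. */
    static int perValidatorOrder(int files, int validators, int cacheSize) {
        ModelCache cache = new ModelCache(cacheSize);
        for (int v = 0; v < validators; v++) {
            for (int f = 0; f < files; f++) {
                cache.getModel("file" + f);
            }
        }
        return cache.created;
    }

    /** A master loop hands each file to every validator before moving on. */
    static int perFileOrder(int files, int validators, int cacheSize) {
        ModelCache cache = new ModelCache(cacheSize);
        for (int f = 0; f < files; f++) {
            for (int v = 0; v < validators; v++) {
                cache.getModel("file" + f);
            }
        }
        return cache.created;
    }

    public static void main(String[] args) {
        System.out.println("per-validator order: " + perValidatorOrder(10, 3, 2) + " models created");
        System.out.println("per-file order:      " + perFileOrder(10, 3, 2) + " models created");
    }
}
```

With a cache smaller than the file list, the per-validator order recreates every model for every validator, while the per-file order creates each model once.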
Comment 13 Gary Karasiuk CLA 2006-10-06 13:27:49 EDT
(In reply to comment #9)
> The validation job is a bit unusual in that it requires locking the whole
> workspace but doesn't actually modify anything. I think it is important that
> you have this lock because validators likely are not able to handle a workspace
> that is changing under its feet while validation is happening. So, I think you
> are using the correct scheduling rule here.

Having to lock the entire workspace, for an operation that is inherently read-only still seems wrong to me. It seems that the preferred approach would be to write validators that are defensive in nature.  Normally you would expect a small number or even zero user files to be changing, so that if a few files need to be revalidated it would be a small price to pay for not having to lock the entire workspace. 

It also seems to me that the core platform is becoming more and more like a database, and like a database perhaps it needs to support stable reads. 
Comment 14 John Arthorne CLA 2006-10-06 14:31:11 EDT
It may be overkill for some validators to lock the entire workspace, but that likely depends on the validator.  Some validators could be checking for referential integrity across multiple files in multiple projects (think of a link checker for HTML), that would have difficulty handling concurrent changes. Other validators that act only on single files in isolation may not require any locking at all (workspace reads require no locks).  One option would certainly be to have a null rule on the validate job, and leave it up to particular validator implementations to lock what they need, but that's likely a breaking change to the contract between validators and the validation framework. I.e., something to improve in 2.0 but doesn't help to solve immediate problems.
Comment 15 David Williams CLA 2006-10-06 19:16:24 EDT
assigning 152 bug to specific owner. Please reassign if needed.
Comment 16 David Williams CLA 2006-10-06 21:33:47 EDT
> assigning 152 bug to specific owner. Please reassign if needed.
I really mean it this time :) 

Comment 17 Hari Shankar CLA 2006-10-09 16:10:49 EDT
Hi John,

I am currently debugging this issue. I implemented the ValidatorJobLauncher
solution as you have described. It solves the problem where AutoBuildJob is
getting cancelled because of EJBValidation waiting (now autobuild runs to
completion before EJB Validation starts). 

However, the issue I am seeing is this:

When AutoBuildJob completes, the following happens:

In WorkerPool.endJob, the following is called, 

manager.endJob(job, result, true);

which results in 

JobManager.endJob calling JobManager.changeState()

In JobManager.changeState() the state of AutoBuildJob is set to Job.NONE, and
startTime is set to -1 (indicating this job should not be scheduled again).

Now the remaining jobs that are waiting (including validation) run, and at the
end of each validator job, markers are created in a workspace runnable
operation, which eventually calls autoBuildJob.build(false). 

In this method, int state = getState(); is called, which returns 0, and a
schedule is called.

I have a couple of questions here:

1. In AutoBuildJob.build(), the delay calculation does not take into account
the startTime that is set to -1, why is this so?
2. in AutoBuildJob.schedule(delay), a shouldSchedule() is called, which always
returns a true (without considering startTime has been set to -1).

As a result, a new AutoBuildJob is getting scheduled at the end of EJB
Validation, which in turn triggers EJB validation, and the cycle continues
infinitely. 

Your insights into this would be much appreciated....

Thanks! 
Comment 18 John Arthorne CLA 2006-10-10 11:06:36 EDT
Hari, you're on the wrong track with your analysis of the startTime field - the field is used for different things depending on the state of the job.  If the workspace has changed in any way (someone has done IWorkspace.run, or directly modified a resource), then the autobuild job will run. The autobuild job then does a computation to determine if a build is actually necessary. In this calculation it ignores changes to IMarkers (obviously markers will be generated by the validator job).  If there are changes (added, removed, or changed resources), then it will perform a build.
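The decision described here can be sketched as follows. This is a plain Java illustration with a hypothetical ChangeKind enum, not the platform's actual delta computation: changes that only touch markers do not count towards needing a build.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class BuildNeededDemo {

    /** Hypothetical classification of a single workspace change. */
    enum ChangeKind { ADDED, REMOVED, CONTENT_CHANGED, MARKERS_ONLY }

    /** A build is needed only if some change is more than a marker change. */
    static boolean buildNeeded(List<ChangeKind> changes) {
        for (ChangeKind kind : changes) {
            if (kind != ChangeKind.MARKERS_ONLY) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<ChangeKind> markersOnly = Arrays.asList(ChangeKind.MARKERS_ONLY, ChangeKind.MARKERS_ONLY);
        List<ChangeKind> withEdit = Arrays.asList(ChangeKind.MARKERS_ONLY, ChangeKind.CONTENT_CHANGED);
        System.out.println("markers only -> build needed: " + buildNeeded(markersOnly));
        System.out.println("with edit    -> build needed: " + buildNeeded(withEdit));
        System.out.println("no changes   -> build needed: "
                + buildNeeded(Collections.<ChangeKind>emptyList()));
    }
}
```

So a validator that only creates markers should cause the autobuild job to run, but not to invoke any builders.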

From the sound of it, the validator job is modifying the workspace, which is then triggering autobuild.  If this never completes, it suggests to me that there is a validator and possibly also a builder that are not behaving incrementally - i.e., they are blindly making changes regardless of the incoming delta.  That would be bad news...
Comment 19 Chris Laffra CLA 2006-10-10 11:30:53 EDT
John, how easy/hard is it to find out whether a given validator writes to the workspace? Aren't we supposed to launch the validators with a special workspace rule that prevents them from making changes to the workspace?

The JSP Syntax validator creates compilation units (and changes the classpath). They never save the class file in the workspace though. Could changing the classpath cause a rebuild?
Comment 20 John Arthorne CLA 2006-10-10 18:07:18 EDT
I've been playing with the test case (WCS workspace) for a few hours, and it doesn't look like the validators are the source of the problem.  I think it is the cyclic nature of the workspace that is causing autobuild to run multiple times. It does eventually complete building for me...

You can see if validators are modifying the workspace by using a resource change listener (for example there is one in the core resource tools project that dumps delta information into a view in the workspace).  As far as I know, validators are not launched with a rule that prevents them modifying the workspace - perhaps someone more familiar with validators can verify that.

In any case, Hari's current fix looks like a definite improvement over the existing behaviour.
Comment 21 Raj Mandayam CLA 2006-10-10 18:39:38 EDT
The never-ending build problem with that workspace happens when you have the EJB validator enabled and perform a clean on one project (I think the first one in their workspace), which looks like it cascades into all the others.
Chris and I have been able to reproduce it; in fact I ran it for more than a day and it still had not finished. With all validators disabled, the build does complete in 10 min.
Comment 22 Hari Shankar CLA 2006-10-11 10:37:17 EDT
Just to make sure, I once again stepped through the EJB Validator code to ensure we are not modifying anything, and also checked if any modifications are happening in the J2EEComponentClasspathUpdater which is receiving a resourceChanged event, in addition to a bunch of other listeners. It appears that we are not making any resource changes before we call a Workspace.run(), other than the marker creation for the validation errors.

One thought is: when the autoBuildJob is done, its state is set to 0 by the JobManager. Then, in autoBuildJob.build(), it is rescheduled if its state is still 0. Hence, are we supposed to (or can we) change the state in the EJB validators to a state other than waiting and none, so that it doesn't start? Because if we can do that (and also use the current system job patch) then the cycle of Autobuild-EJB might be breakable....
Comment 23 John Arthorne CLA 2006-10-11 11:19:10 EDT
> Because if we can do that (and also use the current system job patch) then the
> cycle of Autobuild-EJB might be breakable....

The autobuild job is always scheduled at the end of a workspace operation or workspace runnable.  Within the autobuild job's run method, it only actually invokes builders if there are changes in the workspace that the builder depends on.  So, stopping the autobuild job from being scheduled isn't the answer - the fix is to make sure the chain of events after autobuild completes doesn't cause workspace changes that in turn require another build.
Comment 24 John Arthorne CLA 2006-10-11 11:53:41 EDT
> The neverending build problem with that workspace happens when you have the ejb
> validator enabled

Does it happen when *only* the EJB validator is enabled, or do you have all validators enabled?  Still trying to reproduce...
Comment 25 Hari Shankar CLA 2006-10-11 12:10:35 EDT
Hi John,

Please find below the steps Raj has described in  RATLC01129326 to reproduce this issue:

Steps
import the commerce workspace PI file,
ftp://perfdata:perfdata@wsperf.torolab.ibm.com/commerce/radv7/migrated_workspace_brian.zip

follow the instructions listed here before importing

2. Create a brand new, empty workspace somewhere.

3. To be able to build the workspace in a more reasonable time, we'll need to
suspend all validators. Under Window->Preferences->Validation, select the
"suspend all validators" check box.

4. The LinksBuilder validator can take quite a while to complete its work so
it's a good idea to disable it. Under
Window->Preferences->Web Tools->Links->Validation and Refactoring,
uncheck the "Enable LinksBuilder" box.

5. Finally some Java compiler build settings will need to be changed to allow
the build to work. Under Window->Preferences->Java->Compiler->Building, set the
following: Uncheck "Treat configurable errors like fatal errors", uncheck
"Abort build when build path errors occur", set "Incomplete build path" to warning,
and set "Circular dependencies" to Warning.

8. Import the Commerce workspace using the project interchange file import
wizard. choose the file downloaded below.

after import is done, close rad and bring back up

enable only the EJB validator

perform project -clean on the first ejb project in the workspace Catalog*Product*Data,
enable progress view

note that now building will go on forever and ever, I canceled after more than an hour

To find out if validation had some influence on this problem, the next time I suspended all validators
and tried the same scenario, and the entire operation was done in 5 min
Comment 26 John Arthorne CLA 2006-10-11 16:52:16 EDT
The cyclic build is a bug in JDT core, and is unrelated to validators. See bug 160550.  I still recommend addressing the bug with validators interrupting the build, since this is a performance issue for any workspace.
Comment 27 Hari Shankar CLA 2006-10-11 17:39:57 EDT
Created attachment 51814 [details]
Patch containing the system validator launcher implementation
Comment 28 Hari Shankar CLA 2006-10-12 10:04:57 EDT
The issue here is that when Autobuild starts, it triggers the EJB Validation. Since EJB Validation is a non-system job, autobuild then politely backs off, and lets EJB validation run. Once the ejb validation runs, it again kicks autobuild off, which triggers EJB validation and the cycle continues for an indefinite period of time.

This issue becomes highly repeatable and affects performance in large complex workspaces that have cyclical dependencies.

In order to fix this, the patch attached above creates a 'ValidatorLauncherJob' which is a system job. The only thing this job does is launch the actual validation job, and we now schedule the validatorlauncherjob instead of directly scheduling the validationjob. Since autobuild does not stop when a system job is waiting, it now runs to completion without interruption by ejb validation, which is followed by the scheduling and execution of the ejb validation jobs.
Comment 29 Hari Shankar CLA 2006-10-12 10:22:33 EDT
Created attachment 51854 [details]
This patch supersedes the previous one
Comment 30 Tim deBoer CLA 2006-10-12 10:43:24 EDT
I've been watching this bug since I too have a job (Server$ResourceChangeJob, which updates server status and triggers auto-publishing) that is triggered on resource changes and requires a resource lock. FWIW, it looks like I don't need a change since I got lucky and it is already a system job.
Comment 31 John Lanuti CLA 2006-10-12 11:56:55 EDT
The changes are straightforward and each validator will get them for free.
Approve.
Comment 32 Chuck Bridgham CLA 2006-10-12 15:58:55 EDT
Straightforward fix, and doesn't change the existing jobs' behavior, other than waiting for the AutoBuild job to finish.

Hari - Can you separate out the message:  "Waiting for build"

I approve after this change...
Comment 33 Daniel Berg CLA 2006-10-12 18:15:51 EDT
One question.  Were you planning on addressing the multiple validator jobs problem as well?  Basically reduce the number of validation jobs which is one per validator per project down to just one...period?
Comment 34 Chuck Bridgham CLA 2006-10-13 09:23:52 EDT
Dan,

It hasn't been proven this makes a large impact, and we don't want to make such a large change in behavior at this point in the cycle....

We def should continue to "optimize" how validators are run, including batching
"similar" validators based on content-type, that would reduce file loading time.
Comment 35 John Arthorne CLA 2006-10-13 10:18:56 EDT
> It hasn't been proven this makes a large impact, and we don't want to make such
> a large change in behavior at this point in the cycle....

I definitely agree, but it's something to consider for 2.0.  It would be beneficial to reduce the number of jobs, if only to reduce the clutter and flashing in the progress view when validators are running in a large workspace.
Comment 36 Hari Shankar CLA 2006-10-13 14:56:12 EDT
Created attachment 51968 [details]
Patch incorporating Chuck's review comment to externalize string

A summary of the problem and the solution put forth by the attached patch can be found in comment #28.
Comment 37 Raj Mandayam CLA 2006-10-13 15:53:22 EDT
I just tried the patch posted here on a large workspace (project clean all)
I found one problem/behavior that I want to report

So earlier when I would perform a project clean the build would go on and validation would go on side by side

Now after this patch I find build goes from 0 to 100 % (during which no validation occurs), validation begins after that. This works really well.

However one thing to note,

If I have the progress view open, then the UI hangs and CPU is at 100% for at least 2-3 minutes (this is at the stage when the build ends at 100% and validation has not begun yet). I am assuming it's because so many validation jobs are to be run that the progress view cannot report progress without taking a lot of CPU cycles.

This was the code that the main thread was executing during the 2-3 minute phase

Thread [main] (Suspended)	
	OS.CreateWindowExW(int, char[], char[], int, int, int, int, int, int, int, int, CREATESTRUCT) line: not available [native method]	
	OS.CreateWindowEx(int, TCHAR, TCHAR, int, int, int, int, int, int, int, int, CREATESTRUCT) line: 1903	
	Label(Control).createHandle() line: 498	
	Label.createHandle() line: 178	
	Label(Control).createWidget() line: 523	
	Label(Control).<init>(Composite, int) line: 98	
	Label.<init>(Composite, int) line: 91	
	ProgressInfoItem.createChildren() line: 202	
	ProgressInfoItem.<init>(Composite, int, JobTreeElement) line: 186	
	DetailedProgressViewer.createNewItem(JobTreeElement) line: 144	
	DetailedProgressViewer.add(Object[]) line: 118	
	DetailedProgressViewer.internalRefresh(Object) line: 329	
	DetailedProgressViewer(StructuredViewer).internalRefresh(Object, boolean) line: 1211	
	StructuredViewer$8.run() line: 1415	
	DetailedProgressViewer(StructuredViewer).preservingSelection(Runnable) line: 1323	
	DetailedProgressViewer(StructuredViewer).refresh(Object, boolean) line: 1413	
	ProgressViewerContentProvider.refresh(Object[]) line: 137	
	ProgressViewUpdater$1.runInUIThread(IProgressMonitor) line: 274	
	UIJob$1.run() line: 94	
	RunnableLock.run() line: 35	
	UISynchronizer(Synchronizer).runAsyncMessages(boolean) line: 123	
	Display.runAsyncMessages(boolean) line: 3325	
	Display.readAndDispatch() line: 2971	
	Workbench.runEventLoop(Window$IExceptionHandler, Display) line: 1914	
	Workbench.runUI() line: 1878	
	Workbench.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 419	
	PlatformUI.createAndRunWorkbench(Display, WorkbenchAdvisor) line: 149	
	IDEApplication.run(Object) line: 95	
	PlatformActivator$1.run(Object) line: 78	
	EclipseAppLauncher.runApplication(Object) line: 92	
	EclipseAppLauncher.start(Object) line: 68	
	EclipseStarter.run(Object) line: 400	
	EclipseStarter.run(String[], Runnable) line: 177	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25	
	Method.invoke(Object, Object...) line: 585	
	Main.invokeFramework(String[], URL[]) line: 336	
	Main.basicRun(String[]) line: 280	
	Main.run(String[]) line: 977	
	Main.main(String[]) line: 952	

After the 2-3 mins of the UI being locked up with CPU at 100%, validation runs and
I see everything end.
Clearly there is something in the progress view that may be causing it.

If I have the progress view closed, then everything works fine (ie build ends, validation runs seamlessly)
Comment 38 Raj Mandayam CLA 2006-10-13 16:01:02 EDT
Just want to make it clear that the patch worked; it's just that now, since all the validation happens at the end of the build, if you have the progress view open at that point in time, the main thread hangs for a few minutes trying to update the progress view
Comment 39 Raj Mandayam CLA 2006-10-13 16:20:56 EDT
Sorry to go back on my previous remark.
(In reply to comment #38)
> Just want to make it clear that the patch worked, its just that now since all
> the validation happens at the end of the build, if you have the progress view
> open at that point of time, main thread hangs for few minutes in trying to
> update progress view
> 
When I said the patch works, I meant it does work in the sense, it makes validation kick in only after build is done. However in the commerce workspace you still see build run and then validation run in a never ending cyclic manner. I am told by Hari Shankar that bug 160550 needs to be fixed to address this issue. 

Comment 40 Arthur Ryman CLA 2006-10-13 17:18:04 EDT
+1 for WTP 1.5.2

This is a very thoroughly discussed bug. However, it struck me that John's logic for politeness is actually completely backward for validators. The auto-build aborts because it thinks the validator will change a resource. However, if a builder is modifying the workspace, then the validator should actually abort since the change may alter the validity of the workspace.

John - seems like the job scheduling system needs to have more information about the jobs, i.e. what they might modify, and what they might depend on. Then some dependency analysis could be used to schedule the jobs. (make does that :-)
Comment 41 David Williams CLA 2006-10-13 18:15:23 EDT
+1

BTW, bug 151547 has been open for a while to run only one validation job (at a time). You can read it for my reasons, but I admit, "no hard proof" of increased threading and deadlock problems, so, am not saying it has to be fixed in 1.5.2. 
(And, as stated, there could be other problems that show up once we do that, such as one slow validator preventing any from finishing quickly, thus hurting some UI responsiveness?). 

Also, I agree with Arthur, validators should exhibit the same "polite" behavior as incremental builds ... if resources really are changed, the current validation cycle should be canceled, and wait until the next incremental build finishes. 

In other words ... much room for improvement :) But these fixes improve some obvious performance hits, so that's great to have in 1.5.2. 

Comment 42 Chris Laffra CLA 2006-10-13 18:30:31 EDT
Regarding the single-job validation, bug 160941 shows how many jobs tend to choke the Progress View and substantially slow down validation at this point in time.
Comment 43 Chuck Bridgham CLA 2006-10-16 10:30:10 EDT
Hey guys,

We are not disputing that we need to look at consolidating the excessive jobs problem, but focusing on the problem at hand (interrupting the autobuild job), this resolves the issue, and is a safe fix at this point in the cycle.

As David pointed out we already have  https://bugs.eclipse.org/bugs/show_bug.cgi?id=151547
for this separate issue.
Comment 44 Tim Wagner CLA 2006-10-16 16:25:30 EDT
+1
Comment 45 John Lanuti CLA 2006-10-16 16:48:00 EDT
This is released for the 101706 WTP 1.5.1 and 2.0 builds.
Comment 46 John Lanuti CLA 2006-10-20 09:17:52 EDT
Verified 101906.
Comment 47 Chris Laffra CLA 2006-10-20 09:45:27 EDT
verified by me also