Bug 68452 - I200406221600: Null monitor being passed to Jobs
Summary: I200406221600: Null monitor being passed to Jobs
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: 3.0   Edit
Hardware: PC Windows XP
: P1 major (vote)
Target Milestone: 3.0 RC4   Edit
Assignee: John Arthorne CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-06-24 02:55 EDT by Neil Swingler CLA
Modified: 2004-06-24 14:12 EDT (History)
4 users (show)

See Also:


Attachments
Patch to JobManager.java (932 bytes, patch)
2004-06-24 12:19 EDT, John Arthorne CLA
no flags Details | Diff
Improved patch (3.09 KB, patch)
2004-06-24 14:02 EDT, John Arthorne CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Neil Swingler CLA 2004-06-24 02:55:45 EDT
I200406221600
Sun JDK 1.4.2_04

Not sure what effect this had on overall stability since I immediately restarted
eclipse.

log follows:

!ENTRY org.eclipse.core.runtime 4 2 Jun 24, 2004 08:45:13.820
!MESSAGE An internal error occurred during: "Decoration Calculation".
!STACK 0
java.lang.NullPointerException
	at
org.eclipse.ui.internal.decorators.DecorationScheduler$1.run(DecorationScheduler.java:222)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:66)
Comment 1 Tod Creasey CLA 2004-06-24 08:21:16 EDT
Did you have any more in you log?
Comment 2 Tod Creasey CLA 2004-06-24 08:29:01 EDT
Line 222 is 

	monitor.beginTask(WorkbenchMessages.getString
("DecorationScheduler.CalculatingTask"), 100); //$NON-NLS-1$

This is a monitor passed to me by the JobManager.

Moving to Core
Comment 3 Tod Creasey CLA 2004-06-24 08:31:27 EDT
This is suspicous code in JobManager:

	/**
	 * Attempts to immediately start a given job.  Returns true if the job 
was
	 * successfully started, and false if it could not be started 
immediately
	 * due to a currently running job with a conflicting rule.  Listeners 
will never
	 * be notified of jobs that are run in this way.
	 */
	protected boolean runNow(InternalJob job) {
		synchronized (lock) {
			//cannot start if there is a conflicting job
			if (findBlockingJob(job) != null)
				return false;
			changeState(job, Job.RUNNING);
			job.setProgressMonitor(new NullProgressMonitor());
			job.run(null);
		}
		return true;
Comment 4 Tod Creasey CLA 2004-06-24 08:40:47 EDT
Looks like it was introduced as fix to Bug 62368 last month.
Comment 5 Neil Swingler CLA 2004-06-24 09:09:25 EDT
Nothing else in the log. It happened to me again and this time I kept running
with no adverse effects.
Comment 6 DJ Houghton CLA 2004-06-24 09:20:42 EDT
The spec for Job#run says that the progress monitor is never null. I will check
with John this morning but I believe that a good change would be:

	...
	job.setProgressMonitor(new NullProgressMonitor());
	job.run(job.getProgressMonitor());
	...
Comment 7 DJ Houghton CLA 2004-06-24 10:04:55 EDT
We have reviewed the code and that was a red herring. The #runNow method is only
called on ThreadJob instances which have no-ops for their #run methods. 

We will investigate further.
Comment 8 John Arthorne CLA 2004-06-24 11:13:47 EDT
I can reproduce this in a test case. If a job change listener reschedules the
same job from within the scheduled() callback, we can get into a state where the
same job is being executed in two worker threads at once. When the first thread
finishes, it nulls the progress monitor, thus causing the NPE in the second
thread. Thus the NPE is a sympton of an extremely severe problem. Having the
same job running in multiple threads at once can potentially have many adverse
side-effects.
Comment 9 John Arthorne CLA 2004-06-24 12:19:09 EDT
Created attachment 12785 [details]
Patch to JobManager.java

This is the proposed fix. The old code did this:

1) enter sync block
2) assert scheduling preconditions
3) exit sync block
4) call job change listeners
5) enter sync block
6) change job state and add to wait queue
7) exit sync block

Step 2) (assertion preconditions) is the guard code to ensure that the same job
is not scheduled twice. If this occurs, the second schedule() attempt is
ignored (this is the specified behaviour).

During 4), another thread (or the same thread in a recursive schedule() call),
could reschedule the identical job. Step 6) would then occur twice on the same
job, causing the NPE, or much worse problems.

The patch adds a new step to re-assert the preconditions after entering the
sync block a second time:

1) enter sync block
2) assert scheduling preconditions
3) exit sync block
4) call job change listeners
5) enter sync block
5a) NEW: assert scheduling preconditions
6) change job state and add to wait queue
7) exit sync block

Why check the preconditions twice? We want to avoid calling job change
listeners and telling them the job is scheduled if in fact it will be
discarded. This reduces notifications that would cause unnecessary flicker.
With the patched code, it is still possible for a listener to get the
scheduled() notification twice for the same job. This is not great, but much
better than actually running the job twice simultaneously.
Comment 10 John Arthorne CLA 2004-06-24 12:19:41 EDT
We need to get the fix approval process started.
Comment 11 John Arthorne CLA 2004-06-24 14:02:27 EDT
Created attachment 12795 [details]
Improved patch

Improved patch. This patch avoids the possibility of multiple scheduling
notifications. It adds a new internal state to indicate that a job has been
scheduled but not yet added to the wait queue. Thus a second call to schedule()
can distinguish this case properly and immediately bail out.
Comment 12 John Arthorne CLA 2004-06-24 14:05:35 EDT
Fix released. Fix reviewed by Jean-Michel, DJ, McQ, MVM, and Kevin H.
Comment 13 John Arthorne CLA 2004-06-24 14:12:55 EDT
It is quite likely that the following bugs are duplicates of this one:

bug 57656
bug 63635
bug 65415
bug 68240