68452 – I200406221600: Null monitor being passed to Jobs

Bug 68452 - I200406221600: Null monitor being passed to Jobs

Summary: I200406221600: Null monitor being passed to Jobs

Status:	RESOLVED FIXED

Alias:	None

Product:	Platform
Classification:	Eclipse Project
Component:	Resources (show other bugs)
Version:	3.0
Hardware:	PC Windows XP

Importance:	P1 major (vote)
Target Milestone:	3.0 RC4
Assignee:	John Arthorne
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2004-06-24 02:55 EDT by Neil Swingler
Modified:	2004-06-24 14:12 EDT (History)
CC List:	4 users (show)

See Also:

Attachments
Patch to JobManager.java (932 bytes, patch) 2004-06-24 12:19 EDT, John Arthorne	no flags	Details \| Diff
Improved patch (3.09 KB, patch) 2004-06-24 14:02 EDT, John Arthorne	no flags	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Neil Swingler

2004-06-24 02:55:45 EDT

I200406221600
Sun JDK 1.4.2_04

Not sure what effect this had on overall stability since I immediately restarted
eclipse.

log follows:

!ENTRY org.eclipse.core.runtime 4 2 Jun 24, 2004 08:45:13.820
!MESSAGE An internal error occurred during: "Decoration Calculation".
!STACK 0
java.lang.NullPointerException
	at
org.eclipse.ui.internal.decorators.DecorationScheduler$1.run(DecorationScheduler.java:222)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:66)

Comment 1 Tod Creasey

2004-06-24 08:21:16 EDT

Did you have any more in you log?

Comment 2 Tod Creasey

2004-06-24 08:29:01 EDT

Line 222 is 

	monitor.beginTask(WorkbenchMessages.getString
("DecorationScheduler.CalculatingTask"), 100); //$NON-NLS-1$

This is a monitor passed to me by the JobManager.

Moving to Core

Comment 3 Tod Creasey

2004-06-24 08:31:27 EDT

This is suspicous code in JobManager:

	/**
	 * Attempts to immediately start a given job.  Returns true if the job 
was
	 * successfully started, and false if it could not be started 
immediately
	 * due to a currently running job with a conflicting rule.  Listeners 
will never
	 * be notified of jobs that are run in this way.
	 */
	protected boolean runNow(InternalJob job) {
		synchronized (lock) {
			//cannot start if there is a conflicting job
			if (findBlockingJob(job) != null)
				return false;
			changeState(job, Job.RUNNING);
			job.setProgressMonitor(new NullProgressMonitor());
			job.run(null);
		}
		return true;

Comment 4 Tod Creasey

2004-06-24 08:40:47 EDT

Looks like it was introduced as fix to Bug 62368 last month.

Comment 5 Neil Swingler

2004-06-24 09:09:25 EDT

Nothing else in the log. It happened to me again and this time I kept running
with no adverse effects.

Comment 6 DJ Houghton

2004-06-24 09:20:42 EDT

The spec for Job#run says that the progress monitor is never null. I will check
with John this morning but I believe that a good change would be:

	...
	job.setProgressMonitor(new NullProgressMonitor());
	job.run(job.getProgressMonitor());
	...

Comment 7 DJ Houghton

2004-06-24 10:04:55 EDT

We have reviewed the code and that was a red herring. The #runNow method is only
called on ThreadJob instances which have no-ops for their #run methods. 

We will investigate further.

Comment 8 John Arthorne

2004-06-24 11:13:47 EDT

I can reproduce this in a test case. If a job change listener reschedules the
same job from within the scheduled() callback, we can get into a state where the
same job is being executed in two worker threads at once. When the first thread
finishes, it nulls the progress monitor, thus causing the NPE in the second
thread. Thus the NPE is a sympton of an extremely severe problem. Having the
same job running in multiple threads at once can potentially have many adverse
side-effects.

Comment 9 John Arthorne

2004-06-24 12:19:09 EDT

Created attachment 12785 [details]
Patch to JobManager.java

This is the proposed fix. The old code did this:

1) enter sync block
2) assert scheduling preconditions
3) exit sync block
4) call job change listeners
5) enter sync block
6) change job state and add to wait queue
7) exit sync block

Step 2) (assertion preconditions) is the guard code to ensure that the same job
is not scheduled twice. If this occurs, the second schedule() attempt is
ignored (this is the specified behaviour).

During 4), another thread (or the same thread in a recursive schedule() call),
could reschedule the identical job. Step 6) would then occur twice on the same
job, causing the NPE, or much worse problems.

The patch adds a new step to re-assert the preconditions after entering the
sync block a second time:

1) enter sync block
2) assert scheduling preconditions
3) exit sync block
4) call job change listeners
5) enter sync block
5a) NEW: assert scheduling preconditions
6) change job state and add to wait queue
7) exit sync block

Why check the preconditions twice? We want to avoid calling job change
listeners and telling them the job is scheduled if in fact it will be
discarded. This reduces notifications that would cause unnecessary flicker.
With the patched code, it is still possible for a listener to get the
scheduled() notification twice for the same job. This is not great, but much
better than actually running the job twice simultaneously.

Comment 10 John Arthorne

2004-06-24 12:19:41 EDT

We need to get the fix approval process started.

Comment 11 John Arthorne

2004-06-24 14:02:27 EDT

Created attachment 12795 [details]
Improved patch

Improved patch. This patch avoids the possibility of multiple scheduling
notifications. It adds a new internal state to indicate that a job has been
scheduled but not yet added to the wait queue. Thus a second call to schedule()
can distinguish this case properly and immediately bail out.

Comment 12 John Arthorne

2004-06-24 14:05:35 EDT

Fix released. Fix reviewed by Jean-Michel, DJ, McQ, MVM, and Kevin H.

Comment 13 John Arthorne

2004-06-24 14:12:55 EDT

It is quite likely that the following bugs are duplicates of this one:

bug 57656
bug 63635
bug 65415
bug 68240