160111 – A blocking RSE job "hangs" the entire RSE system

Status:	RESOLVED FIXED

Alias:	None

Product:	Target Management
Classification:	Tools
Component:	RSE
Version:	unspecified
Hardware:	PC Windows XP

Importance:	P2 major
Target Milestone:	1.0
Assignee:	David McKnight
QA Contact:	Martin Oberhuber

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2006-10-06 21:31 EDT by Michael Scharf
Modified:	2008-08-13 13:07 EDT
CC List:	1 user

See Also:

Attachments
Thread dump of the blocking non cancelable job (34.81 KB, text/plain) 2006-10-06 21:33 EDT, Michael Scharf

Description Michael Scharf

2006-10-06 21:31:59 EDT

I don't know exactly what I did, but suddenly I have a job called "Resolve filter strings (RSE Subsystem Operation:)" that never finishes. Cancelling it did not help. Other RSE jobs (even on other subsystems) are waiting, and a wait cursor is shown on all views.

See: http://scharf.gr/eclipse/rse/problems/blocking-job/

Comment 1 Michael Scharf

2006-10-06 21:33:04 EDT

Created attachment 51592
Thread dump of the blocking non cancelable job

Comment 2 Michael Scharf

2006-10-06 21:51:24 EDT

This looks dangerous to me in SubSystem.scheduleJob:
    while (!job.hasStarted())
    {
        while (display != null && display.readAndDispatch()) {
            // Process everything on the event queue
        }
        if (!job.hasStarted()) Thread.sleep(200);
    }
            
It looks dangerous because job.hasStarted() does not seem to deal with cancellation: if the job is cancelled before it starts, nothing in this loop notices.
I also think it is dangerous to have a secondary event loop here. I would not be surprised if this is the cause of many threading and UI-blocking bugs.
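
To make the cancellation concern concrete, here is a minimal sketch in plain Java of a wait loop that gives up when the job is cancelled or a timeout expires. StartableJob, StartWaiter, isCancelled() and the timeout parameters are hypothetical stand-ins invented for illustration, not the real RSE classes, and the sketch deliberately leaves out the display.readAndDispatch() loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the RSE job; NOT the real class used by SubSystem.scheduleJob.
class StartableJob {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    void markStarted() { started.set(true); }
    void cancel() { cancelled.set(true); }
    boolean hasStarted() { return started.get(); }
    boolean isCancelled() { return cancelled.get(); }
}

class StartWaiter {
    /**
     * Polls until the job has started, was cancelled, or the timeout elapsed.
     * Returns true only if the job actually started.
     */
    static boolean waitForStart(StartableJob job, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!job.hasStarted()) {
            if (job.isCancelled()) {
                return false; // cancellation ends the wait instead of looping forever
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // bounded wait: give up after the timeout
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

Even with these exits the caller still blocks while it waits, so this only addresses the "never ends" symptom, not the objection to blocking the UI thread.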
 
SubSystem.scheduleJob is called from the main thread (the UI thread) and is essentially blocking! The idea of jobs is to run in parallel to the UI thread and *not* to block the UI thread.

We had some severe problems in our products with secondary event loops. They are *very* bad...

Now I also understand the deep stack trace of bug 160084.

Comment 3 David McKnight

2006-10-10 14:02:36 EDT

In the stack provided, the main thread calls SubSystem.scheduleJob(). That approach is a hack used by the old style of RSE queries, where deferred queries were not supported (there's a preference for this). The SystemTableTreeView (the one the monitor view uses) was still using the old approach to get at children, which is problematic. I've now added support for deferred queries to that view so that we avoid the main-thread scheduleJob problem. I know this won't address all uses of scheduleJob(), but at least it deals with the case described in the stack.

I'll leave this open so that it can be used for other problematic scheduleJob() scenarios if they're encountered.
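
As a rough illustration of the deferred-query idea (the caller returns at once and the view is updated later by a callback), here is a minimal sketch using only java.util.concurrent. DeferredQuery, fetchChildren and all parameter names are hypothetical; the actual RSE change uses Eclipse's own deferred-query support, which is not shown here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper; an analogy for a deferred query, not RSE or Eclipse API.
class DeferredQuery {
    /**
     * Runs slowQuery on the worker executor and hands the result to updateView
     * through uiExecutor. The call returns immediately, so the calling thread
     * never waits for the query.
     */
    static <T> CompletableFuture<Void> fetchChildren(
            Supplier<List<T>> slowQuery,
            Executor worker,
            Executor uiExecutor,
            Consumer<List<T>> updateView) {
        return CompletableFuture
                .supplyAsync(slowQuery, worker)           // query runs off the calling thread
                .thenAcceptAsync(updateView, uiExecutor); // result is delivered via the UI executor
    }
}
```

In an SWT application the uiExecutor argument could plausibly be display::asyncExec, so that the view update runs on the UI thread; that wiring is an assumption and is not taken from the RSE code.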

Comment 4 David McKnight

2006-10-11 15:46:36 EDT

This should be fixed now. All resolveFilterStrings methods that are called directly on the main thread should no longer be using jobs. Also, I've taken out the readAndDispatch() that was used in the DStoreStatusMonitor; in normal cases, waitForUpdate() should not be called on the main thread anyway.

SubSystem.scheduleJob still exists, although the important cases (disconnect, connect and the resolveFilterStrings methods) should be avoiding it now.

Reopen if these scenarios continue to occur with the fix.

Comment 5 Martin Oberhuber

2008-08-13 13:07:53 EDT

[target cleanup] 1.0 RC2 was the original target milestone for this bug