Bug 160111 - A blocking RSE job "hangs" the entire RSE system
Summary: A blocking RSE job "hangs" the entire RSE system
Status: RESOLVED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: unspecified
Hardware: PC Windows XP
Importance: P2 major
Target Milestone: 1.0
Assignee: David McKnight
QA Contact: Martin Oberhuber
Reported: 2006-10-06 21:31 EDT by Michael Scharf
Modified: 2008-08-13 13:07 EDT


Attachments
Thread dump of the blocking non cancelable job (34.81 KB, text/plain)
2006-10-06 21:33 EDT, Michael Scharf
Description Michael Scharf 2006-10-06 21:31:59 EDT
I don't know exactly what I did, but suddenly I have a job called "Resolve filter strings (RSE Subsystem Operation:)" that does not want to end. Cancelling it did not help. Other RSE jobs (even on other subsystems) are waiting. A wait cursor is shown on all views.

See: http://scharf.gr/eclipse/rse/problems/blocking-job/
Comment 1 Michael Scharf 2006-10-06 21:33:04 EDT
Created attachment 51592
Thread dump of the blocking non cancelable job
Comment 2 Michael Scharf 2006-10-06 21:51:24 EDT
This looks dangerous to me in SubSystem.scheduleJob:
            while (!job.hasStarted())
            {
                while (display != null && display.readAndDispatch()) {
                    // Process everything on event queue
                }
                if (!job.hasStarted()) Thread.sleep(200);
            }
            
Because job.hasStarted() does not seem to deal with cancellation...
I also think it is dangerous to have a secondary event loop here. I would not be surprised if this is the cause of many threading and UI-blocking bugs.
 
SubSystem.scheduleJob is called from the main thread (the UI thread) and is essentially blocking! The idea of jobs is to run in parallel to the UI thread and *not* to block the UI thread.
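
To illustrate the cancellation concern with the loop quoted above, here is a minimal sketch of a start-wait that also exits on cancellation or after a timeout. FakeJob, isCancelled() and waitForStart() are hypothetical names made up for this example; they are not the RSE classes, and the real job type may expose cancellation differently. The sketch still blocks its caller, so it only addresses the "never ends" symptom, not the problem of blocking the UI thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StartWaitSketch {

    /** Hypothetical stand-in for a job; not the real RSE job class. */
    public static final class FakeJob {
        private final AtomicBoolean started = new AtomicBoolean(false);
        private final AtomicBoolean cancelled = new AtomicBoolean(false);

        public boolean hasStarted() { return started.get(); }
        public boolean isCancelled() { return cancelled.get(); }
        public void start() { started.set(true); }
        public void cancel() { cancelled.set(true); }
    }

    /**
     * Waits until the job has started. Unlike a loop that checks only
     * hasStarted(), this one gives up when the job is cancelled, when the
     * timeout expires, or when the waiting thread is interrupted.
     *
     * @return true only if the job actually started
     */
    public static boolean waitForStart(FakeJob job, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!job.hasStarted()) {
            if (job.isCancelled() || System.nanoTime() - deadline >= 0) {
                return false; // cancelled or timed out: stop waiting
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

With this shape, a job that is cancelled before it ever starts makes waitForStart() return false instead of spinning forever.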

We had some severe problems in our products with secondary event loops. They are *very* bad...

Now I also understand the deep stack trace of bug 160084.
Comment 3 David McKnight 2006-10-10 14:02:36 EDT
In the stack provided, the main thread uses SubSystem.scheduleJob().  That approach is a hack used by the old style of RSE queries, where deferred queries were not supported (there's a preference for this).  The SystemTableTreeView (the one the monitor view uses) was still using the old approach to get at children, which is problematic.  I've added support for deferred queries to that view now so that we avoid the main thread scheduleJob problem.  I know that this won't address all cases of scheduleJob() but at least it will deal with the case described in the stack.
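
As a rough illustration of the general idea behind a deferred query (not the RSE implementation), the sketch below runs a query on a worker thread and hands the result to a separate "UI" executor, so the caller returns immediately instead of waiting. DeferredQuerySketch and queryDeferred() are hypothetical names, and a plain Executor stands in for the UI thread; in an SWT application that role would typically be played by posting to the display thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class DeferredQuerySketch {

    /**
     * Runs the (possibly slow) query on the worker executor and delivers the
     * result to onResult on the ui executor. The calling thread is never
     * blocked; the returned future completes after onResult has run.
     */
    public static <T> CompletableFuture<Void> queryDeferred(
            Supplier<T> query,
            Executor worker,
            Executor ui,
            Consumer<T> onResult) {
        return CompletableFuture
                .supplyAsync(query, worker)     // slow work off the caller's thread
                .thenAcceptAsync(onResult, ui); // result handed back to the "UI" executor
    }
}
```

A view would call queryDeferred() when it needs children, show a placeholder right away, and fill in the real children when onResult runs.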

I'll leave this open so that it can be used for other problematic scheduleJob() scenarios if they're encountered.
Comment 4 David McKnight 2006-10-11 15:46:36 EDT
This should be fixed now.  All resolveFilterStrings methods that are called directly on the main thread should no longer be using jobs.  Also, I've taken out the readAndDispatch() that was used in the DStoreStatusMonitor; in normal cases, waitForUpdate() should not be called on the main thread anyway.

SubSystem.scheduleJob still exists although the important cases should be avoiding it now (disconnect, connect and the resolveFilterStrings methods).

Reopen if these scenarios continue to occur with the fix.
Comment 5 Martin Oberhuber 2008-08-13 13:07:53 EDT
[target cleanup] 1.0 RC2 was the original target milestone for this bug