Bug 423205 - Eclipse UI freezes completely when debugging with TCF and the agent gets suspended
Status: NEW
Alias: None
Product: TCF
Classification: Tools
Component: Debug
Version: 1.1
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact: Eugene Tarassov CLA
URL:
Whiteboard:
Keywords:
Depends on: 440343
Blocks:
Reported: 2013-12-04 12:38 EST by Martin Oberhuber CLA
Modified: 2015-06-04 13:41 EDT
CC List: 4 users

See Also:


Description Martin Oberhuber CLA 2013-12-04 12:38:03 EST
+++ This bug was initially created as a clone of Bug #405444 +++

I'd like to follow up on the discussion from bug #405444, since we see more and more situations where we are debugging a system through a TCF agent (call it the run-mode agent) and that system gets suspended. For example:

1. Run-mode agent for launching processes, but then a JTAG debugger suspends
2. Run-mode agent inside a Linux KVM system but then the hypervisor suspends
3. Run-mode agent inside a Simulator but then the simulator suspends

The point is that we see more and more debug scenarios where _both_ a run-mode agent inside the system (for tracing, OS awareness, process launch...) as well as a stop-mode agent outside the system are desirable. The whole Eclipse UI freezing up just because the debug agent gets suspended (maybe accidentally) is generally not acceptable to our clients.

In https://bugs.eclipse.org/bugs/show_bug.cgi?id=405444#c2 I had suggested a solution on a low layer, where either a heartbeat/timeout, or an external event would cancel all outstanding commands. I could imagine this as a mix-in to all Service Proxies, which would ensure that a proper TIMEOUT or CANCELED error is sent for any outstanding request. Advantage of this approach would be that it works for any client that's too synchronous, and not just Debug. An "AgentSuspendService" could be responsible for receiving and processing suspend events in its respective domain.
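A minimal sketch of the mix-in idea, for illustration only. It uses plain JDK classes (Java 9 or later for `orTimeout`) rather than the TCF API; `OutstandingCommands`, `track` and `cancelAll` are hypothetical names. Each outstanding request is represented by a future that fails with a timeout if no reply arrives, and an external "agent is about to be suspended" event can cancel everything that is still pending.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of TCF: remembers every request that has
// been sent but not yet answered, so that all of them can be failed at once.
class OutstandingCommands {
    private final Set<CompletableFuture<String>> pending = ConcurrentHashMap.newKeySet();

    // Track one request. The returned future completes with the reply, or
    // fails with a TimeoutException after timeoutMillis (the "heartbeat").
    CompletableFuture<String> track(long timeoutMillis) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.add(reply);
        reply.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
        // Forget the request once it is answered, timed out or cancelled.
        reply.whenComplete((result, error) -> pending.remove(reply));
        return reply;
    }

    // External event (for example "the system is being suspended"):
    // cancel everything still outstanding. Returns how many were cancelled.
    int cancelAll() {
        int cancelled = 0;
        for (CompletableFuture<String> reply : pending) {
            if (reply.cancel(false)) {
                cancelled++;
            }
        }
        return cancelled;
    }

    int size() {
        return pending.size();
    }
}
```

A caller that blocks on such a future is then released with a timeout or cancellation error instead of waiting forever. Choosing the timeout value is the hard part of this approach.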

As an alternative, https://bugs.eclipse.org/bugs/show_bug.cgi?id=405444#c5 said that there are only a few synchronous interfaces in Platform/Debug, so perhaps the size of the problem is limited and the undesired lock-up of the Display Thread (or Event Dispatch Thread) can be avoided there.

Thoughts and comments are welcome.
Comment 1 Martin Oberhuber CLA 2013-12-05 17:23:50 EST
I should add that if we want to shoot for more asynchronous interfaces in Platform/Debug, now is the time to do this.

API Freeze in Platform is 7-March-2014 for Luna, so December and January are the months where API work could reasonably be accepted by Platform. I can certainly help lobby for quality patches on the Platform PMC.
Comment 2 Eugene Tarassov CLA 2013-12-05 18:27:18 EST
(In reply to Martin Oberhuber from comment #0)

> In https://bugs.eclipse.org/bugs/show_bug.cgi?id=405444#c2 I had suggested a
> solution on a low layer, where either a heartbeat/timeout, or an external
> event would cancel all outstanding commands. I could imagine this as a
> mix-in to all Service Proxies, which would ensure that a proper TIMEOUT or
> CANCELED error is sent for any outstanding request. Advantage of this
> approach would be that it works for any client that's too synchronous, and
> not just Debug.

The disadvantage of this approach is that it would break every client in situations such as a slow network.

> An "AgentSuspendService" could be responsible for receiving
> and processing suspend events in its respective domain.

Not sure what you mean. When the agent is suspended, no service can receive or process anything.

> As an alternative,
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=405444#c5 said that there are
> only few synchronous interfaces in Platform/Debug, so perhaps the size of
> the problem is limited and the undesired lock-up of the Display Thread (or
> Event Dispatch Thread) can be avoided there.

I suggest investigating the blocking Platform/Debug interfaces and deciding on a case-by-case basis. I don't believe a generic, one-size-fits-all solution exists.
Comment 3 Martin Oberhuber CLA 2013-12-06 01:17:31 EST
(In reply to Eugene Tarassov from comment #2)
> Not sure what you mean. When the agent is suspended, no service can receive
> or process anything.

The idea was that in many cases we would know upfront, before the agent is going to be suspended. In such a case we could tell the agent "please flush your event queue, mark all outstanding requests as CANCELED and don't accept any new requests".

Something similar could perhaps be done in a value-add in case the agent is already suspended.

> I suggest to investigate and decide about blocking Platform/Debug interfaces
> on case by case basis. I don't believe a generic, one-size-fits-all solution
> exists.

Can we create a list of blocking Platform/Debug interfaces?
Comment 4 Eugene Tarassov CLA 2013-12-06 12:35:02 EST
(In reply to Martin Oberhuber from comment #3)
> (In reply to Eugene Tarassov from comment #2)
> > Not sure what you mean. When the agent is suspended, no service can receive
> > or process anything.
> 
> The idea was that in many cases we would know upfront, before the agent is
> going to be suspended. In such a case we could tell the agent "please flush
> your event queue, mark all outstanding requests as CANCELED and don't accept
> any new requests".

The client can just close the communication channel, which does exactly that: it marks all outstanding requests as CANCELED and doesn't accept any new requests.
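The close semantics described here can be sketched as follows. This is a stand-in written against plain JDK classes, not the TCF channel implementation; `SketchChannel` and its methods are hypothetical, and the callback shape (reply or error) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.function.BiConsumer;

// Stand-in for a communication channel; NOT the real TCF channel class.
class SketchChannel {
    private final List<BiConsumer<String, Exception>> outstanding = new ArrayList<>();
    private boolean closed;

    // Send a command. 'done' is later called with (reply, null) on success
    // or (null, error) on failure. Replies are not modelled in this sketch.
    void send(String command, BiConsumer<String, Exception> done) {
        synchronized (this) {
            if (!closed) {
                outstanding.add(done);
                return;
            }
        }
        // Channel already closed: refuse the new request immediately.
        done.accept(null, new IllegalStateException("channel closed"));
    }

    // Closing the channel fails every outstanding request with a
    // cancellation error and makes the channel refuse new requests.
    void close() {
        List<BiConsumer<String, Exception>> toCancel;
        synchronized (this) {
            closed = true;
            toCancel = new ArrayList<>(outstanding);
            outstanding.clear();
        }
        for (BiConsumer<String, Exception> done : toCancel) {
            done.accept(null, new CancellationException("CANCELED"));
        }
    }
}
```

With this shape, a client that learns of an upcoming suspend needs no extra protocol message: closing the channel releases every waiter.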

> 
> Something similar could perhaps be done in a value-add in case the agent is
> already suspended.
> 
> > I suggest to investigate and decide about blocking Platform/Debug interfaces
> > on case by case basis. I don't believe a generic, one-size-fits-all solution
> > exists.
> 
> Can we create a list of blocking Platform/Debug interfaces ?

There are 150 synchronization points in the TCF code where a calling thread has to wait for the TCF dispatch thread because of a synchronous interface. In most cases Eclipse calls into the code from a Job, which is OK, but in some cases the calling thread is the UI thread. It is impossible to enumerate every case where the UI thread is used for such calls: Eclipse threading is not documented, and it changes from version to version. I see only one option: investigate each case of UI lock-up and fix them one at a time.
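One way to support that case-by-case investigation is to instrument the synchronization points so that waits on the UI thread are recorded and bounded. The sketch below is an assumption-laden illustration, not TCF code: `SyncPointGuard` is a hypothetical class, and the UI-thread check is passed in as a predicate (in an Eclipse workbench it could plausibly be `Display.getCurrent() != null`).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical wrapper around a synchronization point: background threads
// may block, the UI thread gets a bounded wait and its call stack is logged.
class SyncPointGuard {
    private final BooleanSupplier isUiThread;
    private final List<Throwable> uiCallers = new CopyOnWriteArrayList<>();

    SyncPointGuard(BooleanSupplier isUiThread) {
        this.isUiThread = isUiThread;
    }

    <T> T await(CompletableFuture<T> reply, long uiTimeoutMillis, T fallback) {
        if (!isUiThread.getAsBoolean()) {
            return reply.join(); // called from a Job: blocking is acceptable
        }
        // Remember who called us from the UI thread, for later analysis.
        uiCallers.add(new Throwable("synchronous wait on UI thread"));
        try {
            return reply.get(uiTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback; // agent did not answer in time: keep the UI alive
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            throw new CompletionException(e.getCause());
        }
    }

    // Stack traces of every UI-thread wait seen so far.
    List<Throwable> uiCallers() {
        return uiCallers;
    }
}
```

The recorded stack traces give a list of the UI-thread callers that were actually hit at run time, which can then be fixed one at a time. Whether a fallback value is acceptable depends on the individual interface.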