Bug 312817 - [Launch] Stack frames don't show up in Debug View for a DSF-GDB debug session
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-debug-dsf-gdb
Version: 7.0
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox
QA Contact: Jonah Graham
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-13 12:36 EDT by Navid Mehregani
Modified: 2020-09-04 15:24 EDT
CC: 6 users

See Also:


Attachments
Debug view (409.97 KB, image/bmp), 2010-05-13 12:36 EDT, Navid Mehregani
Error Snapshot (24.35 KB, image/jpeg), 2010-05-13 12:37 EDT, Navid Mehregani
gdb traces (14.50 KB, text/plain), 2010-05-14 07:44 EDT, Axel Mueller
Patch to re-add flushing of cache of list of threads upon suspend in view model (see bug 310992). (1.38 KB, patch), 2010-05-18 01:17 EDT, Pawel Piech
Prototype of fix (3.13 KB, patch), 2010-05-18 06:27 EDT, Marc Khouzam (iplog-)
Fix using service caching (3.36 KB, patch), 2010-05-18 11:57 EDT, Marc Khouzam (iplog-)
Avoid caching failed updates (partial fix) (1.85 KB, patch), 2010-05-19 04:47 EDT, Anton Leherbauer (iplog-)

Description Navid Mehregani 2010-05-13 12:36:04 EDT
Created attachment 168412
Debug view

Reproduced on M7 build.

This is an intermittent problem.
When a DSF-GDB debug session is started, in some cases the stack frames do not appear in the Debug view (see attached snapshot).  Clicking the Refresh button usually resolves the problem.  If I terminate the session without refreshing, the workbench hangs.  However, this could be a separate issue (see bug 312813).

I also got the following error message when I started a session.  This only happened once:
"An internal error occurred during: "Launching Hello_GDB.exe"
org.eclipse.cdt.dsf.gdb.launching.GDBDebugger cannot be cast to org.eclipse.cdt.debug.core.ICDebugger.
 
Stack trace:

java.lang.ClassCastException: org.eclipse.cdt.dsf.gdb.launching.GDBDebugger cannot be cast to org.eclipse.cdt.debug.core.ICDebugger
            at org.eclipse.cdt.debug.internal.core.DebugConfiguration.createDebugger(DebugConfiguration.java:61)
            at org.eclipse.cdt.launch.internal.LocalCDILaunchDelegate.createCDISession(LocalCDILaunchDelegate.java:466)
            at org.eclipse.cdt.launch.internal.LocalCDILaunchDelegate.launchLocalDebugSession(LocalCDILaunchDelegate.java:145)
            at org.eclipse.cdt.launch.internal.LocalCDILaunchDelegate.launchDebugger(LocalCDILaunchDelegate.java:112)
            at org.eclipse.cdt.launch.internal.LocalCDILaunchDelegate.launch(LocalCDILaunchDelegate.java:72)
            at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:853)
            at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:702)
            at org.eclipse.debug.internal.ui.DebugUIPlugin.buildAndLaunch(DebugUIPlugin.java:923)
            at org.eclipse.debug.internal.ui.DebugUIPlugin$8.run(DebugUIPlugin.java:1126)
            at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Comment 1 Navid Mehregani 2010-05-13 12:37:31 EDT
Created attachment 168413
Error Snapshot
Comment 2 Navid Mehregani 2010-05-13 12:38:00 EDT
Related discussion on cdt-dev: http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg18803.html
Comment 3 Axel Mueller 2010-05-14 07:44:52 EDT
Created attachment 168526
gdb traces

I encounter the same problem. Most of the time when I start a DSF-GDB session, the stack and the Variables view are empty.
gdb traces are attached.

After pressing F8 the debug session continues normally.

I am using M7 with CVS updates on Linux x86_64, GDB 7.0.
Comment 4 Pawel Piech 2010-05-14 13:25:04 EDT
Could this bug be related to this issue reported by Dobrin?
http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg18740.html
Comment 5 Marc Khouzam 2010-05-14 22:28:38 EDT
(In reply to comment #0)

About the exception:

> I also got the following error message when I started a session.  This only
> happened once:
> "An internal error occurred during: "Launching Hello_GDB.exe"
> org.eclipse.cdt.dsf.gdb.launching.GDBDebugger cannot be cast to
> org.eclipse.cdt.debug.core.ICDebugger.
> 
> Stack trace:
> 
> java.lang.ClassCastException: org.eclipse.cdt.dsf.gdb.launching.GDBDebugger
> cannot be cast to org.eclipse.cdt.debug.core.ICDebugger
[...]

First, let me mention that the code that may have triggered this bug changed drastically after M7 as part of bug 281970.  However, after thinking about it for a long time, I came up with a theory that may explain it, and that could still happen with the new code.

I have opened Bug 312997 about that issue.
Comment 6 Axel Mueller 2010-05-16 08:15:56 EDT
(In reply to comment #4)
> Could this bug be related to this issue reported by Dobrin?
> http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg18740.html
No. The workaround suggested there (replace VMViewerUpdate.java with an old revision) does not help.
To be on the safe side I just deleted my workspace completely and created a new C++ Hello World project using the project wizard. Debugging the application works flawlessly the first 3 or 4 times (hitting debug, stop, debug, stop, ...). Then debugging stops working correctly (as mentioned in comment #3).
Comment 7 Marc Khouzam 2010-05-17 22:15:09 EDT
I made a bit of progress here and maybe someone with better knowledge of the view model can make more progress.

I ran on Windows 7 with MinGW 6.8 (although we know it happens on Linux and with GDB 7.0 also).

I was able to trigger the bug 100% of the time by doing:

1- add a sleep of 2 seconds at the beginning of your program
2- do _not_ stop at main
3- add a breakpoint after the sleep

The problem seems to be that AbstractThreadVMNode#updateElementsInSessionThread() is called too early, and because the target is unavailable, it somehow cannot recover.

Here are some GDB traces of the failed case, to which I added comments:

153,764 [MI]  7-exec-run
153,786 [MI]  7^running
153,786 [MI]  (gdb) 
153,805 [CHE][N/A] -thread-list-ids   <==== This one seems acceptable, as I see it in the success case too
154,004 [MI]  ~"[New thread 6876.0x1774]\n"
154,094 [CHE][N/A] -thread-list-ids   <=== Why is AbstractThreadVMNode triggering again?
154,309 [CHE][N/A] -thread-list-ids   <=== and again?

== notice the two-second delay that allows us to see the two failed calls above ===

156,089 [MI]  7*stopped,reason="breakpoint-hit",bkptno="1",thread-id="1",frame={addr="0x004014bb",fu\
nc="main",args=[],file="..\\src\\TestDSFGDB.cpp",fullname="C:/Users/Marc/runtime-DSFGDB/TestDSFGDB/.\
settings/..\\src\\TestDSFGDB.cpp",line="107"}
156,090 [MI]  (gdb) 

==== Now that we got a *stopped event, it makes sense that AbstractThreadVMNode would request the threads  =========
156,319 [MI]  8-thread-list-ids
156,324 [MI]  8^done,thread-ids={thread-id="1"},number-of-threads="1"
156,324 [MI]  (gdb) 

=== Apparently, the two failures preceding this call were too much to recover from ===


Here is a trace that works:

840,156 [MI]  7-exec-run
840,165 [MI]  7^running
840,165 [MI]  (gdb) 
840,277 [CHE][N/A] -thread-list-ids <=== One failed call

840,346 [MI]  ~"[New thread 7132.0x197c]\n"
840,431 [MI]  7*stopped,thread-id="1",frame={addr="0x0040147c",func="main",args=[],file="..\\src\\Te\
stDSFGDB.cpp",fullname="C:/Users/Marc/runtime-DSFGDB/TestDSFGDB/.settings/..\\src\\TestDSFGDB.cpp",l\
ine="90"}
840,432 [MI]  (gdb) 
840,842 [MI]  8-thread-list-ids   <=== second call works
840,848 [MI]  8^done,thread-ids={thread-id="1"},number-of-threads="1"
840,848 [MI]  (gdb) 
840,851 [CHE][SNT] -thread-list-ids  <=== third call works.
840,868 [MI]  9-thread-select 1

We can see that if the second call to -thread-list-ids (from AbstractThreadVMNode#updateElementsInSessionThread()) works, everything is fine, but if it fails, we are dead in the water.
Comment 8 Marc Khouzam 2010-05-17 22:32:30 EDT
I'm sure I'm missing something, but when I revert the fix for Bug 291628, this bug kind of goes away.  I've falsely accused Bug 291628 before, so please take this comment with a grain of salt ;-)

With my 2-second sleep and not stopping on main, the problem happens all the time.  When removing the fix of Bug 291628, after 2 seconds the thread list can be expanded in the debug view, although it does not happen automatically.

The automatic expansion (or lack thereof) may be a separate issue, as mentioned in comment #4.
Comment 9 Pawel Piech 2010-05-18 01:14:49 EDT
(In reply to comment #8)
> I'm sure I'm missing something, but when I revert the fix for Bug 291628, this
> bug kind of goes away.  I've falsely accused Bug 291628 before, so please take
> this comment with a grain of salt ;-)
> 
> With my 2 seconds sleep and not stopping on main, the problem happens all the
> time.  When removing the fix of Bug 291628, after 2 seconds, the thread list
> can be expanded in the debug view, although it does not happen automatically.
> 
> The automatic expansion (or lack of) may be a separate issue as mentioned in
> comment #4.

Thank you for providing a reliable way to reproduce.  I see the following trace in the console when I enable delta tracing:

------------------------------------------------------
---> Updating content in DV after install.  Note that the 
...

RECEIVED DELTA: Model Delta Start
	Element: org.eclipse.debug.internal.core.LaunchManager@89c0a9
		Flags: NO_CHANGE
		Index: 0 Child Count: 1
		Element: org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11
			Flags: NO_CHANGE
			Index: 0 Child Count: -1
			Element: gdb[4].proc[].threadGroup[]
				Flags: CONTENT | 
				Index: 0 Child Count: -1
Model Delta End

---> Viewer requests whether there are any threads.  And the model responds: no.
updateHasChildren(gdb[4].proc[].threadGroup[]
setHasChildren(gdb[4].proc[].threadGroup[] >> false

replace(org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11, modelIndex: 1 viewIndex: 1, org.eclipse.cdt.dsf.gdb.launching.GDBProcess@6c13ee)
updateHasChildren(org.eclipse.cdt.dsf.gdb.launching.GDBProcess@6c13ee
replace(org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11, modelIndex: 2 viewIndex: 2, org.eclipse.debug.core.model.RuntimeProcess@f3c353)
updateHasChildren(org.eclipse.debug.core.model.RuntimeProcess@f3c353
setHasChildren(org.eclipse.cdt.dsf.gdb.launching.GDBProcess@6c13ee >> false
setHasChildren(org.eclipse.debug.core.model.RuntimeProcess@f3c353 >> false
RECEIVED DELTA: Model Delta Start
	Element: org.eclipse.debug.internal.core.LaunchManager@89c0a9
		Flags: NO_CHANGE
		Index: 0 Child Count: 1
		Element: org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11
			Flags: EXPAND | SELECT | 
			Index: 0 Child Count: 3
			Element: gdb[4].proc[].threadGroup[]
				Flags: EXPAND | SELECT | 
				Index: 0 Child Count: 0
Model Delta End

[expand] setChildCount(org.eclipse.debug.internal.core.LaunchManager@89c0a9, (model) 1 (view) 1
[expand] replace(org.eclipse.debug.internal.core.LaunchManager@89c0a9, (model) 0 (view) 0, org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11
[expand] setChildCount(org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11, (model) 3 (view) 3
[select] setChildCount(org.eclipse.debug.internal.core.LaunchManager@89c0a9, (model) 1 (view) 1
[select] replace(org.eclipse.debug.internal.core.LaunchManager@89c0a9, (model) 0 (view) 0, org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11
[expand] replace(org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11, (model) 0 (view) 0, gdb[4].proc[].threadGroup[]
[select] setChildCount(org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11, (model) 3 (view) 3
[select] replace(org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11, (model) 0 (view) 0, gdb[4].proc[].threadGroup[]

---> The following is the delta for the suspended event.  It's bug 310992 that removed the flushing of the list of threads upon a container-suspended event.  Because of it, calculating this delta takes the list of threads from the cache, which is empty.

RECEIVED DELTA: Model Delta Start
	Element: org.eclipse.debug.internal.core.LaunchManager@89c0a9
		Flags: NO_CHANGE
		Index: 0 Child Count: 1
		Element: org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11
			Flags: NO_CHANGE
			Index: 0 Child Count: 3
			Element: gdb[4].proc[].threadGroup[]
				Flags: NO_CHANGE
				Index: 0 Child Count: 0
Model Delta End

---> This delta is in response to the FullStackRefreshEvent; it clears the cache for the list of threads, but there is no CONTENT event for the container.  Even if there were, the select and expand delta was already lost above.
RECEIVED DELTA: Model Delta Start
	Element: org.eclipse.debug.internal.core.LaunchManager@89c0a9
		Flags: NO_CHANGE
		Index: 0 Child Count: 1
		Element: org.eclipse.cdt.dsf.gdb.launching.GdbLaunch@c03c11
			Flags: NO_CHANGE
			Index: 0 Child Count: -1
			Element: gdb[4].proc[].threadGroup[]
				Flags: NO_CHANGE
				Index: 0 Child Count: -1
				Element: (gdb[4].proc[].threadGroup[],gdb[4].proc[].OSthread[1]).thread[1]
					Flags: CONTENT | 
					Index: -1 Child Count: -1
Model Delta End


-----------------------------------------------

We could restore the automatic flushing of the list of threads upon a suspended event (patch to follow), but this could be inefficient.  The alternative would be to have the run control service generate a threads-changed event.
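
In VM terms, the restored flushing amounts to roughly the following (a minimal sketch with invented names, not the actual view model classes; the real change is in the attached patch):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the view model's cache of container children.
class VmThreadCacheSketch {
    private final Map<String, List<String>> childrenByContainer = new HashMap<>();

    // null means there is no cached entry and the model must be re-queried.
    List<String> getCachedChildren(String containerId) {
        return childrenByContainer.get(containerId);
    }

    void putChildren(String containerId, List<String> children) {
        childrenByContainer.put(containerId, children);
    }

    // What bug 310992 removed and this patch re-adds: invalidate the cached
    // list of threads when the container suspends, so the EXPAND/SELECT delta
    // is computed from a fresh -thread-list-ids instead of a stale empty list.
    void handleContainerSuspended(String containerId) {
        childrenByContainer.remove(containerId);
    }
}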
Comment 10 Pawel Piech 2010-05-18 01:17:33 EDT
Created attachment 168858
Patch to re-add flushing of cache of list of threads upon suspend in view model (see bug 310992).
Comment 11 Anton Leherbauer 2010-05-18 04:10:14 EDT
(In reply to comment #9)
> We could restore the automatic flushing of the list of threads upon suspended
> event (patch to follow), but this could be inefficient.  The alternative would
> be to have the run control service generate a threads changed event.

I believe the correct fix is to trigger an IStartedDMEvent when a thread comes into play that was not reported before.
Comment 12 Marc Khouzam 2010-05-18 06:01:01 EDT
(In reply to comment #11)
> I believe the correct fix is to trigger a IStartedDMEvent when a thread comes
> into play which was not reported before.

We send this event when the thread is reported (see traces below), which is while the target is unavailable. 

Maybe the service should be able to return the list of threads even if the debugger is unavailable.  We can do this since we get the [New thread] event anyway.  We would have to keep an internal cache of threads in the service, exactly like we had to do for Bug 303503 for the list of processes.  I'll give it a try (a rough sketch follows the traces below).


456,600 [MI]  7-exec-run
456,604 [MI]  7^running
456,604 [MI]  (gdb) 
456,611 [CHE][N/A] -thread-list-ids
456,892 [MI]  ~"[New thread 7068.0x11f0]\n"

 ===== Sending IStartedDMEvent for thread (gdb[4].proc[].threadGroup[],gdb[4].proc[].OSthread[1]).thread[1] ========


456,966 [MI]  &"Error: dll starting at 0x772c0000 not found.\n"
456,968 [MI]  &"Error: dll starting at 0x76010000 not found.\n"
456,969 [MI]  &"Error: dll starting at 0x772c0000 not found.\n"
456,971 [MI]  &"Error: dll starting at 0x771c0000 not found.\n"
456,988 [CHE][N/A] -thread-list-ids
456,989 [CHE][N/A] -thread-list-ids
457,171 [CHE][N/A] -thread-list-ids
459,016 [MI]  7*stopped,reason="breakpoint-hit",bkptno="1",thread-id="1",frame={addr="0x004014bb",fu\
nc="main",args=[],file="..\\src\\TestDSFGDB.cpp",fullname="C:/Users/Marc/runtime-DSFGDB/TestDSFGDB/.\
settings/..\\src\\TestDSFGDB.cpp",line="107"}
459,017 [MI]  (gdb) 
459,335 [MI]  8-thread-list-ids
459,340 [MI]  8^done,thread-ids={thread-id="1"},number-of-threads="1"
459,340 [MI]  (gdb)
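
A rough sketch of such a service-side cache (all names invented for illustration; this is not the actual DSF-GDB service API):

import java.util.LinkedHashSet;
import java.util.Set;

// Keeps the set of known thread ids up to date from "[New thread ...]"
// notifications, so the service can answer a thread-list request even while
// the CommandCache would reject -thread-list-ids as target-unavailable.
class ThreadListCacheSketch {
    private final Set<Integer> knownThreadIds = new LinkedHashSet<>();

    // Called when GDB reports a new thread (an IStartedDMEvent in DSF terms).
    synchronized void threadStarted(int threadId) {
        knownThreadIds.add(threadId);
    }

    // Called when a thread exits (an IExitedDMEvent).
    synchronized void threadExited(int threadId) {
        knownThreadIds.remove(threadId);
    }

    // Called when a real -thread-list-ids result arrives: trust GDB and
    // replace the cached set wholesale.
    synchronized void refreshFrom(Set<Integer> idsFromGdb) {
        knownThreadIds.clear();
        knownThreadIds.addAll(idsFromGdb);
    }

    // Answer the thread list without talking to GDB.
    synchronized Integer[] currentThreadIds() {
        return knownThreadIds.toArray(new Integer[0]);
    }
}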
Comment 13 Marc Khouzam 2010-05-18 06:27:27 EDT
Created attachment 168893
Prototype of fix

This patch seems to fix things.  I still have to:

- update getExecutionData()
- clear and fill the new cache when we get a proper answer from GDB
- check if we need this fix in GDBProcesses_7_0

The kids are waking up... gotta go :-)
Comment 14 Marc Khouzam 2010-05-18 11:57:38 EDT
Created attachment 168960
Fix using service caching

This patch fixes the biggest symptoms of the problem. There are still two problems:

1- The container still shows as Running (the icon in the Debug view).  I'm guessing the view cache still thinks the container is running; but since there was an ISuspendedDMEvent, shouldn't the cache know better?

2- On my Windows box (slower), the running thread that now displays is not always expanded or selected; I've had to open the container by hand sometimes.  On my Linux box (faster) I don't see this.

Note that the same solution should also be applied for GDB >= 7.0, but before getting to that I would like to understand whether this is the right solution.

I don't understand why the view cache is caching the fact that a request to the service failed.  When the view asks for the thread list and the service returns an Error, how come we need a new IStartedDMEvent to clear the cache?

I'm worried that this could happen to other situations where the backend is not available.  When GDB is running (all-stop mode), the majority of view requests will fail. If the cache stores all these failures, couldn't we end up with problems for other requests?
Comment 15 Pawel Piech 2010-05-18 12:40:27 EDT
(In reply to comment #14)
> I'm worried that this could happen to other situations where the backend is not
> available.  When GDB is running (all-stop mode), the majority of view requests
> will fail. If the cache stores all these failures, couldn't we end up with
> problems for other requests?
Interesting.  I didn't realize that the request to GDB was failing.  

The reason why error results are not cached by the VM is that the view itself doesn't handle errors from the model very well.  The view changes to an error pane (a large gray rectangle with the text of the error instead of the tree).  To avoid this, the VM simply returns no data in response to the error from the model, and the cache saves that empty set of data.  However, this logic fails in this case, as the empty data set from the model should not be cached... I think.

The counter-point is that if GDB does fail certain requests for data while running, then the view cache should be cleared more aggressively upon a suspended event.  This will result in a loss of optimization for GDB, but that's just the reality of using GDB.  OTOH, EDC and TI are reporting this problem as well; are those also due to caching failed requests for data?  BTW, we could accomplish this by having a customized version of the cache policy.

> This patch fixes the biggest symptoms of the problem. There are two problems
> stills:
> 
> 1- The container still shows as Running (the icon of the DV).  I'm guessing the
> view cache is still thinking the container is running; but since there was an
> ISuspendedDMEvent, shouldn't the cache know better?
Good point.  Yes, the VM cache should clear the state of the container upon suspend and the list of children upon FullStackRefreshEvent.


> 2- On my windows box (slower), the running thread that now displays is not
> always expanded or selected; I've had to open the container by hand sometimes. 
> On my Linux (faster) I don't see this.
Could you attach a trace of this failure?  Enable:
org.eclipse.debug.ui/debug=true
org.eclipse.debug.ui/debug/viewers/delta=true
org.eclipse.debug.ui/debug/viewers/contentProvider=true
org.eclipse.debug.ui/debug/viewers/viewId=org.eclipse.debug.ui.DebugView
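
(Assuming the usual Eclipse tracing setup, these entries go into an .options file and take effect when the workbench is launched with -debug; in a PDE launch configuration they can be enabled on the Tracing tab.)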
Comment 16 Marc Khouzam 2010-05-18 14:29:44 EDT
(In reply to comment #15)
> (In reply to comment #14)
> > I'm worried that this could happen to other situations where the backend is not
> > available.  When GDB is running (all-stop mode), the majority of view requests
> > will fail. If the cache stores all these failures, couldn't we end up with
> > problems for other requests?
> Interesting.  I didn't realize that the request to GDB was failing. 

Sorry, the requests don't fail in GDB; they get rejected by the CommandCache because the target is unavailable.
 
> The reason why error results are not cached by the VM is because the the view
> itself doesn't handle errors from the model very well.  The view changes to an
> error pane (large gray rectangle instead of the tree with the text of the
> error.  To avoid this, the VM simply returns no data in response to the error
> from the model, and the cache saves that empty set of data.  However, this
> logic fails in this case, as the empty data set from the model should not be
> cached... I think.

Ok, that makes sense.
Can we make it so that the cache only stores empty data resulting from a success?  Or, more appropriately, so that empty data is not cached if it was triggered by an error?

Maybe that is a better fix than what I posted?

> Good point.  Yes the VM cache should clear the state of the container upon
> suspend and the list of children upon FullStackRefershEvent.

In which classes should this happen?  I'll have a look.

> Could you attach a trace of this failure?  Enable:

I'll give it a try from home where I have my Windows box.
Comment 17 Axel Mueller 2010-05-18 15:59:34 EDT
(In reply to comment #14)
> Created an attachment (id=168960) [details]
> Fix using service caching
That fixes my problem (see comments #3 and #6).
Comment 18 Anton Leherbauer 2010-05-19 04:47:33 EDT
Created attachment 169067
Avoid caching failed updates (partial fix)

I have debugged the problem, and the root cause is indeed that failed updates might get cached.  The reason in this specific case is that in AbstractThreadVMNode.updateElementsInSessionThread() the !isSuccess() branch does not set the status in the update.  That's why the request monitor from the AbstractCachingVMProvider thinks this was a successful update and caches the empty list.
The attached patch is the minimal fix that works for me.  The other change, in DefaultVMModelProxyStrategy, was necessary because I started to run into NPEs.

This seems to be a general problem of how updates are handled in the failure case.  We should check all code locations with similar patterns.
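
To illustrate the pattern (a self-contained sketch; all types are invented stand-ins for the real IViewerUpdate/DataRequestMonitor/IStatus classes):

// Minimal model of how a caching request monitor decides whether to cache.
final class Status {
    static final Status OK = new Status(true);
    final boolean ok;
    Status(boolean ok) { this.ok = ok; }
}

interface ViewerUpdate {
    void setStatus(Status s);
    void done();
}

class CachingMonitorSketch implements ViewerUpdate {
    private Status status = Status.OK;
    public void setStatus(Status s) { status = s; }
    // The caching layer inspects the status at completion time.
    public void done() {
        if (status.ok) {
            System.out.println("no error status -> the (empty) result gets cached");
        } else {
            System.out.println("error status set -> the failed update is not cached");
        }
    }
}

class UpdateCompletionSketch {
    // Stand-in for the end of AbstractThreadVMNode#updateElementsInSessionThread().
    static void complete(ViewerUpdate update, boolean backendSucceeded, Status error) {
        if (!backendSucceeded) {
            update.setStatus(error); // the line missing in the buggy !isSuccess() branch
        }
        update.done();
    }

    public static void main(String[] args) {
        complete(new CachingMonitorSketch(), false, new Status(false));
    }
}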
Comment 19 Marc Khouzam 2010-05-19 06:21:17 EDT
(In reply to comment #18)
> Created an attachment (id=169067) [details]
> Avoid caching failed updates (partial fix)

Thanks Toni!
This fix works for me.

> I have debugged the problem and the root cause is indeed that failed updates
> might get cached.  The reason in this specific case is that in
> AbstractThreadVMNode.updateElementsInSessionThread() the !isSuccess() branch
> does not set the status in the update.  That's why the request monitor from the
> AbstractCachingVMProvider thinks this was a successful update and caches the
> empty list.

But does that indicate to the view that there is an error?  I'm just concerned about what Pawel put in comment #15:
"The reason why error results are not cached by the VM is because the the view
itself doesn't handle errors from the model very well"

> Attached patch is the minimal fix which works for me.  The other change in
> DefaultVMModelProxyStrategy was necessary as I started to run into NPEs.

I think this is a better solution than the service fix that I posted.

I'm concerned about the DefaultVMModelProxyStrategy change.  I don't know that class, so the concern is probably unfounded.  Originally, calling crm.done() meant that we would change error scenarios into successes (which may or may not be wanted here); with the change to calling super.handleCompleted(), if any single error is encountered, the entire crm will be marked as failed.  That seems like a big change.  Is it an OK change?

> This seems to be a general problem of how updates are handled in the failure
> case.  We should check all code locations with similar patterns.

I see that you opened bug 313492 about this.
Comment 20 Marc Khouzam 2010-05-19 06:23:46 EDT
(In reply to comment #14)

With Toni's patch:

> 1- The container still shows as Running (the icon of the DV).  I'm guessing the
> view cache is still thinking the container is running; but since there was an
> ISuspendedDMEvent, shouldn't the cache know better?

This problem is still there.

> 2- On my windows box (slower), the running thread that now displays is not
> always expanded or selected; I've had to open the container by hand sometimes. 
> On my Linux (faster) I don't see this.

This is gone since the list of threads is only created when the breakpoint is hit.
Comment 21 Anton Leherbauer 2010-05-19 07:50:46 EDT
(In reply to comment #19)
> I'm concerned about the DefaultVMModelProxyStrategy.  I don't know that class,
> so the concern is probably unfounded.  Originally, calling crm.done() meant
> that we would change error scenarios into successes (which may or may not be
> wanted here); with the change to calling super.handleCompleted, if any single
> error is encountered, the entire crm will be marked as failed.  That seems like
> a big change.  Is that an ok change?

Good catch.  I am also not sure what the consequences might be.  We should use a more conservative fix.

(In reply to comment #20)
> (In reply to comment #14)
> 
> With Toni's patch:
> 
> > 1- The container still shows as Running (the icon of the DV).  I'm guessing the
> > view cache is still thinking the container is running; but since there was an
> > ISuspendedDMEvent, shouldn't the cache know better?
> 
> This problem is still there.

We need to flush the container properties to solve this (Pawel mentioned this already).  That requires a new cache update flag.
Comment 22 Pawel Piech 2010-05-19 13:34:46 EDT
I think I'm in the minority here, but I don't think there's anything wrong with caching the result of an error if the error is due to INVALID_STATE.  That's because once the state changes, the model should send an event, the view cache should get cleared, and then valid data should be fetched.

What gets us in trouble is the optimization that we've put in for stepping.  This optimization tries to delay the refreshing of everything but the top stack frame, and then refresh everything else that needs to be updated after a brief delay (unless another step operation is started before then).  Unfortunately, if we don't flush information about the list of children of threads and containers, we are not always able to calculate the correct delta to expand the thread stack and select the top stack frame.  IMO at this point in the release cycle, we should just back out bug 310992 and sort out how to fix the stepping optimization after 7.0 (in 7.1 perhaps).
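
For clarity, the shape of that stepping optimization is roughly this (an illustrative sketch with invented names and an assumed delay value, not the actual implementation):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// After a step completes, refresh only the top stack frame right away and
// schedule a full refresh for later; a new step cancels the pending refresh.
class SteppingRefreshSketch {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pendingFullRefresh;

    synchronized void onStepCompleted(Runnable refreshTopFrame,
                                      Runnable refreshEverythingElse) {
        refreshTopFrame.run(); // keep stepping responsive
        if (pendingFullRefresh != null) {
            pendingFullRefresh.cancel(false); // superseded by the new step
        }
        // 500 ms is an assumption for illustration, not the real delay.
        pendingFullRefresh = executor.schedule(
                refreshEverythingElse, 500, TimeUnit.MILLISECONDS);
    }
}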
Comment 23 Marc Khouzam 2010-05-19 14:59:22 EDT
(In reply to comment #22)
> I think I'm the minority view here, but I don't think there's anything wrong
> with caching results of an error if the error is due to INVALID_STATE.  That's
> because once the state changes, the model should send an event, view cache
> should get cleared and then valid data should be fetched. 
> 
> What gets us in trouble is the optimization that we've put in for stepping. 
> This optimization tries to delay the refreshing of everything but the top stack
> frame, and then refresh everything else that needs to be updated after a brief
> delay (unless another step operation is started before then).  Unfortunately,
> if we don't flush information about the list of children of threads and
> containers, we are not always able to calculate the correct delta to expand the
> thread stack and select the top stack frame.  IMO at this point in the release
> cycle, we should just back out bug 310992 and sort out how to fix the stepping
> optimization after 7.0 (in 7.1 perhaps).

I can't speak for the caching issue, as I don't understand the details.

I do agree that we should take the proper time for such a fix, so bug 313492 should wait for the next CDT release.

So, +1 to back out bug 310992 because we need to fix this very soon.
Comment 24 Marc Khouzam 2010-05-19 15:38:20 EDT
(In reply to comment #22)
> we should just back out bug 310992 and sort out how to fix the stepping
> optimization after 7.0 (in 7.1 perhaps).

I've tested with the patch "Patch to re-add flushing of cache of list of threads upon suspend in view model (see bug 310992)", and everything is back to normal.
Comment 25 Anton Leherbauer 2010-05-20 02:50:29 EDT
Ok, I did not know that it was intentional to cache updates from failed requests.
I do agree that the change for 310992 needs to be backed out.
Comment 26 Anton Leherbauer 2010-05-20 03:21:03 EDT
I backed out the changes, but there is still the same issue when I set the Threads Update Mode to Manual.
Comment 27 Marc Khouzam 2010-05-20 08:44:55 EDT
(In reply to comment #26)
> I backed out the changes, but there is still the same issue when I set the
> Threads Update Mode to Manual.

Thanks Toni.  Things work well again.
I do see the problem in manual Threads Update mode.  I think it is a rare case (it has been there for a long time), and we have the Refresh button to clean things up.  I think we should fix it in 7.0.1 if possible.
Comment 29 Anton Leherbauer 2010-08-24 09:01:00 EDT
(In reply to comment #27)
> I do see the problem in manual Threads Update mode.  I think it is a rare case
> (it has been there for a long time) and we have the refresh button to clean
> things up.  I think we should fix it in 7.0.1 if possible.

I just retried this with Galileo and the problem in manual update mode does not occur.  This might be caused by some subtle changes to the delta handling in the platform.