Bug 464388 - Commit bb173cd6d46277f844c0e57934c00ce6730aee4b breaks SkipProlog logic.
Summary: Commit bb173cd6d46277f844c0e57934c00ce6730aee4b breaks SkipProlog logic.
Status: RESOLVED FIXED
Alias: None
Product: TCF
Classification: Tools
Component: Agent (show other bugs)
Version: 1.3   Edit
Hardware: PC Linux
: P3 major (vote)
Target Milestone: 1.3   Edit
Assignee: Project Inbox CLA
QA Contact: Eugene Tarassov CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-10 08:47 EDT by Frederic Leger CLA
Modified: 2015-06-16 03:17 EDT (History)
1 user (show)

See Also:


Attachments
Mthread application source (2.63 KB, text/x-csrc)
2015-04-15 11:02 EDT, Frederic Leger CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Frederic Leger CLA 2015-04-10 08:47:33 EDT
After commit bb173cd6d46277f844c0e57934c00ce6730aee4b, it seems like the logic for breakpoints and the SkipProlog options is broken.

I run a 32-bit linux application, and try to set a breakpoint at the sharedCode symbol. Objdump says :

080486f9 <sharedCode>:
 80486f9:	55                   	push   %ebp
 80486fa:	89 e5                	mov    %esp,%ebp
 80486fc:	a1 44 a0 04 08       	mov    0x804a044,%eax
 8048701:	83 c0 01             	add    $0x1,%eax
 8048704:	a3 44 a0 04 08       	mov    %eax,0x804a044
 8048709:	5d                   	pop    %ebp
 804870a:	c3                   	ret    

So I would think that the prologue is 80486f9 and 80486fa

With the latest agent, wheter I specify "SkipProlog" or not, I have the following TCF log :

2015-04-10 14:04:44,644 : ---> C 18 Breakpoints getCapabilities null
2015-04-10 14:04:44,644 : <--- R 18 None {"ID":"","BreakpointType":true,"Location":true,"FileLine":true,"FileMapping":true,"IgnoreCount":true,"Condition":true,"ContextIds":true,"ContextNames":true,"ContextQuery":true,"StopGroup":true,"ClientData":true,"Temporary":true,"SkipPrologue":true}
2015-04-10 14:04:44,657 : ---> C 19 Breakpoints add {"StopGroup":["P24954"],"SkipPrologue":true,"Enabled":true,"ID":"wrdbg:1","Location":"sharedCode"}
2015-04-10 14:04:44,657 : <--- E None Breakpoints contextAdded [{"StopGroup":["P24954"],"SkipPrologue":true,"Enabled":true,"ID":"wrdbg:1","Location":"sharedCode"}]
2015-04-10 14:04:44,657 : <--- R 19 None
2015-04-10 14:04:44,657 : <--- E None Breakpoints status "wrdbg:1" {"Instances":[{"LocationContext":"P24954","HitCount":0,"Address":134514425,"Size":1,"BreakpointType":"Software"}]}

or

2015-04-10 14:06:29,921 : ---> C 18 Breakpoints getCapabilities null
2015-04-10 14:06:29,923 : <--- R 18 None {"ID":"","BreakpointType":true,"Location":true,"FileLine":true,"FileMapping":true,"IgnoreCount":true,"Condition":true,"ContextIds":true,"ContextNames":true,"ContextQuery":true,"StopGroup":true,"ClientData":true,"Temporary":true,"SkipPrologue":true}
2015-04-10 14:06:29,929 : ---> C 19 Breakpoints add {"StopGroup":["P25041"],"SkipPrologue":false,"Enabled":true,"ID":"wrdbg:1","Location":"sharedCode"}
2015-04-10 14:06:29,929 : <--- E None Breakpoints contextAdded [{"StopGroup":["P25041"],"SkipPrologue":false,"Enabled":true,"ID":"wrdbg:1","Location":"sharedCode"}]
2015-04-10 14:06:29,929 : <--- R 19 None
2015-04-10 14:06:29,930 : <--- E None Breakpoints status "wrdbg:1" {"Instances":[{"LocationContext":"P25041","HitCount":0,"Address":134514425,"Size":1,"BreakpointType":"Software"}]}

Breakpoint is planted at address 134514425, which is 80486f9 (address of the symbol itself).

If I checkout the previous version : 176d763e936db165561e71ef377fd0d2b09d03ea, then do the same tests, I have :

2015-04-10 14:23:25,014 : ---> C 18 Breakpoints getCapabilities null
2015-04-10 14:23:25,014 : <--- R 18 None {"ID":"","BreakpointType":true,"Location":true,"FileLine":true,"FileMapping":true,"IgnoreCount":true,"Condition":true,"ContextIds":true,"ContextNames":true,"ContextQuery":true,"StopGroup":true,"ClientData":true,"Temporary":true,"SkipPrologue":true}
2015-04-10 14:23:25,026 : ---> C 19 Breakpoints add {"StopGroup":["P25714"],"SkipPrologue":true,"Enabled":true,"ID":"wrdbg:1","Location":"sharedCode"}
2015-04-10 14:23:25,026 : <--- E None Breakpoints contextAdded [{"StopGroup":["P25714"],"SkipPrologue":true,"Enabled":true,"ID":"wrdbg:1","Location":"sharedCode"}]
2015-04-10 14:23:25,026 : <--- R 19 None
2015-04-10 14:23:25,027 : <--- E None Breakpoints status "wrdbg:1" {"Instances":[{"LocationContext":"P25714","HitCount":0,"Address":134514428,"Size":1,"BreakpointType":"Software"}]}

If I say "SkipPrologue":false, I have :

2015-04-10 14:21:53,463 : ---> C 19 Breakpoints getCapabilities null
2015-04-10 14:21:53,463 : <--- R 19 None {"ID":"","BreakpointType":true,"Location":true,"FileLine":true,"FileMapping":true,"IgnoreCount":true,"Condition":true,"ContextIds":true,"ContextNames":true,"ContextQuery":true,"StopGroup":true,"ClientData":true,"Temporary":true,"SkipPrologue":true}
2015-04-10 14:21:53,469 : ---> C 20 Breakpoints add {"StopGroup":["P25642"],"SkipPrologue":false,"Enabled":true,"ID":"wrdbg:1","Location":"sharedCode"}
2015-04-10 14:21:53,469 : <--- E None Breakpoints contextAdded [{"StopGroup":["P25642"],"SkipPrologue":false,"Enabled":true,"ID":"wrdbg:1","Location":"sharedCode"}]
2015-04-10 14:21:53,469 : <--- R 20 None
2015-04-10 14:21:53,469 : <--- E None Breakpoints status "wrdbg:1" {"Instances":[{"LocationContext":"P25642","HitCount":0,"Address":134514425,"Size":1,"BreakpointType":"Software"}]}

Hope this helps
Comment 1 Frederic Leger CLA 2015-04-10 10:27:38 EDT
According to Xavier, I must complete what I explained with the problem encountered once I try to hit that breakpoint :

With new paradigm, when BP is hit :

2015-04-10 10:56:40,744 : <--- E None Breakpoints status "wrdbg:2" {"Instances":[{"LocationContext":"P19726","HitCount":1,"Address":134514425,"Size":1,"BreakpointType":"Software"}]}
2015-04-10 10:56:40,745 : <--- E None RunControl contextSuspended "P19726.19726" 3078202416 "Step" {}
2015-04-10 10:56:40,746 : <--- E None RunControl contextSuspended "P19726.19731" 3078202416 "Step" {}
2015-04-10 10:56:40,746 : <--- E None RunControl contextSuspended "P19726.19730" 3078202416 "Step" {}
2015-04-10 10:56:40,753 : <--- E None RunControl contextSuspended "P19726.19729" 134514428 "Breakpoint" {"BPs":["wrdbg:2"]}

Why would the threads be in "Step" and not "Breakpoint" status ?

With old paradigm :

015-04-10 10:55:28,924 : <--- E None Breakpoints status "wrdbg:2" {"Instances":[{"LocationContext":"P19324","HitCount":2,"Address":134514428,"Size":1,"BreakpointType":"Software"}]}
2015-04-10 10:55:28,925 : <--- E None RunControl contextSuspended "P19324.19327" 134514428 "Breakpoint" {"Signal":5,"SignalName":"SIGTRAP","SignalDescription":"Trace/breakpoint trap","BPs":["wrdbg:2"]}
2015-04-10 10:55:28,940 : <--- E None RunControl contextSuspended "P19324.19328" 134514428 "Breakpoint" {"Signal":5,"SignalName":"SIGTRAP","SignalDescription":"Trace/breakpoint trap","BPs":["wrdbg:2"]}
2015-04-10 10:55:28,957 : <--- E None RunControl containerSuspended "P19324.19324" 3077985328 "Suspended" {"Signal":19,"SignalName":"SIGSTOP","SignalDescription":"Stop executing temporarily"} ["P19324.19324","P19324.19329"]
Comment 2 Eugene Tarassov CLA 2015-04-10 14:08:33 EDT
Skipping prologue when planting was bad idea. If, for example, first line of the function is "for(;;)" statement, the breakpoint would be hit on every iteration of the loop - not a correct behavior for a function breakpoint.

Currently, the agent plants breakpoint at the function entry, and executes "step over prologue" when the breakpoint is hit. This works very well. 

Reporting "Step" as suspend reason is not the best here, but "Step" is exactly what happens. I'll look into changing suspend reason to "Breakpoint".
Comment 3 Frederic Leger CLA 2015-04-13 03:06:26 EDT
Hmmm ... when I set a breakpoint after the prologue and wait for the breakpoint to be hit, "Step" is not at all what happens. I happens with this implementation.

To me, if I want to skip the prologue, the address returned by the breakpoint set should be the address after the prologue. If it is a "for (;;)", then ok, this is hit everytime in the loop.

My understanding of "set a breakpoint at function x, and skip prologue" is "set breakpoint at the first line of C code of function x".

At least that's what gdb does.
Comment 4 xavier pouyollon CLA 2015-04-13 12:18:11 EDT
Hi Eugene,

Concerning sharedCode it is common code run by different threads:

2015-04-10 10:56:40,745 : <--- E None RunControl contextSuspended "P19726.19726" 3078202416 "Step" {}
2015-04-10 10:56:40,746 : <--- E None RunControl contextSuspended "P19726.19731" 3078202416 "Step" {}
2015-04-10 10:56:40,746 : <--- E None RunControl contextSuspended "P19726.19730" 3078202416 "Step" {}
2015-04-10 10:56:40,753 : <--- E None RunControl contextSuspended "P19726.19729" 
134514428 "Breakpoint" {"BPs":["wrdbg:2"]}

> Reporting "Step" as suspend reason is not the best here, but "Step" is exactly what happens. I'll look into changing suspend reason to "Breakpoint".

I'm not 100% sure but probably one thread hits the prolog, agent starts the "stepping logic" and other thread hits the prolog (this is sharedCode). 
It seems these "internals steps event" should be filtered.

I would only expect this event:
2015-04-10 10:56:40,753 : <--- E None RunControl contextSuspended "P19726.19729" 
134514428 "Breakpoint" {"BPs":["wrdbg:2"]}

Frederic is annoyed by the new semantic because it breaks our test suite.

I know this fix was introduced to solve a quite marginal issue. If the "internal steps events" filtering isn't easy in multithreaded processes, what about reverting the fix ?

Best Regards,
Xavier.
Comment 5 Frederic Leger CLA 2015-04-15 11:02:31 EDT
Created attachment 252417 [details]
Mthread application source
Comment 6 Frederic Leger CLA 2015-04-15 11:03:14 EDT
Eugene,

in order to make it clearer about where the problem is, we have a GDB-like shell which uses a TCF connection as the debugger backend. Trying to run the mthread example (attached the source file), and doing the sequence :

<connect>
file ~/tmp/mthread
b sharedCode
run

I have the following output:

Connecting...
The debug shell is now configured to run in GDB stop mode.
Connected to TCP:127.0.0.1:1534
Reading symbols from /home/fredl/tmp/mthread...Thread 1 stopped.
done
[New Thread 1]
[New inferior 2 (4369)]
Switching to inferior 2 [4369] (P4369)
Breakpoint 1 at 0x40087f: file src/mthread.c, line 84.
(wrdbg) run
Starting program: /home/fredl/tmp/mthread 
[New Thread 2]
[New Thread 3]
[New Thread 4]
Main Process :	ThreadID =1002313536 ParentID =    30348 ID_self =     4369
Main thread create one thread (0)
Main thread create one thread (1)
Thread 0 :	ThreadID =994072320 ParentID =    30348 ID_self =     4369
Thread 0 is awake (pid 4369).
In main process...
Thread 1 :	ThreadID =985679616 ParentID =    30348 ID_self =     4369
Thread 1 is awake (pid 4369).
In alone (pid 4369) 
Could not find host path /build/buildd/eglibc-2.19/sysdeps/unix/syscall-template.S
Thread 3 stopped with the reason "Step".
0x00007f473b4c6f3d in ??? at ../sysdeps/unix/syscall-template.S:81
Source file /build/buildd/eglibc-2.19/posix/../sysdeps/unix/syscall-template.S not found.

[Switching to Thread 3]
Could not find host path /build/buildd/eglibc-2.19/sysdeps/unix/syscall-template.S
Thread 4 stopped with the reason "Step".
0x00007f473b4c6f3d in ??? at ../sysdeps/unix/syscall-template.S:81
Source file /build/buildd/eglibc-2.19/posix/../sysdeps/unix/syscall-template.S not found.

[Switching to Thread 4]
Could not find host path /build/buildd/eglibc-2.19/sysdeps/unix/syscall-template.S
0x00007f473b4c6f3d in ??? at ../sysdeps/unix/syscall-template.S:81
Source file /build/buildd/eglibc-2.19/posix/../sysdeps/unix/syscall-template.S not found.
(wrdbg) 
Breakpoint 1, sharedCode () at src/mthread.c:87
87	   counter++;

[Switching to Thread 2]


You see that we get some "Step" events from the threads in unknown locations.

As for the initial issue the commit was inteded to fix, I wrote a simple C program which does :

#include <stdio.h>
#include <unistd.h>

int sharedCode (void)
    {
    while (1)
        {
	printf("sleeping for 1 second");
	sleep(1);
	}

    return 0;
    }

int main
    (
    int		argc,
    char *	argv[]
    )
    {
    sharedCode();
    return 0;
    }

Using gdb on it gave :

(gdb) file forever
Reading symbols from forever...done.
(gdb) b sharedCode
Breakpoint 1 at 0x400581: file forever.c, line 8.
(gdb) run
Starting program: /home/fredl/tmp/forever 

Breakpoint 1, sharedCode () at forever.c:8
8               printf("sleeping for 1 second");
(gdb) c
Continuing.

Breakpoint 1, sharedCode () at forever.c:8
8               printf("sleeping for 1 second");
(gdb)

So, to me, the initial bug was not a bug (I consider gdb as the reference debugger).

As Xavier also suggested to revert the fix, it would really be helpful to revert it, and let us know when this may happen.

Thanks

DERF
Comment 7 Eugene Tarassov CLA 2015-04-15 23:19:50 EDT
I cannot revert the changes: users did not like older behavior, I have got complains.

I have added new configuration option: ENABLE_SkipPrologueWhenPlanting. Now you can get back "skip prologue when planting" behavior if compile with ENABLE_SkipPrologueWhenPlanting=1.
Comment 8 Frederic Leger CLA 2015-04-16 02:54:12 EDT
Thanks Eugene.
But you mean that you have users who prefer to see the UI flicker when hitting a breakpoint, and complain about missing libraries ?
If I am correct, the UI will select each thread hitting stopping, and try to get its stack ...