Bug 314554 - [launch] Error in Launch sequence leaves Debug view with a hanging launch
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-debug-dsf-gdb
Version: 7.0
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact: Jonah Graham CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-26 14:52 EDT by Marc Khouzam CLA
Modified: 2020-09-04 15:26 EDT
CC List: 4 users

See Also:


Attachments
Reproduce problem outside of eclipse (1.20 KB, text/x-csrc)
2010-05-28 10:00 EDT, Marc Khouzam CLA

Description Marc Khouzam CLA 2010-05-26 14:52:12 EDT
Putting an error in the FinalLaunchSequence leaves the launch hanging around in the Debug view without any way to kill it.  It seems that the inferior process is in a weird state that shows it as running when it is actually terminated.
Comment 1 Marc Khouzam CLA 2010-05-26 15:34:40 EDT
This could be the same thing as what John saw in
https://bugs.eclipse.org/bugs/show_bug.cgi?id=311813#c13
Comment 2 Marc Khouzam CLA 2010-05-27 11:39:04 EDT
(In reply to comment #1)
> This could be the same thing as what John saw in
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=311813#c13

Not the same thing, as this only happens on Linux, while John was on Windows.
Comment 3 Marc Khouzam CLA 2010-05-27 12:07:21 EDT
The problem is with PTY.
I need help from someone that understands the PTY JNI library.
The bug affects both DSF-GDB and CDI.

In summary PTYInputStream#read(...) calls the native read0() but this call blocks forever if the launch failed before GDB has started the inferior.  There is no way to kill it.

Here is what is happening in more detail:

1- we create a PTY and give it to GDB to have the inferior write to it.
2- we add the inferior RuntimeProcess to the launch which triggers a native read from the PTY input stream.  This read is blocking.

The call trace for this is:

"Output Stream Monitor" daemon prio=1 tid=0x6898f2a0 nid=0x3778 runnable [0x615fe000..0x615ff0d0]
   at java.io.FileInputStream.readBytes(Native Method) (from the native read0)
   at java.io.FileInputStream.read(Unknown Source) (from PTYInputStream#read)
   at java.io.BufferedInputStream.read1(Unknown Source) 
   at java.io.BufferedInputStream.read(Unknown Source) 
   - locked <0xabeebb38> (a java.io.BufferedInputStream)
   at java.io.BufferedInputStream.read1(Unknown Source) 
   at java.io.BufferedInputStream.read(Unknown Source) 
   - locked <0xabee8a30> (a java.io.BufferedInputStream)
   at java.io.FilterInputStream.read(Unknown Source) 
   at org.eclipse.debug.internal.core.OutputStreamMonitor.read(OutputStreamMonitor.java:144)
   at org.eclipse.debug.internal.core.OutputStreamMonitor.access$1(OutputStreamMonitor.java:134)
   at org.eclipse.debug.internal.core.OutputStreamMonitor$1.run(OutputStreamMonitor.java:207)
   at java.lang.Thread.run(Unknown Source) 

3- if the launch fails _before_ the inferior is started (before -exec-run), GDB does not hook the inferior to the PTY and somehow, the read of step #2 never unblocks and leaves the launch inferior process hung.

I believe the pty.inputstream.close is properly called but still, the read0 never comes back.

What I found using 'lsof' in a shell is that until -exec-run is executed, the PTY is not shown as an open file, which may explain why it is not actually 'closed' and does not release the read0.

The question becomes: how does read0 block on a fileDescriptor that is not open?

If we execute -exec-run before failing the launch, the read0 returns with -1 and everything gets cleaned up.


This is hard to explain so if anyone has the courage to jump in, please ask whatever clarification you need.

Thanks for any help.
Comment 4 Marc Khouzam CLA 2010-05-27 16:24:15 EDT
After trying lots of things I believe the problem is that if the inferior is not started, then nobody does an open() nor a close() on the slave pts.  Because of that, when our PTY library tries to read from the master's fileDescriptor, it blocks forever until someone actually opens the other side (the slave).

Closing the master fileDescriptor (which we already do) does not seem to unblock the read() call.

In PTY.java we have the string id of the slave.  Maybe we need to open it to get its fileDescriptor and then close it to unblock the master.  This does not sound very clean....  I'll have to think more about it.
Comment 5 Anton Leherbauer CLA 2010-05-28 06:58:51 EDT
Could it be that GDB has an open file descriptor on the slave?  Just a vague idea, though.
Comment 6 Marc Khouzam CLA 2010-05-28 08:00:58 EDT
(In reply to comment #5)
> Could it be that GDB has an open file descriptor on the slave?  Just a vague
> idea, though.

Turns out that is not it.  I commented out the code that sends the pty name to GDB and the problem still occurs.

Here is a test you can try that sheds some light:
1- Launch Eclipse with DSF-GDB debug traces enabled (the first printout to GDB will be the tty id to use, which is not printed in the 'gdb traces' console).
2- In the launch's Debugger tab, select the Shared Libraries subtab and check the "use shared lib symbols for debugged applications" option.  This will make the launch fail before the inferior starts (bug 314536).
3- Notice that the launch hangs because of the inferior process and that you can't kill it from Eclipse.
4- In a Linux shell, list the pty with 'ls /dev/pts/<pts id given to gdb>'.
5- From the shell, run 'echo hello > /dev/pts/<pts id>'.
6- Notice that the inferior is suddenly marked as terminated in the launch.

Step 5 actually writes to the pty (you can see 'hello' in the eclipse inferior console) and unblocks the read0(), which allows for the cleanup to proceed.

Based on this, I've also written a two line C program that does
fd = open("/dev/pts/<id>", 0);
close(fd);

and when I run this, the read0() also gets unblocked.

It really seems that the read0() blocks until something actually opens the pts at least once.
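
For anyone who wants to try the same thing, here is a slightly fuller sketch of that two-line program. It is not the exact code I ran: the function name open_and_close is made up for illustration, and the O_NOCTTY flag is an addition of mine so that the slave does not become the controlling terminal of the calling process.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open the given device read-only and close it again right away.
 * Returns 0 on success, -1 if open() or close() fails (errno is set).
 *
 * Hypothetical helper: the name and the O_NOCTTY flag are assumptions,
 * the original test only did open(path, 0) followed by close(fd).
 */
int open_and_close(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY | O_NOCTTY);
    if (fd < 0)
        return -1;

    return close(fd) == 0 ? 0 : -1;
}
```

Calling open_and_close("/dev/pts/<id>") with the pts id given to GDB should have the same effect as the two-line program above.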
Comment 7 Marc Khouzam CLA 2010-05-28 10:00:54 EDT
Created attachment 170347 [details]
Reproduce problem outside of eclipse

Here is a little C program that does what our PTY library does.
It opens a master/slave pty, starts reading from the master from another thread, then closes the master without ever opening the slave.

You can compile this using
gcc -g -lpthread openpty.c

Once you run the program it will say:

slave is /dev/pts/0  (<--- this may change for each run)
starting reading thread
-->press enter close master fd
About to read

Once you press enter, the reading thread does not unblock (no new printout).
You can then run 'echo hello >/dev/pts/0' in a shell, which unblocks the reading thread.

I think this shows that unless someone opens the slave pts, our read never unblocks.

I noticed in the PTY library's openpty.c file a function called openpty() which both creates the master using ptym_open() and opens the slave.  We don't call this function in CDT; instead we call Java_org_eclipse_cdt_utils_pty_PTY_openMaster(), which only calls ptym_open() and does not open the slave.  I wonder if this could be a hint of what we should do...
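
A rough, single-threaded way to observe the same state on Linux is sketched below. This is not the attached program: it uses poll() with a timeout in place of a blocking read() in a second thread, so the check itself cannot hang. The function names are hypothetical, and the expected behaviour (master not ready until the slave has been opened and closed once) is an assumption based on what is described in this bug.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Wait up to timeout_ms for the master to become readable or hung up.
 * Returns the poll() result: 0 means a read() would still block. */
static int master_ready(int master, int timeout_ms)
{
    assert(master >= 0);
    struct pollfd p = { .fd = master, .events = POLLIN };
    return poll(&p, 1, timeout_ms);
}

/*
 * Hypothetical check, not CDT code.
 * Returns  1 if the master was not ready before the slave was opened
 *            and became ready after the slave was opened and closed,
 *          0 if that pattern was not observed,
 *         -1 if the pty could not be set up.
 */
int slave_open_unblocks_master(void)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;

    const char *slave_name = NULL;
    if (grantpt(master) != 0 || unlockpt(master) != 0 ||
        (slave_name = ptsname(master)) == NULL) {
        close(master);
        return -1;
    }

    /* Nobody has opened the slave yet: a read() here would block. */
    int blocked_before = (master_ready(master, 200) == 0);

    /* Open and immediately close the slave side. */
    int slave = open(slave_name, O_RDWR | O_NOCTTY);
    if (slave < 0) {
        close(master);
        return -1;
    }
    close(slave);

    /* The master should now report a hang-up instead of blocking. */
    int ready_after = (master_ready(master, 200) > 0);

    close(master);
    return blocked_before && ready_after;
}
```

If slave_open_unblocks_master() returns 1, that matches the theory that the master side only wakes up once someone has opened (and closed) the slave at least once.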
Comment 8 Anton Leherbauer CLA 2010-05-28 10:48:55 EDT
I think it's too late to tinker with the PTY native code for 7.0, but as a workaround maybe you could create a FileOutputStream on the slave, write one byte and close again (just in case it does not get to the -exec-run).
Comment 9 Marc Khouzam CLA 2010-05-29 05:17:11 EDT
(In reply to comment #8)
> I think it's too late to tinker with the PTY native code for 7.0, but as a
> workaround maybe you could create a FileOutputStream on the slave, write one
> byte and close again (just in case it does not get to the -exec-run).

I think fixing this bug is too risky for 7.0.  Furthermore, we never really noticed it before, although it has been there for a very long time.

The exposure comes from bug 314536, which describes an easy way to make a DSF-GDB launch fail and so trigger this hang; fixing that bug is less risky, so that's what I will do for 7.0.

Let's target this bug for 8.0 (and maybe 7.0.1).
Comment 10 Marc Khouzam CLA 2011-05-03 10:43:18 EDT
Too late for 8.0