
Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion

Greg
The only process left hanging around on the remote side is a sshd process that I think gets created as part of the process of opening the profile configuration dialog, after I'm prompted to connect to the remote node. I'm guessing that Eclipse code is keeping that sshd process as part of maintaining the connection to the remote system, instead of creating a new sshd process for every interaction with the remote system. I'm further guessing that there's some 'ssh' message interchanges where something is missing the fact that the remote bqueues command exited, and failed to post or recognize a completion status.

When the bqueues command gets invoked I see a 'bash' process and a 'bqueues' process.

I killed the sshd process and that didn't change anything.

I'm not sure if the sshd process is using a pty. I think it isn't since 'ps -u dwootton' shows '?' in the TTY column. I have a second sshd process because I ssh to the node and that process has 'pts/40' in the TTY column.

I looked at the /proc/<pid>/cmdline file for both sshd processes. My sshd process has 'dwootton@pts/40' and the Eclipse-initiated sshd process has 'dwootton@notty'.

I tried looking at other /proc/<pid> files to see what else I could discover, but sshd apparently starts as root and then switches to dwootton after setting everything up, so I can't view many of the useful /proc files, and I can't gdb attach to the sshd process to see what it's doing.

I tried killing just the Eclipse thread that looked like it was the sshd connection thread, but I apparently can't kill just a single thread. I clicked 'Terminate' in the popup menu I got when I right-clicked over the process, but that killed the entire Eclipse runtime instance.

Doing something like sending 'bqueues -w ; echo EOF' probably isn't going to work for me. The LSF target system configuration code issues a few LSF commands, and it appears that LSF is not particularly consistent about where it sends normal messages and error messages. I see some messages I would consider normal progress messages in the stderr stream. Also, I have some normal completions where nothing is apparently written to stdout, for instance if I have no LSF reservations.

Because of this, I'm trying to determine success or failure by getting the process exit status and checking for zero or non-zero. If I look for text in the stderr and/or stdout streams to determine successful completion, I suspect that's going to be a problem, since I can't tell from the returned text whether I got success or failure. Maybe if I do something like 'bqueues -w ; echo "EOF:$?"' then I get the status. Maybe I'll try that if we can't figure out what's going wrong with sshd.
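
As a rough illustration of the 'echo "EOF:$?"' idea, here is a minimal Java sketch of how the status could be recovered from the captured stdout text. The names SentinelStatus, wrap and exitStatus are made up for this example and are not part of PTP or JSch, and the sketch assumes the marker line arrives on stdout after any command output.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PTP or JSch.
class SentinelStatus {
    // A line consisting only of EOF:<status>; shell exit statuses are 0..255.
    private static final Pattern MARKER =
            Pattern.compile("^EOF:(\\d{1,3})\\s*$", Pattern.MULTILINE);

    // Append the marker so the shell echoes the command's exit status.
    static String wrap(String command) {
        return command + " ; echo \"EOF:$?\"";
    }

    // Return the status from the last marker line, or empty if no marker
    // has been seen yet (the command may still be running).
    static OptionalInt exitStatus(String stdout) {
        Matcher m = MARKER.matcher(stdout);
        OptionalInt last = OptionalInt.empty();
        while (m.find()) {
            last = OptionalInt.of(Integer.parseInt(m.group(1)));
        }
        return last;
    }
}
```

Taking the last marker rather than the first means that a command which happens to print a line that looks like the marker does not hide the real status. An empty result only says the marker has not arrived yet; it does not say why.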

Finally, I don't know if I have a consistent pattern, but I seem to get a hang the first time I issue a bqueues command in a session and then intermittently after that.

Dave



From: Greg Watson <g.watson@xxxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date: 01/24/2018 12:10 PM
Subject: Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
Sent by: ptp-dev-bounces@xxxxxxxxxxx





Dave,

Is there anything still running on the remote end? e.g. is there a shell process? You could try killing it to see if that terminates the session.

Another thought. Do you know if the remote process is using a PTY or not?

You might ultimately need to do something hackish, like adding 'echo FOO' to the command and checking to see when FOO comes back.

Greg

      On Jan 24, 2018, at 7:24 AM, David Wootton <dwootton@xxxxxxxxxx> wrote:

      Greg
      I suspended each thread in the Eclipse debugger once I had a hung run configuration dialog


      Both my reader threads are waiting

      <17443150.gif>

      I expected these threads to have exited at this point, since the remote process was gone and the associated write-side file descriptors should have been closed, causing the pending read to end, at least on Linux. I'm running Eclipse on Windows, so maybe file descriptor behavior there is different.


      The thread that looks like it might be a connection thread seems to be looping in PipedInputStream.awaitSpace, since I can single-step through it. There is a wait there, with a 1 second timeout.

      <17931618.gif>

      The Session class is com.jcraft.jsch.Session


      I suspended a few other threads and did not see anything that looked like Jsch. I avoided classes that had labels/names that looked like internal Eclipse threads or other unrelated plugins.


      Dave




      <graycol.gif>

      From: Greg Watson <g.watson@xxxxxxxxxxxx>
      To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
      Date: 01/23/2018 10:57 PM
      Subject: Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
      Sent by: ptp-dev-bounces@xxxxxxxxxxx





      Hi Dave,


      Off the top of my head I don't know, but Jsch is a nasty piece of work. Can you see if it's stuck in the Jsch code somewhere?


      Regards,
      Greg
              On Jan 23, 2018, at 3:00 PM, David Wootton <dwootton@xxxxxxxxxx> wrote:

              I'm fixing the hangs using the LSF target configuration and have it mostly fixed. One problem I'm running into is that occasionally the remote process (bqueues -w) exits but the IRemoteProcess.isCompleted() method still returns false. As a result, my code loops forever waiting for process completion and the run configuration dialog is locked. I can clear the locked state by clicking the red cancel button at the bottom of the dialog.

              The loop I have to wait for process completion is

              for (;;) {
                  if (process.isCompleted()) {
                      break;
                  }
                  if (monitor.isCanceled()) {
                      process.destroy();
                      return new Status(IStatus.CANCEL, Activator.PLUGIN_ID, CANCELED, Messages.CommandCancelMessage, null);
                  }
                  try {
                      Thread.sleep(1000);
                  } catch (InterruptedException e) {
                      // Do nothing, sleep just ends early
                  }
              }
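
The loop above has no upper bound on how long it waits. A minimal sketch of a bounded variant follows, assuming completion and cancellation can be expressed as plain boolean checks (for instance process::isCompleted and monitor::isCanceled). BoundedWait and its Result enum are hypothetical names for illustration, and the timeout is an assumption that is not in the original code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of PTP.
class BoundedWait {
    enum Result { COMPLETED, CANCELED, TIMED_OUT, INTERRUPTED }

    // Poll until the work completes, is canceled, or the deadline passes.
    static Result await(BooleanSupplier completed, BooleanSupplier canceled,
                        long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (completed.getAsBoolean()) {
                return Result.COMPLETED;
            }
            if (canceled.getAsBoolean()) {
                return Result.CANCELED;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return Result.TIMED_OUT;
            }
            try {
                // Never sleep past the deadline, and always sleep at least 1 ms.
                long remainingMillis = Math.max(1, remainingNanos / 1_000_000L);
                Thread.sleep(Math.min(Math.max(1, pollMillis), remainingMillis));
            } catch (InterruptedException e) {
                // Restore the flag so callers can see the interrupt.
                Thread.currentThread().interrupt();
                return Result.INTERRUPTED;
            }
        }
    }
}
```

A timeout does not explain why isCompleted() stays false, but it would let the dialog report an error instead of staying locked until the user cancels.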

              I see comments in the IRemoteProcess source that warn that isCompleted() and waitFor() may not work correctly if the calling thread does not read the stderr or stdout streams and the JSch process implementation is used (which appears to be my case, since I see that the process builder is a JSchProcessBuilder). However, in my case I have reads pending on both the stderr and stdout streams for at least one byte, but I am issuing those reads on different threads from the one where the remote process was created. (I'm reading on separate threads to avoid my code blocking if the remote process writes so much data to either stream that the stream buffer fills and the process blocks until something reads from the stream to empty the buffer. That fixes most of the hangs.)
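
For readers following along, the separate-reader-thread arrangement described above might look roughly like the sketch below. It uses only java.io and plain threads; StreamDrainer is a hypothetical name, not a PTP or JSch class, and this is not the actual code from the LSF target configuration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: drains one stream to EOF on its own daemon thread
// so the producing process cannot block on a full buffer.
class StreamDrainer implements Runnable {
    private final InputStream in;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Thread thread;
    private volatile IOException failure;

    private StreamDrainer(InputStream in, String name) {
        this.in = in;
        this.thread = new Thread(this, name);
        this.thread.setDaemon(true);
    }

    static StreamDrainer start(InputStream in, String name) {
        StreamDrainer drainer = new StreamDrainer(in, name);
        drainer.thread.start();
        return drainer;
    }

    @Override
    public void run() {
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                synchronized (buffer) {
                    buffer.write(chunk, 0, n);
                }
            }
        } catch (IOException e) {
            failure = e;
        }
    }

    // True if the reader thread finished (EOF or error) within the timeout.
    boolean awaitEof(long millis) {
        try {
            thread.join(Math.max(1, millis));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !thread.isAlive();
    }

    IOException failure() {
        return failure;
    }

    String text() {
        synchronized (buffer) {
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

One drainer per stream would be started right after the process is created, for example one for stdout and one for stderr. If awaitEof returns false after the remote command has exited, the reader is still blocked in read(), which matches the waiting reader threads reported later in this thread.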

              I'm not sure what's going on here to cause the hang. I'm wondering if my InputStream objects need a synchronized attribute because they are being used on a different thread, but that also makes no sense, since my InputStream variable is not visible to anything other than my code reading the stream.

              Any thoughts or suggestions about what might be going on?

              Thanks

              Dave




              _______________________________________________
              ptp-dev mailing list

              ptp-dev@xxxxxxxxxxx
              To change your delivery options, retrieve your password, or unsubscribe from this list, visit

              https://dev.eclipse.org/mailman/listinfo/ptp-dev


