Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion

Greg
I always got a response from bqueues when I ran it as an ssh command, either a queue list or a timeout message.

I tried to create a proxy connection, but I'm not sure I know how to do that. I specified a host and username as for a regular ssh connection, then in the advanced settings I clicked the 'remote' button under SSH proxy settings and picked an existing ssh connection. However, I suspect I'm still using the original ssh session in this case. I did hit the same problem once with this connection, though: the command completed with no notification.

Dave


From: Greg Watson <g.watson@xxxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date: 02/09/2018 09:12 AM
Subject: Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
Sent by: ptp-dev-bounces@xxxxxxxxxxx





Hi Dave,

I'm not sure what else to suggest. Have you tried submitting the script multiple times with a regular ssh command (i.e. ssh host your_script)? Does this work perfectly every time? You could also try using the PROXY connection type rather than SSH. This uses ssh only for the initial connection, but after that downloads a small agent onto the remote machine to handle running the remote commands. If you see different behavior, at least that will tell you that it is the JSch implementation that is causing the problem. If not, then it must be somewhere higher in the stack. If you want to try this, you'll need to update from the latest Oxygen build [1] as I fixed a number of bugs recently.

We can get your changes into Oxygen.3, which is due in early March. When you submit them, let me know and I'll run another build and promote that for Oxygen.3.

Regards,
Greg

[1] http://download.eclipse.org/tools/ptp/builds/oxygen/milestones
      On Feb 8, 2018, at 2:59 PM, David Wootton <dwootton@xxxxxxxxxx> wrote:

      Greg
      Any other ideas about what's not working when the bqueues command hangs? I thought there might be something going on if the bqueues command was killed by 'kill -9', but when I tried that I still got notification back to my Eclipse code most times. It did fail once where there was no notification whatsoever.


      I'm not surprised, since even in the 'kill -9' case, whatever invoked bqueues should get a return code back, in that case indicating bqueues was terminated.


      At this point, I'm thinking I should commit the changes I have, since they do improve the LSF target system configuration behavior. Previously, the bqueues command would consistently block because the command and the stdout/stderr stream readers were all running on the same thread; now it blocks only when the bqueues command fails to return status.


      Also, any possibility of getting a rebuild of PTP once my changes are merged in? We need this to ship with our plugins since we depend on LSF target system configurations.


      Thanks


      Dave

      ----- Forwarded by David Wootton/Poughkeepsie/Contr/IBM on 02/08/2018 02:50 PM -----


      From:
      David Wootton/Poughkeepsie/Contr/IBM
      To:
      Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
      Date:
      02/05/2018 09:49 AM
      Subject:
      Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion




      Greg
      I had added an echo to my hack wrapper script, following the bqueues command, that writes the bqueues return code to a file in the local filesystem. In the case where everything hung, the echo reported that the bqueues return code was zero, so the commands were definitely running.


      Usually the hang lasted for something like 30 seconds. I logged into a second console session on the node where the bqueues command was running and repeatedly issued 'ps -u dwootton' commands. I could see the bqueues command and my wrapper until they eventually terminated with no notification back to my Eclipse session.


      Dave



      From:
      Greg Watson <g.watson@xxxxxxxxxxxx>
      To:
      Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
      Date:
      02/02/2018 03:42 PM
      Subject:
      Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
      Sent by:
      ptp-dev-bounces@xxxxxxxxxxx




      Dave,

      It's quite an involved path from the thread you have reading from the input stream to the stdout of the command on the remote machine. It's possible that the command could complete on the remote machine before the thread even starts running, though I would have thought that isCompleted() would be false if that happened. Can you add something at the end of the script to check that it ran successfully (e.g. 'echo "bqueue finished with status $?" > /tmp/script.out')?

      There's not really anything that can "lose track", so I want to establish that the command is actually being run each time.

      Regards,
      Greg
              On Jan 30, 2018, at 7:36 AM, David Wootton <dwootton@xxxxxxxxxx> wrote:

              Greg
              I added a sleep just before the exit in the script and that makes no difference. I didn't expect any difference, since this execution path should be all synchronous code. I expect sshd is issuing a fork, exec, and wait to invoke the hack script, and then bash does the same when invoking the bqueues command.

              The only inconsistent behavior I'm seeing is that sometimes the bqueues command itself times out because LSF daemons apparently aren't responding. But that's all internal to the bqueues command and I do get completion status reported all the way back to my Eclipse code where the return status says the bqueues command exited with rc=255.

              I realize the bqueues command could be exiting with some odd return code, so I added an echo statement to my hack script to write the return code to a file on the remote system, and the return code was always zero.

              Dave


              From:
              Greg Watson <g.watson@xxxxxxxxxxxx>
              To:
              Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
              Date:
              01/29/2018 03:53 PM
              Subject:
              Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
              Sent by:
              ptp-dev-bounces@xxxxxxxxxxx






              Maybe it's a timing issue. What happens if you add 'sleep 5' to the end of the script?

              Greg
                              On Jan 29, 2018, at 2:57 PM, David Wootton <dwootton@xxxxxxxxxx> wrote:

                              Greg
                              I have a console session open on the login node where the bqueues command runs. Once I click the List button in the run configuration dialog, I periodically issue a 'ps -u dwootton' command to see what processes are running for me on the login node. I see the bqueues command and my hack script running for a while, which is what I expect. But then I issue the ps command again and see that both the bqueues command and the hack script have terminated, yet no output from stdout or stderr is displayed in my Eclipse console view. When the bqueues command works correctly, I see stdout or stderr, sometimes both, getting text back from the remote command. That's why I'm thinking something is losing track of the command invocation, since I should see at least the messages from my hack script, which are issued unconditionally before and after the bqueues command runs.

                              Dave


                              From:
                              Greg Watson <g.watson@xxxxxxxxxxxx>
                              To:
                              Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
                              Date:
                              01/29/2018 12:17 PM
                              Subject:
                              Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
                              Sent by:
                              ptp-dev-bounces@xxxxxxxxxxx





                              Dave,

                              What do you mean "when the bqueues command disappears"?

                              Greg
                                                              On Jan 29, 2018, at 9:30 AM, David Wootton <dwootton@xxxxxxxxxx> wrote:

                                                              Greg
                                                              That doesn't work. The result is that the name of the remote command is the complete string 'bqueues -l; echo EOF:$?'

                                                              I thought I could make this work by running a wrapper script, for instance /home/dwootton/hack on the remote node, where the script is
                                                              #!/bin/sh
                                                              # Echo the command line, run it with its arguments intact, then report its exit status.
                                                              echo "Execute: $*"
                                                              "$@"
                                                              echo "EOF:$?"

                                                              And then changing the invocation command in my Eclipse code to 'private static final String bqueuesCommand[] = {"/home/dwootton/hack", "bqueues", "-l"};'

                                                              The idea is that the hack script just executes exactly what it is passed.
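On the Eclipse side, the wrapper's EOF:<status> line could serve as an end-of-command marker that is read from the output stream itself, independently of isCompleted(). The following is a minimal sketch of that idea, assuming the output is consumed line by line. SentinelParser and findExitStatus are hypothetical names for illustration and are not part of PTP or org.eclipse.remote.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.OptionalInt;

/** Hypothetical helper: scans command output for the wrapper's "EOF:<status>" line. */
final class SentinelParser {
    private static final String PREFIX = "EOF:";

    private SentinelParser() {
    }

    /**
     * Reads lines until a well-formed sentinel is seen and returns the exit status it carries.
     * Returns an empty OptionalInt if the stream ends without one.
     */
    static OptionalInt findExitStatus(Reader output) {
        BufferedReader reader = new BufferedReader(output);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith(PREFIX)) {
                    continue; // ordinary command output
                }
                try {
                    return OptionalInt.of(Integer.parseInt(line.substring(PREFIX.length()).trim()));
                } catch (NumberFormatException e) {
                    // Starts like the sentinel but carries no number; keep scanning.
                }
            }
            return OptionalInt.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample text standing in for the remote command's stdout.
        String sample = "Execute: bqueues -l\nQUEUE: normal\nEOF:0\n";
        System.out.println(findExitStatus(new StringReader(sample)));
    }
}
```

An empty result means the stream ended before the sentinel arrived, which is the case the thread is trying to diagnose, so a caller would still need its own timeout around the read.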

                                                              This works correctly most of the time. However, when the bqueues command disappears, I still get absolutely no output to stdout, not even the text from my hack script.

                                                              It looks like something is just completely losing track of the remote command request in this case.

                                                              Dave




                                                              From:
                                                              Greg Watson <g.watson@xxxxxxxxxxxx>
                                                              To:
                                                              Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
                                                              Date:
                                                              01/26/2018 11:39 AM
                                                              Subject:
                                                              Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
                                                              Sent by:
                                                              ptp-dev-bounces@xxxxxxxxxxx







                                                              What happens if you try a single quoted argument, e.g 'bqueues -l; echo EOF:$?'

                                                              Greg
                                                                                                                              On Jan 25, 2018, at 5:05 PM, David Wootton <dwootton@xxxxxxxxxx> wrote:

                                                                                                                              Greg
                                                                                                                              I tried adding an echo command to the bqueues command and I am not having any success. My original bqueues command that I was passing to the IRemoteProcessBuilder was a String array {"bqueues", "-l"}.

                                                                                                                              I changed that to {"bqueues", "-l", ";", "echo", "\"EOF:$?\""} and that failed with a LSF error message that there was no such queue as ";", where the semicolon is being passed as a command parameter to the bqueues command instead of as a command separator for bash.

                                                                                                                              I tried changing ';' to "\\;" to escape the semicolon and it was still passed as a bqueues command parameter, this time '\;'.

                                                                                                                              On one occasion when my original bqueues command hung, I was able to get the pid of the bash process started to run it. It looks like the command being passed across is actually /bin/bash -l -c cd /autofs/home/dwootton && bqueues -l, where "cd /autofs/home/dwootton && bqueues -l" is probably a single string parameter to the bash -c option (which tells bash to run the string as a command).

                                                                                                                              So I'm not sure how I can get this hack to work. I think I have a way to deal with the return status in my Java code, but I'm stuck at getting a working command to pass across to the remote system.

                                                                                                                              Dave


                                                                                                                              From:
                                                                                                                              Greg Watson <g.watson@xxxxxxxxxxxx>
                                                                                                                              To:
                                                                                                                              Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
                                                                                                                              Date:
                                                                                                                              01/24/2018 12:10 PM
                                                                                                                              Subject:
                                                                                                                              Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
                                                                                                                              Sent by:
                                                                                                                              ptp-dev-bounces@xxxxxxxxxxx






                                                                                                                              Dave,

                                                                                                                              Is there anything still running on the remote end? e.g. is there a shell process? You could try killing it to see if that terminates the session.

                                                                                                                              Another thought. Do you know if the remote process is using a PTY or not?

                                                                                                                              You might ultimately need to do something hackish, like adding 'echo FOO' to the command and checking to see when FOO comes back.

                                                                                                                              Greg
On Jan 24, 2018, at 7:24 AM, David Wootton <dwootton@xxxxxxxxxx> wrote:

Greg
I suspended each thread in the Eclipse debugger once I had a hung run configuration dialog

Both my reader threads are waiting
<17443150.gif>
I expected these threads to have exited at this point, since the remote process was gone and the associated write-side file descriptors should have been closed, causing the pending read to end, at least on Linux. I'm running Eclipse on Windows, so maybe file descriptor behavior there is different.

The thread that looks like it might be a connection thread seems to be looping in PipedInputStream.awaitSpace, since I can single-step through it. There is a wait there, with a 1 second timeout.
<17931618.gif>
The Session class is com.jcraft.jsch.Session
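For context, awaitSpace is where java.io.PipedInputStream parks a writing thread while the pipe's buffer is full and nothing has consumed from it. The sketch below reproduces that state with plain JDK classes and a deliberately tiny pipe. FullPipeDemo and its method are hypothetical names, and this makes no claim about how JSch wires its streams internally.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical demo: a pipe writer stalls until the read side drains the buffer. */
final class FullPipeDemo {
    private FullPipeDemo() {
    }

    /**
     * Writes payloadSize bytes into a pipe that holds only pipeSize bytes.
     * Returns true if the writer was still blocked before any read happened
     * and every byte arrived once the pipe was drained.
     */
    static boolean writerBlocksUntilDrained(int pipeSize, int payloadSize) {
        try {
            PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out, pipeSize);

            Thread writer = new Thread(() -> {
                // Closing the stream signals end-of-data to the reader.
                try (PipedOutputStream o = out) {
                    o.write(new byte[payloadSize]);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, "pipe-writer");
            writer.start();

            // Nobody is reading yet, so the writer cannot finish.
            writer.join(1500);
            boolean blocked = writer.isAlive();

            // Drain the pipe; this lets the writer complete and close.
            int drained = 0;
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                drained += n;
            }
            writer.join();
            return blocked && drained == payloadSize;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writerBlocksUntilDrained(16, 64));
    }
}
```

If the same mechanism is at work in the hang, a thread looping in awaitSpace would point at a pipe whose read side is not being consumed, which may be worth checking against the two reader threads.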

I suspended a few other threads and did not see anything that looked like Jsch. I avoided classes that had labels/names that looked like internal Eclipse threads or other unrelated plugins.

Dave




From:
Greg Watson <g.watson@xxxxxxxxxxxx>
To:
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date:
01/23/2018 10:57 PM
Subject:
Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
Sent by:
ptp-dev-bounces@xxxxxxxxxxx






Hi Dave,

Off the top of my head I don't know, but Jsch is a nasty piece of work. Can you see if it's stuck in the Jsch code somewhere?

Regards,
Greg
      On Jan 23, 2018, at 3:00 PM, David Wootton <dwootton@xxxxxxxxxx> wrote:

      I'm fixing the hangs using the LSF target configuration and have it mostly fixed. One problem I'm running into is that occasionally the remote process (bqueues -w) exits but the IRemoteProcess.isCompleted() method still returns false. As a result, my code loops forever waiting for process completion and the run configuration dialog is locked. I can clear the locked state by clicking the red cancel button at the bottom of the dialog.

      The loop I have to wait for process completion is

      for (;;) {
          if (process.isCompleted()) {
              break;
          }
          if (monitor.isCanceled()) {
              process.destroy();
              return new Status(IStatus.CANCEL, Activator.PLUGIN_ID, CANCELED, Messages.CommandCancelMessage, null);
          }
          try {
              Thread.sleep(1000);
          } catch (InterruptedException e) {
              // Do nothing, sleep just ends early
          }
      }
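Because isCompleted() can apparently stay false after the remote command has gone, one defensive variant of this loop gives up after a deadline instead of polling forever. This is a minimal sketch using plain java.util.function types so that it runs without the Eclipse classes. CompletionPoller, Outcome, and the timeout parameters are hypothetical, and in real code the two suppliers would presumably be process::isCompleted and monitor::isCanceled.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls for completion, but never waits past a deadline. */
final class CompletionPoller {
    enum Outcome { COMPLETED, CANCELED, TIMED_OUT, INTERRUPTED }

    private CompletionPoller() {
    }

    static Outcome await(BooleanSupplier completed, BooleanSupplier canceled,
                         long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (completed.getAsBoolean()) {
                return Outcome.COMPLETED;
            }
            if (canceled.getAsBoolean()) {
                return Outcome.CANCELED;
            }
            // Overflow-safe comparison recommended for System.nanoTime() values.
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can see it.
                Thread.currentThread().interrupt();
                return Outcome.INTERRUPTED;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in suppliers: the "process" never completes and nobody cancels.
        Outcome outcome = await(() -> false, () -> false, 200, 50);
        System.out.println(outcome);
    }
}
```

On TIMED_OUT or CANCELED the caller would still need to destroy the process and report a status, as the original loop does for cancellation. A timeout only unlocks the dialog and does not explain the missing completion notification.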

      I see comments in the IRemoteProcess source that warn that isCompleted() and waitFor() may not work correctly if the calling thread does not read the stderr or stdout streams and the JSch process implementation is used (which appears to be my case, since I see that the process builder is a JSchProcessBuilder). However, in my case I have reads pending on both the stderr and stdout streams for at least one byte, but I am issuing those reads on different threads from the one where the remote process was created. (I'm reading on separate threads to avoid my code blocking if the remote process writes so much data to either stream that the stream buffer fills and the process blocks until something reads from the stream to empty the buffer, and that fixes most of the hangs.)
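The reader-thread arrangement described above can be sketched as follows. This is an illustration with plain java.io streams, not the actual PTP change. StreamDrainer and drainAsync are hypothetical names, and the in-memory streams in main stand in for the remote process's stdout and stderr.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

/** Hypothetical helper: drains a stream to EOF on its own daemon thread. */
final class StreamDrainer {
    private StreamDrainer() {
    }

    /** Starts a dedicated thread that reads the stream to EOF and completes the future with its text. */
    static CompletableFuture<String> drainAsync(InputStream in, String threadName) {
        CompletableFuture<String> result = new CompletableFuture<>();
        Thread reader = new Thread(() -> {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            try (InputStream stream = in) {
                int n;
                while ((n = stream.read(buf)) != -1) {
                    collected.write(buf, 0, n);
                }
                result.complete(new String(collected.toByteArray(), StandardCharsets.UTF_8));
            } catch (IOException | RuntimeException e) {
                result.completeExceptionally(e);
            }
        }, threadName);
        reader.setDaemon(true); // a stuck read must not keep the JVM alive
        reader.start();
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for the process's stdout and stderr streams.
        InputStream stdout = new ByteArrayInputStream("QUEUE: normal\n".getBytes(StandardCharsets.UTF_8));
        InputStream stderr = new ByteArrayInputStream(new byte[0]);

        // Both streams are consumed concurrently, so neither buffer can fill and stall the other.
        CompletableFuture<String> out = drainAsync(stdout, "stdout-reader");
        CompletableFuture<String> err = drainAsync(stderr, "stderr-reader");

        System.out.print(out.join());
        System.err.print(err.join());
    }
}
```

Returning a future lets the calling thread wait with a timeout (for example get(timeout, unit)) rather than block indefinitely if a stream never reaches EOF.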

      I'm not sure what's going on here to cause the hang. I'm wondering if my InputStream objects need a synchronized attribute because they're being used on a different thread, but that also makes no sense since my InputStream variable is not visible to anything other than my code reading the stream.

      Any thoughts or suggestions about what might be going on?

      Thanks

      Dave




      _______________________________________________
      ptp-dev mailing list

      ptp-dev@xxxxxxxxxxx
      To change your delivery options, retrieve your password, or unsubscribe from this list, visit

      https://dev.eclipse.org/mailman/listinfo/ptp-dev

