Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion

Dave,

It's quite an involved path from the thread you have reading from the input stream to the stdout of the command on the remote machine. It's possible that the command could complete on the remote machine before the thread even starts running, though I would have thought that isCompleted() would be false if that happened. Can you add something at the end of the script to check that it ran successfully (e.g. 'echo "bqueue finished with status $?" > /tmp/script.out')?

There's not really anything that can "lose track", so I want to establish that the command is actually being run each time.

Regards,
Greg

On Jan 30, 2018, at 7:36 AM, David Wootton <dwootton@xxxxxxxxxx> wrote:

Greg
I added a sleep just before the exit in the script and that makes no difference. I didn't expect any difference, since this execution path should be all synchronous code. I expect sshd is issuing a fork, exec, and wait to invoke the hack script, and then bash does the same when invoking the bqueues command.

The only inconsistent behavior I'm seeing is that sometimes the bqueues command itself times out because LSF daemons apparently aren't responding. But that's all internal to the bqueues command and I do get completion status reported all the way back to my Eclipse code where the return status says the bqueues command exited with rc=255.

I realize the bqueues command could be exiting with some odd return code, so I added an echo statement to my hack script to write the return code to a file on the remote system, and the return code was always zero.

Dave


From:  Greg Watson <g.watson@xxxxxxxxxxxx>
To:  Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date:  01/29/2018 03:53 PM
Subject:  Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
Sent by:  ptp-dev-bounces@xxxxxxxxxxx





Maybe it's a timing issue. What happens if you add 'sleep 5' to the end of the script?

Greg
      On Jan 29, 2018, at 2:57 PM, David Wootton <dwootton@xxxxxxxxxx> wrote:

      Greg
      I have a console session open on the login node where the bqueues command runs. Once I click the List button in the run configuration dialog, I periodically issue a 'ps -u dwootton' command to see what processes are running for me on the login node. I see the bqueues command and my hack script running for a while, which is what I expect. But then I issue the ps command again and see that both the bqueues command and the hack script have terminated, but no output from stdout or stderr is displayed in my Eclipse console view. When the bqueues command works correctly, I see stdout or stderr, sometimes both, getting text back from the remote command. That's why I'm thinking something is losing track of the command invocation, since I should see at least the messages from my hack script, which are issued unconditionally before and after the bqueues command runs.


      Dave



      From:  Greg Watson <g.watson@xxxxxxxxxxxx>
      To:  Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
      Date:  01/29/2018 12:17 PM
      Subject:  Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
      Sent by:  ptp-dev-bounces@xxxxxxxxxxx






      Dave,


      What do you mean "when the bqueues command disappears"?


      Greg
              On Jan 29, 2018, at 9:30 AM, David Wootton <dwootton@xxxxxxxxxx> wrote:

              Greg
              That doesn't work. The result is that the name of the remote command is the complete string 'bqueues -l; echo EOF:$?'

              I thought I could make this work by running a wrapper script, for instance /home/dwootton/hack on the remote node, where the script is
              #!/bin/sh
              echo "Execute: " $*
              $*
              echo "EOF:$?"

              And then changing the invocation command in my Eclipse code to 'private static final String bqueuesCommand[] = {"/home/dwootton/hack", "bqueues", "-l"};'

              The idea is that the hack script just executes exactly what it is passed.

              This works correctly most of the time. However, when the bqueues command disappears, I still get absolutely no output to stdout, not even the text from my hack script.

              It looks like something is just completely losing track of the remote command request in this case.
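
The wrapper-script idea above depends on the client noticing the "EOF:<rc>" line in stdout. A minimal sketch of that check, assuming stdout has already been split into lines; the names `SentinelParser` and `parseExitCode` are hypothetical and are not part of PTP or JSch:

```java
import java.util.List;
import java.util.OptionalInt;

final class SentinelParser {
    private static final String PREFIX = "EOF:";

    /** Returns the exit code from the first well-formed "EOF:<rc>" line, if any. */
    static OptionalInt parseExitCode(List<String> stdoutLines) {
        for (String line : stdoutLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(PREFIX)) {
                try {
                    return OptionalInt.of(Integer.parseInt(trimmed.substring(PREFIX.length())));
                } catch (NumberFormatException e) {
                    // Not a well-formed sentinel line; keep scanning.
                }
            }
        }
        // No sentinel seen yet: the command may still be running, or its output was lost.
        return OptionalInt.empty();
    }
}
```

An empty result is exactly the failing case described here: the remote side has finished but no sentinel ever arrived, so the caller still needs its own timeout.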

              Dave




              From:  Greg Watson <g.watson@xxxxxxxxxxxx>
              To:  Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
              Date:  01/26/2018 11:39 AM
              Subject:  Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
              Sent by:  ptp-dev-bounces@xxxxxxxxxxx






              What happens if you try a single quoted argument, e.g 'bqueues -l; echo EOF:$?'

              Greg
                              On Jan 25, 2018, at 5:05 PM, David Wootton <dwootton@xxxxxxxxxx> wrote:

                              Greg
                              I tried adding an echo command to the bqueues command and I am not having any success. My original bqueues command that I was passing to the IRemoteProcessBuilder was a String array {"bqueues", "-l"}. 

                              I changed that to {"bqueues", "-l", ";", "echo", "\"EOF:$?\""} and that failed with a LSF error message that there was no such queue as ";", where the semicolon is being passed as a command parameter to the bqueues command instead of as a command separator for bash.

                              I tried changing ';' to "\\;" to escape the semicolon and it was still passed as a bqueues command parameter, this time '\;'.

                              I was able to get the pid of the bash process started to run the bqueues command one time with my original bqueues command hanging, and it looks like the command being passed across is actually /bin/bash -l -c cd /autofs/home/dwootton && bqueues -l, where "cd /autofs/home/dwootton && bqueues -l" is probably a string parameter to the bash -c option (which tells bash to use the string as the bash command).
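
The behaviour described above (';' and '\;' arriving as bqueues parameters) is consistent with each array element being quoted as one literal word before the command line is handed to bash -c. The sketch below only illustrates that effect with standard POSIX single-quote escaping. It is an assumption about the mechanism, not the actual PTP or JSch code, and `ShellWords` is a hypothetical name:

```java
import java.util.StringJoiner;

final class ShellWords {
    /** Quotes one argument so a POSIX shell treats it as a single literal word. */
    static String quote(String arg) {
        // Close the quote, emit an escaped quote, reopen the quote: ' becomes '\''
        return "'" + arg.replace("'", "'\\''") + "'";
    }

    /** Joins arguments into one command line in which no element is interpreted by the shell. */
    static String join(String... args) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String arg : args) {
            joiner.add(quote(arg));
        }
        return joiner.toString();
    }
}
```

Under this kind of quoting, `ShellWords.join("bqueues", "-l", ";", "echo")` yields a command line in which ';' is just another argument to bqueues, which would match the "no such queue" error from LSF.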

                              So I'm not sure how I can get this hack to work. I think I have a way to deal with the return status in my Java code, but I'm stuck at getting a working command to pass across to the remote system.

                              Dave


                              From:  Greg Watson <g.watson@xxxxxxxxxxxx>
                              To:  Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
                              Date:  01/24/2018 12:10 PM
                              Subject:  Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
                              Sent by:  ptp-dev-bounces@xxxxxxxxxxx







                              Dave,

                              Is there anything still running on the remote end? e.g. is there a shell process? You could try killing it to see if that terminates the session.

                              Another thought. Do you know if the remote process is using a PTY or not?

                              You might ultimately need to do something hackish, like adding 'echo FOO' to the command and checking to see when FOO comes back.

                              Greg
                                                              On Jan 24, 2018, at 7:24 AM, David Wootton <dwootton@xxxxxxxxxx> wrote:

                                                              Greg
                                                              I suspended each thread in the Eclipse debugger once I had a hung run configuration dialog

                                                              Both my reader threads are waiting
                                                              <17443150.gif>
                                                              I expected these threads had exited at this point since the remote process was gone and the associated write-side file descriptors should have been closed, causing the pending read to end, at least on Linux. I'm running Eclipse on windows, so maybe file descriptor behavior there is different.

                                                              The thread that looks like it might be a connection thread seems to be looping in PipedInputStream.awaitSpace, since I can single-step through it. There is a wait there, with a 1 second timeout.
                                                              <17931618.gif>
                                                              The Session class is com.jcraft.jsch.Session

                                                              I suspended a few other threads and did not see anything that looked like Jsch. I avoided classes that had labels/names that looked like internal Eclipse threads or other unrelated plugins.
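
For reference, java.io.PipedInputStream only reports end-of-stream to a reader once the writing side has been closed, which is one way a reader thread can stay blocked after the remote process is gone. A small self-contained sketch using only JDK classes; `PipeEofDemo` is a hypothetical name, and this does not reproduce how JSch wires its pipes internally:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

final class PipeEofDemo {
    /** Writes data (at most 1024 bytes), closes the writer, then reads until EOF. */
    static int readUntilEof(byte[] data) {
        try (PipedInputStream in = new PipedInputStream(1024);
             PipedOutputStream out = new PipedOutputStream(in)) {
            out.write(data);   // fits in the pipe buffer, so the write does not block
            out.close();       // without this close, the read loop below would keep waiting
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If whatever owns the write side never closes it, the reader never sees -1, regardless of whether the remote command has exited.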

                                                              Dave




                                                              From:  Greg Watson <g.watson@xxxxxxxxxxxx>
                                                              To:  Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
                                                              Date:  01/23/2018 10:57 PM
                                                              Subject:  Re: [ptp-dev] IRemoteProcess.isCompleted occaisionally fails to report process completion
                                                              Sent by:  ptp-dev-bounces@xxxxxxxxxxx







                                                              Hi Dave,

                                                              Off the top of my head I don't know, but Jsch is a nasty piece of work. Can you see if it's stuck in the Jsch code somewhere?

                                                              Regards,
                                                              Greg
                                                                                                                              On Jan 23, 2018, at 3:00 PM, David Wootton <dwootton@xxxxxxxxxx> wrote:

                                                                                                                              I'm fixing the hangs using the LSF target configuration and have it mostly fixed. One problem I'm running into is that occasionally the remote process (bqueues -w) exits but the IRemoteProcess.isCompleted() method still returns false, and as a result my code loops forever waiting for process completion and the run configuration dialog is locked. I can clear the locked state by clicking the red cancel button at the bottom of the dialog.

                                                                                                                              The loop I have to wait for process completion is

                                                                                                                              for (;;) {
                                                                                                                                  if (process.isCompleted()) {
                                                                                                                                      break;
                                                                                                                                  }
                                                                                                                                  if (monitor.isCanceled()) {
                                                                                                                                      process.destroy();
                                                                                                                                      return new Status(IStatus.CANCEL, Activator.PLUGIN_ID, CANCELED, Messages.CommandCancelMessage, null);
                                                                                                                                  }
                                                                                                                                  try {
                                                                                                                                      Thread.sleep(1000);
                                                                                                                                  } catch (InterruptedException e) {
                                                                                                                                      // Do nothing, sleep just ends early
                                                                                                                                  }
                                                                                                                              }
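
One defensive variation on the loop above is to give the wait an overall deadline, so that a missed completion cannot lock the dialog indefinitely. This is a sketch under stated assumptions: the two `BooleanSupplier` arguments stand in for `process.isCompleted()` and `monitor.isCanceled()`, and `BoundedWait`, `Outcome`, and the timeout parameters are hypothetical names, not PTP API:

```java
import java.util.function.BooleanSupplier;

final class BoundedWait {
    enum Outcome { COMPLETED, CANCELED, TIMED_OUT }

    /** Polls until completion, cancellation, or the deadline, whichever comes first. */
    static Outcome await(BooleanSupplier completed, BooleanSupplier canceled,
                         long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (completed.getAsBoolean()) {
                return Outcome.COMPLETED;
            }
            if (canceled.getAsBoolean()) {
                return Outcome.CANCELED;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Design choice in this sketch: treat interruption as a cancel request.
                Thread.currentThread().interrupt();
                return Outcome.CANCELED;
            }
        }
    }
}
```

A caller could map `TIMED_OUT` to an error status and destroy the process, much as the cancel branch does. This works around the symptom only and does not explain why completion goes unreported.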

                                                                                                                              I see comments in the IRemoteProcess source that warn that isCompleted() and waitFor() may not work correctly if the calling thread does not read the stderr or stdout streams and the JSch process implementation is used (which appears to be my case, since I see that the process builder is a JSchProcessBuilder). However, in my case I have reads pending on both the stderr and stdout streams for at least one byte, but I am issuing those reads on different threads from the one where the remote process was created. (I'm reading on separate threads to avoid my code blocking if the remote process writes so much data to either stream that the stream buffers fill and the process blocks until something reads from these streams to empty the buffer, and that fixes most of the hangs.)
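
The separate-reader-thread approach described above can be sketched roughly as follows. This is an illustration only, not the code from this report: `StreamDrainer` and `drainFully` are hypothetical names, and any `InputStream` (for example the stdout or stderr stream of the remote process) is assumed to be usable as input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamDrainer implements Runnable {
    private final InputStream in;
    private final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    StreamDrainer(InputStream in) {
        this.in = in;
    }

    @Override
    public void run() {
        byte[] buf = new byte[4096];
        try {
            int n;
            // Keep reading so the producer never blocks on a full buffer.
            while ((n = in.read(buf)) != -1) {
                synchronized (captured) {
                    captured.write(buf, 0, n);
                }
            }
        } catch (IOException e) {
            // In this sketch a broken stream is treated as end of output.
        }
    }

    /** Returns whatever has been captured so far, decoded as UTF-8. */
    String text() {
        synchronized (captured) {
            return new String(captured.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Drains the stream on a daemon thread and waits up to timeoutMillis for EOF. */
    static String drainFully(InputStream in, long timeoutMillis) {
        StreamDrainer drainer = new StreamDrainer(in);
        Thread t = new Thread(drainer, "stream-drainer");
        t.setDaemon(true);
        t.start();
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return drainer.text();
    }
}
```

The bounded `join` means a stream that never reaches EOF costs at most the timeout rather than hanging the caller, and the daemon flag keeps a stuck reader from blocking JVM shutdown.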

                                                                                                                              I'm not sure what's going on here to cause the hang. I'm wondering if my InputStream objects need a synchronized attribute because they're being used on a different thread, but that also makes no sense since my InputStream variable is not visible to anything other than my code reading the stream.

                                                                                                                              Any thoughts or suggestions about what might be going on?

                                                                                                                              Thanks

                                                                                                                              Dave




                                                                                                                              _______________________________________________
                                                                                                                              ptp-dev mailing list

                                                                                                                              ptp-dev@xxxxxxxxxxx
                                                                                                                              To change your delivery options, retrieve your password, or unsubscribe from this list, visit

                                                                                                                              https://dev.eclipse.org/mailman/listinfo/ptp-dev



