Re: [ptp-user] PTP LauncherFailswithJava.lang.reflect.InvocationTargetException

Ken,

Try setting your PATH and LD_LIBRARY_PATH in your .cshrc rather than .login (I assume you're using csh). Ssh does not create a login shell when running a command on a remote machine, so your .login won't be getting loaded.
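
As a minimal sketch of that change, assuming OpenMPI is installed under /clhome/EVANS/openmpi (the location that appears later in this thread), the lines in ~/.cshrc might look like the following. csh reads ~/.cshrc for every shell it starts, including the non-login shell that ssh uses to run a remote command. Adjust the paths to your own installation.

```csh
# ~/.cshrc -- sketch only; the install location is taken from this thread
# Put the OpenMPI binaries (mpirun, orted) at the front of the search path.
set path = (/clhome/EVANS/openmpi/bin $path)

# Prepend the OpenMPI libraries, whether or not the variable is already set.
if ($?LD_LIBRARY_PATH) then
    setenv LD_LIBRARY_PATH /clhome/EVANS/openmpi/lib:${LD_LIBRARY_PATH}
else
    setenv LD_LIBRARY_PATH /clhome/EVANS/openmpi/lib
endif
```

After editing the file, running "ssh puppy1 which orted" from the head node (as a single command, without logging in) should show whether the non-login shell now finds the daemon.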

MPICH support is only very basic (no debugging) and is highly version specific. I'd suggest sticking with OpenMPI for the moment.

Here's what I added to my TODO list (http://wiki.eclipse.org/PTP/planning/2.0/TODO):

53. Add support for "mpirun" arguments in addition to application arguments.

Greg

On Dec 21, 2007, at 2:32 PM, Kenneth Evans wrote:

Greg,

I changed my .login to have PATH and LD_LIBRARY_PATH that point to OpenMPI. This should eliminate all the MPICH2 stuff. However, I can't get OpenMPI to work without --prefix, and I don't understand why. It says it can't find orted (see below). However, if I ssh to the nodes:

1. "which orted" is successful
2. I can start it by typing "orted"
3. PATH and LD_LIBRARY_PATH are the same as in my .login and point to OpenMPI.

So it seems to be properly on the nodes.
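
The three checks above were made from an interactive login on each node, which reads .login. A check closer to what mpirun does is to ask each node for the PATH seen by a single non-interactive ssh command. The sketch below is one way to do that. It is a plain sh script, the function names path_has_dir and remote_path are hypothetical, and the OpenMPI directory is the one used elsewhere in this thread.

```sh
#!/bin/sh
# Sketch: does a colon-separated search path contain a given directory?
# Returns 0 if $2 is an exact entry of the path in $1, 1 otherwise.
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the PATH that a non-interactive ssh command
# sees on node $1. The single quotes keep $PATH from being expanded locally.
remote_path() {
    ssh "$1" 'echo $PATH'
}

# Example use (not run automatically):
#   for node in puppy1 puppy2 puppy3 puppy4; do
#       if path_has_dir "$(remote_path "$node")" /clhome/EVANS/openmpi/bin; then
#           echo "$node: OpenMPI bin directory is on PATH"
#       else
#           echo "$node: OpenMPI bin directory is missing from PATH"
#       fi
#   done
```

The comparison is an exact string match on each PATH entry, so a trailing slash or a symlinked directory would be reported as missing.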

This is what I get:

43 blacklab.aps.anl.gov:openmpitest>mpirun -n 4 helloWorld
orted: Command not found.
orted: Command not found.
orted: Command not found.
orted: Command not found.
[blacklab.aps.anl.gov:14233] ERROR: A daemon on node puppy1 failed to start as expected.
[blacklab.aps.anl.gov:14233] ERROR: There may be more information available from
[blacklab.aps.anl.gov:14233] ERROR: the remote shell (see above).
[blacklab.aps.anl.gov:14233] ERROR: The daemon exited unexpectedly with status 1.
[blacklab.aps.anl.gov:14233] ERROR: A daemon on node puppy2 failed to start as expected.
[blacklab.aps.anl.gov:14233] ERROR: There may be more information available from
[blacklab.aps.anl.gov:14233] ERROR: the remote shell (see above).
[blacklab.aps.anl.gov:14233] ERROR: The daemon exited unexpectedly with status 1.
[blacklab.aps.anl.gov:14233] ERROR: A daemon on node puppy4 failed to start as expected.
[blacklab.aps.anl.gov:14233] ERROR: There may be more information available from
[blacklab.aps.anl.gov:14233] ERROR: the remote shell (see above).
[blacklab.aps.anl.gov:14233] ERROR: The daemon exited unexpectedly with status 1.
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[blacklab.aps.anl.gov:14233] ERROR: A daemon on node puppy3 failed to start as expected.
[blacklab.aps.anl.gov:14233] ERROR: There may be more information available from
[blacklab.aps.anl.gov:14233] ERROR: the remote shell (see above).
[blacklab.aps.anl.gov:14233] ERROR: The daemon exited unexpectedly with status 1.
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned
value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
44 blacklab.aps.anl.gov:openmpitest>

If I do the following with --prefix, it works OK:

44 blacklab.aps.anl.gov:openmpitest>mpirun --prefix /clhome/EVANS/openmpi -n 4 helloWorld
Hello World from process 0 of 4 on puppy1.cluster
Hello World from process 1 of 4 on puppy2.cluster
Hello World from process 2 of 4 on puppy3.cluster
Hello World from process 3 of 4 on puppy4.cluster
45 blacklab.aps.anl.gov:openmpitest>

I don't know enough about OpenMPI and ORTE to know what might be causing
this behavior.

BTW by changing my .login to point to the MPICH installation, I can get MPICH, which is MPI-1, like OpenMPI, to work. I had never used it before,
so I am about in the same position as I am with OpenMPI.

The above is independent of Eclipse. I did check the things you mentioned in the Eclipse preferences, and they are as they ought to be. I notice there is an MPICH-2 alternative to ORTE. Does that work?

I work at Argonne, and we will be shutting down until Jan 2. I would like
to get this working today, if possible.

BTW, regarding the TODO list: a general way to add arguments to mpiexec would seem to be the way to go (like the Java launcher and the JVM, as I mentioned), rather than just implementing --prefix.

Thanks for your help,

       -Ken

-----Original Message-----
From: ptp-user-bounces@xxxxxxxxxxx [mailto:ptp-user-bounces@xxxxxxxxxxx ] On
Behalf Of Greg Watson
Sent: Friday, December 21, 2007 10:09 AM
To: PTP User list
Subject: Re: [ptp-user] PTP
LauncherFailswithJava.lang.reflect.InvocationTargetException


On Dec 20, 2007, at 8:06 PM, Kenneth Evans wrote:

Greg,

I'm not sure why you need to use --prefix at all? Did you specify
a prefix when you ran configure on OpenMPI? That should set up
everything correctly so you should just need to use mpirun.

I think that's a different prefix.

...


I apparently need it to prefix the PATH and LD_LIBRARY_PATH on the
remote host.

What other arguments do you need?

(--prefix is one ;-)  In the link above, there are other command-
line arguments to mpiexec, as there are with most other
implementations.  In the Eclipse Java launcher, for example, you
have the option to enter JVM arguments _and_ program arguments.

Ok, so it looks like this prefix is to support a different install
location on remote nodes. Unfortunately, PTP 1.1 does not support
this. I'll add it to the PTP 2.0 TODO list. As a workaround, have you
tried setting PATH and LD_LIBRARY_PATH and then running mpiexec
without --prefix?



Please try running the ptp_orte_proxy program manually and let me
know what output you see. I suspect you have a LD_LIBRARY_PATH
problem.


---------------------------------------------------------------------------------------
42 blacklab.aps.anl.gov:EVANS>source ~/bin/setOpenMPI
43 blacklab.aps.anl.gov:EVANS>echo $PATH
.:/clhome/EVANS/bin:/clhome/aps_tools/gcc-4.2/bin:/clhome/EVANS/openmpi/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
44 blacklab.aps.anl.gov:EVANS>echo $LD_LIBRARY_PATH
/clhome/aps_tools/gcc-4.2/lib:/clhome/aps_tools/gcc-4.2/lib64:/clhome/EVANS/openmpi/lib
45 blacklab.aps.anl.gov:EVANS>/clhome/EVANS/eclipsePlugins/eclipse/plugins/org.eclipse.ptp.linux.x86_64_1.1.0/bin/ptp_orte_proxy
proxy_svr_connect returned.


---------------------------------------------------------------------------------------


Here are a couple of things to try:

1. It looks like you're using PTP 1.1.0. There is a bug fix release
(1.1.1) available from http://www.eclipse.org/ptp/downloads.php. I
would recommend installing it just in case.

2. Click on the PTP preferences (from the main Eclipse preferences).
Make sure the control and monitoring system choices are set to Open
Runtime Environment (ORTE). Open the PTP preferences and click on Open
RTE. Make sure the ORTE proxy service file path is:
/clhome/EVANS/eclipsePlugins/eclipse/plugins/org.eclipse.ptp.linux.x86_64_1.1.0/bin/ptp_orte_proxy
If you need to change any of these, quit out of
Eclipse and restart.

If you still have problems, please send me all the console output again.

Greg



_______________________________________________
ptp-user mailing list
ptp-user@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-user
