RE: [ptp-user] PTP Launcher Fails with Java.lang.reflect.InvocationTargetException

Greg,

I changed my .login to have PATH and LD_LIBRARY_PATH that point to OpenMPI.
This should eliminate all the MPICH2 stuff.  However, I can't get OpenMPI to
work without --prefix.  I don't understand why.  It says it can't find orted
(see below).  However, if I ssh to the nodes:

1. "which orted" is successful
2. I can start it by typing "orted"
3. PATH and LD_LIBRARY_PATH are the same as in my .login and point to
OpenMPI.

So it seems to be properly on the nodes.

This is what I get:

43 blacklab.aps.anl.gov:openmpitest>mpirun -n 4 helloWorld
orted: Command not found.
orted: Command not found.
orted: Command not found.
orted: Command not found.
[blacklab.aps.anl.gov:14233] ERROR: A daemon on node puppy1 failed to start
as expected.
[blacklab.aps.anl.gov:14233] ERROR: There may be more information available
from
[blacklab.aps.anl.gov:14233] ERROR: the remote shell (see above).
[blacklab.aps.anl.gov:14233] ERROR: The daemon exited unexpectedly with
status 1.
[blacklab.aps.anl.gov:14233] ERROR: A daemon on node puppy2 failed to start
as expected.
[blacklab.aps.anl.gov:14233] ERROR: There may be more information available
from
[blacklab.aps.anl.gov:14233] ERROR: the remote shell (see above).
[blacklab.aps.anl.gov:14233] ERROR: The daemon exited unexpectedly with
status 1.
[blacklab.aps.anl.gov:14233] ERROR: A daemon on node puppy4 failed to start
as expected.
[blacklab.aps.anl.gov:14233] ERROR: There may be more information available
from
[blacklab.aps.anl.gov:14233] ERROR: the remote shell (see above).
[blacklab.aps.anl.gov:14233] ERROR: The daemon exited unexpectedly with
status 1.
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 275
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file
pls_rsh_module.c at line 1166
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file
errmgr_hnp.c at line 90
[blacklab.aps.anl.gov:14233] ERROR: A daemon on node puppy3 failed to start
as expected.
[blacklab.aps.anl.gov:14233] ERROR: There may be more information available
from
[blacklab.aps.anl.gov:14233] ERROR: the remote shell (see above).
[blacklab.aps.anl.gov:14233] ERROR: The daemon exited unexpectedly with
status 1.
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file
base/pls_base_orted_cmds.c at line 188
[blacklab.aps.anl.gov:14233] [0,0,0] ORTE_ERROR_LOG: Timeout in file
pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned
value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
44 blacklab.aps.anl.gov:openmpitest> 

If I do the following with --prefix, it works OK:

44 blacklab.aps.anl.gov:openmpitest>mpirun --prefix /clhome/EVANS/openmpi -n 4 helloWorld
Hello World from process 0 of 4 on puppy1.cluster
Hello World from process 1 of 4 on puppy2.cluster
Hello World from process 2 of 4 on puppy3.cluster
Hello World from process 3 of 4 on puppy4.cluster
45 blacklab.aps.anl.gov:openmpitest>

I don't know enough about OpenMPI and ORTE to know what might be causing
this behavior.
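
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: mpirun's rsh/ssh launcher starts orted through a non-interactive
remote shell, and csh/tcsh reads .login only for login shells, so PATH
settings made there are not seen by a command run as "ssh host command".
The Python sketch below runs that kind of check. The helper names are
hypothetical; the host names are the ones from the log above.

```python
import subprocess


def noninteractive_which(host, program="orted"):
    """Build the argv that asks a non-interactive remote shell to locate `program`."""
    # The command is passed on the ssh command line, so the remote side runs
    # it without an interactive login session (unlike typing it after ssh-ing in).
    return ["ssh", host, "which", program]


def check_nodes(hosts, program="orted"):
    """Run the check on each host; return {host: (exit_status, combined_output)}."""
    results = {}
    for host in hosts:
        proc = subprocess.run(
            noninteractive_which(host, program),
            capture_output=True,
            text=True,
        )
        results[host] = (proc.returncode, (proc.stdout + proc.stderr).strip())
    return results


# Example use (needs ssh access to the nodes):
#   for host, (status, output) in check_nodes(["puppy1", "puppy2"]).items():
#       print(host, status, output)
```

If the output here differs from what "which orted" prints in an interactive
session on the same node, the PATH setting probably lives in a startup file
that only login shells read.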

BTW by changing my .login to point to the MPICH installation, I can get
MPICH, which is MPI-1, like OpenMPI, to work.  I had never used it before,
so I am about in the same position as I am with OpenMPI.

The above is independent of Eclipse.  I did check the things you mentioned
in the Eclipse preferences, and they are as they ought to be.  I notice
there is an MPICH-2 alternative to ORTE.  Does that work?

I work at Argonne, and we will be shutting down until Jan 2.  I would like
to get this working today, if possible.

BTW, in regard to the TODO list: it would seem that a general way to add
arguments to mpiexec would be the way to go (like the Java launcher and the
JVM, as I mentioned) rather than just implementing --prefix.
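
A minimal sketch of that idea in Python, assuming the launcher keeps one
free-form field for launcher arguments and another for program arguments,
as the Eclipse Java launcher does for JVM and program arguments. The
function name and parameters are hypothetical and are not part of PTP.

```python
import shlex


def build_launch_command(launcher, launcher_args, num_procs, program, program_args=""):
    """Assemble an argv list from separate launcher and program argument strings."""
    argv = [launcher]
    # Free-form launcher options typed by the user, e.g. "--prefix /some/dir".
    argv += shlex.split(launcher_args)
    # Process count and the program to run.
    argv += ["-n", str(num_procs), program]
    # Arguments that belong to the program itself, not to the launcher.
    argv += shlex.split(program_args)
    return argv


# Example: reproduces the working command line shown earlier in this message.
cmd = build_launch_command("mpirun", "--prefix /clhome/EVANS/openmpi", 4, "helloWorld")
print(" ".join(cmd))
```

Keeping the two argument strings separate means a new launcher option needs
no change to the launcher code itself.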

Thanks for your help,

        -Ken

-----Original Message-----
From: ptp-user-bounces@xxxxxxxxxxx [mailto:ptp-user-bounces@xxxxxxxxxxx] On
Behalf Of Greg Watson
Sent: Friday, December 21, 2007 10:09 AM
To: PTP User list
Subject: Re: [ptp-user] PTP Launcher Fails with Java.lang.reflect.InvocationTargetException


On Dec 20, 2007, at 8:06 PM, Kenneth Evans wrote:

> Greg,
>
> >> I'm not sure why you need to use --prefix at all? Did you specify  
> a prefix when you ran configure on OpenMPI? That should set up  
> everything correctly so you should just need to use mpirun.
>
> I think that's a different prefix.
>
...
>

> I apparently need it to prefix the PATH and LD_LIBRARY_PATH on the  
> remote host.
>
> >> What other arguments do you need?
>
> (--prefix is one ;-)  In the link above, there are other command- 
> line arguments to mpiexec, as there are with most other  
> implementations.  In the Eclipse Java launcher, for example, you  
> have the option to enter JVM arguments _and_ program arguments.

Ok, so it looks like this prefix is to support a different install  
location on remote nodes. Unfortunately, PTP 1.1 does not support  
this. I'll add it to the PTP 2.0 TODO list. As a workaround, have you
tried setting PATH and LD_LIBRARY_PATH and then running mpiexec
without --prefix?
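
For reference, the effect described above (putting the install's
directories at the front of PATH and LD_LIBRARY_PATH) can be sketched in
Python as follows. This is an illustration only: the function name is
hypothetical, and the bin and lib subdirectories are assumed from the PATH
and LD_LIBRARY_PATH values quoted further down.

```python
import os


def env_with_prefix(prefix, env=None):
    """Return a copy of `env` with <prefix>/bin and <prefix>/lib prepended."""
    base = dict(os.environ if env is None else env)

    def prepend(var, directory):
        # Put `directory` first so it wins over any other install on the path.
        old = base.get(var, "")
        base[var] = directory + (":" + old if old else "")

    prepend("PATH", prefix + "/bin")
    prepend("LD_LIBRARY_PATH", prefix + "/lib")
    return base


# Example with a small starting environment:
env = env_with_prefix("/clhome/EVANS/openmpi", {"PATH": "/usr/bin"})
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

An environment built this way applies only to processes started locally
with it; the remote nodes still need the same settings from their own shell
startup files.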

>
>
> >> Please try running the ptp_orte_proxy program manually and let me  
> know what output you see. I suspect you have a LD_LIBRARY_PATH  
> problem.
>
>
> ---------------------------------------------------------------------------------------
> 42 blacklab.aps.anl.gov:EVANS>source ~/bin/setOpenMPI
> 43 blacklab.aps.anl.gov:EVANS>echo $PATH
> .:/clhome/EVANS/bin:/clhome/aps_tools/gcc-4.2/bin:/clhome/EVANS/openmpi/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
> 44 blacklab.aps.anl.gov:EVANS>echo $LD_LIBRARY_PATH
> /clhome/aps_tools/gcc-4.2/lib:/clhome/aps_tools/gcc-4.2/lib64:/clhome/EVANS/openmpi/lib
> 45 blacklab.aps.anl.gov:EVANS>/clhome/EVANS/eclipsePlugins/eclipse/plugins/org.eclipse.ptp.linux.x86_64_1.1.0/bin/ptp_orte_proxy
> proxy_svr_connect returned.
>
>
> ---------------------------------------------------------------------------------------
>

Here are a couple of things to try:

1. It looks like you're using PTP 1.1.0. There is a bug fix release  
(1.1.1) available from http://www.eclipse.org/ptp/downloads.php. I  
would recommend installing it just in case.

2. Open the PTP preferences (from the main Eclipse preferences).
Make sure the control and monitoring system choices are set to Open
Runtime Environment (ORTE). Then click on Open RTE under the PTP
preferences and make sure the ORTE proxy service file path is
/clhome/EVANS/eclipsePlugins/eclipse/plugins/org.eclipse.ptp.linux.x86_64_1.1.0/bin/ptp_orte_proxy.
If you need to change any of these, quit out of Eclipse and restart.

If you still have problems, please send me all the console output again.

Greg



_______________________________________________
ptp-user mailing list
ptp-user@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-user