Re: [ptp-user] Problem with PE proxy


Brett
The libllapi.so library is loaded with dlopen(), so it won't show up in ldd output. It works this way primarily because the proxy needs to run on systems where LoadLeveler is not installed. The proxy invocation options dialog has a field, 'Alternate LoadLeveler library path', where you can specify an alternate path to the library; in your case that would be /opt/ibmll/LoadL/scheduler/full/lib.

Alternatively, you could modify the Linux libpath[] array in the proxy source, org.eclipse.ptp.rm.ibm.pe.proxy/src/ptp_ibmpe_proxy.c, to add /opt/ibmll/LoadL/scheduler/full/lib as an additional directory to search for libllapi.
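
To illustrate the search-then-dlopen approach described above, here is a minimal Python sketch. It is not the proxy's code (which is C); the function names find_library and load_llapi are hypothetical, and the directory list is assembled from the paths mentioned in this thread. An alternate directory, if given, is tried first.

```python
import ctypes
import os

# Directories taken from this thread: the first two appear in the proxy's
# error log, the third is the location reported on the Linux system.
CANDIDATE_DIRS = [
    "/opt/ibmll/LoadL/full/lib",
    "/opt/ibmll/LoadL/so/lib",
    "/opt/ibmll/LoadL/scheduler/full/lib",
]


def find_library(name, dirs, alternate=None):
    """Return the first existing path to `name`, or None if not found.

    `alternate` plays the role of the 'Alternate LoadLeveler library path'
    field and is searched before the built-in list (an assumption).
    """
    search = ([alternate] if alternate else []) + list(dirs)
    for directory in search:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


def load_llapi(alternate=None):
    """Load libllapi.so at run time instead of linking against it."""
    path = find_library("libllapi.so", CANDIDATE_DIRS, alternate)
    if path is None:
        raise FileNotFoundError("No LoadLeveler shared library found")
    # ctypes.CDLL uses dlopen() on POSIX systems, so the library never
    # appears as a link-time dependency of the calling program.
    return ctypes.CDLL(path)
```

Because the library is resolved at run time, the program still starts on a machine without LoadLeveler and can report the problem itself.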

You also need to be sure the proxy is compiled as a 64-bit executable. On my system I did this by first running 'export CFLAGS=-m64', then running make clean, configure and make in each of the proxy directories. Running the BUILD script should accomplish the same thing.
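
One way to confirm the result on Linux is to look at the ELF header of the built binary. The sketch below is an illustration only; elf_class is a hypothetical helper, and it applies to ELF files (Linux), not to AIX executables. It relies on the ELF layout, where the fifth byte of the file is 1 for 32-bit and 2 for 64-bit objects.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, based on the EI_CLASS byte."""
    with open(path, "rb") as f:
        ident = f.read(5)
    # Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    bits = {1: 32, 2: 64}.get(ident[4])
    if bits is None:
        raise ValueError(f"{path} has an unknown ELF class {ident[4]}")
    return bits
```

For example, elf_class("proxy-bin/ptp_ibmpe_proxy") should return 64 after a build with CFLAGS=-m64; the path here is only a placeholder for wherever the proxy was installed.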

I should add the new library location to the internal library list in the proxy. Can you file a Bugzilla report for that too?
Dave


From: Brett Bode <bbode@xxxxxxxxxxxxx>
To: PTP User list <ptp-user@xxxxxxxxxxx>
Date: 08/30/2010 10:08 PM
Subject: Re: [ptp-user] Problem with PE proxy
Sent by: ptp-user-bounces@xxxxxxxxxxx





Dave,
  That helps a lot. I completely missed the option to use LL in the PE proxy. I see that it adds several important LL-related options. Your suggestions do indeed allow me to get a job running on our AIX box. I see there is now an option for MP_LLFILE too, which is what I was really looking for. MP_CMDFILE is something a bit different.

I tried the same on our Linux on Power7 node that is running the latest PE and LL and ran into the error below:
Launch command: [/home/bbode/debug-6-10/proxy-bin/ptp_ibmpe_proxy, --proxy=tcp, --host=localhost, --port=64546, --debug=1, --useloadleveler, --multicluster=d, --node_polling_min=30, --node_polling_max=120, --job_polling=30, --trace=None]
IBM PE@BlueDropProxyRuntimeClient: Waiting on accept.
IBM PE@BlueDrop: 08/30 21:01:57 T(256) Error: Search failure: "stat" of LoadLeveler shared library /opt/ibmll/LoadL/full/lib//libllapi.so returned errno=2.
IBM PE@BlueDrop: 08/30 21:01:57 T(256) Error: Search failure: "stat" of LoadLeveler shared library /opt/ibmll/LoadL/so/lib//libllapi.so returned errno=2.
IBM PE@BlueDrop: 08/30 21:01:57 T(256) Fatal: No LoadLeveler shared library found - quitting...
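
As a reading aid for the log above: errno 2 is ENOENT, meaning the file simply does not exist at that path, and the doubled slash in the path is harmless. A small Python sketch (the helper names are hypothetical) that decodes both:

```python
import errno
import os
import posixpath


def describe_errno(code):
    """Return (symbolic name, system message) for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def normalized(path):
    """Collapse redundant slashes the way the kernel effectively does."""
    return posixpath.normpath(path)


name, message = describe_errno(2)
print(name, "-", message)
print(normalized("/opt/ibmll/LoadL/full/lib//libllapi.so"))
```

So the failure is about the search directories, not about permissions or the path syntax.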

Do you know how the proxy is loading the llapi shared library? ldd doesn't list it as a regular shared object. On our system the object is in:
bbode@bd-login:~> locate libllapi.so
/opt/ibmll/LoadL/scheduler/full/lib/libllapi.so
/usr/lib64/libllapi.so
/usr/lib64/libllapi.so.1
/usr/lib64/libllapi.so.1.0.0

Note the "scheduler" component in the path, which I assume is new in the latest version of LL as a result of the separation of the resource manager and scheduler in the LL product.

Thanks!
Brett
On Aug 30, 2010, at 8:47 PM, Dave Wootton wrote:

>
> Brett
> I solved the problems that were preventing me from running interactive PE jobs using LoadLeveler and can now run a simple two task MPI application interactively as follows:
> 1) Identify the LoadLeveler job class defined for interactive jobs. On my setup it's inter_class.
> 2) Make sure the PE resource manager in PTP is set up to use LoadLeveler by selecting the 'Use LoadLeveler' checkbox in the resource manager options dialog of the resource manager wizard.
> 3) Start the resource manager.
> 4) Create a run configuration and change the settings for the following fields in the tabbed widget in the resources pane of the run configuration:
> Tasks Tab: Number of Tasks: number of application tasks. I specified 2.
> Tasks Tab: Number of Nodes: number of nodes to use. I specified 1.
> Tasks Tab: Tasks per Node: number of application tasks / number of nodes. I used 2.
> Nodes Tab: Resource Pool: LoadLeveler interactive job class. I used inter_class.
>
> This should work and is equivalent to setting the following poe environment variables and then running the application manually:
> MP_RESD=yes
> MP_PROCS=2
> MP_NODES=1
> MP_RMPOOL=inter_class
> MP_TASKS_PER_NODE=2
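
The mapping from the run configuration fields to poe environment variables can be sketched in Python as follows. This is an illustration, not PTP code: poe_env is a hypothetical helper, and MP_NODES is included on the assumption that it corresponds to the 'Number of Nodes' field.

```python
def poe_env(tasks, nodes, pool):
    """Build the MP_* settings for an interactive LoadLeveler-managed run."""
    if nodes < 1 or tasks < 1:
        raise ValueError("tasks and nodes must be positive")
    if tasks % nodes != 0:
        raise ValueError("tasks must divide evenly across nodes")
    return {
        "MP_RESD": "yes",                          # let LoadLeveler allocate nodes
        "MP_PROCS": str(tasks),                    # Number of Tasks
        "MP_NODES": str(nodes),                    # Number of Nodes (assumed mapping)
        "MP_TASKS_PER_NODE": str(tasks // nodes),  # Tasks per Node
        "MP_RMPOOL": pool,                         # Resource Pool
    }


settings = poe_env(tasks=2, nodes=1, pool="inter_class")
for name, value in settings.items():
    print(f"{name}={value}")
```

To run manually with these settings, the dictionary could be merged into os.environ and passed as the env argument of subprocess.run when invoking poe with the application name.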
>
> Let me know if that works for you as a starting point.
>    
> Dave
>
>
> From:                 Dave Wootton
> To:                 PTP User list
> Date:                 08/30/2010 03:52 PM
> Subject:                 Re: [ptp-user] Problem with PE proxy
> Sent by:                 ptp-user-bounces@xxxxxxxxxxx
> Please respond to PTP User list
>
> Brett
>
> The text field for advanced mode is supposed to accept a pathname to a file that contains a list of PE environment variable settings, for instance
> MP_PROCS=2
> MP_HOSTFILE=/tmp/hostfile
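
For illustration, a file in this NAME=VALUE form could be read as in the Python sketch below. The helper parse_pe_env_file is hypothetical, and the handling of blank lines and malformed lines is an assumption rather than the proxy's documented behavior.

```python
def parse_pe_env_file(text):
    """Parse NAME=VALUE lines into a dict, one setting per line."""
    env = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines (assumed to be harmless)
        name, sep, value = line.partition("=")
        if not sep or not name:
            raise ValueError(f"line {lineno}: expected NAME=VALUE, got {raw!r}")
        env[name] = value
    return env


sample = "MP_PROCS=2\nMP_HOSTFILE=/tmp/hostfile\n"
print(parse_pe_env_file(sample))
```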
>
> At the moment this is broken: the browse button navigates to the file on the remote system, but the Eclipse code tries to find it on the local system. Please file a Bugzilla bug for this.
>
> If you use basic mode, the PE proxy is supposed to do what you want: there is a field in the 'Nodes' tab of the resources pane where you can enter the pathname of a LoadLeveler command file. If you use it, make sure that any other PE option settings related to node selection or resource allocation are cleared so that you don't have conflicting PE environment variable settings. Alternatively, leave the command file field blank and use the other node allocation parameters to specify your setup. The idea is that you set up the run configuration as if you were invoking the poe process interactively after setting the correct MP_* environment variables.
>
> Unfortunately, my test system has a problem with its LoadLeveler setup that prevents me from running any LoadLeveler/PE jobs. If what I suggest above doesn't help, then I will need to get someone from our LoadLeveler team to help me sort this out.
>
> Dave
>
> From:                 Brett Bode <bbode@xxxxxxxxxxxxx>
> To:                 PTP User list <ptp-user@xxxxxxxxxxx>
> Date:                 08/27/2010 09:56 AM
> Subject:                 [ptp-user] Problem with PE proxy
> Sent by:                 ptp-user-bounces@xxxxxxxxxxx
>
>
>
>
>
> Hello,
>   I am attempting to use the PE proxy to run tasks remotely on a LOP system running PE and LL. This system is set up to discourage interactive jobs and thus appears to disallow poe invocations that specify a hostlist. I can run poe interactively via the command line using the -llfile keyword to specify an LL script file with a few LL commands as follows:
> #@ job_type = parallel
> #@ node_usage = not_shared
> #@ environment = COPY_ALL
> #@ tasks_per_node = 2
> #@ node = 1
> #@ wall_clock_limit = 0:15:00
> #@ queue
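
To make the structure of such a command file explicit, here is a small Python sketch that extracts the "#@" keyword lines. The helper parse_ll_keywords is hypothetical and handles only the simple one-keyword-per-line form shown above.

```python
LL_SCRIPT = """\
#@ job_type = parallel
#@ node_usage = not_shared
#@ environment = COPY_ALL
#@ tasks_per_node = 2
#@ node = 1
#@ wall_clock_limit = 0:15:00
#@ queue
"""


def parse_ll_keywords(text):
    """Return a dict of LoadLeveler keywords; bare keywords map to None."""
    keywords = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("#@"):
            continue  # not a LoadLeveler keyword line
        key, sep, value = line[2:].partition("=")
        keywords[key.strip()] = value.strip() if sep else None
    return keywords


print(parse_ll_keywords(LL_SCRIPT))
```

With the script above, tasks_per_node times node gives the two tasks that the run is expected to start.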
>
> The problem is I can't seem to figure out how to make this work via the PE proxy. I have tried various setups using the basic mode as well as using advanced mode. When I use advanced mode I have a simple file (on the remote system) containing a single PE keyword:
> MP_LLFILE=llfile
>
> This fails as well. Here is a debug trace from the Eclipse application that seems to indicate that it can't find the script file. However, the path specified is correct, so I am not sure why it can't locate the file. Alternatively, is there an "advanced" mode that simply uses an LL script file? By the way, I need to run interactively for other reasons, so I don't think the LL proxy will work for my situation.
>
> PE Environment setup file /home/bbode/WorkSpace/Test-Aug/script not found.
> SEND:[0000013d 0005:00000002:00000008 00000009:queueId=2 00000027:execPath=/home/bbode/WorkSpace/Test-Aug 00000014:debugStopInMain=true 00000024:env=LD_LIBRARY_PATH=/opt/ibmcmp/lib/ 0000001b:jobSubId=JOB_12829167182584 00000012:launchedByPTP=true 00000018:execName=mpi-comm-test.x 00000029:workingDir=/home/bbode/WorkSpace/Test-Aug] -> Worker-27
> RECEIVE:[000000aa ->  00dc:00000001:00000007 00000001:2 00000001:1 00000001:4 00000001:3 0000001b:jobSubId=JOB_12829167182584 0000001d:name=bbode.JOB_12829167182584 00000011:jobState=STARTING] -> Proxy Client Event Thread
> RECEIVE:[00000017 ->  0000:00000002:00000000] -> Proxy Client Event Thread
> RECEIVE:[000000be ->  00df:00000001:00000009 00000001:4 00000001:1 00000001:0 00000001:5 00000008:name=poe 00000014:processState=RUNNING 0000000f:processNodeId=0 0000000e:processIndex=0 00000010:processPID=25673] -> Proxy Client Event Thread
> RECEIVE:[000000b4 ->  00e9:00000001:00000005 00000001:4 00000001:1 00000001:0 00000001:1 00000067:processStderr=ERROR: 0031-121  Invalid combination of settings for MP_EUILIB, MP_HOSTFILE, and MP_RESD
> ] -> Proxy Client Event Thread
> RECEIVE:[00000054 ->  00e6:00000001:00000004 00000001:1 00000001:4 00000001:1 00000012:jobState=COMPLETED] -> Proxy Client Event Thread
> RECEIVE:[00000063 ->  00e9:00000001:00000005 00000001:4 00000001:1 00000001:0 00000001:1 00000016:processState=COMPLETED] -> Proxy Client Event Thread
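
The SEND line in the trace appears to use length-prefixed fields: a hex total length, a header of three hex numbers, then arguments each preceded by an 8-digit hex length. The Python sketch below decodes that layout. The layout and the field names (command, transaction, nargs) are inferred from this trace alone, not taken from the PTP protocol definition, and the RECEIVE lines carry an extra "->" that this sketch does not handle.

```python
def parse_proxy_message(msg):
    """Decode the text between the brackets of a SEND line.

    Assumed layout (inferred from the trace):
      LLLLLLLL CCCC:TTTTTTTT:NNNNNNNN {space}{8-hex length}:{argument} ...
    """
    length_hex, rest = msg.split(" ", 1)
    header = rest[:22]  # 4 + 1 + 8 + 1 + 8 characters
    command, transaction, nargs = (int(part, 16) for part in header.split(":"))
    args = []
    pos = 22
    while pos < len(rest):
        pos += 1                          # skip the separating space
        size = int(rest[pos:pos + 8], 16)
        pos += 9                          # 8 hex digits plus the colon
        args.append(rest[pos:pos + size])
        pos += size
    return {
        "length": int(length_hex, 16),
        "command": command,
        "transaction": transaction,
        "nargs": nargs,
        "args": args,
    }
```

Reading arguments by length rather than splitting on spaces matters because values such as processStderr contain spaces and newlines.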
>
>
> Thanks,
> Brett
> _______________________________________________
> ptp-user mailing list
> ptp-user@xxxxxxxxxxx
>
https://dev.eclipse.org/mailman/listinfo/ptp-user

_______________________________________________
ptp-user mailing list
ptp-user@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-user
