Re: [ptp-user] Debugging with an interactive job

Hi Dave,

Unfortunately, I don't think you can do it this way. As you discovered, when you request an interactive session using PBS, you get a new shell on one of the nodes. You then need to run the MPI launch command from that node so that it can query PBS for the nodes to run on. In the current version of PTP, the new PBS-Generic-Interactive resource manager supports this type of launch.

I added support for debugging using the new RM framework in 5.0.2, and it seems to work OK for direct Open MPI launches (i.e., without PBS in the way). What I haven't tested yet is the combination of a PBS interactive launch and debugging.

For the direct Open MPI case, I'm effectively doing this:

1. Launching the sdm server processes using the mpirun command
2. Using the ompi-ps command to determine which node each task is running on
3. Creating a routing file using the information from #2
4. Launching the sdm frontend
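
Step 3 above can be sketched roughly as follows. This is only an illustration, not the logic of generate_routing_table.pl: the routing-file layout (an entry count followed by one "<id> <host> <port>" line per task), the function name format_routing_table, and the base port 50000 are all assumptions made for the example. The task-to-host mapping is taken as already extracted from the ompi-ps output, since that output format is not reproduced here.

```python
def format_routing_table(task_hosts, base_port=50000):
    """Return routing-table text for a {task_id: hostname} mapping.

    Assumed layout (hypothetical, not confirmed against PTP):
      line 1:        number of entries
      following:     "<task_id> <hostname> <port>", one per task
    The port is simply base_port + task_id, also an assumption.
    """
    lines = [str(len(task_hosts))]
    for task_id in sorted(task_hosts):
        port = base_port + task_id
        lines.append(f"{task_id} {task_hosts[task_id]} {port}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder host names; in practice these would come from ompi-ps.
    example = {0: "node-a", 1: "node-b", 2: "node-c", 3: "node-d"}
    print(format_routing_table(example), end="")
```

Keeping the formatting separate from however the node information is gathered means the same function could be reused if the node list comes from somewhere other than ompi-ps.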

For the interactive PBS/Open MPI case, the procedure will need to be:

1. Request an interactive partition using 'qsub -I'
2. When the partition is allocated, launch the sdm server processes using the mpirun command 
3. Create a routing file somehow
4. Launch the sdm frontend

Steps 1, 2, and 4 are no problem. Step 3 is the unknown at the moment. I need to know whether the ompi-ps command can be used to obtain the node information in the same manner as in the direct Open MPI case, or whether we need to use something else. In particular, does the ompi-ps command need to be run in the same shell created by 'qsub -I', or can it be run from any shell on the system? I don't currently have access to a machine with everything set up correctly to test this.
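
If ompi-ps turns out not to be usable here, one untested fallback would be to read the node list that PBS/Torque exposes inside the job through the PBS_NODEFILE environment variable (a file with one hostname per allocated slot). The sketch below guesses a rank-to-host mapping from that file. The function names are hypothetical, and the guess assumes that mpirun places ranks in node-file order, which would need to be verified on a real system before relying on it.

```python
import os
from collections import OrderedDict


def slots_per_host(nodefile_path):
    """Return an ordered {hostname: slot_count} read from a PBS node file."""
    counts = OrderedDict()
    with open(nodefile_path) as f:
        for line in f:
            host = line.strip()
            if host:
                counts[host] = counts.get(host, 0) + 1
    return counts


def guess_task_hosts(nodefile_path, per_node=False):
    """Guess a {rank: hostname} mapping from the node file.

    per_node=True  : one task per distinct host (like 'mpiexec -pernode').
    per_node=False : one task per slot, filling each host in turn.
    Assumption: ranks are assigned in node-file order.
    """
    counts = slots_per_host(nodefile_path)
    if per_node:
        hosts = list(counts)
    else:
        hosts = [h for h, n in counts.items() for _ in range(n)]
    return dict(enumerate(hosts))


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a PBS/Torque job.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        for rank, host in guess_task_hosts(path, per_node=True).items():
            print(rank, host)
    else:
        print("PBS_NODEFILE is not set; run this inside the interactive job.")
```

Because PBS_NODEFILE is only defined in the shell created for the job, this approach would have to run there, which is the same constraint being asked about for ompi-ps.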

You can see how the direct Open MPI case generates the routing file by looking at .eclipsesettings/rms/OPENMPI/generate_routing_table.pl.

Regards,
Greg

On Sep 28, 2011, at 2:26 PM, David E Hudak wrote:

> Hi All,
> 
> We are rolling out a new service called OSC OnDemand, which uses dedicated nodes to provide a remote desktop via VNC to users.  These viz nodes are similar in configuration to the login nodes of a cluster (they're not available for scheduling by the batch system), except that they have graphics cards and are provisioned to support a number of interactive desktop sessions.
> 
> I have been experimenting with Eclipse on these remote desktops.  I am considering providing Eclipse via OnDemand as an alternative to forcing every user to configure their personal systems for Eclipse PTP (although I certainly plan to support that case).
> 
> To start off, I would like to give the use case I am trying to support, and then get to my questions:
> 	1.  User connects to a remote desktop via OSC OnDemand.
> 	2.  User launches eclipse on the remote desktop.  Opens his/her project and gets it to build.  Now, the user wants to debug.
> 	3.  User opens an xterm and submits an interactive job to reserve a set of compute nodes for debugging (WLOG, let's say 4 nodes).
> 	4.  The interactive job starts, effectively giving the user an on-demand cluster for repeatedly running/debugging without going through the queue each time.
> 	5.  User configures a resource manager to talk to the on-demand cluster.
> 	6.  User creates a run configuration for the project using the new resource manager.
> 	7.  The user can debug his project in Eclipse via SDM.
> 
> So, to see how close I could get to supporting this use case, I grabbed an example "Hello World" that prints the node name in addition to the rank and size:
> http://mpi.deino.net/mpi_functions/MPI_Get_processor_name.html 
> 
> At the command line, I got an interactive job and ran the code like this:
> dhudak@opt2650 575%> qsub -l walltime=3:00:00,nodes=4:ppn=4:olddual -X -I 
> qsub: waiting for job 6459764.opt-batch.osc.edu to start
> qsub: job 6459764.opt-batch.osc.edu ready
> 
> dhudak@opt0492 573%> cd workspace/mpiHello
> dhudak@opt0492 577%> mpiexec -pernode ./hello
> Hello, world.  I am 0 of 4 on opt0492.ten.osc.edu
> Hello, world.  I am 1 of 4 on opt0619.ten.osc.edu
> Hello, world.  I am 3 of 4 on opt0003.ten.osc.edu
> Hello, world.  I am 2 of 4 on opt0588.ten.osc.edu
> 
> Now, I went back to Eclipse and created an OpenMPI resource manager for the mother superior node (opt0492).  The OpenMPI resource manager has no idea that the other nodes are allocated to the job (they do not show up in the Parallel Runtime machines list).  I hoped (naively) that the net effect of starting the job would be similar to running mpiexec at the shell, so I ran it anyway, hoping that mpiexec would "do the right thing" and talk to Torque to launch one process per node.  Instead, it launched all 4 processes on the one node:
> 
> Hello, world.  I am 2 of 4 on opt0492.ten.osc.edu
> Hello, world.  I am 3 of 4 on opt0492.ten.osc.edu
> Hello, world.  I am 1 of 4 on opt0492.ten.osc.edu
> Hello, world.  I am 0 of 4 on opt0492.ten.osc.edu
> 
> Any recommendations on the "right" way to do this?  I am not sure of the support SDM has with other resource manager types (like MPICH or Remote Launch).  And, I am dealing with the batch system external to eclipse, so I don't think a resource manager that talks to a batch system (like PBS) is appropriate.  
> 
> Thanks,
> Dave
> ---
> David E. Hudak, Ph.D.          dhudak@xxxxxxx
> Program Director, HPC Engineering
> Ohio Supercomputer Center
> http://www.osc.edu
> 
> 
> _______________________________________________
> ptp-user mailing list
> ptp-user@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/ptp-user


