[ptp-user] Debugging with an interactive job

Hi All,

We are rolling out a new service called OSC OnDemand, which uses dedicated nodes to provide a remote desktop via VNC to users.  These viz nodes are similar in configuration to the login nodes of a cluster (they're not available for scheduling by the batch system), except that they have graphics cards and are provisioned to support a number of interactive desktop sessions.

I have been experimenting with Eclipse on these remote desktops.  I am considering providing Eclipse via OnDemand as an alternative to forcing every user to configure their personal systems for Eclipse PTP (although I certainly plan to support that case).

To start off, I would like to give the use case I am trying to support, and then get to my questions:
	1.  User connects to a remote desktop via OSC OnDemand.
	2.  User launches Eclipse on the remote desktop, opens his/her project, and gets it to build.  Now, the user wants to debug.
	3.  User opens an xterm and submits an interactive job to reserve a set of compute nodes for debugging (WLOG, let's say 4 nodes).
	4.  The interactive job starts, effectively giving the user an on-demand cluster for repeatedly running/debugging without going through the queue each time.
	5.  User configures a resource manager to talk to the on-demand cluster.
	6.  User creates a run configuration for the project using the new resource manager.
	7.  User debugs his/her project in Eclipse via SDM.

So, to see how close I could get to supporting this use case, I grabbed an example "Hello World" that prints the node name in addition to the rank and size:
http://mpi.deino.net/mpi_functions/MPI_Get_processor_name.html 

At the command line, I got an interactive job and ran the code like this:
dhudak@opt2650 575%> qsub -l walltime=3:00:00,nodes=4:ppn=4:olddual -X -I 
qsub: waiting for job 6459764.opt-batch.osc.edu to start
qsub: job 6459764.opt-batch.osc.edu ready

dhudak@opt0492 573%> cd workspace/mpiHello
dhudak@opt0492 577%> mpiexec -pernode ./hello
Hello, world.  I am 0 of 4 on opt0492.ten.osc.edu
Hello, world.  I am 1 of 4 on opt0619.ten.osc.edu
Hello, world.  I am 3 of 4 on opt0003.ten.osc.edu
Hello, world.  I am 2 of 4 on opt0588.ten.osc.edu

Next, I went back to Eclipse and created an OpenMPI resource manager for the mother superior node (opt0492).  The OpenMPI resource manager has no idea that the other nodes are allocated to the job (they do not show up in the Parallel Runtime machines list).  I hoped (naively) that launching the job from Eclipse would have the same net effect as running mpiexec at the shell, so I ran it anyway, expecting mpiexec to "do the right thing" and talk to Torque to launch one process per node.  Instead, it launched all 4 processes on the one node:

Hello, world.  I am 2 of 4 on opt0492.ten.osc.edu

Hello, world.  I am 3 of 4 on opt0492.ten.osc.edu

Hello, world.  I am 1 of 4 on opt0492.ten.osc.edu

Hello, world.  I am 0 of 4 on opt0492.ten.osc.edu
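
One check that may help narrow this down: Torque sets PBS_NODEFILE in the environment of the job's own shell, pointing at a file with one line per allocated processor slot (so nodes=4:ppn=4 should give 16 lines).  A process started by Eclipse through a separate connection to opt0492 would presumably not inherit that variable, which might be why mpiexec sees only the one node.  The sketch below is an assumption about how to test that, not a confirmed diagnosis; unique_hosts is a hypothetical helper name, not part of Torque or Open MPI.

```shell
#!/bin/sh
# Sketch: report whether this shell is inside the Torque job environment
# and, if so, list the distinct hosts allocated to the job.

# unique_hosts FILE: print each host in FILE once, in first-seen order.
# (hypothetical helper, introduced only for this example)
unique_hosts() {
    awk '!seen[$0]++' "$1"
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Job environment found; allocated hosts:"
    unique_hosts "$PBS_NODEFILE"
else
    echo "PBS_NODEFILE is not set: this shell is outside the job environment"
fi
```

Running this once in the xterm that holds the interactive job and once from whatever shell Eclipse uses to launch on opt0492 would show whether the two environments differ.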

Any recommendations on the "right" way to do this?  I am not sure how well SDM supports other resource manager types (like MPICH or Remote Launch).  Also, I am handling the batch system outside of Eclipse, so I don't think a resource manager that talks to a batch system (like PBS) is appropriate.

Thanks,
Dave
---
David E. Hudak, Ph.D.          dhudak@xxxxxxx
Program Director, HPC Engineering
Ohio Supercomputer Center
http://www.osc.edu










