Re: [ptp-dev] Customizing ptp to support MPICH+SLURM

Hi,

Thanks for providing a very detailed description of your work with PTP. It's great to see people customizing PTP to suit their environments. Are you planning to update to PTP 2.0, or will you continue using 1.0? Would you be interested in contributing any of your changes back to the PTP community?

See other comments below....

Regards,

Greg

On Jul 27, 2007, at 6:10 AM, jiangyangtz wrote:

Hi folks,
 
About one year ago, we started to customize ptp 1.0 for our platform, an IA64 Linux cluster with the SLURM resource manager. By now, most of the customization work has been finished and our modified version of ptp works well. Here I will first describe some of the major work we have done and then share some experience with you.
 
The first step of our customization work was to implement the ptp runtime system on top of SLURM. In the beginning, we rewrote the runtime proxy using the SLURM commands (srun, sinfo, squeue, scancel, etc.) to launch jobs, query job and machine/node status, terminate/kill jobs, obtain job/process output, and so on. Because SLURM itself and its command output change rapidly, we eventually re-implemented the runtime proxy using the SLURM API. The API-based implementation is much more efficient than the one built on the command-line interface. To support the SLURM-based runtime proxy, we also modified the ptp GUI front end heavily, for example providing the necessary srun options to launch parallel jobs and adding more icons to represent job/node states. By now, the runtime proxy works well with the SLURM resource manager using the proxy protocol defined in ptp 1.0.
 
The second step was to rewrite the parallel debugger, SDM. Our platform uses MPICH2+SLURM, without OpenMPI. Every MPI program launched by SLURM must be linked with the PMI (process management interface) library, which SLURM implements to provide the process management API. However, if both the sdm server and the target MPI program (the debuggee) are linked with the PMI library, the debug session CANNOT be set up, because the PMI interface always fails in this case. The use of PMI forced us to rewrite the communication infrastructure of SDM using TCP sockets. In our modified version of SDM, all sdm servers are launched with the srun command, and in the initial phase, before doing anything else, each sdm server connects to the sdm client and sends it a registration message. The TCP connection remains alive throughout the lifetime of the debug session. All debug commands from the sdm client to the sdm servers, and all responses from the sdm servers back to the sdm client, are transferred over these established TCP connections. However, this flat communication structure is not very scalable; currently it can support a debug session of at most 512 processes.
 
Although we had to abandon the MPI-based communication structure of the existing SDM server due to the constraints of the PMI interface, SLURM also gave us the chance to implement an ATTACH debug facility. When launching a parallel job, srun records the process topology of the launched job in its internal data structures. This topology information includes each process's PID and the hostname (address) of the node it runs on. The data structures srun uses to record this information are MPIR_proctable_size and MPIR_proctable: MPIR_proctable_size gives the number of entries in the proctable, and each entry contains the PID and hostname (address) of one launched process. Using srun's "--jobid=xx" option, we can launch sdm servers on the resources allocated to the target job. These sdm servers can then connect and register with the sdm client. After that, the sdm client sends an ATTACH command to all sdm servers, and the ATTACH command is forwarded to GDB together with the target process PID, establishing the attach debug session. By now, we have tested the ATTACH function, and it also supports debug sessions with 512 processes.
 
Here are some experiences and lessons that may be useful for the development of new versions of ptp:
 
1. The communication infrastructure must be rewritten when using MPICH+SLURM. The PMI interface cannot support an MPI-based SDM server and the target MPI application at the same time. I'm not sure whether this problem still exists with OpenMPI+SLURM, but it does exist for MPICH+SLURM. Since MPICH is widely used, we have to take this problem into account and design a new, efficient, and scalable communication infrastructure for parallel debugging.

You've probably seen the announcement for a scalable communications infrastructure meeting in August. The objective of the meeting is to establish the design parameters for a generic framework that can be used by debuggers and other tools that need to operate on applications running in HPC environments. My hope is that the infrastructure eventually developed as a result of this meeting will replace the current SDM. This should go a long way to solving many of the issues associated with the current MPI version.


 
2. Currently, most of the ptp development effort goes into adding support for new resource managers. However, the parallel debugger should also get lots of attention! At present, the sdm debugger only supports the C language; no Fortran support is built in. Even for C, some problems remain. For example, the current sdm debugger cannot handle "typedef" usage in C programs: if there is an "MPI_Status status" declaration in the MPI program, the debugger will fail to resolve its type and the debug session will terminate unexpectedly. And the AIF data structures used in the sdm server cannot parse any Fortran data type, not even the most primitive Fortran 77 types such as integer, logical, and complex. As we know, most parallel programs are still written in Fortran; how can a parallel debugger be useful without (complete) support for the Fortran language?

Language support is to some extent limited by the GDB interface. The typedef problem you mention is indicative of this. I'm exploring some functionality that could be added to GDB to improve the functionality of the PTP debugger. 

I agree that Fortran support is currently pretty primitive. Some changes were made recently to improve gfortran support, but these may not have been backported to 1.0. Have you made any modifications to AIF that you'd like to contribute? I'd be happy to take a look at these to see if they could be merged into the current code base.

 
 


_______________________________________________
ptp-dev mailing list

