
[ptp-dev] Fwd: [O-MPI devel] Re: Identifying MPI programming problems

I believe the following message didn't make it to the PTP list....

Begin forwarded message:

From: Edgar Gabriel <gabriel@xxxxxxx>
Date: June 2, 2005 2:12:47 AM MDT
To: Open MPI Developers <devel@xxxxxxxxxxxx>
Cc: Matthias Mueller <mueller@xxxxxxx>, Beth Tibbitts <tibbitts@xxxxxxxxxx>, Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>, Bettina Krammer <krammer@xxxxxxx>, Justin Xue <xue@xxxxxxxxxx>
Subject: Re: [O-MPI devel] Re: Identifying MPI programming problems
Reply-To: Open MPI Developers <devel@xxxxxxxxxxxx>

May I point you to a couple of projects which do more or less what you
discussed here?

There is, for example, the MARMOT project at my home institution. MARMOT
is an MPI analysis and checking tool that checks at run-time whether an MPI
application conforms to the standard, whether it uses non-portable
constructs, and whether MPI resources such as communicators, datatypes, etc.
are used correctly; it also checks for situations such as deadlocks or race
conditions. MARMOT makes use of the MPI profiling interface to intercept the
MPI calls for analysis, and it uses an additional process for checks that
have to be done globally rather than locally.
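
As a minimal sketch of how such profiling-interface interception works in
general (this is not MARMOT's actual code; the checks shown are illustrative
assumptions), a wrapper library can override an MPI entry point and forward
to the PMPI version:

    /* Hypothetical sketch of MPI profiling-layer interception; not MARMOT code.
     * The wrapper is linked in front of the MPI library, so every MPI_Send is
     * checked locally and then forwarded to the real implementation. */
    #include <mpi.h>
    #include <stdio.h>

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        /* Purely local sanity checks (illustrative only). */
        if (count < 0)
            fprintf(stderr, "check: MPI_Send called with negative count %d\n", count);
        if (tag < 0)
            fprintf(stderr, "check: MPI_Send called with invalid tag %d\n", tag);

        /* Hand the call off to the underlying MPI library via its PMPI entry point. */
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }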

There are furthermore the Umpire project and a project called MPI-Check, which do similar things. In addition, some of the MPI implementations (e.g. NEC MPI; see their EuroPVM/MPI paper from last year) already perform all kinds of checks, including checks for group operations, deadlock detection, etc.

Thanks
Edgar

Jeffrey Squyres wrote:

I guess it depends on the exact definition of "MPI flow diagrams"...? (I'm not familiar with the term.)

Note that there are tools that do message tracing for MPI applications (as I understand the problem) -- they generate tracefiles of all MPI message activity. Some generate tracefiles that can be viewed in real time; others generate tracefiles that can be viewed post-mortem. When viewed in a nice GUI and/or fed into analysis tools, these kinds of tracefiles can show things like deadlock, livelock, tag mismatches, etc.

The nice thing about these tools is that many of them are implemented at the MPI profiling layer, which means they can be used with any number of MPI implementations -- they're not tied to any specific implementation.
Is this what you're talking about?
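
A minimal sketch of that profiling-layer tracing approach (not any particular
tool's code; the per-rank tracefile name and record format are assumptions):

    /* Hypothetical trace wrapper: writes one record per send to trace.<rank>.log. */
    #include <mpi.h>
    #include <stdio.h>

    static FILE *trace_file;

    int MPI_Init(int *argc, char ***argv)
    {
        int err = PMPI_Init(argc, argv);
        int rank;
        char name[64];
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        snprintf(name, sizeof(name), "trace.%d.log", rank);
        trace_file = fopen(name, "w");
        return err;
    }

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t = MPI_Wtime();
        int err = PMPI_Send(buf, count, datatype, dest, tag, comm);
        if (trace_file)
            fprintf(trace_file, "%.6f send dest=%d tag=%d count=%d\n",
                    t, dest, tag, count);
        return err;
    }

    int MPI_Finalize(void)
    {
        if (trace_file)
            fclose(trace_file);
        return PMPI_Finalize();
    }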
That being said, it would certainly be nice if the MPI implementation (or a tool) could print out at run-time "Hey, you just entered a deadlock situation and I'm going to hang until you hit ctrl-C". In many cases, as Nathan mentioned, this is quite difficult to determine (some entity would need to maintain a global state of all message passing). Needless to say, such runtime analysis would incur a performance cost, but that would probably be acceptable for debugging scenarios. Hence, run-time notification of at least some common types of MPI programming errors is "difficult but not impossible" for single-threaded MPI application scenarios. It could even be done at the same level as the tools described above -- at the MPI profiling layer -- enabling the tool to work with any MPI implementation.

But this kind of strategy becomes much more problematic in multi-threaded application scenarios: if the tool determines that it's in a deadlock situation (I'm waving my hands a bit here), it's impossible for it to know that another thread won't come along and break the deadlock. Hence, it's impossible for the tool to know when it's *really* in a deadlock situation (for example). You might be able to come up with some reasonable heuristics (e.g., all threads in all processes are blocking in MPI calls and no one is making any progress), but I can't think of any ways to do this conclusively off the top of my head (who knows whether a signal handler will create a new thread and break the deadlock, what's a reasonable timeout for "no progress", etc.).
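
A minimal sketch of such a run-time, profiling-layer heuristic for the
single-threaded case (hypothetical; the timeout and polling strategy are
assumptions, and it can only warn about a possible deadlock, not prove one):

    /* Hypothetical wrapper: warn if a blocking receive makes no progress
     * for NO_PROGRESS_TIMEOUT seconds, then keep waiting as usual. */
    #include <mpi.h>
    #include <stdio.h>

    #define NO_PROGRESS_TIMEOUT 30.0   /* seconds; arbitrary illustrative value */

    int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source,
                 int tag, MPI_Comm comm, MPI_Status *status)
    {
        double start = MPI_Wtime();
        int warned = 0, flag = 0;
        MPI_Status probe_status;

        /* Poll for a matching message instead of blocking outright. */
        while (1) {
            PMPI_Iprobe(source, tag, comm, &flag, &probe_status);
            if (flag)
                break;
            if (!warned && MPI_Wtime() - start > NO_PROGRESS_TIMEOUT) {
                fprintf(stderr,
                        "warning: MPI_Recv(source=%d, tag=%d) has waited %.0f s "
                        "with no matching message -- possible deadlock\n",
                        source, tag, NO_PROGRESS_TIMEOUT);
                warned = 1;
            }
        }
        /* A matching message is available; perform the real receive. */
        return PMPI_Recv(buf, count, datatype, source, tag, comm, status);
    }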
On Jun 1, 2005, at 3:52 PM, Donald P Pazel wrote:

On Jun 1, 2005, at 4:31 PM, Craig Rasmussen wrote:
>
> I think Nathan has hit on a great idea (MPI flow diagrams).  Do you
> Open MPI guys think this would be possible?

I'd like to mention that what would be most interesting is to see how MPI flow diagrams are represented from the practitioner's viewpoint, as opposed to high-level design diagrams. I find that the kind of "whiteboard" diagrams that engineers draw daily (e.g. blocks and arrows) and use to capture the essence of code problems are extremely interesting and helpful, and they derive from extended experience. (Then, of course, we usually erase those drawings, or leave them until they dry hard onto the board.)

In any case, I think seeing these paradigmatic drawings, and the problems they address, would be very helpful input when thinking about tool features.

Thanks,

Don Pazel,






Craig Rasmussen <crasmussen@xxxxxxxx>

06/01/2005 04:31 PM
To: Nathan DeBardeleben <ndebard@xxxxxxxx>
Cc: Greg Watson <gwatson@xxxxxxxx>, Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>, Donald P Pazel/Watson/IBM@IBMUS, Justin Xue/Watson/Contr/IBM@IBMUS, Beth Tibbitts/Watson/IBM@IBMUS, Open MPI Developers <devel@xxxxxxxxxxxx>
Subject: Re: Identifying MPI programming problems




On Jun 1, 2005, at 11:47 AM, Nathan DeBardeleben wrote:
>
> There are definitely things that can be done, and there are definitely
> real codes out there that could take advantage of it.  But like
> anything else it can get exceedingly complicated.  I personally think
> any steps that can be made towards making MPI flow diagrams (even
> partially accurate ones) would be huge steps in the right direction.

I think Nathan has hit on a great idea (MPI flow diagrams).  Do you
Open MPI guys think this would be possible?

Cheers,
Craig



--
======================================================================
Dr.-Ing. Edgar Gabriel
Clusters and Distributed Units
High Performance Computing Center Stuttgart (HLRS)
University of Stuttgart
Tel: +49 711 685 8039                http://www.hlrs.de/people/gabriel
Fax: +49 711 678 7626                e-mail: gabriel@xxxxxxx
======================================================================


_______________________________________________
devel mailing list
devel@xxxxxxxxxxxx
http://www.open-mpi.org/mailman/listinfo.cgi/devel



