[ptp-dev] Fwd: [O-MPI devel] Re: Identifying MPI programming problems
I believe the following message didn't make it to the PTP list....
Begin forwarded message:
From: Edgar Gabriel <gabriel@xxxxxxx>
Date: June 2, 2005 2:12:47 AM MDT
To: Open MPI Developers <devel@xxxxxxxxxxxx>
Cc: Matthias Mueller <mueller@xxxxxxx>, Beth Tibbitts
<tibbitts@xxxxxxxxxx>, Parallel Tools Platform general developers
<ptp-dev@xxxxxxxxxxx>, Bettina Krammer <krammer@xxxxxxx>, Justin Xue
<xue@xxxxxxxxxx>
Subject: Re: [O-MPI devel] Re: Identifying MPI programming problems
Reply-To: Open MPI Developers <devel@xxxxxxxxxxxx>
May I point you to a couple of projects that do more or less what you
discussed here?
There is, for example, the MARMOT project at my home institution.
MARMOT is an MPI Analysis and Checking tool that checks at run-time
whether an MPI application conforms to the standard, whether it uses
non-portable constructs, and whether MPI resources such as
communicators, datatypes, etc. are used correctly. It also checks for
situations such as deadlocks or race conditions. MARMOT uses the
profiling interface to intercept the MPI calls for analysis, and an
additional process for checks that have to be done globally rather
than locally.
There is also the Umpire project, and a project called MPI-Check,
which do similar things. Furthermore, some of the MPI implementations
(e.g. NEC MPI, look at their EuroPVM/MPI paper last year) already do
all kinds of checks, including checks for group operations, deadlock
detection, etc.
Thanks
Edgar
Jeffrey Squyres wrote:
I guess it depends on the exact definition of "MPI flow diagrams"...?
(I'm not familiar with the term)
Note that there are tools that do message tracing for MPI
applications (as I understand the problem) -- they generate
tracefiles of all MPI message activity. Some generate tracefiles
that can be viewed in real time, others generate tracefiles that can
be viewed post-mortem. When viewed in a nice GUI and/or fed into
analysis tools, these kinds of tracefiles can show things like
deadlock, livelock, tag mismatches, etc. The nice thing about these
tools is that many of them are implemented at the MPI profiling
layer, which means that they can be used with any number of MPI
implementations -- they're not tied to any specific implementation.
Is this what you're talking about?
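As an illustration only (not part of the original thread): a minimal
sketch of the kind of post-mortem tracefile check described above,
assuming a hypothetical trace format of (kind, src, dst, tag) records.
It pairs sends with receives and reports whatever is left over, which
is how a tag mismatch would show up. Real tools also have to handle
wildcards such as MPI_ANY_SOURCE and MPI_ANY_TAG, communicators, and
message ordering, all of which this sketch ignores.

```python
from collections import Counter


def unmatched_messages(trace):
    """Return (unmatched_sends, unmatched_recvs) for a message trace.

    `trace` is an iterable of (kind, src, dst, tag) tuples, where kind
    is "send" or "recv". This record layout is a hypothetical format
    chosen for the sketch, not the format of any real trace tool.
    """
    sends = Counter()
    recvs = Counter()
    for kind, src, dst, tag in trace:
        key = (src, dst, tag)
        if kind == "send":
            sends[key] += 1
        elif kind == "recv":
            recvs[key] += 1
        else:
            raise ValueError(f"unknown record kind: {kind!r}")
    # Counter subtraction keeps only the entries with a positive count,
    # i.e. the operations that never found a partner.
    return dict(sends - recvs), dict(recvs - sends)


example_trace = [
    ("send", 0, 1, 7),
    ("recv", 0, 1, 7),
    ("send", 1, 0, 3),
    ("recv", 1, 0, 4),  # receiver posted tag 4, sender used tag 3
]
leftover_sends, leftover_recvs = unmatched_messages(example_trace)
```

A send and a receive between the same pair of ranks that both remain
unmatched, differing only in tag, is a strong hint of a tag mismatch.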
That being said, it would certainly be nice if the MPI implementation
(or a tool) could print out at run-time "Hey, you just entered a
deadlock situation and I'm going to hang until you hit ctrl-C". In
many cases, as Nathan mentioned, this is quite difficult to determine
(some entity would need to maintain a global state of all message
passing). Needless to say, such runtime analysis would incur a
performance cost, but would probably be acceptable for debugging
scenarios. Hence, this run-time notification of at least some common
types of MPI programming errors is "difficult but not impossible" for
single-threaded MPI application scenarios. It could even be done at
the same level as the tools described above -- at the MPI profiling
layer, enabling the tool to work with any MPI implementation.
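As an illustration only (not part of the original thread): a minimal
sketch of what "maintaining a global state" could look like in the
single-threaded case. It assumes some hypothetical central entity knows,
for every rank blocked in a blocking call, the one peer that rank is
waiting on; a cycle in that wait-for relation means none of the ranks
involved can proceed. Collecting this information from a running MPI
job is the hard part and is not shown.

```python
def find_deadlock_cycle(waits_for):
    """Return a list of ranks forming a wait-for cycle, or None.

    `waits_for` maps each blocked rank to the single rank it is
    waiting on. Ranks that are not blocked do not appear as keys.
    The one-peer-per-rank model is an assumption of this sketch.
    """
    for start in waits_for:
        path = []
        rank = start
        # Follow the chain of waits until it leaves the set of blocked
        # ranks or comes back to a rank already on the path.
        while rank in waits_for and rank not in path:
            path.append(rank)
            rank = waits_for[rank]
        if rank in path:
            return path[path.index(rank):]
    return None


# Ranks 0 and 1 each block in a receive from the other: a cycle.
cycle = find_deadlock_cycle({0: 1, 1: 0})

# Rank 0 waits on 1, rank 1 waits on 2, but rank 2 is still running.
no_cycle = find_deadlock_cycle({0: 1, 1: 2})
```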
But this kind of strategy becomes much more problematic in
multi-threaded application scenarios -- if the tool determines that
it's in a deadlock situation (I'm waving my hands a bit here), it's
impossible for it to know that another thread won't come along and
break the deadlock. Hence, it's impossible for the tool to know when
it's *really* in a deadlock situation (for example). You might be
able to come up with some reasonable heuristics (e.g., all threads in
all processes are blocking in MPI calls and no one is making any
progress), but I can't think of any ways to do this
conclusively off the top of my head (who knows if a signal handler
won't create a new thread and break the deadlock, what's a reasonable
timeout for "no progress", etc.).
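As an illustration only (not part of the original thread): a minimal
sketch of the heuristic mentioned above, assuming a hypothetical monitor
that records, per thread, whether it is currently blocked in an MPI call
and when it last made progress. As the text says, this cannot be
conclusive: a positive result only means "suspicious", and the timeout
value is an arbitrary choice.

```python
def looks_deadlocked(threads, now, timeout):
    """Heuristic: every thread is blocked and none progressed recently.

    `threads` is an iterable of (blocked_in_mpi, last_progress) pairs,
    one per thread across all processes; `now` and `last_progress` are
    timestamps in seconds. The record layout and the timeout are
    assumptions of this sketch.
    """
    threads = list(threads)
    if not threads:
        return False
    return all(
        blocked and (now - last_progress) >= timeout
        for blocked, last_progress in threads
    )


# Two threads, both blocked, idle for 40 s and 35 s, 30 s timeout.
suspicious = looks_deadlocked([(True, 60.0), (True, 65.0)], now=100.0, timeout=30.0)

# One thread is still computing, so the heuristic does not fire.
still_running = looks_deadlocked([(True, 60.0), (False, 99.0)], now=100.0, timeout=30.0)
```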
On Jun 1, 2005, at 3:52 PM, Donald P Pazel wrote:
On Jun 1, 2005, at 4:31 PM, Craig Rasmussen wrote:
>
>I think Nathan has hit on a great idea (MPI flow diagrams). Do you
>Open MPI guys think this would be possible?
I'd like to mention that what would be most interesting is to see
how MPI flow diagrams are represented from the practitioner
viewpoint, as opposed to high-level design diagrams. I find that the
kind of "white board" diagrams that engineers draw daily (e.g.
blocks and arrows) and use to capture the essence of code problems
are extremely interesting and helpful, and derive from extended
experience. (Then of course we usually erase those drawings, or
leave them until they dry hard onto the board.)
In any case, I think seeing these paradigmatic drawings, and the
problems they address, would be very helpful as input when thinking
about tool features.
Thanks,
Don Pazel,
Craig Rasmussen <crasmussen@xxxxxxxx>
06/01/2005 04:31 PM
To: Nathan DeBardeleben <ndebard@xxxxxxxx>
cc: Greg Watson <gwatson@xxxxxxxx>, Parallel Tools
Platform general developers <ptp-dev@xxxxxxxxxxx>, Donald P
Pazel/Watson/IBM@IBMUS, Justin Xue/Watson/Contr/IBM@IBMUS, Beth
Tibbitts/Watson/IBM@IBMUS, Open MPI Developers <devel@xxxxxxxxxxxx>
Subject: Re: Identifying MPI programming problems
On Jun 1, 2005, at 11:47 AM, Nathan DeBardeleben wrote:
>
> There are definitely things that can be done, and there are
> definitely real codes out there that could take advantage of it.
> But like anything else it can get exceedingly complicated. I
> personally think any steps that can be made towards making MPI flow
> diagrams (even partially accurate ones) would be huge steps in the
> right direction.
I think Nathan has hit on a great idea (MPI flow diagrams). Do you
Open MPI guys think this would be possible?
Cheers,
Craig
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxx
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
======================================================================
Dr.-Ing. Edgar Gabriel
Clusters and Distributed Units
High Performance Computing Center Stuttgart (HLRS)
University of Stuttgart
Tel: +49 711 685 8039 http://www.hlrs.de/people/gabriel
Fax: +49 711 678 7626 e-mail:gabriel@xxxxxxx
======================================================================