Re: [ptp-user] ptp-user Digest, Vol 107, Issue 2

Unfortunately, parallel debugging is very specific to the system you have. Many factors have to be taken into account to get it to work: the version of gdb, the MPI runtime, the resource manager, the system architecture, and even how your nodes are configured to resolve domain names. Everyone does these things differently, and if any one of them goes wrong, the debugger will fail to start. We have tried to simplify the process, but unless PTP is updated every time a new version of MPICH2 comes out or an environment variable changes when launching with Torque, it is impossible to guarantee that it will work.

Since we have no resources to do any of this, the best we can do is provide a few configurations that we know have worked, and leave it up to users to get it working on their own systems. In addition to the online documentation, I have written a few emails to this list (and ptp-dev) describing how it works and what you need to do to get it working if there are problems. There is also this talk[1] I gave at one of our workshops.

Apart from this, the best I can do is answer specific questions. If anyone would like to fund or contribute to more debugger development, let me know!

[1] https://wiki.eclipse.org/images/7/72/PTPUserDev2012_Debugger.pdf
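
As a rough illustration of the kind of checks involved, here is a minimal Python sketch of a preflight script. It is not part of PTP; the function names (find_tools, hostname_resolves, has_debug_flags) are made up for this example. It only covers three of the factors mentioned in this thread: whether gdb and gdbserver are on the PATH, whether the node's host name resolves, and whether a compile command carries '-g' with optimization switched off. It says nothing about the MPI runtime or the resource manager, so treat it as a starting point rather than a diagnosis.

```python
import shutil
import socket


def find_tools(names=("gdb", "gdbserver")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def hostname_resolves(host=None):
    """Return True if the given host name (default: this machine) resolves."""
    host = host or socket.gethostname()
    try:
        socket.gethostbyname(host)
        return True
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False


def has_debug_flags(compile_args):
    """Return True if the argument list has '-g' and an explicit '-O0'.

    Assumption for this sketch: GCC-style flags, where the last -O option
    wins. An explicit -O0 is required because the default optimization
    level differs between compilers.
    """
    has_g = "-g" in compile_args
    opt_levels = [arg for arg in compile_args if arg.startswith("-O")]
    unoptimized = bool(opt_levels) and opt_levels[-1] == "-O0"
    return has_g and unoptimized


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
    print("host name resolves:", hostname_resolves())
    # Hypothetical compile command, for illustration only.
    print("debug build:", has_debug_flags(["mpicc", "-g", "-O0", "hello.c"]))
```

Run it on the machine where the application will be launched, since that is where gdb and name resolution matter.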

> On Oct 8, 2015, at 2:42 AM, Christoph Pospiech <cpospiech@xxxxxxxxxx> wrote:
> 
> On Wednesday, October 07, 2015 12:00:20 PM Hossein Aghakhani wrote:
>> I thought I could get an answer to my question here. Do you know where
>> the best place is to ask PTP-related questions?
>> 
>> Can anyone please point me to a step-by-step tutorial on debugging
>> parallel code on a local computer using the PTP debugger?
>> 
>> Regards,
>> Hossein
> 
> Hossein (et al.),
> 
> the general documentation for parallel debugging can be found 
> in the following place
> http://help.eclipse.org/mars/index.jsp?topic=%2Forg.eclipse.ptp.doc.user%2Fhtml%2Ftoc.html
> 
> Please note the following prerequisites.
> 1. Eclipse parallel debugging uses gdb under the hood, so gdb and gdbserver 
> need to be installed on the remote host.
> 2. You should compile with '-g' and no optimization; otherwise you might
> find yourself in a hall of mirrors debugging a Fata Morgana. Typically the
> debugger looks at a memory location while the actual value of the
> variable is kept in a CPU register.
> 3. You should make sure that the effect you want to debug is still 
> reproducible after recompilation with '-g' and no optimization.
> Some errors disappear when lowering the optimization level.
> 4. For every Eclipse debug configuration there is a corresponding run
> configuration (same name, same menus except for the debug tab).
> Make sure the application runs in the run configuration (at least past
> MPI_Init()) and still reproduces the effect you want to debug.
> If it doesn't run without the debugger, then it surely won't run
> in the debugger.
> 
> Given all these prerequisites, compiling with the GNU compiler and OpenMPI,
> and using the Target System Configuration "Generic OpenMPI Interactive"
> with connection type "local", I was just able to fire up the debugger and
> stop at the first statement of main(), out of the box.
> 
> This was for Mars, and it is the first time since Luna, Kepler,
> Juno, Indigo, ... that it worked at all.
> 
> That said, I didn't touch the Debug tab and used the built-in SDM.
> In particular I kept the box with label "Use built-in..." ticked.
> No meddling with BUILD.sh.
> 
> However, I couldn't repeat the same success with the Intel compiler
> and Intel MPI, using the "Generic MPICH2 Interactive" configuration on
> the assumption that Intel MPI looks like MPICH2 on the surface.
> 
> ...And eventually I would need parallel debugging with the Intel
> compiler, Intel MPI and "Generic Slurm Batch", which is
> not in the list of supported TSCs. Do I have to give up, or is there
> any way to sneak through? Perhaps if I use "Generic Remote
> Interactive" and do all the Slurm+gdbserver plumbing myself?
> What plumbing would be needed?
> -- 
> 
> Mit freundlichen Grüßen / Kind regards
> 
> Dr. Christoph Pospiech
> High Performance & Parallel Computing
> Phone: +49-351 86269826
> Mobile: +49-171-765 5871
> E-Mail: cpospiech@xxxxxxxxxx
> 
> Lenovo (Deutschland) GmbH
> Meitnerstr. 9
> D-70563 Stuttgart
> 
> Geschäftsführung: Bernhard Fauser
> Sitz der Gesellschaft: Stuttgart
> HRB-Nr.: 25189, AG Stuttgart
> WEEE-Reg.-Nr.: DE79679404


