Re: [ptp-user] Parallel Debug hanging on "Writing routing file..."

Matt,

Getting the debugger launched is quite a complex process, which is why it sometimes fails. I'm looking at ways to make this more robust, and will hopefully have some of these changes ready for Juno.

The sdm is launched in two parts. The server processes are started first, then the sdm master process is started. Once all the sdm processes are connected, the master sdm connects to your Eclipse debug session using the debug session address. All these sdm processes communicate via a single file called "routing_file", which is created automatically when you launch your Eclipse debug session. This file must reside in the working directory where the sdm processes are executed, and must be available to all sdm processes (via a shared file system if running on a cluster). The working directory is usually the directory containing the executable you are trying to debug, unless you've changed the working directory setting on the Application tab of the launch configuration.
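If it helps to see where the file should end up, here is a minimal Python sketch of the rule described above. The function name and its arguments are made up for illustration; only the file name "routing_file" and the working directory rule come from the description.

```python
import os


def routing_file_path(executable, working_dir=None):
    """Return the path where "routing_file" is expected to appear.

    Hypothetical helper: the directory is the launch configuration's
    working directory if one was set, otherwise the directory that
    contains the executable being debugged.
    """
    directory = working_dir or os.path.dirname(os.path.abspath(executable))
    return os.path.join(directory, "routing_file")
```

Calling os.path.isfile() on the returned path while the launch is hanging should tell you whether the file was written at all.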

Here are some things you can check to find out what is going wrong:

1. If you're running locally, check that your firewall is disabled. If you're not running locally, make sure the port forwarding setting for the resource manager is enabled.
2. Check that there is a "routing_file" in the same directory as the application executable.
3. Check that this file contains the same number of lines as the number of processes you're trying to debug (the first line is the total number of processes).
4. Check that the second entry on each line is a valid hostname or IP address.
5. Check that the file is available to all nodes in the cluster (if you're running on a cluster) via a shared filesystem.
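
Checks 3 and 4 can be scripted. The following is only a sketch in Python, and it rests on assumptions about the file layout that go beyond the list above: that the first line holds just the process count, that each following line describes one process, and that entries on a line are separated by whitespace. The function name check_routing_file is made up, and the resolver is passed in so the check can be tried without network access.

```python
import socket


def check_routing_file(text, expected_procs, resolve=socket.gethostbyname):
    """Return a list of problems found in the text of a routing file.

    Assumed layout (an interpretation, not taken from the sdm source):
    line 1 is the total number of processes; every later non-blank line
    is one process entry whose second field is a hostname or IP address.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["routing file is empty"]

    try:
        total = int(lines[0].split()[0])
    except ValueError:
        return ["first line is not a process count: %r" % lines[0]]

    problems = []
    if total != expected_procs:
        problems.append(
            "file declares %d processes, expected %d" % (total, expected_procs))

    entries = lines[1:]
    if len(entries) != total:
        problems.append(
            "file declares %d processes but has %d entries" % (total, len(entries)))

    # Entry numbers count non-blank lines after the first line.
    for n, entry in enumerate(entries, start=1):
        fields = entry.split()
        if len(fields) < 2:
            problems.append("entry %d has no host field" % n)
            continue
        try:
            resolve(fields[1])
        except OSError:  # socket.gaierror is a subclass of OSError
            problems.append("entry %d: cannot resolve host %r" % (n, fields[1]))
    return problems
```

To use it, read the routing_file from the working directory and pass its contents together with the number of processes in your launch configuration. An empty result means the file passed these particular checks, nothing more. Running the same script on a compute node also covers check 5, since it will fail to open the file if the directory is not shared.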

Regards,
Greg


On Feb 16, 2012, at 4:09 AM, Matt Klein wrote:

> Hello all,
> 
> I am trying to get my MPI development system set up and I'm stuck on
> the final step of getting the debugging of a parallel program to work.
> When I launch my parallel application debug configuration I get a
> progress window that reads "Operation in progress... \\ Writing
> routing file..." and it hangs there until I click the Cancel button,
> at which time I receive an "Error completing debug job launch \\
> Reason: Cannot connect to debugger" message. On the off chance that it
> was just going really slow I've let it sit for over 10 minutes, but
> there was still no progress.
> 
> I am using a local resource manager for Open MPI with OpenMPI 1.5.4
> installed. I am using the "MPI Hello World C Project" and am able to
> successfully launch the program using a Parallel Application run
> configuration.
> 
> I built the sdm program by running the BUILD script in the
> ptp.linux.x86_5.0.2 directory (I'm running Fedora 16 32-bit).  I then
> updated my Parallel Application debug configuration (based on the
> previous run configuration) to use SDM as the debugger, gdb-mi as the
> debugger backend, the sdm program I previously built as the debugger
> executable, and 'localhost' as the debugger session address (as
> indicated in the help file). I then click Apply and then Debug and get
> stuck in the situation mentioned above.
> 
> This is a fresh Fedora 16 install and I started with the "Eclipse IDE
> for Parallel Application Developers" and then built/installed OpenMPI.
> I'm wondering if I may be missing some dependencies, but I haven't
> been able to find anything to suggest other packages that I need to
> install. Any advice or suggestions would be greatly appreciated!
> 
> Thank you,
> Matt


