Re: [ptp-user] Parallel Debug hanging on "Writing routing file..."

Greg, Matt,

Yes, getting the debugger launched really is quite a complex process, and even more so when you write your own LML JAXB files!
And even once you manage to launch it, you may run into (at least I did) a segmentation fault in sdm. Is there any way to debug sdm?

I would also be interested in getting the debugger working in batch mode. The routing file would be created when the batch job begins, and the sdm master would start at the same time. Greg, do you think this is possible?

Regards,

Jean-Christophe.



-----Original Message-----
From: ptp-user-bounces@xxxxxxxxxxx [mailto:ptp-user-bounces@xxxxxxxxxxx] On behalf of Greg Watson
Sent: Thursday, 16 February 2012 17:05
To: PTP User list
Subject: Re: [ptp-user] Parallel Debug hanging on "Writing routing file..."

Matt,

Getting the debugger launched is quite a complex process, which is why it sometimes fails. I'm looking at ways to make this more robust, and will hopefully have some of these changes ready for Juno.

The sdm is launched in two parts. The server processes are started first, then the sdm master process is started. Once all the sdm processes are connected, the master sdm connects to your Eclipse debug session using the debug session address. All of these sdm processes coordinate through a single file called "routing_file", which is created automatically when you launch your Eclipse debug session. This file must reside in the working directory where the sdm processes are executed, and must be available to all sdm processes (via a shared file system if running on a cluster). The working directory is usually the directory containing the executable you are trying to debug, unless you've changed the working directory setting on the Application tab of the launch configuration.
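As a rough illustration of where to look for the routing file, here is a minimal Python sketch. It assumes only what is described above: the working directory defaults to the directory containing the executable unless the Application tab overrides it. The function name `expected_routing_file` is hypothetical and not part of PTP.

```python
import os
import sys
from typing import Optional

# File name used by the sdm processes, as described in the message above.
ROUTING_FILE_NAME = "routing_file"


def expected_routing_file(executable: str, working_dir: Optional[str] = None) -> str:
    """Return the path where routing_file should appear.

    Assumption: with no override, the working directory is the directory
    containing the executable; an explicit working_dir stands in for the
    Application tab setting.
    """
    base = working_dir if working_dir else os.path.dirname(os.path.abspath(executable))
    return os.path.join(base, ROUTING_FILE_NAME)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_routing.py /path/to/executable [working_dir]
    exe = sys.argv[1]
    wd = sys.argv[2] if len(sys.argv) > 2 else None
    path = expected_routing_file(exe, wd)
    print(path, "exists" if os.path.isfile(path) else "MISSING")
```

Running it on each node of a cluster (with the same arguments) is a quick way to confirm the file is visible everywhere through the shared file system.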

Here are some things you can check to find out what is going wrong:

1. If you're running locally, check that your firewall is disabled. If you're not running locally, make sure the port forwarding setting for the resource manager is enabled.
2. Check that there is a "routing_file" in the same directory as the application executable.
3. Check that this file contains the same number of lines as the number of processes you're trying to debug (the first line is the total number of processes).
4. The second entry on each line should be a valid hostname/IP address.
5. This file must be available to all nodes in the cluster (if you're running on a cluster) via a shared filesystem.
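Checks 3 and 4 can be scripted. The following Python sketch is hypothetical and not part of PTP. It assumes the routing file is plain text whose first line holds the total process count, followed by one whitespace-separated line per process with the hostname or IP address as the second field. The exact field layout is an assumption, so adjust it to what your sdm version actually writes.

```python
import ipaddress
import re
import sys
from typing import List

# One DNS label: 1-63 alphanumerics or hyphens, not starting/ending with '-'.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def looks_like_host(value: str) -> bool:
    """True if value parses as an IPv4/IPv6 address or a syntactically valid hostname.

    No DNS lookup is done, so this only catches malformed entries.
    """
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        pass
    if len(value) > 253:
        return False
    return all(_LABEL.match(label) for label in value.rstrip(".").split("."))


def check_routing_file(text: str, num_procs: int) -> List[str]:
    """Return a list of problems found; an empty list means none were detected.

    Assumed layout: first line = process count, then one line per process.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["routing file is empty"]
    try:
        declared = int(lines[0].split()[0])
    except ValueError:
        return [f"first line {lines[0]!r} is not a process count"]
    problems = []
    if declared != num_procs:
        problems.append(f"first line says {declared} processes, expected {num_procs}")
    entries = lines[1:]
    if len(entries) != declared:
        problems.append(f"{len(entries)} process lines, but first line says {declared}")
    for lineno, entry in enumerate(entries, start=2):
        fields = entry.split()
        if len(fields) < 2 or not looks_like_host(fields[1]):
            problems.append(f"line {lineno}: second field is not a valid hostname/IP: {entry!r}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_routing.py /path/to/routing_file <number_of_processes>
    with open(sys.argv[1]) as f:
        for problem in check_routing_file(f.read(), int(sys.argv[2])):
            print(problem)
```

A deliberately conservative choice here is to validate hostnames syntactically instead of resolving them, so the check cannot hang on a misconfigured DNS; a reachable-host test would need a separate step.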

Regards,
Greg


On Feb 16, 2012, at 4:09 AM, Matt Klein wrote:

> Hello all,
> 
> I am trying to get my MPI development system set up and I'm stuck on 
> the final step of getting the debugging of a parallel program to work.
> When I launch my parallel application debug configuration I get a 
> progress window that reads "Operation in progress... \\ Writing 
> routing file..." and it hangs there until I click the Cancel button, 
> at which time I receive an "Error completing debug job launch \\
> Reason: Cannot connect to debugger" message. On the off chance that it 
> was just going really slow I've let it sit for over 10 minutes, but 
> there was still no progress.
> 
> I am using a local resource manager for Open MPI with OpenMPI 1.5.4 
> installed. I am using the "MPI Hello World C Project" and am able to 
> successfully launch the program using a Parallel Application run 
> configuration.
> 
> I built the sdm program by running the BUILD script in the
> ptp.linux.x86_5.0.2 directory (I'm running Fedora 16 32-bit).  I then 
> updated my Parallel Application debug configuration (based on the 
> previous run configuration) to use SDM as the debugger, gdb-mi as the 
> debugger backend, the sdm program I previously built as the debugger 
> executable, and 'localhost' as the debugger session address (as 
> indicated in the help file). I then click Apply and then Debug and get 
> stuck in the situation mentioned above.
> 
> This is a fresh Fedora 16 install and I started with the "Eclipse IDE 
> for Parallel Application Developers" and then built/installed OpenMPI.
> I'm wondering if I may be missing some dependencies, but I haven't 
> been able to find anything to suggest other packages that I need to 
> install. Any advice or suggestions would be greatly appreciated!
> 
> Thank you,
> Matt
> _______________________________________________
> ptp-user mailing list
> ptp-user@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/ptp-user
