Re: [ptp-user] Parallel Debug hanging on "Writing routing file..."

Greg,

Thank you for the very interesting and informative response, it was an enjoyable read.

Your suspicion that my issue was network related was correct. I didn't think it was relevant at the time, but I am running my Fedora system inside a VirtualBox virtual machine, and that is where the issue originated. Here are my observations, for posterity:

The "routing_file" was being created and had the proper form that you said to look for. Upon inspecting the file I noticed that, even though I was running the entire process locally, my host name was listed as the "target" rather than localhost as I had expected. When I tried to ping my host name, it resolved to an IP address that was unreachable. Pinging google.com gave the same result; pinging localhost, however, succeeded.
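For anyone who wants to repeat this check without ping, here is a rough Python sketch. It assumes IPv4 and assumes that the symptom is the same as mine: the host name resolves, but to an address that no local interface actually owns. The function names (can_bind, check_name) are made up for illustration; only the standard socket module is used.

```python
import socket


def can_bind(ip):
    """Return True if a TCP socket can be bound to ip on this machine.

    Binding fails when no local interface owns the address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0 lets the OS pick any free port
        except OSError:
            return False
    return True


def check_name(name, resolver=socket.gethostbyname, binder=can_bind):
    """Return (ip, usable) for a host name that should refer to this machine.

    ip is None when the name does not resolve at all.
    resolver and binder are parameters so the check can be tried offline.
    """
    try:
        ip = resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None, False
    return ip, binder(ip)


# Example use on the machine running the debugger:
#   print(check_name(socket.gethostname()))
#   print(check_name("localhost"))
```

If the host name comes back with an address but usable is False while localhost is fine, the name is pointing somewhere other than this machine, which is what I was seeing. This is not a substitute for ping when the target is a remote node.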

After much research I found the issue to lie with the default network settings for VirtualBox. By default, VirtualBox uses Network Address Translation (NAT) because "Usually, it does not require any configuration on the host network and guest system. For this reason, it is the default networking mode in VirtualBox."

Changing the networking mode of my VirtualBox from "NAT" to "Bridged Adapter" (with Promiscuous Mode set to "Deny") immediately fixed the problem with pinging by host name. Once that was resolved, the parallel debugging process in Eclipse via SDM worked as expected.

Thanks again Greg, I really appreciate your help and time!

Matt

On Thu, Feb 16, 2012 at 9:04 AM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
Matt,

Getting the debugger launched is quite a complex process, which is why it sometimes fails. I'm looking at trying to make this more robust, and will hopefully have some of these changes ready for Juno.

The sdm is launched in two parts. The server processes are started first, then the sdm master process is started. Once all the sdm processes are connected, the master sdm connects to your Eclipse debug session using the debug session address. The sdm processes find each other via a single file called "routing_file", which is created automatically when you launch your Eclipse debug session. This file must reside in the working directory where the sdm processes are executed, and must be available to all sdm processes (via a shared file system if running on a cluster). The working directory is usually the directory containing the executable you are trying to debug, unless you've changed the working directory setting on the Application tab of the launch configuration.

Here are some things you can check to find out what is going wrong:

1. If you're running locally, check that your firewall is disabled. If you're not running locally, make sure the port forwarding setting for the resource manager is enabled.
2. Check there is a "routing_file" in the same directory as the application executable.
3. Check that this file contains the same number of lines as the number of processes you're trying to debug (the first line is the total number of processes).
4. The second entry on each line should be a valid hostname/IP address.
5. This file must be available to all nodes in the cluster (if you're running on a cluster) via a shared filesystem.
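
Checks 3 and 4 can be scripted. The following is only a sketch in Python, not part of PTP. It assumes the file is whitespace-separated text, that the first line holds the process count, and that each later line has the hostname or IP address as its second field. The helper names (parse_routing_file, unresolved_hosts) and the sample line layout used to try it out are hypothetical.

```python
import socket


def parse_routing_file(text):
    """Split routing_file text into (declared_count, hosts).

    Assumes the first non-blank line starts with the process count and
    every later line carries a hostname/IP as its second field.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("routing_file is empty")
    declared = int(rows[0][0])
    hosts = [fields[1] for fields in rows[1:] if len(fields) >= 2]
    return declared, hosts


def unresolved_hosts(hosts, resolver=socket.gethostbyname):
    """Return the hosts that the resolver cannot turn into an address."""
    bad = []
    for host in hosts:
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append(host)
    return bad


# Example use from the working directory of the launch:
#   with open("routing_file") as f:
#       declared, hosts = parse_routing_file(f.read())
#   print(declared, len(hosts), unresolved_hosts(hosts))
```

Compare the declared count with the number of host entries yourself, since the sketch does not assume how the two should relate. Note that a name can resolve and still point at an unreachable address, so also try pinging each host listed.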

Regards,
Greg


On Feb 16, 2012, at 4:09 AM, Matt Klein wrote:

> Hello all,
>
> I am trying to get my MPI development system set up and I'm stuck on
> the final step of getting the debugging of a parallel program to work.
> When I launch my parallel application debug configuration I get a
> progress window that reads "Operation in progress... \\ Writing
> routing file..." and it hangs there until I click the Cancel button,
> at which time I receive an "Error completing debug job launch \\
> Reason: Cannot connect to debugger" message. On the off chance that it
> was just going really slow I've let it sit for over 10 minutes, but
> there was still no progress.
>
> I am using a local resource manager for Open MPI with OpenMPI 1.5.4
> installed. I am using the "MPI Hello World C Project" and am able to
> successfully launch the program using a Parallel Application run
> configuration.
>
> I built the sdm program by running the BUILD script in the
> ptp.linux.x86_5.0.2 directory (I'm running Fedora 16 32-bit).  I then
> updated my Parallel Application debug configuration (based on the
> previous run configuration) to use SDM as the debugger, gdb-mi as the
> debugger backend, the sdm program I previously built as the debugger
> executable, and 'localhost' as the debugger session address (as
> indicated in the help file). I then click Apply and then Debug and get
> stuck in the situation mentioned above.
>
> This is a fresh Fedora 16 install and I started with the "Eclipse IDE
> for Parallel Application Developers" and then built/installed OpenMPI.
> I'm wondering if I may be missing some dependencies, but I haven't
> been able to find anything to suggest other packages that I need to
> install. Any advice or suggestions would be greatly appreciated!
>
> Thank you,
> Matt
> _______________________________________________
> ptp-user mailing list
> ptp-user@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/ptp-user