Re: [ptp-user] Can't debug in parallel

Thank you for your help and patience, Greg!

John

On Mar 26, 2019, at 1:26 PM, John Haiducek <jhaiduce@xxxxxxxxx> wrote:

Removing .eclipsesettings did the trick.

Before I updated I did edit a perl script in .eclipsesettings, inserting some print statements to try and figure out what was going on. Perhaps that is why deleting .eclipsesettings helped now.

The only other change to my environment (that I can think of) was that I had to add /usr/local to my PATH so that eclipse could find mpicc and mpirun.
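For anyone checking the same thing, a minimal sketch of a PATH check follows. The `missing_tools` helper is a hypothetical name, not part of PTP or Open MPI; it only reports which of the named commands the current shell cannot find. The result is only meaningful in the environment Eclipse was started from, since a desktop launcher may not see the same PATH as a terminal.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that cannot be found on PATH.
# Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The two tools Eclipse needed in this case.
missing_tools mpicc mpirun
```

If this prints nothing, both tools are visible to that shell; starting Eclipse from the same shell should then give it the same PATH.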

John

On Mar 26, 2019, at 1:11 PM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:

Have you made any other changes to your environment? I can't reproduce it on my setup.

One thing you could try is removing the ~/.eclipsesettings directory and relaunching Eclipse.
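A cautious variant, sketched below, renames the directory instead of deleting it, so that any local edits can be recovered. The `set_aside` function and the `.bak` suffix are illustrative choices, not anything PTP provides, and the sketch assumes Eclipse is closed and that PTP repopulates the directory on the next launch.

```shell
#!/bin/sh
# Hypothetical helper: rename a settings directory instead of deleting it,
# so it can be restored if the reset does not help.
# Usage: set_aside DIR   (prints the backup path on success)
set_aside() {
    dir=$1
    backup="$dir.bak"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi
    mv "$dir" "$backup" && echo "$backup"
}

# Intended use (close Eclipse first):
#   set_aside "$HOME/.eclipsesettings"
```

The function refuses to overwrite an existing backup, so running it twice does not lose the first copy.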

Greg

On Mar 26, 2019, at 1:02 PM, John Haiducek <jhaiduce@xxxxxxxxx> wrote:

Tried again with a new workspace. Same results as before, except that nothing is printed to the console in the new workspace (perhaps I didn't turn on a debug setting). Also, I am still able to run the program (without debugger) from Eclipse 2019-03.

On Tue, Mar 26, 2019 at 12:27 PM John Haiducek <jhaiduce@xxxxxxxxx> wrote:
Hi Greg,

I tried Eclipse 2019-03 for Scientific Computing, and it behaves the same as the previous version (previous version was Eclipse 4.8.0 installed from the Snap Store):

  • Starting the debug job opens the "operation in progress" dialog which stays open until clicking Cancel.
  • After clicking Cancel, "Error connecting to debug job launch, Reason: error connecting to debugger"
  • The following is printed to the console:
    submit-interactive-debug: a6696449-e736-459e-a75d-71c1ee3fde9e: perl /home/jhaiducek/.eclipsesettings/rms/OPENMPI/start_job.pl mpirun -np 7 --use-hwthread-cpus
  • UI is unresponsive until sdm is killed manually
This is using the previous Eclipse workspace, which was updated to the new version. I will try again with a new workspace.

John

On Tue, Mar 26, 2019 at 12:02 PM Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
Hi John,

Can you try with the 2019-03 Eclipse IDE for Scientific Computing package [1]? I can launch a debug job using Open MPI 3.1.3 (I had to add --oversubscribe to the advanced options for the target configuration.)

Regards,
Greg


On Mar 21, 2019, at 10:04 AM, John Haiducek <jhaiduce@xxxxxxxxx> wrote:

And to answer your question, yes I was able to run the program with mpirun --oversubscribe -np 7 …path_to_executable…

On Mar 21, 2019, at 10:03 AM, John Haiducek <jhaiduce@xxxxxxxxx> wrote:

Greg,

That “not enough slots” error occurs with recent versions of openmpi if you try to run more processes than there are CPU cores in the machine (or cluster). If you pass the --oversubscribe option to mpirun it should run without that error.
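As a rough illustration of that rule, the sketch below decides whether the flag is needed. The `oversubscribe_flag` helper is hypothetical, and the slot count is passed in by hand because how Open MPI counts slots (cores, hardware threads, or a hostfile) depends on the configuration. The script only prints the command line and does not run mpirun.

```shell
#!/bin/sh
# Hypothetical helper: print "--oversubscribe" when the requested process
# count exceeds the number of available slots, otherwise print nothing.
# Usage: oversubscribe_flag REQUESTED SLOTS
oversubscribe_flag() {
    requested=$1
    slots=$2
    if [ "$requested" -gt "$slots" ]; then
        echo "--oversubscribe"
    fi
}

# Example: 7 processes on a machine assumed to have 4 slots.
# This only prints the resulting command line.
echo "mpirun $(oversubscribe_flag 7 4) -np 7 ./test"
```

When the requested count fits within the slots, the helper prints nothing and the command is left without the flag.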

By the way, I’m at an all-day meeting today so I don’t have access to the machine that’s having the problem. I’ll try anything you suggest tomorrow. Thank you again for your help with this!

John

On Thu, Mar 21, 2019, 09:54 Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
John,

Can you verify that you can run the program with 7 processes using the command "mpirun -np 7 ...path_to_executable..."? When I try to run more than 2 processes, I'm getting:

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
  ./test

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

I haven't tracked down why this is happening yet.

Regards,
Greg

On Mar 21, 2019, at 7:51 AM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:

John,

Sorry for the delay. I'll take a look at this today.

Regards
Greg

On Mar 14, 2019, at 2:51 PM, John Haiducek <jhaiduce@xxxxxxxxx> wrote:

Ok, I upgraded to openmpi 3.1.3 and am now getting the same behavior as before. Eclipse is stuck at "Operation in progress" when I launch a debug job, says "Cannot connect to debugger" when I click Cancel, and the UI is unresponsive until I manually kill sdm.

Eclipse prints the following to the console:

mpirun -np 7 --use-hwthread-cpus
#PTP job_id=12156
mpirun -np 7 --use-hwthread-cpus /home/jhaiducek/.eclipsesettings/sdm --port=43775 --host=localhost --debugger=gdb-mi --debug=127 --routing_file=/home/jhaiducek/eclipse-workspace/mpi_hello_world/Debug/routes_c2f475e7-63c1-4a8a-ac8b-40171c9f2ec2

And in the shell I opened eclipse from I get the following:

submit-interactive-debug: c2f475e7-63c1-4a8a-ac8b-40171c9f2ec2: perl /home/jhaiducek/.eclipsesettings/rms/OPENMPI/start_job.pl mpirun -np 7 --use-hwthread-cpus


On Thu, Mar 14, 2019 at 2:15 PM John Haiducek <jhaiduce@xxxxxxxxx> wrote:
Thanks! I vaguely recall seeing some messages indicating a segfault but I couldn't tell where they were coming from (eclipse, mpirun, sdm, gdb, or my application). Looks like openmpi 3.1.3 hasn't been packaged for Ubuntu 18.04, so I'll have to build from source. Will message back once I have that.


On Mar 14, 2019, at 9:40 AM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:

John,

There's a bug in OpenMPI 2.x and 3.1.0 that causes mpirun to segfault. You'll need to update to 3.1.3 or later for it to work.

Here's a link to the OpenMPI issue: https://github.com/open-mpi/ompi/issues/5165

Regards,
Greg

On Mar 12, 2019, at 6:46 PM, John Haiducek <jhaiduce@xxxxxxxxx> wrote:

I'm on just a single machine.

On Tue, Mar 12, 2019 at 6:18 PM Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
Hi John,

This can be tricky to diagnose. Are you running on a cluster or just on a single machine?

Regards,
Greg

> On Mar 12, 2019, at 11:56 AM, John Haiducek <jhaiduce@xxxxxxxxx> wrote:
>
> Hi,
>
> I'm trying to run the PTP parallel debugger on the MPI Hello World example provided with PTP. I can run the code in parallel just fine, and I can debug the same code with eclipse in serial. But when I run the parallel debugger it gets stuck at "Operation in progress..." and the code never starts. If I press Cancel I get "Launch Error: Error completing debug job launch Reason: Cannot connect to debugger." At that point the Eclipse GUI becomes unresponsive until I kill all the sdm processes that were created.
>
> I tried turning on "Enable SDM tracing" (and all the options listed under it), but that seems to do nothing.
>
> I'm running Eclipse Photon (4.8.0) with PTP 9.2.0.201805221500 on Ubuntu Linux 18.04.2 LTS with openmpi 2.1.1 and gdb 8.1.0.20180409-git.
>
> John
> _______________________________________________
> ptp-user mailing list
> ptp-user@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> https://www.eclipse.org/mailman/listinfo/ptp-user

_______________________________________________
ptp-user mailing list
ptp-user@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/ptp-user