Re: [ptp-user] Job still in state "running" while already finished

Hmmm. It's possible that something has been changed in OpenMPI 1.2.3 that has broken this. The latest I've tried is 1.2.2. I'll update to 1.2.3 and see if I can repeat the problem. It would be nice to get the fix into the 1.1.1 bugfix version of PTP if possible.

Greg

On Aug 12, 2007, at 3:54 PM, Mateusz Pabis wrote:

hi *,

I'm trying to set up PTP in Eclipse. When I run a small hello world
application, Job1 stays in the running state even after all of its
processes have finished.

Here is the code:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
        /* MPI_MAX_PROCESSOR_NAME is the size MPI_Get_processor_name expects */
        char name[MPI_MAX_PROCESSOR_NAME];
        int length, rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &length);
        printf("[%d/%d] %s: hello world\n", rank+1, size, name);

        MPI_Finalize();
        printf("THE END\n");
        return 0;
}

Each process's output contains the expected data:
[<rank>/<size>] hostname: hello world
THE END

Even so, all processes are shown green (running state), and the whole
job is also in the running state. PTP's "terminate all jobs" action has
no effect.

The console output looks like this:

OMPIProxyRuntimeClient got event: EVENT_RUNTIME_JOBSTATE (jobid=1) state=1
*********** JOB STATE CHANGE: starting (job = job1)
++++++++++ ptp_orte_proxy: (debug ? 0) Spawning 5 processes of job
'/home/uranium/workspace/hello_2/Debug/hello_2'
++++++++++ ptp_orte_proxy:      program name
'/home/uranium/workspace/hello_2/Debug/hello_2'
++++++++++ ptp_orte_proxy: SPAWNED [error code 0 = 'Success'], now unlocking
++++++++++ ptp_orte_proxy: NEW JOBID = 2
++++++++++ ptp_orte_proxy: registering IO forwarding - name = ''
++++++++++ ptp_orte_proxy: Returning from ORTERun
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCATTR job=1 {0}:<>
[0]:<ATTRIB_PROCESS_NODE_NAME=menhir> [0]:<ATTRIB_PROCESS_PID=2130>
*********** PROC ATTRIBUTE CHANGE: (job = job1)
setting node[job1_process0]=menhir(0)
setting pid[job1_process0]=2130
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCATTR job=1 {0}:<>
[1]:<ATTRIB_PROCESS_NODE_NAME=menhir> [1]:<ATTRIB_PROCESS_PID=2131>
*********** PROC ATTRIBUTE CHANGE: (job = job1)
setting node[job1_process1]=menhir(0)
setting pid[job1_process1]=2131
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCATTR job=1 {0}:<>
[2]:<ATTRIB_PROCESS_NODE_NAME=menhir> [2]:<ATTRIB_PROCESS_PID=2132>
*********** PROC ATTRIBUTE CHANGE: (job = job1)
setting node[job1_process2]=menhir(0)
setting pid[job1_process2]=2132
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCATTR job=1 {0}:<>
[3]:<ATTRIB_PROCESS_NODE_NAME=menhir> [3]:<ATTRIB_PROCESS_PID=2133>
*********** PROC ATTRIBUTE CHANGE: (job = job1)
setting node[job1_process3]=menhir(0)
setting pid[job1_process3]=2133
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCATTR job=1 {0}:<>
[4]:<ATTRIB_PROCESS_NODE_NAME=menhir> [4]:<ATTRIB_PROCESS_PID=2134>
*********** PROC ATTRIBUTE CHANGE: (job = job1)
setting node[job1_process4]=menhir(0)
setting pid[job1_process4]=2134
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_JOBSTATE (jobid=1) state=2
*********** JOB STATE CHANGE: running (job = job1)
++++++++++ ptp_orte_proxy: JOB STATE CALLBACK: 2
++++++++++ ptp_orte_proxy: state callback returning state=2
XXXXXXXXXXX refreshRuntimeSystems(false), isInitialized():true
++++++++++ ptp_orte_proxy: JOB STATE CALLBACK: 4
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_JOBSTATE (jobid=1) state=2
*********** JOB STATE CHANGE: running (job = job1)
++++++++++ ptp_orte_proxy: JOB STATE CALLBACK: 8
++++++++++ ptp_orte_proxy: state callback returning state=2
++++++++++ ptp_orte_proxy: JOB STATE CALLBACK: 32
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 0 '[1/5]
menhir: hello world'
++++++++++ ptp_orte_proxy: [1/5] menhir: hello world
++++++++++ ptp_orte_proxy: [2/5] menhir: hello world
++++++++++ ptp_orte_proxy: [3/5] menhir: hello world
++++++++++ ptp_orte_proxy: [4/5] menhir: hello world
++++++++++ ptp_orte_proxy: [5/5] menhir: hello world
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 1 '[2/5]
menhir: hello world'
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 2 '[3/5]
menhir: hello world'
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 3 '[4/5]
menhir: hello world'
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 4 '[5/5]
menhir: hello world'
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 0 'THE END'
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 4 'THE END'
++++++++++ ptp_orte_proxy: JOB STATE CALLBACK: 64
++++++++++ ptp_orte_proxy: THE END
++++++++++ ptp_orte_proxy: THE END
++++++++++ ptp_orte_proxy: THE END
++++++++++ ptp_orte_proxy: THE END
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 1 'THE END'
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 3 'THE END'
OMPIProxyRuntimeClient got event: EVENT_RUNTIME_PROCOUT 1 2 'THE END'
++++++++++ ptp_orte_proxy: THE END
++++++++++ ptp_orte_proxy: JOB STATE CALLBACK: 128
++++++++++ ptp_orte_proxy: unregistering IO forwarding - name =
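
One rough way to read the proxy lines above is to list every JOB STATE CALLBACK value that is not followed by a "state callback returning" line before the next callback arrives. The sketch below does that for a log held in memory. The function name callbacks_without_return and the 512-byte line limit are assumptions made for illustration; they are not part of PTP or Open MPI. Because the proxy and the Eclipse client write to the same console, line order in the log is only approximate, so treat the result as a hint rather than a diagnosis.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a console log and collect the values of
 * "JOB STATE CALLBACK: N" lines that are not followed by a
 * "state callback returning" line before the next callback (or end of log).
 * Stores at most max_out values in out and returns how many were stored.
 */
int callbacks_without_return(const char *log, int *out, int max_out)
{
    static const char cb_tag[]  = "JOB STATE CALLBACK: ";
    static const char ret_tag[] = "state callback returning";
    int pending = -1;  /* last callback value still waiting for a "returning" line */
    int count = 0;

    assert(log != NULL && out != NULL);

    while (*log != '\0') {
        /* Copy the current line (truncated to the buffer size) */
        size_t len = strcspn(log, "\n");
        char line[512];
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, log, n);
        line[n] = '\0';

        const char *p = strstr(line, cb_tag);
        if (p != NULL) {
            /* A new callback: the previous one never saw a "returning" line */
            if (pending >= 0 && count < max_out)
                out[count++] = pending;
            pending = -1;
            int value;
            if (sscanf(p + strlen(cb_tag), "%d", &value) == 1)
                pending = value;
        } else if (strstr(line, ret_tag) != NULL) {
            pending = -1;  /* the pending callback was answered */
        }

        log += len;
        if (*log == '\n')
            log++;
    }

    /* A callback left pending at end of log was never answered either */
    if (pending >= 0 && count < max_out)
        out[count++] = pending;
    return count;
}
```

Applied to the proxy lines in this log, the helper would report the callbacks with values 4, 32, 64 and 128. It says nothing about what those numeric states mean inside ORTE; it only shows which callbacks have no matching "returning" line in the output.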


My configuration is:
Gentoo Linux
Eclipse 3.2.2
PTP 1.1.0
CDT 3.1.2
OpenMPI 1.2.3 (with devel-headers)

The compiled binary from the Debug directory runs without problems when
started from the command line:
mpiexec -n 5 ./hello_2

Any ideas?
Do I have to roll back to OpenMPI 1.2.0?

-- Mateusz Pabis

_______________________________________________
ptp-user mailing list
ptp-user@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-user



