[ptp-user] slurm tasks not honoured?

After seeing the email about the release of PTP 5.0.5, I updated Eclipse, downloaded the proxy zip file, and recompiled utils, proxy and sdm.
Everything seems fine, but when I run a job, the number of tasks always seems to be 1.

Launching with 16 tasks on one node produces the output below (note the exception, which occurs on every job launch):

SLURM@Local: ptp_slurm_proxy: Job step aborted: Waiting up to 2 seconds for job step to finish.
SLURM@Local: Send Job/Process StateChange Event: state=32772
SLURM@Local: job[15974] iothread exit on EOF/ERROR of stdout fd
SLURM@Local: job[15974] iothread exit on Error/EOF of stderr fd.
SLURM@Local: Send Job/Process StateChange Event: state=4
SLURM@Local: Job[15974] no longer exist in SLURM. Romove it!
SLURM@Local: SLURM_SubmitJob (2):
SLURM@Local: job submit commands:
SLURM@Local:    jobTimeLimit=55
SLURM@Local:    launchedByPTP=true
SLURM@Local:    jobNumProcs=16
SLURM@Local:    execPath=/project/csvis/biddisco/eiger/build/pv-os/bin
SLURM@Local:    progArgs=-rc
SLURM@Local:    progArgs=-ch=148.187.14.220
SLURM@Local:    progArgs=--use-offscreen-rendering
SLURM@Local:    jobNumNodes=1
SLURM@Local:    execName=pvserver
SLURM@Local:    jobPartition=stdMem
SLURM@Local:    jobSubId=JOB_13297370315374
SLURM@Local: Job[15975] io thread create done.
SLURM@Local: Send Job/Process StateChange Event: state=1
java.lang.NullPointerException
        at org.eclipse.ptp.ui.views.MachinesNodesView$JobListener.handleEvent(MachinesNodesView.java:111)
        at org.eclipse.ptp.rmsystem.AbstractResourceManagerMonitor.fireJobChanged(AbstractResourceManagerMonitor.java:241)
        at org.eclipse.ptp.rmsystem.AbstractResourceManager.fireJobChanged(AbstractResourceManager.java:510)
        at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManager.fireJobChanged(AbstractRuntimeResourceManager.java:145)
        at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManagerMonitor.doUpdateJobs(AbstractRuntimeResourceManagerMonitor.java:988)
        at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManagerMonitor.handleEvent(AbstractRuntimeResourceManagerMonitor.java:348)
        at org.eclipse.ptp.rtsystem.AbstractRuntimeSystem.fireRuntimeJobChangeEvent(AbstractRuntimeSystem.java:90)
        at org.eclipse.ptp.rtsystem.AbstractProxyRuntimeSystem.handleEvent(AbstractProxyRuntimeSystem.java:368)
        at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.fireProxyRuntimeJobChangeEvent(AbstractProxyRuntimeClient.java:249)
        at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.processRunningEvent(AbstractProxyRuntimeClient.java:677)
        at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.runStateMachine(AbstractProxyRuntimeClient.java:937)
        at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient$StateMachineThread.run(AbstractProxyRuntimeClient.java:94)
        at java.lang.Thread.run(Thread.java:736)

Running scontrol show job ID --details for that job gives this:

JobId=15975 Name=pvserver
   UserId=biddisco(20569) GroupId=csstaff(1000)
   Priority=11025 Account=csstaff QOS=normal
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:01:08 TimeLimit=00:55:00 TimeMin=N/A
   SubmitTime=12:23:51 EligibleTime=12:23:51
   StartTime=12:23:51 EndTime=12:24:59
   PreemptTime=NO_VAL SuspendTime=None SecsPreSuspend=0
   Partition=stdMem AllocNode:Sid=eiger220:4509
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=eiger200
   BatchHost=eiger200
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
     Nodes=eiger200 CPU_IDs=1 Mem=0
   MinCPUsNode=1 MinMemoryCPU=12000M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=(null)
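
The mismatch between the request and the allocation can also be checked mechanically. The following is only a minimal sketch in Python, assuming the proxy log and the scontrol output have been captured as text; parse_key_values is a hypothetical helper, not part of PTP or SLURM, and treating NumCPUs as a stand-in for the task count is an assumption.

```python
def parse_key_values(text):
    """Collect whitespace-separated Key=Value tokens into a dict.

    Hypothetical helper: works on both the proxy's "job submit commands"
    log lines and on `scontrol show job` output. Repeated keys (such as
    progArgs) keep only the last value seen.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens without '=' such as "SLURM@Local:"
            fields[key] = value
    return fields


# Excerpts copied from the log and scontrol output above.
submit_log = """\
SLURM@Local:    jobTimeLimit=55
SLURM@Local:    jobNumProcs=16
SLURM@Local:    jobNumNodes=1
SLURM@Local:    jobPartition=stdMem
"""

scontrol_output = """\
JobId=15975 Name=pvserver
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryCPU=12000M MinTmpDiskNode=0
"""

requested = int(parse_key_values(submit_log)["jobNumProcs"])
allocated = int(parse_key_values(scontrol_output)["NumCPUs"])

print(f"requested={requested} allocated={allocated}")
if requested != allocated:
    print("mismatch: the requested task count did not reach the allocation")
```

With the excerpts above this reports requested=16 and allocated=1, which is the discrepancy described in this message.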

I suspect the generation of the SLURM parameters is at fault: the submit request shows jobNumProcs=16, but scontrol reports NumCPUs=1. Is it possible to edit the parameters by hand? (I think there was a template somewhere, but I can't remember where or find it.)

It's quite possible I'm doing something wrong, as I'm new to this.

Any advice is welcome.
Thanks

JB



