Re: [ptp-user] slurm tasks not honoured?

Greg,

I'm away from the office until Friday, but if I can help, consider me volunteered. I have no clue where to start, so if you can send instructions I can follow to experiment, please do, and I'll happily play with things when I get back.

JB

-----Original Message-----
From: ptp-user-bounces@xxxxxxxxxxx [mailto:ptp-user-bounces@xxxxxxxxxxx] On Behalf Of Greg Watson
Sent: 21 February 2012 15:40
To: PTP User list
Subject: Re: [ptp-user] slurm tasks not honoured?

John,

I'm not sure that the current SLURM resource manager has been very thoroughly tested, so it's possible you're seeing some bugs in this implementation. Ideally we would like to transition from this version to the new RM framework (the one used for PBS), but we need someone who has access to a SLURM system to volunteer to write and test a configuration file.

Regards,
Greg

On Feb 20, 2012, at 11:18 AM, Biddiscombe, John A. wrote:

> I think a more general question would be: with a slurm resource manager running (which seems to be fine), how do I enter specific mvapich2 settings to ensure that, when the job is submitted to slurm, the correct launch procedure is followed?
> 
> The PBS resource manager seems to have all the options for changing the MPI command etc., but I can't find the equivalent for slurm.
> (Our system changed from PBS to slurm a few months ago and this is my first attempt to set things up since then.)
> (We are using slurm-2.3.0-pre5 by the look of things.)
> 
> thanks (hopefully)
> 
> JB
> 
> 
> -----Original Message-----
> From: ptp-user-bounces@xxxxxxxxxxx [mailto:ptp-user-bounces@xxxxxxxxxxx] On Behalf Of Biddiscombe, John A.
> Sent: 20 February 2012 12:29
> To: PTP User list
> Subject: [ptp-user] slurm tasks not honoured?
> 
> Seeing the email about the release of PTP 5.0.5, I updated Eclipse, downloaded the proxy zip file, and recompiled utils, proxy and sdm. All seems fine, but when I run a job, the number of tasks always seems to be 1.
> 
> Launching with 16 tasks on one node, it outputs this (note the exception every time on job launch):
> 
> SLURM@Local: ptp_slurm_proxy: Job step aborted: Waiting up to 2 seconds for job step to finish.
> SLURM@Local: Send Job/Process StateChange Event: state=32772
> SLURM@Local: job[15974] iothread exit on EOF/ERROR of stdout fd
> SLURM@Local: job[15974] iothread exit on Error/EOF of stderr fd.
> SLURM@Local: Send Job/Process StateChange Event: state=4
> SLURM@Local: Job[15974] no longer exist in SLURM. Romove it!
> SLURM@Local: SLURM_SubmitJob (2):
> SLURM@Local: job submit commands:
> SLURM@Local:    jobTimeLimit=55
> SLURM@Local:    launchedByPTP=true
> SLURM@Local:    jobNumProcs=16
> SLURM@Local:    execPath=/project/csvis/biddisco/eiger/build/pv-os/bin
> SLURM@Local:    progArgs=-rc
> SLURM@Local:    progArgs=-ch=148.187.14.220
> SLURM@Local:    progArgs=--use-offscreen-rendering
> SLURM@Local:    jobNumNodes=1
> SLURM@Local:    execName=pvserver
> SLURM@Local:    jobPartition=stdMem
> SLURM@Local:    jobSubId=JOB_13297370315374
> SLURM@Local: Job[15975] io thread create done.
> SLURM@Local: Send Job/Process StateChange Event: state=1 
> java.lang.NullPointerException
>        at org.eclipse.ptp.ui.views.MachinesNodesView$JobListener.handleEvent(MachinesNodesView.java:111)
>        at org.eclipse.ptp.rmsystem.AbstractResourceManagerMonitor.fireJobChanged(AbstractResourceManagerMonitor.java:241)
>        at org.eclipse.ptp.rmsystem.AbstractResourceManager.fireJobChanged(AbstractResourceManager.java:510)
>        at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManager.fireJobChanged(AbstractRuntimeResourceManager.java:145)
>        at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManagerMonitor.doUpdateJobs(AbstractRuntimeResourceManagerMonitor.java:988)
>        at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManagerMonitor.handleEvent(AbstractRuntimeResourceManagerMonitor.java:348)
>        at org.eclipse.ptp.rtsystem.AbstractRuntimeSystem.fireRuntimeJobChangeEvent(AbstractRuntimeSystem.java:90)
>        at org.eclipse.ptp.rtsystem.AbstractProxyRuntimeSystem.handleEvent(AbstractProxyRuntimeSystem.java:368)
>        at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.fireProxyRuntimeJobChangeEvent(AbstractProxyRuntimeClient.java:249)
>        at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.processRunningEvent(AbstractProxyRuntimeClient.java:677)
>        at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.runStateMachine(AbstractProxyRuntimeClient.java:937)
>        at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient$StateMachineThread.run(AbstractProxyRuntimeClient.java:94)
>        at java.lang.Thread.run(Thread.java:736)
> 
> Running scontrol show job ID --details then gives this:
> 
> JobId=15975 Name=pvserver
>   UserId=biddisco(20569) GroupId=csstaff(1000)
>   Priority=11025 Account=csstaff QOS=normal
>   JobState=COMPLETED Reason=None Dependency=(null)
>   Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
>   DerivedExitCode=0:0
>   RunTime=00:01:08 TimeLimit=00:55:00 TimeMin=N/A
>   SubmitTime=12:23:51 EligibleTime=12:23:51
>   StartTime=12:23:51 EndTime=12:24:59
>   PreemptTime=NO_VAL SuspendTime=None SecsPreSuspend=0
>   Partition=stdMem AllocNode:Sid=eiger220:4509
>   ReqNodeList=(null) ExcNodeList=(null)
>   NodeList=eiger200
>   BatchHost=eiger200
>   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
>     Nodes=eiger200 CPU_IDs=1 Mem=0
>   MinCPUsNode=1 MinMemoryCPU=12000M MinTmpDiskNode=0
>   Features=(null) Gres=(null) Reservation=(null)
>   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>   Command=(null)
>   WorkDir=(null)
> 
> I suspect the generation of the slurm params is fishy. Is it possible to edit them by hand? (I think there was a template somewhere, but I can't remember/find it). 
> 
> It's quite possible I'm doing something wrong as I'm new to this.
> 
> Any advice welcome. 
> thanks
> 
> JB
> 
> 
> _______________________________________________
> ptp-user mailing list
> ptp-user@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/ptp-user

_______________________________________________
ptp-user mailing list
ptp-user@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-user
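
The mismatch reported in the thread (jobNumProcs=16 in the PTP submit log, but NumCPUs=1 in the scontrol output) can be checked mechanically. The following is only a minimal sketch under stated assumptions: the function names (parse_ptp_submit, parse_scontrol, check_allocation) are hypothetical helpers and not part of PTP or SLURM, the sample text is an excerpt copied from the messages above, and the scontrol parser assumes that no field value contains whitespace, which holds for this output but not in general.

```python
import re

# Excerpts copied from the thread above (quote prefixes removed).
PTP_LOG = """\
SLURM@Local: SLURM_SubmitJob (2):
SLURM@Local: job submit commands:
SLURM@Local:    jobTimeLimit=55
SLURM@Local:    jobNumProcs=16
SLURM@Local:    progArgs=-rc
SLURM@Local:    progArgs=--use-offscreen-rendering
SLURM@Local:    jobNumNodes=1
SLURM@Local:    jobPartition=stdMem
"""

SCONTROL = """\
JobId=15975 Name=pvserver
  JobState=COMPLETED Reason=None Dependency=(null)
  NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
"""


def parse_ptp_submit(log):
    """Collect key=value submit parameters from the PTP proxy log.

    Keys such as progArgs can repeat, so every key maps to a list.
    """
    params = {}
    for line in log.splitlines():
        match = re.match(r"\s*SLURM@Local:\s+(\w+)=(.*)$", line)
        if match:
            params.setdefault(match.group(1), []).append(match.group(2).strip())
    return params


def parse_scontrol(text):
    """Split scontrol output into a dict of key -> value strings.

    Assumption: values contain no whitespace (true for the excerpt here).
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def check_allocation(log, scontrol_text):
    """Return (requested tasks, allocated CPUs, whether allocation suffices)."""
    requested = int(parse_ptp_submit(log)["jobNumProcs"][0])
    allocated = int(parse_scontrol(scontrol_text)["NumCPUs"])
    return requested, allocated, allocated >= requested


if __name__ == "__main__":
    requested, allocated, ok = check_allocation(PTP_LOG, SCONTROL)
    print(f"requested={requested} allocated={allocated} ok={ok}")
```

With the excerpts from this thread, the check reports 16 requested tasks against 1 allocated CPU, which is consistent with the task count not reaching SLURM. For comparison, the same request made by hand would look roughly like srun -N 1 -n 16 -p stdMem -t 55 followed by the program and its arguments; whether the PTP proxy builds its request this way is not confirmed by the thread.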

