
Re: [ptp-dev] Questions related to batch job submission


On Jun 18, 2007, at 1:38 PM, Dave Wootton wrote:

Hi
We were discussing details of batch job submission through PTP and had
some questions about expected PTP behavior and how we should implement
our support.

1) The user can submit a job which contains a set of job steps. From our
perspective, each job step behaves as if it were a separate job, although
there may be dependency and conditional execution specifications that
require job step 1 to complete before job step 2 can begin, or that job
step 2 can only run if job step 1 completed successfully, etc. The job
submission/job command file that specifies the individual job steps is a
single file that will be passed to the proxy in a single run command.

Current PTP behavior is that the run command includes a jobid that is
generated by the front end and passed to the proxy. The proxy responds to
the run command with an event containing that jobid as well as the
proxy-generated identifier for that job. This works for a single job, or
for the first step of a multi-step job.
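
The exchange described above can be sketched roughly as follows. This is only an illustration in Python, not PTP code: the `FrontEnd` class, its method names, and the proxy identifier format are all hypothetical.

```python
import itertools
from typing import Dict, Set, Tuple


class FrontEnd:
    """Hypothetical model of the front end's jobid bookkeeping."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.pending: Set[int] = set()        # run commands awaiting an event
        self.proxy_ids: Dict[int, str] = {}   # front-end jobid -> proxy identifier

    def run(self, command_file: str) -> Tuple[int, str]:
        """Generate a jobid and return the run command sent to the proxy."""
        job_id = next(self._next_id)
        self.pending.add(job_id)
        return (job_id, command_file)

    def on_new_job_event(self, job_id: int, proxy_id: str) -> None:
        """Handle the proxy's event, which echoes the jobid and adds its own id."""
        self.pending.discard(job_id)
        self.proxy_ids[job_id] = proxy_id
```

Because each event carries the front-end jobid, events can be matched to run commands in any order.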

How should multi-step jobs be handled? Should the PTP front end have a
list or array of job steps built at the time the job is submitted, and
use the same jobid for each of those steps? Should the front end generate
a unique jobid for each step that is then passed across in the run
command, maybe as an array of jobids, and then the proxy generate a new
job event for each step using the corresponding jobid? Should the proxy
just use the passed jobid for the first step and use -1 as the jobid for
all subsequent steps, since the front end doesn't know about the
additional job steps?

It sounds like LL handles multi-step jobs using a command file. I presume the user just submits this command file with a single job submission command? Why wouldn't you do the same thing from Eclipse - submit the job command file - rather than try to implement LL functionality in the UI?

It would be possible for you to build a job control UI that monitors the status of a job and, once it has completed successfully, submits the next job in turn. However, there is currently no functionality to control whether a job is run, only whether it is submitted. This sounds like internal LL functionality that you would need to expose. How does it work now?
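
As a rough illustration of the job control idea, the sketch below submits each step only after the previous one has completed successfully. It is written in Python for brevity; `submit_and_wait` and the step names are hypothetical stand-ins, not PTP or LL APIs.

```python
from typing import Callable, List


def run_steps_in_order(steps: List[str],
                       submit_and_wait: Callable[[str], bool]) -> List[str]:
    """Submit steps one at a time and stop at the first failure.

    submit_and_wait is a hypothetical callback that submits one step,
    blocks until it finishes, and returns True on success.
    Returns the steps that were actually submitted.
    """
    submitted = []
    for step in steps:
        submitted.append(step)
        if not submit_and_wait(step):
            break  # later steps depend on this one, so they are not submitted
    return submitted
```

This only covers a simple linear chain; it decides whether to submit the next step, not whether an already submitted step runs.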


2) When we submit a job, the job may not appear on any job queue for a
while, possibly several minutes. We won't have some job-related
information, such as the cluster (machine name) where the job was queued,
until the job appears on the job queue. If we delay our event response to
the run command until we have the required information, does that cause
problems, such as blocking any additional jobs from being queued until
the event notification from the first run command is received? Does the
front end have problems tracking multiple 'in process' run commands
active at the same time?

The architecture supports multiple outstanding submit commands, so delaying the new job event won't cause the UI any problems. However, it would be nice if the user got feedback that something had actually happened when they submitted the job. One thought is that you could have a dummy 'submitted' queue that these jobs would go onto immediately. Then when the job is actually placed on a cluster, you could remove it from the dummy queue and add it to the correct queue.
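
A minimal sketch of the dummy queue idea, in Python and with hypothetical names throughout (`QueueView`, `job_submitted`, `job_placed`, and the queue name are assumptions, not PTP code):

```python
from typing import Dict, List

SUBMITTED = "submitted"  # hypothetical name for the placeholder queue


class QueueView:
    """Hypothetical model of the queues shown to the user."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[str]] = {SUBMITTED: []}

    def job_submitted(self, job_id: str) -> None:
        """Show the job right away, before its cluster is known."""
        self.queues[SUBMITTED].append(job_id)

    def job_placed(self, job_id: str, cluster_queue: str) -> None:
        """Move the job to its real queue once it appears on a cluster."""
        self.queues[SUBMITTED].remove(job_id)
        self.queues.setdefault(cluster_queue, []).append(job_id)
```

The user sees the job immediately on the placeholder queue, and the later move supplies the cluster information once it becomes available.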

Greg


