Re: [ptp-dev] on moab integration

Dave,

Does that mean that the LoadLeveler API supports only polling, not callbacks? I guess this is going to be an issue with other job schedulers too.

Greg

On Jul 24, 2007, at 3:55 PM, Dave Wootton wrote:

Also, since the process state model for PE is so simple, we don't maintain any process state within the proxy to try to optimize the events sent to the PTP GUI.

We're also working on an implementation to support LoadLeveler (the IBM batch job scheduler). In that model, LoadLeveler provides a C programming API which allows us to query many attributes of the cluster LoadLeveler is running on, including node state, job state, etc., as well as to submit jobs. Our implementation will use that API rather than using LoadLeveler commands to retrieve the data. For the LoadLeveler implementation, since the run environment and status tracking are more complex, we will be maintaining state within the proxy so that we only send events for changes rather than sending complete state each time we want to update the GUI. We also need to be concerned with the polling interval in that case, both because of CPU load on the proxy node and because of the risk of overloading the LoadLeveler daemons with status requests that are too frequent.
Dave

Dave Wootton/Poughkeepsie/IBM@IBMUS
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>, ptp-dev-bounces@xxxxxxxxxxx, "Canon, Richard Shane" <canonrs@xxxxxxxx>
Subject: Re: [ptp-dev] on moab integration

The Parallel Environment (PE) case is pretty simple since there is no programming interface for invoking PE applications and no methods to query status. When a user invokes a PE application from PTP, the proxy is sent a run command with PE invocation parameters as arguments to the run command. The proxy issues a fork and sets up the application's environment variables using the arguments from the run command and then invokes poe via exec(). (poe is the master process which is responsible for setting up and invoking all the tasks of the parallel application).
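
A minimal sketch of the fork/exec step described above, for readers who want to see its shape. This is not the proxy's actual code: the `env_pair` structure, `launch_with_env()` and `wait_exit_code()` are hypothetical names introduced here, and the program being launched is whatever `argv[0]` names rather than poe specifically.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical name/value pair taken from the run command's arguments. */
struct env_pair {
    const char *name;
    const char *value;
};

/*
 * Fork, apply the environment in the child, then exec argv[0].
 * Returns the child's pid in the parent, or -1 if fork() fails.
 */
pid_t launch_with_env(char *const argv[], const struct env_pair *env, size_t n_env)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid != 0) {
        return pid;              /* parent (pid > 0) or fork failure (pid == -1) */
    }

    /* Child: set each variable, overwriting any inherited value. */
    for (size_t i = 0; i < n_env; i++) {
        if (setenv(env[i].name, env[i].value, 1) != 0) {
            _exit(126);
        }
    }
    execvp(argv[0], argv);
    _exit(127);                  /* reached only if exec failed */
}

/*
 * Blocking wait, for trying the sketch out only; the proxy described in
 * this thread polls for termination instead.
 * Returns the child's exit code, or -1 if it did not exit normally.
 */
int wait_exit_code(pid_t pid)
{
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The variables are set in the child after the fork so that the proxy's own environment is left untouched between launches.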

At the point the proxy invokes the poe executable, it does not know anything about the job other than the pid of the poe process. So the proxy starts a thread whose only purpose is to watch for a file generated by the poe process that has the mapping of application tasks to hostname/pid pairs. Once this thread has the mapping file, it sends events to the PTP GUI notifying it of the existence of the processes (tasks) and also that the job is now running.
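
The file-watching loop could look roughly like the sketch below, which would be the body of the watcher thread. `wait_for_file()` and `sleep_ms()` are hypothetical helpers, and treating "exists and is non-empty" as "ready" is an assumption made for illustration; a real watcher would also need to be sure poe has finished writing the file before parsing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/* Sleep for the given number of milliseconds, resuming if interrupted. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR) {
        /* ts now holds the remaining time; keep sleeping */
    }
}

/*
 * Poll until `path` exists as a non-empty regular file.
 * Returns 0 once it does, or -1 if max_polls attempts pass first.
 */
int wait_for_file(const char *path, long interval_ms, int max_polls)
{
    assert(path != NULL && interval_ms >= 0);

    for (int i = 0; i < max_polls; i++) {
        struct stat st;
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
            return 0;
        }
        sleep_ms(interval_ms);
    }
    return -1;
}
```

Bounding the number of polls keeps the thread from spinning forever if the launch fails before the file is written.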

The proxy has a thread whose only purpose is to watch for termination of any child process of the proxy by issuing waitpid() with the WNOHANG flag set. The only processes detected by this polling loop are the poe processes started by the proxy. When a poe process terminates, the proxy sends a job terminated event to the PTP GUI.
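
As an illustration of the termination watcher, here is a small sketch of a non-blocking reap loop built on waitpid() with WNOHANG. The callback type `job_done_fn` stands in for "send a job terminated event" and is hypothetical, as are the function names; the real proxy's event code is not shown in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback standing in for "send a job terminated event". */
typedef void (*job_done_fn)(pid_t pid, int wait_status);

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped in this call, or -1 when
 * nothing was reaped because the process has no children to wait for.
 */
int reap_finished_children(job_done_fn on_done)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {
            if (on_done != NULL) {
                on_done(pid, status);
            }
            reaped++;
        } else if (pid == 0) {
            return reaped;       /* children exist, but none has exited yet */
        } else if (errno == EINTR) {
            continue;
        } else {
            return reaped > 0 ? reaped : -1;   /* typically ECHILD */
        }
    }
}

/*
 * Polling loop: check, then sleep a few seconds. This version returns once
 * there are no children left; a long-lived proxy would keep looping.
 */
void watch_children(job_done_fn on_done, unsigned interval_sec)
{
    assert(interval_sec > 0);
    while (reap_finished_children(on_done) >= 0) {
        sleep(interval_sec);
    }
}
```

Draining all exited children on each pass means several jobs that finish within one sleep interval are all reported on the next pass.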

In the PE execution model, any application output written to stdout or stderr is captured by the PE runtime and sent to the poe process. The poe process simply echoes that output to its stdout and stderr file descriptors. For PTP, since the poe process is fork/execed by the proxy, at the point where the fork is issued, I set up pipe pairs for stdout and stderr and capture poe stdio that way. I register the proxy's pipe handles with the select() listening loop set up at proxy startup, and as stdio data becomes available, the proxy generates the events to send that data to the PTP GUI. I also have an option to redirect stdout/stderr to files, which avoids sending data to the PTP GUI.
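
The pipe-and-select() arrangement can be sketched as follows. This is a standalone loop for illustration, whereas the proxy registers its pipe handles with a select() loop that already exists; `spawn_captured()`, `pump_output()` and the `output_fn` callback are hypothetical names, not PTP functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback standing in for "send stdio data to the GUI". */
typedef void (*output_fn)(int is_stderr, const char *data, size_t len, void *ctx);

/* Fork/exec argv[0] with stdout and stderr redirected into two pipes.
 * Returns the child's pid and stores the read ends, or returns -1. */
pid_t spawn_captured(char *const argv[], int *out_fd, int *err_fd)
{
    int out_pipe[2], err_pipe[2];
    assert(argv && argv[0] && out_fd && err_fd);

    if (pipe(out_pipe) != 0) return -1;
    if (pipe(err_pipe) != 0) {
        close(out_pipe[0]); close(out_pipe[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: make the pipe write ends its stdout and stderr. */
        if (dup2(out_pipe[1], STDOUT_FILENO) < 0 ||
            dup2(err_pipe[1], STDERR_FILENO) < 0) {
            _exit(126);
        }
        close(out_pipe[0]); close(out_pipe[1]);
        close(err_pipe[0]); close(err_pipe[1]);
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: keep only the read ends. */
    close(out_pipe[1]); close(err_pipe[1]);
    if (pid < 0) {
        close(out_pipe[0]); close(err_pipe[0]);
        return -1;
    }
    *out_fd = out_pipe[0];
    *err_fd = err_pipe[0];
    return pid;
}

/* Forward data from both pipes until each reaches EOF, then close them.
 * Returns the total number of bytes read, or -1 if select() fails. */
long pump_output(int out_fd, int err_fd, output_fn on_data, void *ctx)
{
    int fds[2] = { out_fd, err_fd };
    int open_count = 2;
    long total = 0;
    char buf[4096];

    while (open_count > 0) {
        fd_set readable;
        int maxfd = -1;

        FD_ZERO(&readable);
        for (int i = 0; i < 2; i++) {
            if (fds[i] >= 0) {
                FD_SET(fds[i], &readable);
                if (fds[i] > maxfd) maxfd = fds[i];
            }
        }
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        for (int i = 0; i < 2; i++) {
            if (fds[i] < 0 || !FD_ISSET(fds[i], &readable)) continue;
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n > 0) {
                if (on_data != NULL) on_data(i == 1, buf, (size_t)n, ctx);
                total += n;
            } else if (n == 0 || errno != EINTR) {
                close(fds[i]);   /* EOF or unrecoverable read error */
                fds[i] = -1;
                open_count--;
            }
        }
    }
    return total;
}
```

Closing the write ends in the parent matters: otherwise the read ends never see EOF when the child exits.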

In this model I have two polling loops. The first is within the thread watching for the poe process to generate the task map file, and is normally a short-lived polling loop. The second polling loop is in the thread watching for poe process termination. In both cases, I sleep a few seconds before iterating the loop. I don't consider either of these polling loops to be heavy CPU users since the processing within these loops is fairly simple.

That's the basic concept. Let me know if you have questions.

Dave

Greg Watson <g.watson@xxxxxxxxxxxx>
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc: "Canon, Richard Shane" <canonrs@xxxxxxxx>
Subject: Re: [ptp-dev] on moab integration

Feiyi,

On Jul 24, 2007, at 11:54 AM, Feiyi Wang wrote:

> hi, folks -
>
> The Moab documentation says it can interface with both Java and C, but
> looking into the API, I have two concerns:
>
> - It doesn't have a callback function to update node and job status,
> meaning that if I want to update the Eclipse front end, I have to probe
> periodically.
> Is ORTE actively updating the front end? How does it handle this?

ORTE gets its information via callbacks that are generated only when
state changes occur, which is much more efficient than polling. I'm  
not sure what Dave is doing for the PE case, but he might want to  
comment also.

Polling would probably be ok as long as you keep the frequency fairly  
low. This is going to be a tradeoff with the responsiveness you want  
to deliver to the user. You should also keep a snapshot of the state  
internally in your proxy so that you only need to send differences to  
Eclipse. That way you'll be able to minimize the load on Eclipse.
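
A minimal sketch of the snapshot idea, assuming the polled state can be reduced to one value per node. The `node_state` enum, the `change_fn` callback and `publish_changes()` are hypothetical names for illustration; they are not part of the Moab, LoadLeveler or PTP APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-node state as reported by one poll of the scheduler. */
enum node_state { NODE_UNKNOWN = 0, NODE_UP, NODE_DOWN, NODE_BUSY };

/* Hypothetical callback standing in for "send a change event to Eclipse". */
typedef void (*change_fn)(size_t node, enum node_state old_state,
                          enum node_state new_state, void *ctx);

/*
 * Compare a fresh poll against the proxy's snapshot, report only the
 * entries that differ, and update the snapshot in place.
 * Returns the number of nodes whose state changed.
 */
size_t publish_changes(enum node_state *snapshot, const enum node_state *polled,
                       size_t n_nodes, change_fn on_change, void *ctx)
{
    size_t changed = 0;

    assert(snapshot != NULL && polled != NULL);
    for (size_t i = 0; i < n_nodes; i++) {
        if (snapshot[i] != polled[i]) {
            if (on_change != NULL) {
                on_change(i, snapshot[i], polled[i], ctx);
            }
            snapshot[i] = polled[i];
            changed++;
        }
    }
    return changed;
}
```

With this shape, a poll that finds nothing new costs one pass over the array and sends no events at all.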

Does Moab have a tool to monitor status? How does that work?

>
> - It is strange that I couldn't find an API or structure to query system
> resources. Most of the APIs documented there are job related. One Moab
> developer suggested that I use their so-called "XML API", in which a C
> function takes an XML string and returns an XML result, the same as
> you would get from their command line tool.
>
> The issue with getting an XML string rather than a C structure is that I
> need to re-parse the result and set up the correct structure again. As a
> side note, these returned XML strings can be very large:
>
> For example, on the ORNL Jaguar system with over 11000 nodes, to
> implement get_node_attributes(), a query to the system yields a 6.8M XML
> string. Worst of all, it is *a single string*, and so far it has made
> Eclipse, Emacs, and Gedit unresponsive when trying to load
> it. Even if I parse it correctly eventually, it feels ugly. Do you
> folks see this as a long-term solution?

It sounds like this will generate a lot of overhead just parsing the  
string every time. Is there any way to only generate differences or  
do you just get a full dump every time?

You're going to be restricted by whatever API Moab provides. If it  
proves unworkable, then getting Moab to add a better interface might  
be one option.

Greg

>

_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx

https://dev.eclipse.org/mailman/listinfo/ptp-dev