Re: [ptp-dev] Suggestions for implementing LML system view for PE JAXB resource manager with large node count

Hey all -

On a related note, I am working on BG/P support for Intrepid/Challenger/Surveyor at Argonne. They don't use LoadLeveler; they use PBS Cobalt. So, to get job info I am using qstat, and to get node info I am using partlist.

One problem I have encountered is that partlist returns the partition blocks, not the actual machine nodes. From the partlist results, I can either return the partition blocks (which are not mutually exclusive - a single node can exist in many different partitions, configured in different ways) or I can reconstruct the machine based on the partition names, and then map the jobs to the machine. For example, the 1024 nodes of Challenger are organized into 230 different partitions, with 16 to 512 nodes per partition.

The jobs themselves are allocated to partition blocks, so they have to be remapped too.
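
To make the remapping concrete, here is a minimal Python sketch. It assumes the partition-to-node membership has already been derived from the partition names; the partition names, node ids, and job ids below are hypothetical and are not real partlist or qstat output.

```python
def jobs_by_node(partitions, job_partitions):
    """Map each node to the jobs running on it.

    partitions: partition name -> set of node ids (assumed already derived)
    job_partitions: job id -> name of the partition the job was allocated to
    """
    node_jobs = {}
    for job_id, part in job_partitions.items():
        for node in partitions[part]:
            node_jobs.setdefault(node, []).append(job_id)
    return node_jobs


# Hypothetical data: partitions overlap, so node 2 belongs to both P-A and P-B.
partitions = {
    "P-A": {0, 1, 2, 3},
    "P-B": {2, 3},
    "P-C": {4, 5, 6, 7},
}
job_partitions = {"job1": "P-B", "job2": "P-C"}

node_jobs = jobs_by_node(partitions, job_partitions)
```

Because a job is charged only to the nodes of the partition it was allocated to, nodes 0 and 1 show up as idle even though they share partition P-A with busy nodes.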

Neither of these mappings is that difficult, but I just wonder if it is more meaningful to show the partitions in the Eclipse IDE, or the BG/P hardware hierarchy.

Any suggestions?

Thanks -
Kevin
--
Kevin A. Huck
ParaTools, Inc
(541) 359-2261



On Jan 13, 2012, at 6:32 AM, Greg Watson wrote:

I think this is a question for the LML developers: How can the layout of the nodedisplay be customized for a particular resource manager?

Greg

On Jan 12, 2012, at 9:00 PM, Dave Wootton wrote:


I'm trying to figure out what I need to do in order for my PE JAXB resource manager to effectively handle LML status views when the user's application is using a large number of nodes in the cluster, and where displaying status for all those nodes will affect Eclipse performance.

With PE, I have no information about which nodes are in use until the application starts since PE uses either a host file or a back end resource manager such as LoadLeveler to handle node allocation at job submit time. I also have no information at all about node topology, at least in the hostfile case.

So I think the default action is that I need to arbitrarily group nodes into groups of 100, so that if I were using 500 nodes, the initial view would be 5 elements, each representing a group of 100 nodes, and the user could zoom in to see the detail of those 100 nodes.
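
As an illustration of the fixed-size grouping, here is a minimal Python sketch. The function name and the node names are hypothetical; it only shows the chunking step, not how the groups would be passed to LML or the nodedisplay.

```python
def group_nodes(nodes, group_size=100):
    """Split a flat node list into consecutive groups of at most group_size."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [nodes[i:i + group_size] for i in range(0, len(nodes), group_size)]


# Hypothetical node names; with 500 nodes the initial view has 5 elements.
nodes = [f"node{i:04d}" for i in range(500)]
groups = group_nodes(nodes)
```

If the node count is not a multiple of the group size, the last group is simply smaller.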

One idea that I had for better grouping was to apply a regular expression pattern to the list of nodes, where node names may indicate which frame they reside in, and group nodes in the same logical set into a single display element.
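
A rough Python sketch of the regular expression idea follows. The naming scheme "f<frame>n<node>" is purely an assumption for illustration; a real cluster would need its own pattern, and names that do not match are collected under a catch-all key.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: "f<frame>n<node>", e.g. "f03n12".
FRAME_PATTERN = re.compile(r"^f(?P<frame>\d+)n(?P<node>\d+)$")


def group_by_frame(nodes, pattern=FRAME_PATTERN):
    """Group node names by the 'frame' field captured by the pattern."""
    groups = defaultdict(list)
    for name in nodes:
        match = pattern.match(name)
        # Names that do not follow the scheme go into a catch-all group.
        key = match.group("frame") if match else "ungrouped"
        groups[key].append(name)
    return dict(groups)


nodes = ["f01n01", "f01n02", "f02n01", "login1"]
grouped = group_by_frame(nodes)
```

Each key of the result would then correspond to one display element that the user could zoom into.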

I'm looking for suggestions about what I need to do in my resource manager or elsewhere to make this work.

Thanks.
Dave
_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev
