[ptp-dev] PTP now correctly hooks into node state change with bproc and ORTE

I just committed some code that finishes up something I've been working on for a while. With PTP using ORTE/OMPI on a bproc system, we can now correctly monitor node status information and display it against the correct node. For instance, I can run PTP on the front end of a 10-node bproc test machine I've got here and it will show me, graphically, who owns each node, what state it is in, etc.

Then I can tell it to change the ownership or permissions of one node and wham, the icon for that node changes immediately. :) I can also reboot the entire cluster, and we get a flurry of messages as each node goes through a series of states, such as 'reboot, down, booting, up'. The nodes don't do this in lock step, of course, since machines can't be expected to boot at exactly the same time, and it's wonderful to see my little grid of icons flicker as the machines change state.

One thing it doesn't do yet is throttle or coalesce events: it sends one event for each event in the subsystem. That means if we rebooted, say, a 2000-node bproc cluster, we'd get 2000 events for each state change. This is obviously not going to scale, so a focus in the coming weeks after the release will be putting in some sort of throttling or coalescing code.
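
To give a rough idea of what coalescing could look like, here is a minimal sketch in Python. It is not the PTP or ORTE code, and every name in it (NodeEventCoalescer, post, flush, simulate_reboot) is made up for illustration. The assumption is that a viewer only needs the latest state of each node, so events arriving within a short window are merged into one batch keyed by node. The catch is that if a node changes state twice inside one window, the intermediate state is dropped.

```python
from typing import Callable, Dict, List, Optional


class NodeEventCoalescer:
    """Hypothetical helper: merges per-node state events into batches.

    Events posted within `window_ms` of the first pending event are
    collected into one dict (node id -> latest state) and delivered
    together, instead of one delivery per event.
    """

    def __init__(self, deliver: Callable[[Dict[int, str]], None],
                 window_ms: int = 500) -> None:
        self._deliver = deliver
        self._window_ms = window_ms
        self._pending: Dict[int, str] = {}
        self._window_start: Optional[int] = None

    def post(self, node: int, state: str, now_ms: int) -> None:
        """Record one state change; flush once the window has elapsed."""
        if self._window_start is None:
            self._window_start = now_ms
        # Latest state wins: an earlier pending state for this node is replaced.
        self._pending[node] = state
        if now_ms - self._window_start >= self._window_ms:
            self.flush()

    def flush(self) -> None:
        """Deliver whatever is pending as a single batch."""
        if self._pending:
            batch, self._pending = self._pending, {}
            self._deliver(batch)
        self._window_start = None


def simulate_reboot(num_nodes: int = 2000,
                    window_ms: int = 500) -> List[Dict[int, str]]:
    """Feed a simulated cluster reboot through the coalescer.

    Assumes, purely for the simulation, one event per millisecond.
    """
    batches: List[Dict[int, str]] = []
    coalescer = NodeEventCoalescer(batches.append, window_ms)
    now_ms = 0
    for state in ("reboot", "down", "booting", "up"):
        for node in range(num_nodes):
            coalescer.post(node, state, now_ms)
            now_ms += 1
    coalescer.flush()  # deliver the tail end of the last window
    return batches
```

Calling simulate_reboot() pushes 8000 simulated events (2000 nodes times four states) through the coalescer and returns a small number of batches rather than 8000 separate deliveries. A real version would need a timer to flush a quiet window, which this sketch leaves out by requiring an explicit flush() at the end.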

Anyway, I just wanted to relay some good news, since I don't think many of the people on this list actually run bproc systems, so you might not get to see it first hand. :)

Thanks go out to the OMPI guys for getting me the correct code to make this possible.

--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------


