[ptp-dev] Notes/Comments

Does anyone know why the CVS commit lists don't say WHO made the commit? Even just their user name on the server would do. I'd like to know these things! :)

I just committed a pile of code that lets you switch your monitoring system from the simulated one over to the ORTE/OMPI one. In the preferences, when you select one and click 'OK', the ModelManager is told to swap out the universe / model it has cached, and the UI refreshes.
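To make the switch-over concrete, here is a minimal sketch of the idea, not the committed code. Every name in it (MonitoringSystem, SketchModelManager, ModelListener, the machine names) is hypothetical and made up for illustration; the real ModelManager and the ORTE binding will look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical interface: NOT the actual PTP API.
interface MonitoringSystem {
    String name();
    List<String> listMachines();
}

// Stand-in for the simulated monitoring system; machine names are invented.
class SimulatedMonitoringSystem implements MonitoringSystem {
    public String name() { return "Simulated"; }
    public List<String> listMachines() {
        return Arrays.asList("sim-machine-0", "sim-machine-1");
    }
}

// Stand-in for the ORTE/OMPI monitoring system. A real one would query
// the runtime through the JNI layer; this stub just reports nothing.
class OrteMonitoringSystem implements MonitoringSystem {
    public String name() { return "ORTE"; }
    public List<String> listMachines() { return Collections.emptyList(); }
}

// Hypothetical callback the UI would implement to refresh itself.
interface ModelListener {
    void modelChanged(MonitoringSystem current);
}

// Hypothetical manager: holds the active monitoring system plus a cached model.
class SketchModelManager {
    private MonitoringSystem current;
    private List<String> cachedMachines;
    private final List<ModelListener> listeners = new ArrayList<>();

    SketchModelManager(MonitoringSystem initial) {
        this.current = initial;
        this.cachedMachines = new ArrayList<>(initial.listMachines());
    }

    void addListener(ModelListener listener) { listeners.add(listener); }

    // Called when the user picks a different system in the preferences:
    // drop the cached model, repopulate it, then notify listeners.
    void setMonitoringSystem(MonitoringSystem next) {
        this.current = next;
        this.cachedMachines = new ArrayList<>(next.listMachines());
        for (ModelListener listener : listeners) {
            listener.modelChanged(next);
        }
    }

    MonitoringSystem getMonitoringSystem() { return current; }
    List<String> getCachedMachines() { return cachedMachines; }
}
```

In this sketch the preferences page would only call setMonitoringSystem(...), and anything that displays the model registers a ModelListener so it knows when to refresh.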

There's a small bug right now: if you go FROM the simulation TO ORTE, it bombs out. This is because the simulation simulates running processes / jobs, and its threads aren't told to shut down cleanly. I'm going to fix this (by removing those threads, as Greg and I discussed today), so in the meantime just be sure to go from ORTE TO Simulated if you feel like testing / playing.
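For reference only, since the planned fix is to remove the threads altogether: a minimal sketch of what "told to shut down cleanly" could look like for a simulated-job thread. The class SimulatedJobRunner and its methods are hypothetical and do not exist in the code base.

```java
// Hypothetical class: illustrates a cooperative shutdown, not PTP code.
class SimulatedJobRunner {
    // volatile so the worker thread sees the flag change promptly
    private volatile boolean running = true;
    private final Thread worker;

    SimulatedJobRunner() {
        worker = new Thread(() -> {
            while (running) {
                try {
                    // Pretend to simulate a bit of job activity.
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    // Interrupted during sleep: restore the flag and exit.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }, "simulated-job");
        // Daemon thread, so a missed shutdown cannot keep the JVM alive.
        worker.setDaemon(true);
    }

    void start() { worker.start(); }

    boolean isAlive() { return worker.isAlive(); }

    // Ask the worker to stop, wake it if it is sleeping, and wait up to
    // one second for it to finish. Returns true if the thread has ended.
    boolean shutdown() {
        running = false;
        worker.interrupt();
        try {
            worker.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

The point is only that whoever swaps out the simulation would call shutdown() on each runner before dropping the old model.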

I still need to fix the preferences page, which misbehaves the first time you load it (when no default preferences exist). I'll be working on that, and on passing the runtime arguments down to the runtime system layer (the JNI layer), over the next few days.

FYI for those who weren't involved: we've decided to drop the generic term 'runtime environment' or 'runtime system' and replace it with two systems: a state-of-health (SoH) monitoring system paired with a control system. The control system is responsible for starting jobs, stopping jobs, etc. The SoH monitoring system (or just monitoring system) is responsible for determining the status of the universe as you see it: What machines are there? How many nodes are on each machine? What jobs are running? Who owns them? And so on.

These two components were previously linked into a single 'runtime system' but are now being broken out. This lets us set the monitoring system through the preferences and refresh / populate the runtime model without requiring the user to choose, at that point, how they want to actually control jobs. While a system like OMPI will (in time) have both the control and monitoring systems interlinked, a system like MPICH might use MPICH for job control but another system for monitoring (perhaps Supermon or Ganglia). Things to think about.
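A rough sketch of the split, to show how the two halves can be chosen independently. All names here (SohMonitor, JobControl, FakeCluster, RuntimePairing, and the stub classes) are hypothetical, and the stubs talk to an in-memory fake rather than to MPICH, Supermon, Ganglia, or OMPI.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for the machines/jobs that both systems look at.
class FakeCluster {
    final List<String> jobs = new ArrayList<>();
}

// Hypothetical monitoring side: answers "what is out there right now?"
interface SohMonitor {
    List<String> runningJobs();
}

// Hypothetical control side: starts and stops jobs.
interface JobControl {
    void startJob(String name);
    void stopJob(String name);
}

// Stub monitor (imagine something Ganglia-like behind it).
class StubMonitor implements SohMonitor {
    private final FakeCluster cluster;
    StubMonitor(FakeCluster cluster) { this.cluster = cluster; }
    public List<String> runningJobs() { return new ArrayList<>(cluster.jobs); }
}

// Stub control (imagine something MPICH-like behind it).
class StubControl implements JobControl {
    private final FakeCluster cluster;
    StubControl(FakeCluster cluster) { this.cluster = cluster; }
    public void startJob(String name) { cluster.jobs.add(name); }
    public void stopJob(String name) { cluster.jobs.remove(name); }
}

// Pairs the two systems. The monitor is required up front so the model can
// be populated; the control system may be chosen later, or never.
class RuntimePairing {
    private final SohMonitor monitor;
    private JobControl control; // null until the user picks one

    RuntimePairing(SohMonitor monitor) { this.monitor = monitor; }

    void setControl(JobControl control) { this.control = control; }
    boolean hasControl() { return control != null; }
    SohMonitor monitor() { return monitor; }

    void startJob(String name) {
        if (control == null) {
            throw new IllegalStateException("no control system selected");
        }
        control.startJob(name);
    }
}
```

With this shape, the model can be populated from monitor() as soon as a monitoring system is set in the preferences, and startJob(...) only becomes usable once a control system has been picked.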

Just wanted to drop an update.

--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------


