[news.eclipse.technology.cosmos] Re: COSMOS for high-performance computing

Hi Randal,

There is currently an enhancement open to integrate Nagios with the COSMOS 
framework:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=188390

The purpose of the enhancement is to provide a comprehensive solution for 
easily monitoring a set of hosts and services.  A design document that spells 
out the requirements and implementation details is underway.  Assuming the 
enhancement is approved, there should be something tangible by the end of 
the first COSMOS release.  Feel free to add yourself to the CC list of the 
enhancement.  Help and guidance from the community are always appreciated.

As for a system model, COSMOS currently represents resources with a 
project-specific model based on the Service Modeling Language (SML).  The 
model was developed to illustrate the COSMOS framework and will likely change 
as the project evolves; it is not yet ready for production use by other 
vendors.  The plan is to eventually derive the model from the Common Model 
Library (CML), which is still under development.

A lot of the work currently going on in COSMOS revolves around the CMDBf 
specification.  Here are some links that you may find useful:

1) Read about CMDBf here: http://cmdbf.org/
2) COSMOS commitment to CMDBf: 
http://wiki.eclipse.org/images/b/b3/Cmdbf-cosmos-deliverables-v0.07.zip
3) Providing a CMDBf Query and Registration Service: 
http://wiki.eclipse.org/Providing_a_CMDBf_Query_and_Registration_Service
4) COSMOS Programming Model: 
http://wiki.eclipse.org/COSMOS_Programming_Model

Many design documents, related links, and architecture meeting minutes are 
also available here:
http://wiki.eclipse.org/COSMOS_Architecture_Meetings
You'll need to do some digging to make sense of what the architecture 
meeting minutes contain.

I hope that helps.
Thanks,

Ali Mehregani


"Randal Rheinheimer" <randal@xxxxxxxx> wrote in message 
news:f29cb8a646b44a77d691574d69de63b7$1@xxxxxxxxxxxxxxxxxx
> I lead a project at Los Alamos National Laboratory charged with revamping 
> (replacing) our current monitoring infrastructure for HPC systems.  Our 
> environment is several 1000-10000 node Linux clusters, and our definition 
> of monitoring is real-time alerting, system event investigation, and 
> regular reporting of system interrupts in some detail.
>
> Our requirements documentation identifies several concepts in common with 
> the COSMOS project--the importance of a system model, for instance. 
> However, we're having trouble pulling out the details from current 
> documentation, and the June, 2008, general release date is problematic for
> us.
>
> We're currently talking with GroundWork and Zenoss (only one of whom seems 
> to be involved with COSMOS) about our extension of one of their 
> infrastructures to meet our needs.  Is COSMOS release 0.4 something we 
> should consider as a basis for a project that needs to provide software 
> used in a production HPC environment, or should we just not spend the 
> time?
>
> Regardless of the answer to that question, what is the proper mechanism 
> for understanding the core COSMOS principles (other than what we glean 
> from the eclipse site)?  For instance, is an HPC environment an eventual 
> potential target, or is the focus on networks and application servers? 
> There seem to be some biases (each piece of data is atomically relevant, 
> with little room for higher-level correlations, for instance) in all the 
> products/infrastructures we've surveyed, and I personally would like to 
> understand whether the biases are real or if we just misunderstand some 
> underlying concepts.
>
> Thanks,
> Rand
>