Erwin,
Sorry for taking so long to get back to you. PTP is a workbench for developing C/C++ and Fortran parallel (MPI and OpenMP) applications. In addition to the normal development activities, such as coding, debugging, etc., we also provide a module to launch and monitor applications on a remote target system from within the workbench.
The launching capability uses a framework that supports arbitrary resource managers/job schedulers. Support for resource managers can be added by customizing an XML description file. Users submit their jobs via the normal Eclipse launch configuration mechanism that has been tailored to suit the target system resource manager. Support is currently available for PBS/Torque, LSF, LoadL, GE, SLURM, interactive Open MPI, interactive MPICH2, and generic remote launch.
The monitoring framework provides the ability to monitor job queue and system information from within the workbench. It provides customizable views that show system and job information, and supports a wide variety of target resource managers. It is easily customizable to support other systems.
We did look at using DRMAA a few years ago, but decided to “roll our own” as support was very limited at the time. In addition, by using a simple framework we are able to provide a very easy out-of-the-box experience for users, as no additional installation or configuration is necessary to begin submitting and monitoring jobs.
Currently both the launching and monitoring frameworks are loosely integrated with PTP. It would be fairly easy to extract them as separate libraries that other tools could use if that would be of interest. I think it would also be fairly easy to add DRMAA support, but I’m not sure what the advantage of doing that would be.
I’d be happy to discuss this more and explore options for combining the two projects.
Regards, Greg
Hi Greg,
With Diamond LS we've built a Java-based API to SGE via DRMAA,
packaged as OSGi bundles.
DRMAA seemed to be the cleanest approach to have a standardized
API on different resource management systems.
(but most implementations seem to be C-based :-( + a JNI binding
for some of them)
Besides being able to submit jobs etc. from an application running
on a grid node (which is the default for a DRMAA/SGE
implementation), we've added a JMX-based remote-submission
"gateway" (provider+consumer) and a REST-based service-wrapper on
it.
The goal is to be able to submit jobs from automated
processes/workflows. A typical use case would be to slice
scientific data sets in chunks, submit them all as separate jobs
where each job runs a data analysis workflow, wait for the results
and then merge them etc.
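As a rough illustration of that slice/submit/wait/merge pattern (this is not the dawn-hpc or DRMAA API): the sketch below uses Python's standard thread pool as a local stand-in for a grid session, and the names `slice_into_chunks`, `analyse_chunk` and `run_sliced_analysis` are hypothetical. A real setup would submit each chunk to the resource manager instead.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_into_chunks(data, chunk_size):
    """Split a data set into consecutive chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def analyse_chunk(chunk):
    """Hypothetical per-job analysis workflow; here it only sums the chunk."""
    return sum(chunk)


def run_sliced_analysis(data, chunk_size, max_parallel_jobs=4):
    """Slice the data, submit one job per chunk, wait for all, merge results."""
    chunks = slice_into_chunks(data, chunk_size)
    # Assumption: the thread pool stands in for a grid/DRMAA session.
    with ThreadPoolExecutor(max_workers=max_parallel_jobs) as pool:
        # Submit every chunk as a separate "job".
        futures = [pool.submit(analyse_chunk, chunk) for chunk in chunks]
        # Block until each job has finished and collect its partial result.
        partial_results = [future.result() for future in futures]
    # Merge step: for this toy analysis, merging is just another sum.
    return sum(partial_results)
```

Calling `run_sliced_analysis(list(range(10)), 4)` would slice the ten values into three chunks and merge the three partial sums.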
If you want to dig a bit deeper, you can review the available code
at https://github.com/DawnScience/dawn-hpc
SLURM-integration could become relevant at another site. I would
negotiate with them if they're willing to deliver the code in
open-source, and then we would develop it as a DRMAA
implementation and offer it here. As far as I know there is no
Java-based SLURM DRMAA "binding" available yet. I did find a
C-based one in Poland...
If I understand the PTP docs correctly, PTP is oriented towards
C- and Fortran-based jobs and the configuration of resource
management systems in a workbench. Is that correct?
It would indeed be great if PTP already offers resource management
and job submission services and we could find a way to integrate
them behind a Java/DRMAA facade!
Would the above be of interest?
regards
erwin
Greg Watson wrote on 21/04/2015 at 16:57:
What sort of APIs are you proposing for accessing
computing grids, e.g. Java APIs or Eclipse APIs? PTP already has
a target system configuration framework for controlling,
launching, and monitoring resource management systems like job
schedulers. Are you thinking of something along those lines? PTP
already supports SLURM and SGE (as well as PBS, Torque, LSF,
etc.). If you wanted to open up the APIs or make the
implementation more portable, I’m sure we’d be interested in
that.
Regards,
Greg
Dear all,
I would like to propose 2 new projects :
- a set of APIs and impls in the domain of HPC, computing
grids etc
- a move of our Passerelle process engine from eclipselabs
to a formal eclipse project
Would this be of interest for science IWG? If so, read on
;-)...
Any feedback is of course welcome!
kind regards,
erwin
More info :
=========
1. in the HPC domain :
- APIs and impls for accessing computing grids (cfr DRMAA,
SGE, SLURM, ...)
- other clustering-related tools
- memory grids
- etc
This would contain an initial code drop from the DAWN
repos at github with just the DRMAA and SGE grid access.
Matt Gerring (DAWN lead) supports this move.
In the near future we would be extending this to a
DRMAA-implementation for SLURM.
Another next task is upgrading from DRMAA v1 to v2.
Other topics are less/not concrete right now, and would
depend on requests from science IWG or from additional
committers.
Open questions :
- What would be a good name? Can we claim a generic name
like "science HPC" or so? (and then hope that there's
sufficient participation to enlarge the scope to other
HPC-related tools)
- Where to put this? Is this a sufficient scope for a new
eclipse project? Or should it be a component of a parent
project?
- I guess this would become a technology project?
2. a move of our Passerelle workflow engine & workbench from the
current eclipselabs@Google hosting to a formal eclipse project
The initial code drop would include a minimized Passerelle
core, built on top of a new OSGi-ified version of Ptolemy
(the underlying actor-based hybrid modeling software of UC
Berkeley)
(Ptolemy is and would remain hosted by UC Berkeley, so
their sources would not move to eclipse)
The current GEF-based Passerelle model editor would no longer be
maintained, and would be replaced by an EMF & Graphiti-based one
that would become the future Ptolemy model editor.
(i.e. the editor would no longer be specific for
Passerelle)
For the longer term we would be extending Passerelle
towards the needs of "reproducible science" (cf. http://www.reproduciblescience.org).
Open questions/issues :
- as Passerelle is the basis for our production software,
with frequent releases, we can't be blocked for too long
in an incubation phase (which seems to prevent formal
releases).
- would this be a technology or a tools project?
--
Met vriendelijke groeten - Bien à vous - Kind regards
Erwin De Ley
_______________________________________________
science-iwg mailing list
science-iwg@xxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/science-iwg