Re: [science-iwg] SWG code for DRMAA

Hi Greg,

Your remark about additional installation steps is indeed true, at least for the implementations that use native libraries.

But the API leaves room for other implementation approaches, e.g. ones based on shell commands, that would not depend on native libraries.
That is the approach we've followed in our (operational but not feature-complete) SLURM implementation.
But this implies generating and parsing command-line strings, and we've noticed that these are not always easy to discover, or to maintain across versions etc.
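For illustration, the fragile part of such a shell-based approach might look roughly like this (a simplified sketch; the class and method names are made up for this example, and the "Submitted batch job <id>" output format shown is the common sbatch default but can vary across versions and configurations):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch: submit a job to SLURM by shelling out to sbatch. */
public class SlurmShellSubmitter {

    // sbatch typically prints "Submitted batch job <id>" on success.
    private static final Pattern JOB_ID = Pattern.compile("Submitted batch job (\\d+)");

    /** Extract the numeric job id from sbatch's stdout, or -1 if not found. */
    public static long parseJobId(String sbatchOutput) {
        Matcher m = JOB_ID.matcher(sbatchOutput);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    /** Build and run the sbatch command line, returning the new job id. */
    public static long submit(String script, String partition) throws Exception {
        Process p = new ProcessBuilder("sbatch", "--partition=" + partition, script)
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        if (p.waitFor() != 0) {
            throw new RuntimeException("sbatch failed: " + out);
        }
        return parseJobId(out.toString());
    }

    public static void main(String[] args) {
        System.out.println(parseJobId("Submitted batch job 4242")); // prints 4242
    }
}
```

The brittleness described above lives in exactly two places: the flags handed to ProcessBuilder and the regular expression over the tool's output, neither of which is a stable API across scheduler versions.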

Would it be an option to use parts of your work in PTP as a generic implementation framework/approach behind a DRMAA API?
That would seem like the best of both worlds: reusing proven components, offering a standard API for generic reusability, and taking advantage of OSGi services to provide dynamic pluggability!

Cheers
erwin

-----Original Message-----
From: science-iwg-bounces@xxxxxxxxxxx [mailto:science-iwg-bounces@xxxxxxxxxxx] On Behalf Of Greg Watson
Sent: Wednesday, February 22, 2017 4:58 PM
To: Science Industry Working Group <science-iwg@xxxxxxxxxxx>
Subject: Re: [science-iwg] SWG code for DRMAA

Yes, by all means let's schedule a meeting.

The problem with DRMAA is that it requires additional installation steps by the system administrators and has a different implementation for each type of job scheduler. This can be very problematic in production HPC environments due to security concerns as well as other factors. It is (or at least was) also limited to a subset of the functionality provided by the job schedulers. The approach we chose was to provide a mechanism for interacting with arbitrary job schedulers without requiring any changes to the target system, and with the ability to use any features they provide. It also gives us the most flexibility in terms of designing the system. 

Regards,
Greg

> On Feb 22, 2017, at 7:50 AM, Torkild U. Resheim <torkildr@xxxxxxxxx> wrote:
> 
> Thanks Erwin,
> 
> It appears that DRMAA is exactly what we need for the service component mentioned earlier, as it has only a few basic requirements. The service will be running on one of the cluster nodes; it will only need to create new jobs and handle their state.
> 
> Shall I create a Doodle for next week?
> 
> Best regards,
> Torkild
> 
>> 19. feb. 2017 kl. 15.20 skrev Erwin De Ley <erwin.de.ley@xxxxxxxxxx>:
>> 
>> I think it would indeed be good to have a call about this!
>> 
>> Personally I believe DRMAA is one of the possible correct approaches. It’s quite simple to grasp, and the ideas and library are easy to integrate.
>> It’s basically a standardized API that can be implemented for your grid/cluster/resource manager, and it already has quite some popularity and implementations outside of the Eclipse community (cf. https://www.drmaa.org/).
>> 
>> Son of Grid Engine is one of the supported systems, but there are several others.
>> I.e. DRMAA is not part of SGE, but the open-source API code I found at the time came from them, that’s all.
>> And Diamond was/is using SGE. For Soleil we built a SLURM implementation (but that’s not fully featured yet).
>> 
>> I’ve worked with v1, which covers job submission and state management, but not the actual management or monitoring of the grid itself. That has been added in the v2 standard, which I haven’t used yet.
>> 
>> One of the main differentiators from PTP (but I may be totally wrong here) is that DRMAA is just an API with pluggable implementations.
>> If you want ready-to-use GUIs for grid management, job submission, tracking etc.: that’s not part of the DRMAA story.
>> 
>> But if you want a standardized API that’s easy to integrate for automated processing and/or into your new/custom GUIs, I think it’s a very good option. For example, I built an implementation on a Java ExecutorService that can be plugged in for simple local testing.
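>> For illustration, such an ExecutorService-backed local implementation could be sketched roughly like this (a simplified sketch, not the actual code; the DRMAA Session interface is reduced here to two made-up methods mirroring runJob and jobStatus):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative local "DRMAA-like" session backed by an ExecutorService. */
public class LocalSession {

    public enum State { RUNNING, DONE, FAILED }

    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong counter = new AtomicLong();

    /** Submit a job; returns an opaque job id, as DRMAA's runJob does. */
    public String runJob(Runnable job) {
        String id = "local-" + counter.incrementAndGet();
        jobs.put(id, pool.submit(job));
        return id;
    }

    /** Report a coarse job state, mirroring DRMAA's job-status idea. */
    public State jobStatus(String id) {
        Future<?> f = jobs.get(id);
        if (f == null || !f.isDone()) return State.RUNNING;
        try {
            f.get();              // re-throws if the job body threw
            return State.DONE;
        } catch (Exception e) {
            return State.FAILED;
        }
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        LocalSession s = new LocalSession();
        String id = s.runJob(() -> System.out.println("job running"));
        while (s.jobStatus(id) == State.RUNNING) Thread.yield();
        System.out.println(id + " -> " + s.jobStatus(id));
        s.shutdown();
    }
}
```

>> The value of such a stub is that code written against the session abstraction can be tested on a laptop, then switched to a real cluster-backed implementation without changes.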
>> 
>> Some gotchas :
>> 	• DRMAA implementations often depend on native interfaces to C-based implementations.
>> 	• In fact the formal DRMAA standard focuses on C; Java APIs are “recommended”. For v2, the DRMAA group told me a while ago that they could support/collaborate on having the formal Java API defined/implemented with the Eclipse Science WG.
>> 	• DRMAA does not offer “remote” access, i.e. by default the implementation must run on a node in the grid that has the native resource manager installed. For Diamond I built a JMX- and REST-based remote access gateway to resolve that.
>> 	• As Greg mentions, there are many different resource managers out there. And for each installation there are still “local differences”, like the available submission queues, the system resources (e.g. CPU cores/power, memory, installed JDKs) and how they are documented in the resource manager, priority-queue handling, etc.
>> A bit like JDBC and SQL, DRMAA gives you standards for the most common features and needs. If you need to pass custom settings, this is possible via so-called “native specification” properties.
>> Overall I had the impression this approach plays nicely with separating responsibilities between grid admins, framework developers, end-user UIs, etc.
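>> To make the JDBC analogy concrete, a job template with such a “native specification” pass-through might look like this (SimpleJobTemplate is made up for illustration; the real DRMAA Java JobTemplate exposes a similar setNativeSpecification setter):

```java
/**
 * Illustrative job template: standard, portable attributes plus a
 * "native specification" escape hatch for scheduler-specific flags.
 */
public class SimpleJobTemplate {

    private String remoteCommand;            // standard: the executable to run
    private String[] args = new String[0];   // standard: its arguments
    private String nativeSpecification = ""; // raw, scheduler-specific flags

    public void setRemoteCommand(String cmd) { this.remoteCommand = cmd; }

    public void setArgs(String[] args) { this.args = args; }

    /**
     * Anything the standard API does not model is passed through verbatim,
     * e.g. "-q long.q -l h_vmem=4G" for Grid Engine.
     */
    public void setNativeSpecification(String spec) { this.nativeSpecification = spec; }

    /** For illustration: how a shell-based implementation might assemble the call. */
    public String toCommandLine(String submitCommand) {
        StringBuilder sb = new StringBuilder(submitCommand);
        if (!nativeSpecification.isEmpty()) sb.append(' ').append(nativeSpecification);
        sb.append(' ').append(remoteCommand);
        for (String a : args) sb.append(' ').append(a);
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleJobTemplate jt = new SimpleJobTemplate();
        jt.setRemoteCommand("/bin/sleep");
        jt.setArgs(new String[] { "60" });
        jt.setNativeSpecification("-q long.q -l h_vmem=4G");
        System.out.println(jt.toCommandLine("qsub")); // prints: qsub -q long.q -l h_vmem=4G /bin/sleep 60
    }
}
```

>> The point is the split: the standard attributes stay portable across resource managers, while anything site- or scheduler-specific travels verbatim in the native specification, much like vendor-specific SQL fragments in JDBC.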
>> 
>> What I would like to see happening, or at least being investigated :
>> 	• support DRMAA as a standard API. (The goal of Triquetrum is to provide this and some implementations.)
>> 	• use PTP as management workbench using its existing features
>> 	• and more tentative/future :
>> 		• use the current usecase as a driver to integrate DRMAA in PTP?
>> 		• investigate DRMAA v2. This is more long-term, but is also on the roadmap of Triquetrum.
>> The DRMAA group expressed their interest a while ago in collaborating on this with the Eclipse Science group.
>> 
>> Regards
>> erwin
>> 
>> From: science-iwg-bounces@xxxxxxxxxxx [mailto:science-iwg-bounces@xxxxxxxxxxx] On Behalf Of Jay Jay Billings
>> Sent: Friday, February 17, 2017 9:18 PM
>> To: Science Industry Working Group <science-iwg@xxxxxxxxxxx>
>> Subject: Re: [science-iwg] SWG code for DRMAA
>> 
>> I just want to throw in here that ICE does all of this through PTP and it works great. The reasons that Greg mentioned are why we went that route a few years back. 
>> 
>> We (where we = me nagging & Greg coding) just developed a completely Java-based proxy for PTP that handles all communications to and from the remote machines, in order to deal with special security requirements. It could in theory be used to replace the Perl monitoring scripts, and we're already looking at how it can stream visualization data back to EAVP. It might even be possible to execute the Perl scripts in the JVM as an interim solution.
>> 
>> Jm2c,
>> Jay
>> 
>> On Feb 17, 2017 13:15, "Greg Watson" <g.watson@xxxxxxxxxxxx> wrote:
>> Torkild,
>> 
>> My bad... we were supposed to have a meeting to discuss this but it fell off the pile.
>> 
>> I think the issue here is that you're going to need something that is able to handle all the different configurations of Torque, Slurm, GE, etc. that are out there. This is non-trivial, and not just a case of running a few commands. I'm not sure how Son of Grid Engine works in this regard, but presumably it doesn't support other schedulers in any case.
>> 
>> My suggestion would be to have some pre-configured submission scripts for the different job schedulers, and use the PTP infrastructure to do the submission for you. There's an API for doing this, and I'd be happy to make changes if something is missing. There's also already the capability to monitor for job completion, so you can close the Eclipse session and return later, although this is one area you might want to improve on. It is also already compatible with the Remote Services layer, so you get remote connections as part of the deal.
>> 
>> The Perl dependencies are only for the monitoring component, which is independent of the job submission engine, so the two can easily be separated.
>> 
>> I'd be happy to arrange a meeting to discuss in more detail...
>> 
>> Regards
>> Greg
>> 
>>> On Feb 17, 2017, at 12:01 PM, Torkild U. Resheim <torkildr@xxxxxxxxx> wrote:
>>> 
>>> Hi all,
>>> 
>>> Erwin, Greg and I were briefly discussing earlier how to handle compute clusters within the SWG. I’m working on a project that requires my code to work on both Torque- and Slurm-based clusters, which obviously made me want to do some abstraction. PTP looks interesting, but I think we can and should utilize that code without change, mostly for monitoring for advanced users. What I want is something without the Perl dependencies that can be utilized by our users without them having to think about clusters. They just want to run a workflow that will take a couple of days on a 144-CPU EC2 cluster that the application will automatically provision for them – and get a message when it’s done. Also there are breakpoints and error handling that must be taken care of.
>>> 
>>> So I think the code based on Son of Grid Engine may be part of the solution. I haven’t investigated it yet, and it may be that I must make some adjustments to that code. Has anyone looked into this? If possible, should we adopt this code so that we can develop it further?
>>> 
>>> I do have a budget and can probably spend 2-3 weeks on something we can share. If we include the provisioning mechanism I can spend more time. I need to do that regardless. Anyone up for a short meeting to discuss this?
>>> 
>>> Best regards,
>>> Torkild
>>> 
>>> PS: If I remember correctly the AWS code from Amazon is BSD licensed.
>>> 
>>> _______________________________________________
>>> science-iwg mailing list
>>> science-iwg@xxxxxxxxxxx
>>> To change your delivery options, retrieve your password, or 
>>> unsubscribe from this list, visit 
>>> https://dev.eclipse.org/mailman/listinfo/science-iwg
>> 
> 

