
Re: [science-iwg] SWG code for DRMAA

No problem Greg,

Actually, we don’t have to handle a lot of different configurations, only three specific configurations of Torque and one of Slurm. One of the Torque configurations will be of our own making, as we’ll use Amazon Elastic Compute Cloud (EC2) to create clusters on demand based on Docker images.

On a cluster we will have a service component that will create jobs as required, depending on the outcome of preceding jobs. This will have a REST API and will be connected to the user application. So it’s on this end that we will have to use or create something that talks to the cluster. I’ll have a closer look at both PTP and Son of Grid Engine to see which fits the bill with the least effort.
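A minimal sketch of how such a service component could hide the Torque/Slurm difference, assuming Python and a hypothetical `Scheduler` abstraction (this is neither PTP's API nor Son of Grid Engine's). Each backend renders a pre-configured submission script and builds the submit command. A follow-up job is made to depend on a preceding job's successful completion through each scheduler's own `afterok` dependency option (`qsub -W depend=afterok:ID` for Torque, `sbatch --dependency=afterok:ID` for Slurm). Site-specific settings such as queues, accounts and modules are left out and would differ between the actual cluster configurations.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSpec:
    """Hypothetical, scheduler-neutral description of one workflow step."""
    name: str
    nodes: int
    cpus_per_node: int
    walltime: str  # "HH:MM:SS"
    command: str


class Scheduler:
    """Hypothetical base class: one subclass per batch system."""

    def script(self, job: JobSpec) -> str:
        raise NotImplementedError

    def submit_args(self, script_path: str, after_ok: Optional[str] = None) -> list[str]:
        raise NotImplementedError


class Torque(Scheduler):
    def script(self, job: JobSpec) -> str:
        # Torque/PBS directives for job name, node/ppn layout and walltime.
        return "\n".join([
            "#!/bin/bash",
            f"#PBS -N {job.name}",
            f"#PBS -l nodes={job.nodes}:ppn={job.cpus_per_node}",
            f"#PBS -l walltime={job.walltime}",
            job.command,
            "",
        ])

    def submit_args(self, script_path: str, after_ok: Optional[str] = None) -> list[str]:
        args = ["qsub"]
        if after_ok:
            # Start only if the preceding job finished successfully.
            args += ["-W", f"depend=afterok:{after_ok}"]
        return args + [script_path]


class Slurm(Scheduler):
    def script(self, job: JobSpec) -> str:
        # Equivalent Slurm directives.
        return "\n".join([
            "#!/bin/bash",
            f"#SBATCH --job-name={job.name}",
            f"#SBATCH --nodes={job.nodes}",
            f"#SBATCH --ntasks-per-node={job.cpus_per_node}",
            f"#SBATCH --time={job.walltime}",
            job.command,
            "",
        ])

    def submit_args(self, script_path: str, after_ok: Optional[str] = None) -> list[str]:
        # --parsable makes sbatch print only the job ID (optionally ";cluster").
        args = ["sbatch", "--parsable"]
        if after_ok:
            args.append(f"--dependency=afterok:{after_ok}")
        return args + [script_path]


def submit(scheduler: Scheduler, job: JobSpec, script_path: str,
           after_ok: Optional[str] = None) -> str:
    """Write the script, submit it and return the scheduler's job ID."""
    with open(script_path, "w") as f:
        f.write(scheduler.script(job))
    result = subprocess.run(scheduler.submit_args(script_path, after_ok),
                            capture_output=True, text=True, check=True)
    # Keep only the ID part of Slurm's "id;cluster" output; Torque's is unchanged.
    return result.stdout.strip().split(";")[0]
```

In use, the service would call `first = submit(backend, step1, "step1.sh")` and then `submit(backend, step2, "step2.sh", after_ok=first)`, with `backend` chosen per cluster configuration. Keeping the templates per backend follows the "pre-configured submission scripts" idea without tying the caller to either scheduler.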

Shall we set up a Doodle poll? Next week looks mostly free on my part.

Best regards,
Torkild

> On 17 Feb 2017, at 19:14, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
> 
> Torkild,
> 
> My bad... we were supposed to have a meeting to discuss this but it fell off the pile.
> 
> I think the issue here is that you're going to need something that is able to handle all the different configurations of Torque, Slurm, GE, etc. that are out there. This is non-trivial, and not just a case of running a few commands. I'm not sure how Son of Grid Engine works in this regard, but presumably it doesn't support other schedulers in any case.
> 
> My suggestion would be to have some pre-configured submission scripts for the different job schedulers, and use the PTP infrastructure to do the submission for you. There's an API for doing this, and I'd be happy to make changes if there's something missing. There's also already the capability to monitor for job completion, so you can close the Eclipse session and return later, although this is one area you might want to improve on. It is also already compatible with the Remote Services layer, so you get remote connections as part of the deal.
> 
> The Perl dependencies are only for the monitoring component, which is independent of the job submission engine, so the two can be easily separated.
> 
> I'd be happy to arrange a meeting to discuss in more detail...
> 
> Regards
> Greg
> 
>> On Feb 17, 2017, at 12:01 PM, Torkild U. Resheim <torkildr@xxxxxxxxx> wrote:
>> 
>> Hi all,
>> 
>> Erwin, Greg and I were briefly discussing earlier how to handle compute clusters within the SWG. I’m working on a project that requires my code to work on both Torque- and Slurm-based clusters, which obviously made me want to do some abstraction. PTP looks interesting, but I think we can and should use that code without change, mostly for monitoring by advanced users. What I want is something without the Perl dependencies that our users can use without having to think about clusters. They just want to run a workflow that will take a couple of days on a 144-CPU EC2 cluster that the application will automatically provision for them, and get a message when it’s done. There are also breakpoints and error handling that must be taken care of.
>> 
>> So I think the code based on Son of Grid Engine may be part of the solution. I’ve not investigated it yet, and I may need to make some adjustments to that code. Has anyone looked into this? If possible, should we adopt this code so that we can develop it further?
>> 
>> I do have a budget and can probably spend 2-3 weeks on something we can share. If we include the provisioning mechanism I can spend more time. I need to do that regardless. Anyone up for a short meeting to discuss this? 
>> 
>> Best regards,
>> Torkild
>> 
>> PS: If I remember correctly the AWS code from Amazon is BSD licensed.
>> 
>> _______________________________________________
>> science-iwg mailing list
>> science-iwg@xxxxxxxxxxx
>> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
>> https://dev.eclipse.org/mailman/listinfo/science-iwg
> 
