[ptp-dev] MPI Runtime Environment Commits/Updates

I just committed some code for the MPI Runtime Environment layer. When linked with Open MPI (OMPI), this code lets you spawn a parallel job from within PTP/Eclipse, and you get status messages as the job runs.

There are some very big caveats currently - some are my problems, others aren't.

  1. First, you must already have started the Open Runtime
     Environment Daemon (ORTEd) from a console.  I'm going to
     change this so that a file selector somewhere in the UI
     lets you pick the daemon and PTP spawns it for you.
  2. The job it spawns is hard coded.  Yes, I know this is stupid,
     but it was for testing purposes.  Right now it looks for a
     file in a directory that I have.  If you want to start tinkering
     with this code, you have to change this hard-coded String.
     Obviously I'll quickly change this to pull the information
     from the Run... box.
  3. The ORTEd has a lot of locking issues due to it not being
     thread-safe.  Right now I'm doing the locks in JNI-C because of
     another issue that the OMPI team is aware of.  In the future I'll
     move these locks out of C and into Java where they're easier and
     more portable - in Java we'll have more control over them as well.
  4. Currently the ORTEd doesn't like it when one thread sleeps in its
     progress function while another thread tries to start a job.  This
     is a thread-safety issue and, while there are multithreaded
     versions of OMPI, those apparently have other bugs that are
     problems for us.  Our workaround is to keep the progress function
     from truly sleeping (a true sleep is implemented as a blocking
     select(), as I understand it).  Instead we do a nonblocking
     select() and then put the thread to sleep for 1000 usecs.  This
     results in lightweight polling, which I'm not happy with
     long-term.  It lets work continue for now, and the OMPI team is
     addressing the underlying issue as we speak.
  5. There's also an interesting bug with ORTE/OMPI where the messages
     coming out of your MPI task are repeated the more you run your
     job.  Again, the OMPI team is aware of this now and is trying to
     fix it.  Probably only a few days on this one.
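
For anyone who wants to picture caveats 3 and 4 together, here is a rough Java sketch of the pattern: one lock serializes every call into a runtime that is not thread-safe, and the progress thread polls without blocking and then sleeps for 1 ms (1000 usecs) between polls. This is not the committed code. The class PollingProgressLoop and its methods are hypothetical names invented for illustration, and the "runtime" is just an in-memory queue standing in for the daemon; nothing here calls ORTE, OMPI, or PTP.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only: an in-memory queue stands in for the runtime.
class PollingProgressLoop {
    // One lock guards every call into the (pretend) non-thread-safe runtime.
    private final ReentrantLock runtimeLock = new ReentrantLock();
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> events = new ArrayList<>();
    private volatile boolean running = true;

    // Stand-in for a nonblocking progress call: handles at most one
    // pending item and returns immediately. Caller must hold runtimeLock.
    private boolean progressOnce() {
        String job = pending.poll();
        if (job == null) {
            return false;
        }
        events.add("started " + job);
        return true;
    }

    // Called from any thread (for example the UI thread) to request a job.
    void spawnJob(String name) {
        runtimeLock.lock();
        try {
            pending.add(name);
        } finally {
            runtimeLock.unlock();
        }
    }

    // Lightweight polling: take the lock, poll once, release the lock,
    // then sleep so other threads get a chance to call in.
    void runProgressLoop() throws InterruptedException {
        while (running) {
            runtimeLock.lock();
            try {
                progressOnce();
            } finally {
                runtimeLock.unlock();
            }
            Thread.sleep(1); // 1 ms = 1000 usecs
        }
    }

    void stop() {
        running = false;
    }

    List<String> snapshotEvents() {
        runtimeLock.lock();
        try {
            return new ArrayList<>(events);
        } finally {
            runtimeLock.unlock();
        }
    }

    // Starts the progress thread, spawns one job from the calling thread,
    // waits (up to 5 s) for it to be picked up, then shuts down.
    static List<String> runDemo() {
        PollingProgressLoop rt = new PollingProgressLoop();
        Thread progress = new Thread(() -> {
            try {
                rt.runProgressLoop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        progress.start();
        rt.spawnJob("demo-job");
        long deadline = System.currentTimeMillis() + 5000;
        try {
            while (rt.snapshotEvents().isEmpty()
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(1);
            }
            rt.stop();
            progress.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rt.snapshotEvents();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The point of the sketch is that the lock is never held across the sleep, so a thread calling spawnJob() only ever waits for one short poll. The cost is the one I mentioned above: the loop wakes up every millisecond even when there is nothing to do.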

So, some real progress here. If you want to try this sub-alpha version, feel free. :) You may want to wait a day or two while I tidy it up in the GUI and get rid of this bit of hardcoding.

--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------


