[ptp-dev] Core/Launch/UI Release Strategy Update

Attached is the updated release strategy and feature list. It includes the newer feature of selecting which machine to run your job on.

Today's RC1 for these components will not have a completely functional OMPI runtime layer implementation. Recent bugs in OMPI, which I believe were resolved last night, were preventing me from hooking in a pile of code I've been holding here to test. That code should be in and begin testing very early next week.

--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------

PTP RELEASE STRATEGY FOR THE CORE/RUNTIME COMPONENTS
----------------------------------------------------
FEATURES:
 1: A Parallel Development Perspective comprising a Machines View,
    Jobs View, Process View, Legend, and a Preferences Page.
 2: A Machines View which displays the status of all the machines the
    user knows of.
 3: Dynamically updated status of the nodes of machines as those nodes
    change state (such as 'up', 'down', 'has a job running on it')
 4: A Jobs View which displays all the jobs that were started during
    the current session.  This includes job state and a listing of
    processes comprising the job (including process state).
 5: A Process View which displays the status of a single process (though
    multiple Process Views may be open concurrently).  This includes the
    stdout of the process and has status and exit code fields that are
    updated dynamically as the state of the process changes.
 6: Ability to focus on a machine, node, job, or process and display
    current status of that entity.
 7: A Legend dialog that displays the various icons for nodes and
    processes that represent the states these entities can be in.
 8: A Preferences Page which lets the user specify settings for the
    type of monitoring and control system to use.
 9: A model (series of instantiated data structures) that represents the
    known universe (machines, nodes, jobs, processes).  The model is
    organized hierarchically and each entity contains attributes
    (key/value pairs) that represent additional information about the
    object (such as process state, node ownership, etc.).
10: An interface to external control and monitoring systems (runtime
    system components).
11: Open-MPI control and monitoring system implementations.  These
    interface to Open-RTE through the Java Native Interface (JNI).
12: Simulated control and monitoring systems which exercise the
    runtime system interface, user interface, and allow demonstration
    in environments without other control/monitoring systems.
13: Ability to start a job on a specified machine with a specified
    number of processes.
14: Ability to terminate a running job.
15: Ability to create sets of nodes and processes for ease of viewing.
    Also the ability to delete entries from these sets, add to them, and
    focus on a given set.
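
As an illustration of feature 9 only, here is a minimal Java sketch of a
hierarchical model whose entities carry key/value attributes.  The class
and method names (PtpModelSketch, Element, setAttribute, path) and the
sample attribute values are assumptions made up for this example; they
are not taken from the PTP source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of feature 9; none of these names come from PTP itself.
final class PtpModelSketch {

    /** One entity in the hierarchy: universe, machine, node, job, or process. */
    static final class Element {
        private final String kind;   // e.g. "machine", "node", "job", "process"
        private final String name;
        private final Element parent;
        private final List<Element> children = new ArrayList<>();
        private final Map<String, String> attributes = new LinkedHashMap<>();

        Element(String kind, String name, Element parent) {
            this.kind = kind;
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);   // register with the parent entity
            }
        }

        String kind() { return kind; }
        String name() { return name; }
        Element parent() { return parent; }
        List<Element> children() { return Collections.unmodifiableList(children); }

        /** Attributes are plain key/value pairs, such as state or ownership. */
        void setAttribute(String key, String value) { attributes.put(key, value); }
        String getAttribute(String key) { return attributes.get(key); }

        /** Names from the root down to this entity, joined with '/'. */
        String path() {
            return parent == null ? name : parent.path() + "/" + name;
        }
    }

    public static void main(String[] args) {
        // All names and attribute values below are made up for the example.
        Element universe = new Element("universe", "universe", null);
        Element machine = new Element("machine", "machine0", universe);
        Element node = new Element("node", "node3", machine);
        node.setAttribute("state", "up");
        node.setAttribute("owner", "someuser");

        Element job = new Element("job", "job0", universe);
        Element process = new Element("process", "process0", job);
        process.setAttribute("state", "running");

        System.out.println(node.path() + " state=" + node.getAttribute("state"));
        System.out.println(process.path() + " state=" + process.getAttribute("state"));
    }
}
```

Keeping attributes as string pairs, rather than fixed fields, would let a
runtime system attach information the views do not know about in advance.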

TESTING PLAN:
    SETUP:
      Start on a bproc machine.
      Make sure Open-MPI is set up and working (not part of this test,
      just required to utilize Open-MPI).
      Start with a fresh Eclipse install (including workspace).
      Install PTP.
      Compile the PTP Open-MPI JNI library.
      Acquire some set of nodes for a long period for testing.
      Launch Eclipse and then launch a new Eclipse with the PTP plugins
        running.

 1: Using the Parallel Development Preferences Page select the Simulated
    runtime system.
 2: Using the menu system, open the Machines View and Jobs View.
 3: Confirm that the Jobs View shows no jobs since it is a clean start.
 4: Confirm that the Machines View displays the current state of the
    machines.  Use the drop-down menus to observe other machines that
    are known and confirm they too display the current machine state.
 5: Create a set of nodes using the user interface and name the set.
 6: Confirm that the user can switch between (focus on) the full set of
    nodes for the given machine and the newly created set.
 7: Add a few more nodes to the new set.
 8: Focus on a different machine and confirm that the set is no longer
    visible (since it pertains to the original machine).
 9: Create a new C project.
10: Create a new C-MPI source file in the project.  The source file will
    have each process producing periodic output and run for a few
    minutes (so that the tests can be performed on a running job).
11: Compile the C-MPI application.
12: Create a new Run Configuration for this project, utilizing the
    Parallel Development configuration to specify the number of
    processes for this run and a chosen simulated machine.
13: Run the job (under simulated control).
14: Confirm that the appropriate nodes change state in the Machines
    View to indicate they contain a running job and that the job starts
    on the correct machine.
15: Focus on a node where one of the processes has been assigned.
    Confirm that the Machines View displays the processes on that node,
    including which job the process belongs to.
16: Double-click on one of the processes in the Machines View to bring
    up the Process View.  Confirm that the MPI rank, node number, job
    number, and status are correct.
17: Observe process output in the output section of the Process View.
18: Wait for job to terminate.
19: Observe that the process state and exit code correctly display in
    the Process View, Machines View (for the appropriate node), and Jobs
    View (for the appropriate Job).
20: Bring the Jobs View to the foreground.
21: Confirm the Job previously run and the processes contained within
    it are listed and shown as terminated.
22: Re-run the same job.  Confirm the Jobs View displays the job as
    running and the processes as well.
23: Double-click on a process of the job, opening the Process View.
    Confirm the running state.
24: Terminate the job by using the terminate icon.
25: Confirm the Jobs View updates to show the terminated state.
26: Confirm the Process View updates to show the terminated state,
    including an exit-code.
27: Using the Parallel Development Preferences Page select the Open-MPI
    runtime system.
28: Using the Open MPI Preferences Page under the Parallel Development
    Preferences Page set the path and arguments to the ORTE daemon
    (ORTEd).
29: Run the same job configured in step #12 (under OMPI control).
30: Repeat steps #14 through #26 for this second runtime system
    (Open-MPI as opposed to simulation).
31: Switch back to the Machines View.
32: Using another terminal, change the state of one of the nodes (reboot
    it, change ownership, etc.) and confirm that the node's status
    changes in the Machines View (both the icon, to match the legend,
    and the detailed text information, to display the new change(s)).
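
The state changes checked in steps 13 through 26 (a job starts on the
chosen machine, its processes show as running, and terminating the job
leaves every process terminated with an exit code) can be sketched in
Java roughly as follows.  This is an assumption-laden illustration, not
PTP's runtime interface: the type names, the round-robin placement of
processes on nodes, and the KILLED_EXIT_CODE value are all invented
here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative and are not PTP's API.
final class SimulatedRuntimeSketch {

    enum State { RUNNING, TERMINATED }

    static final class Proc {
        final int rank;        // MPI rank within the job
        final int nodeNumber;  // node the process was placed on
        State state = State.RUNNING;
        Integer exitCode;      // stays null until the process has terminated

        Proc(int rank, int nodeNumber) {
            this.rank = rank;
            this.nodeNumber = nodeNumber;
        }
    }

    static final class Job {
        final String machine;
        final List<Proc> procs = new ArrayList<>();

        Job(String machine) { this.machine = machine; }

        /** A job counts as terminated once none of its processes is running. */
        State state() {
            for (Proc p : procs) {
                if (p.state == State.RUNNING) return State.RUNNING;
            }
            return State.TERMINATED;
        }
    }

    /** The two operations this sketch needs from any control system. */
    interface RuntimeSystem {
        Job launch(String machine, int numProcs);
        void terminate(Job job);
    }

    /** Stand-in that only updates in-memory state; it starts no real processes. */
    static final class SimulatedRuntime implements RuntimeSystem {
        static final int KILLED_EXIT_CODE = -1; // placeholder, chosen arbitrarily
        private final int nodesPerMachine;

        SimulatedRuntime(int nodesPerMachine) {
            if (nodesPerMachine <= 0) {
                throw new IllegalArgumentException("nodesPerMachine must be positive");
            }
            this.nodesPerMachine = nodesPerMachine;
        }

        @Override
        public Job launch(String machine, int numProcs) {
            Job job = new Job(machine);
            for (int rank = 0; rank < numProcs; rank++) {
                // Round-robin placement is an assumption made for this sketch.
                job.procs.add(new Proc(rank, rank % nodesPerMachine));
            }
            return job;
        }

        @Override
        public void terminate(Job job) {
            for (Proc p : job.procs) {
                if (p.state == State.RUNNING) {
                    p.state = State.TERMINATED;
                    p.exitCode = KILLED_EXIT_CODE;
                }
            }
        }
    }

    public static void main(String[] args) {
        RuntimeSystem runtime = new SimulatedRuntime(4);
        Job job = runtime.launch("machine0", 6);
        System.out.println("after launch: " + job.state());
        runtime.terminate(job);
        System.out.println("after terminate: " + job.state());
    }
}
```

Putting launch and terminate behind one interface mirrors the plan's
intent of repeating the same checks under the simulated and Open-MPI
runtime systems.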

SUPPORTED ARCHITECTURES/RUNTIMES:
    bproc Linux [SIMULATOR AND OPEN-MPI]
    Mac OS X 10.4 (Tiger) [SIMULATOR ONLY]
    Requires Eclipse 3.1.0
    Requires CDT version 3.0.0 for building C-MPI applications.

PACKAGING:
 1: Open-MPI binary build for bproc 64-bit Linux.
 2: PTP's Open-MPI JNI library build for 64-bit Linux.
 3: PTP core, launch, and UI packages as source.
 
 All of the above will be gzipped tarballs (X.tar.gz).
 The user will gunzip and untar the tarballs.
 The user will need to set up and test Open-MPI's compile and run
   facilities themselves, confirming it works on their architecture.
 When the user launches Eclipse, the PTP source will build.
