Re: [ptp-user] Job submission problems with PTP 5.0.3 and Torque RM

Beth,

I'll have to go back and try.  I believe it was working with 5.0.1 (that was in the June 2011 time frame).  This system's Torque installation has also been updated since then, but I have no control over that, so I would like to get 5.0.3 working with whatever is installed.

Phil


On Nov 1, 2011, at 16:10 , Beth Tibbitts wrote:

> 
> I can't help with the RM problem, but can you tell us which previous
> version worked for you?
> 5.0.2?  Was this a breakage from 5.0.2 to 5.0.3?
> 
> 
> ...Beth
> 
> Beth Tibbitts
> Eclipse Parallel Tools Platform  http://eclipse.org/ptp
> IBM STG - High Performance Computing Tools
> Mailing Address:  IBM Corp., 745 West New Circle Road, Lexington, KY 40511
> 
> 
> From:     "Roth, Philip C." <rothpc@xxxxxxxx>
> To:       PTP User list <ptp-user@xxxxxxxxxxx>
> Date:     11/01/2011 03:50 PM
> Subject:  [ptp-user] Job submission problems with PTP 5.0.3 and Torque RM
> Sent by:  ptp-user-bounces@xxxxxxxxxxx
> 
> Hello all,
> 
> After the recent 5.0.3 update, I'm back to trying to get PTP working well
> on a cluster to which I have access.  The cluster uses Torque 2.5.7 for
> batch queue software.  PTP 5.0.3 and Eclipse 3.7.1 don't seem to be able to
> completely talk to this Torque installation, and there are other problems
> that I can't seem to diagnose.  Perhaps someone else has figured these out
> already, or has some suggestions about how to fix or work around these
> problems?
> 
> For the following description I'm in the Parallel Runtime perspective, and
> working with a local MPI-based C++ project.
> 
> The first hint something is wrong is that it isn't clear whether the
> PBS-Generic-Batch resource manager is fully started or not.  I created a
> stock PBS-Generic-Batch RM.  I use the context menu to start the RM, and
> after a second or so the RM icon changes from grey to green.  However, if I
> select it and look at the properties view, the properties view still
> indicates the RM is in the STOPPED state, with "num machines" and "num
> queues" both 0.  Furthermore, nothing shows up in the "Machines" or "Jobs
> list" views.  These views seem contradictory - the RM view showing me it
> has started but others suggesting not.
> 
> I'm able to create a Run Configuration for my program.  Interestingly, the
> set of available queues in the "Run Configuration" dialog's Resources tab
> is the set of queues on our system, so PTP must have been able to obtain
> the correct set of queues.
> 
> If I attempt to Run my Run Configuration (i.e., submit it to the batch
> queue), the progress view gets as far as saying 'submit-batch' with a large
> hex number (a GUID?) but sticks at 75%.  PTP creates a file in my home
> directory named $(GUID)managed_file_for_script, but with a different GUID
> than the one shown with the submit-batch progress bar.
> 
> Eventually I have to cancel the submit-batch operation in the progress
> view.  When I do so, a message is displayed to the screen and written to
> the workspace log file saying that the qsub command failed because it
> couldn't find the batch script.  The message shows the command was trying
> to use the path $HOME$HOME$GUIDmanaged_file_for_script (i.e., the home
> directory path is listed twice).  I can't see how to modify the
> PBS-Generic-Batch XML file to keep it from building the path with $HOME
> twice.
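
The doubled home directory described above can be illustrated with a small sketch. This is not how PTP builds the path; it only assumes, as one unconfirmed possibility, that a home prefix is being prepended to a script path that is already absolute. The function names and the `/home/user` and `abc123` values are hypothetical.

```python
import posixpath


def naive_script_path(home, script):
    # Unconditional prefixing: if `script` is already absolute,
    # the home directory ends up in the result twice.
    return home + script


def guarded_script_path(home, script):
    # posixpath.join discards `home` when `script` is an absolute path,
    # and prepends it (with a separator) when `script` is relative.
    return posixpath.join(home, script)


# Hypothetical values standing in for $HOME and the generated script name.
HOME = "/home/user"
ABSOLUTE_SCRIPT = "/home/user/abc123managed_file_for_script"
RELATIVE_SCRIPT = "abc123managed_file_for_script"
```

With these values, `naive_script_path(HOME, ABSOLUTE_SCRIPT)` produces a path with the home directory repeated, while `guarded_script_path` returns the same single-prefix path for both the absolute and the relative script name.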
> 
> Just to explore, I created the directories and a symlink so that $HOME$HOME
> existed and pointed to $HOME.  PTP was able to submit the job but it
> produced no output to the Console view nor did it change the Machines or
> Jobs list views.
> 
> Does anyone have any ideas?
> 
> Phil Roth
> 
> P.S. Using diagnostic advice given previously on this list, I found that
> the LML_da_driver.pl script is not correctly finding the version of Torque.
> When given the --version flag, the qstat command with Torque 2.5.7 writes
> its output on stderr, and that script sends stderr to /dev/null.  If I
> change the script so that it sends stderr to stdout for this test, the
> script determines the version correctly but it has no effect on the
> problems I describe above.
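
The stderr issue in the P.S. can be sketched in a few lines. This is a minimal Python illustration, not the logic of LML_da_driver.pl (which is a Perl script): it merges stderr into stdout before searching for a version number, so a tool that reports its version on stderr is still detected. The `detect_version` helper and the `fake_qstat` stand-in command are hypothetical, and the "version: 2.5.7" text is an assumed format rather than verified qstat output.

```python
import re
import subprocess
import sys


def detect_version(command):
    """Run `command` and return the first dotted version number found
    in its combined stdout and stderr, or None if there is none."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout instead of discarding it
        text=True,
        check=False,  # a non-zero exit status should not raise here
    )
    match = re.search(r"\d+(?:\.\d+)+", result.stdout)
    return match.group(0) if match else None


# Stand-in for `qstat --version`: a child process that writes only to stderr.
# The message format is an assumption made for this example.
fake_qstat = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('version: 2.5.7\\n')",
]
```

On a machine with Torque installed, the same helper could be tried as `detect_version(["qstat", "--version"])`; had stderr been sent to /dev/null instead, there would be nothing left to match.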
> 
> 
> --
> Philip C. Roth | +1 865 241-1543 | http://ft.ornl.gov/~rothpc
> 
> 
> 
> _______________________________________________
> ptp-user mailing list
> ptp-user@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/ptp-user
> 
> 

-- 
Philip C. Roth | +1 865 241-1543 | http://ft.ornl.gov/~rothpc




