Re: [ptp-user] Job submission problems with PTP 5.0.3 and Torque RM

Thanks.  Unfortunately, with respect to actually submitting the job to the queue, the new Keeneland-specific RM is showing the same behavior as the generic PBS RM (though this is with everything else at 5.0.3).  I'll try the 5.0.4 release candidate (?) version.

Phil


On Nov 1, 2011, at 16:50 , Greg Watson wrote:

> Both perspectives are the same in this regard. You can run programs from either, but the System Monitoring perspective is the only one with views that work with the new RM framework. Views in the Parallel Runtime perspective only work with the legacy RMs.
> 
> Regards,
> Greg
> 
> On Nov 1, 2011, at 4:37 PM, Roth, Philip C. wrote:
> 
>> 
>> Greg,
>> 
>> Thanks.  It is Keeneland, so I'll be glad to see where the differences are between my current config and yours.
>> 
>> Regarding using the System Monitoring perspective - I want to be able to run programs, not just see the system status.  Isn't Parallel Runtime the right perspective for this?
>> 
>> Phil
>> 
>> 
>> On Nov 1, 2011, at 16:28 , Greg Watson wrote:
>> 
>>> Phil,
>>> 
>>> Is this Keeneland you're trying to use or some other machine? If it is Keeneland, you can add a new update site "http://download.eclipse.org/tools/ptp/updates/indigo_sr2", then open PTP. You should see "PTP Contributed Resource Manager Definitions". If you load this plugin, you'll find a definition I've added for Keeneland, which allowed me to monitor and submit jobs. I'd be keen (pun intended) for any feedback.
>>> 
>>> Regarding your other comments: I'm not sure the properties view is working with the RM view currently, though it should be. Also, you should be using the "System Monitoring" perspective with the PBS RMs, not the "Parallel Runtime" perspective. The System Monitoring perspective has a number of new views showing system and job status.
>>> 
>>> Regards,
>>> Greg
>>> 
>>> 
>>> On Nov 1, 2011, at 3:49 PM, Roth, Philip C. wrote:
>>> 
>>>> 
>>>> Hello all,
>>>> 
>>>> After the recent 5.0.3 update, I'm back to trying to get PTP working well on a cluster to which I have access.  The cluster uses Torque 2.5.7 for batch queue software.  PTP 5.0.3 and Eclipse 3.7.1 don't seem to be able to completely talk to this Torque installation, and there are other problems that I can't seem to diagnose.  Perhaps someone else has figured these out already, or has some suggestions about how to fix or work around these problems?
>>>> 
>>>> For the following description I'm in the Parallel Runtime perspective, and working with a local MPI-based C++ project.
>>>> 
>>>> The first hint something is wrong is that it isn't clear whether the PBS-Generic-Batch resource manager is fully started or not.  I created a stock PBS-Generic-Batch RM.  I use the context menu to start the RM, and after a second or so the RM icon changes from grey to green.  However, if I select it and look at the properties view, the properties view still indicates the RM is in the STOPPED state, with "num machines" and "num queues" both 0.  Furthermore, nothing shows up in the "Machines" or "Jobs list" views.  These views seem contradictory - the RM view showing me it has started but others suggesting not.
>>>> 
>>>> I'm able to create a Run Configuration for my program.  Interestingly, the set of available queues in the "Run Configuration" dialog's Resources tab is the set of queues on our system, so PTP must have been able to obtain the correct set of queues.
>>>> 
>>>> If I attempt to Run my Run Configuration (i.e., submit it to the batch queue), the progress view gets as far as saying 'submit-batch' with a large hex number (a GUID?) but sticks at 75%.  PTP creates a file in my home directory named $(GUID)managed_file_for_script, but with a different GUID than the one shown with the submit-batch progress bar.  
>>>> 
>>>> Eventually I have to cancel the submit-batch operation in the progress view.  When I do so, a message is displayed to the screen and written to the workspace log file saying that the qsub command failed because it couldn't find the batch script.  The message shows the command was trying to use the path $HOME$HOME$GUIDmanaged_file_for_script (i.e., the home directory path is listed twice).  I can't see how to modify the PBS-Generic-Batch XML file to keep it from building the path with $HOME twice.
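
The doubled-$HOME path described above is the kind of result that comes from prefixing a base directory onto a path that is already absolute. The thread does not show how PTP actually builds the script path, so the following is only a minimal Python sketch of that general failure mode and one way to guard against it. The function name `resolve_remote_path` and the example paths are hypothetical.

```python
import posixpath


def resolve_remote_path(home, path):
    """Return an absolute POSIX path, prefixing `home` only when `path` is relative.

    Hypothetical helper for illustration; not part of PTP.
    """
    if posixpath.isabs(path):
        # Already absolute: do not prepend the home directory again.
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(home, path))


# Assumed example values, not taken from the thread.
home = "/home/user"
script = "/home/user/1234managed_file_for_script"

# Naive string concatenation reproduces the doubled prefix.
naive = home + script

# The guarded version leaves the absolute path alone.
fixed = resolve_remote_path(home, script)
```

Here `naive` contains the home directory twice, while `fixed` is the original absolute path; a relative script name passed to `resolve_remote_path` is still placed under `home`.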
>>>> 
>>>> Just to explore, I created the directories and a symlink so that $HOME$HOME existed and pointed to $HOME.  PTP was able to submit the job but it produced no output to the Console view nor did it change the Machines or Jobs list views.
>>>> 
>>>> Does anyone have any ideas?
>>>> 
>>>> Phil Roth
>>>> 
>>>> P.S. Using the diagnostics advice given previously on this list, I found that the LML_da_driver.pl script does not correctly detect the Torque version.  When given the --version flag, the qstat command in Torque 2.5.7 writes its output to stderr, and the script sends stderr to /dev/null.  If I change the script so that it redirects stderr to stdout for this test, the script determines the version correctly, but that has no effect on the problems I describe above.
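
The stderr issue in the P.S. can be illustrated outside the Perl script. The following is a minimal Python sketch, not the LML_da_driver.pl code: it runs a command with stderr merged into stdout, so a version string is found regardless of which stream the tool writes to. The function name `detect_version` and the assumption that the version appears as a dotted number (such as 2.5.7) are illustrative.

```python
import re
import subprocess
import sys


def detect_version(cmd):
    """Run `cmd` and return the first dotted version number it prints, or None.

    stderr is merged into stdout so that tools which print their version
    on stderr are still handled. Hypothetical helper for illustration.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            check=False,  # a non-zero exit status is not treated as an error here
        )
    except FileNotFoundError:
        # The command is not installed or not on PATH.
        return None
    match = re.search(r"\d+(?:\.\d+)+", result.stdout)
    return match.group(0) if match else None
```

A call such as `detect_version(["qstat", "--version"])` would be the corresponding check on a system where qstat is installed; the exact text qstat prints is an assumption here, and only the dotted number is extracted.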
>>>> 
>>>> 
>>>> -- 
>>>> Philip C. Roth | +1 865 241-1543 | hxxp://ft.ornl.gov/~rothpc
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> ptp-user mailing list
>>>> ptp-user@xxxxxxxxxxx
>>>> hxxps://dev.eclipse.org/mailman/listinfo/ptp-user
>>> 
>> 
>> -- 
>> Philip C. Roth | +1 865 241-1543 | hxxp://ft.ornl.gov/~rothpc
>> 
>> 
>> 
> 

-- 
Philip C. Roth | +1 865 241-1543 | http://ft.ornl.gov/~rothpc
