Following your suggestion, I have managed to create a new PBS RM and start it successfully.
As for the job launch problem, the root cause is perhaps a wrong path to the batch script being passed to the qsub command: the correct script path should be
"/home/jiangjie/7bafb232-c374-4eec-8687-6ffc653d86a1managed_file_for_script",
but the path actually passed to qsub, "/home/jiangjie/home/jiangjie/7bafb232-c374-4eec-8687-6ffc653d86a1managed_file_for_script",
does not exist. It seems that PTP creates and uses a wrong job launch parameter.
Note that the batch script itself is correct: I can submit it with the qsub command outside PTP.
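
As a rough illustration of how such a doubled path could arise, here is a small Python sketch. It is only a guess at the kind of mistake involved, not PTP's actual code: both helper names (naive_remote_path, safe_remote_path) are hypothetical, and the only values taken from the report are the working directory and the script path. The first helper blindly prefixes the working directory to a path that is already absolute; the second prefixes it only when the path is relative.

```python
import posixpath


def naive_remote_path(working_dir: str, script_path: str) -> str:
    """Hypothetical buggy variant: always prefix the working directory,
    even when script_path is already absolute."""
    return working_dir.rstrip("/") + "/" + script_path.lstrip("/")


def safe_remote_path(working_dir: str, script_path: str) -> str:
    """Hypothetical fixed variant: posixpath.join discards working_dir
    when script_path is absolute, so the prefix is added only to
    relative paths."""
    return posixpath.normpath(posixpath.join(working_dir, script_path))


# Values taken from the configuration output quoted below.
working_dir = "/home/jiangjie"
script = "/home/jiangjie/7bafb232-c374-4eec-8687-6ffc653d86a1managed_file_for_script"

print(naive_remote_path(working_dir, script))  # doubled /home/jiangjie prefix
print(safe_remote_path(working_dir, script))   # unchanged absolute path
```

The first call reproduces the path that shows up in managed_file_for_script and in the qsub command line; the second returns the path that actually exists on the cluster.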
> From: g.watson@xxxxxxxxxxxx
> Date: Wed, 21 Mar 2012 12:57:34 -0400
> To: ptp-dev@xxxxxxxxxxx
> Subject: Re: [ptp-dev] BUG report of PBS JAXB resource manager
>
> Jie,
>
> Yes, the RM is a bit misnamed. It should really be Torque-Generic-Batch.
>
> In order to use PBS, you need to import the RM definition into your workspace. Go to Import..., then open Resource Managers>Resource Manager Definition From Plug-in. Choose the PBS-Generic-Batch RM from the combo. This should create a folder in your workspace called resourceManagers. Open this and double click on the xml file. Go to the end of the file, and change line 828 from '<monitor-data schedulerType="TORQUE">' to '<monitor-data schedulerType="PBS">'.
>
> Now, go to the System Monitoring perspective and add a new RM. Use the definition you just edited (should have "(1)" after the name).
>
> You can also change the batch file that gets created to see if this fixes the hanging problem.
>
> If you're planning to change the XML after you have used it to create a RM, go to Preferences then select Parallel Tools>Resource Managers> Configurable Resource Manager and check the "Always load XML from URL" option. Stopping and restarting the RM will then reload the new definition.
>
> Cheers,
> Greg
>
> On Mar 21, 2012, at 10:03 AM, JiangJie wrote:
>
> >
> > Hi all,
> >
> > Recently I have been trying PTP_5.0.6 with PBS-Generic-Batch(LML_JAXB) resource manager.
> > The underlying RMS is PBS Pro 11.2.0, which itself works well.
> > However, I have run into two problems:
> >
> > 1. With the default installation and configuration, starting the PBS-Generic-Batch resource manager
> > pops up an error window with the message:
> > "LML DA Driver (Local) has encountered a problem. Server finished with exit code 1".
> > The PBS-Generic-Batch resource manager then fails to start.
> >
> > Under the $HOME/.eclipsesettings directory, some temporary directories appear, for example, tmp_node0_31974 (here node0 is the hostname of the front end node of the PBS cluster).
> > Each temporary directory contains a "report.log" and a "request.xml" file.
> > The contents of report.log are as follows:
> >
> > ------------------------------------------------------------------------------------------
> > LLVIEW Data Access Workflow Manager Driver 1.15, starting at (Thu Mar 22 05:09:19 CST 2012)
> > command line args:
> > ------------------------------------------------------------------------------------------
> > LML_da_driver.pl: temporary directory not found, create new directory ./tmp_node0_31947 ...
> > LML_da_driver.pl: tmpdir created (./tmp_node0_31947)
> > LML_da_driver.pl: requestfile=-
> > LML_da_driver.pl: parsing XML requestfile in 0.0027 sec
> > LML_da_driver.pl: check request for rms hint ...
> > LML_da_driver.pl: check_for rms, got hint from request ... (TORQUE)
> > LML_da_driver.pl: check_rms_TORQUE: found pbsnodes by which (/opt/pbs/11.2.0.113417/bin/pbsnodes)
> > LML_da_driver.pl: check_rms_TORQUE: found qstat by which (/opt/pbs/11.2.0.113417/bin/qstat)
> > LML_da_driver.pl: check_rms_TORQUE: PBSpro found
> > LML_da_driver.pl: check_rms_TORQUE: seems not to be a TORQUE system
> > LML_da_driver.pl: rms/TORQUE/da_check_info_LML.pl unable to locate rms TORQUE
> > LML_da_driver.pl: ERROR LML_da_driver.pl: could not determine rms, exiting ...
> >
> >
> > It seems that LML_da_driver.pl assumes that the underlying RMS is TORQUE, not PBS.
> > If I modify the LML_da_driver.pl and force the rms to PBS (just before echoing "check_for rms, got hint from request ..."), everything works well.
> > So how can I configure LML_da_driver so that the correct underlying RMS is detected?
> >
> > 2. Even after starting the PBS-Generic-Batch resource manager successfully, I still cannot launch a batch job from within PTP.
> >
> > Following is the batch script generated by PTP with my job configuration:
> > -------------------------------------------------------------------------
> > #!/bin/bash
> > #PBS -q workq
> > #PBS -N ptp_job
> > #PBS -l nodes=1
> > #PBS -l walltime=00:30:00
> > #PBS -V
> > MPI_ARGS="-np 4"
> > if [ "-np" == "${MPI_ARGS}" ] ; then
> > MPI_ARGS=
> > fi
> > COMMAND=mpirun
> > if [ -n "${COMMAND}" ] ; then
> > COMMAND="${COMMAND} ${MPI_ARGS} /vol/test/demoApp/Debug/testMPI "
> > else
> > COMMAND="/vol/test/demoApp/Debug/testMPI "
> > fi
> > cd /home/jiangjie
> > ${COMMAND}
> >
> > -------------------------------------------------------------------------
> >
> > And following is the configuration output:
> > -------------------------------------------------------------------------
> > Job_Name=ptp_job
> > Resource_List.nodes=1
> > Resource_List.walltime=00:30:00
> > control.address=localhost
> > control.queue.name=workq
> > control.user.name=jiangjie
> > control.working.dir=/home/jiangjie
> > current_controller=Basic.PBS.Settings
> > destination=workq
> > directory=/home/jiangjie
> > enabled_Basic.PBS.Settings=Account_Name Job_Name Resource_List.mem Resource_List.nodes Resource_List.walltime destination export_all mpiCommand mpiCores
> > executableDirectory=/vol/test/demoApp/Debug
> > executablePath=/vol/test/demoApp/Debug/testMPI
> > export_all=-V
> > invalid_Basic.PBS.Settings=script_path
> > managed_file_for_script=/home/jiangjie/home/jiangjie/7bafb232-c374-4eec-8687-6ffc653d86a1managed_file_for_script
> > mpiCommand=mpirun
> > mpiCores=4
> > org.eclipse.debug.core.appendEnvironmentVariables=true
> > org.eclipse.ptp.launch.ATTR_CONSOLE=true
> > org.eclipse.ptp.launch.ATTR_COPY_EXECUTABLE_FROM_LOCAL=false
> > org.eclipse.ptp.launch.ATTR_REMOTE_EXECUTABLE_PATH=/vol/test/demoApp/Debug/testMPI
> > org.eclipse.ptp.launch.ATTR_SYNC_AFTER=false
> > org.eclipse.ptp.launch.ATTR_SYNC_BEFORE=false
> > org.eclipse.ptp.launch.ATTR_SYNC_RULES=[]
> > org.eclipse.ptp.launch.PROJECT_ATTR=testMPI
> > org.eclipse.ptp.launch.RESOURCE_MANAGER_NAME=eeb2b0e5-4035-4131-8f64-7e38ab9d179c
> > ptpDirectory=/home/jiangjie/.eclipsesettings
> > queues=[workq]
> > script=#!/bin/bash
> > #PBS -q workq
> > #PBS -N ptp_job
> > #PBS -l nodes=1
> > #PBS -l walltime=00:30:00
> > #PBS -V
> > MPI_ARGS="-np 4"
> > if [ "-np" == "${MPI_ARGS}" ] ; then
> > MPI_ARGS=
> > fi
> > COMMAND=mpirun
> > if [ -n "${COMMAND}" ] ; then
> > COMMAND="${COMMAND} ${MPI_ARGS} /vol/test/demoApp/Debug/testMPI "
> > else
> > COMMAND="/vol/test/demoApp/Debug/testMPI "
> > fi
> > cd /home/jiangjie
> > ${COMMAND}
> >
> > stderr_remote_path=${ptp_rm:directory#value}/${ptp_rm:Job_Name#value}.e${ptp_rm:@jobId#default}
> > stdout_remote_path=${ptp_rm:directory#value}/${ptp_rm:Job_Name#value}.o${ptp_rm:@jobId#default}
> > valid_Basic.PBS.Settings=Account_Name Job_Name Resource_List.mem Resource_List.nodes Resource_List.walltime bindir control.address control.queue.name control.user.name control.working.dir current_controller destination directory enabled_Basic.PBS.Settings executableDirectory executablePath export_all invalid_Basic.PBS.Settings managed_file_for_script mpiCommand mpiCores ptpDirectory queues script stderr_remote_path stdout_remote_path valid_Basic.PBS.Settings visible_Basic.PBS.Settings
> > visible_Basic.PBS.Settings=Account_Name Job_Name Resource_List.mem Resource_List.nodes Resource_List.walltime destination export_all mpiCommand mpiCores
> > -------------------------------------------------------------------------------
> >
> > Note the output line
> > "managed_file_for_script=/home/jiangjie/home/jiangjie/7bafb232-c374-4eec-8687-6ffc653d86a1managed_file_for_script".
> > The correct path to the batch script should be "/home/jiangjie/7bafb232-c374-4eec-8687-6ffc653d86a1managed_file_for_script"!
> > The console also shows "submit-batch: 9954302e-5f0c-4bbd-bd17-805df85936d1: qsub /home/jiangjie/home/jiangjie/7bafb232-c374-4eec-8687-6ffc653d86a1managed_file_for_script".
> > Maybe it is this wrong script path that causes the job launch to hang.
> >
> > How can I fix it?
> >
> >
> > Regards,
> > Jie
> >
> > _______________________________________________
> > ptp-dev mailing list
> > ptp-dev@xxxxxxxxxxx
> > https://dev.eclipse.org/mailman/listinfo/ptp-dev