Re: [ptp-user] remote LML
First, the computer is TERA100; the batch resource manager is based on Slurm plus special commands. I developed LML DA scripts that work well if I run them (or if Eclipse PTP runs them) from the supercomputer.
So the Eclipse PTP Resource Manager is homemade based on the PBS/openmpi one. It works fine when I run Eclipse from the supercomputer.
But this is not what we want for users… Eclipse should not run on supercomputers :)
So, I am trying to make everything remote using the rdt-server.
Again, basic RSE operations are working fine…
When I start the resource manager from Eclipse on the station, it correctly queries the names of the queues, etc., and it stops when it needs the Perl LML_da_driver.pl script.
What I am seeing (running rdt-server verbosely) is that Eclipse does send a correct request.xml, but the tmp_xxx/request.xml file stays at size 0 until I stop the server from the Eclipse progress window, at which point the script runs correctly (but the station’s client is stopped by then, so it is useless :) ).
(at the end of the .xml there are 2 lines
Is it clearer now?
I think I am not far from my goal. I am planning to look at what is happening in the rdt-server, but I tried this a year ago and I remember it is not that easy! If someone has an idea, I’d be happy to hear it.
PS: I’d also like to see whether we are able to debug in batch (i.e. to wait until the debug job is launched). Does that make sense?
I'm not sure what you mean by "never ends"? Do you see anything in the system monitoring view?
You can debug the LML_da_driver.pl script by creating a file called .LML_da_options in the .eclipsesettings directory on the remote machine (note the "." at the beginning of both). This file should contain one line: keeptmp=1
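For reference, creating that options file could be sketched in Python as below. This is only a sketch to be run on the remote machine: the helper name `enable_keeptmp` is made up for illustration, and the location of .eclipsesettings in the remote user's home directory is an assumption, so adjust the path if it lives elsewhere.

```python
from pathlib import Path


def enable_keeptmp(settings_dir: Path) -> Path:
    """Write the one-line options file so the driver keeps its tmp_xxx directories."""
    # Create the settings directory if it does not exist yet.
    settings_dir.mkdir(parents=True, exist_ok=True)
    options_file = settings_dir / ".LML_da_options"
    # The file contains exactly one line: keeptmp=1
    options_file.write_text("keeptmp=1\n")
    return options_file


if __name__ == "__main__":
    # Assumption: .eclipsesettings sits in the home directory of the remote user.
    print(enable_keeptmp(Path.home() / ".eclipsesettings"))
```

Doing the same by hand with a text editor works just as well; the only things that matter are the file name, the directory, and the single keeptmp=1 line.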
Each time the driver script runs, it will create a directory in .eclipsesettings called tmp_xxx (where xxx is some system dependent stuff). This directory contains the output from running the script, along with log and error files.
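Once a few tmp_xxx directories have been kept, a quick way to look through them is to list every file with its size, which makes an empty file stand out. The following is a rough sketch: the helper name `list_driver_files` is hypothetical, the home-directory location of .eclipsesettings is again an assumption, and nothing is assumed about which file names the driver writes.

```python
from pathlib import Path


def list_driver_files(settings_dir: Path) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under the tmp_* directories."""
    results = []
    for tmp_dir in sorted(settings_dir.glob("tmp_*")):
        if not tmp_dir.is_dir():
            continue
        for f in sorted(tmp_dir.rglob("*")):
            if f.is_file():
                rel = f.relative_to(settings_dir).as_posix()
                results.append((rel, f.stat().st_size))
    return results


if __name__ == "__main__":
    # Assumption: .eclipsesettings sits in the home directory of the remote user.
    for name, size in list_driver_files(Path.home() / ".eclipsesettings"):
        marker = "  <-- empty" if size == 0 else ""
        print(f"{size:>10}  {name}{marker}")
```

Running it while the driver appears to hang, and again after it finishes, shows which files were still empty at the time of the hang.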
I finally got my LML working when I run directly on the supercomputer. The next step I’d like to take is to have Eclipse running on my workstation.
What I have: Eclipse Indigo SR2 + PTP 5.0.7 + RSE + RDT.
What I manage to do: remote shell, remote files, and remote compilation all work well.
What I want:
1) To be able to monitor the supercomputer from my station
2) To be able to run programs on the supercomputer from my station
3) To be able to debug programs on the supercomputer from my station
4) To be able to analyze (TAU / GEM) programs on the supercomputer from my station…
I do not know if step 4 is feasible, but from my understanding steps 1-3 should work.
I have the following problem with step 1:
I open the System Monitoring perspective and create the resource manager. I presume I configured it correctly, since a process, namely LML_da_driver.pl, runs on my supercomputer whenever I start the resource manager.
But it runs as if it had been given no input on stdin, so it never ends! (If I run LML_da_driver.pl on the supercomputer with good input, it runs as expected.)
How can I debug that? Do you have any idea what I am doing wrong?