Simon,
Would the resolve attribute even work? Assuming replaceEnvironment worked, if the environment is cleared then there will no longer be a PATH or MMCS_SERVER_IP to resolve. It seems like you need to be able to clear the environment except for some selected vars, which would be difficult to implement. I think you really need a combination of replaceEnvironment and being able to look up environment vars from the remote target prior to the environment being cleared. Then you could do something like:
<submit-batch … replaceEnvironment="true">
...
<environment>PATH=${remote_get_env:PATH}</environment>
<environment>MMCS_SERVER_IP=${remote_get_env:MMCS_SERVER_IP}</environment>
…
</submit-batch>
To get this to work we'd need to 1) check the implementation of replaceEnvironment to ensure that it is working as documented, and 2) implement a variable resolver called "remote_get_env" (or similar) that would allow lookup of remote environment variables.
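For comparison, the effect being asked for (clear the environment, but carry over a few selected variables) can be sketched at the shell level. This is only an illustration, not the proposed resolver: the helper name clear_env_except is hypothetical, and the sketch assumes a POSIX shell and an "env" that accepts "-" on the target.

```shell
# Hypothetical helper: run a command with a cleared environment, keeping
# only the variables named in the first argument (space-separated).
clear_env_except() {
    keep_names=$1
    shift
    for name in $keep_names; do
        # Skip anything that is not a plain variable name before using eval.
        case $name in
            ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        # Capture the current value (empty if the variable is unset).
        eval "value=\${$name-}"
        # Prepend NAME=value so env receives it as an assignment.
        set -- "$name=$value" "$@"
    done
    # "env -" clears the inherited environment, applies the assignments,
    # then runs the remaining arguments as the command.
    env - "$@"
}

# Example (not run here):
#   clear_env_except "PATH MMCS_SERVER_IP" sbatch <generated batch file>
```

The values are read before the environment is cleared, which is the ordering a "remote_get_env" style lookup would also have to guarantee.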
We could target this for Juno as I don't think we have time to implement and test this for SR2.
How does this sound?
Greg
On Feb 5, 2012, at 9:04 PM, Simon Wail wrote:
Roland,
I don't think the "replaceEnvironment" option will work as I need it to.
It doesn't seem to work as described in the documentation for remote shell
commands. The documentation says:
"If the environment set on the command should entirely replace the shell
environment (the default behavior is to append the command environment),
set replaceEnvironment to true."
This seems to describe what I want, but doesn't work. I'm not even sure
if it works for local commands (I haven't tested that). Also, completely
replacing the remote shell environment would mean the remote PATH variable
would have to be defined in the launch configuration, and the necessary
system path values might not be known to the user. Also, MMCS_SERVER_IP
would definitely not be a user-set value. Therefore this is not really a
viable option, unless there was some way to retain selected remote shell
environment variables, like what Greg originally suggested.
In terms of reliability, I think using the "resolve" attribute for each
command line argument should be OK. At some point you have to rely on the
JAXB author knowing what they're doing. It is not as if general users will
be defining their own resource managers (at least I wouldn't expect so).
I think this solution is the most viable and provides the JAXB author with
the flexibility they need for the nuances of different resource managers
and systems.
Regardless of any consensus on this issue, I assume the necessary code
changes will not make the current SR2 build schedule :-(
Regards,
Simon Wail, Ph.D
HPC Specialist
IBM Research Collaboratory for Life Sciences - Melbourne
phone: +61 3 9035-4341
fax: +61 3 8344-9130
address: VLSCI, Gnd Floor, 187 Grattan St, Carlton VIC 3010 Australia
email: simon.wail@xxxxxxxxxxx
From: Roland Schulz <roland@xxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date: 01/02/2012 05:16 PM
Subject: Re: [ptp-dev] Remote tools now escapes all non-alphanumeric characters which causes problems for JAXB resource manager commands
Sent by: ptp-dev-bounces@xxxxxxxxxxx
On Tue, Jan 31, 2012 at 10:55 PM, Simon Wail <simon.wail@xxxxxxxxxxx>
wrote:
Unfortunately the only way to fix this
is to allow the unescaping of arguments to the remote commands. The
problem with Greg's solution, even if it did work with remote environment
variables, is that I need to use the "env -" command to entirely
clear the remote shell environment before invoking the "sbatch"
command.
Why wouldn't the replaceEnvironment option be enough?
There are several environment variables that I do need to pass to
"sbatch" (PATH and MMCS_SERVER_IP), but once the "env -" command is
executed, these will be empty as well. The only way to do what I want is
to set PATH and MMCS_SERVER_IP as part of the "env -" command, which can
use the shell environment values before they are cleared. The reason I
need to clear the entire shell environment is that the BG/P has a 2K limit
on the size of the environment passed to it, and often a standard shell
environment can be bigger than this.
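A rough way to see how far a login environment is from that limit is to count the bytes of NAME=value text before and after clearing. This is a sketch only: how the BG/P actually accounts for environment size is not stated in this thread, and reading "2K" as 2048 bytes is an assumption.

```shell
# Bytes of NAME=value text in the environment a child process inherits.
env_bytes() {
    env | wc -c | tr -d ' '
}

# The same measurement with everything except PATH cleared.
minimal_env_bytes() {
    env - "PATH=$PATH" env | wc -c | tr -d ' '
}

LIMIT=2048   # assumed reading of the "2K" limit

if [ "$(env_bytes)" -gt "$LIMIT" ]; then
    echo "full environment exceeds $LIMIT bytes"
fi
echo "full: $(env_bytes) bytes, PATH only: $(minimal_env_bytes) bytes"
```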
In the JAXB specification, there is already the "resolve" attribute
to enable/disable PTP variable substitution of command/script arguments
(<arg type>). Could this same attribute be used to allow the
unescaping of the arguments in the remote command? Then by default
all special characters would be escaped, and only those specified by the
JAXB author (who should know what they're doing) would not be escaped.
This hopefully solves the reliability issues raised by Roland.
Yes. If it would be per argument (and not per whole command)
it could be made reliable. But I would still think it would be better to
do it correctly. Because as soon as the next person needs:
<arg>PATH=$PATH:${ptp_rm:some_path#value}</arg>
it wouldn't be reliable, because
it wouldn't be guaranteed that the user provided string doesn't
contain special characters.
Roland
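The reliability concern can be illustrated with a small sketch. The user value below is hypothetical, and /bin/sh stands in for the remote shell; the point is only that a value spliced unescaped into the command text gets re-parsed by the shell, while a value passed as a separate argument does not.

```shell
# Hypothetical user-provided value containing a space and a semicolon.
user_value='/opt/my tools; echo injected'

# Spliced into the command text unescaped: the shell splits on the space
# and treats the text after ";" as a second command.
unsafe=$(/bin/sh -c "printf '%s\n' PATH=/usr/bin:$user_value")

# Passed as a separate positional argument: the shell never re-parses it.
safe=$(/bin/sh -c 'printf "%s\n" "PATH=/usr/bin:$1"' sh "$user_value")

printf 'unsafe:\n%s\n\nsafe:\n%s\n' "$unsafe" "$safe"
```

In the unsafe case the single intended argument turns into two words plus the output of an extra command.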
In terms of delaying my contribution, this problem only affects the Blue
Gene/P RM. I can still contribute the LML code and Blue Gene/Q RM.
Hopefully I can do this soon after some further testing.
Regards,
Simon Wail, Ph.D
HPC Specialist
IBM Research Collaboratory for Life Sciences - Melbourne
From: Roland Schulz <roland@xxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date: 01/02/2012 05:49 AM
Subject: Re: [ptp-dev] Remote tools now escapes all non-alphanumeric characters which causes problems for JAXB resource manager commands
Sent by: ptp-dev-bounces@xxxxxxxxxxx
On Tue, Jan 31, 2012 at 12:43 PM, Greg Watson <g.watson@xxxxxxxxxxxx>
wrote:
I don't think Simon can use IRemoteProcessBuilder#environment() directly,
since this is being specified in the JAXB RM XML file.
Ideally, he should be able to do something like:
<submit-batch name="submit-batch" directory="${ptp_rm:directory#value}"
waitForId="true" replaceEnvironment="true">
<arg>sbatch</arg>
<arg>${ptp_rm:managed_file_for_script#value}</arg>
<environment name="PATH"
value="${env_var:PATH}" />
<environment name="MMCS_SERVER_IP"
value="${env_var:MMCS_SERVER_IP}" />
…
However, there are a couple of problems with this. The first one is I found
a bug in the JAXB environment handling code. Hopefully this is fixed now.
The second is that even if ${env_var:xxx} worked (I haven't tried it),
it would only look at the local environment. There would probably need
to be a special variable handler, say "remote_env_var", to look
up the remote environment.
I think this would be the "correct" way to do what Simon wants,
but it would take some effort to implement. Other than this, I don't see
any other way of fixing Simon's problem, apart from either providing some
way to override the shell escaping, or reversing the change.
I'm strongly in favor of doing it the correct way - even if that causes
a delay for Simon's contribution. Using shell characters is unreliable.
It can never be ruled out that some string or argument contains special
characters. Thus, even with the suggestion of overriding the escaping, code
that uses this option would be unreliable. And I think it is much more
important to be reliable than to add new features.
Roland
Greg
On Jan 31, 2012, at 1:46 AM, Roland Schulz wrote:
Simon,
we made that change on purpose to avoid unintended consequences if
special shell characters are part of a directory or command argument.
E.g. the original approach wouldn't work if the <generated
batch file> contained a dollar sign somewhere in its path.
Why don't you use org.eclipse.ptp.remote.core.IRemoteProcessBuilder.environment()
to set the environment variables?
Roland
On Tue, Jan 31, 2012 at 1:10 AM, Simon Wail <simon.wail@xxxxxxxxxxx>
wrote:
Dev Team,
I've been testing my SLURM - Blue Gene JAXB resource manager in preparation
for code submission and hit a major problem. My original development
of the resource manager was done using PTP 5.0.1 and it was working fine.
Now I've been testing it under 5.0.4 and it doesn't work anymore.
I've investigated the problem and it seems to be the way remote commands
are passed to ssh using the remotetools code. The command I want
to execute is "env - PATH=$PATH MMCS_SERVER_IP=$MMCS_SERVER_IP sbatch
<generated batch file>" and this is defined in the JAXB resource
manager file as:
<submit-batch name="submit-batch"
directory="${ptp_rm:directory#value}" waitForId="true">
<arg>env</arg>
<arg>-</arg>
<arg>PATH=$PATH</arg>
<arg>MMCS_SERVER_IP=$MMCS_SERVER_IP</arg>
<arg>sbatch</arg>
<arg>${ptp_rm:managed_file_for_script#value}</arg>
...
This all works fine under 5.0.1 and the remote shell command becomes:
tcsh -c /bin/sh -c 'echo "PID=$$ PIID=24" > /dev/pts/11; export
"MESG2=World"; export "MESG=Hello World"; cd /vlsci/IBM/swail/;
env - PATH=$PATH MMCS_SERVER_IP=$MMCS_SERVER_IP sbatch /vlsci/IBM/swail//3648df43-aae1-4226-ab56-2263458cb11cmanaged_file_for_script;
'
Now when moving to 5.0.4 the remote shell command becomes:
tcsh -c /bin/sh -c 'echo "PID=$$ PIID=27" > /dev/pts/17; export
"MESG2=World"; export "MESG=Hello World"; cd /vlsci/IBM/swail/&&
env - PATH\=\$PATH MMCS_SERVER_IP\=\$MMCS_SERVER_IP sbatch /vlsci/IBM/swail//5252c4b2-de35-47e6-ade9-de855f30904dmanaged_file_for_script'
Notice the escape "\" before the equals and dollar signs for
setting PATH and MMCS_SERVER_IP. This is breaking the "env"
command - the "env -" command allows you to execute a subsequent
command with a cleared environment. By using "env - PATH=$PATH
sbatch ..." it allows me to clear the existing shell environment,
set the PATH environment variable to the previous value (before the environment
is cleared) and then execute the sbatch command. This then provides
a limited environment to the sbatch command which is required for the Blue
Gene/P system.
With the escapes now in the command, the PATH environment variable is not
set correctly and therefore the sbatch command is not found. This
is causing the job submission to fail.
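The difference between the two forms can be reproduced locally with a small sketch. It assumes a POSIX shell and uses /bin/sh by absolute path so the child can still be started when PATH is wrong; the assignment is quoted here only to protect against spaces in PATH.

```shell
# Unescaped, as in the 5.0.1 command: the shell expands $PATH before
# env clears the environment, so the child sees the real PATH.
unescaped=$(env - "PATH=$PATH" /bin/sh -c 'printf "%s\n" "$PATH"')

# Escaped, as in the 5.0.4 command: env receives the literal text
# PATH=$PATH, so the child's PATH is the five characters "$PATH".
escaped=$(env - PATH\=\$PATH /bin/sh -c 'printf "%s\n" "$PATH"')

printf 'unescaped: %s\nescaped:   %s\n' "$unescaped" "$escaped"
```

With PATH set to the literal string, a command given by name (such as sbatch) cannot be located.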
I've looked at the recent differences in the PTP code between 5.0.1 and
5.0.4 and found the following in org.eclipse.ptp.remote.remotetools.core.RemoteToolsProcessBuilder:
1) In the class constructor a set of "trusted" characters is
created - this includes all alphanumeric characters plus / . _ -
2) In the "start" method the remote command is built, escaping
any character NOT in the trusted set - see line 130
I believe this is how the "\"s are now appearing in the remote
command and this is causing my job submission to fail.
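The escaping rule described in the two points above can be approximated with a short sketch. This is not the RemoteToolsProcessBuilder code, only a shell rendering of the described behaviour; the function name escape_untrusted is hypothetical.

```shell
# Prefix every character outside the "trusted" set (alphanumerics plus
# / . _ -) with a backslash. LC_ALL=C keeps the ranges ASCII-only.
escape_untrusted() {
    printf '%s' "$1" | LC_ALL=C sed 's|[^A-Za-z0-9/._-]|\\&|g'
}

escape_untrusted 'PATH=$PATH'; echo
escape_untrusted 'MMCS_SERVER_IP=$MMCS_SERVER_IP'; echo
escape_untrusted '/vlsci/IBM/swail/'; echo
```

Applied to the env arguments this gives the backslashes before "=" and "$" seen in the 5.0.4 command, while a plain path passes through unchanged.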
I've also tried to set the "resolve=false" attribute for each
of the "arg" lines in the JAXB file but this makes no difference.
I believe it might not be correct to escape all non-alphanumeric
characters in the remote commands, but if this is felt to be necessary,
then maybe the "resolve" attribute in the XML could override
this to enable the remote command to be sent as desired.
Regards,
Simon Wail, Ph.D
HPC Specialist
IBM Research Collaboratory for Life Sciences - Melbourne
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537,
ORNL PO BOX 2008 MS6309
_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev