Bug 332275 - "#" in local file name is changed to "#035" after copy
Status: RESOLVED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: 3.0
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.3 M5
Assignee: David McKnight CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 334128 334129
Reported: 2010-12-10 02:26 EST by Kenya Ishimoto CLA
Modified: 2011-01-12 11:18 EST

See Also:
dmcknigh: review? (ddykstal.eclipse)


Attachments
patch to check path validity before escaping (2.21 KB, patch)
2010-12-10 16:44 EST, David McKnight CLA

Description Kenya Ishimoto CLA 2010-12-10 02:26:40 EST
Build Identifier: 3.2

When a file is copied from the local file system, the character '#' in the file name is changed to '#035'. For example, when a file "abc#xyz" is copied from the Local system in the Remote Systems view to a folder in Project Explorer, the file name is changed to "abc#035xyz".

This behavior causes a problem in Rational Developer for System z. The same symptom occurs when copying a local file to our z/OS MVS file subsystem in the Remote Systems view. The customer uses '#' in file names because it is allowed on Windows, z/OS, and Linux. When the customer copies a file from the local file system to the remote z/OS MVS file subsystem, the name is changed. Also, because each '#' adds three characters to the file name, the result easily exceeds the data set member name limit of 8 characters.

This problem was reproduced on a Windows XP client and a Linux client.

Reproducible: Always

Steps to Reproduce:
1. In the Remote Systems view, expand Local system > Local Files and select a file whose name contains the character '#'
2. Drag the file and drop it in the Navigator view or the Project Explorer view
Comment 1 David McKnight CLA 2010-12-10 16:41:51 EST
The transformation of the name is a result of RSE's character escape mechanism, which is used when downloading files whose names contain special characters that aren't valid on the local machine.  In this case, the escape character itself gets escaped: "#035" refers to ASCII 35 (i.e. '#'), even though '#' is a valid character.
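
As an illustration of the mechanism described in this comment, here is a minimal sketch. It is not RSE's actual code: the function name `escape_name`, the set of characters treated as invalid on Windows, and the three-digit decimal format are assumptions inferred from the "#035" and "#062" examples in this bug.

```python
# Hypothetical sketch of the escaping described above; not RSE's actual code.
ESCAPE_CHAR = "#"
# Assumed set of characters that are invalid in Windows file names.
INVALID_ON_WINDOWS = set('<>:"/\\|?*')


def escape_name(name: str) -> str:
    """Replace each invalid character, and the escape character itself,
    with '#' followed by its three-digit decimal character code."""
    out = []
    for ch in name:
        if ch == ESCAPE_CHAR or ch in INVALID_ON_WINDOWS:
            out.append(f"{ESCAPE_CHAR}{ord(ch):03d}")
        else:
            out.append(ch)
    return "".join(out)


print(escape_name("abc#xyz"))      # abc#035xyz
print(escape_name("abc>def.txt"))  # abc#062def.txt
```

The first call reproduces the reported symptom: the name contains nothing that is invalid locally, yet it still changes because '#' doubles as the escape character.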
Comment 2 David McKnight CLA 2010-12-10 16:44:24 EST
Created attachment 185002
patch to check path validity before escaping

One approach to resolving this is to validate the path of the file first.  If the path is valid, then we shouldn't have to do any escaping.  As far as I know, this shouldn't cause any adverse effects although I'll want to be sure before committing anything.
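
A rough sketch of the "validate first" idea, under stated assumptions: the attached patch itself is not shown in this report, so `is_valid_name`, `cache_name`, and the invalid-character set below are hypothetical stand-ins rather than the patch's real code.

```python
# Hypothetical sketch of "check validity before escaping"; not the attached patch.
ESCAPE_CHAR = "#"
# Assumed set of characters that are invalid in Windows file names.
INVALID_ON_WINDOWS = set('<>:"/\\|?*')


def is_valid_name(name: str) -> bool:
    """A name is treated as valid if it is non-empty and has no invalid characters."""
    return bool(name) and not any(ch in INVALID_ON_WINDOWS for ch in name)


def cache_name(name: str) -> str:
    """Return the name unchanged when it is already valid; escape only otherwise."""
    if is_valid_name(name):
        return name
    return "".join(
        f"{ESCAPE_CHAR}{ord(ch):03d}"
        if ch == ESCAPE_CHAR or ch in INVALID_ON_WINDOWS
        else ch
        for ch in name
    )


print(cache_name("abc#xyz"))      # abc#xyz
print(cache_name("abc>def.txt"))  # abc#062def.txt
```

Whether '#' should still be escaped inside a name that is invalid for other reasons is an assumption of this sketch; the report does not say.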
Comment 3 David McKnight CLA 2010-12-10 16:45:15 EST
Dave, do you see any problems with this approach?
Comment 4 Martin Oberhuber CLA 2010-12-14 03:14:31 EST
With this, I think you'll have a problem unescaping.

Because when you see eg "#255" in the cache you won't know whether it was the literal "#255" on the remote or whether it is a character that was escaped.

Every escape mechanism needs to have a mechanism describing the escape character.

I think that in order to fix this, you'll need to ensure that the unescape happens at the right time.
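
The unescaping concern can be illustrated with a small round-trip check. This is a sketch under assumptions, not RSE code: `escape_always` models the existing behaviour, `escape_if_invalid` models the proposed one, and `unescape` is a hypothetical inverse that decodes every '#' followed by three digits.

```python
import re

# Assumed set of characters that are invalid in Windows file names.
INVALID_ON_WINDOWS = set('<>:"/\\|?*')


def escape_always(name: str) -> str:
    """Existing behaviour (sketch): '#' and invalid characters are always escaped."""
    return "".join(
        f"#{ord(c):03d}" if c == "#" or c in INVALID_ON_WINDOWS else c for c in name
    )


def escape_if_invalid(name: str) -> str:
    """Proposed behaviour (sketch): leave names that are already valid untouched."""
    if not any(c in INVALID_ON_WINDOWS for c in name):
        return name
    return escape_always(name)


def unescape(cached: str) -> str:
    """Hypothetical inverse: decode every '#' followed by three decimal digits."""
    return re.sub(r"#(\d{3})", lambda m: chr(int(m.group(1))), cached)


for remote in ("abc>def.txt", "abc#062def.txt"):
    print(
        remote,
        unescape(escape_always(remote)) == remote,
        unescape(escape_if_invalid(remote)) == remote,
    )
# abc>def.txt True True
# abc#062def.txt True False
```

With the always-escape scheme the round trip is symmetric; with validate-first, a literal "#062" in a valid name can no longer be told apart from an escaped '>'.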
Comment 5 David McKnight CLA 2010-12-14 16:37:55 EST
(In reply to comment #4)
> With this, I think you'll have a problem unescaping.
> 
> Because when you see eg "#255" in the cache you won't know whether it was the
> literal "#255" on the remote or whether it is a character that was escaped.
> 
> Every escape mechanism needs to have a mechanism describing the escape
> character.
> 
> I think that in order to fix this, you'll need to ensure that the unescape
> happens at the right time.

When the file gets uploaded in RSE, we don't upload based on the temp file name (i.e. with the escaped characters) but rather on the IFile metadata that stores the actual name.
Comment 6 Martin Oberhuber CLA 2010-12-15 06:04:39 EST
Ok, 

but another potential problem is getting a name clash between a remote file that's named "#255" literally and another one that just translates to "#255" in the cache.

Again, changing the escaping just addresses the tip of the iceberg, and it's not unlikely that similar issues come up with other special characters than # in the future. I think that to address the problem, the escaping / unescaping must always be symmetric.
Comment 7 David McKnight CLA 2010-12-17 14:54:57 EST
(In reply to comment #6)
> Ok, 
> 
> but another potential problem is getting a name clash between a remote file
> that's named "#255" literally and another one that just translates to "#255" in
> the cache.
> 
> Again, changing the escaping just addresses the tip of the iceberg, and it's
> not unlikely that similar issues come up with other special characters than #
> in the future. I think that to address the problem, the escaping / unescaping
> must always be symmetric.


An example of where there would be a clash is if you had the following two files:

abc#062def.txt
abc>def.txt

Using the old algorithm, "abc#062def.txt" will get mapped to "abc#035062def.txt" even though it is a valid Windows filename.  The other file, "abc>def.txt", will get mapped to "abc#062def.txt".  The mappings are different for each.  One problem with this is that the first file didn't need to be mapped (since it's valid) and, as a result, the editor will display the unnecessarily mapped filename.  The bigger problem is that, when an RSE copy occurs, the unnecessarily mapped file name gets used (rather than the metadata filename) and this mapping doesn't make sense as a copied filename.


Using the new algorithm, "abc#062def.txt" is considered a valid filename on Windows, so the name would stay the same.  The other file, "abc>def.txt", is not considered a valid filename on Windows, so its name would get mapped to "abc#062def.txt" (i.e. the same name).  As far as the upload mapping goes, since we're using the metadata to determine the actual remote path, there are no problems with this.  The problem is that downloading one remote file will replace the other in the temp file cache if it's already there, although having a scenario with two files in the same directory that have names that map to each other would be fairly unlikely.  Even if this does happen when the original temp file is dirty, RSE will still prompt the user to save (and hence upload) it before replacing it with the other mapped file.

Based on this, I think my proposed solution (file validation) is certainly better than what we had before.  Is there a better solution to the customer problem that I haven't considered here?
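
The comparison in this comment can be reproduced with a short sketch. The mapping functions and the invalid-character set are assumptions modelled on the examples above, and `find_clashes` is a hypothetical helper that groups remote names by the cache name they map to.

```python
# Hypothetical comparison of the old and new mappings for the two example files.
# Assumed set of characters that are invalid in Windows file names.
INVALID_ON_WINDOWS = set('<>:"/\\|?*')


def old_mapping(name: str) -> str:
    """Old algorithm (sketch): always escape '#' and invalid characters."""
    return "".join(
        f"#{ord(c):03d}" if c == "#" or c in INVALID_ON_WINDOWS else c for c in name
    )


def new_mapping(name: str) -> str:
    """New algorithm (sketch): escape only when the name is invalid locally."""
    if not any(c in INVALID_ON_WINDOWS for c in name):
        return name
    return old_mapping(name)


def find_clashes(remote_names, mapper):
    """Group remote names by cache name and keep groups with more than one member."""
    by_cache = {}
    for remote in remote_names:
        by_cache.setdefault(mapper(remote), []).append(remote)
    return {cached: names for cached, names in by_cache.items() if len(names) > 1}


files = ["abc#062def.txt", "abc>def.txt"]
print(find_clashes(files, old_mapping))  # {}
print(find_clashes(files, new_mapping))
# {'abc#062def.txt': ['abc#062def.txt', 'abc>def.txt']}
```

The old mapping keeps the two cache names distinct at the cost of renaming a valid file; the new mapping leaves the valid name alone but lets the two files share one cache name.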
Comment 8 Martin Oberhuber CLA 2010-12-17 15:10:57 EST
(In reply to comment #7)
> unnecessarily mapped filename.  The bigger problem is that, when an RSE copy
> occurs, the unnecessarily mapped file name gets used (rather than the metadata
> filename) 

Why is this? IMO this is the bug to get fixed. If metadata is used in other cases, why isn't it used here?
Comment 9 David McKnight CLA 2010-12-17 15:27:16 EST
(In reply to comment #8)
> (In reply to comment #7)
> > unnecessarily mapped filename.  The bigger problem is that, when an RSE copy
> > occurs, the unnecessarily mapped file name gets used (rather than the metadata
> > filename) 
> 
> Why is this? IMO this is the bug to get fixed. If metadata is used in other
> cases, why isn't it used here?

Prior to the name validation approach I did try an approach that, during a copy, makes use of the correct name.  However, I still think the better approach is not to escape things that aren't supposed to be escaped in the first place.
Comment 10 Martin Oberhuber CLA 2010-12-17 15:59:40 EST
But you'll always have to escape the escape character somehow, or you'll run into the situation described in bug 160100. Ideally, find a new way of escaping and solve both issues with one change.

Allowing a name clash in the cache is asking for trouble and doesn't seem wise in my eyes.

I also can't follow your argument saying "the cache name's not relevant since we'll use the metadata on upload". If this is true, then why isn't the # correctly unescaped on upload? And why can't you generate a file name for the cache that's guaranteed to be unique, e.g. myfile.txt, 1~myfile.txt, 2~myfile.txt, ...
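
One way the unique-name suggestion could look, as a sketch only: `unique_cache_name` and the `taken` set are hypothetical, and the "N~name" pattern is taken from the example names in this comment.

```python
def unique_cache_name(preferred: str, taken: set) -> str:
    """Return `preferred` if it is free, otherwise the first free 'N~preferred'."""
    if preferred not in taken:
        return preferred
    n = 1
    while f"{n}~{preferred}" in taken:
        n += 1
    return f"{n}~{preferred}"


taken = set()
for _ in range(3):
    name = unique_cache_name("myfile.txt", taken)
    taken.add(name)
    print(name)
# myfile.txt
# 1~myfile.txt
# 2~myfile.txt
```

A scheme like this only works if the original remote name is recorded elsewhere (for example in metadata), because the cache name alone no longer encodes it.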
Comment 11 David McKnight CLA 2010-12-17 16:34:58 EST
(In reply to comment #10)
> I also can't follow your argument saying "the cache name's not relevant since
> we'll use the metadata on upload". If this is true, then why isn't the #
> correctly unescaped on upload. And why can't you generate a file name for the
> cache that's guaranteed to be unique, e.g. myfile.txt, 1~myfile.txt,
> 2~myfile.txt, ...

The escaping is intended for cases where Windows doesn't support a filename that is supported on another system.  For example, "abc>def.txt" is a valid filename on Linux but it's an invalid filename on Windows.  So when we download the file, we can't download it to RemoteSystemTempFiles/.../"abc>def.txt".  Instead we map the temp file name to RemoteSystemTempFiles/.../"abc#062def.txt" while maintaining the remote file name (i.e. "abc>def.txt") as IFile metadata.  When we do an upload, we don't use the name of the file stored in RemoteSystemTempFiles (i.e. "abc#062def.txt"); instead we use the name stored in the IFile metadata.  So, on upload, we're not actually unescaping "abc#062def.txt" back to "abc>def.txt" - we're just using the stored remote file name.
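
The download and upload flow described in the paragraph above can be sketched as follows. This is an illustration, not RSE's API: `CachedFile`, `download`, and `upload_target_name` are hypothetical names, and the dataclass field merely stands in for the IFile metadata.

```python
from dataclasses import dataclass

# Assumed set of characters that are invalid in Windows file names.
INVALID_ON_WINDOWS = set('<>:"/\\|?*')


@dataclass
class CachedFile:
    cache_name: str   # name used under RemoteSystemTempFiles
    remote_name: str  # original remote name, kept as metadata


def download(remote_name: str) -> CachedFile:
    """Map the remote name to a locally valid cache name and remember the original."""
    if any(c in INVALID_ON_WINDOWS for c in remote_name):
        cache_name = "".join(
            f"#{ord(c):03d}" if c == "#" or c in INVALID_ON_WINDOWS else c
            for c in remote_name
        )
    else:
        cache_name = remote_name
    return CachedFile(cache_name=cache_name, remote_name=remote_name)


def upload_target_name(entry: CachedFile) -> str:
    """Upload uses the stored remote name; the cache name is never unescaped."""
    return entry.remote_name


entry = download("abc>def.txt")
print(entry.cache_name)           # abc#062def.txt
print(upload_target_name(entry))  # abc>def.txt
```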

That said, I think the confusion here is in the terminology.  When you say 'upload', do you mean the 'cross-system copy'?  By that, I mean the case where we download a remote file to the temp files cache and, from there, we copy to either another project or another remote file system.  If that's what you mean by upload, then indeed, the file name is not being unescaped and we're not using the metadata there - in the case of a copy to a project the remote file name wouldn't work anyway if those characters are invalid.  Unfortunately, if we're copying from the temp files to a generic Eclipse project, we lose the metadata that would store the original remote filename since such projects are outside of the scope of RSE's file handling.

For this particular customer scenario, we're needlessly complicating things because the original filename was valid all along in the following places:
1) on the remote system
2) in the RemoteSystemTempFiles cache
3) in the Navigator or Project Explorer view

If the customer had used a filename that actually needed to be escaped, then the file in the Navigator would have had to be escaped as well.
Comment 12 David McKnight CLA 2011-01-04 10:07:13 EST
Martin, did you get a chance to look at my last comment?
Comment 13 Martin Oberhuber CLA 2011-01-04 10:37:29 EST
(In reply to comment #11)
The explanation makes sense, and I see now how your proposed change is not evil, so I think I can live with your proposed change. I'm still confused about this statement:

> 'upload', do you mean the 'cross-system copy'?  By that, I mean the case where
> we download a remote file to the temp files cache and, from there, we copy to
> either another project or another remote file system.  If that's what you mean
> by upload, then indeed, the file name is not being unescaped and we're not
> using the metadata there - in the case of a copy to a project the remote file
> name wouldn't work anyway if those characters are invalid.

I think that in this case, the original filename from metadata should be used for forming a candidate target filename, and the test for which characters are invalid should be done against the target system (and not the current host system). But that's a proposed improvement beyond what we have today so this could be tracked with a separate bug.
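
A sketch of what validating against the target system might look like. Everything here is an assumption for illustration: the per-system invalid-character sets are simplified, the system keys "windows" and "linux" are made up, and `candidate_target_name` is a hypothetical helper.

```python
# Hypothetical, simplified invalid-character sets per target system.
INVALID_CHARS = {
    "windows": set('<>:"/\\|?*'),
    "linux": {"/", "\0"},
}


def candidate_target_name(remote_name: str, target_system: str) -> str:
    """Start from the original remote name (from metadata) and escape only
    if the *target* system cannot represent it."""
    invalid = INVALID_CHARS[target_system]
    if not any(c in invalid for c in remote_name):
        return remote_name
    return "".join(
        f"#{ord(c):03d}" if c == "#" or c in invalid else c for c in remote_name
    )


print(candidate_target_name("abc>def.txt", "linux"))    # abc>def.txt
print(candidate_target_name("abc>def.txt", "windows"))  # abc#062def.txt
```

With this approach a copy between two systems that both accept '>' would keep the original name even when the client running the copy is on Windows.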

> Unfortunately, if we're copying from the temp files to a generic Eclipse
> project, we lose the metadata that would store the original remote filename
> since such projects are outside of the scope of RSE's file handling.

I don't understand this. Any copy operation A -> B should support using a different name in B ?
Comment 14 David McKnight CLA 2011-01-04 10:53:30 EST
(In reply to comment #13)

> I think that in this case, the original filename from metadata should be used
> for forming a candidate target filename, and the test what characters are
> invalid should be done against the target system (and not the current host
> system). But that's a proposed improvement beyond what we have today so this
> could be tracked with a separate bug.

I agree and it makes sense to open a separate bug for this.  
> 
> > Unfortunately, if we're copying from the temp files to a generic Eclipse
> > project, we lose the metadata that would store the original remote filename
> > since such projects are outside of the scope of RSE's file handling.
> 
> I don't understand this. Any copy operation A -> B should support using a
> different name in B ?

All I mean is that if a user copies a remote file with the name "abc>def.txt" via RSE to an arbitrary local project (on a Windows client), the file in the local project cannot be called "abc>def.txt" since those characters are invalid on the local system.  So either the name should be escaped or the operation should be disabled.  In the former case, once the escaped file is copied to the local project, RSE is not managing the metadata to keep track of the original filename.
Comment 15 David McKnight CLA 2011-01-04 11:05:20 EST
I've committed the change to cvs.

Kenya, do you require a backport of this?
Comment 16 Kenya Ishimoto CLA 2011-01-12 02:51:47 EST
(In reply to comment #15)
> I've committed the change to cvs.
> 
> Kenya, do you require a backport of this?

David, yes, we need a backport for both the 3.2.X and 3.0.X maintenance streams: 3.2.X for our latest product release, and 3.0.X for a maintenance release for the customer who originally reported the problem.
Comment 17 David McKnight CLA 2011-01-12 11:18:48 EST
I've opened bug 334128 for the 3.2.x backport and bug 334129 for the 3.0.x backport.