Bug 203500 - [ssh][ftp][encodings] Error creating a folder or file with non-ASCII characters (converted to ?)
Status: RESOLVED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: 2.0
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: 2.0.1
Assignee: Martin Oberhuber CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: investigate
Depends on:
Blocks: 191601 181573 236334
Reported: 2007-09-14 17:37 EDT by Martin Oberhuber CLA
Modified: 2008-06-09 20:12 EDT
CC List: 3 users

See Also:
kmunir: review+
mober.at+eclipse: review? (xuanchen)


Attachments
Patch to support encodings in FTP files and paths (6.94 KB, patch)
2007-09-18 07:39 EDT, Martin Oberhuber CLA
no flags
Patch to support encodings in SSH Sftp files and paths (14.46 KB, patch)
2007-09-18 07:59 EDT, Martin Oberhuber CLA
no flags
Updated patch warning about encoding problems (34.39 KB, patch)
2007-09-26 18:49 EDT, Martin Oberhuber CLA
no flags

Description Martin Oberhuber CLA 2007-09-14 17:37:39 EDT
Burak Kulakli found while testing 2.0.1:

Folder names allow Turkish characters (e.g. ğ), but when you try to save a file in such a folder, you get an error.

I connected with an SSH-Only connection, then under My Home/aatmp:
New > Folder > Name = "ağb"
It keeps telling me "Folder already exists".

Under dstore, this works correctly. It seems that SSH converts "ağb" into "a?b" and then finds that this resource already exists.
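The collision can be reproduced in plain Java. This is a minimal sketch, assuming (the report does not say which charset was actually in use) that the path was encoded with a charset lacking "ğ", here ISO-8859-1. `String.getBytes(Charset)` silently substitutes the charset's replacement byte `?`, so "ağb" and "a?b" become the same byte sequence on the wire. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class QuestionMarkCollision {
    // Encodes a name the way a naive client might: String.getBytes(Charset)
    // silently replaces each unmappable character with the charset's
    // replacement bytes, which is '?' for ISO-8859-1.
    static byte[] encodeName(String name, Charset cs) {
        return name.getBytes(cs);
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.ISO_8859_1; // assumption: a charset without U+011F
        byte[] requested = encodeName("a\u011Fb", cs); // the requested folder name
        byte[] existing = encodeName("a?b", cs);       // the name the server already has
        System.out.println(new String(requested, cs));          // a?b
        System.out.println(Arrays.equals(requested, existing)); // true
    }
}
```

Because both names map to identical bytes, the server reports "directory exists" for a folder the user never created.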

FTP has the same issue as SSH. The FTP console shows:
  MKD a?b
  521 "/folk/mober/aatmp/a?b" directory exists


-----------Enter bugs above this line-----------
TM 2.0.1 Testing
installation : eclipse-SDK-3.3 (I20070625-1500), cdt-4.0.0, emf-2.3.0
RSE install  : Download RSE-2.0.1RC1: RSE-SDK,examples,tests,discovery,terminal
java.runtime : Sun 1.6.0_01-b06
os.name      : Windows XP 5.1, Service Pack 2
------------------------------------------------
systemtype   : Linux SSH-Only
------------------------------------------------
Comment 1 Martin Oberhuber CLA 2007-09-14 17:38:57 EDT
Burak also found: file names allow Turkish characters (e.g. ğ), but when you save the file, the character becomes "?".

I suppose this is the same underlying problem; updating the Summary.
Comment 2 Martin Oberhuber CLA 2007-09-18 07:39:06 EDT
Created attachment 78634 [details]
Patch to support encodings in FTP files and paths

The attached patch adds encoding support to FTP. Note that, due to a limitation in Commons Net, FTP commands are encoded with the same encoding as paths, so this will not work for encodings that are not ASCII-compatible (specifically UTF-16 and other wide encodings).
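The restriction can be expressed as a simple check: an encoding is usable on the FTP control connection only if plain ASCII command text comes out byte-identical to US-ASCII. This is a hypothetical sketch (the helper name and the probe string are assumptions, not code from the patch) that probes all printable ASCII characters plus CR/LF.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FtpControlEncodingCheck {
    // Hypothetical helper: true if 'cs' encodes every printable ASCII character
    // (plus CR/LF) to exactly the bytes US-ASCII would produce. Command words
    // such as "MKD" must stay byte-identical for the server to parse them.
    static boolean isAsciiCompatible(Charset cs) {
        StringBuilder probe = new StringBuilder("\r\n");
        for (char c = 0x20; c <= 0x7E; c++) {
            probe.append(c);
        }
        String s = probe.toString();
        return Arrays.equals(s.getBytes(cs), s.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:      " + isAsciiCompatible(StandardCharsets.UTF_8));      // true
        System.out.println("ISO-8859-1: " + isAsciiCompatible(StandardCharsets.ISO_8859_1)); // true
        System.out.println("UTF-16:     " + isAsciiCompatible(StandardCharsets.UTF_16));     // false
    }
}
```

UTF-16 fails because it emits a byte-order mark and two bytes per character, so even "MKD" is no longer recognizable to the server.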
Comment 3 Martin Oberhuber CLA 2007-09-18 07:59:30 EDT
Created attachment 78636 [details]
Patch to support encodings in SSH Sftp files and paths

Attached patch fixes the issue for SSH Sftp.
In the future, recoding should be done inside Jsch.
Comment 4 Martin Oberhuber CLA 2007-09-18 08:03:03 EDT
Note that for SSH and FTP, we cannot determine the remote default encoding. Thus, when no encoding has been specified by the user, we fall back to the local client default encoding.

This should ensure that the characters users typically type on the client can actually be encoded into a byte stream. However, that encoding may not be appropriate for the actual target platform.
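The fallback rule amounts to very little code. A minimal sketch, assuming the user's setting arrives as a possibly empty encoding name; the class and method names are hypothetical and not the RSE API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RemoteEncodingFallback {
    // Hypothetical helper: the remote default encoding cannot be queried over
    // SSH or FTP, so use the user's setting if present, else the client default.
    static Charset effectiveEncoding(String userSetting) {
        if (userSetting == null || userSetting.trim().isEmpty()) {
            return Charset.defaultCharset(); // local client default (platform dependent)
        }
        return Charset.forName(userSetting.trim());
    }

    public static void main(String[] args) {
        System.out.println(effectiveEncoding(null));    // the client's default charset
        System.out.println(effectiveEncoding("UTF-8")); // UTF-8
        System.out.println(effectiveEncoding("UTF-8").equals(StandardCharsets.UTF_8)); // true
    }
}
```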

I wonder if it might be better to throw an exception when a path cannot be encoded (which results in a question mark "?" in the file name, or in a 16-bit wide encoding on FTP), so that users can review and update their encoding settings.
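In Java, the proposed "throw instead of substituting ?" behavior maps onto a `CharsetEncoder` configured with `CodingErrorAction.REPORT`. This is a sketch under that assumption, not the code in the committed patch; `encodeStrict` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictPathEncoder {
    // Hypothetical helper: encode a remote path, but fail instead of silently
    // substituting '?' the way String.getBytes(Charset) does.
    static byte[] encodeStrict(String path, Charset cs) throws CharacterCodingException {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(path));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String name = "a\u011Fb"; // contains U+011F (g with breve)
        try {
            System.out.println(encodeStrict(name, StandardCharsets.UTF_8).length + " bytes in UTF-8");
            encodeStrict(name, StandardCharsets.ISO_8859_1); // throws: U+011F is unmappable
        } catch (CharacterCodingException e) {
            System.out.println("Cannot encode path: " + e);
        }
    }
}
```

The caller can turn the `CharacterCodingException` into a user-visible warning that points at the connection's encoding setting.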
Comment 5 Martin Oberhuber CLA 2007-09-26 18:49:46 EDT
Created attachment 79249 [details]
Updated patch warning about encoding problems

Attached updated patch fixes both FTP and Sftp to honor the specified encoding.

As discussed during our F2F meeting, both services now warn when the user tries to modify the remote file system (create, rename, copy, delete, upload) with a local Unicode file name that cannot be expressed in the given encoding (an exception is thrown; the exception text is not yet externalized).

For Sftp, a bug in Jsch causes paths to always be encoded with the local platform default encoding; therefore, if the requested remote encoding is different, we need to emulate it by recoding. Unfortunately, with some local default encodings (particularly the common Windows cp1252), certain bytes cannot be expressed. As a result, some Unicode characters (particularly "č") cannot be used with a local cp1252 / remote UTF-8 combination. A Jsch bug has been filed for this.
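The recoding trick can be sketched as follows: encode the path with the remote charset, then decode those bytes with the local charset, so that the library's own local encoding step reproduces the intended bytes. This is a hypothetical illustration of the technique, not the patch's `recode()`; it assumes the path is fully encodable in the remote charset and that the JVM's windows-1252 decoder maps the five undefined cp1252 bytes to U+FFFD. "č" is U+010D, which is C4 8D in UTF-8, and 0x8D is one of those undefined bytes, so the round trip fails.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class JschRecode {
    // Hypothetical sketch: a library that always encodes with 'local' can be made
    // to send 'remote'-encoded bytes by pre-decoding those bytes with 'local'.
    // Assumes 'path' is fully encodable in 'remote'.
    static String recode(String path, Charset remote, Charset local) {
        if (remote.equals(local)) {
            return path; // default case: nothing to do
        }
        byte[] wanted = path.getBytes(remote);
        String disguised = new String(wanted, local);
        // Only works if the bytes survive a local decode/encode round trip.
        if (!Arrays.equals(disguised.getBytes(local), wanted)) {
            throw new IllegalArgumentException(
                    "Path cannot be sent as " + remote + " through local " + local);
        }
        return disguised;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String ok = recode("a\u011Fb", StandardCharsets.UTF_8, cp1252); // C4 9F round-trips
        System.out.println(Arrays.equals(ok.getBytes(cp1252),
                "a\u011Fb".getBytes(StandardCharsets.UTF_8)));          // true
        try {
            recode("\u010D", StandardCharsets.UTF_8, cp1252);          // C4 8D: 0x8D undefined
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The early return for `remote.equals(local)` mirrors the point below that recoding is a no-op in the default case.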

The patch is large, but in the default case (remote encoding == local encoding, which was always the case before the patch), recode() does nothing, so the patch should be safe.
Comment 6 Martin Oberhuber CLA 2007-09-26 18:52:39 EDT
I'm committing the patch since Kushal orally agreed to review the patches:

[203500] Support encodings for SSH Sftp and FTP paths
   FTPService
   SftpFileService
   SshConnectorService
   FTPConnectorService
   ISshSessionProvider