Bug 181573 - DBCS3.3: DBCS file/directory names are garbled in RSE's FTP
Summary: DBCS3.3: DBCS file/directory names are garbled in RSE's FTP
Status: RESOLVED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: 2.0
Hardware: PC Linux-GTK
Importance: P3 normal
Target Milestone: 2.0.1
Assignee: Javier Montalvo Orús CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: helpwanted
Depends on: 203500
Blocks:
Reported: 2007-04-09 05:16 EDT by Kentaroh Noji CLA
Modified: 2007-10-03 07:35 EDT
CC List: 6 users

See Also:


Attachments
Screenshot of DBCS file name input (110.28 KB, image/png)
2007-04-09 05:17 EDT, Kentaroh Noji CLA
Result of DBCS file name input (118.99 KB, image/png)
2007-04-09 05:18 EDT, Kentaroh Noji CLA
FTP log (1.19 KB, text/plain)
2007-04-19 07:21 EDT, Kong XiMei CLA

Description Kentaroh Noji CLA 2007-04-09 05:16:07 EDT
Remote system's DBCS file name or directory name is garbled in RSE's FTP connection. 

Build date: I20070405
OS: RedHat Enterprise Linux V5.0

Steps to recreate problem:
1- Start Eclipse in a DBCS locale such as ja_JP.UTF-8
2- Open the Remote System Explorer perspective
3- Create a new connection using FTP
4- Click "Files", then click "My Home"
5- Open the context menu on "My Home" and create a "New" File. Enter a DBCS file name in the File name input field.
6- The DBCS file name is garbled. Creating a "New" DBCS directory results in the same symptom.

Expected output:
The file or directory with the DBCS name should be created and displayed correctly.
Comment 1 Kentaroh Noji CLA 2007-04-09 05:17:24 EDT
Created attachment 63244
Screenshot of DBCS file name input
Comment 2 Kentaroh Noji CLA 2007-04-09 05:18:22 EDT
Created attachment 63245 [details]
Result of DBCS file name input
Comment 3 Martin Oberhuber CLA 2007-04-10 05:01:33 EDT
Apparently your locale uses UTF-8, which means that full 8-bit characters are needed. Is the FTP protocol supposed to support this? In other words, when you try the same from a command line or another FTP client, does it work as expected?

Since we cannot test in the ja_JP locale easily, I'd appreciate any help from the community on this one. See the TM and RSE FAQ on the Wiki for information on how to get started with a Workspace where you can modify RSE code yourself and submit patches:

http://wiki.eclipse.org/index.php/TM_and_RSE_FAQ#Working_on_TM_.2F_RSE
Comment 4 Kong XiMei CLA 2007-04-16 23:20:04 EDT
This problem also exists in our GB18030 testing. Thanks!
I tried creating a DBCS directory from the command line, and it works as expected.
Comment 5 Martin Oberhuber CLA 2007-04-17 11:37:12 EDT
Can you tell us any publicly available FTP server where you see those issues, such that we can test the issue?

If only non-public FTP servers are available to you, do you have a chance to debug this yourself? The TM and RSE FAQ gives hints on how to set up a workspace.
Comment 6 Javier Montalvo Orús CLA 2007-04-18 06:11:33 EDT
Could you also provide the contents of the console when creating the new file ?
Comment 7 Kong XiMei CLA 2007-04-19 01:44:42 EDT
Sorry, we don't have any publicly available FTP server.
We tried to debug this, but failed to import the Team Project Sets.
Comment 8 Kong XiMei CLA 2007-04-19 01:56:05 EDT
(In reply to comment #6)
> Could you also provide the contents of the console when creating the new file ?

In zh_CN.UTF-8 locale, you can try to create a new file/directory named with DBCS chars like [unicode 6d4b,8bd5]. Thanks!
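To make the point from comment 3 concrete, here is a small Java sketch (standard library only; the class and method names are illustrative, not RSE code) that encodes the sample name [unicode 6d4b,8bd5] as UTF-8. Each of the two CJK characters becomes three bytes, and every byte has its high bit set, so the FTP control connection has to carry full 8-bit bytes unchanged for such a name to survive.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper class, not part of RSE.
final class DbcsUtf8Bytes {
    // Render bytes as space-separated lowercase hex, e.g. "e6 b5 8b".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // True if any byte has its high bit set, i.e. lies outside 7-bit ASCII.
    static boolean needsEightBits(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "\u6d4b\u8bd5"; // the sample name from comment 8
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(name.length() + " chars -> " + utf8.length + " bytes: " + hex(utf8));
        System.out.println("high bit set: " + needsEightBits(utf8));
    }
}
```

Running it prints `2 chars -> 6 bytes: e6 b5 8b e8 af 95` followed by `high bit set: true`.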
Comment 9 Martin Oberhuber CLA 2007-04-19 04:18:33 EDT
For the record, the workaround is in
   SystemDeferredTreeContentManager v1.3
and should perhaps be removed when the real root cause is fixed.
Comment 10 Javier Montalvo Orús CLA 2007-04-19 05:36:44 EDT
(In reply to comment #8)
> (In reply to comment #6)
> > Could you also provide the contents of the console when creating the new file ?
> In zh_CN.UTF-8 locale, you can try to create a new file/directory named with DBCS chars like [unicode 6d4b,8bd5]. Thanks!

I mean the contents of the Eclipse console containing the FTP commands sent and received. It appears as "FTP log JPNGSA.IBM.COM:0" in your screenshot.
It will give an idea of the encoding used to send and retrieve locale-specific characters.

Many thanks !
Comment 11 Javier Montalvo Orús CLA 2007-04-19 05:42:28 EDT
(In reply to comment #10)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > Could you also provide the contents of the console when creating the new file ?
> > In zh_CN.UTF-8 locale, you can try to create a new file/directory named with DBCS chars like [unicode 6d4b,8bd5]. Thanks!
> I mean, the contents of the Eclipse console containing the FTP commands sent and received. It appears as "FTP log JPNGSA.IBM.COM:0" in your screenshot
> It will give an idea of the encoding used to send and retrieve locale chars.
> Many thanks !

The last reply was about kennoji's FTP connection. Can you (kongxm) provide the same FTP log when connecting with the zh_CN.UTF-8 locale?

Thanks !
Comment 12 Kong XiMei CLA 2007-04-19 07:20:50 EDT
Please check the attached FTP log. Thanks!
Comment 13 Kong XiMei CLA 2007-04-19 07:21:59 EDT
Created attachment 64294
FTP log
Comment 14 Javier Montalvo Orús CLA 2007-04-19 10:38:23 EDT
Well, after a bit of investigation it looks like FTP commands are intended to be sent as ISO-8859-1 (Latin-1, a superset of ASCII) characters, so Unicode is not supported.
This doesn't limit the content of the files transferred and retrieved, only file and folder names.
FTP servers therefore expect 8-bit encoded commands and, although RSE allows entering Unicode characters, those are cropped to their low 8 bits and sent to the server.
As a consequence, non-printable characters (`?´) are displayed and sent when Unicode is provided, causing a failure on the remote server.

Although it is possible to change the encoding in which commands are sent to the server, I think it may cause confusion: I couldn't find any Unicode FTP server (any suggestions?), and FTP clients configured for Unicode won't work with standard FTP servers.
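Comment 14 describes characters being cropped to 8 bits, while the log shows `?`. These can be two different failure modes, and this report does not establish which one RSE actually hit. The Java sketch below (illustrative class and method names, standard library only) contrasts them for the same sample name: a plain narrowing cast keeps the low byte of each char, while encoding with ISO-8859-1 through `String.getBytes` replaces every unmappable character with the charset's replacement byte `?`.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: two different ways a 16-bit Java char can end up as one byte.
final class EightBitNarrowing {
    // "Cropping": keep only the low 8 bits of each UTF-16 code unit.
    static byte[] truncate(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // the narrowing cast drops the high byte
        }
        return out;
    }

    // Encoding with ISO-8859-1: characters outside Latin-1 are unmappable, and
    // String.getBytes substitutes the charset's replacement byte '?' (0x3f).
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Space-separated lowercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String name = "\u6d4b\u8bd5";
        System.out.println("truncated: " + hex(truncate(name)));  // truncated: 4b d5
        System.out.println("ISO-8859-1: " + hex(latin1(name)));   // ISO-8859-1: 3f 3f
    }
}
```

Only the second variant produces the literal `?` bytes; the first produces different but equally wrong characters (`K` and a Latin-1 letter).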
Comment 15 Martin Oberhuber CLA 2007-04-19 11:00:31 EDT
(In reply to comment #14)

I cannot imagine that commands are cropped when sent to the server. Unless otherwise specified, I suppose that Eclipse uses the client's file.encoding (default encoding) to encode the 16-bit Unicode string into an 8-bit byte array at some point. UTF-8, for instance, is specifically designed to be "compatible" with 8-bit byte streams. So a server that is capable of receiving 8-bit commands should be capable of accepting such encodings.

Just seeing non-printable characters (`?´) on the client or in a log doesn't necessarily mean that the server cannot process them properly; it may just mean that the console, log, or whatever you use to view these characters uses a different encoding. Whether the server fails or not is a different topic: it could fail, but it could also work.
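The point above, that garbage in a console or log does not prove the bytes on the wire are wrong, can be checked with a short Java sketch (illustrative names, standard library only). It assumes a client that sends the name as UTF-8 and a viewer that decodes the bytes as ISO-8859-1: the view is six odd-looking characters, two of which (0x8b and 0x95) are non-printable C1 control characters, yet re-encoding the view recovers the original name exactly.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a byte stream viewed through the wrong charset looks garbled,
// but the bytes themselves are intact.
final class WrongCharsetView {
    // How the UTF-8 bytes of `name` look to a viewer that assumes ISO-8859-1.
    static String latin1View(String name) {
        byte[] wire = name.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte value 0x00-0xff to exactly one char, so
    // re-encoding the garbled view yields the original UTF-8 bytes again.
    static boolean roundTripsThroughLatin1(String name) {
        byte[] back = latin1View(name).getBytes(StandardCharsets.ISO_8859_1);
        return new String(back, StandardCharsets.UTF_8).equals(name);
    }

    public static void main(String[] args) {
        String name = "\u6d4b\u8bd5";
        System.out.println("garbled view length: " + latin1View(name).length()); // 6
        System.out.println("recoverable: " + roundTripsThroughLatin1(name));       // true
    }
}
```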

But anyway, I do not think that FTP is a good transport method for DBCS file and path names. I think that for TM 2.0 we should document that we do not support this, and that users who want DBCS file and path names need to use dstore. We could leave this bug open as an enhancement request for post-2.0; would that be OK for you, Kentaroh?
Comment 16 Martin Oberhuber CLA 2007-04-19 11:02:23 EDT
(In reply to comment #7)
> We tried to debugging this, but failed in importing Team Project Sets.

I tested the rse-anonymous.psf team project set, and it worked fine for me today. Could you please let me know what failed for you? It's important to me that we enable the community to help out with fixes and patches.

Comment 17 Martin Oberhuber CLA 2007-04-19 11:05:06 EDT
Javier, is the FTP log also stored in some file, or only printed to the console? Storing it in a file would probably allow seeing the real original 8-bit stream that was sent to the server. It would be much more verbatim than copy & paste from the console, and may help debug such issues.

I would recommend either
 a) adding a preference setting that allows writing the FTP log to a file, or
 b) providing a "save log as file" command somewhere, or
 c) ALWAYS logging the last connection to a local file, and overwriting it
    when the next FTP connection connects.
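Option c) could look roughly like the following Java sketch. `RawFtpLog` is a hypothetical helper, not an existing RSE or TM class, and hooking it into the actual FTP connection code is left out because that code is not shown in this report. The key design point is that it writes raw bytes rather than decoded strings, so the file preserves exactly what was sent, independent of any console encoding.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of RSE: records the exact bytes exchanged on
// the FTP control connection so they can be inspected later with a hex viewer.
final class RawFtpLog implements AutoCloseable {
    private final OutputStream out;

    // With no options, Files.newOutputStream creates the file or truncates an
    // existing one, so the file only ever holds the most recent connection.
    RawFtpLog(Path file) throws IOException {
        this.out = Files.newOutputStream(file);
    }

    // Write an ASCII direction marker followed by the unmodified bytes.
    synchronized void record(boolean sent, byte[] bytes) throws IOException {
        out.write((sent ? "> " : "< ").getBytes(StandardCharsets.US_ASCII));
        out.write(bytes);
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("ftp-control", ".log");
        try (RawFtpLog log = new RawFtpLog(file)) {
            // MKD is the standard FTP "make directory" command.
            log.record(true, "MKD \u6d4b\u8bd5\r\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(Files.size(file) + " bytes logged to " + file);
    }
}
```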
Comment 18 Martin Oberhuber CLA 2007-05-30 17:19:41 EDT
Javier what's the status on this?
Comment 19 Javier Montalvo Orús CLA 2007-05-31 07:05:20 EDT
(In reply to comment #17)
> Javier -- is the FTP log also stored in some file or only printed to the
> console? - Storing it in a file would probably allow seeing the real original
> 8-bit stream that was sent to the server. It would be much more verbatim than
> copy&paste from the console, and may help debugging such issues.
> I would recommend either
> a) doing a preference setting that allows writing the FTP log into a file, or
> b) provide a "save log as file" command somewhere, or
> c) ALWAYS logging the last connection into a local file, and overwriting it
> when the next FTP connection connects.

Those are good suggestions, but I'd postpone their implementation until after the Europa release.
For the original bug, I couldn't find any FTP server supporting Unicode, so file and folder names should use ASCII characters only.
Comment 20 David Dykstal CLA 2007-05-31 08:39:25 EDT
Javier --

If DBCS file names are not supported by the FTP protocol then I think this can be closed saying this is a permanent restriction for FTP.
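If FTP names were restricted permanently as proposed in comments 19 and 20, a client could reject unsupported names up front instead of sending garbled bytes. The following is a hypothetical Java pre-check, not RSE code (comment 23 notes the bug was ultimately resolved through bug 203500 rather than by restricting names), built on the standard `CharsetEncoder.canEncode` method.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical pre-check, not RSE code: reject names that cannot be sent
// as 7-bit ASCII instead of letting them be garbled on the wire.
final class AsciiNameCheck {
    static boolean isAsciiOnly(String name) {
        // A fresh encoder per call, because CharsetEncoder instances are not thread-safe.
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        return ascii.canEncode(name);
    }

    public static void main(String[] args) {
        System.out.println(isAsciiOnly("report.txt"));   // true
        System.out.println(isAsciiOnly("\u6d4b\u8bd5"));  // false
    }
}
```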
Comment 21 Martin Oberhuber CLA 2007-09-18 17:17:03 EDT
This bug could actually be fixed by the fix for bug 203500.
Comment 22 Martin Oberhuber CLA 2007-10-01 07:57:11 EDT
Bulk update target milestone 2.0.1 -> 3.0
Comment 23 Javier Montalvo Orús CLA 2007-10-01 12:44:29 EDT
This bug has been fixed by fixing bug 203500.
Now files created with UTF-8 encoding are displayed correctly in the RSE tree.