Bug 179937 - [api][breaking] BIDI3.3: HCG_ Request a control to change the default encoding for path and file names
Summary: [api][breaking] BIDI3.3: HCG_ Request a control to change the default encodi...
Status: RESOLVED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE (show other bugs)
Version: 2.0   Edit
Hardware: PC Windows XP
: P3 enhancement (vote)
Target Milestone: 2.0   Edit
Assignee: Kushal Munir CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: api
Depends on: 180202 182787
Blocks: 191601
  Show dependency tree
 
Reported: 2007-03-29 04:44 EDT by Gregory Brodsky CLA
Modified: 2008-08-13 13:20 EDT (History)
6 users (show)

See Also:


Attachments
Minimal Java Program to print file.encoding (2.26 KB, application/force-download)
2007-03-30 10:28 EDT, Martin Oberhuber CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gregory Brodsky CLA 2007-03-29 04:44:25 EDT
BIDI3.3:HCG_  UTF-8 not supported for path and file names

Scenario to recreate:

1. Connect to Linux machine using terminal emulator working in UTF-8 encoding.
    (You can use PUTTY with translation set to UTF-8.  Optionally, run shell application (gnome-terminal or konsole) in UTF-8 locale.)
2. Create file with Hebrew characters in its name
3. Create RSE connection to the Linux machine using either SSH or DStore
4. Look at the file name into Remote Systems view of RSE.

Result:  Hebrew characters look like wrong symbols.

Please notice, file names created with Hebrew ASCII encoding look fine.
The problem appears with both UTF-8 or ASCII Hebrew encoding of a container (Eclipse).  There is no option to customize encoding for connection.
Comment 1 Martin Oberhuber CLA 2007-03-29 12:57:23 EDT
What version of Eclipse and RSE are you using, exactly?

If using dstore, I think that the "file.encoding" property of the VM on your remote computer would determine how file names are interpreted. With JVM's version 1.4 and later, this system property is automatically determined from the locale of your remote system (look for "file.encoding" on http://java.sun.com/j2se/1.4.2/compatibility.html). So it looks for me as if the default locale of your remote system is probably not set up correctly.

Can you run the "env" command on the remote system and attach the output.
Also, what JVM exactly are you using on the remote side.

For ssh, since RSE 2.0M5, the Shell Subsystem has a Property page where you can specify an encoding. This should be good for showing correct characters when typing "ls" in an RSE shell. I'm not sure how such a default encoding would be set for the files tree as well. Perhaps, an encoding control similar to the Shells subsystem is missing for the files subsystem. 

Comment 2 Gregory Brodsky CLA 2007-03-29 13:51:32 EDT
I use RSE 2.0M5 with Eclipse v3.3M4.  JVM in a host is Sun v1.5.0.  I did not change its properties.

Please notice, currently the remote host locale is he_IL, but the same behavior was with locale en_US.UTF-8. 

Like the files tree, SSH Shell works with ASCII even if I set its encoding UTF-8

This is that "env" command returns: 

HOSTNAME=dyn-9-148-242-160.jrs.il.ibm.com
SHELL=/bin/bash
TERM=vt100
HISTSIZE=1000
QTDIR=/usr/lib/qt-3.3
USER=root
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;
32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;3
1:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:
*.png=01;35:*.tif=01;35:
KDEDIR=/usr
MAIL=/var/spool/mail/root
PATH=/opt/jdk1.5.0_10/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/ro
ot/bin
INPUTRC=/etc/inputrc
PWD=/root
LANG=he_IL
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SHLVL=1
HOME=/root
LOGNAME=root
LESSOPEN=|/usr/bin/lesspipe.sh %s
G_BROKEN_FILENAMES=1
_=/bin/env

Comment 3 Martin Oberhuber CLA 2007-03-30 10:28:21 EDT
Created attachment 62494 [details]
Minimal Java Program to print file.encoding

Can you please run attached Java program on the remote side to see what your JVM thinks its file.encoding is:
   java print_file_encoding/bin/PrintFileEncoding.class

The point is that your remote dstore agent will translate what it sees in the local file system into Java/Unicode, and that is (or should be) what's shipped over the wire through dstore. Thus the client only sees unicode in its files subsystem and should not have to deal with encodings.

Let's focus on dstore on this bug, since this should work in all cases (files and shells).

For ssh shells, I created bug 180202 to track the issue of ssh shell not honoring the encoding (which was kind of already known to me). For ssh files, let's wait for the outcome of fixing dstore.files to know what we can possibly do before we go further.
Comment 4 Martin Oberhuber CLA 2007-04-02 06:53:34 EDT
From an E-Mail by Gregory:
I moved to M5eh version of Eclipse (targeted by a link in RSE v2M5 download page) and did not find any differences with encoding problems that I found. 
I found that there is the same problem even with Local (i.e. a local file system).

Please continue investigating this with the "Local" subsystem:

* Given a hebrew locale, create some file and path names.
  - Do they look OK in the Eclipse Project Explorer / Project Navigator?
  - Do they look OK in the RSE "Local" file system?
  - Launch an RSE dstore-windows server locally and connect to it. Do the files
    look ok under dstore-local?

Once these 3 cases are fixed, we can start looking at dstore-remote and ssh.
Comment 5 Gregory Brodsky CLA 2007-04-10 06:03:54 EDT
I don't know how to create file/path names with UTF-8 on Windows.  Even if such possibility exists, I don't believe anyone uses it in our country. Hence I test file/path names with ASCII only on Windows.

Answers to the previous questions:

* Given a hebrew locale, create some file and path names.

  - Do they look OK in the Eclipse Project Explorer / Project Navigator?
Yes (if I create some project with Hebrew name, it creates corresponded directory with Hebrew name and it looks correctly).
  - Do they look OK in the RSE "Local" file system?
Yes, and no any problem is detected with "Local" file system.
  - Launch an RSE dstore-windows server locally and connect to it. Do the files
    look ok under dstore-local?
Paths (in ASCII) look correctly. ASCII file looks correctly, UTF-8 file is not. Coomment will be added to a defect 179939 (which addresses file contents, this one addresses file names).

Yes, it looks like a property "file encoding" is just ignored in a current version. However, please notice - I do not fins any property to specify path names encoding of a connection, and I believe it should be.
Comment 6 Kushal Munir CLA 2007-04-11 09:40:22 EDT
Gregory, can you please run java print_file_encoding/bin/PrintFileEncoding.class and let us know what the encoding is?

>> The point is that your remote dstore agent will translate what it sees in the
local file system into Java/Unicode, and that is (or should be) what's shipped
over the wire through dstore. Thus the client only sees unicode in its files
subsystem and should not have to deal with encodings.

We use standard Java APIs such as the File class to get the list of files in a directory, create new files, etc. We should not have to translate to encodings since those APIs should already have done the necessary conversion. What you are suggesting will mean that RSE should behave like an emulator. Note that when you change the encoding in an emulator or shell, it only changes how a certain byte (or bytes) LOOK to the user. The byte values mean something else in the default encoding of the platform. A good test would be to try running Eclipse on the machine and create a project linked to the path you create through PuTTY or shell, and see if they come up ok.
Comment 7 Gregory Brodsky CLA 2007-04-12 12:17:30 EDT
I use JDK 1.5. there is no such a file.  JVM property file.encoding of one of the machines is "ISO-8859-8", second is "UTF-8".  I repeated the test with M6.  With DStore, ASCII works correctly with ASCII box and UTF-8 is not;  UTF-8 box works with UTF-8 but not with ASCII.

However from my experience it's enough common that people connect to UTF-8 Linux box using ASCII telnet, for example.  

This problem is not Hebrew-specific.  For example, in Russia two 8-bit encodings are widely used: ASCII Cp1251 and KOI-8 (introduced by a Russian gov't body).  I assume it's very possible that people use both of them in the same Unix/Linux machine, so they might need RSE working with Cp1251, KOI-8 and UTF-8 in the same time.  

I modified daemon.pl on my ASCII box and added -Dfile.encoding=UTF-8 in JVM command lines.  Now, it works correctly with UTF-8 but not with ASCII.  So, as workaround it is possible to modify the script in order to accept the encoding as a parameter, and to start different DStore servers listening to different ports on the same server. 

But I don't think it's comfortable solution for customers who work with different encodings - to start different DStore servers with different encodings and to define different connections in their workspace for the same host. 

Please also notice, all this is about DStore.  With SSH, there is the same problem on both of ASCII and UTF-8 boxes: file name in ASCII appears correctly, in UTF-8 - is not.
Comment 8 Martin Oberhuber CLA 2007-04-12 12:45:34 EDT
OK, let me summarize to verify I got this correct:

* The problem is not setting a specific encoding, it is supporting multiple
  encodings for file and path names in different parts of the file system 
  or for different users.

* On the Local file system, Eclipse Platform does not support this 
  (-Dfile.encoding System Property is always used).

* On dstore-remote you can workaround this by running multiple different dstore
  servers with different -Dfile.encoding settings. Otherwise the default 
  file.encoding is used or computed out of the locale by the JVM.

* On ssh-remote, the encoding cannot be set.

I'm wondering if this is really a severity==blocker? - It seems that you've found a workaround yourself so you can continue testing. For users, I'd consider it an enhancement request to support different encodings for the file and path names (which may also be different than the default file.encoding!) for different connections, rather than a bug. Especially since the Platform also does not support it.

In terms of implementation it might be a problem if the Shell uses encoding X but the files tree uses encoding Y (because it is the remote file.encoding default). Then files from "ls" etc. cannot be matched against files from the files tree. We would need to do the following:
  1.) dstore server must transmit the remote's file.encoding setting back to
      the client so that we know what encoding has been used. This is the 
      default encoding.
  2.) For files subsystem, provide an encoding dropdown to modify it just like
      for the shells subsystem. By default, it is the encoding transmitted 
      from the remote.
  3.) If user changes the encoding, for every filename:
      - RSE uses StreamEncoder to encode Unicode back into original bytestream
      - And another StreamEncoder to transform back into Unicode with 
        the user's chosen encoding
The question for me is, can it happen that the pathname encoding differs from the file contents encoding? If yes, then we'll need TWO dropdowns in the files subsystem, one for pathnames and the other for contents. This can be confusing.

The other question is: What if different branches in the file system of the
same connection use different encodings? Do we want to be able to set this or
is it sufficient to have the connection granularity? - Terminals also have
connection granularity.
Comment 9 Martin Oberhuber CLA 2007-04-12 12:48:37 EDT
Proposed Severity: Enhancement
Proposed Summary: Request selectable encoding on file subsystem for path names

I guess the same would then also hold true for process names on the processes subsystem? Perhaps the path name encoding should be a common property shared by the connector service or the host, or it would need to be kept in sync between files, processes, shells.
Comment 10 Martin Oberhuber CLA 2007-04-16 16:29:42 EDT
Considering this again, I think the right solution is as follows:

* There is a single setting "Default Encoding" which is used for file and
  path names, shells and process names. It is important to have the encoding
  consistent between the Shell, files tree and process list. Therefore, this 
  setting should be 
  - stored with the IHost
  - configured in the Property Page for IHost
  - it could also be a setting in the 1st page of the Wizard

* The default setting of this new property is "(default from remote)".

* The dstore daemon is extended to report back the remote default 
  file.encoding to the client.

* Optionally, each remote shell instance could override this default setting
  (In the shell's property page, as opposed to the shell subsystem property 
  page where it is now).
  But I'm not sure if this is a good idea, since it would make that shell 
  inconsistent with the files tree, so content assist would not work properly
  any more. Users should be directed to creating a new connection for those
  parts of the remote files tree that are in a different encoding.

Essentially, I'm proposing to move the "Shell Encoding" Property page, which is currently available on the "Shells Subsystem" node's property pages, to a host-wide setting which applies to shells, file and path names. This property page would then be one and the same for all subsystems, but also available from each subsystem. This solution is similar to terminal programs that allow changing the default encoding -- with the one difference that we have a kind of "superterminal" which also supports a files tree etc. through its connection. The reason for this proposal is that files which are created or listed through the shell need to be shown consistently in the files tree as well.

I'm not sure if we can fully implement this solution by 2.0 -- but the one thing we should definitely do for 2.0 is extend the dstore agent to return the default encoding. Since knowing the remote default is vital for debugging, customer support, and also any change of encoding on the client side.

Comments?
Comment 11 Gregory Brodsky CLA 2007-04-17 08:11:19 EDT
Responses to recent comments:

1. Agreed to change the severity to major.  There is still no working workaround for SSH.

2. >Can it happen that the pathname encoding differs from the file contents encoding?
Yes, it happens.
As far as I know, most of people don't rely on any terminal emulator and prefer to avoid creating non-Latin names for path and file names. So it's not very common problem. However it happens, so it could be nice if you will support such a case.  

It's great that DStore server can be extended to report its file.encoding. For "near  future", I would recommend to create Encoding property of connection which will be used for shells, processes and pathnames - to have them translated from the server encoding.  If a customer will need to work with file system branches in different encodings, he will need to create different RSE connections to the same host. I hope it could be implemented in this release.

In future releases, it could be nice to have Encoding property for each process, shell, directory name and so on. I don't know how much customers need that, though. 

Comment 12 Martin Oberhuber CLA 2007-04-17 12:58:43 EDT
(In reply to comment #11)
> 1. Agreed to change the severity to major.  There is still no working
> workaround for SSH.

SSH is tracked separately in bug 180202. Feel free to set severity "Major" there if you think it is a major loss of functionality.

> 2. >Can it happen that the pathname encoding differs from the file contents
> encoding? Yes, it happens.

This means we need to have a separate encoding setting for each file/folder to be honored to the editor.

> As far as I know, most of people don't rely on any terminal emulator and
> prefer to avoid creating non-Latin names for path and file names. So it's
> not very common problem. However it happens, so it could be nice if you will
> support such a case.  

But this is exactly what we are dealing with in this bug. Following what you
say, it looks like this bug should really have severity "Enhancement".

> I hope it could be implemented in this release.

Kushal will need to assess if he has time to implement the suggestion.

> In future releases, it could be nice to have Encoding property for each
> process, shell, directory name and so on. I don't know how much customers 
> need that, though. 

I strongly disagree here. Having different encodings for shell and filename leads to the following inconsistencies:
  * User types "ls" and gets names listed with encoding A
  * dbl clicking on a file name tries to download through the files subsystem
    using encoding B --> file cannot be opened in editor
  * content assist fails because current directly as known by the SHell is 
    different than the same path known to the files tree
  * typing "ps" in the shell shows process names which can not be related
    to the same processes in the Processes Subsystem
...

Because of this, I would be VERY careful with allowing a different encoding for a particular shell. It would be kind of "expert mode" for those who know what they are doing. It's much better having multiple different connections for the different encodings, if this should really be required.

- Setting severity to "Enhancement" based on the assessment above.
- Changing summary accordingly 
  (previous value: UTF-8 not supported for path and file names)
- Keeping Target Milestone M7 for dstore server to report remote encoding.
  We might want to track that in a separate bug, if any API changes are
  required, in order to have proper pointers for documentation.
Comment 13 Martin Oberhuber CLA 2007-04-17 13:04:09 EDT
Tracking the required dstore reporting file.encoding issue in bug 182787 now.
Kushal - feel free to change target milestone to "---" if you don't think you can do this for 2.0.
Comment 14 Martin Oberhuber CLA 2007-04-17 13:08:56 EDT
Condition of Satisfaction for this bug:
* "Default Encoding" control from the Shells Subsystem page is moved to 
  the Host page
* Shells still consider the default encoding
* FileServiceSubsystem considers this default encoding when doing queries into
  the IFileService. If known "remote encoding" differs from chosen encoding,
  Results from the Service are re-coded. The subsystem always deals with Unicode
  which is assumed to be the correct representation of what there is on the 
  remote side.
* Changing the default encoding triggers a refresh of the entire systemView.
Comment 15 Martin Oberhuber CLA 2007-04-19 11:25:14 EDT
Here are two interesting pointers for BIDI:

http://help.eclipse.org/help32/topic/org.eclipse.platform.doc.isv/reference/misc/bidi.html

http://help.eclipse.org/help32/topic/org.eclipse.platform.doc.isv/reference/api/org/eclipse/osgi/util/TextProcessor.html

Especially looking at the second, it seems to me that getting it right for file and directory trees is not trivial in a BIDI locale. Without being able to test BIDI, we may not be able to get it done properly. So after we get the default encoding fixed for file and path names (at some point) we may need some help working on the right-to-left issues with TextProcessor.
Comment 16 Kushal Munir CLA 2007-05-08 10:59:52 EDT
Before completing this, there are a couple of questions:

1. The encoding control will be added to the Host property page of the connection. Is that ok?
2. For results of queries, I think the encoding value, if it is different from the   platform encoding of the remote system, should only be used in the adapter for paths and names. This encoding setting is really an emulator type setting, e.g. for interpreting the output of queries. The actual names of files and paths should still be interpreted from the default encoding to Unicode. Only for viewing should this encoding setting apply, i.e. in the adapter.
Comment 17 Martin Oberhuber CLA 2007-05-09 06:53:53 EDT
I think that the encoding control will be fine on the Host property page.

I think that the default value of the encoding control should be "default 
from remote" so no change is made in the client. In case a change is 
requested, I think the change should be reflected in the model and not just the UI adapters. This is because the model uses Unicode everywhere, and programmatic access to the model (like searching a directory and comparing it with the contents of a list of files in a text file) must be able to expect seing the "real world" and not just some oddly encoded stuff.

Decoding the remote stuff properly for the model has some implications:

1. When the encoding is modified, all the model must be re-loaded. 
   Essentially, a disconnect / reconnect sequence is required. Because of 
   that, I'd recommend having the Encoding control disabled when any subsystem
   is connected. A note beside it should say "This setting can only be changed
   when no subsystem is connected".

2. In dstore, I'm not sure how the DataElements are shared with the remote 
   side. I assume that having a client-side file name with a different 
   decoded contents than on the remote side would be problematic. Because of
   that, I would propose that for dstore, when the "default encoding" 
   control is changed by the user, a command should be sent to the dstore 
   server which effectively changes the file.encoding property on the server.

   Note that if the suggestion (1) is also implemented, the server does not
   need to restore a modified encoding or ever remember it, because when all
   subsystems disconnect from the server, the server terminates anyways.

I think that implementing these two suggestions, we should have a pretty simple fix for this bug according to comment #7, since it is equivalent to running multiple dstore servers with different file.encoding settings.

Note that this suggestion intentionally does not deal with the case where different sections of the file tree have different encodings. I do not think we can ever address this properly since in many cases the context is not known (in what section of the file system we are). Therefore, users with multiple different encodings on the remote side will have to create multiple connections to the remote (with different encodings) and have filters in each connection that reveal those sections of the file system that they want to see.

For ssh, I think I can address the issue with recoding as soon as the control is added to the Host properties.
Comment 18 Kushal Munir CLA 2007-05-15 11:56:12 EDT
I agree with suggestion #1. The recodings can be done by the service for SSH and FTP. For dstore, the daemon has to start the JVM with the appropriate encoding. The server script also has to accept an option to take in the encoding so it can start RSE server with that property.

I need to have go ahead for two API changes:
- Two methods in IHost, setEncoding(String encoding) and getEncoding() to set and query the encoding of the host (that is either the default or set by the user). Class Host will implement the two methods, so the impact will be minimal. This is needed for services, or some other extender, to query the encoding set by the user and to be able to set it programatically.
 
- In SystemConnectionForm, I'd like to add a method addEncodingFields() to specify that fields for encodings should be included. This form is used in different dialogs/property pages, some of which may want to have this option while others will not. By default, encoding fields will not be added by this form. Note that SystemConnectionForm is used in the connection property page where we want the encoding field.
Comment 19 David Dykstal CLA 2007-05-15 12:26:17 EDT
(In reply to comment #18)
> I agree with suggestion #1. The recodings can be done by the service for SSH
> and FTP. For dstore, the daemon has to start the JVM with the appropriate
> encoding. The server script also has to accept an option to take in the
> encoding so it can start RSE server with that property.
> 
> I need to have go ahead for two API changes:
> - Two methods in IHost, setEncoding(String encoding) and getEncoding() to set
> and query the encoding of the host (that is either the default or set by the
> user). Class Host will implement the two methods, so the impact will be
> minimal. This is needed for services, or some other extender, to query the
> encoding set by the user and to be able to set it programatically.
> 
> - In SystemConnectionForm, I'd like to add a method addEncodingFields() to
> specify that fields for encodings should be included. This form is used in
> different dialogs/property pages, some of which may want to have this option
> while others will not. By default, encoding fields will not be added by this
> form. Note that SystemConnectionForm is used in the connection property page
> where we want the encoding field.
> 

It sounds like there may be UI changes, please keep the number of new strings as small as possible. We can handle only a very *SMALL* number of new strings. Please be sure to attach the names of the changed translatable files to this bug report.
Comment 20 Martin Oberhuber CLA 2007-05-15 12:43:24 EDT
(In reply to comment #18)
> encoding. The server script also has to accept an option to take in the
> encoding so it can start RSE server with that property.

If possible, I'd like to do this without changing the server launcher but via a DStore command that ends up doing 
  System.setProperty("file.encoding", newEncoding);
because this is more dynamic when the user wants to change the encoding while still connected. See also
http://forum.java.sun.com/thread.jspa?threadID=662179&tstart=0

> I need to have go ahead for two API changes:
> - Two methods in IHost, setEncoding(String encoding) and getEncoding() to set

I'd prefer seeing getDefaultEncoding() and setDefaultEncoding()
Please also specify how you'd handle the "default from remote" special case (would you use null or "" or any other special value; would the connectorService set it as soon as connected)

> - In SystemConnectionForm, I'd like to add a method addEncodingFields() to
> specify that fields for encodings should be included. This form is used in
> different dialogs/property pages, some of which may want to have this option
> while others will not. By default, encoding fields will not be added by this
> form. Note that SystemConnectionForm is used in the connection property page
> where we want the encoding field.

What would the signature of addEncodingFields() be exactly, and how would it be used?

Comment 21 Martin Oberhuber CLA 2007-05-15 12:48:08 EDT
Since you'll need new API, can you do this for M7?
Comment 22 Kushal Munir CLA 2007-05-17 20:51:54 EDT
Added the control and enabled it only if subsystems are not connected. The value in this control is now being used for file contents. Shell implementation will follow after M7.

The API changes are as follows:

In IHost, added two methods for encodings with parameters to distinguish encodings that are obtained from remote queries versus encodings set by other means (e.g. user specified). Please see appropriate javadoc for them:

* public String getDefaultEncoding(boolean checkRemote);
* public void setDefaultEncoding(String encoding, boolean fromRemote);

In SystemConnectionForm, three methods were added. These methods are for adding the encoding fields for that form, and to query the selections of the form so that the property page can use them. Javadoc is provided for them as well:
* public void addDefaultEncodingFields()
* public String getDefaultEncoding()
* public boolean isEncodingRemoteDefault()

There were two NL files that had new strings added:
org.eclipse.rse.ui/UI/org/eclipse/rse/internal/ui/SystemResources.properties
org.eclipse.rse.ui/systemmessages.xml

The number of new strings are small.
Comment 23 Martin Oberhuber CLA 2007-05-18 07:28:26 EDT
(In reply to comment #22)
I reviewed the change, and I have the following remarks:

1. The new UI Control added
   - The group should span the entire width of the Preference Page
   - The note should read "Note: This setting can only be changed..." 
     with the String "Note:" in boldface
   - The radiobuttons should be visually disabled when connected
   See the Eclipse Preferences:General screen for example, or the Eclipse
   Preferences:General:Editors screen, or Preferences:General:Network Conn.

2. At first sight it seemed odd for me that the remote default encoding is
   stored persistently. It seemed more correct to get it fresh from the 
   remote side every time it is queried. But on the other hand, storing it
   persistently does have an advantage (see 3, below)

3. The UI Control "Default from remote" should indicate the current or last
   encoding queried from the remote side, if query was possible. For instance:
   * Never connected --> Show "Default from Remote"
   * Currently connected, 
     * query was possible --> Show "Default from Remote (UTF-8)"
     * query was not possible --> Show "Default from Remote (unknown)"
   * After disconnect, if Remote encoding was known during connect
     --> Show "Default from Remote (was UTF-8 last time)"

4. The UI Control for Encodings on the Shell Subsystem Property Page should
   either be removed (and replaced by a Note: Encoding of shells can be 
   changed on the Host Property Page) or changed to reference the Host
   default encoding. Personally, I'd vote for removing the "Shell Settings"
   Property page entirely, in order to not confuse users with two encoding
   settings where only one is applicable or valid.

5. I think I'd like to see the Default Encoding control added to the first
   page in the New Connection Wizard as well, since some users may want to
   change it already when creating the connection. Or, add a Note label saying
   "Note: Default encoding of the remote connection can be changed in the Host
   Properties once the connection is created".

6. In the FilePropertyPage, the "File Encoding" group should reference the
   default encoding acquired from the IHost, rather than the local system 
   default encoding. Again, the group should span the entire width of the 
   Property Sheet.

The rest of the changes looks good, especially the API changes look good. Javadoc in IHost should be improved to indicate the following:
  * setDefaultEncoding() is only valid when no subsystem is connected
    - The text "by querying..." is misleading since the method would never
      go and run any queries itself. The client needs to get the remote
      encoding somehow, and then push it into the method.
  * getDefaultEncoding(boolean fromRemote) is better than checkRemote, 
    because the method does not actually go and check the remote side --
    it cannot do so because it has no concept of how to connect.

I'm not sure what to do with API in RemoteCmdSubSystem#getShellEncoding() and
RemoteCmdSubSystem#setShellEncoding. Since I cannot see how it would ever make
sense to change the shell subsystem's encoding such that it is different than
the hosts's global encoding, I'd vote for getting rid of these. For M7, I deprecated them such that we can remove them for RC1 without bad conscience.

In order for the change to work for Shells, ShellServiceSubSystem#internalLaunchShell() must pass the encoding acquired from the IHost into the launched shell (actually, I'm wondering very much how encodings for shells could ever have worked before)?    
Comment 24 Martin Oberhuber CLA 2007-05-18 07:53:49 EDT
(In reply to comment #23)
>    - The group should span the entire width of the Preference Page

Noticed that on the current "Shell Settings" Property page which you get in the Properties Dialog of a Shell subsystem, the Encoding group does span the entire width of the dialog. So you might be able to copy&paste the code from there.
Comment 25 Kushal Munir CLA 2007-05-24 23:04:05 EDT
Updated as per suggestions.

1. For the host "Get from remote system" label, there are only two states, either the encoding is set or it is not. Two new strings needed in org.eclipse.rse.ui/UI/org/eclipse/rse/internal/ui/SystemResources.properties:
RESID_HOST_ENCODING_SETTING_NOTE=Note:
RESID_HOST_ENCODING_REMOTE_ENCODING_LABEL=Default from remote system (%1)

2. Shell subsystem property page removed, but I don't think this ever worked! The settings from that property page do not seem to be used anywhere.

3. This needs to be fixed in RC2 as well as the encoding control added to the New Connection wizard.
Comment 26 Martin Oberhuber CLA 2007-05-25 03:46:47 EDT
Given that we're so late in the game already, perhaps we should keep the control on the property page only for now and defer adding it to the wizard to a later release.

We've had many bugs with the wizard incorrectly initializing properties in the past so adding this to the wizard might not be entirely trivial. Let's discuss this at the committer meeting.
Comment 27 Martin Oberhuber CLA 2007-05-25 07:04:34 EDT
Marking as breaking API change, since the following public methods are removed:
  IRemoteCmdSubSystem.getShellEncoding()
  RemoteCmdSubSystem.setShellEncodign(String)

These methods had been deprecated before, and are now replaced by the following:
  IHost.getDefaultEncoding(boolean)
  IHost.setDefaultEncoding(String, boolean)

Migration docs:
---------------
1. Compile errors due to the missing  method can be fixed by replacing with
   ss.getHost().getShellEncoding(true);
2. Implementations of the methods in subclasses of IRemoteCmdSubSystem 
   should be removed.
Comment 28 Martin Oberhuber CLA 2007-05-30 11:50:53 EDT
Do I get this right: the control has been added already, so what's missing that you'd like to defer this to the Future? 

This bug just asks for the Control. Shouldn't we try to get the missing things done for 2.0, or file a new bug (i.e. clone this one) in order to ask for things still missing if we cannot do it for 2.0?
Comment 29 Kushal Munir CLA 2007-05-30 14:18:56 EDT
Martin, applying the host encoding in our path names is what is missing. That  needs to be deferred to the future. Should I open a separate defect for it?
Comment 30 Martin Oberhuber CLA 2007-05-30 14:24:53 EDT
Yes, please.
So, right now the Encoding control applies to the Dstore shells, right? Or is it also taken as the default encoding for remote files?
Comment 31 Kushal Munir CLA 2007-06-07 18:04:31 EDT
The encoding control does not apply to shells. Shells don't seem to ever have supported encodings even though there was a property page that allowed users to enter encodings. It was not being used anywhere. I filed bug 191599 for this problem.

Bug 191601 is filed for file and folder paths to use the encoding contol value.

I'm marking thois bug as fixed in RC2.
Comment 32 Martin Oberhuber CLA 2008-08-13 13:20:30 EDT
[target cleanup] 2.0 RC2 was the original target milestone for this bug