204796 – [terminal][api] Terminal should allow setting the encoding to use

Bug 204796 - [terminal][api] Terminal should allow setting the encoding to use

Summary: [terminal][api] Terminal should allow setting the encoding to use

Status:	RESOLVED FIXED

Alias:	None

Product:	Target Management
Classification:	Tools
Component:	Terminal (show other bugs)
Version:	2.0.1
Hardware:	PC Windows XP

Importance:	P3 enhancement (vote)
Target Milestone:	3.0 RC4
Assignee:	Martin Oberhuber
QA Contact:	Martin Oberhuber

URL:
Whiteboard:
Keywords:	api

Depends on:
Blocks:	227571
	Show dependency tree

Reported:	2007-09-27 08:12 EDT by Martin Oberhuber
Modified:	2008-06-09 20:46 EDT (History)
CC List:	0 users

See Also:

Flags:	mober.at+eclipse: pmc_approved+ mober.at+eclipse: review? (eclipse)

Attachments
Patch to provide API for setting Terminal encoding (15.88 KB, patch) 2008-04-10 18:00 EDT, Martin Oberhuber	no flags	Details \| Diff
Updated Patch also adding ITerminalViewControl API methods (18.02 KB, patch) 2008-04-10 18:25 EDT, Martin Oberhuber	no flags	Details \| Diff
Patch fixing one more encoding, and downgrading to J2ME-CDC/Foundation 1.1 (27.93 KB, patch) 2008-04-10 20:41 EDT, Martin Oberhuber	no flags	Details \| Diff
Updated the patch (14.68 KB, patch) 2008-06-07 11:00 EDT, Michael Scharf	no flags	Details \| Diff
Additional Patch for ITerminalViewControl (2.16 KB, patch) 2008-06-09 19:17 EDT, Martin Oberhuber	no flags	Details \| Diff
Additional patch fixing setEncoding() (1.03 KB, patch) 2008-06-09 20:43 EDT, Martin Oberhuber	no flags	Details \| Diff
Show Obsolete (3) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Martin Oberhuber

2007-09-27 08:12:18 EDT

Given a local windows machine (Cp1252 default encoding) and a typical remote
Linux machine (RHEL4, UTF-8 default encoding), characters like Umlaut and other Unicode chars are wrongly displayed in the Terminal.

Connect to remote through ssh -> default encoding is used (Cp1252)
env | grep -i lang
LANG=en_US.UTF-8

When typing "ls" in a shell file names with umlaut or other Unicode special characters (e.g. created from RSE with the connection properties / encoding set to UTF-8) are not displayed correctly. This can be verified in an xterm for instance, which automatically uses the system's default encoding -- create directories with umlaut there (mkdir öäß) then observe the result in the Terminal.

Many commercial terminal support setting the encoding to use manually. We should also have such a setting. Implementation is pretty simple, just use Stream encoders like in the SSHHostShell constructor in RSEÖ

new BufferedReader(new InputStreamReader(fChannel.getInputStream(), encoding))
Charset cs = Charset.forName(encoding);
new PrintWriter(new OutputStreamWriter(outputStream,cs));

Workaround: Use an RSE Command view through SSH, it supports specifying the encoding.

-----------Enter bugs above this line-----------
TM 2.0.1 Testing
installation : eclipse-SDK-3.3.1 (M20070921-1145), cdt-4.0.1, emf-2.3.1
RSE install  : Download RSE-2.0.1: RSE-SDK,tests,discovery,terminal,remotecdt
java.runtime : Sun 1.6.0_02-b06
os.name:     : Windows XP 5.1, Service Pack 1
------------------------------------------------

Comment 1 Martin Oberhuber

2008-03-16 06:09:44 EDT

Marking Bugday since this should be relatively simple to do, by decorating the Terminal I/O Streams with an encoding conversion stream from the JDK. Plus a little bit of UI for allowing to choose an encoding.

I think that being able to choose encodings is very important, and since UI is involved I'd love to see it by M6.

In the longer run, we could think about this being an initial implementation of an extensible terminal framework that allows decorating the streams with arbitrary stuff (as proposed in bug 165893).

Comment 2 Martin Oberhuber

2008-04-10 18:00:07 EDT

Created attachment 95599 [details]
Patch to provide API for setting Terminal encoding

Attached patch adds methods to the VT100TerminalControl to set and get the encoding to use. That's valuable even without a UI, because the RSE Terminal Integration from bug 170910 can use it to set the Encoding (RSE does have its UI for that).

I haven't add unit tests for the new methods but I think that should be done - at least to prove the contract of throwing UnsupportedEncodingException, and the ability to switch the encoding even when the Terminal has been created already and is processing characters.

Comment 3 Martin Oberhuber

2008-04-10 18:25:17 EDT

Created attachment 95605 [details]
Updated Patch also adding ITerminalViewControl API methods

The patch also needs to update ITerminalViewControl interface in order to be complete. Attached patch makes that fix.

To test the patch, you can use the RSE Terminal Integration from bug 170910 and add the following in TerminalViewTab.java line 194:

try {
    terminalControl.setEncoding(host.getDefaultEncoding(true));
} catch (UnsupportedEncodingException e) {
    /* ignore and allow fallback to default encoding */
}

In RSE, create remote SSH connection e.g. to Linux, then on the created connection right-click > properties to specify the encoding e.g. UTF-8. Now, with RSE SSH File Service create some files / folders with foreign names (e.g. copy & paste from some Korean or Japanese website). Typing "ls" in the Terminal should correctly show these file names.

Comment 4 Martin Oberhuber

2008-04-10 20:41:58 EDT

Created attachment 95613 [details]
Patch fixing one more encoding, and downgrading to J2ME-CDC/Foundation 1.1

Attached patch fixes one more encoding, and downgrades the Terminal to J2ME-CDC/Foundation 1.1

Comment 5 Michael Scharf

2008-04-10 21:06:51 EDT

I think the setting of the encoding in the constructor of VT100TerminalControl is strange (you say it cannot fail but you have a backup strategy)...

I am also not sure is is safe to change the encoding after the some bytes have been sent over the wire. Are you sure that redecorating the input stream is safe? I would feel more comfortable if the fInputStreamReader of VT100TerminalControl would be final...

Comment 6 Martin Oberhuber

2008-04-10 21:19:01 EDT

As I wrote in the Javadoc, changing encoding after the fact can indeed lose some characters. I think that this cannot be avoided, because one character x can be encoded in an unknown number of bytes b (probably even spanning partial bytes such as an EBCD encoding or 4-to-5 encoding where 4 chars are in 5 bytes for redundancy and error correction. So, at the time encoding is changed, some bytes might still be in a buffer... and even flushing the buffer don't really help.

I'm not too concerned about this, if worst case is some garbled characters.
If you are worried, it might be possible to restrict changing the encoding to when the terminal is not yet connected (force disconnecting first). Though I'd think it's nore user friendly to allow changing a live connection as well. Some users might have a remote file system with some branches encoded as UTF-8 and others as Shift_JIS. They want to switch frequently.

The patch was made as a quick proof of concept to see if it actually works. I agree that not everything is nice. What I'd like for now is the outside-facing-API of setEncoding(), I'm less concerned about how it's implemented internally.

Comment 7 Martin Oberhuber

2008-04-11 07:10:11 EDT

Running out of time for this for M6. I think it's ok to add this API in M7, since it's only additional (non-breaking) API and since all Terminal API is marked provisional (in internal packages) anyways.

Comment 8 Martin Oberhuber

2008-05-05 17:50:02 EDT

Michael I'd appreciate if we could get this applied for M7

Comment 9 Martin Oberhuber

2008-05-07 05:13:41 EDT

Not for M7

Comment 10 Martin Oberhuber

2008-05-20 17:45:03 EDT

Bulk update of target milestone

Comment 11 Michael Scharf

2008-06-07 11:00:38 EDT

Created attachment 104087 [details]
Updated the patch

It patches only the encoding not the downgrading to J2ME-CDC/Foundation 1.1

Comment 12 Michael Scharf

2008-06-09 10:31:56 EDT

I applied the patch...

Comment 13 Martin Oberhuber

2008-06-09 19:17:38 EDT

Created attachment 104232 [details]
Additional Patch for ITerminalViewControl

Just to clarify, the "downgrading to J2ME" is no longer part of the patch because it had been committed separately already. I verified that the patch applied by Michael is exactly what I had originally submitted.

I noticed, that in Michael's version of the patch, my changes to ITerminalViewControl (adding #setEncoding(), #getEncoding() methods) are missing. Is this intentional? - Attached is a patch that brings HEAD to the stage that it had with my patch.

Apart from that, I approve the change because I had developed and tested it back in April already.

Comment 14 Michael Scharf

2008-06-09 19:39:51 EDT

I missed that part accidentally. And I applied your patch.

Comment 15 Martin Oberhuber

2008-06-09 20:43:15 EDT

Created attachment 104235 [details]
Additional patch fixing setEncoding()

I tested the patch (with the patch from bug 227571), and found it to be buggy -- the new encoding was never set. Attached patch fixes the issue. Please review and commit.

Comment 16 Martin Oberhuber

2008-06-09 20:46:29 EDT

Bugfix committed:
[204796] Bugfix for VT100TerminalControl#setEncoding()