Community
Participate
Working Groups
Given a local windows machine (Cp1252 default encoding) and a typical remote Linux machine (RHEL4, UTF-8 default encoding), characters like Umlaut and other Unicode chars are wrongly displayed in the Terminal. Connect to remote through ssh -> default encoding is used (Cp1252) env | grep -i lang LANG=en_US.UTF-8 When typing "ls" in a shell file names with umlaut or other Unicode special characters (e.g. created from RSE with the connection properties / encoding set to UTF-8) are not displayed correctly. This can be verified in an xterm for instance, which automatically uses the system's default encoding -- create directories with umlaut there (mkdir öäß) then observe the result in the Terminal. Many commercial terminal support setting the encoding to use manually. We should also have such a setting. Implementation is pretty simple, just use Stream encoders like in the SSHHostShell constructor in RSEÖ new BufferedReader(new InputStreamReader(fChannel.getInputStream(), encoding)) Charset cs = Charset.forName(encoding); new PrintWriter(new OutputStreamWriter(outputStream,cs)); Workaround: Use an RSE Command view through SSH, it supports specifying the encoding. -----------Enter bugs above this line----------- TM 2.0.1 Testing installation : eclipse-SDK-3.3.1 (M20070921-1145), cdt-4.0.1, emf-2.3.1 RSE install : Download RSE-2.0.1: RSE-SDK,tests,discovery,terminal,remotecdt java.runtime : Sun 1.6.0_02-b06 os.name: : Windows XP 5.1, Service Pack 1 ------------------------------------------------
Marking Bugday since this should be relatively simple to do, by decorating the Terminal I/O Streams with an encoding conversion stream from the JDK. Plus a little bit of UI for allowing to choose an encoding. I think that being able to choose encodings is very important, and since UI is involved I'd love to see it by M6. In the longer run, we could think about this being an initial implementation of an extensible terminal framework that allows decorating the streams with arbitrary stuff (as proposed in bug 165893).
Created attachment 95599 [details] Patch to provide API for setting Terminal encoding Attached patch adds methods to the VT100TerminalControl to set and get the encoding to use. That's valuable even without a UI, because the RSE Terminal Integration from bug 170910 can use it to set the Encoding (RSE does have its UI for that). I haven't add unit tests for the new methods but I think that should be done - at least to prove the contract of throwing UnsupportedEncodingException, and the ability to switch the encoding even when the Terminal has been created already and is processing characters.
Created attachment 95605 [details] Updated Patch also adding ITerminalViewControl API methods The patch also needs to update ITerminalViewControl interface in order to be complete. Attached patch makes that fix. To test the patch, you can use the RSE Terminal Integration from bug 170910 and add the following in TerminalViewTab.java line 194: try { terminalControl.setEncoding(host.getDefaultEncoding(true)); } catch (UnsupportedEncodingException e) { /* ignore and allow fallback to default encoding */ } In RSE, create remote SSH connection e.g. to Linux, then on the created connection right-click > properties to specify the encoding e.g. UTF-8. Now, with RSE SSH File Service create some files / folders with foreign names (e.g. copy & paste from some Korean or Japanese website). Typing "ls" in the Terminal should correctly show these file names.
Created attachment 95613 [details] Patch fixing one more encoding, and downgrading to J2ME-CDC/Foundation 1.1 Attached patch fixes one more encoding, and downgrades the Terminal to J2ME-CDC/Foundation 1.1
I think the setting of the encoding in the constructor of VT100TerminalControl is strange (you say it cannot fail but you have a backup strategy)... I am also not sure is is safe to change the encoding after the some bytes have been sent over the wire. Are you sure that redecorating the input stream is safe? I would feel more comfortable if the fInputStreamReader of VT100TerminalControl would be final...
As I wrote in the Javadoc, changing encoding after the fact can indeed lose some characters. I think that this cannot be avoided, because one character x can be encoded in an unknown number of bytes b (probably even spanning partial bytes such as an EBCD encoding or 4-to-5 encoding where 4 chars are in 5 bytes for redundancy and error correction. So, at the time encoding is changed, some bytes might still be in a buffer... and even flushing the buffer don't really help. I'm not too concerned about this, if worst case is some garbled characters. If you are worried, it might be possible to restrict changing the encoding to when the terminal is not yet connected (force disconnecting first). Though I'd think it's nore user friendly to allow changing a live connection as well. Some users might have a remote file system with some branches encoded as UTF-8 and others as Shift_JIS. They want to switch frequently. The patch was made as a quick proof of concept to see if it actually works. I agree that not everything is nice. What I'd like for now is the outside-facing-API of setEncoding(), I'm less concerned about how it's implemented internally.
Running out of time for this for M6. I think it's ok to add this API in M7, since it's only additional (non-breaking) API and since all Terminal API is marked provisional (in internal packages) anyways.
Michael I'd appreciate if we could get this applied for M7
Not for M7
Bulk update of target milestone
Created attachment 104087 [details] Updated the patch It patches only the encoding not the downgrading to J2ME-CDC/Foundation 1.1
I applied the patch...
Created attachment 104232 [details] Additional Patch for ITerminalViewControl Just to clarify, the "downgrading to J2ME" is no longer part of the patch because it had been committed separately already. I verified that the patch applied by Michael is exactly what I had originally submitted. I noticed, that in Michael's version of the patch, my changes to ITerminalViewControl (adding #setEncoding(), #getEncoding() methods) are missing. Is this intentional? - Attached is a patch that brings HEAD to the stage that it had with my patch. Apart from that, I approve the change because I had developed and tested it back in April already.
I missed that part accidentally. And I applied your patch.
Created attachment 104235 [details] Additional patch fixing setEncoding() I tested the patch (with the patch from bug 227571), and found it to be buggy -- the new encoding was never set. Attached patch fixes the issue. Please review and commit.
Bugfix committed: [204796] Bugfix for VT100TerminalControl#setEncoding()