Bug 32539 - [CVS Core] Charset for CVS log is invalid
Summary: [CVS Core] Charset for CVS log is invalid
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Team (show other bugs)
Version: 2.1   Edit
Hardware: PC All
: P1 enhancement with 21 votes (vote)
Target Milestone: 3.0 M9   Edit
Assignee: Michael Valenta CLA
QA Contact:
URL:
Whiteboard:
Keywords: investigate
: 39321 63188 (view as bug list)
Depends on:
Blocks:
 
Reported: 2003-02-22 00:18 EST by Takashi Okamoto CLA
Modified: 2004-05-20 09:32 EDT (History)
5 users (show)

See Also:


Attachments
set charset in preferences.ini (4.57 KB, patch)
2003-03-08 19:38 EST, Takashi Okamoto CLA
no flags Details | Diff
patch to configuration encoding with CVS UI. (6.26 KB, patch)
2004-03-19 18:57 EST, Takashi Okamoto CLA
no flags Details | Diff
patch to configure encoding with CVS UI (2) (9.07 KB, patch)
2004-03-19 18:59 EST, Takashi Okamoto CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Takashi Okamoto CLA 2003-02-22 00:18:25 EST
This is a internationalization proble. I write about Japanse but some of other
language may have same problem.

When I use CVS reposity with Windows and Linux in Japanese. CVS message is sent
with EUC-JP to pserver in Linux environment. But in Windows it is
Windows-31J(MS932).This problem bring broken characters in Windows and Linux
mixed environment.

I investigated the cause and found problem is in Connection.java:


 public String readLine() throws CVSException {
...    
    String result = new String(readLineBuffer, 0, index);

....
 public void write(String s) throws CVSException {
          write(s.getBytes(), false); 
 }
....
 public void writeLine(String s) throws CVSException {
          write(s.getBytes(), true);
 } 

Above code doesn't specify charset and use default charset. I want to set
charset by plugin.properties here. It will resolve charset conflict between
Windows and Linux.

Please fix this problem.

regards,

Takashi Okamoto
Comment 1 Kevin McGuire CLA 2003-02-23 00:40:47 EST
That is correct, we use the platform character set (as does Eclipse in general).

We'd need to consider the ramifications of using something different.  Michael, 
comments?
Comment 2 Takashi Okamoto CLA 2003-03-08 19:38:04 EST
Created attachment 3932 [details]
set charset in preferences.ini

I attached my patch to set charset at preferences.ini.I can't help for	GUI but
I would like to set charset in cvs dialog for each repositoly.
Comment 3 Michael Valenta CLA 2003-03-09 21:35:55 EST
This sounds interesting and is worth investigating for 2.2
Comment 4 Michael Valenta CLA 2003-03-11 10:36:51 EST
I'm adding Kai to the CC list because it is my understanding that he will be 
doing encoding work in 2.2 and this may be of interest to him as well.
Comment 5 Jean-Michel Lemieux CLA 2004-03-10 16:32:32 EST
I've read the cvs protocol spec and it says that string sent to the server can 
be any byte sequence and doesn't specify encoding restrictions. As long as all 
clients agree on encoding, then we should honour the platform encoding for 
comments.

Raising priority since this is a encoding incompatibility.
Comment 6 Jean-Michel Lemieux CLA 2004-03-10 16:34:11 EST
*** Bug 39321 has been marked as a duplicate of this bug. ***
Comment 7 Takashi Okamoto CLA 2004-03-19 18:57:41 EST
Created attachment 8719 [details]
patch to configuration encoding with CVS UI.

> I've read the cvs protocol spec and it says that string sent to
> the server can be any byte sequence and doesn't specify
> encoding restrictions.

In the fact, Japanese (and maybe other countries users) have conflict problem 
with encoding between sevral platform. I strongly recommend fixing this
problem.

This patch add encoding configuring function for Eclipse 3.0 latest
snapshot(2004-03-18) CVS core. Please use this patch with next patch for CVS
UI.

regards,

Takashi Okamoto
Comment 8 Takashi Okamoto CLA 2004-03-19 18:59:52 EST
Created attachment 8720 [details]
patch to configure encoding with CVS UI (2)

Please apply this patch with previous patch.
Comment 9 Jean-Michel Lemieux CLA 2004-03-26 09:25:36 EST
Thanks for the patched - this is a P1 bug for us to resolve in M9. 
Comment 10 Michael Valenta CLA 2004-03-26 11:04:02 EST
I've had a look at the patch and have the following questions and concerns:

1) I'm a bit uninformed when it comes to encodings. Is ASCII always preserved 
accross encoding translations (I'll look this up but thought I'd ask in case 
someone knows for sure).

2) The patch changes the encoding of outgoing messages to the server but does 
not do anything for incoming messages. Wouldn't the incoming messages need to 
be translated as well? Has anyone run with the patch or is it just a suggested 
direction?

Comment 11 Atsuhiko Yamanaka CLA 2004-03-26 11:51:21 EST
Hi,

 |1) I'm a bit uninformed when it comes to encodings. Is ASCII always preserved 
 |accross encoding translations (I'll look this up but thought I'd ask in case 
 |someone knows for sure).

ASCII characters will be always preserved. 
I, however, have another concern in that patch.  It seems that patch trys to 
translate every date from/to CVS server, but how about the case for binary 
files?  I mean files added with '-kb' option.
I think encoding translations should be done only for non-binary files and 
commit logs.

 |2) The patch changes the encoding of outgoing messages to the server but does 
 |not do anything for incoming messages. Wouldn't the incoming messages need to 
 |be translated as well? Has anyone run with the patch or is it just a   
 |suggested direction?

It seems to me that patch has also translated incoming messages in readLine 
method of org.eclipse.team.internal.ccvs.core.connection.Connection class.
Comment 12 Michael Valenta CLA 2004-03-26 13:14:08 EST
Sorry, I missed the read because I just did a visual inspection of the patch. 
The patch does not translate files at all since files are not transfered using 
the methods on Connection but are instead transfered directly to the streams. 
The encoding must be done for all other communications as they can contain 
file names or folder names which may be encoded. As long as ASCII is 
preserved, we should be OK.

As an aside, the expectation for situations like this is that files will 
always be binary since the CVS spec says that non-binary files are ASCII. 
There is a CVS preferences for making all new files binary which is meant to 
be used in these situations. Therefore, we do not plan on doing any encoding 
translations on files.
Comment 13 Atsuhiko Yamanaka CLA 2004-03-26 14:38:55 EST
Hi,

 |The patch does not translate files at all since files are not transfered 
using 
 |the methods on Connection but are instead transfered directly to the streams. 
 |The encoding must be done for all other communications as they can contain 
 |file names or folder names which may be encoded. As long as ASCII is 
 |preserved, we should be OK.

That sounds good!  We will wait for that patch will be applied to CVS 
repository.
Thank you for taking your time for this issue.
Comment 14 Michael Valenta CLA 2004-03-29 12:24:46 EST
I have released the Core changes to HEAD (with some modifications) and the UI 
changes to branch branch_20040329_CVSEncodingTranslation. I didn't release the 
UI changes to HEAD because it just uses a text field for the encoding. The 
Editors preference page (IDEEditorsPreferencePage) has a widget that allows 
selection of relevant encodings and we will want to do something similar for 
consistency. Unfortunately it is not reusable. We could either need to 
duplicate the code or generalize it and supply a patch to UI.
Comment 15 Michael Valenta CLA 2004-03-31 14:48:45 EST
I have released the UI changes for this to HEAD. The encoding can only be 
changed from the properties dialog. I didn't feel that it should be in the new 
location wizard since it would be confusing for most users. I am leaving this 
bug open since I have not had a chance to thouroughly test the encoding 
support. Feel free to give it a try.
Comment 16 Michael Valenta CLA 2004-04-06 15:54:22 EDT
Encoding support is in build I20040406 and beyond. I have tested that the 
encoding support does not have any negative effects when the encoding isn't 
set. However, I haven't gone through the process to set up a client and server 
that have different but compatible encodings. Since many of you who are 
listeneing to this bug work daily in this setup, I am hoping that some of you 
will volunteer to test the encoding support for this setup. 

Any volunteers?
Comment 17 Atsuhiko Yamanaka CLA 2004-04-10 12:38:23 EDT
  |Any volunteers?

I tried this new functionality and it worked!
Thank for your efforts.

FYI, here are what I did and what I got.
I have a CVS repository on GNU/Linux(Redhat 9) and its files
and commit logs are in 'EUC-JP' encoding.
I tried that repository by I20040407(GTK version) on GNU/Linux and
I could edit files and brows commit logs without any problems,
because Eclipse on GNU/Linux will use 'EUC-JP' by the default.

Next, I also tried that repository location by I20040407 on Win2k and
I could not browse commit logs correctly, because Eclipse on Windows
will use the encoding 'MS932' by the default.

So, I visited at the property page for that repository location and
go to the page 'Server Encoding'.  On this page, I putted 'EUC_JP' for the
encoding setting, and commit logs were displayed correctly!
Now, commit logs from Eclipse on Widonws can be shown on Eclipse on GNU/Linux.

Comment 18 Michael Valenta CLA 2004-05-20 09:32:53 EDT
*** Bug 63188 has been marked as a duplicate of this bug. ***