Community
Participate
Working Groups
This is a internationalization proble. I write about Japanse but some of other language may have same problem. When I use CVS reposity with Windows and Linux in Japanese. CVS message is sent with EUC-JP to pserver in Linux environment. But in Windows it is Windows-31J(MS932).This problem bring broken characters in Windows and Linux mixed environment. I investigated the cause and found problem is in Connection.java: public String readLine() throws CVSException { ... String result = new String(readLineBuffer, 0, index); .... public void write(String s) throws CVSException { write(s.getBytes(), false); } .... public void writeLine(String s) throws CVSException { write(s.getBytes(), true); } Above code doesn't specify charset and use default charset. I want to set charset by plugin.properties here. It will resolve charset conflict between Windows and Linux. Please fix this problem. regards, Takashi Okamoto
That is correct, we use the platform character set (as does Eclipse in general). We'd need to consider the ramifications of using something different. Michael, comments?
Created attachment 3932 [details] set charset in preferences.ini I attached my patch to set charset at preferences.ini.I can't help for GUI but I would like to set charset in cvs dialog for each repositoly.
This sounds interesting and is worth investigating for 2.2
I'm adding Kai to the CC list because it is my understanding that he will be doing encoding work in 2.2 and this may be of interest to him as well.
I've read the cvs protocol spec and it says that string sent to the server can be any byte sequence and doesn't specify encoding restrictions. As long as all clients agree on encoding, then we should honour the platform encoding for comments. Raising priority since this is a encoding incompatibility.
*** Bug 39321 has been marked as a duplicate of this bug. ***
Created attachment 8719 [details] patch to configuration encoding with CVS UI. > I've read the cvs protocol spec and it says that string sent to > the server can be any byte sequence and doesn't specify > encoding restrictions. In the fact, Japanese (and maybe other countries users) have conflict problem with encoding between sevral platform. I strongly recommend fixing this problem. This patch add encoding configuring function for Eclipse 3.0 latest snapshot(2004-03-18) CVS core. Please use this patch with next patch for CVS UI. regards, Takashi Okamoto
Created attachment 8720 [details] patch to configure encoding with CVS UI (2) Please apply this patch with previous patch.
Thanks for the patched - this is a P1 bug for us to resolve in M9.
I've had a look at the patch and have the following questions and concerns: 1) I'm a bit uninformed when it comes to encodings. Is ASCII always preserved accross encoding translations (I'll look this up but thought I'd ask in case someone knows for sure). 2) The patch changes the encoding of outgoing messages to the server but does not do anything for incoming messages. Wouldn't the incoming messages need to be translated as well? Has anyone run with the patch or is it just a suggested direction?
Hi, |1) I'm a bit uninformed when it comes to encodings. Is ASCII always preserved |accross encoding translations (I'll look this up but thought I'd ask in case |someone knows for sure). ASCII characters will be always preserved. I, however, have another concern in that patch. It seems that patch trys to translate every date from/to CVS server, but how about the case for binary files? I mean files added with '-kb' option. I think encoding translations should be done only for non-binary files and commit logs. |2) The patch changes the encoding of outgoing messages to the server but does |not do anything for incoming messages. Wouldn't the incoming messages need to |be translated as well? Has anyone run with the patch or is it just a |suggested direction? It seems to me that patch has also translated incoming messages in readLine method of org.eclipse.team.internal.ccvs.core.connection.Connection class.
Sorry, I missed the read because I just did a visual inspection of the patch. The patch does not translate files at all since files are not transfered using the methods on Connection but are instead transfered directly to the streams. The encoding must be done for all other communications as they can contain file names or folder names which may be encoded. As long as ASCII is preserved, we should be OK. As an aside, the expectation for situations like this is that files will always be binary since the CVS spec says that non-binary files are ASCII. There is a CVS preferences for making all new files binary which is meant to be used in these situations. Therefore, we do not plan on doing any encoding translations on files.
Hi, |The patch does not translate files at all since files are not transfered using |the methods on Connection but are instead transfered directly to the streams. |The encoding must be done for all other communications as they can contain |file names or folder names which may be encoded. As long as ASCII is |preserved, we should be OK. That sounds good! We will wait for that patch will be applied to CVS repository. Thank you for taking your time for this issue.
I have released the Core changes to HEAD (with some modifications) and the UI changes to branch branch_20040329_CVSEncodingTranslation. I didn't release the UI changes to HEAD because it just uses a text field for the encoding. The Editors preference page (IDEEditorsPreferencePage) has a widget that allows selection of relevant encodings and we will want to do something similar for consistency. Unfortunately it is not reusable. We could either need to duplicate the code or generalize it and supply a patch to UI.
I have released the UI changes for this to HEAD. The encoding can only be changed from the properties dialog. I didn't feel that it should be in the new location wizard since it would be confusing for most users. I am leaving this bug open since I have not had a chance to thouroughly test the encoding support. Feel free to give it a try.
Encoding support is in build I20040406 and beyond. I have tested that the encoding support does not have any negative effects when the encoding isn't set. However, I haven't gone through the process to set up a client and server that have different but compatible encodings. Since many of you who are listeneing to this bug work daily in this setup, I am hoping that some of you will volunteer to test the encoding support for this setup. Any volunteers?
|Any volunteers? I tried this new functionality and it worked! Thank for your efforts. FYI, here are what I did and what I got. I have a CVS repository on GNU/Linux(Redhat 9) and its files and commit logs are in 'EUC-JP' encoding. I tried that repository by I20040407(GTK version) on GNU/Linux and I could edit files and brows commit logs without any problems, because Eclipse on GNU/Linux will use 'EUC-JP' by the default. Next, I also tried that repository location by I20040407 on Win2k and I could not browse commit logs correctly, because Eclipse on Windows will use the encoding 'MS932' by the default. So, I visited at the property page for that repository location and go to the page 'Server Encoding'. On this page, I putted 'EUC_JP' for the encoding setting, and commit logs were displayed correctly! Now, commit logs from Eclipse on Widonws can be shown on Eclipse on GNU/Linux.
*** Bug 63188 has been marked as a duplicate of this bug. ***