User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Build Identifier: M20090917-0800

I created a new file (e.g. Hoge.java) in a project shared with a CVS repository and edited it. [Important] The file had not been committed. I then created a patch for this file, and the patch file was mojibake (garbled text). I also tested with a committed file; in that case there was no mojibake.

Mojibake example:
original -> abcdef
mojibake -> ?#x???

Environment:
Eclipse 3.5.1
System encoding: MS932 (Japanese)
Hoge.java text file encoding: EUC-JP (Japanese)
CVS server encoding: EUC-JP (Japanese)

Reproducible: Always
Created attachment 152126 [details] patch for eclipse 3.5.1
I'll have a look at the patch.
I can't reproduce the problem on my machine. I tried the following steps:
1. Have a workspace project with Cp1252 as the default encoding
2. Create a new file a.txt and change its encoding to UTF-8
3. Type "ę" into a.txt
4. Create a patch (a.txt is an addition)

Result:
5. The patch file is created containing "ę" (in UTF-8)
6. The patch file's encoding is set to Cp1252, so the content is not shown properly
7. If the patch file's encoding is changed to UTF-8, everything is fine

Result after applying the patch from comment 1:
8. The patch file is created containing a corrupted character "?"
9. The patch file's encoding is set to Cp1252
10. Changing the encoding makes no difference, since 3F is "?" in both UTF-8 and Cp1252
So, the problem is that the platform's default encoding is used to save a patch.

With your fix applied, we read the file's (a.txt) content correctly ("ę" in my case) and then try to save it using Cp1252. Since the character is not available in Cp1252, "?" is saved instead. This causes data loss.

Without the fix, the bytes from a.txt are read with the wrong encoding. This causes them to be misinterpreted ("ę" is read as two characters), but the misinterpreted characters are stored in the patch file using the same default encoding, so the original bytes are written back unchanged. The good thing about this is that after changing the patch file's encoding, it's possible to reinterpret "ę" as one character. In this particular example it causes no data loss.

For your fix to be complete, an additional fix has to be supplied. That fix has to warn the user that the generated patch contains characters that can't be saved to a file with the encoding currently in use. Ideally it should allow choosing a different encoding in the same dialog, but that's just nice to have.
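The two behaviours described above can be reproduced outside Eclipse with plain JDK calls. The following is only a minimal sketch, not the code from either patch: the class and method names are made up for illustration, and it assumes that "without the fix" means decoding the file bytes with the platform default (Cp1252 here) and that "with the fix" means decoding them with the file's own encoding (UTF-8 here) before writing with the default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not correspond to any Eclipse API.
public class PatchEncodingDemo {
    // Stand-in for the platform default encoding from the steps above.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // "Without the fix": bytes are decoded with the wrong (default) encoding
    // and written back with that same encoding.
    public static byte[] misreadThenWrite(byte[] fileBytes) {
        String misread = new String(fileBytes, CP1252);
        return misread.getBytes(CP1252);
    }

    // "With the fix": bytes are decoded with the file's real encoding,
    // then written with the default encoding, which may lack the character.
    public static byte[] readCorrectlyThenWrite(byte[] fileBytes) {
        String text = new String(fileBytes, StandardCharsets.UTF_8);
        return text.getBytes(CP1252); // unmappable chars become '?' (0x3F)
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] original = "\u0119".getBytes(StandardCharsets.UTF_8); // "ę"
        System.out.println(hex(original));                         // c4 99
        System.out.println(hex(misreadThenWrite(original)));       // c4 99
        System.out.println(hex(readCorrectlyThenWrite(original))); // 3f
    }
}
```

In the first case the bytes survive only because both C4 and 99 happen to be defined in Cp1252; bytes with no Cp1252 mapping would not necessarily round-trip, which matches the "in this particular example" caveat.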
BTW, "ę" (U+0119) is a variant of the letter "e".
Created attachment 152148 [details]
Patch_v01 for HEAD

Shinya, this is your patch adjusted for HEAD. Are you willing to add code that warns users in case of data loss?
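For reference, a data-loss check of the kind asked about here could look roughly like the sketch below. It is not taken from any attachment on this bug; the class and method names are hypothetical, and it only uses the standard `java.nio.charset.CharsetEncoder.canEncode` call to find the first character that the target encoding cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper; the name and shape are assumptions for illustration.
public class PatchEncodingCheck {

    /**
     * Returns the index of the first character in text that cannot be
     * encoded with the target charset, or -1 if everything can be encoded.
     */
    public static int firstUnencodable(CharSequence text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        int i = 0;
        while (i < text.length()) {
            // Step by code point so surrogate pairs are tested as a unit.
            int codePoint = Character.codePointAt(text, i);
            int length = Character.charCount(codePoint);
            if (!encoder.canEncode(text.subSequence(i, i + length))) {
                return i;
            }
            i += length;
        }
        return -1;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        int index = firstUnencodable("abc\u0119", cp1252);
        if (index >= 0) {
            // A real implementation would show a warning in the patch wizard.
            System.out.println("Cannot encode character at index " + index);
        }
    }
}
```

A caller could run this on the patch content before saving and, if the result is not -1, warn the user or offer a different encoding.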
I am going to add the code. But I'm sorry, I don't have time right now because I am busy with work and looking for a new home, so I will need about 3 weeks to add it. Thank you very much for your great advice.
Looks like the same thing is already being discussed on bug 214085. In particular, see Dani's comment 12. That looks like a more reasonable approach to solving the problem (don't convert to a string at all). I'm removing the target. I'm also marking this bug as a duplicate of bug 214085 in order to join the threads.

*** This bug has been marked as a duplicate of bug 214085 ***