Bug 295036 - created patch with new file was mojibake
Summary: created patch with new file was mojibake
Status: RESOLVED DUPLICATE of bug 214085
Alias: None
Product: Platform
Classification: Eclipse Project
Component: CVS
Version: 3.5.1
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: platform-cvs-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-12 23:55 EST by Shinya Uchimaki CLA
Modified: 2009-12-08 08:36 EST

See Also:


Attachments
patch for eclipse 3.5.1 (2.42 KB, patch)
2009-11-12 23:57 EST, Shinya Uchimaki CLA
Patch_v01 for HEAD (3.03 KB, text/plain)
2009-11-13 07:47 EST, Pawel Pogorzelski CLA

Description Shinya Uchimaki CLA 2009-11-12 23:55:27 EST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Build Identifier: M20090917-0800

I created a new file (example: Hoge.java) in a CVS repository and edited the file.
[Important] The file has not been committed.
I created a patch for this file. The patch file contained mojibake (garbled characters).
I also tested a committed file. The committed file did not show mojibake.

Mojibake example:
original -> abcdef
mojibake -> ?#x???

Environment:
Eclipse 3.5.1
System Encoding    MS932 (Japanese)
Hoge.java Text File Encoding    EUC-JP (Japanese)
CVS server Encoding EUC-JP (Japanese)

Reproducible: Always
Comment 1 Shinya Uchimaki CLA 2009-11-12 23:57:12 EST
Created attachment 152126
patch for eclipse 3.5.1
Comment 2 Pawel Pogorzelski CLA 2009-11-13 05:05:28 EST
I'll have a look at the patch.
Comment 3 Pawel Pogorzelski CLA 2009-11-13 07:13:05 EST
I can't reproduce the problem on my machine. I tried the following steps.

1. Have a workspace project with Cp1252 as default encoding
2. Create a new file a.txt and change its encoding to UTF-8
3. Type "ę" into a.txt
4. Create a patch (a.txt is an addition)

Result:
5. The patch file is created as containing "ę" (in UTF-8)
6. The patch file's encoding is set to Cp1252 so the content is not properly shown
7. If the patch file's encoding is changed to UTF-8 everything is fine

Result after applying the patch from comment 1:
8. The patch file is created as containing a corrupted character "?"
9. The patch file's encoding is set to Cp1252
10. Changing the encoding makes no difference since 3F is "?" in both UTF-8 and Cp1252
Comment 4 Pawel Pogorzelski CLA 2009-11-13 07:36:40 EST
So the problem is that the platform's default encoding is used to save a patch. With your fix applied we read the file's (a.txt) content ("ę" in my case) and try to save it using Cp1252. Since the character is not available in Cp1252, "?" is saved instead. This causes data loss.
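
A minimal Java sketch of the data loss described above, assuming the JVM provides the windows-1252 charset (the Java name for Cp1252). The class and method names are hypothetical and are not taken from the attached patch. `String.getBytes(Charset)` substitutes the charset's replacement byte for any character it cannot map, which for Cp1252 is 0x3F ("?").

```java
import java.nio.charset.Charset;

final class LossyEncodeDemo {
    // Encodes text with the named charset. Unmappable characters are silently
    // replaced by the charset's replacement byte, so information can be lost.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "\u0119"; // U+0119, the character used in comment 3
        byte[] bytes = encode(text, "windows-1252");
        // Prints: 1 byte(s), first = 0x3F
        System.out.printf("%d byte(s), first = 0x%02X%n", bytes.length, bytes[0]);
    }
}
```

Once the 0x3F byte is on disk, nothing in the patch file records which character it stood for, so changing the file's encoding afterwards cannot recover it.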

Without the fix, the bytes from a.txt are read with the wrong encoding. This causes them to be misinterpreted (reading "ę" as two characters), but the misinterpreted chars are stored in the patch file using the same default encoding. The good thing about this is that after changing the patch file's encoding it's possible to reinterpret "ę" as one character. In this particular example it causes no data loss.
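
The round trip described above can be sketched in Java as follows, again assuming the windows-1252 charset is available. The names are hypothetical. The UTF-8 bytes of "ę" (C4 99) are decoded with Cp1252 into two characters and then written back with Cp1252, which reproduces the original bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MisreadRoundTripDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decodes the bytes with the wrong charset, then encodes the resulting
    // string with that same charset, as happens when the patch is saved.
    static byte[] misreadAndRewrite(byte[] original, Charset wrong) {
        String misread = new String(original, wrong);
        return misread.getBytes(wrong);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u0119".getBytes(StandardCharsets.UTF_8); // C4 99
        String misread = new String(utf8, CP1252);
        System.out.println(misread.length()); // prints 2
        byte[] rewritten = misreadAndRewrite(utf8, CP1252);
        System.out.println(Arrays.equals(utf8, rewritten)); // prints true
        // Reinterpreting the saved bytes as UTF-8 recovers the original text.
        System.out.println(new String(rewritten, StandardCharsets.UTF_8).equals("\u0119")); // prints true
    }
}
```

This works here because both C4 and 99 are assigned in Cp1252. A few byte values (81, 8D, 8F, 90, 9D) are unassigned in Windows-1252, so the round trip is not guaranteed to be lossless for arbitrary input.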

For your fix to be complete, an additional fix has to be supplied. This fix has to warn the user that the generated patch contains characters that can't be saved to a file with the encoding currently in use.

Ideally it should allow choosing a different encoding in the same dialog, but that's just nice to have.
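
One way such a warning could be triggered, shown as a hedged sketch rather than the code that was eventually written: `CharsetEncoder.canEncode(CharSequence)` from the standard library reports whether a text can be represented in a charset. The class and method names below are hypothetical, and how the result would be surfaced in the Create Patch dialog is left open.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class PatchEncodingCheck {
    // Returns true if every character of patchText can be represented in the
    // target charset, i.e. saving would not replace anything with "?".
    static boolean canSaveLosslessly(String patchText, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        return encoder.canEncode(patchText);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(canSaveLosslessly("abcdef", cp1252)); // prints true
        System.out.println(canSaveLosslessly("\u0119", cp1252)); // prints false
        System.out.println(canSaveLosslessly("\u0119", StandardCharsets.UTF_8)); // prints true
    }
}
```

A caller could show the warning whenever the check returns false and, for the nice-to-have case, offer a charset for which it returns true.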
Comment 5 Pawel Pogorzelski CLA 2009-11-13 07:41:01 EST
BTW "ę" is a variant of the letter "e" (code point U+0119, Latin small letter e with ogonek).
Comment 6 Pawel Pogorzelski CLA 2009-11-13 07:47:49 EST
Created attachment 152148
Patch_v01 for HEAD

Shinya, this is your patch adjusted for HEAD. Are you willing to add code that warns users in case of data loss?
Comment 7 Shinya Uchimaki CLA 2009-11-13 09:56:55 EST
I am going to add the code.
But I am sorry, I don't have time to do it right now because I am busy with work and with looking for a new home.

So I need about 3 weeks to add it.
Thank you very much for your great advice.
Comment 8 Pawel Pogorzelski CLA 2009-12-08 08:36:42 EST
Looks like the same thing is already being discussed in bug 214085. In particular see Dani's comment 12. This looks like a more reasonable approach to solving the problem (don't convert to string at all).

I'm removing the target. I'm also marking this bug as a duplicate of bug 214085 in order to join the threads.
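
To illustrate what "don't convert to string at all" can mean, here is a small Java sketch that builds the added lines of a patch directly from the file's bytes. It is not the change made for bug 214085. The class and method names are hypothetical, and the sketch assumes an ASCII-compatible encoding in which byte 0x0A always means line feed (true for UTF-8, EUC-JP, MS932 and Cp1252, but not for UTF-16).

```java
import java.io.ByteArrayOutputStream;

final class RawBytePatchSketch {
    // Prefixes every line of fileContent with '+' without ever decoding the
    // bytes, so the file's own encoding is carried into the patch unchanged.
    static byte[] prefixAddedLines(byte[] fileContent) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(fileContent.length + 16);
        int lineStart = 0;
        for (int i = 0; i < fileContent.length; i++) {
            if (fileContent[i] == '\n') {
                out.write('+');
                out.write(fileContent, lineStart, i - lineStart + 1);
                lineStart = i + 1;
            }
        }
        if (lineStart < fileContent.length) { // last line has no trailing newline
            out.write('+');
            out.write(fileContent, lineStart, fileContent.length - lineStart);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = {(byte) 0xC4, (byte) 0x99, '\n'}; // "ę" in UTF-8 plus newline
        byte[] patchBody = prefixAddedLines(content);
        for (byte b : patchBody) {
            System.out.printf("%02X ", b); // prints 2B C4 99 0A
        }
        System.out.println();
    }
}
```

Because no charset is involved, neither the misinterpretation nor the "?" substitution discussed in the earlier comments can occur.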

*** This bug has been marked as a duplicate of bug 214085 ***