Community
Participate
Working Groups
Created attachment 85981 [details] Screenshot of Apply Patch dialog showing conflict Build ID: M20071023-1652 Steps To Reproduce: 1. Create a new class called Test with the following contents: (the comment simply says "I am a German comment. I may even contain umlauts, e. g. 'ä'") --------- public class Test { /* Ich bin ein deutscher Kommentar. * Ich darf sogar Umlaute enthalten, wie zum * Beispiel "ä". */ private int dummy = 0; } --------- 2. Commit to CVS 3. Change the value of the dummy variable to 1. 4. Select Team->Create Patch and leave the defaults (clipboard) 5. Restore from CVS 6. Select Team->Apply Patch from clipboard Eclipse will report an unmatching patch, because the German umlaut "ä" was not correctly put into the diff output, but replaced with a question mark. This is of course not limited to block comments. I have attached a screenshot of the conflict view.
Eclipse's Create Patch operation is based on the diff command called on your CVS repository. I would check (encoding) configuration of the repository first if I were you. Patches with both German Umlauts and Polish tails work fine for me. Let me know if you need some additional info.
I would indeed like some hints on where to look on the CVS server. What would I have to check/modify? I did not find anything helpful in the CVS manual. Apart from that I tried to get a better understanding and had a look at the CVS command Eclipse issues upon "Create Patch" and used the same arguments for a command line cvs call (cvs.exe from TortoiseCVS). Taking the output from that (redirected to a file) results in a working patch. Running "cvs diff -N -u -l" (taken from the Eclipse CVS console) on the Linux machine and inspecting the output in hex shows E4 for the German umlaut which is the correct code for codepage 1252 (Windows default) and is also what the file properties dialog in Eclipse shows for "Test.java". This output is identical to the cvs.exe output apart from the line terminators. After running unix2dos on the Linux side to correct that, both diff files result in the same MD5 hash. What encoding does Eclipse use when it tries to apply the patch?
(In reply to comment #2) > What encoding does Eclipse use when it tries to apply the patch? I belive that the Compare Editor uses an encoding specified in the workspace for the current local resource, but I'm not sure how it determines the encoding of a remote resource. I would need to take a look in the code to check that. Anyway, this case realized me that addressing bug 206586 could save us some effort here.
I have Eclipse configured to use UTF-8 by default. I have checked out org.eclipse.jdt.core.tests.model from eclipse cvs. Then I made some modifications to file org.eclipse.jdt.core.tests.model.JavaSearchTests which contains some non-ascii characters (several occurrences of '§', to be precise). In this situation I found it very difficult to create a correct workspace patch containing the differences of this file. It seems that the cvs checkout operation converts the file to UTF-8 (with no option to select the char encoding). At the first attempt this particular file was simply dropped (!!) from the patch while other changes in my workspace were correctly recorded. Next I changed the project preferences to use ISO-8859-1. Now the special characters were of course no longer correctly displayed. At this point I could create a patch with the fake differences discussed in this bug. Then I used the compare editor to *individually* fetch a fresh copy of each chunk containing funny chars into my workspace file, which changed these characters into ISO-8859-1, so in the editor they would look OK again. [IMHO This operation is not feasible if you have a large workspace with many local changes mixed with many differences caused by the encoding mess] Only then I could create the proper patch file. The special pain in this scenario stems from the inconsistency between automatically translating to the local default encoding upon checkout (which I generally like) and not being able to proceed with this encoding during "create patch" (which I consider a critical error actually). Two suggestions: 1. "create patch" should be investigated as it seems to silently die when encountering two file versions (remote/local) with incompatible char encoding. 2. the checkout wizard could offer an option to select the char encoding instead of forcing a recoding to the local default. Of course it would be great if the user would be informed when checking out a project with an encoding that differs from the workspace default.
Could this issue please be considered for 3.5? The current status simply means that a machine with default UTF-8 encoding cannot create patches for a CVS repository with ISO encoding. This is a critical bug.
OK, how does this relate to the fixed bug 223857? Also, what is the suggested way of working with different encodings locally/remotely? On a UTF-8 machine, recent checkouts from eclipse cvs look garbled (non-ascii characters replaced by inverted "?") but compare seems to work, hm?
It hit me once again just a minute ago using the latest integration build (I20090929-0800): 1. I have an eclipse project checked out for preparing a patch. 2. I do an update on the project -> OK 3. I do a "Synchronize with Repository": I see exactly my changes, nothing more 4. I create a patch: It contains garbage because a non-ascii char has changed (to itself) To reproduce: 1. checkout project org.eclipse.jdt.core.tests.model into a workspace with default encoding set to UTF-8. 2. Edit class JavaSearchTest. 3. Create a patch and observer fake changes regarding each occurrence of '§' Using a binary editor I see that the patch file indeed contains null-changes: the extra before and after lines are identical!! Note that the editor cannot correctly display the special char, because the checkout operation does not perform any conversion. So we do have unprintable characters on both sides, which seems to be fine for compare but not for create patch. Can someone please at least point out where in the code this is handled, so we have a chance to get this embarrassing bug off the table? On a different note: why doesn't the checkout wizard allow to select the encoding used in the checked out project?
FWIW, the '§' char is consistently converted to 0xef 0xbf 0xbd, which is a legal UTF-8 sequence but not mapped to any symbol, it seems. This sequence is not printable in any encoding that I know of.
Please see the attachment from bug 295894 comment 12 for a patch containing the garbage described earlier. Both the Eclipse install (3.6 M3) and the workspace were fresh. Not a single preference was changed. The linux environment contains: LANG=en_US.UTF-8
Make sure not to mix things: - if the checked out project (or resource in general) uses special characters without specifying the encoding then it's pure luck whether it works or not and starting to edit and save such a file will corrupt it unless you're lucky and the special character cannot be mapped with the invalid encoding. Of course afterwards making a patch on top of the corrupted file results in a corrupted patch. This can only be fixed by the project that uses special characters: it has to set the project specific encoding accordingly and store it in CVS. - creating the patch: here the mistake can be made in the code that the wrong encoding is used: the resulting patch will normally have the default OS encoding and hence this needs to be taken into account - applying the patch: here the same thing: the patch read from file or clipboard needs to be read with OS default encoding
Just to make sure I did not miss anything I tried my original testcase again with 3.5.1 and it still fails. Files are stored in Windows default Cp1252 locally, the CVS repository location is configured for UTF-8. From what I understand this should only affect filename and commit comments however. Creating a patch as described results in the same unusable patch - even when just put into to local keyboard, no files involved. I did observe however, that when the CVS repository encoding and the file encoding in the workspace - two completely separate things IMHO - match, the patch can be created correctly. Apparently there seems to be something wrong there. The only option - and I would be reluctant to do so, because I would have to check in the majority of our several 100.000 source files - would be to convert them all to UTF-8 locally. I believe there should be a better, more user-friendly solution.
I quickly looked at the code (org.eclipse.team.internal.ccvs.core.client.Diff and org.eclipse.team.internal.ccvs.core.client.listeners.DiffListener) that reads the patch from the server and then writes it to disk or the clipboard: it has two fundamental design issues: 1) The patch which is created on the server is wrongly converted into a string on the client using the encoding specified on the repository properties page. This is doomed to fail because the patch should be kept as bytes until it is compared/applied to the local file (at which point we can use each file's encoding). A patch can patch several files each of them having a different encoding hence converting the bytes into a string using a single encoding can corrupt parts of the patch. 2) When writing the patch out to a file or the clipboard using the platform default encoding more data can be lost. The patch should be written to a (binary) file as is i.e. as byte stream received from the server. The clipboard scenario is harder to fix (would need consistent encoding of the bytes). This also explains why binary patches (see bug 257263) can't work at this point.
*** Bug 295036 has been marked as a duplicate of this bug. ***
(In reply to comment #12) > 2) When writing the patch out to a file or the clipboard using the platform > default encoding more data can be lost. The patch should be written to a > (binary) file as is i.e. as byte stream received from the server. That's one way of addressing the problem. It should be pretty straightforward to implement. But what if encoding is changed after the patch has been generated? We would misinterpret stored bytes. And in case both UTF-8 and UTF-16 encodings are used (not sure if it's possible) mixing them in one patch file would be a mess. Wouldn't it be better to enforce one encoding per patch file? This would allow to handle them as text files and that's basically the whole point of creating a patch. I'm thinking of a prompt saying something like: "The patch contains characters that can't be stored using default encoding. Please select different encoding, otherwise a data loss will occur". This could be hard to implement though since all the diffs have to be processed to determine if the default encoding causes data loss.
>But what if encoding is changed after the patch has been generated? Bad luck. >Wouldn't it be better to enforce one encoding per patch file? Definitely no go. The SDK has projects with different encodings. This would prevent me from making a single patch covering changes in such projects. And of course it would be pretty hard to implement since the patch is generated on the server: you would have to do conversions per file on the client which seems wrong to me. Also, it would definitely break binary patching. >This would allow >to handle them as text files and that's basically the whole point of creating a >patch. No. There are also requests to support binary stuff in a patch.
(In reply to comment #15) > >But what if encoding is changed after the patch has been generated? > Bad luck. > [...] > >This would allow > >to handle them as text files and that's basically the whole point of creating a > >patch. > No. There are also requests to support binary stuff in a patch. Full Ack.
(In reply to comment #15) > >Wouldn't it be better to enforce one encoding per patch file? > Definitely no go. The SDK has projects with different encodings. This would > prevent me from making a single patch covering changes in such projects. It would simply work for majority of cases. However for a fraction of patches you would have to specify more comprehensive encoding like UTF-8 for the patch to cover all the characters. I don't see how enforcing single encoding prevents from making one patch in this case. Supporting binaries in patches is a different thing than the encoding we use for storing text differences. I don't see how they are related. Anyway, I'm just exploring different approaches. I like the approach of storing diffs as returned from the server. Just want to make sure we discuss other options.
*** Bug 269247 has been marked as a duplicate of this bug. ***
Bumping severity based on duplicates.
See bug for 309430 the Apply Patch issue.
>See bug for 309430 the Apply Patch issue. Should read: "See bug 309430 for the Apply Patch issue.
*** Bug 333339 has been marked as a duplicate of this bug. ***
Bug 215360 looks like a duplicate.
Look at org.eclipse.team.internal.ccvs.core.client.listeners.DiffListener. It treats the diff response as strings. See bug 215360 for why that's wrong.
*** Bug 215360 has been marked as a duplicate of this bug. ***
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. -- The automated Eclipse Genie.
For me this may no longer be relevant due to bug 438875, i.e., by avoiding any encodings other than UTF-8. Not a solution to this bug, though.
This is probably still an issue, but CVS support is no longer under active development.
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug. If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. -- The automated Eclipse Genie.