Bug 110576 - [encoding] Rename CU looses encoding for file which charset is determined by contents
Summary: [encoding] Rename CU looses encoding for file which charset is determined by ...
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.1.1   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.2 M3   Edit
Assignee: Frederic Fusier CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-26 06:13 EDT by Martin Aeschlimann CLA
Modified: 2005-10-31 06:15 EST (History)
4 users (show)

See Also:


Attachments
Patch to fix this issue (5.49 KB, patch)
2005-09-29 11:23 EDT, Frederic Fusier CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Aeschlimann CLA 2005-09-26 06:13:54 EDT
3.1.1

- Open notepad, enter some umlauts to the text and save with 'Save As', UTF-8,
file name 'X.txt'
- Drag the file into a workspace; check file properties to see that encoding has
been detected: UTF 8, with BOM
- Use 'Rename' (-> resource rename) to rename the file to 'X.java'
- File properties still show the file as UTF-8
- Use 'Rename' (-> refactoring rename) to rename file to 'Y.java'
Now the file-encoding is CP1252
Comment 1 Martin Aeschlimann CLA 2005-09-26 11:32:43 EDT
markus, is this us or jdt.core?
Comment 2 Markus Keller CLA 2005-09-28 06:05:40 EDT
This is core. We just call ICompilationUnit#rename(..).

Note that the problem only occurs when the CU is in a non-default package. I
guess the rewrite of the package declaration does not set the encoding, and thus
the BOM is lost.
Comment 3 Frederic Fusier CLA 2005-09-28 06:49:15 EDT
Cannot reproduce using build M20050923-1430.
Note that this issue was a duplicate of bug 66898. So it's sounds normal that it
does not longer occur in this build (there's a regression test in EncodingTests).

*** This bug has been marked as a duplicate of 66898 ***
Comment 4 Martin Aeschlimann CLA 2005-09-28 07:01:42 EDT
I still have this in M20050923-1430.

Make sure that the file is in a package (not the default package)
Comment 5 Frederic Fusier CLA 2005-09-28 07:12:23 EDT
Agreed, that was the missing info to reproduce it...
Comment 6 Frederic Fusier CLA 2005-09-28 07:13:20 EDT
Sorry, in fact I missed it in comment 2...
Comment 7 Frederic Fusier CLA 2005-09-28 10:21:00 EDT
Only happen for file which encoding is determined by content.
Also occur in 3.0
Comment 8 Frederic Fusier CLA 2005-09-28 13:27:31 EDT
This problems comes from the fact that copy/move a resource in Java perspective
also change its contents.

Problem is not on resource charset but on its content description which is lost
during the operation. As Platform clients has no possibility to set this
description, I cannot figure out how we can fix that. I currently just have a
workaround which explicitely sets file charset to content description charset
value in this case which is not really a good solution...

I see that content description cache in ContentDescriptionManager uses timestamp
based on ResourceInfo.charSetAndContentId. This works well while only
moving/copying a resource or while only changing its contents, but it seems that
it does not work while doing both.

Perhaps is it related to bug 67606? I don't really know...

Move to Platform/Resources for comments and further investigations.
Comment 9 Rafael Chaves CLA 2005-09-28 14:37:25 EDT
"This problems comes from the fact that copy/move a resource in Java perspective
also change its contents."

Frederic, before we look into the potential issue you mentioned, could you
clarify on this? How is that content change performed? Who does that? If
java.io.Readers/Writers are used, that would explain the loss of a BOM.
Comment 10 Frederic Fusier CLA 2005-09-28 16:31:31 EDT
Changes is done using an org.eclipse.jdt.core.dom.rewrite.ASTRewrite and saved
using org.eclipse.jdt.internal.core.Buffer.

Put a breakpoint in method processCompilationUnitResource(ICompilationUnit,
PackageFragment) of org.eclipse.jdt.internal.core.CopyResourceElementsOperation
class.

You'll see first copy/move of resource associated with ICompilationUnit, then
after, in method saveContent(PackageFragment,String,ASTRewrite,String,IFile),
change of code and save using method save(IProgressMonitor,boolean) of
ICompilationUnit which calls corresponding save(IProgressMonitor,boolean) method
of Buffer.

In this method, this is the IFile which is used to set new contents, no direct
java.io access is done.
Comment 11 Rafael Chaves CLA 2005-09-28 17:23:33 EDT
Thanks for the pointers. I know what is going on. 

The encoding information is not lost during the file rename
(CopyResourceElementsOperation#processCompilationUnitResource). It is lost later
during Buffer#save(...). The following code will cause the BOM to be lost:

byte[] bytes = encoding == null ? 
  stringContents.getBytes() : 
  stringContents.getBytes(encoding);

"stringContents" will be encoded as UTF-8 but the BOM is lost now
(String.getBytes("UTF-8") does not produce a BOM). And the problem is that there
is no supported way in Java for creating an output stream that has a UTF-8 BOM
(UTF-8 BOMs are optional). The text/Java editors, for instance, explicitly write
a BOM out before writing the file contents.

If you want to do something along those lines (only when the encoding is UTF-8),
you can use this to find if the file has a BOM:

IContentDescription.getProperty(IContentDescription.BYTE_ORDER_MARK)

And then you can write those bytes to the byte array output stream before
writing the encoded string contents.

Reassigning to Frederic as requested.
Comment 12 Frederic Fusier CLA 2005-09-28 17:41:25 EDT
Cristal clear now :-)
Thanks Rafael for the help
Comment 13 Frederic Fusier CLA 2005-09-29 11:23:10 EDT
Created attachment 27679 [details]
Patch to fix this issue

Apply same patch in Buffer.save than Platform/Text does in
ResourceTextFileBuffer.commitFileBufferContent...
Comment 14 Frederic Fusier CLA 2005-09-30 09:13:11 EDT
Patch released in HEAD.

Test case added in EncodingTests
Comment 15 Jerome Lanneluc CLA 2005-10-31 06:15:27 EST
Verified for 3.2 M3 using build I20051031-0010