Community
Participate
Working Groups
3.1.1 - Open notepad, enter some umlauts to the text and save with 'Save As', UTF-8, file name 'X.txt' - Drag the file into a workspace; check file properties to see that encoding has been detected: UTF 8, with BOM - Use 'Rename' (-> resource rename) to rename the file to 'X.java' - File properties still show the file as UTF-8 - Use 'Rename' (-> refactoring rename) to rename file to 'Y.java' Now the file-encoding is CP1252
markus, is this us or jdt.core?
This is core. We just call ICompilationUnit#rename(..). Note that the problem only occurs when the CU is in a non-default package. I guess the rewrite of the package declaration does not set the encoding, and thus the BOM is lost.
Cannot reproduce using build M20050923-1430. Note that this issue was a duplicate of bug 66898. So it's sounds normal that it does not longer occur in this build (there's a regression test in EncodingTests). *** This bug has been marked as a duplicate of 66898 ***
I still have this in M20050923-1430. Make sure that the file is in a package (not the default package)
Agreed, that was the missing info to reproduce it...
Sorry, in fact I missed it in comment 2...
Only happen for file which encoding is determined by content. Also occur in 3.0
This problems comes from the fact that copy/move a resource in Java perspective also change its contents. Problem is not on resource charset but on its content description which is lost during the operation. As Platform clients has no possibility to set this description, I cannot figure out how we can fix that. I currently just have a workaround which explicitely sets file charset to content description charset value in this case which is not really a good solution... I see that content description cache in ContentDescriptionManager uses timestamp based on ResourceInfo.charSetAndContentId. This works well while only moving/copying a resource or while only changing its contents, but it seems that it does not work while doing both. Perhaps is it related to bug 67606? I don't really know... Move to Platform/Resources for comments and further investigations.
"This problems comes from the fact that copy/move a resource in Java perspective also change its contents." Frederic, before we look into the potential issue you mentioned, could you clarify on this? How is that content change performed? Who does that? If java.io.Readers/Writers are used, that would explain the loss of a BOM.
Changes is done using an org.eclipse.jdt.core.dom.rewrite.ASTRewrite and saved using org.eclipse.jdt.internal.core.Buffer. Put a breakpoint in method processCompilationUnitResource(ICompilationUnit, PackageFragment) of org.eclipse.jdt.internal.core.CopyResourceElementsOperation class. You'll see first copy/move of resource associated with ICompilationUnit, then after, in method saveContent(PackageFragment,String,ASTRewrite,String,IFile), change of code and save using method save(IProgressMonitor,boolean) of ICompilationUnit which calls corresponding save(IProgressMonitor,boolean) method of Buffer. In this method, this is the IFile which is used to set new contents, no direct java.io access is done.
Thanks for the pointers. I know what is going on. The encoding information is not lost during the file rename (CopyResourceElementsOperation#processCompilationUnitResource). It is lost later during Buffer#save(...). The following code will cause the BOM to be lost: byte[] bytes = encoding == null ? stringContents.getBytes() : stringContents.getBytes(encoding); "stringContents" will be encoded as UTF-8 but the BOM is lost now (String.getBytes("UTF-8") does not produce a BOM). And the problem is that there is no supported way in Java for creating an output stream that has a UTF-8 BOM (UTF-8 BOMs are optional). The text/Java editors, for instance, explicitly write a BOM out before writing the file contents. If you want to do something along those lines (only when the encoding is UTF-8), you can use this to find if the file has a BOM: IContentDescription.getProperty(IContentDescription.BYTE_ORDER_MARK) And then you can write those bytes to the byte array output stream before writing the encoded string contents. Reassigning to Frederic as requested.
Cristal clear now :-) Thanks Rafael for the help
Created attachment 27679 [details] Patch to fix this issue Apply same patch in Buffer.save than Platform/Text does in ResourceTextFileBuffer.commitFileBufferContent...
Patch released in HEAD. Test case added in EncodingTests
Verified for 3.2 M3 using build I20051031-0010