WebTools had an option to save its UTF-8 encoded files with the correct BOM. Now that it is using FileBuffers for opening and saving files, the previous setting and preference no longer applies. It would be nice if the platform could provide this itself for all text file types.
The following scenario does work for me:
- open a UTF-8 file with a BOM
- change it
- save it
==> the BOM is written back to the file.
You are asking for a way to specify the BOM when creating a new file, correct? How did you do this in WebTools before switching to file buffers?
Possible solutions could be:
- introduce a "UTF-8 BOM" encoding. This would allow specifying the BOM for UTF-8 in the UI without modifying either the API or the UI.
- add Core API that tells whether a resource (workspace, container, file) wants to force a BOM, and change the current BOM indication in the Properties dialog to a checkbox that is enabled or disabled depending on the encoding (e.g. enabled for UTF-8).
Moving to Platform Resources for comment. Once the API is in place we can adapt the file buffers.
I am not convinced that Resources is the right place to provide such a thing. It seems to me that the use case is different from the encoding case. In the encoding case, users (not plug-ins) make choices that need to be preserved for the life of the resource. Here, it looks like either a one-time user choice (as in a Save As... dialog) or a tool's choice (e.g. a tool may always want to create UTF-8 files with BOMs), so I can't see the value of a per-file/per-container setting for whether BOMs should be created. Why can't ITextFileBuffer allow clients to specify programmatically whether they want to save a BOM?
> Why can't ITextFileBuffer allow clients to programmatically say if they want to save a BOM or not?
I did not say they can't; I actually wrote that we'll have to adapt the file buffers based on the BOM info that we get from Platform Resources. I think the BOM info belongs with the encoding, which can be set for resources, containers and the workspace. Since encoding handling and BOM detection are already done by Platform Resources, I think it is also the right place to offer this UTF-8 BOM setting. It would allow different plug-ins (including those not relying on file buffers) to access the UTF-8 BOM information and add the BOM when creating a file in a container that has that flag set.
I just don't think there is a need for an extra setting. As you said before, an existing BOM is automatically preserved, and that is good. So for existing files (usually the common case) no setting is needed. When creating a new file, it is just a matter of the client saying what it wants for that file; a tool may always want to create UTF-8 files with BOMs. I see BOM enablement as far too infrequent a use case to justify the overhead of a scheme similar to the one provided for encodings. Actually, the originator is not requesting that much flexibility: they have their own preference, they just have no means to make it effective.
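To make the client-side decision discussed above concrete, here is a minimal plain-Java sketch, assuming the tool knows whether the file had a BOM when it was read. It uses only standard JDK classes, not the file buffer API; the class `BomWriter` and its methods are hypothetical names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BomWriter {
    // UTF-8 encoding of U+FEFF, the byte order mark
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical helper: encode text as UTF-8, optionally prefixed with a BOM
    static byte[] encodeUtf8(String text, boolean writeBom) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (writeBom) {
            out.write(UTF8_BOM, 0, UTF8_BOM.length);
        }
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    // Hypothetical policy: existing files keep the BOM they were read with,
    // new files follow the tool's own preference
    static boolean shouldWriteBom(boolean hadBomOnLoad, boolean toolForcesBom) {
        return hadBomOnLoad || toolForcesBom;
    }

    public static void main(String[] args) {
        // A new file (no BOM on load) created by a tool that forces a BOM
        byte[] created = encodeUtf8("Gr\u00fc\u00dfe", shouldWriteBom(false, true));
        System.out.println(created.length); // 3 BOM bytes + 7 text bytes = 10
    }
}
```

Keeping the decision in the caller mirrors the argument in this thread that no per-resource setting is required: existing files keep what they had, and new files follow the tool's preference.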
Nitin, can you confirm that this flexibility is not needed and having API on the file buffers to force the BOM fits your needs?
Yes, I'm not requesting the level of flexibility that Daniel discusses in comment 3; it is exactly as Rafael says. SSE only needs to be able to force the addition of the BOM when it would otherwise not be written out.
OK, then.
Hi, are you working on this issue? We have BIG trouble because this BOM is missing. For example, we come from Dreamweaver and develop in ColdFusion. All our old development was done in Homesite <=5 and Dreamweaver <=6.1, and therefore the encoding is windows-1252. So I changed the workspace to UTF-8 so that new files are created only in UTF-8. After this change, all files in the workspace look corrupted wherever German umlauts like öäü appear. If I change back to windows-1252 and create new files, they are not UTF-8.
1. We must create new files only in UTF-8 *with* a BOM (ColdFusion and Dreamweaver both require this).
2. We need autodetection of older windows-1252 files; they should be opened as windows-1252 whatever the workspace configuration says, and this should be inherited. (Critical: Eclipse corrupts our files without our knowledge.)
3. New files must be saved as UTF-8 with a BOM added by default, every time.
4. Additionally, I found the same problem with UTF-16 in Eclipse. IBM wrote in http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/cwbs_wsiprofile.html that there must be a BOM in UTF-16, but Eclipse does not always write one. Only the UTF-16 setting saves a BOM to the files; UTF-16LE and UTF-16BE do not!
Is there any timeframe in which we can expect a fix? This is a critical issue and it has not been fixed in more than 2 years. PLEASE change the priority to P1 and start working on this ASAP.
Regards,
Marc
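The UTF-16 versus UTF-16LE/UTF-16BE difference described above matches how the standard Java charsets behave, which may explain it, assuming the editor delegates to the JDK encoders here (this thread does not confirm that). When encoding, Java's `UTF-16` charset writes big-endian with a leading BOM, while `UTF-16BE` and `UTF-16LE` write no BOM. The class name `Utf16BomDemo` is illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {
    // Format the first n bytes of an encoded string as space-separated hex
    static String hexPrefix(byte[] bytes, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, bytes.length); i++) {
            sb.append(String.format("%02X ", bytes[i] & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "A";
        // "UTF-16" encodes big-endian and prepends the BOM FE FF
        System.out.println(hexPrefix(text.getBytes(StandardCharsets.UTF_16), 4));   // FE FF 00 41
        // The explicit-endian variants write no BOM
        System.out.println(hexPrefix(text.getBytes(StandardCharsets.UTF_16BE), 4)); // 00 41
        System.out.println(hexPrefix(text.getBytes(StandardCharsets.UTF_16LE), 4)); // 41 00
    }
}
```

If a consumer insists on a BOM with an explicit-endian encoding, the writer has to emit U+FEFF itself, for example by prefixing the text with `\uFEFF` before encoding it as UTF-16LE.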
If I understand you correctly, you're looking for what I outlined in comment 3, which I was told is not needed/requested: the ability in the UI to specify that new files are created with a BOM. If so, you should try again to raise this in a separate bug logged against Platform Resources (please add me to the cc list if you do so). This bug is about adding API to text file buffers.
> 2. we need a autodetection of older windows-1252 files
Sorry, this is not possible.
(In reply to comment #9)
> If I understand you correctly you're looking for what I outlined in comment 3
> and which I was told is not needed/requested.: the ability in the UI to specify
> that new files are created with BOM.
I really cannot understand why this is not "needed"!?
> > 2. we need a autodetection of older windows-1252 files
> Sorry, this is not possible.
Why? If this were done, we would never again run into an encoding detection problem, and that should be *the* goal.
Note that if the "BOM" is actually required by ColdFusion and Dreamweaver, then these applications are not processing UTF-8 correctly. From the Unicode reference: Because the UTF-8 encoding form already deals in ordered byte sequences, the UTF-8 encoding scheme is trivial. The byte ordering is already obvious and completely defined by the UTF-8 code unit sequence itself. The UTF-8 encoding scheme is defined merely for completeness of the Unicode character encoding model. While there is obviously no need for a byte order signature when using UTF-8, there are occasions when processes convert UTF-16 or UTF-32 data containing a byte order mark into UTF-8. When represented in UTF-8, the byte order mark turns into the byte sequence <EF BB BF>. Its usage at the beginning of a UTF-8 data stream is neither required nor recommended by the Unicode Standard, but its presence does not affect conformance to the UTF-8 encoding scheme. Identification of the <EF BB BF> byte sequence at the beginning of a data stream can, however, be taken as near-certain indication that the data stream is using the UTF-8 encoding scheme.
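The quoted passage notes that a leading `EF BB BF` is a near-certain sign of UTF-8. A minimal sketch of BOM-based sniffing follows; `BomSniffer.detect` is a hypothetical helper, not an Eclipse API. It returns `null` when no BOM is present because, as said earlier in the thread, the bytes alone do not reliably identify a single-byte encoding such as windows-1252. It deliberately ignores UTF-32 signatures to stay short.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // Returns the charset indicated by a leading BOM, or null if none is present
    static Charset detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            // Simplification: a UTF-32LE BOM (FF FE 00 00) also starts with these bytes
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    // True if data begins with the given unsigned byte values
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(withBom));               // UTF-8
        System.out.println(detect(new byte[] {'h', 'i'})); // null
    }
}
```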
I found out that I can put a UTF-8 encoded file with a BOM from Dreamweaver into an Eclipse project. After I added the file, Eclipse WRONGLY detected it as cp1252.
1. If I open the file, the German special characters (umlauts) are corrupted.
2. Close the file.
3. Change the encoding of this one file to UTF-8 in its properties (it no longer inherits from the container!!!).
4. Open the file: everything works and the German special characters are displayed correctly.
Do you think that UTF-8 detection in Eclipse is bug-free? I think NOT!!! Eclipse's encoding handling sucks for everyone working with UTF-8 files, be assured; I know 10+ such people around me.
(In reply to comment #12) > Do you think that UTF-8 detection in Eclipse is bugfree? I think NOT!!! It can also depend on the kind of file you were working with, as some files that declare their encoding internally expect to be read in that way as well.
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug. If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. -- The automated Eclipse Genie.