Community
Participate
Working Groups
When the document has the characters which can't be converted to the specified encoding and it is saved, no error is reported so that it gives the users that the save is ok. However, when the editor is closed and the same document is opened again, such characters are all changed to '?'. ( This is because java converter changes all such characters to '?' ). For example, if Window > Preferences > Workbench > Editors > Text file encoding is set to US-ASCII, and the text file is opened by TextEditor and the non US-ASCII characters are entered ( all they are doable now ), and finally that file is saved, no error is reported. However, after that document is closed and is opened again, there are '?' characters. This is a surpirse for the users so that it would be best if the save detected this kind of invalid character errors and reported this to the users. For example, OutputStreamWriter#write throws UnmappableCharacterException, so we can detect this kind of encoding errors.
>the non US-ASCII characters >are entered ( all they are doable now ) How do you do it? Copy/Paste? The only way to detect this is to re-read the saved file and then compare it to the editor content. This significantly slows down save performance.
> How do you do it? Copy/Paste? If you are using non US locale OS, you can do it. In my environment, I'm always using Japanese Win OS locale and I can set up US-ASCII to the preference though TextEditor can accept Japanese characters from IM. > The only way to detect this is to re-read the saved file OutputStreamWriter can throw the exception. There is no need to re-read the file.
What do other editors do in that situation (e.g. Notepad)? >OutputStreamWriter can throw the exception. There is no need >to re-read the file. AFAIK it - throws an exception upon creation if the encoding itself is not supported - IOException if the writing fails Can you point me to the method (and spec) that says the writer is checking whether a character is legal in the specified encoding.
- Notepad warns this mis-match encoding and suggests us to use the unicode. Also other commercial editors check this encoding error. - OutputStream<init>(OutputStream out, CharsetEncoder enc) accepts CharsetEncoder. According to CharsetEncoder, its doc says : How an encoding error is handled depends upon the action requested for that type of error, which is described by an instance of the CodingErrorAction class And CodingErrorAction#REPORT doc says : Action indicating that a coding error is to be reported, either by returning a CoderResult object or by throwing a CharacterCodingException, whichever is appropriate for the method implementing the coding process. I think they are good starting point for you to investigate this problem. .
Just FYI, I agree that saving text, and "losing data" due to charset mismatch, without warning user really is "losing data". We used to handle/gaurd against this in SSE layer ... but, since "the platform" now handles encoding/charset "the platform" is the right place to handle this. I'd suggest a user choice "cancel save" or "save anyway" sort of choice. (And, of course, its really that "model layer" that needs to throw an exception, and let the UI layer decide how to handle it).
(In reply to comment #5) > I'd suggest a user choice > "cancel save" or "save anyway" sort of choice. I think this choice is enough for users to see there are some encoding error so that this 'save' may cause some data-loss error. .
*** Bug 128026 has been marked as a duplicate of this bug. ***
This is now correctly handled by file buffers and document providers: they throw a CoreException with a custom status code if the mapping fails. - for the file buffers: IFileBufferStatusCodes.CHARSET_MAPPING_FAILED - for document providers: EditorsUI.CHARSET_MAPPING_FAILED The editor reports this to the user and suggests to either change the encoding or the offending characters. Fixed in HEAD. Available in builds > N20060224-0010.
Out of curiosity, do you need to define the status constant as API in two places? Presumably any client of EditorsUI will also have the filebuffers plugin available?
John, as you can see from IStatus.getCode(): /** * Returns the plug-in-specific status code describing the outcome. * * @return plug-in-specific status code */ public int getCode(); the value space for the code is per plug-in. If we used the same value in file buffers and Editors UI we'd have to make sure the codes don't overlap and that's a burden we don't want to take.
Ok, makes sense. CCing Mcq for PMC approval of this new API. The new API is two status code constants (ints). The constants allow a client to distinguish a generic failure from failure to save due to presence of characters that can not be saved in the chosen encoding.
>Ok, makes sense. CCing Mcq for PMC approval of this new API. Note that one constant is added to a final class and the other constant is added to a constant interface which is marked as: * <p> * Clients are not supposed to implement that interface. * </p> and also correctly listed in the component.xml. That's why I did not raise this as being an API change.
Makes sense to me.
Verified using I20060328-0010