Bug 144422 - [encoding] Save problems dialog not useful when problems with encodings are encountered
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Text
Version: 3.2
Hardware: All All
Importance: P3 enhancement
Target Milestone: 3.6 M3
Assignee: Dani Megert CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 193769 217560 276987 284069
Depends on:
Blocks:
 
Reported: 2006-05-30 08:08 EDT by Pascal Rapicault CLA
Modified: 2010-02-08 05:51 EST
CC List: 11 users

See Also:


Description Pascal Rapicault CLA 2006-05-30 08:08:57 EDT
Eclipse 3.2
I had a file whose encoding was set to ISO-8859-1, into which I pasted some text from Word that contained curly double quotes like this one: ”.
When I tried to save the file, I got a dialog saying that the file could not be saved because it contained invalid characters. However, it did not tell me which characters those were or where they were.
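
For illustration, a minimal sketch of why such a save fails, using the standard java.nio.charset API. The class and method names are hypothetical and this is not the code Eclipse uses; it only shows that ISO-8859-1 has no mapping for U+201D.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Eclipse.
final class EncodableCheck {
    // Returns true only if every character of the text can be written in the charset.
    static boolean canSave(String text, Charset charset) {
        // canEncode(CharSequence) is false as soon as one character is unmappable.
        return charset.newEncoder().canEncode(text);
    }
}
```

Under these assumptions, `EncodableCheck.canSave` returns false for text containing ” with `StandardCharsets.ISO_8859_1` and true with `StandardCharsets.UTF_8`.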
Comment 1 Dani Megert CLA 2006-05-30 08:24:30 EDT
Simply compare it with the previous element in the local history. There are currently no plans to provide more support for this.
Comment 2 Adam Kiezun CLA 2006-10-04 12:48:45 EDT
This is a big pain. The dialog is not useful: it provides absolutely no insight into how I can fix the problem. Can you highlight the offending characters?

Or provide a button: "Remove characters from different encoding"? This would solve the problem in those cases where the text looks just fine but there's a character somewhere that Eclipse chokes on (and it will not tell you where it is!)

The current state is very bad: I had to use another editor to paste the text into my file. The other editor had no complaint whatsoever.
Comment 3 Adam Kiezun CLA 2006-10-04 13:26:26 EDT
I just checked. The other editor (emacs) converted the offending characters to Unicode escapes such as \u219c.

I think it would be a much more appealing solution to have a warning button that said something in the spirit of "All characters from a different encoding will be converted to Unicode escapes" than to have the editor refuse to save my file at all.
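
A rough sketch of the conversion described in this comment, assuming Java-style backslash-u escapes are acceptable in the target file. The class name is hypothetical, and the sketch works per UTF-16 code unit, so a character outside the Basic Multilingual Plane is always written as two escapes.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of Eclipse or emacs.
final class UnicodeEscaper {
    // Replaces every UTF-16 code unit the charset cannot encode
    // with a six-character backslash-u escape (four lowercase hex digits).
    static String escapeUnmappable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (encoder.canEncode(c)) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```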
Comment 4 Dani Megert CLA 2007-06-22 09:59:07 EDT
Get rid of deprecated state.
Comment 5 Adam Kiezun CLA 2007-09-17 13:43:01 EDT
Any plans to fix this? I still have to use editors other than Eclipse simply to paste text into files.
Comment 6 Dani Megert CLA 2007-09-18 02:53:44 EDT
No plans. Feel free to provide a patch.
Comment 7 Dani Megert CLA 2008-02-04 07:08:47 EST
*** Bug 217560 has been marked as a duplicate of this bug. ***
Comment 8 Markus Keller CLA 2008-11-14 12:11:08 EST
The dialog is also not helpful if you want to *keep* the unsaveable characters by changing the file's encoding, because
- Edit > Set Encoding... is disabled, and
- after setting the encoding in Properties > Resource on the file, Save still complains that the editor content is not valid w.r.t. the old encoding.

The only way I found to hammer the content into the file was to select all, cut, save, change the encoding, paste the content back, save.
Comment 9 Dani Megert CLA 2009-05-20 02:43:18 EDT
*** Bug 276987 has been marked as a duplicate of this bug. ***
Comment 10 Markus Keller CLA 2009-05-20 05:29:04 EDT
In addition to offering a button that just replaces bad characters with e.g. '?', we could also add a "Show in Compare Editor" button, which opens a compare editor with the original content on one side and the proposed simplifications on the other side. That would allow the user to verify every single change and accept it or fix it manually. Saving the compare editor should reconcile the compare viewer and leave it unsaved (maybe with another error dialog), such that the user can fix the remaining issues.
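
The "replace bad characters with '?'" part could be sketched with the standard CharsetEncoder error actions. This is an assumption about one possible approach, not the Eclipse implementation, and the class name is hypothetical. The substituted byte is the encoder's default replacement, which is '?' for ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of Eclipse.
final class LossySave {
    // Encodes the text, substituting the encoder's replacement bytes
    // for anything the charset cannot represent.
    static byte[] encodeReplacing(String text, Charset charset) {
        try {
            ByteBuffer bytes = charset.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .encode(CharBuffer.wrap(text));
            byte[] result = new byte[bytes.remaining()];
            bytes.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; rethrown unchecked to keep the signature simple.
            throw new IllegalStateException(e);
        }
    }
}
```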
Comment 11 Dani Megert CLA 2009-05-20 05:33:45 EDT
Yep, bug 261716 discusses using compare for another feature. We "only" have to solve the chicken-and-egg problem: currently compare depends on text.
Comment 12 Dani Megert CLA 2009-07-16 02:18:59 EDT
*** Bug 193769 has been marked as a duplicate of this bug. ***
Comment 13 Markus Keller CLA 2009-07-28 14:04:02 EDT
*** Bug 284069 has been marked as a duplicate of this bug. ***
Comment 14 Dani Megert CLA 2009-08-07 03:43:41 EDT
*** Bug 285922 has been marked as a duplicate of this bug. ***
Comment 15 Markus Keller CLA 2009-08-07 06:34:14 EDT
The compare editor proposed in comment 10 is probably overkill.

An easier solution would be a dialog that:

1. solves comment 8, i.e. allows me to save the file with a different encoding (change encoding of the file or encoding of the whole project)

2. allows me to select the first offending character in the file, so that I can just go back to the editor and fix it in place. It would also be good if the dialog could tell me the total count of problematic characters, to help me decide how best to fix the problem.
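
Point 2 could be sketched as a single scan that records the offset of the first offending character and the total count. The class and field names below are hypothetical; the offset is a UTF-16 index of the kind a text editor could use to set the selection.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical result holder and scanner, not part of Eclipse.
final class UnmappableScan {
    final int firstOffset; // UTF-16 offset of the first offending character, or -1 if none
    final int count;       // number of offending characters (code points)

    private UnmappableScan(int firstOffset, int count) {
        this.firstOffset = firstOffset;
        this.count = count;
    }

    static UnmappableScan scan(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        int first = -1;
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            // Step by code point so surrogate pairs are tested as one character.
            int length = Character.charCount(text.codePointAt(i));
            if (!encoder.canEncode(text.subSequence(i, i + length))) {
                if (first < 0) {
                    first = i;
                }
                count++;
            }
            i += length;
        }
        return new UnmappableScan(first, count);
    }
}
```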
Comment 16 Martin Oberhuber CLA 2009-08-07 07:26:20 EDT
Changing encoding of the entire project doesn't make sense, unless all existing files in the project get transcoded, or there is risk of data loss. A "transcode project" wizard would be nice, but I think that's a separate problem.

I also think that most users don't know or understand encodings. Rather than offering any other encoding to choose, it may make sense for the dialog to give a fixed button "Save as UTF-8" since that's known to be a safe choice.

But I think that before saving in a different encoding, most users will want to review what's going wrong. What about this idea that might solve the compare viewer dependency issues:

When "Save" runs into an encoding problem, then...
1.) Editor content is copied into a temporary buffer
2.) A special "Save" operation replaces all offending characters with a "?"
    or "\u0123" so current encoding is not violated in the file on disk
3.) Temporary buffer is copied back into the editor (becomes dirty)
4.) Dialog is opened: "Some characters could not be saved in the current 
    encoding. Do you want to (a) Review changes, (b) Save as UTF-8"
5.) Compare editor against saved version (from local history) is opened -- 
    since this is existing functionality, it should be possible by sending 
    a Command so the dependency problem should be solved

Users can now review / edit changes. When they just click "Save" again and not all issues are resolved yet, they can save as UTF-8, which is guaranteed not to lose any data. Once they have successfully saved as UTF-8, they should be able to use Edit > Set Encoding... if they want something other than UTF-8 (this should transcode the file).

Am I missing anything?
Comment 17 Dani Megert CLA 2009-08-07 09:23:02 EDT
>Changing encoding of the entire project doesn't make sense, unless all existing
>files in the project get transcoded, or there is risk of data loss. 
There's no immediate data loss when changing the encoding property on the file/project, but a file might no longer be read correctly afterwards, and saving it then might cause damage.

>I also think that most users don't know or understand encodings.
Exactly, and hence they don't run into this issue too often, so writing too much code/feature work around this is overkill. What we need is
1. a way to go to the problematic characters
2. a way to save the file in a different encoding (UTF-8 being the suggested
   default)
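
The second point could look roughly like the sketch below: encode strictly in the file's charset and fall back to UTF-8 only when that fails. All names are hypothetical, and a real dialog would ask the user before switching encodings; the fallback here happens silently only to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the Eclipse fix itself.
final class SaveWithFallback {
    static final class Result {
        final Charset charset; // charset actually used
        final byte[] bytes;    // encoded file content

        Result(Charset charset, byte[] bytes) {
            this.charset = charset;
            this.bytes = bytes;
        }
    }

    static Result encodeForSave(String text, Charset fileCharset) {
        try {
            return new Result(fileCharset, strictEncode(text, fileCharset));
        } catch (CharacterCodingException e) {
            // Assumption: the user accepted the suggested default, UTF-8.
            return new Result(StandardCharsets.UTF_8, text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Fails with CharacterCodingException instead of silently replacing characters.
    private static byte[] strictEncode(String text, Charset charset) throws CharacterCodingException {
        ByteBuffer buffer = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }
}
```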
Comment 18 Michael Schierl CLA 2009-08-07 13:19:25 EDT
Bug 285922 was marked a "duplicate" of this bug, but please keep in mind that that bug is not about incorrectly pasted text but about files that, once decoded, cannot be encoded again (because different developers used different default encodings but shared the same files via source control).

That means it can happen that you open a file, add a space, cannot save again, because somewhere else there is an offending comment...
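
A small sketch of how such a file arises, under the assumption that the decoder substitutes U+FFFD for bytes it cannot interpret (which is what the Java String constructor does). The class name is hypothetical. Note that the check only tests whether the decoded text can be encoded again, not whether the resulting bytes would be identical to the original file.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Eclipse.
final class RoundTripCheck {
    // Decodes the file bytes with the given charset and reports whether
    // the decoded text could be written back in that same charset.
    static boolean canReencode(byte[] fileBytes, Charset charset) {
        // Undecodable bytes become U+FFFD, which many legacy charsets cannot encode.
        String decoded = new String(fileBytes, charset);
        return charset.newEncoder().canEncode(decoded);
    }
}
```

For example, the UTF-8 bytes of ” read as US-ASCII decode to replacement characters, so the untouched file already fails this check.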

A "save as UTF-8" feature is fine (at least it will let me save it; preferably with an attached UTF-8 BOM so that the file will in all cases be read as UTF-8 later), but a message that tells me when I open the file that it cannot be saved again in this encoding because the content is invalid would still be nice ;-)
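
Attaching a BOM amounts to prefixing the encoded content with the bytes EF BB BF, since Java's UTF-8 encoder does not write one itself. A minimal sketch with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Eclipse.
final class Utf8BomWriter {
    // The UTF-8 encoding of U+FEFF, used as a byte order mark.
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns the UTF-8 encoding of the text, preceded by the BOM.
    static byte[] encodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[BOM.length + body.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(body, 0, out, BOM.length, body.length);
        return out;
    }
}
```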

Save as \u1234 is fine for Java files, but might be suboptimal for HTML files...
Comment 19 Dani Megert CLA 2009-08-10 03:05:49 EDT
>but a message that tells me when I open the file that it cannot be
>saved again in this encoding
That's bug 145754.
Comment 20 Dani Megert CLA 2009-10-19 12:07:38 EDT
Fixed in HEAD.
Available in builds > N20091018-2000.
Comment 21 Dani Megert CLA 2009-10-19 12:15:31 EDT
.
Comment 22 Deepak Azad CLA 2009-10-27 05:19:53 EDT
Verified for 3.6 M3 with I20091026-1442
Comment 23 Renato Silva CLA 2010-02-04 14:40:19 EST
It has been mentioned in bug 261716 #c23 that a fix for that bug may allow a better fix for this one. I think the idea is to allow the compare plugin to contribute a 'compareOpener' to text, so that text can use it in situations like encoding problems (this bug) or out-of-sync files (that bug).

However, I have a suggestion: how about opening a search result highlighting the offending characters? This way the user can fix the wrong chars in the editor itself without opening a new view for that.

I think this can be more natural. Eclipse says there are offending chars, you ask what chars, and Eclipse simply highlights them for you. Then you can delete or replace them in the text editor itself, or you can even use the opened search view to make a batch replace.
Comment 24 Dani Megert CLA 2010-02-05 02:29:57 EST
Search would be another approach. The problem there is that you don't see the diff with what's currently on disk. We could even combine the two.
Comment 25 Renato Silva CLA 2010-02-05 08:06:30 EST
(In reply to comment #24)
> Search would be another approach. The problem there is that you don't see the
> diff with what's currently on disk. We could even combine the two.

The whole point is that there's nothing to compare to. You just want to highlight the offending chars; your changes may include a considerable number of valid chars that would obscure the invalid ones, so you would still feel lost searching for them.

For example, imagine you're refactoring a class, and you change many lines, but you accidentally insert an invalid char. If you use compare, the user will see the diff between the original file and all the unsaved refactoring, and will have to find the char in this diff. However, doesn't it make much more sense not to search for the char at all? That is, you just tell the user exactly which chars they are.

Note: I'm not sure, though, whether one could open a search view for unsaved files. However, as I have explained, I think a search highlight makes much more sense.
Comment 26 Rustam Abdullaev CLA 2010-02-08 04:33:07 EST
I can see the status of this is "Fixed", but what exactly was the fix?
Comment 27 Dani Megert CLA 2010-02-08 05:51:42 EST
>I can see the status of this is "Fixed", but what exactly was the fix?
The dialog now
- provides a way to go to the problematic characters
- allows saving the file in a different encoding (UTF-8 being the suggested
  default)