
[platform-core-dev] Re: [eclipse-dev] request for comments - improved file encoding support


I am not a UI guy, but just so we can have the same perspective, this is how I would imagine the UI for encoding:

- "Edit->Encoding" would set the encoding of the current text file being edited (enabled only if the editor is not dirty). Today, this is implemented only in the text editor world. It would be changed to call IResource#setCharset.
- a new "Encoding" action would be available in the context menus for the resource navigator and package explorer. This action could be applied to the selected resource(s) (project, folder, file) and their children. Applying this action to a file currently being edited would have exactly the same effect as the "Edit->Encoding" option.
- as has already been suggested in this list, we could have an "Encoding" setting in the "Save as" dialog. When saving, it would call #setCharset, and then #setContents.
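The "Save as" ordering in the last bullet (set the charset, then write the contents encoded with it) can be sketched in plain Java. `FileModel` and `SaveAsWithEncoding` are hypothetical in-memory stand-ins, not the Eclipse API. Only the method names setCharset/getCharset/setContents echo the proposal, and the UTF-8 fallback is an assumption made for brevity.

```java
import java.nio.charset.Charset;

// Hypothetical in-memory stand-in for a workspace file; NOT the Eclipse API.
class FileModel {
    private String explicitCharset;          // null means "not explicitly set"
    private byte[] contents = new byte[0];

    void setCharset(String charset) { this.explicitCharset = charset; }

    // Assumption: a fixed UTF-8 fallback stands in for content/parent lookup.
    String getCharset() { return explicitCharset != null ? explicitCharset : "UTF-8"; }

    void setContents(byte[] bytes) { this.contents = bytes.clone(); }

    byte[] getContents() { return contents.clone(); }
}

class SaveAsWithEncoding {
    // "Save as" with an encoding choice: record the charset first, then
    // encode the editor text with that same charset and write the bytes.
    static void saveAs(FileModel target, String text, String chosenCharset) {
        target.setCharset(chosenCharset);
        Charset cs = Charset.forName(target.getCharset());
        target.setContents(text.getBytes(cs));
    }

    // Reading back uses whatever charset the file now reports.
    static String read(FileModel file) {
        return new String(file.getContents(), Charset.forName(file.getCharset()));
    }
}
```

Because the charset is recorded before the bytes are written, a later open finds the file readable without further user intervention.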

But, again, this is not a plan for the UI. It might end up being something quite different. For now, we are gathering requirements only at the Core level. But someone from the Platform/JDT UI teams may want to comment on this.

Regarding the scenario of changing the encoding declared inside an XML file: editors may want to query the current encoding again after setting the file contents. If it differs, the editor could ask whether the user wants to reload the contents with the new encoding. The user would then have two alternatives: change the XML declaration, save, and confirm that he/she wants the newly discovered encoding (if the editor supports this); or, if the editor does not re-query the encoding after saving, change the XML declaration, save, close, and reopen the file.
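Here is a minimal sketch of the re-query-after-save idea. It assumes a hypothetical `XmlFile` whose charset comes only from its XML declaration (UTF-8 if there is none) and a hypothetical `EditorSaveFlow` helper. The regular expression is a simplification, not a conforming XML declaration parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical XML file whose charset is derived from its contents.
class XmlFile {
    private static final Pattern DECL =
        Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");
    private byte[] contents = new byte[0];

    void setContents(byte[] bytes) { contents = bytes.clone(); }

    // Content-based charset: scan the bytes as Latin-1 (ASCII-compatible) for
    // a declaration; fall back to UTF-8, the XML default, when none is found.
    String getCharset() {
        String head = new String(contents, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        return m.find() ? m.group(1) : "UTF-8";
    }
}

class EditorSaveFlow {
    // Saves with the encoding the editor was using, then re-queries the file.
    // Returns true when the encoding changed, i.e. when the editor should ask
    // whether to reload the contents with the newly discovered encoding.
    static boolean saveAndCheck(XmlFile file, String text, String editorCharset) {
        file.setContents(text.getBytes(Charset.forName(editorCharset)));
        return !file.getCharset().equalsIgnoreCase(editorCharset);
    }
}
```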

>If setCharset() is called, the charset property for the resource should be changed in sync.
> If Eclipse writes a file, it should be able to read it correctly the next time it is opened
>without further user intervention.

Definitely, this is what we expect. For a file that had its charset explicitly specified with setCharset(), that charset will be returned by getCharset(). Otherwise, it will be based on the contents or on the parent's charset. In these cases, the value returned by getCharset() may vary if the contents change (like changing the encoding declared in an XML document), the file is moved to a different parent, or the parent has a different charset.
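The resolution order described above (explicit setting, then contents, then parent) can be modeled as a walk up the resource tree. `Res` and its methods are hypothetical. The content detector stands in for whatever encoding interpreter applies to the file, and `isDefaultCharset()` mirrors the proposed isDefaultEncoding(), which distinguishes an explicit setting from a guessed or inherited one.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical resource tree node (workspace root, project, folder or file).
class Res {
    private final Res parent;                 // null for the workspace root
    private final String workspaceDefault;    // only used by the root
    private String explicitCharset;           // set via setCharset(), else null
    private Supplier<Optional<String>> contentDetector = Optional::empty;

    Res(Res parent, String workspaceDefault) {
        this.parent = parent;
        this.workspaceDefault = workspaceDefault;
    }

    void setCharset(String charset) { explicitCharset = charset; }

    // Files may describe their own encoding (XML declaration, BOM, ...).
    void setContentDetector(Supplier<Optional<String>> detector) { contentDetector = detector; }

    // Explicit setting first, then the contents, then the parent chain
    // up to the workspace default at the root.
    String getCharset() {
        if (explicitCharset != null) return explicitCharset;
        Optional<String> fromContents = contentDetector.get();
        if (fromContents.isPresent()) return fromContents.get();
        return parent != null ? parent.getCharset() : workspaceDefault;
    }

    // True when the charset is guessed or inherited rather than forced.
    boolean isDefaultCharset() { return explicitCharset == null; }
}
```

Since nothing is cached, moving a file to another parent or changing a parent's charset is reflected the next time getCharset() is called. This matches the behavior described above.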

> If this is done, the property alone suffices; don't need yet
> another place to record the charset.

I didn't understand this statement. The charset info can be set in only one place: the IResource object. We will not store any charset info unless it has been explicitly set.

Rafael



"Bob Foster" <bob@xxxxxxxxxxxx>

12/06/2003 10:33 PM

       
        To:        <platform-core-dev@xxxxxxxxxxx>, Rafael Chaves/Ottawa/IBM@IBMCA
        cc:        
        Subject:        Re: [eclipse-dev] request for comments - improved file encoding support



Ok, I subscribed to platform-core-dev. Thanks for replying directly. I hope you don't get too many copies of this!
 
>2. I am not sure we need [output encoding interpreter]... in the current proposal, the use case you described is supported - the user can force the encoding (see item 4) to be the one he/she wants, and then save.
You absolutely do need it. The user should not need to specify the encoding for an XML document twice, and getting the encoding in the document out of sync with the encoding on the file system is a recipe for disaster; this won't interoperate with anyone.
 
If setCharset() is called, the charset property for the resource should be changed in sync. If Eclipse writes a file, it should be able to read it correctly the next time it is opened without further user intervention. If this is done, the property alone suffices; don't need yet another place to record the charset.
 
I'm sorry, but you keep talking about the user interface, yet you describe only the API. I can pretty much guarantee that no one else knows exactly what UI you intend. Please add to the document the user interface you are proposing: menu commands, property pages, preferences, etc. The UI that makes sense to me is Open with encoding... and Save with encoding....
 
Bob
 
----- Original Message -----
From: Rafael Chaves
To: platform-core-dev@xxxxxxxxxxx
Cc: Bob Foster
Sent: Thursday, June 12, 2003 9:03 AM
Subject: Re: [eclipse-dev] request for comments - improved file encoding support


Bob, I was not sure you were in the platform-core-dev list, so I am CC'ing you.


1. I will re-word the steps to make them look more algorithmic. I agree: an encoding interpreter (if any) should run before we try to guess the encoding using the BOM test.


2. I am not sure we need this... in the current proposal, the use case you described is supported - the user can force the encoding (see item 4) to be the one he/she wants, and then save.


3. Encoding change notification was proposed to:

- allow editors to re-load contents with the new encoding if they want/can (probably not if the user already changed the contents);

- allow builders to run again considering the new encoding if they want/can.

This way, if the user gets a source file encoded with a completely incompatible encoding, the compilation of that source file will fail. The user may open the source and find out the reason. Setting the file encoding (e.g. in the resource navigator or package explorer) would fix the editor contents and the compiler errors.


4. The algorithm *is* the description of how getCharset() (not getEncoding(), which will be deprecated) works. setCharset() (not setEncoding()) is intended to support the above use case. Users should be able to fix a file's encoding if it is not right for them (not all text files have embedded encoding descriptions or BOMs). Regarding your example of changing the encoding of a file while it is open (dirty) in an editor: the editor could offer to reload the contents (losing the current state) or ignore the change. When saving, the file encoding *should be queried again*, and the file contents would be encoded using the new encoding.


5. I do not think there will be many different encoding interpreters. The Resources plug-in would provide standard encoding interpreters for XML, HTML and other (?) popular file types that have embedded encoding configuration. Tools (not only editors) that support other, lesser-known formats with embedded encoding configuration may provide their own encoding interpreters as extensions to a new Resources plug-in extension point. Also, tools may want to associate a specific file extension with an existing "official" encoding interpreter. User intervention would only be needed if: a) there is more than one encoding interpreter for a given file format and the wrong one is being used, or b) there is a specific file extension that could use an existing encoding interpreter but no tool has associated it.


6. isDefaultEncoding() is intended to allow the user to find out whether a file encoding is forced or is default (guessed/inherited).


7. Project-level (because teams share projects, not workspaces) and file-level encodings were originally requested. But we believe adding directory-level encoding makes sense for uniformity's sake, and there are good use cases for it as well (e.g. you may have two source folders in the same project using different encodings).
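For item 3, here is a sketch of how change notification might let an editor reload (only when it has no unsaved changes) and a builder queue a rebuild. `CharsetStore`, `CharsetChangeListener`, `EditorStub` and `BuilderStub` are hypothetical. The proposal routes this through resource change notifications, whose exact form is not specified here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical listener interface for encoding changes.
interface CharsetChangeListener {
    void charsetChanged(String path, String oldCharset, String newCharset);
}

// Hypothetical store of per-file charsets that notifies listeners on change.
class CharsetStore {
    private final Map<String, String> charsets = new HashMap<>();
    private final List<CharsetChangeListener> listeners = new ArrayList<>();

    void addListener(CharsetChangeListener l) { listeners.add(l); }

    void setCharset(String path, String charset) {
        String old = charsets.put(path, charset);
        if (Objects.equals(old, charset)) return;   // no change, no event
        for (CharsetChangeListener l : listeners) l.charsetChanged(path, old, charset);
    }
}

// An editor reloads only if it would not lose unsaved changes.
class EditorStub implements CharsetChangeListener {
    final String path;
    boolean dirty;
    int reloads;

    EditorStub(String path) { this.path = path; }

    public void charsetChanged(String p, String oldCs, String newCs) {
        if (p.equals(path) && !dirty) reloads++;
    }
}

// A builder queues the file for recompilation with the new charset.
class BuilderStub implements CharsetChangeListener {
    final Set<String> pending = new LinkedHashSet<>();

    public void charsetChanged(String p, String oldCs, String newCs) { pending.add(p); }
}
```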
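Item 4 says the encoding should be queried again at save time. The hypothetical `TextFile` and `TextEditor` classes below illustrate this: the editor decodes with the charset it saw at open, but when saving it encodes with whatever the file reports at that moment, so a setCharset() made while the editor was dirty is honored.

```java
import java.nio.charset.Charset;

// Hypothetical file with a mutable charset setting.
class TextFile {
    String charset;
    byte[] bytes;

    TextFile(String charset, byte[] bytes) {
        this.charset = charset;
        this.bytes = bytes;
    }

    String getCharset() { return charset; }
}

// Hypothetical editor: decodes on open, but never caches the charset for saving.
class TextEditor {
    private final TextFile file;
    String text;

    TextEditor(TextFile file) {
        this.file = file;
        this.text = new String(file.bytes, Charset.forName(file.getCharset()));
    }

    // The encoding is queried again here rather than reused from open time.
    void save() {
        file.bytes = text.getBytes(Charset.forName(file.getCharset()));
    }
}
```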
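For item 5, here is a sketch of an interpreter registry keyed by file extension, standing in for the proposed Resources extension point. `EncodingInterpreter`, `InterpreterRegistry`, `register` and `associate` are hypothetical names. Associating an extra extension with an existing interpreter corresponds to case b) in item 5.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical encoding interpreter: inspects raw bytes, may find an encoding.
interface EncodingInterpreter {
    Optional<String> detect(byte[] contents);
}

// Hypothetical registry standing in for the proposed extension point.
class InterpreterRegistry {
    private final Map<String, EncodingInterpreter> byId = new HashMap<>();
    private final Map<String, String> idByExtension = new HashMap<>();

    // A plug-in contributes an interpreter under an id (e.g. "xml").
    void register(String id, EncodingInterpreter interpreter) { byId.put(id, interpreter); }

    // A tool associates a file extension with an already registered interpreter.
    void associate(String extension, String interpreterId) {
        if (!byId.containsKey(interpreterId))
            throw new IllegalArgumentException("unknown interpreter: " + interpreterId);
        idByExtension.put(extension, interpreterId);
    }

    // Looks up the interpreter for a file name by its last extension, if any.
    Optional<EncodingInterpreter> forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return Optional.empty();
        String id = idByExtension.get(fileName.substring(dot + 1));
        return Optional.ofNullable(id == null ? null : byId.get(id));
    }
}
```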


Thanks for all suggestions/corrections/use cases. It helps a lot.


Rafael



"Bob Foster" <bob@xxxxxxxxxxxx>
Sent by: eclipse-dev-admin@xxxxxxxxxxx

12/06/2003 05:03 AM
Please respond to eclipse-dev

       
       To:        <eclipse-dev@xxxxxxxxxxx>

       cc:        

       Subject:        Re: [eclipse-dev] request for comments - improved file encoding support




How nice to see a specification! Looks good. It even has a feature - directory-level encoding default - that I haven't seen any requests for but might be useful.

 

The following are comments I posted to bug 37933. Since I'm not sure this is the right place, I've repeated them here.

 

Re: Non-uniform file encodings in the Eclipse Platform

 

Many worthwhile ideas here. Other comments...

 

1. I assume the steps in the "basic algorithm" are performed in the order listed. In that case, steps 2 and 3 must be interchanged. The encoding interpreter must always be consulted first, because multiple encodings are possible with the same BOM. The result of (current) step 3 should be final. Otherwise, the BOM test should be ignored unless it is inconsistent with the result of step 4 or 5.

 

2. Encoding must be determined upon save as well as upon open. This determination may require calling an output encoding interpreter, which your scheme does not have. (Use case: the user has an <?xml encoding declaration in an XML file and changes the text of the encoding attribute.) The editor should not be required to track these changes character by character and blast off encoding change notifications. In fact, the editor may not be aware of encoding at all. (Use case: Rick Jelliffe has proposed an encoding declaration that would appear in comments at the beginning of a file.) Instead, an output encoding interpreter should be called at save time. IOW, the "basic algorithm" should be applied at save time, too, using an encoding interpreter that operates on the Unicode text instead of a byte stream.

 

3. In light of the above, notifying of encoding changes seems of limited value, since the encoding may be re-determined at open/save time. Encoding should be discovered when it is needed. Notification may be counter-productive, leading editors to take actions they should not be taking, like calling setCharset().

 

4. setEncoding() should be removed and the basic algorithm should be the description of how getEncoding() works. setEncoding() is a potential source of problems. For example, if setEncoding() is called on an open resource and the resource is then saved and closed, the resource cannot be re-opened successfully unless the encoding set is remembered. This makes it a resource property, but there is already a resource property that may contain an encoding and the two may be in conflict. What is a valid use of setEncoding()?

 

5. It should be possible for an editor to have associated encoding interpreter(s), so that the user is not forced to set the encoding interpreter and the editor separately. It is highly likely that the user will not be aware of the encoding interpreter feature and will not correctly set it in advance of having encoding problems. In fact, users seem to have problems learning how to set editors associated with extensions, and they already know what an editor is. Likewise, editors should not have to establish their own encoding interpreters programmatically.

 

6. What is the use/purpose of isDefaultEncoding()? There may be several "defaults". If anyone cares that a resource is not using the workspace-level encoding, they should stop caring.

 

7. Workspace-level, resource-level and interpreter-determined encodings are requirements, but I am not convinced there are use cases to support directory-level encoding, and it does add overhead. If the feature exists, someone may find it useful, if that's the threshold.
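Item 1 above asks that the encoding interpreter be consulted before the BOM test, and Rafael's reply agrees. The sketch below implements a simplified version of that order: the interpreter's answer is final, the BOM is a fallback, and the inherited charset is the last resort. `EncodingDetector` is hypothetical. It recognizes only the UTF-8 and UTF-16 BOMs and omits the BOM-versus-later-steps consistency check described in item 1.

```java
import java.util.Optional;
import java.util.function.Function;

class EncodingDetector {
    // Recognizes the UTF-8 and UTF-16 byte order marks (UTF-32 omitted).
    static Optional<String> fromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF)
            return Optional.of("UTF-8");
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
            return Optional.of("UTF-16BE");
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE)
            return Optional.of("UTF-16LE");
        return Optional.empty();
    }

    // Interpreter first: its answer is final. The BOM is only a fallback,
    // and the inherited/default charset is the last resort.
    static String detect(byte[] contents,
                         Function<byte[], Optional<String>> interpreter,
                         String inherited) {
        Optional<String> declared = interpreter.apply(contents);
        if (declared.isPresent()) return declared.get();
        return fromBom(contents).orElse(inherited);
    }
}
```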
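The "output encoding interpreter" in item 2 can be sketched as the same detection applied to the editor's Unicode text at save time. `XmlOutputInterpreter` and `SaveTimeEncoder` are hypothetical. The pattern only handles an XML declaration at the very start of the text; a comment-based declaration such as the one mentioned in item 2 would need its own interpreter.

```java
import java.nio.charset.Charset;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical output interpreter: reads Unicode text instead of bytes.
class XmlOutputInterpreter {
    private static final Pattern DECL = Pattern.compile(
        "\\A<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Optional<String> detect(String text) {
        Matcher m = DECL.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}

class SaveTimeEncoder {
    // Applies the detection at save time: the declaration in the text wins;
    // otherwise the charset the file already has is kept.
    static byte[] encodeForSave(String text, String currentCharset) {
        String charset = XmlOutputInterpreter.detect(text).orElse(currentCharset);
        return text.getBytes(Charset.forName(charset));
    }
}
```

With this approach, the editor never has to track edits to the declaration. The declaration's current value is read once, when the file is written.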

 

Bob

----- Original Message -----
From: Rafael Chaves
To: eclipse-dev@xxxxxxxxxxx
Sent: Tuesday, June 10, 2003 1:49 PM
Subject: [eclipse-dev] request for comments - improved file encoding support


Hi all,


Platform/Core has started working on improving file encoding support (plan item - bug 37933). The goal is to allow clients to find out which specific encoding should be used when reading the contents of a file using a text stream.


The initial proposal is under the Platform/Core web area (Core Component Planning -> Committed Items). Here is a direct link (it may be split across two lines):


http://dev.eclipse.org/viewcvs/index.cgi/%7Echeckout%7E/platform-core-home/plan_encoding_intro.html


This work would involve changing all clients of IFile.get/setContents that use text streams (Java Core/UI, Search, Platform Text, Compare, ...) so that they use the resource-specific encoding instead of the workspace default encoding, and react to resource change notifications about encoding changes. We would also need UI (Platform/JDT) for setting, resetting, and browsing encodings on resources. Also, the current mechanism in Platform Text for setting encodings would have to be retrofitted to work with the new support from Core.

We need feedback from the affected teams on whether this proposal makes sense for their needs and their willingness to adopt/expose the new functionality.


Thanks,


Rafael

