
Re: [eclipse-dev] request for comments - improved file encoding support


Both the PR (37933) and the platform-core-dev list are good places to continue this discussion. Thanks for the comments.

Rafael



"Bob Foster" <bob@xxxxxxxxxxxx>
Sent by: eclipse-dev-admin@xxxxxxxxxxx

12/06/2003 05:03 AM
Please respond to eclipse-dev

To: <eclipse-dev@xxxxxxxxxxx>
cc:
Subject: Re: [eclipse-dev] request for comments - improved file encoding support



How nice to see a specification! Looks good. It even has a feature (directory-level encoding defaults) that I haven't seen any requests for but that might be useful.
 
The following are comments I posted to bug 37933. Since I'm not sure this is the right place, I've repeated them here.
 
Re: Non-uniform file encodings in the Eclipse Platform
 
Many worthwhile ideas here. Other comments...
 
1. I assume the steps in the "basic algorithm" are performed in the order listed. In that case, steps 2 and 3 must be interchanged: the encoding interpreter must always be consulted first, because multiple encodings are possible with the same BOM. The result of (current) step 3 should be final. Otherwise, the BOM test should be ignored unless it is inconsistent with the result of step 4 or 5.
 
2. Encoding must be determined on save as well as on open. This determination may require calling an output encoding interpreter, which your scheme does not have. (Use case: a user has an <?xml encoding declaration in an XML file and changes the text of the encoding attribute.) The editor should not be required to track these changes character by character and fire off encoding change notifications. In fact, the editor may not be aware of encoding at all. (Use case: Rick Jelliffe has proposed an encoding declaration that would appear in comments at the beginning of a file.) Instead, an output encoding interpreter should be called at save time. In other words, the "basic algorithm" should be applied at save time too, using an encoding interpreter that operates on the Unicode text instead of a byte stream.
 
3. In light of the above, notifying of encoding changes seems of limited value, since the encoding may be re-determined at open/save time. Encoding should be discovered when it is needed. Notification may be counter-productive, leading editors to take actions they should not be taking, like calling setCharset().
 
4. setEncoding() should be removed, and the basic algorithm should be the description of how getEncoding() works. setEncoding() is a potential source of problems. For example, if setEncoding() is called on an open resource and the resource is then saved and closed, the resource cannot be re-opened successfully unless the encoding that was set is remembered. That makes it a resource property, but there is already a resource property that may contain an encoding, and the two may conflict. What is a valid use of setEncoding()?
 
5. It should be possible for an editor to have associated encoding interpreter(s), so that the user is not forced to set the encoding interpreter and the editor separately. It is highly likely that the user will not be aware of the encoding interpreter feature and will not correctly set it in advance of having encoding problems. In fact, users seem to have problems learning how to set editors associated with extensions, and they already know what an editor is. Likewise, editors should not have to establish their own encoding interpreters programmatically.
 
6. What is the use/purpose of isDefaultEncoding()? There may be several "defaults". If anyone cares that a resource is not using the workspace-level encoding, they should stop caring.
 
7. Workspace-level, resource-level and interpreter-determined encodings are requirements, but I am not convinced there are use cases to support directory-level encoding, and it does add overhead. If the threshold is merely that someone may find the feature useful once it exists, then it passes.
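A minimal Java sketch of comments 1, 4 and 7 taken together, assuming a hypothetical `EncodingInterpreter` that inspects a file's first bytes and a path-keyed map standing in for resource- and folder-level settings (none of these names come from the proposal). The interpreter is consulted first and its answer is final; otherwise the configured encoding is taken from the file, its nearest folder, or the workspace, and a BOM overrides it only when the two disagree. Everything is computed on demand by `getEncoding`, with no setter.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class EncodingDetermination {

    /** Hypothetical: inspects the first bytes of a file and may declare an encoding. */
    interface EncodingInterpreter {
        Optional<String> interpret(byte[] head);
    }

    /** Charset names a byte order mark is consistent with (only UTF-8 and UTF-16 handled here). */
    static List<String> bomCandidates(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF)
            return List.of("UTF-8");
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
            return List.of("UTF-16BE", "UTF-16");
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE)
            return List.of("UTF-16LE", "UTF-16");
        return List.of();
    }

    /** Resource-level setting, else nearest folder default, else workspace default. */
    static String configured(String path, Map<String, String> settings, String workspaceDefault) {
        for (String p = path; p != null; p = parent(p)) {
            String enc = settings.get(p);
            if (enc != null) return enc;
        }
        return workspaceDefault;
    }

    /** Parent of "/a/b/c" is "/a/b"; the parent of a top-level "/a" is null. */
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i > 0 ? path.substring(0, i) : null;
    }

    /** Computed on demand; there is no setter whose state could drift out of sync. */
    static String getEncoding(String path, byte[] head, EncodingInterpreter interpreter,
                              Map<String, String> settings, String workspaceDefault) {
        // Interpreter first: a declared encoding is final, whatever the BOM says.
        Optional<String> declared = interpreter.interpret(head);
        if (declared.isPresent()) return declared.get();
        String fallback = configured(path, settings, workspaceDefault);
        // The BOM only matters when it contradicts the configured encoding.
        List<String> fromBom = bomCandidates(head);
        boolean consistent = fromBom.isEmpty()
                || fromBom.stream().anyMatch(c -> c.equalsIgnoreCase(fallback));
        return consistent ? fallback : fromBom.get(0);
    }
}
```

Java's `UTF-16` decoder reads the BOM to choose the byte order, which is why it counts as consistent with either UTF-16 BOM here. The walk up the folder chain on every lookup illustrates one kind of overhead that directory-level defaults add.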
 
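Comment 2's output encoding interpreter could look something like this hypothetical sketch: it works on the editor's Unicode text rather than on bytes, so an edited `<?xml ... encoding=...?>` declaration is honored at save time without any change notifications. `OutputEncodingInterpreter`, `XML` and `encodeForSave` are illustrative names, and the regular expression handles only a simple XML declaration at the very start of the text.

```java
import java.nio.charset.Charset;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SaveTimeEncoding {

    /** Hypothetical: inspects Unicode text (not bytes) and may declare an output encoding. */
    interface OutputEncodingInterpreter {
        Optional<String> interpret(CharSequence text);
    }

    // Matches a simple XML declaration at the start of the text and captures the encoding name.
    private static final Pattern XML_DECL = Pattern.compile(
            "<\\?xml\\s[^?>]*?encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Interpreter for XML documents: reads whatever the declaration says right now. */
    static final OutputEncodingInterpreter XML = text -> {
        Matcher m = XML_DECL.matcher(text);
        if (!m.lookingAt()) return Optional.empty();
        return Optional.of(m.group(2));
    };

    /** Re-determines the encoding at save time, then encodes the editor text with it. */
    static byte[] encodeForSave(String text, OutputEncodingInterpreter interpreter, String fallback) {
        String charsetName = interpreter.interpret(text).orElse(fallback);
        return text.getBytes(Charset.forName(charsetName));
    }
}
```

If the user changes the declaration to `encoding="UTF-8"` and saves, the next call simply returns the new name; nothing had to be tracked while typing.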
Bob
----- Original Message -----
From: Rafael Chaves
To: eclipse-dev@xxxxxxxxxxx
Sent: Tuesday, June 10, 2003 1:49 PM
Subject: [eclipse-dev] request for comments - improved file encoding support


Hi all,


Platform/Core has started working on improving file encoding support (plan item - bug 37933). The goal is to allow clients to find out which specific encoding should be used when reading the contents of a file using a text stream.


The initial proposal is under the Platform/Core web area (Core Component Planning -> Committed Items). Here is a direct link (it may be split across two lines):


http://dev.eclipse.org/viewcvs/index.cgi/%7Echeckout%7E/platform-core-home/plan_encoding_intro.html


This work would involve changing all clients of IFile.get/setContents that use text streams (Java Core/UI, Search, Platform Text, Compare, ...) to use the resource-specific encoding instead of the workspace default encoding, and to react to resource change notifications about encoding changes. We would also need UI (Platform/JDT) for (re)setting and browsing encodings on resources. Also, the current mechanism in Platform Text for setting encodings would have to be retrofitted to work with the new support from Core.
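For a client, the change mostly amounts to passing an explicit charset when wrapping the stream returned by `IFile.getContents()`. A sketch, assuming the charset name has already been obtained from the new per-resource query (the helper name `readText` is hypothetical): the one-argument `InputStreamReader` constructor would silently use the JVM's default charset instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class TextContents {
    /** Hypothetical helper: decodes a whole stream using the resource's own charset. */
    static String readText(InputStream in, String charsetName) {
        // Explicit charset; new InputStreamReader(in) would use the JVM default instead.
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            for (int n; (n = reader.read(buffer)) != -1; ) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same two bytes `C3 A9` decode to `é` under UTF-8 but to `Ã©` under ISO-8859-1, which is the kind of mismatch a per-resource encoding is meant to prevent.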


We need feedback from the affected teams on whether this proposal makes sense for their needs and their willingness to adopt/expose the new functionality.


Thanks,


Rafael

