[platform-core-dev] Re: [eclipse-dev] request for comments - improved file encoding support


Bob, I was not sure you were in the platform-core-dev list, so I am CC'ing you.

1. I will re-word the steps to make it look more algorithmic.  I agree: an encoding interpreter (if any) should run before we try to guess the encoding using the BOM test.
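
A minimal sketch of that ordering, assuming a single XML-style interpreter and only three BOMs. The class and method names (EncodingOrderSketch, interpretXml, detectBom, guess) are hypothetical and not part of the proposal; the interpreter also assumes an ASCII-compatible declaration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingOrderSketch {
    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern XML_ENCODING = Pattern.compile(
        "<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Hypothetical "encoding interpreter" for XML content.
    static Optional<String> interpretXml(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so an ASCII declaration survives decoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = XML_ENCODING.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // BOM test, limited here to UTF-8 and the two UTF-16 byte orders.
    static Optional<String> detectBom(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return Optional.of("UTF-8");
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return Optional.of("UTF-16BE");
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return Optional.of("UTF-16LE");
        }
        return Optional.empty();
    }

    // The interpreter runs first; the BOM test is only a fallback, then the inherited default.
    static String guess(byte[] head, String inheritedDefault) {
        Optional<String> fromInterpreter = interpretXml(head);
        if (fromInterpreter.isPresent()) {
            return fromInterpreter.get();
        }
        return detectBom(head).orElse(inheritedDefault);
    }
}
```

With this ordering, a file that carries both a BOM and an embedded declaration is reported with the declared encoding, and the BOM only matters when no interpreter gives an answer.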

2. I am not sure we need this... in the current proposal, the use case you described is supported - the user can force the encoding (see item 4) to be the one he/she wants, and then save.

3. Encoding change notification was proposed to:
- allow editors to re-load contents with the new encoding if they want/can (probably not if the user already changed the contents);
- allow builders to run again considering the new encoding if they want/can.
This way, if the user gets a source file encoded with a completely incompatible encoding, the compilation of that source file will fail. The user may open the source and find out the reason. Setting the file encoding (e.g. in the resource navigator or package explorer) would fix the editor contents and the compiler errors.

4. The algorithm *is* the description of how getCharset() (not getEncoding(), which will be deprecated) works. setCharset() (not setEncoding()) is intended to support the above use case. Users should be able to fix a file encoding if it is not right for them (and not all text files have embedded encoding descriptions or BOMs). Regarding your example of changing the encoding of a file while it is open (dirty) in an editor: the editor could offer to reload the contents (losing current state), or ignore the change. When saving, the file encoding *should be queried again* and the file contents would be encoded using the new encoding.
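
To illustrate the "query again when saving" behaviour, here is a rough sketch. FakeFile and FakeEditor are hypothetical stand-ins, not the Resources API; only the getCharset()/setCharset() method names are taken from the discussion above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SaveTimeCharsetSketch {
    // Hypothetical stand-in for a file resource with a settable charset.
    static class FakeFile {
        private Charset charset = StandardCharsets.ISO_8859_1;
        private byte[] contents = new byte[0];

        Charset getCharset() { return charset; }
        void setCharset(Charset newCharset) { charset = newCharset; }
        byte[] getContents() { return contents.clone(); }
        void setContents(byte[] bytes) { contents = bytes.clone(); }
    }

    // Hypothetical editor that ignores charset changes while dirty,
    // but asks the file for its charset again at save time.
    static class FakeEditor {
        private final FakeFile file;
        private String text;

        FakeEditor(FakeFile file) {
            this.file = file;
            // The charset is read once here to decode the contents on open.
            this.text = new String(file.getContents(), file.getCharset());
        }

        void type(String more) { text += more; }

        void save() {
            // The charset is not cached from open time; it is queried again here.
            file.setContents(text.getBytes(file.getCharset()));
        }
    }
}
```

If the charset is changed from ISO-8859-1 to UTF-8 while the editor is dirty, the next save() writes UTF-8 bytes without the editor having to react to any notification.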

5. I do not think there will be lots of different encoding interpreters. The Resources plug-in would provide standard encoding interpreters for XML, HTML and other (?) popular files that have embedded encoding configuration. Tools (not only editors) that support other lesser-known formats with embedded encoding configuration may provide their own encoding interpreters as extensions to a new Resources plug-in extension-point. Also, tools may want to associate a specific file extension to an existing "official" encoding interpreter. User intervention would only be needed if: a) there is more than one encoding interpreter for a given file format and the wrong one is being used, or b) there is a specific file extension that could use an existing encoding interpreter but no tool associated it.

6. isDefaultEncoding() is intended to allow the user to find out whether a file encoding is forced or is default (guessed/inherited).

7. Project-level (because teams share projects, not workspaces) and file-level encoding were originally requested. But we believe adding directory-level encoding makes sense for uniformity's sake, and it has good use cases as well (e.g. you may have two source folders in the same project using different encodings).
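
As a sketch of how items 6 and 7 could fit together (forced vs. inherited, with folders taking part in the chain), assuming a simple parent lookup. The Resource class and isDefaultCharset() name below are hypothetical, and the final fallback to the JVM default is an assumption made only for this example.

```java
import java.nio.charset.Charset;

public class InheritedCharsetSketch {
    // Hypothetical resource node: a file, folder, or project.
    static class Resource {
        private final Resource parent;
        private String explicitCharset; // null means "not forced on this resource"

        Resource(Resource parent) { this.parent = parent; }

        void setCharset(String name) { explicitCharset = name; }

        // True when the charset is inherited or guessed rather than forced here.
        boolean isDefaultCharset() { return explicitCharset == null; }

        // Walks up file -> folder -> project until a forced charset is found.
        String getCharset() {
            for (Resource r = this; r != null; r = r.parent) {
                if (r.explicitCharset != null) {
                    return r.explicitCharset;
                }
            }
            // Assumed stand-in for the workspace-level default.
            return Charset.defaultCharset().name();
        }
    }
}
```

Two source folders in one project can then carry different settings, and every file under them inherits the right one without being configured individually.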

Thanks for all suggestions/corrections/use cases. It helps a lot.

Rafael



"Bob Foster" <bob@xxxxxxxxxxxx>
Sent by: eclipse-dev-admin@xxxxxxxxxxx

12/06/2003 05:03 AM
Please respond to eclipse-dev

       
        To:        <eclipse-dev@xxxxxxxxxxx>
        cc:        
        Subject:        Re: [eclipse-dev] request for comments - improved file encoding support



How nice to see a specification! Looks good. It even has a feature - directory-level encoding default - that I haven't seen any requests for but might be useful.
 
The following are comments I posted to bug 37933. Since I'm not sure this is the right place, I've repeated them here.
 
Re: Non-uniform file encodings in the Eclipse Platform
 
Many worthwhile ideas here. Other comments...
 
1. I assume in the "basic algorithm" steps are performed in order listed. In that case, steps 2 and 3 must be interchanged. The encoding interpreter must always be consulted first. Multiple encodings are possible with the same BOM. The result of (current) step 3 should be final. Otherwise, the BOM test should be ignored unless it is inconsistent with the result of step 4 or 5.
 
2. Encoding must be determined upon save as well as open. This determination may require calling an output encoding interpreter, which you do not have in your scheme. (Use case: User has an <?xml encoding declaration in an XML file and changes text of the encoding attribute.) The editor should not be required to track these changes character-by-character and blast off encoding change notifications. In fact, the editor may not be aware of encoding at all. (Use case: Rick Jellife has proposed an encoding declaration that would appear in comments at the beginning of a file.) Instead, an output encoding interpreter should be called at save time. IOW, the "basic algorithm" should be applied at save time, too, using an encoding interpreter that operates on the Unicode text instead of a byte stream.
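
A rough sketch of such an output encoding interpreter, assuming an XML declaration at the very start of the text. The class and method names (OutputInterpreterSketch, charsetForSave, encodeForSave) are hypothetical; the point is only that the interpreter reads Unicode text, not bytes, and runs at save time.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutputInterpreterSketch {
    // Matches the encoding pseudo-attribute of a leading XML declaration.
    private static final Pattern XML_ENCODING = Pattern.compile(
        "^<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Looks at the editor's current text and decides which charset to save with.
    static Charset charsetForSave(String text, Charset fallback) {
        Matcher m = XML_ENCODING.matcher(text);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        // No usable declaration: fall back to whatever the resource reports.
        return fallback;
    }

    // Encodes the text using the charset determined at save time.
    static byte[] encodeForSave(String text, Charset fallback) {
        return text.getBytes(charsetForSave(text, fallback));
    }
}
```

Because the declaration is only inspected when encodeForSave() is called, the editor does not need to track edits to the encoding attribute as the user types.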
 
3. In light of the above, notifying of encoding changes seems of limited value, since it may be re-determined at open/save time. Encoding should be discovered when it is needed. Notification may be counter-productive, leading editors to take actions they should not be taking, like calling setCharset().
 
4. setEncoding() should be removed and the basic algorithm should be the description of how getEncoding() works. setEncoding() is a potential source of problems. For example, if setEncoding() is called on an open resource and the resource is then saved and closed, the resource cannot be re-opened successfully unless the encoding set is remembered. This makes it a resource property, but there is already a resource property that may contain an encoding and the two may be in conflict. What is a valid use of setEncoding()?
 
5. It should be possible for an editor to have associated encoding interpreter(s), so that the user is not forced to set the encoding interpreter and the editor separately. It is highly likely that the user will not be aware of the encoding interpreter feature and will not correctly set it in advance of having encoding problems. In fact, users seem to have problems learning how to set editors associated with extensions, and they already know what an editor is. Likewise, editors should not have to establish their own encoding interpreters programmatically.
 
6. What is the use/purpose of isDefaultEncoding()? There may be several "defaults". If anyone cares that a resource is not using the workspace-level encoding, they should stop caring.
 
7. Workspace-level, resource-level and interpreter-determined are requirements, but I am not convinced there are use cases to support directory-level encoding, and they do add overhead. If the feature exists, someone may find it useful, if that's the threshold.
 
Bob
----- Original Message -----
From: Rafael Chaves
To: eclipse-dev@xxxxxxxxxxx
Sent: Tuesday, June 10, 2003 1:49 PM
Subject: [eclipse-dev] request for comments - improved file encoding support


Hi all,


Platform/Core has started working on improving file encoding support (plan item - bug 37933). The goal is to allow clients to find out which specific encoding should be used when reading the contents of a file using a text stream.


The initial proposal is under the Platform/Core web area (Core Component Planning -> Commited Items). Here is a direct link (may be split in two lines):


http://dev.eclipse.org/viewcvs/index.cgi/%7Echeckout%7E/platform-core-home/plan_encoding_intro.html


This work would involve changing all clients of IFile.get/setContents that use text streams (Java Core/UI, Search, Platform Text, Compare, ...) to use the resource-specific encoding instead of the workspace default encoding, and to react to resource change notifications regarding changes of encoding. We would also need UI (Platform/JDT) for (re)setting/browsing encodings on resources. Also, the current mechanism in Platform Text for setting encodings would have to be retrofitted to work with the new support from Core.


We need feedback from the affected teams on whether this proposal makes sense for their needs and their willingness to adopt/expose the new functionality.


Thanks,


Rafael

