How nice to see a specification! Looks good. It
even has a feature - directory-level encoding default - that I haven't seen any
requests for but might be useful.
The following are comments I posted to bug 37933.
Since I'm not sure this is the right place, I've repeated them
here.
Re: Non-uniform file encodings in the Eclipse
Platform
Many worthwhile ideas here. Other
comments...
1. I assume the steps in the "basic algorithm" are
performed in the order listed. In that case, steps 2 and 3 must be interchanged. The
encoding interpreter must always be consulted first. Multiple encodings are
possible with the same BOM. The result of (current) step 3 should be final.
Otherwise, the BOM test should be ignored unless it is inconsistent with the
result of step 4 or 5.
2. Encoding must be determined upon save as well as
open. This determination may require calling an output encoding interpreter,
which you do not have in your scheme. (Use case: User has an <?xml encoding
declaration in an XML file and changes text of the encoding attribute.) The
editor should not be required to track these changes character-by-character and
blast off encoding change notifications. In fact, the editor may not be aware of
encoding at all. (Use case: Rick Jelliffe has proposed an encoding declaration
that would appear in comments at the beginning of a file.) Instead, an output
encoding interpreter should be called at save time. IOW, the "basic algorithm"
should be applied at save time, too, using an encoding interpreter that operates
on the Unicode text instead of a byte stream.
3. In light of the above, notification of encoding
changes seems of limited value, since the encoding may be re-determined at open/save time.
Encoding should be discovered when it is needed. Notification may be
counter-productive, leading editors to take actions they should not be taking,
like calling setCharset().
4. setEncoding() should be removed and the basic
algorithm should be the description of how getEncoding() works. setEncoding() is
a potential source of problems. For example, if setEncoding() is called on an
open resource and the resource is then saved and closed, the resource cannot be
re-opened successfully unless the encoding set is remembered. This makes it a
resource property, but there is already a resource property that may contain an
encoding and the two may be in conflict. What is a valid use of
setEncoding()?
5. It should be possible for an editor to have
associated encoding interpreter(s), so that the user is not forced to set the
encoding interpreter and the editor separately. It is highly likely that the
user will not be aware of the encoding interpreter feature and will not
correctly set it in advance of having encoding problems. In fact, users seem to
have problems learning how to set editors associated with extensions, and they
already know what an editor is. Likewise, editors should not have to establish
their own encoding interpreters programmatically.
6. What is the use/purpose of isDefaultEncoding()?
There may be several "defaults". If anyone cares that a resource is not using
the workspace-level encoding, they should stop caring.
7. Workspace-level, resource-level and
interpreter-determined encodings are requirements, but I am not convinced there are use
cases to support directory-level encoding, and it does add overhead. If the
feature exists, someone may find it useful, if that's the
threshold.
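To make comment 2 concrete, here is a rough Java sketch of what I mean by applying the determination at save time with an interpreter that reads the Unicode text rather than a byte stream. Every name in it (SaveTimeEncodingSketch, interpretText, encodingForSave, save) is hypothetical and not part of the proposal. The fallback order (interpreter, then resource-level setting, then workspace default) is my assumption about the remaining steps, and the interpreter only understands the XML declaration.

```java
import java.nio.charset.Charset;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch only; none of these names come from the proposal. */
final class SaveTimeEncodingSketch {

    // Matches the encoding pseudo-attribute of an XML declaration at the very
    // start of the text (an optional U+FEFF is tolerated).
    private static final Pattern XML_DECL = Pattern.compile(
            "^\\uFEFF?<\\?xml\\s[^>]*?\\bencoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Output encoding interpreter: works on decoded text, not on bytes. */
    static Optional<String> interpretText(CharSequence text) {
        Matcher m = XML_DECL.matcher(text);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    /**
     * Determines the encoding at save time. The interpreter is consulted first
     * and its answer is final; otherwise fall back to the resource-level
     * setting (may be null), then to the workspace default (assumed order).
     */
    static String encodingForSave(CharSequence text, String resourceSetting,
                                  String workspaceDefault) {
        Optional<String> declared = interpretText(text);
        if (declared.isPresent()) {
            return declared.get();
        }
        return resourceSetting != null ? resourceSetting : workspaceDefault;
    }

    /** Encodes the text; Charset.forName throws if the declared name is unknown. */
    static byte[] save(String text, String resourceSetting, String workspaceDefault) {
        String name = encodingForSave(text, resourceSetting, workspaceDefault);
        return text.getBytes(Charset.forName(name));
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e9</a>";
        // The declaration wins over the workspace default.
        System.out.println(encodingForSave(doc, null, "UTF-8"));
        // After the user edits the attribute, the next save simply follows it.
        String edited = doc.replace("ISO-8859-1", "UTF-8");
        System.out.println(encodingForSave(edited, null, "ISO-8859-1"));
    }
}
```

The point of the sketch is that the editor never has to watch the declaration or send change notifications. Whatever the text says at the moment of saving is what gets used.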
Bob
----- Original Message -----
Sent: Tuesday, June 10, 2003 1:49 PM
Subject: [eclipse-dev] request for comments - improved file encoding support
Hi all,
Platform/Core has started working on improving file
encoding support (plan item - bug 37933). The goal is to allow clients to find
out which specific encoding should be used when reading the contents of a file
using a text stream.
The initial
proposal is under the Platform/Core web area (Core Component Planning ->
Commited Items). Here is a direct link (may be split in two lines):
http://dev.eclipse.org/viewcvs/index.cgi/%7Echeckout%7E/platform-core-home/plan_encoding_intro.html
This work would involve changing all
clients of IFile.get/setContents that use text streams (Java Core/UI,
Search, Platform Text, Compare, ...) to, instead of using the workspace
default encoding, use the resource specific encoding, and to react to resource
change notifications regarding changes of encoding. We would also need UI
(Platform/JDT) for (re)setting/browsing encodings on resources. Also, the
current mechanism in Platform Text for setting encodings would have to be
retrofitted to work with the new support from Core.
We need feedback from the affected teams on whether
this proposal makes sense for their needs and their willingness to
adopt/expose the new functionality.
Thanks,
Rafael