Bug 34421 - [Encodings] Encodings inconsistent between pref and editor
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: UI
Version: 2.1
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: 3.1
Assignee: Tod Creasey CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 22016
Reported: 2003-03-10 14:44 EST by Nick Edgar CLA
Modified: 2022-01-28 10:49 EST

Attachments
list of commonly used charset names (1.93 KB, text/plain)
2004-07-15 06:08 EDT, David Williams CLA

Description Nick Edgar CLA 2003-03-10 14:44:10 EST
RC1 build I20030307

The list of encodings in the Workbench / Editors pref page is:
Cp1252 
ISO-8859-1
US-ASCII
UTF-16
UTF-16BE
UTF-16LE
UTF-8

The Edit/Encoding menu has:
Cp1252 (Default)
ASCII
Latin 1
UTF-8
UTF-16 (big endian)
UTF-16 (little endian)
UTF-16

The pref uses the machine-readable encoding names.  It would be better to use 
human-readable names, as the Edit/Encoding menu does.

Cp1252 is the default here, and may differ in other locales.
The combo in the pref does not indicate (Default) like the Edit menu does.

We should also add more default encodings.
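
For illustration, the "(Default)" marker that the Edit/Encoding menu shows could be derived for the pref combo roughly as follows. This is a minimal sketch, not the Eclipse implementation: the class and method names are hypothetical, and it assumes Charset.defaultCharset() (Java 5 and later) stands in for whatever the workbench treats as the default encoding.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Eclipse: appends "(Default)" to the entry
// that resolves to the given default charset.
public class DefaultEncodingMarker {

    static List<String> withDefaultMarker(List<String> names, Charset platformDefault) {
        List<String> labels = new ArrayList<>();
        for (String name : names) {
            // Resolve through Charset so that aliases such as Cp1252 compare
            // equal to their canonical charset; unsupported names never match.
            boolean isDefault = Charset.isSupported(name)
                    && Charset.forName(name).equals(platformDefault);
            labels.add(isDefault ? name + " (Default)" : name);
        }
        return labels;
    }

    public static void main(String[] args) {
        // The machine-readable names listed on the pref page in this report.
        List<String> prefNames = Arrays.asList(
                "Cp1252", "ISO-8859-1", "US-ASCII",
                "UTF-16", "UTF-16BE", "UTF-16LE", "UTF-8");
        for (String label : withDefaultMarker(prefNames, Charset.defaultCharset())) {
            System.out.println(label);
        }
    }
}
```

Which entry gets the marker depends on the JVM and locale, which is the point made above about Cp1252.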
Comment 1 Nick Edgar CLA 2003-03-10 14:44:32 EST
Defer to 2.2.
Comment 2 Nick Edgar CLA 2003-03-10 14:47:24 EST
Should contribute the available encodings (and their human-readable names) via 
XML.  That way, translation packs could add extra entries.
UI and Text would then always be consistent.
Comment 3 Michael Van Meekeren CLA 2004-05-25 11:53:50 EDT
Kai, where do you get your list of encodings from?
Tod, has anyone complained about this?
Comment 4 Tod Creasey CLA 2004-05-25 12:19:12 EDT
Yes - there is a lot of buzz around encodings. Andre has restored some of this 
of late so we should recheck.
Comment 5 Nick Edgar CLA 2004-05-26 14:20:19 EDT
There really should be an extension point for the supported encodings, used in
both the Editors pref page and the text editors' encoding menu.
Should consider this for post-3.0.

Comment 6 Tod Creasey CLA 2004-05-27 08:26:14 EDT
Recheck and then mark later
Comment 7 Tod Creasey CLA 2004-05-27 13:34:21 EDT
These are the same now except that the editor uses Latin-1 as the title for 
8859-1.
Comment 8 Nick Edgar CLA 2004-05-27 16:41:39 EDT
Reopening to address the extension point and naming consistency problem for
post-3.0.
Comment 9 Nick Edgar CLA 2004-05-27 16:42:29 EDT
Reassigning to Text component owner since they have owned the encoding problem
of late.  Will be happy to discuss a solution.
Comment 10 Dani Megert CLA 2004-05-28 06:31:48 EDT
Someone needs to define the default set of charsetNames. The NLSed display
string can then be obtained via Charset.forName(String).displayName().

Since a plug-in writer can create a text editor without using our Platform/Text
framework this list should be provided by Platform/UI component as did the
encoding preference UI in 2.1. Platform/Text simply copied the list because
there was no API.
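
To make the suggestion concrete, a lookup of display strings for a default set of charset names might look like the sketch below. The class name, method name and the list itself are assumptions for illustration only; the only APIs relied on are Charset.isSupported, Charset.forName and Charset.displayName(Locale) from java.nio.charset. Note that the JDK documents the default displayName implementation as returning the canonical name, so a localized string is not guaranteed.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Eclipse: maps charset names to the
// display names reported by the running JVM.
public class EncodingLabels {

    static Map<String, String> labels(List<String> charsetNames, Locale locale) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps list order
        for (String name : charsetNames) {
            if (!Charset.isSupported(name)) {
                continue; // skip names this JVM cannot resolve
            }
            result.put(name, Charset.forName(name).displayName(locale));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed default set; all six are charsets every Java platform must support.
        List<String> defaults = Arrays.asList(
                "ISO-8859-1", "US-ASCII", "UTF-16", "UTF-16BE", "UTF-16LE", "UTF-8");
        labels(defaults, Locale.getDefault())
                .forEach((name, label) -> System.out.println(name + " -> " + label));
    }
}
```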
Comment 11 Tod Creasey CLA 2004-05-28 08:42:46 EDT
We should add this API in 3.1. The list is currently the same (except for the 
label in the text editor list).
Comment 12 Tod Creasey CLA 2004-05-28 08:43:11 EDT
Marking LATER as this is an API request
Comment 13 Nick Edgar CLA 2004-05-28 14:07:32 EDT
Dani is referring to the Charset type in java.nio.charset (new in 1.4), not
CharSet in java.text.
Comment 14 Tod Creasey CLA 2004-06-28 11:28:23 EDT
Reopening now that 3.0 has shipped
Comment 15 Tod Creasey CLA 2004-07-14 14:37:24 EDT
Dani, is the workbench low enough for you - i.e. do you need this list in 
jface.text? If the workbench is not low enough, we will have to put it in Core.
Comment 16 Dani Megert CLA 2004-07-15 04:33:40 EDT
It's OK to have it in a UI layer since this is really just an incomplete list of
most important encodings to be presented to the user and it makes no sense to
have it in a non-UI layer since it is not the complete list of valid encodings.

There should be an extension point which enables clients to add encodings to
that list.
Comment 17 Andre Weinand CLA 2004-07-15 05:05:27 EDT
I'm not really sure why we need an extension point for contributing charset names:

- there seems to be API for getting the complete list of charset names of a Java 
   implementation: java.nio.charset.Charset.availableCharsets() and as Dani has pointed out
   Charset.forName(String).displayName() would return the UI name.

- it only makes sense to contribute more charset names, if it is also possible to contribute
  the implementation of a charset too. Without this we would always get
  UnsupportedEncodingExceptions.

Or am I missing something?
Comment 18 Dani Megert CLA 2004-07-15 05:23:52 EDT
The list you get is too long in my opinion. As for the extension point: assume
there's a plug-in for some programming language or tool or editor that
needs/uses one or several specific encodings heavily (e.g. the encoding
specified by Java for *.properties files) but they are not in our list. The
extension point enables them to add those encodings.
Comment 19 Andre Weinand CLA 2004-07-15 05:38:27 EDT
If the encoding is not in the list returned by availableCharsets, then we cannot use it.
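
The constraint discussed in the last few comments can be sketched as follows: take a short candidate list (whatever its source) and keep only the entries the running JVM can actually resolve. The class and method names are hypothetical; Charset.availableCharsets(), Charset.isSupported and Charset.forName are the java.nio.charset calls referred to above.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Eclipse: filters a curated candidate list
// down to the charsets supported by the running JVM.
public class UsableEncodings {

    static List<String> usable(List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            // isSupported also accepts aliases; store the canonical name.
            if (Charset.isSupported(candidate)) {
                result.add(Charset.forName(candidate).name());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The complete list is typically far longer than a UI should show.
        System.out.println("Available charsets: " + Charset.availableCharsets().size());

        // Candidates as a plug-in might contribute them; the last one is a
        // made-up name to show that unsupported entries are dropped.
        List<String> contributed = Arrays.asList("UTF-8", "ISO-8859-1", "x-no-such-charset-zz");
        System.out.println("Usable: " + usable(contributed));
    }
}
```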
Comment 20 David Williams CLA 2004-07-15 06:07:08 EDT
I would agree the list from nio.charsets is *way* too long ... several hundred 
for some VMs! Though an extension seems like overkill. 

Would it help to have a list of "common charset names"? I'll attach a list we 
use, as a property file. Seems this list covers 99.9% of needs. (No one's 
complained). Plus, I've found VMs don't usually provide translated versions 
(though it's spec'd that way) so we allow translations of the property file. 
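
A property-file approach along these lines could be sketched as below. The entries shown are hypothetical (the labels are borrowed from the Edit/Encoding menu quoted in the description, not from the attached file), and the class and method names are assumptions. The fallback to Charset.displayName() covers charsets that have no entry in the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Properties;

// Hypothetical helper, not part of Eclipse: reads human-readable charset
// labels from a translatable properties file.
public class CommonCharsetNames {

    // Stand-in for the contents of a properties file; entries are illustrative.
    static final String SAMPLE =
            "ISO-8859-1=Latin 1\n"
          + "US-ASCII=ASCII\n"
          + "UTF-16BE=UTF-16 (big endian)\n";

    static Properties load(String text) {
        Properties labels = new Properties();
        try {
            labels.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return labels;
    }

    static String label(Properties labels, String charsetName) {
        // Fall back to the JVM's display name, or to the raw name if the
        // charset is not supported at all.
        String fallback = Charset.isSupported(charsetName)
                ? Charset.forName(charsetName).displayName()
                : charsetName;
        return labels.getProperty(charsetName, fallback);
    }

    public static void main(String[] args) {
        Properties labels = load(SAMPLE);
        for (String name : new String[] {"ISO-8859-1", "US-ASCII", "UTF-16BE", "UTF-8"}) {
            System.out.println(name + " -> " + label(labels, name));
        }
    }
}
```

A translated copy of the file would only need to change the values, since the keys are the machine-readable names.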
Comment 21 David Williams CLA 2004-07-15 06:08:32 EDT
Created attachment 13287 [details]
list of commonly used charset names
Comment 22 Tod Creasey CLA 2004-07-15 08:34:44 EDT
Here is a use case Andre.

Some users in the Far East frequently run with several code pages - usually a 
simplified and complex form of their spoken language plus a European one 
(usually English) as they may have code, a document written by someone else 
etc. all within their workbench.

Defining a fragment that adds a couple of popular code pages to the list for a 
particular locale would be very useful to them - especially as many code pages 
are just a number and don't say what characters they are for.

Adding on to that a meaningful label for code pages that are just a number 
would be pretty useful too. 8859-1 is much less meaningful to most people than 
Latin.

Thanks for your input everyone. I am leaning towards Dani's suggestion myself.
Comment 23 Tod Creasey CLA 2004-07-21 13:59:44 EDT
I am going to add this to the Workbench as this seems to be the consensus.
Comment 24 Tod Creasey CLA 2004-07-22 16:17:13 EDT
It is actually going to go in IDE as the Core encoding support is in 
core.resources.
Comment 25 Tod Creasey CLA 2004-08-18 10:06:04 EDT
Released to HEAD. I have created both a WorkbenchEncoding and an IDEEncoding 
class for the encoding support at both levels.
Comment 26 Dani Megert CLA 2004-08-27 09:35:32 EDT
I looked at this today and to me it looks as if the list and display strings 
are hard-coded and it's not possible to supply a different list via
extension-point e.g. when installing Eclipse in China or Switzerland. Is the
intended way to configure this by overriding WorkbenchEncoding and IDEEncoding
via fragment or did I miss something?
Comment 27 Tod Creasey CLA 2004-08-27 09:51:06 EDT
We have not added any support for adding via extension point - the only locale 
specific ones you will get is for your current encoding setting - basically 
all we have right now is the 3.0 support in API.

If you think we need the extension point as well then please log a pr to that 
effect.
Comment 28 Dani Megert CLA 2004-08-27 10:22:54 EDT
And how is it NLSed?
There are already bugs targeting in that direction (e.g. bug 21195)
Comment 29 Tod Creasey CLA 2004-11-02 15:06:36 EST
Marking verified as this now has an extension point in M3