RC1 build I20030307. The list of encodings in the Workbench / Editors preference page is: Cp1252, ISO-8859-1, US-ASCII, UTF-16, UTF-16BE, UTF-16LE, UTF-8. The Edit/Encoding menu has: Cp1252 (Default), ASCII, Latin 1, UTF-8, UTF-16 (big endian), UTF-16 (little endian), UTF-16. The preference page uses the machine-readable encoding names; it would be better to use human-readable names like the Edit/Encoding menu does. Cp1252 is the default here and may differ in other locales. The combo in the preference page does not indicate "(Default)" the way the Edit menu does. We should also add more default encodings.
Defer to 2.2.
Should contribute the available encodings (and their human-readable names) via XML. That way, translation packs could add extra entries. UI and Text would then always be consistent.
Kai, where do you get your list of encodings from? Tod, has anyone complained about this?
Yes - there is a lot of buzz around encodings. Andre has restored some of this of late so we should recheck.
There really should be an extension point for the supported encodings, used in both the Editors pref page and the text editors' encoding menu. Should consider this for post-3.0.
Recheck and then mark later
These are the same now except that the editor uses Latin-1 as the title for 8859-1.
Reopening to address the extension point and naming consistency problem for post-3.0.
Reassigning to Text component owner since they have owned the encoding problem of late. Will be happy to discuss a solution.
Someone needs to define the default set of charset names. The NLSed display string can then be obtained via Charset.forName(String).displayName(). Since a plug-in writer can create a text editor without using our Platform/Text framework, this list should be provided by the Platform/UI component, as the encoding preference UI in 2.1 was. Platform/Text simply copied the list because there was no API.
We should add this API in 3.1. The list is currently the same (except for the label in the text editor list).
Marking LATER as this is an API request
Dani is referring to the Charset type in java.nio.charset (new in 1.4), not CharSet in java.text.
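As a minimal sketch of what Dani is suggesting (the charset names below are just examples), the NLSed display string for a machine-readable encoding name can be obtained like this:

```java
import java.nio.charset.Charset;

public class CharsetDisplayNames {
    public static void main(String[] args) {
        // java.nio.charset.Charset maps a canonical charset name to a
        // display name, localized for the default locale where the VM
        // provides translations (in practice many VMs do not).
        String[] names = {"UTF-8", "ISO-8859-1", "US-ASCII"};
        for (String name : names) {
            System.out.println(name + " -> " + Charset.forName(name).displayName());
        }
    }
}
```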
Reopening now that 3.0 has shipped
Dani, is the Workbench low enough for you - i.e. do you need this list in jface.text? If not, we will have to put it in Core.
It's OK to have it in a UI layer, since this is really just an incomplete list of the most important encodings to be presented to the user; it makes no sense to have it in a non-UI layer because it is not the complete list of valid encodings. There should be an extension point which enables clients to add encodings to that list.
I'm not really sure why we need an extension point for contributing charset names:
- There is API for getting the complete list of charset names known to a Java implementation: java.nio.charset.Charset.availableCharsets(). And, as Dani has pointed out, Charset.forName(String).displayName() returns the UI name.
- It only makes sense to contribute more charset names if it is also possible to contribute the implementation of a charset too. Without this we would always get UnsupportedEncodingExceptions.
Or am I missing something?
The list you get is too long in my opinion. As for the extension point: assume there's a plug-in for some programming language or tool or editor that needs/uses one or several specific encodings heavily (e.g. the encoding specified by Java for *.properties files) but they are not in our list. The extension point enables them to add those encodings.
If the encoding is not in the list returned by availableCharsets, then we cannot use it.
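This check can be done directly with standard java.nio.charset calls. A minimal sketch (the helper name `displayNameOrNull` is illustrative, not an Eclipse API):

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    /**
     * Returns the display name for the given charset name, or null when
     * the running VM does not provide an implementation for it - the case
     * that would otherwise surface as an unsupported-encoding error when
     * an editor tries to use the charset.
     */
    static String displayNameOrNull(String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return Charset.forName(charsetName).displayName();
    }

    public static void main(String[] args) {
        // "UTF-8" is required to be supported by every Java implementation.
        System.out.println(displayNameOrNull("UTF-8"));
        // A syntactically legal but unknown name yields null, not an exception.
        System.out.println(displayNameOrNull("x-no-such"));
    }
}
```

Note that Charset.isSupported only accepts syntactically legal charset names; an illegal name (e.g. containing spaces) throws IllegalCharsetNameException, so UI code validating arbitrary user input would need to catch that as well.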
I would agree the list from nio.charsets is *way* too long - several hundred for some VMs! Though an extension point seems like overkill. Would it help to have a list of "common charset names"? I'll attach the list we use, as a property file. This list seems to cover 99.9% of needs (no one's complained). Plus, I've found VMs don't usually provide translated versions (though it's spec'd that way), so we allow translations of the property file.
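Combining Tod's property-file approach with the availableCharsets check above could look roughly like this. This is a sketch, assuming a file format of canonical-name keys with human-readable labels as values (the actual format of the attached file may differ):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class CommonCharsets {
    /**
     * Loads charset names from a translatable property file
     * (key = canonical charset name, value = human-readable label)
     * and drops any entry the running VM does not actually support,
     * so the UI never offers an encoding that would fail at read time.
     */
    static List<String> loadSupported(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        List<String> supported = new ArrayList<>();
        for (String name : props.stringPropertyNames()) {
            if (Charset.isSupported(name)) {
                supported.add(name);
            }
        }
        return supported;
    }
}
```

Filtering against Charset.isSupported at load time addresses André's objection: contributed names that the VM cannot back with an implementation are silently dropped rather than producing exceptions later.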
Created attachment 13287 [details] list of commonly used charset names
Here is a use case, Andre. Some users in the Far East frequently run with several code pages - usually a simplified and a complex form of their spoken language plus a European one (usually English), as they may have code, a document written by someone else, etc., all within their workbench. Defining a fragment that adds a couple of popular code pages to the list for a particular locale would be very useful to them - especially as many code pages are just a number and don't say what characters they are for. On top of that, a meaningful label for code pages that are just a number would be pretty useful too: 8859-1 is much less meaningful to most people than Latin-1. Thanks for your input everyone. I am leaning towards Dani's suggestion myself.
I am going to add this to the Workbench as this seems to be the consensus.
It is actually going to go in IDE as the Core encoding support is in core.resources.
Released to HEAD. I have created both a WorkbenchEncoding and an IDEEncoding class for the encoding support at both levels.
I looked at this today, and to me it looks as if the list and display strings are hard-coded and it's not possible to supply a different list via an extension point, e.g. when installing Eclipse in China or Switzerland. Is the intended way to configure this to override WorkbenchEncoding and IDEEncoding via a fragment, or did I miss something?
We have not added any support for adding encodings via extension point - the only locale-specific ones you will get are for your current encoding setting; basically all we have right now is the 3.0 support in API. If you think we need the extension point as well, then please log a PR to that effect.
And how is it NLSed? There are already bugs pointing in that direction (e.g. bug 21195).
Marking verified as this now has an extension point in M3