Created attachment 123381
Sample dictionary project, using a UTF-8 encoded dictionary.

Build ID: M20080911-1700

Steps To Reproduce:
1. Run Eclipse with the attached cut-down Portuguese dictionary, encoded in UTF-8. It contributes a dictionary fragment to the JDT spell checker. The key thing to note is that the dictionary contains non-ASCII characters.
2. Everything is fine if your spelling/platform encoding is UTF-8; note that "abóbada" is in the dictionary (add it to a Javadoc comment, say).
3. However, run with -Dfile.encoding=Cp1250 and it doesn't look so rosy: the dictionary has been decoded using the wrong encoding. The spell checker suggests changing "abóbada" to "abĂłbada", presumably because that is what the Cp1250 mis-decoding produces.

More information:
I think the dictionary encoding preference only really makes sense for user dictionaries as things stand. In reality, any dictionary resource has some fixed character encoding (in this case UTF-8), and you're not going to get very far using the wrong decoder. Perhaps the dictionary file name should include the encoding alongside its locale, as in pt_PT.UTF-8.dictionary or some such? Linux locale names (e.g. pt_PT.UTF-8) follow a similar pattern.

With a full-size dictionary, which I'm not including, the decoder eventually fell over (it got somewhere into the o's):

CoderResult.throwException() line: 261 [local variables unavailable]
StreamDecoder.implRead(char[], int, int) line: 319
StreamDecoder.read(char[], int, int) line: 158
InputStreamReader.read(char[], int, int) line: 167
BufferedReader.fill() line: 136
BufferedReader.readLine(boolean) line: 299
BufferedReader.readLine() line: 362
LocaleSensitiveSpellDictionary(AbstractSpellDictionary).load(URL) line: 500

I have absolutely no understanding of Portuguese, so apologies if these random words happen to be offensive or some such :)
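The mis-decoding in step 3 can be reproduced outside Eclipse with a few lines of standard Java: encode "abóbada" as UTF-8, then decode those bytes as Cp1250. This is a minimal sketch; the class and method names (MisDecodingDemo, decodeUtf8BytesAs) are illustrative only and are not part of JDT. Non-ASCII text is written with Unicode escapes so the example compiles the same way regardless of the compiler's source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Reproduces step 3: UTF-8 dictionary bytes decoded as Cp1250. */
public class MisDecodingDemo {

    /** Encodes a word as UTF-8, then decodes those bytes with another charset. */
    static String decodeUtf8BytesAs(String word, Charset readAs) {
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, readAs);
    }

    public static void main(String[] args) {
        String word = "ab\u00f3bada"; // "abobada" with U+00F3 (o acute)
        Charset cp1250 = Charset.forName("Cp1250"); // alias of windows-1250

        // U+00F3 is the bytes C3 B3 in UTF-8. Cp1250 maps C3 to U+0102 (A breve)
        // and B3 to U+0142 (l stroke), giving the suggestion seen in step 3.
        String misread = decodeUtf8BytesAs(word, cp1250);
        System.out.println(misread.equals("ab\u0102\u0142bada")); // true

        // Decoding with the encoding the file was written in round-trips cleanly.
        System.out.println(decodeUtf8BytesAs(word, StandardCharsets.UTF_8).equals(word)); // true
    }
}
```

Note that new String(byte[], Charset) always substitutes replacement characters rather than throwing, so it yields a wrong word instead of an error. The failure in the stack trace comes from CoderResult.throwException, which suggests the dictionary loader uses a decoder configured to report coding errors.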
Looks like "abĂłbada" got mangled by Bugzilla, I'm afraid.
We currently only support the case where all dictionaries share the same encoding, and that encoding is the one set on the 'Spelling' preference page. Hence what you describe is not a bug but rather a user error. I see your point, though, so I'm turning this bug into an enhancement request to add encoding information to the dictionary itself.
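One way such an enhancement could look, following the reporter's pt_PT.UTF-8.dictionary naming idea: take the charset from the file name when present, fall back to the 'Spelling' preference encoding otherwise, and decode strictly so a wrong encoding fails loudly instead of silently producing words like "abĂłbada". This is a hedged sketch, not JDT's implementation; DictionaryEncoding, resolve and readWords are hypothetical names, and the file-name convention is only the proposal from this report.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: per-dictionary encoding taken from the file name. */
public class DictionaryEncoding {

    private static final String SUFFIX = ".dictionary";

    /**
     * Resolves the charset for a name such as "pt_PT.UTF-8.dictionary".
     * Falls back to the preference encoding when the name carries no
     * encoding segment (e.g. "pt_PT.dictionary") or names an unknown charset.
     */
    static Charset resolve(String fileName, Charset preferenceEncoding) {
        if (!fileName.endsWith(SUFFIX)) {
            return preferenceEncoding;
        }
        String stem = fileName.substring(0, fileName.length() - SUFFIX.length());
        int dot = stem.lastIndexOf('.');
        if (dot < 0) {
            return preferenceEncoding; // locale only, no encoding segment
        }
        try {
            return Charset.forName(stem.substring(dot + 1));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return preferenceEncoding;
        }
    }

    /** Reads one word per line; throws CharacterCodingException on bad input. */
    static List<String> readWords(InputStream in, Charset charset) throws IOException {
        // REPORT (rather than the default REPLACE) makes a wrong charset fail fast.
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words;
    }
}
```

For example, resolve("pt_PT.UTF-8.dictionary", prefs) returns UTF-8 whatever the preference says, while resolve("pt_PT.dictionary", prefs) keeps today's behaviour. Strict decoding only catches byte sequences that are invalid for the chosen charset; a Cp1250 misreading of valid UTF-8 can still succeed, which is why carrying the encoding with the dictionary matters.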
Thanks for looking into this :)