Bug 183880 - [spell checking] Javadoc keywords, Java static tokens, and dynamic types missing from dictionary
Status: CLOSED DUPLICATE of bug 68898
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Text
Version: 3.3
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: JDT-Text-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 236196 276847
Depends on:
Blocks: 265780
 
Reported: 2007-04-24 18:52 EDT by Walter Harley CLA
Modified: 2011-05-24 02:24 EDT (History)
CC List: 6 users

See Also:


Description Walter Harley CLA 2007-04-24 18:52:59 EDT
As of I20070424, many tokens in Java source code are getting undeserved spell check errors.

To repro, edit code like the following:

  // place java.lang imports here

  /**
   * The {@code SpellChecker} class
   */
  public class SpellChecker {}

In this sample, the words "java", "lang", "@code" all fail spellcheck.  Also, if "ignore mixed-case words" is disabled, "SpellChecker" fails, even though the words it is a compound of are all valid words.

Expected: type names should be considered valid words; or if the intent is to spell-check typenames, at least camel case should be considered as separate words.  All javadoc tokens and all common static Java tokens such as javax, java, lang, util should be valid words.
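
The camel-case expectation can be sketched as follows. This is only an illustration, not how the JDT spelling engine works: the class `CamelCaseSpelling`, its methods, and the tiny dictionary are all hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of JDT. */
final class CamelCaseSpelling {

    /** Splits an identifier such as "SpellChecker" into its camel-case parts. */
    static List<String> splitCamelCase(String identifier) {
        List<String> parts = new ArrayList<>();
        // Break after a lower-case letter or digit that is followed by an upper-case
        // letter, and before the last capital of an acronym run ("XMLParser").
        String boundary = "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";
        for (String part : identifier.split(boundary)) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    /** Returns the parts that are missing from the dictionary (compared in lower case). */
    static List<String> misspelledParts(String identifier, Set<String> dictionary) {
        List<String> unknown = new ArrayList<>();
        for (String part : splitCamelCase(identifier)) {
            if (!dictionary.contains(part.toLowerCase(Locale.ROOT))) {
                unknown.add(part);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the real dictionary.
        Set<String> dictionary = new HashSet<>(Arrays.asList("spell", "checker"));
        System.out.println(splitCamelCase("SpellChecker"));
        System.out.println(misspelledParts("SpellChecker", dictionary));
    }
}
```

With a check like this, "SpellChecker" would pass because each part is a known word, while a compound containing a misspelled part would still be reported.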

I'm marking severity as "major" because I feel that without fixing this, the feature will be useless for the majority of users; I can't imagine someone leaving it enabled for very long unless legitimate code (like the above sample) can remain error-free.
Comment 1 Boris Bokowski CLA 2007-04-25 12:32:05 EDT
Where did you get the dictionary from?
Comment 2 Walter Harley CLA 2007-04-25 13:57:00 EDT
I didn't do anything to get the dictionary.  I downloaded I20070424 and began working in an existing workspace; spell checking was never turned on by default before, so I was surprised by it.  When I saw the problems, I created a new workspace and project in order to verify with the reported code.  Didn't change anything from the default except, as I was exploring the problem, the "ignore mixed-case words" setting.
Comment 3 Dani Megert CLA 2007-04-26 03:32:44 EDT
>spell checking was never turned on by default
>before, so I was surprised by it.
Did you at all know that it was there? If so, I assume you did not use the feature, right?

>{@code 
We missed the @code and @literal tags and I just fixed this. In the future we might offer an additional preference to ignore all words that start with '@' - renaming the summary to reflect this.

We could add a Java-specific word list that has things like 'java', 'lang' etc. in it, but so far we (who have used spell checking for years now) have not needed this. But it probably depends on the style of the written comments. Checking whether a word is a type or method is just too expensive.
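
To make the two cheap options concrete (skipping words that start with '@' and consulting a small Java-specific word list), here is a minimal sketch. It does no type or method lookup. The class `JavaAwareWordFilter` and its method are hypothetical and not JDT API; the word list is only an assumed sample taken from words mentioned in this report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-filter for illustration; not part of JDT. */
final class JavaAwareWordFilter {

    // Assumed sample of a Java-specific word list; a real list would be longer.
    private static final Set<String> JAVA_WORDS =
            new HashSet<>(Arrays.asList("java", "javax", "lang", "util"));

    /** Returns true if the token should not be handed to the spell checker. */
    static boolean shouldSkip(String token) {
        // Javadoc tags and annotations such as "@code" or "@literal".
        if (token.startsWith("@")) {
            return true;
        }
        // Constant-time set lookup, no type resolution involved.
        return JAVA_WORDS.contains(token.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String token : new String[] {"@code", "java", "lang", "speling"}) {
            System.out.println(token + " skipped: " + shouldSkip(token));
        }
    }
}
```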

The best thing you can do is to register your own user dictionary like you probably also did when using spell checking in MS Word or Mozilla and then add those words that you use in Javadoc and other comments but aren't in the dictionary. Note that as soon as you've set a user dictionary you get an "Add to dictionary" quick fix. You can then also share that dictionary with your team via CVS.
Comment 4 Dani Megert CLA 2007-04-26 04:10:43 EDT
See also bug 68898.
Comment 5 Walter Harley CLA 2007-04-26 12:53:02 EDT
(In reply to comment #3)
> Did you at all know that it was there? 

No, I didn't discover that until I was preparing to file the bug report.

> We could add a Java-specific word list that has things like 'java', 'lang' etc.
> in it, but so far we (who have used spell checking for years now) have not needed
> this. But it probably depends on the style of the written comments. Checking
> whether a word is a type or method is just too expensive.

At least within javadoc, I think you already have to perform that test, because words that are types or methods become hyperlinks.

But I don't accept the premise.  What you are saying is that treating Java comments as if they contained references to Java is too expensive.  If that is so, then I think what you are saying is that spell checking is too expensive.  I think it would be very surprising if Java source code comments did not contain many references to type names, methods, and Java constructs.


> The best thing you can do is to register your own user dictionary like you
> probably also did when using spell checking in MS Word or Mozilla and then add
> those words that you use in Javadoc and other comments but aren't in the
> dictionary. 

The difference is that Java, unlike English, is a dynamic language: it contains constructs explicitly intended to define new "words", and in normal use many such new "words" are defined every day.  A dictionary of English can be static, needing only minimal extension to support domain-specific terms.

Note that you do not necessarily need to do a context-sensitive typesystem check, to be a Java-sensitive spell checker.  In particular, scoping rules should *not* apply; I need to be able to write comments like "compare this with the way we treat the local variable 'foo' in the method OtherClass.doSomething()", even when OtherClass is not in scope.
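
As a rough sketch of what a scope-free check could look like: collect the type names declared anywhere in the project and accept them in any comment, regardless of what is in scope at that point. Everything here is an assumption for illustration. `DeclaredNameIndex` is a made-up class, and the regular expression is a crude stand-in for whatever index or AST a real implementation would use (it can, for example, pick up a stray word that follows "class" in prose).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical project-wide index of declared type names; not part of JDT. */
final class DeclaredNameIndex {

    // Heuristic: the identifier following the keyword class, interface, or enum.
    private static final Pattern TYPE_DECLARATION =
            Pattern.compile("\\b(?:class|interface|enum)\\s+([A-Za-z_$][A-Za-z0-9_$]*)");

    private final Set<String> names = new HashSet<>();

    /** Records every type name declared in the given source text. */
    void addSource(String source) {
        Matcher matcher = TYPE_DECLARATION.matcher(source);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
    }

    /** True if the word was declared somewhere in the indexed sources; scope is ignored. */
    boolean isKnownName(String word) {
        return names.contains(word);
    }

    public static void main(String[] args) {
        DeclaredNameIndex index = new DeclaredNameIndex();
        index.addSource("public class OtherClass { void doSomething() {} }");
        System.out.println(index.isKnownName("OtherClass"));
    }
}
```

A comment in an unrelated file that mentions OtherClass would then pass, even though OtherClass is neither imported nor otherwise in scope there.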

At the very very least, the dictionary that ships with Eclipse should be extended to include common Java terminology.  I am just amazed that the word "java" gets flagged, in a Java editor.  (Actually it's odd anyway, since "java" is also an ordinary English word.)  And I'd also put in a vote for "foo".
Comment 6 Dani Megert CLA 2007-04-26 13:01:52 EDT
>No, I didn't discover that until I was preparing to file the bug report.
That happened to many users when we showed them that it's there. That's why we enable it by default, with a quick assist to disable it again should you not like it.

I'll see what I can do with 'java' and 'foo'.

We normally use @link and @linkplain in Javadoc, which is why our user.dict is pretty small.
Comment 7 Dani Megert CLA 2007-04-26 13:08:44 EDT
Sorry, my last comment re: @link is not true. Our classes are mostly ignored because camel-case names are ignored.
Comment 8 Walter Harley CLA 2007-04-26 17:07:19 EDT
Additional "words" to consider include common abbreviations for days and months.  For instance, I notice that if I view the .settings/org.eclipse.jdt.core.prefs file, it has a comment header like the following:

#Thu Apr 26 13:45:33 PDT 2007

In this text (automatically generated by Eclipse), "Thu" and "Apr" get squiggled by the spell checker.

You mention that you have been using this feature for years.  I imported the org.eclipse.jdt.ui project into a new workspace, without changing spell checking options from the default.  Randomly inspecting files, I found that almost every file that contained strings and/or comments also had spell-check squiggles.  (A few of the squiggles did indicate real misspellings.)  Many of the errors are usages of the common terms "Java", "doc", "javadoc", "sync", "info", and "lang" (as in, for instance, comparisons to the string "java.lang.Object").  Ironically, the default behavior of ignoring camel-cased words is hiding some real misspellings, e.g., "inititalizeColors".

So, I would submit that if you want to get a real sense of what this feature is like for someone upgrading to Eclipse 3.3, you should reset your dictionaries and spell check options to the default.
Comment 9 Dani Megert CLA 2007-04-27 02:41:21 EDT
>You mention that you have been using this feature for years.  I imported the
>org.eclipse.jdt.ui project into a new workspace, without changing spell
>checking options from the default.  Randomly inspecting files, I found that
>almost every file that contained strings and/or comments also had spell-check
>squiggles.
Please go back to the page where you enabled spell checking: do you see any dictionary being set? OK. So how do you expect it to check any word? That was the major drawback up to now, i.e. we did not have legal clearance for an existing dictionary. This finally made it.
Comment 10 Walter Harley CLA 2007-04-27 03:03:40 EDT
(In reply to comment #9)
> Please go back to the page where you enabled spell checking: do you see any
> dictionary being set? 

There is no user dictionary specified.

My point is that, from the perspective of someone who has the default settings, what it looks like to upgrade from Eclipse 3.2.2 to Eclipse 3.3 is that all of a sudden their code will be full of spell-check squiggles, for comments that are correct.  This seems like a bad thing.  I think that if we are going to change from spell checking disabled by default, to spell checking enabled by default, the dictionary and/or dynamic algorithms need to be good enough that for most correct code, they are not presenting errors to the user.  

Note that although there is a quick-fix to ignore a particular word, there is no quick-fix to disable spell checking altogether.  It would take someone a long time to fix all their problems one by one, and it is not immediately obvious where the spell-checker settings are.

I am happy that we have gotten legal permission to include a dictionary.  But I think that it is not a good enough dictionary to justify turning spell checking on by default.  I think we are going to get a lot of complaints and a lot of questions, and annoy a lot of users.
Comment 11 Dani Megert CLA 2007-04-27 03:13:49 EDT
As I said, I understand your point and will see what I can do. For now, though, we decided it is better to let users see the feature, after which they can either start their own user dictionary or simply disable it again, than to leave it hidden and undiscovered, as happened with you for more than two years.
Comment 12 David Williams CLA 2007-04-27 03:46:08 EDT
(In reply to comment #11)
>
> ...  but for now 
> we
> decided it is better to let users see the feature and then they can ...
>

If you're taking a vote ... I agree with Walter. Best to be "off" by default. 
But, I do think it is a good idea to poll community and adopters. 

I know several WTP users have asked/complained because "suddenly" many of their XML, HTML, and JSP files show many spelling errors. 

And, maybe this is covered elsewhere? but I happen to know that some adopters have their own add-in spell checkers for HTML/JSP files. It's currently unclear to me how this new behavior interacts with those existing adopter add-ins. 
(Sorry if I should know this and maybe have just lost track?) 


Comment 13 Dani Megert CLA 2007-04-27 03:49:45 EDT
>I know several WTP users have asked/complained because "suddenly" many of their
>XML, HTML, and JSP files show many spelling errors. 
When? Spelling has only been enabled for a few days. This is probably rather the known issue with lots of warnings after enabling, due to having no dictionary at all.

> It's currently unclear
>to me how this new behavior interacts with those existing adopter add-ins. 
>(Sorry if I should know this and maybe have just lost track?) 
Spelling engines contributed via the well-known extension point are not affected by this.
Comment 14 Dani Megert CLA 2007-04-27 05:40:57 EDT
>Note that although there is a quick-fix to ignore a particular word, there is
>no quick-fix to disable spell checking altogether
It is there if you used the latest N- or I-build (e.g. I20070427-0010).

I have added some more common words. But it really depends on how one writes comments and also on which editor is used (e.g. Java, C++, etc.). As said before, we have used this feature for years now, and if I compare our custom dictionary to the one we ship, the difference is about 200 words.
Comment 15 Missing name Mising name CLA 2007-05-05 19:33:17 EDT
In @SuppressWarnings("deprecation") "deprecation" is also highlighted as not recognised (both in-built dictionary dialects).
Comment 16 Dani Megert CLA 2007-05-07 05:19:39 EDT
>In @SuppressWarnings("deprecation") "deprecation"
This has been fixed.
Comment 17 Dani Megert CLA 2008-06-10 08:51:46 EDT
*** Bug 236196 has been marked as a duplicate of this bug. ***
Comment 18 Dani Megert CLA 2009-05-19 05:49:22 EDT
*** Bug 276847 has been marked as a duplicate of this bug. ***
Comment 19 Dani Megert CLA 2011-05-24 02:24:38 EDT

*** This bug has been marked as a duplicate of bug 68898 ***