Bug 521249 - [9] Document that Java 9 systematically breaks IScanner
Summary: [9] Document that Java 9 systematically breaks IScanner
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 4.7   Edit
Hardware: All All
: P3 major (vote)
Target Milestone: 4.8 M6   Edit
Assignee: Stephan Herrmann CLA
QA Contact:
URL:
Whiteboard:
Keywords: readme
Depends on:
Blocks:
 
Reported: 2017-08-22 08:28 EDT by Stephan Herrmann CLA
Modified: 2018-06-01 01:47 EDT (History)
6 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stephan Herrmann CLA 2017-08-22 08:28:32 EDT
With the introduction of restricted keywords (see bug 488541) scanning is no longer possible without also parsing. For that reason the interface IScanner is not useful in Java 9 (unless clients ensure to use it only for ordinary compilation units).

This should be documented both in IScanner and ToolFactory.

Since JDT/Core cannot detect when this API is invoked with the source of module-info.java (because the filename is not passed in this scenario), we cannot even issue / log any warning when a module descriptor is passed.

Since this means we cannot distinguish "good" from "bogus" use of the API, it may be best to complete deprecate it.

The only way to "fix" the API would be by introducing additional methods like setIsModuleDescriptor(boolean) and *require* that clients call it prior to scanning. Obviously, this would be a breaking change, too. So I'm not sure it's worth it.

Actually, setIsModuleDescriptor() wouldn't be enough as long as setSource() accepts any snippet of code. If module-info.java is passed, the source *must* include the opening "module" keyword, otherwise classification of restricted keywords is not possible.
Comment 1 Stephan Herrmann CLA 2017-08-22 17:21:48 EDT
Proposed text for the start of the javadoc of IScanner, similar blurp to be added to all ToolFactory.createScanner() methods:

-------------
<strong>Caveat:</strong> With the introduction of "restricted keywords" in Java 9 it is impossible to classify a token without the help of a parser. For that reason, this interface is not suitable for scanning a modular compilation unit ("module-info.java"). It is the client's responsibility to pass only source from ordinary compilation units. For lack of a file name the scanner cannot check this.
-------------

Thoughts?

Should this functionality be deprecated, to make the caveat more visible?
As a compromise we could deprecate only IScanner.getNextToken() which is the only method actually promising to classify tokens (which it no longer can in a reliable way).

In that case I would attach the "caveat" text to
 - IScanner.getNextToken()
 - and similarly to ToolFactory.createScanner() as an "early warning"

At the least, the same blurp should go into the readme / migration guide.
Comment 2 Stephan Herrmann CLA 2017-08-22 18:25:37 EDT
Proposed deprecation notice of IScanner.getToken():

 @deprecated with the introduction of "restricted keywords" in Java 9,
 classification of tokens cannot reliably be done by a scanner,
 and hence the only real solution is to use a parser instead 
 (see also the caveat at the top of the	interface documentation).
Comment 3 Eclipse Genie CLA 2017-08-26 15:37:14 EDT
New Gerrit change created: https://git.eclipse.org/r/103734
Comment 4 Stephan Herrmann CLA 2017-08-27 09:04:16 EDT
Some figures from my JDT workspace:

Method IScanner.getNextToken() is referenced 66 times (not counting tests).

Of these, 40 references are in JDT/UI (only few in /Debug).

Of those 40, 14 locations are interested in distinguishing identifiers vs. keywords. These locations are potentially affected by the restricted keyword problem, if this code could be reached with a modular compilation unit (which I haven't checked).


I think fairness demands to somehow warn adopters that existing use of JDT API may not work as expected in Java 9. For that reason deprecating sounds like a suitable means, but I'm not sure if we'd be unnecessarily spamming adopters.

From the above numbers I think @SW("deprecation") is a suitable means to acknowledge this new problem.

If that's considered too harsh I see three alternatives:

(1) revert @Deprecated and hope that a notice in the readme suffices to alert adopters

(2) or add a new non-deprecated API, one of:
  (a) getNextOrdinaryToken() (using the same implementation internally)
  (b) a subtype IOrdinaryScanner with corresponding factory methods in ToolFactory
    In this type, #getNextToken() will not be deprecated
  Either sub-option allows adopters to acknowledge the new problem by
  simply migrating to an equivalent API, ensuring they know that this is
  not applicable to modular compilation units.

  IF NEEDED, we could then (later) add the *Modular* variants which will have
  restricted keywords enabled - at best effort approach.


(3) instead of deprecating #getNextToken(), what about deprecating the constant ITerminalSymbols#TokenNameIdentifier?

  Numbers in my workspace: 85 references not counting tests.

  No reference in JDT/Debug

  Merely 8 references in JDT/UI.

  This looks much more precise to me.

  (JDT/Core could use an internal alias after checking if each reference is safe)



Final remark: Future Java extensions will aggravate this issue. Sweeping the current problem under the rag will create much bigger surprise when the rag can no longer cover the problem in the future.


Comments? Preferences?
Comment 5 Stephan Herrmann CLA 2017-10-10 09:44:14 EDT
I think more important than moving the target back and forth is: could someone please comment on the options I outlined (and perhaps others I didn't think of)?
Comment 6 Eclipse Genie CLA 2018-02-22 13:11:58 EST
New Gerrit change created: https://git.eclipse.org/r/117971
Comment 7 Stephan Herrmann CLA 2018-02-22 14:18:42 EST
(In reply to Eclipse Genie from comment #6)
> New Gerrit change created: https://git.eclipse.org/r/117971

Consists of:
- Informal caveat at the top of IScanner
- @Deprecated at ITerminalSymbols#TokenNameIdentifier
Comment 9 Eclipse Genie CLA 2018-02-22 14:46:22 EST
New Gerrit change created: https://git.eclipse.org/r/117979
Comment 10 Stephan Herrmann CLA 2018-02-22 14:47:47 EST
(In reply to Eclipse Genie from comment #8)
> Gerrit change https://git.eclipse.org/r/117971 was merged to [master].
> Commit:
> http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/
> ?id=fe21ba1003ebd5cc75c5cefdcd63ad8835d40f97

JDT/Core changes released for 4.8 M6

(In reply to Eclipse Genie from comment #9)
> New Gerrit change created: https://git.eclipse.org/r/117979

Some follow-up changes in JDT/UI: suppress some warnings where it seems safe. Comment other locations.
Comment 12 Stephan Herrmann CLA 2018-05-08 16:45:32 EDT
No more changes planned as of now.
Comment 13 Dani Megert CLA 2018-05-14 05:44:26 EDT
(In reply to Stephan Herrmann from comment #1)
> At the least, the same blurp should go into the readme / migration guide.

Has this been done? See also bug 534459.
Comment 14 Jay Arthanareeswaran CLA 2018-05-15 23:27:21 EDT
(In reply to Dani Megert from comment #13)
> (In reply to Stephan Herrmann from comment #1)
> > At the least, the same blurp should go into the readme / migration guide.
> 
> Has this been done? See also bug 534459.

Will add it now.
Comment 15 Jay Arthanareeswaran CLA 2018-06-01 01:47:45 EDT
(In reply to Jay Arthanareeswaran from comment #14)
> Will add it now.

Done.

The documentation can be seen here:

https://www.eclipse.org/eclipse/development/readme_eclipse_4.8.php#mozTocId521249