Bug 113108 - [API][comments] CompilationUnit.getNodeComments(ASTNode)
Summary: [API][comments] CompilationUnit.getNodeComments(ASTNode)
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.2   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.2 M4   Edit
Assignee: Frederic Fusier CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 95839
  Show dependency tree
 
Reported: 2005-10-19 12:44 EDT by Martin Aeschlimann CLA
Modified: 2005-12-13 07:18 EST (History)
2 users (show)

See Also:


Attachments
This patch fixes the problem (2.74 KB, patch)
2005-10-20 04:47 EDT, Frederic Fusier CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Aeschlimann CLA 2005-10-19 12:44:49 EDT
20051019

The AST nodes already have the concept of mapping comments to nodes. So far the
only thing one can do is to ask for the extended source ranges of a node (source
range including the mapped comments).
In the AST rewriter I need to know if the mapped comment happens to be a line
comment.
An new API CompilationUnit.getNodeComments(ASTNode) : List
would solve that problem.

The API would return all comments that have been directly mapped to the given
node. The list contains all comments, leading and trailing, ordered by offsets.
(or an empty list if no comments are mapped to the given node)
Comment 1 Frederic Fusier CLA 2005-10-19 12:49:06 EDT
Perhaps 2 methods for leading and trailing comments?
DefaultCommentMapper can distinguish them...
Comment 2 Martin Aeschlimann CLA 2005-10-19 13:08:02 EDT
I think one API would be fine. By looking at the source ranges of the comments I
can find out if a comment is before or after a node.
Comment 3 Philipe Mulet CLA 2005-10-19 13:12:00 EDT
Feels like helper methods only. Would not add them unless there is a big value
addition. 

Committing to associating comments to node feels dangerous. We only offer a
heuristic for extending source ranges.
Comment 4 Frederic Fusier CLA 2005-10-19 13:22:34 EDT
So, WONTFIX as too dangerous to expose this heuristic.

A workaround will be to use list given by CompilationUnit.getCommentList()
method and search for comments inside node extended range (as you intend to do
with single getNodeComments(ASTNode) method).
Comment 5 Martin Aeschlimann CLA 2005-10-20 03:21:33 EDT
I strongly disagree. 
I first don't see why this would expose anything more than we already do.
Extended ranges is about mapping comments to nodes. The result of the API
requested is as much depended on a heuiristic as is the extended source range.

The suggested workaround seems very inefficient to me. The rewriter requests the
extended ranges for almost ever node involved in the rewrite. With the fix as
planed in bug 95839 I will iterate the list of comment for each node again.

Please reopen or tell me how to fix bug 95839.
Comment 6 Frederic Fusier CLA 2005-10-20 04:47:50 EDT
Created attachment 28505 [details]
This patch fixes the problem

Of course it surely needs to be adapted to different rewrite scenarii (ie.
replace, remove, copy, etc.) but I think this could be an option to avoid extra
API methods and heuristic changes...
Comment 7 Martin Aeschlimann CLA 2005-10-20 05:15:27 EDT
The patch is just a particular fix for the case mentioned in bug 95839. The end
of line comment problem can occur everywhere, for every insertion position after
a node:
e.g.
   insert a new parameter in
foo(int size // the number of elements)
->
foo(int size // the number of elements, int new param)

Using your approach we would end up ignoring mapped comments almost everywhere.

I don't think ignoring the comment mapping for some cases is a solution. Either
they are associated to a node, and then we have to accept that (e.g. remove them
when a node is removed, or copy them if copied) or not.

The ast rewriter is the main client of the mapping infrastructure. Having a
different way of mapping comments seems like a bad idea to me. So if we decide
to not map a comment to a node like that, then we better modify the mapping
behaviour in the source mapper.

I think the correct way of fixing the problem is to not ignore the comments but
finding out if the last comment was a line comment and then move behind the
following new line characters. To implement this with good performance I want to
request the API in comment 0.
Comment 8 Frederic Fusier CLA 2005-10-20 06:42:32 EDT
Problem of comment 0 proposed new API is that it can confuse users that returned
comments are those associated with given ASTNode and make him thought that this
is a new property of this node. We definitely want to avoid this confusion!
It should be clear that extended range, and so comments which extend a node, are
result of our specific and naive heuristic implemented in DefaultCommentMapper
to match our needs. These needs may differ depending on users or functionality
and so result cannot be set as a property of the ASTNode...

Second problem of this API would be that it force us to instanciate a new list
for each request and this can be really time and memory consuming although
clients were finally not concerned by the result.

To avoid these two problems there would be 2 solutions:
1) create 2 new API methods returning index of firat and last comments which
extended a node:
    - public int getFirstLeadingCommentIndex(ASTNode)
    - public int getLastTrailingCommentIndex(ASTNode)
   These methods would return -1 in case node had no leading/trailing comments
   You'll then be able to get last comment with:
      int index = getTrailingCommentIndex(node);
      if (index != -1) {
          Comment lastComment = getCommentList(index);
          ...
      }
      
2) create 2 new API methods returning first and last comments which extended a node:
    - public Comment getFirstLeadingComment(ASTNode)
    - public Comment getLastTrailingComment(ASTNode)
   These methods would return null in case node had no leading/trailing comments
   You'll then be able to get last comment with:
      Comment lastComment = getLastTrailingComment(node);
      if (lastComment != null) {
          ...
      }

Solution 1) would be a little bit more complex to retrieve last comment (your
requirement) but more general as they would allow users to rebuild easily list
of comments if necessary...

Would it be OK for you and if so, what would be your preferred solution?
Comment 9 Martin Aeschlimann CLA 2005-10-20 07:25:55 EDT
The spec of the new API should definitly explain the fact that mapping comments
is a heuristic. To make this really clear, I also wouldn't mind having API
CompilationUnit.getDefaultCommentMapper() : ICommentMapperHeuristic
ICommentMapperHeuristic
  getExtendedOffset(node) : int
  getExtendedLength(node) : int
  getCommentsMapped(node) : List

I don't think instantiating a list is expensive. Implementation-wise the list
forwards accesses using your comment index array, and doesn't need its own
storage. But I could also live with the suggestion 2.) you mention. I however
find the original suggestion (comment 0) more general and more powerful.
I don't really see the benefit of solution 1.). I wouldn't introduce the concept
of a index in the comment array as you can't really do anything else than
getting the comment at that index. Having your leading comment at index x, and
trailing comment at index y doesn't mean that all comments between x and y are
mapped to your node! (they might be mapped to a child node of your node). Also
for this reason I think giving access to all mapped comments (comment 0) is
superior.
Comment 10 Frederic Fusier CLA 2005-10-20 09:27:23 EDT
I do not understand how you can say that instanciate a new list each time
clients will ask for node comments will not be memory and time consuming...
Of course, comments won't be duplicated but spaces for list itself will be ever
needed. As you intend to do that for each node of the tree, it surely will not
be negligeable.

By the way, we do not intend to implement an interface for comment mapper
heuristic. We just have one for our own purposes and we offer a way to get
extended ranges using this heuristic. If clients need to have their own
heuristic, they can implement it themselves. Note that we never had such a
request since extended ranges methods have been provided for 3.0, so this does
not seem to be a important functionality we have to think about for 3.2.

About my proposed solutions, I agree that for solution 1), provided indexes do
not allow to build node comments list directly, but can reduce the indexes range
in the whole list to build it... But, I didn't like it either, so to address
your problem, it seems we would agree on solution 2)
Comment 11 Philipe Mulet CLA 2005-10-20 09:45:00 EDT
Re: comment 9

Which implementation of List are you talking about ? ArrayList certainly doesn't
do what you suggest. Allocating even object descriptors is always more expensive
that not allocating them, not talking about induced GC. I believe if you are
concerned about double binary search, then you are also concerned about memory
allocation.
Comment 12 Martin Aeschlimann CLA 2005-10-20 10:34:38 EDT
That would be your own type implementing List (extending AbstractList) (e.g.
similar to what AbstractList.subList(...) does).
Comment 13 Frederic Fusier CLA 2005-10-20 15:37:23 EDT
We finally agreed on solution 2) described in comment 8.

Note that DefaultCommentMapper will not give node comments list for free as it
stores only arrays of leading and trailing nodes. Solution 2) offers both
minimal implementation and best performance response (especially in requested
case, ie. get last trailing comment...).

Clients will be able to retrieve node comments list by themselves using both
first leading and last trailing comments provided by this new API. Scanning
whole CompilationUnit comments list between these two comments excluding those
which are inside node positions will give them expected result.

This will work because DefaultCommentMapper does not allow that a comment was
affected to several nodes and also because extended length of inner node never
overlap end of parent node...
Comment 14 Frederic Fusier CLA 2005-11-09 12:06:10 EST
Following API method will be added to o.e.jdt.core.dom.CompilationUnit:

/**
 * Returns the first comment of leading comments associated with the given node.
 * 
 * @param node the node
 * @return a {@link Comment} or null if node has no associated comment before
 * 	its start position.
 * @since 3.2
 */
public Comment getFirstLeadingComment(ASTNode node)

/**
 * Returns the last comment of leading comments associated with the given node. 
 * 
 * @param node the node
 * @return a {@link Comment} or null if node has no associated comment after
 *		its end position.
 * @since 3.2
 */
public Comment getLastTrailingComment(ASTNode node)
Comment 15 Philipe Mulet CLA 2005-11-10 06:07:45 EST
Unless comments would be chained together, a comment answer is only to solve the
one specific issue raised by Martin. An API addition should be more general if
possible.

It feels the canonical API is to answer the range of the first leading comment.
This would actually subsume the extended source range API (which is simply:
comments.get(leadCommentIndex).sourceStart()); and Martin's problem is then simply:
comments.get(leadCommentIndex).

So I would rather implement option (1) from comment 8.
Comment 16 Martin Aeschlimann CLA 2005-11-10 06:35:48 EST
I'm all for a more general API! It also feels to me that the new API is too
specific to my problem.

But I wouldn't introduce the notation of a 'index to the comment list'. That's a
new notation and can be confused with integers you get back from the source range.
Note that these comment list indexes wouldn't form a 'range'. It's not a range.
Both start and end index can be -1 (=no mapped comment). Not all comments in the
'range' really belong to the same node. Extra testing of source ranges is
required to find out if a comment belongs to a child node.

Comment 17 Frederic Fusier CLA 2005-11-10 06:58:22 EST
So the more general API I can see is to return a *real* indexes range (ie first
and last indexes) for leading and trailing comments as DefaultCommentMapper
already knows it.

So, CompilationUnit may return these indexes in a single long:
 - long getLeadingCommentsIndexes(ASTNode)
 - long getTrailingCommentsIndexes(ASTNode)

Then we'll have:
int firstCommentIndex = (int) (cu.getLeadingCommentsIndexes(node)>>32);
Comment firstComment = cu.getCommentsList().get(leadFirstCommentIndex)
int lastCommentIndex = (int) cu.getTrailingCommentsIndexes(node);
Comment lastComment = cu.getCommentsList().get(lastCommentIndex)

Note that with this implementation, returned long will be really indexes range
as they will represent contiguous leading/trailing comments of given node.
Comment 18 Martin Aeschlimann CLA 2005-11-10 07:21:48 EST
Having the range of all the leading comments brings us dangerously close to
getLeadingComments(ASTNode) : Comment[] or even getNodeComments(ASTNode) :
List/*<Comment>*/ :-)
Comment 19 Philipe Mulet CLA 2005-11-10 07:46:57 EST
But in an efficient manner and lightweight... <g>
Why even bother with the range notion ? The first comment index should be
enough. Clients simply iterate them until the meet another comment which
starting position exceeds the node starting position... is that so hard to write
? But no strong opinion against the range combo.
Comment 20 Frederic Fusier CLA 2005-11-10 07:58:33 EST
I also agree that range combo is not really necessary, that was just the most
general API we can have on this...
Moreover, I think that getting range through a long could be something unusual
for API methods... 

So, I'd agree with Philippe and vote for comment 8 solution 1): 2 methods which
will give index of first leading comment and index of last trailing comments.

Javadoc comments on these should precise that not all comments between these two
indexes may not be associated with given node and describe way to get all
leading/trailing comments from respective index.
Comment 21 Jim des Rivieres CLA 2005-11-10 15:42:20 EST
Forgive me for coming late to this party :-)

The original API suggested in comment 0 seems straightforward enough to me.

CompilationUnit
   public List<IComment> getNodeComments(ASTNode node);

Since most nodes comments have no comments directly associated with them, the 
list returned will usually be empty.

The AST already has the notion of extended source ranges (optional) and 
comment tables (also optional). So we're not admitting anything we haven't 
already bought into.

Is this convenience method worth providing? I take it the implementation would 
search the comment list for the range of comments lying within the extended 
range of the node. The implementation would then knock out all comments in 
that range that fall within the extended source range of a direct child node, 
leaving only those comments that belong to the given node itself. Clearly a 
client can do this for themselves with the API already provided. However, to 
do it efficiently, the comment list should be binary searched to establish the 
comment range. Rather than have the client does this (or use linear search), 
this convenience method probably is something worthwhile providing.

Comment 22 Frederic Fusier CLA 2005-11-11 13:25:59 EST
Jim,

We already agreed that comment 0 method was definitely inefficient in certain
circumstances. We also already agreed that bug 95839 was one of it...
As it only needs the last trailing comment, it would take obviously more time
consuming to instanciate a list of comments instead of returning only one!

Any list creation we could offer would be helpers and may also be done by
clients as well. The only thing we have to do is give us enough information to
let them build whatever list they want (all comments, only leading or trailing
ones or - and this is the case for bug 95839 - only first/last leading/trailing
comment).

For example, if we give all comments list, what about clients who may want only
leading ones? They still would argue that they needed to parse comments list
again to retrieve the last leading comment in the whole list we gave them...

I think there's no ideal solution as there's no precise requirement on this kind
of list. It seems that nobody cares about node comments (whole, leading or
trailing). Currently, the only requirement is Martin's one and we think that the
best approach to match it is to give information which have minimal time and
memory cost for us.

That's why we think indexes are best candidate for these API methods.
DefaultCommentMapper already (and only) knows these indexes and can give it
really for free. So, we definitely want that CompilationUnit provides these
indexes: first, last, leading, trailing, range or whatever... but nothing else
than indexes.
Comment 23 Martin Aeschlimann CLA 2005-11-14 04:16:04 EST
I will agree on comment 0 anytime! It is still my favourite. I think it is is
the simplest, most easy understandable and most powerful API. It can be
implemented with good performance (not using ArrayList but your own List that
uses your own internal structure. Creating an object in _NOT_ considered expensive).

It seems to me that the index based API is mostly motivated that you happen to
use these comment list indexes in your implementation. But shouldn't that stay a
implementation detail? Will your performance still be as good if you have to
decide to change to another structure?

Comment 24 Philipe Mulet CLA 2005-11-14 05:59:17 EST
Frederic - pls release the changes we had discussed lately. We will not add List
based helpers for something this simple to iterate, nor will we make comments be
associated with AST notes. We only provide a heuristic, which clients may chose
to ignore.
Comment 25 Frederic Fusier CLA 2005-11-14 13:50:10 EST
Following methods have been added to CompilationUnit:
/**
 * Return the index in the whole comments list {@link #getCommentList() }
 * of the first leading comments associated with the given node. 
 * 
 * @param node the node
 * @return 0-based index of first leading comment or -1 if node has
 *	no associated comment before its start position.
 * @since 3.2
 */
public int firstLeadingCommentIndex(ASTNode node)
/**
 * Return the index in the whole comments list {@link #getCommentList() }
 * of the last trailing comments associated with the given node. 
 * 
 * @param node the node
 * @return 0-based index of last trailing comment or -1 if node has
 * 	no associated comment after its end position.
 * @since 3.2
 */
public int lastTrailingCommentIndex(ASTNode node)

Released in HEAD.

Test cases added in ASTConverterJavadocTest
Comment 26 Jerome Lanneluc CLA 2005-12-13 07:18:01 EST
Verified for 3.2 M4 using build I20051213-0010