Bug 110173 - [plan] API to extract the Javadoc as HTML from attached HTML
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.1
Hardware: PC Windows XP
Importance: P3 enhancement
Target Milestone: 3.2 M4
Assignee: Olivier Thomann CLA
Duplicates: 115137
Depends on:
Blocks: 41421 98154 127898
Reported: 2005-09-21 09:43 EDT by Jerome Lanneluc CLA
Modified: 2006-02-14 16:06 EST (History)

See Also:


Attachments
Proposed fix (29.06 KB, patch)
2005-11-09 21:41 EST, Olivier Thomann CLA

Description Jerome Lanneluc CLA 2005-09-21 09:43:05 EDT
I20050920

A new API to extract the Javadoc from attached HTML for a binary Java element is
needed. This API will read the
IClasspathAttribute#JAVADOC_LOCATION_ATTRIBUTE_NAME attribute and extract the
HTML corresponding to a Java element.
Comment 1 Olivier Thomann CLA 2005-09-30 09:00:28 EDT
Dani,

I need details in order to know what part of the HTML should be extracted for
each type of element.
Comment 2 Dani Megert CLA 2005-10-03 09:27:12 EDT
Note that it should not only work for binary elements but also for source
elements where the Javadoc is attached as defined by
IClasspathAttribute#JAVADOC_LOCATION_ATTRIBUTE_NAME.

We'd like to get the information that is listed under 'xyz Detail' for each
element if using the standard doclet (xyz stands for the various Java element
types). Basically we should get the same information as when extracting the
Javadoc from the source which we currently do in JDT UI land. For details see
JavadocContentAccess.
Comment 3 Olivier Thomann CLA 2005-10-03 09:55:11 EDT
That is fine. But what happens if the doclet changes in a future version? How
am I supposed to know what to extract out of the HTML doc?
An API cannot be ambiguous about its return value. The user needs to know what
should be returned in all possible cases.
I don't see why this would be useful for source elements, as they can always
return the source of the Javadoc comment.
Comment 4 Dani Megert CLA 2005-10-03 10:08:42 EDT
Adding Martin to the cc-list since he implemented JDT UI's Javadoc access.
Comment 5 Dani Megert CLA 2005-10-03 10:29:00 EDT
>I don't see why this would be useful for source element as they can always
>return the source of the javadoc comment.
Because the Javadoc might contain more information since it went through a doclet.

Comment 6 Olivier Thomann CLA 2005-10-03 10:32:18 EDT
ok, fine. Then tell me how you would find out what needs to be returned from the
HTML. If we provide an API, we need to handle any doclet.
We also need to support HTML that might not be well formed (some end tags
are optional).
Comment 7 Dani Megert CLA 2005-10-03 10:43:03 EDT
Some time ago Martin did some investigations. He can probably provide more
detailed info.
Comment 8 Martin Aeschlimann CLA 2005-10-03 11:07:55 EDT
I added remarks in bug 110172.

I think only the standard doclet really needs to be understood. What should
always be there is the anchor for the members (places in the html file where you
can jump to -> <A NAME="getPropertyPage()"></A> ). These anchors are kind of
'standardized' as different Javadoc exports can refer to each other.
I would just try to extract from one anchor to the next.
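
For illustration only, the anchor-to-anchor idea could be sketched roughly as follows. This assumes standard-doclet anchors written exactly as <A NAME="...">; the class and method names are hypothetical, and this is not the code from the eventual patch.

```java
/**
 * Hypothetical sketch of "extract from one anchor to the next".
 * Not the JDT/Core implementation; all names are made up for illustration.
 */
final class AnchorExtractor {
    // Assumption: anchors are written exactly like this (upper case, double quotes).
    private static final String ANCHOR_START = "<A NAME=\"";

    /**
     * Returns the HTML between the anchor of the given member and the next
     * anchor, or null if the member's anchor cannot be found.
     */
    static String extract(String html, String memberAnchor) {
        String marker = ANCHOR_START + memberAnchor + "\"";
        int start = html.indexOf(marker);
        if (start < 0) {
            return null; // unknown format, or the member is not documented
        }
        // Skip past the closing </A> of the member's own anchor, if there is one.
        int close = html.indexOf("</A>", start);
        int contentStart = close < 0 ? start + marker.length() : close + "</A>".length();
        // The fragment ends where the next anchor begins, or at the end of the input.
        int end = html.indexOf(ANCHOR_START, contentStart);
        if (end < 0) {
            end = html.length();
        }
        return html.substring(contentStart, end).trim();
    }
}
```

Note that for the last member in a file the sketch simply runs to the end of the input, so whatever follows the last member is included in the result.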

Comment 9 Olivier Thomann CLA 2005-10-03 11:10:26 EDT
I checked this, but this includes more text than what is currently returned from
the Javadoc in the source. We need a consistent result between Javadoc from
source and from HTML Javadoc. Or maybe we don't care if they don't match?
Comment 10 Martin Aeschlimann CLA 2005-10-03 11:21:17 EDT
I think they don't have to match (they never will; our code that generates HTML
from the Javadoc source uses different formatting rules than the doclet).
Comment 11 Dani Megert CLA 2005-10-03 11:27:39 EDT
The priority is to support binary code which has no source. If we could compile
the source from the class file plus the Javadoc (as outlined by Martin) it would
solve the 80% case without adding any additional API.

The urgent problems that we need to solve and discussed at the JDT summit are
bug 41421 and bug 77475. Not sure whether Martin's suggestion also helps with
bug 77475.
Comment 12 Olivier Thomann CLA 2005-10-05 09:53:53 EDT
Should this API also work for Java elements defined in source types?
If yes, this means we need additional support for this attribute (we need it per
source folder) as Martin outlined in bug 110172 comment 8.
This API would return an HTML string extracted from the attached HTML doc.
What kind of attachment should be supported?
A URL? A zip file that contains the doc? A local folder?
Comment 13 Olivier Thomann CLA 2005-10-17 14:29:51 EDT
/**
 * Returns the Javadoc as an html source if this element has an attached javadoc,
 * null otherwise.
 * 
 * <p>The html is extracted from the attached javadoc and provided as is. No
 * transformation or validation is done.</p>
 *
 * @return the extracted javadoc from the attached javadoc
 * @since 3.2
 */
String getJavadocFromAttachment();

Would someone have a better name for this one?
Comment 14 Jerome Lanneluc CLA 2005-10-18 04:19:25 EDT
Maybe getHTMLJavadoc() ?

Also the spec should have a:
@see IClasspathAttribute#JAVADOC_LOCATION_ATTRIBUTE_NAME
Comment 15 Dani Megert CLA 2005-10-18 05:07:23 EDT
>Returns the Javadoc as an html source
We should clarify whether the API returns correct/valid HTML or just a fragment.

I prefer getJavadocFromAttachment() over getHTMLJavadoc() because it's closer to
what the method does.

Otherwise +1.
Comment 16 Martin Aeschlimann CLA 2005-10-18 06:48:02 EDT
I would also state that this is done by best effort, e.g. nothing is returned if
the attached Javadoc is in an unknown format.

I also wonder if you don't need a progress monitor passed in and a
JavaModelException thrown.
Comment 17 Olivier Thomann CLA 2005-10-18 08:41:51 EDT
I will add a line with:
@see IClasspathAttribute#JAVADOC_LOCATION_ATTRIBUTE_NAME

I think this line is pretty clear:
<p>The html is extracted from the attached javadoc and provided as is. No
 * transformation or validation is done.</p>

It doesn't try to check the validity of the html tags. It simply extracts the
part corresponding to the java element.
Comment 18 Olivier Thomann CLA 2005-10-18 09:02:51 EDT
Not sure that we need a throws JavaModelException clause in the method
signature. If anything goes wrong, we should return null. I don't see the added
value of an exception.
I will add a note on the format of the HTML. If standard anchors are missing,
there is no way to retrieve the subpart of the HTML.
Comment 19 Martin Aeschlimann CLA 2005-10-18 10:16:46 EDT
I would throw an exception in case problems with the connection occur, e.g. an
invalid URL, a location that does not exist, or a connection that timed out.

Comment 20 Olivier Thomann CLA 2005-10-20 12:17:38 EDT
Same remark as for bug 110172. This new API should be on
org.eclipse.jdt.core.IMember and not on IJavaElement.
Comment 21 Olivier Thomann CLA 2005-10-20 12:51:23 EDT
We would end up with:
/**
 * Returns the Javadoc as an html source if this element has an attached javadoc,
 * null otherwise.
 * 
 * <p>The html is extracted from the attached javadoc and provided as is. No
 * transformation or validation is done.</p>
 *
 * @param monitor the given progress monitor
 * @exception JavaModelException if:<ul>
 *  <li>this element does not exist</li>
 *  <li>retrieving the attached javadoc fails (timed-out, invalid URL, ...)</li>
 *  </ul>
 * @return the extracted javadoc from the attached javadoc
 * @see IClasspathAttribute#JAVADOC_LOCATION_ATTRIBUTE_NAME
 * @since 3.2
 */
String getAttachedJavadoc(IProgressMonitor monitor) throws JavaModelException;

on org.eclipse.jdt.core.IMember
Comment 22 Olivier Thomann CLA 2005-10-20 13:49:16 EDT
Final proposal

On org.eclipse.jdt.core.IMember#
/**
 * Returns the Javadoc as an html source if this element has an attached javadoc,
 * null otherwise.
 * 
 * <p>The html is extracted from the attached javadoc and provided as is. No
 * transformation or validation is done.</p>
 *
 * @param monitor the given progress monitor
 * @exception JavaModelException if:<ul>
 *  <li>this element does not exist</li>
 *  <li>retrieving the attached javadoc fails (timed-out, invalid URL, ...)</li>
 *  <li>the format of the javadoc doesn't match expected standards (different
 *  anchors, ...)</li>
 *  </ul>
 * @return the extracted javadoc from the attached javadoc
 * @see IClasspathAttribute#JAVADOC_LOCATION_ATTRIBUTE_NAME
 * @since 3.2
 */
String getAttachedJavadoc(IProgressMonitor monitor) throws JavaModelException;

Martin says this is fine.
Let me know asap if anything looks wrong in this API.
Comment 23 Olivier Thomann CLA 2005-10-24 15:10:59 EDT
In fact this one needs to be done on IJavaElement as it makes sense to return a
javadoc contents for a package fragment (this would be the package-summary.html
file).
So we should handle types, fields, methods and package. I don't see other java
elements that need to be handled.
Any comment on that?
Comment 24 Jerome Lanneluc CLA 2005-10-25 05:10:47 EDT
(In reply to comment #23)
> In fact this one needs to be done on IJavaElement as it makes sense to return a
> javadoc contents for a package fragment (this would be the package-summary.html
> file).
> So we should handle types, fields, methods and package. I don't see other java
> elements that need to be handled.
> Any comment on that?
Agreed. We should just mention what the valid IJavaElements are in the spec, and
what would be returned in other cases (null, I guess).
Comment 25 Olivier Thomann CLA 2005-10-25 10:17:17 EDT
Looking at the methods in
org.eclipse.jdt.internal.corext.javadoc.JavaDocLocations, it looks like the Java
elements that return a location for the Javadoc are:
- package fragment: package-summary.html
- java project and package fragment root: index.html
- import container: returns the location of its compilation unit
- compilation unit, class file and type: the location of the primary type
- field, method: their javadoc location (using the anchor if required)
- initializer: the location of its declaring type
- import declaration: the location of the corresponding resolved elements
(package fragment or type)
- package declaration: same as package fragment.

I am not sure that we want to preserve the one for initializer and import
container. The one for import declaration should handle static imports. But this
could lead to multiple methods. In this case which one should we choose?
I believe some of the existing code is trying to retrieve doc when no doc should
be returned.
Also it doesn't seem possible for JDT/Core to retrieve the javadoc location for
project. This is using a key that is defined using JavaUI.ID_PLUGIN. And this is
unknown for JDT/Core.
Should we move the key to JDT/Core? What about existing workspaces? Will they be
broken?
I'd like to clarify these points before I continue pushing down code from JDT/UI
to JDT/Core.
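
As a rough illustration of the package, type and member cases in the list above, the relative location under the Javadoc base could be computed along these lines. This is only a sketch that assumes the standard doclet layout; the class and method names are hypothetical and are not the JavaDocLocations code.

```java
/**
 * Hypothetical sketch: location of the attached Javadoc page, relative to the
 * Javadoc base location, for a few kinds of elements. Assumes the standard
 * doclet layout; this is not the JDT/UI or JDT/Core code.
 */
final class JavadocPaths {

    /** Package fragment: the package-summary.html of its folder. */
    static String forPackage(String packageName) {
        return packageDir(packageName) + "package-summary.html";
    }

    /** Type: typeName is the type-qualified name, e.g. "Map.Entry" for a member type. */
    static String forType(String packageName, String typeName) {
        return packageDir(packageName) + typeName + ".html";
    }

    /** Field or method: the page of the declaring type plus the member's anchor. */
    static String forMember(String packageName, String typeName, String anchor) {
        return forType(packageName, typeName) + "#" + anchor;
    }

    // The default package has no folder prefix.
    private static String packageDir(String packageName) {
        return packageName.isEmpty() ? "" : packageName.replace('.', '/') + "/";
    }
}
```

The remaining cases in the list (import containers, initializers, import declarations) would delegate to one of these three, which is where the open questions above come in.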
Comment 26 Olivier Thomann CLA 2005-11-04 15:59:36 EST
*** Bug 115137 has been marked as a duplicate of this bug. ***
Comment 27 Olivier Thomann CLA 2005-11-09 21:41:21 EST
Created attachment 29674
Proposed fix

First draft. I'd like to know if this is matching your expectations.
Comment 28 Martin Aeschlimann CLA 2005-11-10 04:06:42 EST
Looks good to me.
Minor request: Could you rename the parameter 'encoding' to 'defaultEncoding'? I
know it is described in the spec, but this would make it completely clear.

I saw that you are duplicating some code from JavaUI (getJavadocBaseLocation).
Maybe add this as API as well? 
Comment 29 Olivier Thomann CLA 2005-11-10 09:28:45 EST
I will rename the encoding parameter to defaultEncoding, but I don't think we
need to expose what I moved down from JDT/UI as API. I don't see why this would
be useful as an API if we provide the Javadoc.
If you applied my patch, the code assist should now propose argument names
according to the javadoc (see bug 41421).
I can release the first part of this patch (exposing the new API) and I will ask
David to review the part for the code assist.
Comment 30 Dani Megert CLA 2005-11-10 09:34:28 EST
The preview looks good.
Comment 31 Olivier Thomann CLA 2005-11-10 10:04:57 EST
I forgot to check the status of the progress monitor. Should I throw
OperationCanceledException when it is cancelled?
Comment 32 Olivier Thomann CLA 2005-11-10 10:17:22 EST
Ok, fixed and released in HEAD.
Regression tests added in org.eclipse.jdt.core.tests.model.AttachedJavadocTests.
For now I simply check the status of the progress monitor before and after
retrieving the contents of the javadoc URL. If this is not enough, it can easily
be changed.
If the progress monitor is cancelled, an OperationCanceledException is thrown.
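
The "check before and after" behaviour described here could look roughly like the sketch below. The three small types are hypothetical stand-ins for IProgressMonitor, OperationCanceledException and the URL read, so that the sketch compiles without Eclipse on the classpath; it is not the released code.

```java
/** Hypothetical stand-in for IProgressMonitor (only the cancellation query). */
interface CancelMonitor {
    boolean isCanceled();
}

/** Hypothetical stand-in for OperationCanceledException. */
class CanceledException extends RuntimeException {
}

/** Hypothetical stand-in for reading the contents of the Javadoc URL. */
interface ContentFetcher {
    String fetch();
}

final class CancellableFetch {

    /** Checks the monitor before and after the potentially slow fetch. */
    static String fetch(ContentFetcher fetcher, CancelMonitor monitor) {
        checkCanceled(monitor);
        String contents = fetcher.fetch();
        checkCanceled(monitor);
        return contents;
    }

    // A null monitor means the caller does not care about cancellation.
    private static void checkCanceled(CancelMonitor monitor) {
        if (monitor != null && monitor.isCanceled()) {
            throw new CanceledException();
        }
    }
}
```

With this shape a cancel request that arrives during the fetch is only noticed once the fetch returns, which is the limitation hinted at above.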
Comment 33 Ed Burnette CLA 2005-11-11 00:08:02 EST
Could somebody who has this running make sure it works with the Javadoc format
used at JDocs? It's not totally standard because it has user comments and ads.
http://www.jdocs.com/
Thanks.
Comment 34 Olivier Thomann CLA 2005-11-11 09:49:22 EST
This would mean that we need to support any kind of doc. Why would we support
JDocs more than any other docs?
Comment 35 Olivier Thomann CLA 2005-11-11 10:00:54 EST
Where could I find a tool to generate some docs using JDocs? I can check what we
would support with the current implementation.
Comment 36 Ed Burnette CLA 2005-11-11 13:01:56 EST
It's not something you have to run, JDocs is just a central repository for
javadocs that are slightly munged. Here's an example of an API with a comment:

http://www.jdocs.com/cli/1.0/api/org/apache/commons/cli/OptionBuilder.html
Comment 37 Olivier Thomann CLA 2005-11-11 15:23:45 EST
The major problem that I can see with the JDocs format is that I might not be
able to determine the right ending for the doc of the last field or last method.
In "standard" Javadoc, it is possible to know where the last field/method ends.
With JDocs, there is no such thing. There is an ad at the end of the last
method. Can I rely on this to be there all the time? I need at least to find a
common pattern between all files to find out the end of the last method.
But if we start to support JDocs, we might have to support all kinds of Javadoc
formats, and this is endless work.
Dani or Martin, any comment on that?
Comment 38 Ed Burnette CLA 2005-11-11 16:02:29 EST
Thanks for trying it. What would JDocs need to change to let you find the end
correctly, without you having to code in special support?
Comment 39 Dani Megert CLA 2005-11-11 16:08:46 EST
If the standard rules don't work for JDocs then we should not add special code
to support this. Otherwise you'll get requests for format2, format3, etc.

What you can do is to provide an extension point that reads the Javadoc from
html files (e.g. for a given URL/web address). This would allow JDocs lovers to
provide an extension. It would also allow clients who use a different Javadoc
format to use this feature.
Comment 40 Gunnar Wagenknecht CLA 2005-11-12 07:51:03 EST
(In reply to comment #39)
> What you can do is to provide an extension point that reads the Javadoc from
> html files (e.g. for a given URL/web address). This would allow JDocs lovers to
> provide an extension. It would also allow clients who use a different Javadoc
> format to use this feature.

IMHO that's a good solution. It would also make you independent from
design/format changes at JDocs.
Comment 41 Frederic Fusier CLA 2005-12-13 05:11:06 EST
Verified for 3.2 M4 with build I20051213-0010