Bug 74394 - [compiler] Provide XML output option for Eclipse compiler
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.0
Hardware: All All
Importance: P3 enhancement
Target Milestone: 3.1 M5
Assignee: Olivier Thomann CLA
Reported: 2004-09-20 23:35 EDT by Nick Crossley CLA
Modified: 2005-06-01 21:51 EDT

Attachments
Apply on HEAD (33.18 KB, patch)
2005-01-04 13:52 EST, Olivier Thomann CLA
New patch to apply on HEAD (47.24 KB, patch)
2005-01-07 13:42 EST, Olivier Thomann CLA
Corresponding xml file (10.29 KB, text/plain)
2005-01-07 13:45 EST, Olivier Thomann CLA
Apply on HEAD (49.09 KB, patch)
2005-01-10 13:41 EST, Olivier Thomann CLA
Example of log files that can be successfully validated (12.06 KB, text/plain)
2005-01-10 13:42 EST, Olivier Thomann CLA
Apply on HEAD (45.42 KB, patch)
2005-01-10 13:57 EST, Olivier Thomann CLA
Example of log file (9.87 KB, text/plain)
2005-01-10 13:57 EST, Olivier Thomann CLA
DTD file (2.18 KB, text/plain)
2005-01-10 13:58 EST, Olivier Thomann CLA
New DTD file (2.31 KB, text/plain)
2005-01-10 15:42 EST, Olivier Thomann CLA
New patch to apply on HEAD (45.47 KB, patch)
2005-01-10 15:44 EST, Olivier Thomann CLA
New DTD file (2.28 KB, text/plain)
2005-01-24 17:15 EST, Olivier Thomann CLA

Description Nick Crossley CLA 2004-09-20 23:35:19 EDT
In order to integrate the Eclipse batch compiler into processes and tools, it 
would be useful to have an option for the compiler to write its messages in 
XML.  For example, FindBugs and Checkstyle provide XML output options.

The Eclipse compiler XML output could include:
1.  The compiler version
2.  Command line arguments
3.  Names of source files compiled
4.  Names of class files written
5.  Errors and warnings
6.  Summary and statistics

For each error or warning element written, it would be useful if a problem 
type was given, in addition to the specific message.  For example, if the 
compiler produced a warning of the form "Local variable abcdef is never read", 
the XML error element should include some attribute or element of the 
form "LocalVariableUnused".  This would allow surrounding tools to group and 
report error and warning messages by type without complex regular expression 
matching.
Comment 1 Olivier Thomann CLA 2004-09-23 22:48:52 EDT
It would be trivial to add this into the batch compiler. Don't hesitate to
provide a patch.
Comment 2 Nick Crossley CLA 2004-09-30 17:42:44 EDT
OK, I'll work on a patch/suggested implementation.  It probably won't be until 
December that I'll get a chance to do this.  Would you prefer my patch to be 
based on the CVS HEAD at that time, or against the latest 3.1 milestone or 
integration build?  Could you also please point me at some existing code in 
Eclipse that shows your preferred techniques for writing XML.
Comment 3 Jerome Lanneluc CLA 2004-11-04 06:25:48 EST
A patch against HEAD would be better.
You can see an example of writing XML in
org.eclipse.jdt.internal.core.JavaProject#encodeClasspath(...)
Comment 4 Olivier Thomann CLA 2004-12-16 11:30:30 EST
Any news on that front?
Comment 5 Nick Crossley CLA 2004-12-16 14:22:18 EST
I expect to work on this during my Christmas break over the next two weeks.  
Look for an update in the first week in January when I return.
Comment 6 Olivier Thomann CLA 2004-12-16 14:39:12 EST
ok, if you haven't started yet, I'd like to provide a first implementation that
you could review.
Comment 7 Nick Crossley CLA 2004-12-16 14:50:20 EST
That would be great!
Comment 8 Olivier Thomann CLA 2005-01-04 13:49:18 EST
Do you think such output would be sufficient?

<?xml version="1.0" encoding="UTF-8"?>
<compiler name="Eclipse Java Compiler" version="0.529, pre-3.1.0 milestone-4">
	<problems>
		<problem start="78" end="81" severity="ERROR" line="3" source="C:\tests_sources\Test.java" id="IncompatibleReturnType">
			<message value="The return type is incompatible with Writer.append(char), PrintWriter.append(char)"/>
			<arguments>
				<argument value="java.io.Writer.append(char), java.io.PrintWriter.append(char)"/>
			</arguments>
		</problem>
		<problem start="78" end="81" severity="ERROR" line="3" source="C:\tests_sources\Test.java" id="IncompatibleReturnType">
			<message value="The return type is incompatible with Writer.append(CharSequence, int, int), PrintWriter.append(CharSequence, int, int)"/>
			<arguments>
				<argument value="java.io.Writer.append(CharSequence, int, int), java.io.PrintWriter.append(CharSequence, int, int)"/>
			</arguments>
		</problem>
		<problem start="78" end="81" severity="ERROR" line="3" source="C:\tests_sources\Test.java" id="IncompatibleReturnType">
			<message value="The return type is incompatible with Writer.append(CharSequence), PrintWriter.append(CharSequence)"/>
			<arguments>
				<argument value="java.io.Writer.append(CharSequence), java.io.PrintWriter.append(CharSequence)"/>
			</arguments>
		</problem>
		<problem start="287" end="297" severity="WARNING" line="11" source="C:\tests_sources\Test.java" id="UnnecessaryCast">
			<message value="Unnecessary cast from String to String"/>
			<arguments>
				<argument value="java.lang.String"/>
				<argument value="java.lang.String"/>
			</arguments>
		</problem>
	</problems>
	<problem_summary problems="4" errors="3" warnings="1"/>
	<command_line>
		<argument value="C:\tests_sources\Test.java"/>
		<argument value="-1.5"/>
		<argument value="-source"/>
		<argument value="1.4"/>
		<argument value="-g"/>
		<argument value="-d"/>
		<argument value="c:\tests_sources"/>
		<argument value="-verbose"/>
		<argument value="-classpath"/>
		<argument value="C:\tests_sources"/>
		<argument value="-log"/>
		<argument value="c:\log.xml"/>
		<argument value="-warn:+uselessTypeCheck"/>
	</command_line>
</compiler>
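
As a rough illustration of how a surrounding tool might consume a log like the one above, here is a minimal Python sketch that groups problems by their id attribute. It assumes the element and attribute names shown in this sample (problem, message, id, severity, line). The shortened SAMPLE_LOG document and the function name group_by_id are made up for illustration, and the released format may differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened, hypothetical log modelled on the sample in this comment.
SAMPLE_LOG = """<compiler name="Eclipse Java Compiler" version="0.529, pre-3.1.0 milestone-4">
  <problems>
    <problem start="78" end="81" severity="ERROR" line="3" source="Test.java" id="IncompatibleReturnType">
      <message value="The return type is incompatible with Writer.append(char)"/>
    </problem>
    <problem start="78" end="81" severity="ERROR" line="3" source="Test.java" id="IncompatibleReturnType">
      <message value="The return type is incompatible with Writer.append(CharSequence)"/>
    </problem>
    <problem start="287" end="297" severity="WARNING" line="11" source="Test.java" id="UnnecessaryCast">
      <message value="Unnecessary cast from String to String"/>
    </problem>
  </problems>
</compiler>"""


def group_by_id(log_text):
    """Return {problem id: [(severity, line, message), ...]}."""
    root = ET.fromstring(log_text)
    groups = defaultdict(list)
    for problem in root.iter("problem"):
        message = problem.find("message")
        text = message.get("value") if message is not None else ""
        groups[problem.get("id")].append(
            (problem.get("severity"), int(problem.get("line")), text)
        )
    return dict(groups)


if __name__ == "__main__":
    # Print each problem type with the number of times it occurred.
    for problem_id, entries in group_by_id(SAMPLE_LOG).items():
        print(problem_id, len(entries))
```

Grouping on the id attribute avoids matching on the message text, which is the point of the request in the description.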
Comment 9 Olivier Thomann CLA 2005-01-04 13:52:00 EST
Created attachment 16918 [details]
Apply on HEAD

Here is the corresponding implementation. We can still discuss if we want more
information. The log is generated only in case of errors or warnings.
Comment 10 Nick Crossley CLA 2005-01-04 15:04:19 EST
This is very much what I was looking for - thanks!

A couple of questions:

1.  I presume the 'start' and 'end' attributes are character or byte locations 
of the error in the given source line?  What if the error extends over more 
than one line? (The current messages produced by the compiler sometimes 
include more than one line of source.)

2.  You say the log is only produced in case of errors or warnings - wouldn't 
that make it a little harder to script?  And wouldn't the compiler and command 
line information be useful in the event of a successful compiler run?  I would 
suggest producing an XML log if a command line flag was given to request it, 
regardless of the number of errors or warnings.
Comment 11 Olivier Thomann CLA 2005-01-06 16:34:43 EST
I suggest the following format:
<compiler ....>
    <sources>
        <source path="......">
              <problem_summary problems="4" errors="3" warnings="1">
                   <problem ...>
              </problem_summary>
              <classfile path="...."/>
              <tasks>
                  <task message="...."/>
              </tasks>
        </source>
     </sources>
     <command-line>
          ...
     </command-line>
     <stats ... />
</compiler>

This would make it possible to list all compiled source files, all errors for
each source file, and all class files generated for each source file.
Would this be good enough?
Comment 12 Philipe Mulet CLA 2005-01-06 19:33:24 EST
Shouldn't the classpath and options also be surfaced? On the command line, they
need to be decoded.
Comment 13 Olivier Thomann CLA 2005-01-07 10:17:14 EST
The command-line part would include all the command-line arguments in this
format:
	<command_line>
		<argument value="C:\tests_sources\Test.java"/>
		<argument value="-1.5"/>
		<argument value="-source"/>
		<argument value="1.4"/>
		<argument value="-g"/>
		<argument value="-d"/>
		<argument value="c:\tests_sources"/>
		<argument value="-verbose"/>
		<argument value="-classpath"/>
		<argument value="C:\tests_sources"/>
		<argument value="-log"/>
		<argument value="c:\log.xml"/>
		<argument value="-warn:+uselessTypeCheck"/>
	</command_line>

Sorry if this was unclear. Is this enough? If not, let me know what you expect.
Comment 14 Olivier Thomann CLA 2005-01-07 13:42:47 EST
Created attachment 16996 [details]
New patch to apply on HEAD
Comment 15 Olivier Thomann CLA 2005-01-07 13:45:51 EST
Created attachment 16997 [details]
Corresponding xml file

The xml log is generated as soon as the log file name ends with ".xml". The
source start/source end values include the characters on multiple lines if the
error spans more than one line. The line value is the line number where the
problem starts.
Hope this is close to what you want. If yes, I will release that first draft
shortly.
Comment 16 Nick Crossley CLA 2005-01-07 16:20:21 EST
It all looks very good, and is just what I was looking for - but I still do 
not fully understand the 'start' and 'end' attributes on an error.  Suppose I 
see one of your examples:
      <problem start="78" end="81" severity="ERROR" line="3" ...>

From reading the XML, how do I distinguish between an error that starts at 
position 78 of line 3 and ends at position 81 of line 3, vs. an error that 
starts at position 78 of line 3 and ends at position 81 of line 4 or 5?  Do I 
do so by counting the number of lines in the detailed_message element - that 
is, do I assume the first line shown in the detailed message line is the start 
line, and the last line before the ^^^^ indicators is the last line of the 
error?  That's possible, but seems a little fragile.  Perhaps the start and 
end attributes should contain the corresponding line numbers:
      <problem start="3.78" end="4.81" severity="ERROR" ...>

That's a little harder to parse in the most common single-line error case, but 
better than parsing the detailed_message in the more general multi-line case.
Comment 17 Olivier Thomann CLA 2005-01-07 16:37:49 EST
sourceStart and sourceEnd are character positions in the source code. They are
not relative to the corresponding line. They are absolute positions in the
source code. The first character of the source file is at position 0 and the
last one is at file.length - 1.
So you don't know if there is a new line in the middle, but if you extract the
characters between sourceStart (inclusive) and sourceEnd (exclusive) from the
source code, you get the piece of code that is causing the problem.
Isn't this enough?
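
To make the absolute-position scheme concrete, here is a minimal Python sketch of how a log reader might recover the offending text and decide whether a problem crosses a line break. SOURCE is a hypothetical file body, and the function names are made up for illustration. The end position is treated as exclusive, as described in this comment. The end_inclusive flag is an assumption added in case the released log turns out to report an inclusive end.

```python
# Hypothetical source text; real offsets would come from the compiled file.
SOURCE = 'class X {\n\tString foo() {\n\t\treturn (String) "";\n\t}\n}\n'


def problem_text(source, start, end, end_inclusive=False):
    """Return the characters of `source` covered by a problem.

    `start` and `end` are absolute, 0-based character positions in the
    whole file, not positions within a line.
    """
    stop = end + 1 if end_inclusive else end
    return source[start:stop]


def line_of(source, offset):
    """Return the 1-based line number that contains `offset`."""
    return source.count("\n", 0, offset) + 1


def spans_multiple_lines(source, start, end, end_inclusive=False):
    """True if the problem range contains a line break."""
    return "\n" in problem_text(source, start, end, end_inclusive)


if __name__ == "__main__":
    start = SOURCE.index("(String)")
    end = start + len('(String) ""')  # exclusive end
    print(repr(problem_text(SOURCE, start, end)))
    print(line_of(SOURCE, start))
```

With this approach the reader needs the source file at hand. The log alone does not say whether a range crosses a line break.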
Comment 18 Nick Crossley CLA 2005-01-07 16:50:01 EST
Thanks - now I understand.  Yes, that is perfectly acceptable!  As far as I am 
concerned, you can resolve the bug - will it get into M5?
Comment 19 Olivier Thomann CLA 2005-01-07 16:52:22 EST
Hopefully yes.
I will write a DTD for the corresponding format.
I will close this PR when everything is released in HEAD.
Comment 20 Philipe Mulet CLA 2005-01-07 19:29:27 EST
Maybe rename start/end into charStart/charEnd to be more obvious.
Comment 21 Olivier Thomann CLA 2005-01-10 13:41:22 EST
Created attachment 17043 [details]
Apply on HEAD

Latest patch. I changed the name. I also include an internal DTD in each log
file. I could use an external DTD, but I didn't have a URL to specify. Maybe we
can provide the DTD on the JDT/Core web page in the development section?
Comment 22 Olivier Thomann CLA 2005-01-10 13:42:15 EST
Created attachment 17044 [details]
Example of log files that can be successfully validated
Comment 23 Olivier Thomann CLA 2005-01-10 13:57:03 EST
Created attachment 17046 [details]
Apply on HEAD

This patch makes the log file point to an external DTD file called
compiler.dtd that is located in the same folder as the log file. It makes the
log file a bit smaller.
Comment 24 Olivier Thomann CLA 2005-01-10 13:57:45 EST
Created attachment 17047 [details]
Example of log file
Comment 25 Olivier Thomann CLA 2005-01-10 13:58:13 EST
Created attachment 17048 [details]
DTD file
Comment 26 Olivier Thomann CLA 2005-01-10 15:42:29 EST
Created attachment 17056 [details]
New DTD file
Comment 27 Olivier Thomann CLA 2005-01-10 15:44:19 EST
Created attachment 17057 [details]
New patch to apply on HEAD

This should be the final one. Let me know if this fits your expectations. If
so, it will be released shortly, once I have run some benchmarks and provided
the performance is acceptable.
Comment 28 Nick Crossley CLA 2005-01-11 15:31:56 EST
The sample output all looks good to me.  I have not actually tried to build 
and run the patched compiler myself, but feel no specific need to do so.
Comment 29 Olivier Thomann CLA 2005-01-11 22:35:59 EST
First draft has been released.
Fixed in HEAD.
I will reopen if major problems are found.
Comment 30 Olivier Thomann CLA 2005-01-24 17:15:40 EST
Created attachment 17405 [details]
New DTD file

Working on converting this xml to html I realized that I introduced unnecessary
complexity in the problem element.
The problem element should contain the problem_source and the message as
attributes and not as nested elements.
This is a proposal and it has not been released yet.
What do you think?
Comment 31 Olivier Thomann CLA 2005-01-24 17:23:29 EST
I also don't know what to do with the problem_source. The idea of this entry was
to provide the source that is causing the problem, but if I don't provide any
context I don't find this useful.
In the batch compiler, we do provide some context by underlining the
corresponding part of the line.
For example, we provide this:
 (at line 14)
	return (String) "";
	       ^^^^^^^^^^^
Unnecessary cast from String to String

In order to let the user render the context like he wants, I'd like to provide
the following information inside the problem_source.
return #(String) ""#;

So I don't preserve the underlines. The relevant part of the line is between
'#'. This also allows converters to parse that string and extract what they
want. An HTML converter could then underline the guilty part of the line using
HTML tags, whereas a TXT converter would underline that part of the source code
using '^'.

Do you have a better idea?
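
A minimal Python sketch of the two converters described in this comment, assuming the context string carries exactly two '#' markers around the relevant part. The function names split_marked, render_text and render_html are hypothetical, and the sketch simply rejects any string that does not contain exactly two markers.

```python
import html


def split_marked(context, marker="#"):
    """Split 'before #marked# after' into (before, marked, after)."""
    parts = context.split(marker)
    if len(parts) != 3:
        raise ValueError("expected exactly two markers in %r" % context)
    return tuple(parts)


def render_text(context, marker="#"):
    """Plain-text rendering: the source line with '^' under the marked part."""
    before, marked, after = split_marked(context, marker)
    # Keep tabs in the padding so the carets line up under tab-indented code.
    padding = "".join(ch if ch == "\t" else " " for ch in before)
    return before + marked + after + "\n" + padding + "^" * len(marked)


def render_html(context, marker="#"):
    """HTML rendering: the marked part wrapped in <u> tags."""
    before, marked, after = split_marked(context, marker)
    return (
        html.escape(before) + "<u>" + html.escape(marked) + "</u>" + html.escape(after)
    )


if __name__ == "__main__":
    print(render_text('return #(String) ""#;'))
    print(render_html('return #(String) ""#;'))
```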
Comment 32 Nick Crossley CLA 2005-01-25 02:54:18 EST
I thought there was an issue with white space normalization in XML attributes -
 that is, white space was always normalized in attributes, whereas that was 
controllable in elements?  Since you might not want white space normalization 
in source and messages, do you really want those as attributes?

As for marking the source, I agree that some context is useful.  I have no 
objection to your proposal in principle, though if you use '#' as the marker, 
how would an actual # character be shown?

One alternative to marking the source would be to provide an attribute 
indicating the start position in the file of the source string.  By 
subtracting this from the start position of the error, a reader could find the 
relative position of the error in the given string.
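
The subtraction described in the last paragraph can be sketched in Python as follows. CONTEXT_START is a hypothetical value for the suggested new attribute (the absolute position at which the context string begins), the problem positions are illustrative, and the end position is assumed to be exclusive.

```python
# Hypothetical context string and its absolute start position in the file.
CONTEXT = 'return (String) "";'
CONTEXT_START = 280


def relative_span(error_start, error_end, context_start):
    """Translate absolute file offsets into offsets within the context string."""
    return error_start - context_start, error_end - context_start


def underline(context, error_start, error_end, context_start):
    """Return the context followed by a line of '^' under the problem range."""
    rel_start, rel_end = relative_span(error_start, error_end, context_start)
    if not 0 <= rel_start <= rel_end <= len(context):
        raise ValueError("problem range lies outside the context string")
    return context + "\n" + " " * rel_start + "^" * (rel_end - rel_start)


if __name__ == "__main__":
    print(underline(CONTEXT, 287, 298, CONTEXT_START))
```

Because the context string is left untouched, a literal '#' in the source needs no escaping under this scheme.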
Comment 33 Olivier Thomann CLA 2005-01-25 10:32:15 EST
Ok, I didn't know that. So I will leave them as is, meaning they will be
elements and not attributes.
I'll try to find a better solution for positions in the context. Adding two
attributes to the problem_source element could be a solution. This would not
pollute the problem itself and, since these values would be meaningful only in
the context of the source, I think they are good candidates for attributes in
the problem_source element. The positions would be relative to the source
specified in the element.
Comment 34 Olivier Thomann CLA 2005-02-11 12:52:29 EST
I updated the DTD in HEAD to include the number of tasks in the problem summary.
It is now:

<!ATTLIST problem_summary problems CDATA #REQUIRED
                          errors   CDATA #REQUIRED
                          warnings CDATA #REQUIRED
                          tasks    CDATA #REQUIRED
>

instead of:

<!ATTLIST problem_summary problems CDATA #REQUIRED
                          errors   CDATA #REQUIRED
                          warnings CDATA #REQUIRED
>

Previously, the number of warnings included the number of tasks. This was not
consistent with the file format, which distinguishes between tasks and
warnings.
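
A consumer that wants to detect logs written before this change could check for the new attribute. The Python sketch below is a hand-rolled check, not DTD validation (the standard library's ElementTree does not validate against a DTD). The function name is hypothetical, and the attribute list is taken from the ATTLIST above.

```python
import xml.etree.ElementTree as ET

# Attributes declared #REQUIRED for problem_summary in the updated DTD.
REQUIRED = ("problems", "errors", "warnings", "tasks")


def missing_summary_attributes(summary_xml):
    """Return the required attributes absent from a problem_summary element."""
    element = ET.fromstring(summary_xml)
    return [name for name in REQUIRED if name not in element.attrib]


if __name__ == "__main__":
    old_style = '<problem_summary problems="4" errors="3" warnings="1"/>'
    print(missing_summary_attributes(old_style))
```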
Comment 35 Frederic Fusier CLA 2005-02-16 12:15:58 EST
Verified with 3.1 M5 candidate (I20040215-2300)
Comment 36 Nick Crossley CLA 2005-06-01 21:49:13 EDT
Is there any plan to update the file jdt-core-home/howto/batch 
compile/batchCompile.html with information on the XML reports, accessibility 
rules, and other options added to the batch compiler in 3.1?
Comment 37 Olivier Thomann CLA 2005-06-01 21:51:25 EDT
Yes, we will work on the doc before 3.1 is out.