In order to integrate the Eclipse batch compiler into processes and tools, it would be useful to have an option for the compiler to write its messages in XML. For example, FindBugs and Checkstyle provide XML output options. The Eclipse compiler XML output could include:
1. The compiler version
2. Command line arguments
3. Names of source files compiled
4. Names of class files written
5. Errors and warnings
6. Summary and statistics
For each error or warning element written, it would be useful if a problem type was given, in addition to the specific message. For example, if the compiler produced a warning of the form "Local variable abcdef is never read", the XML error element should include some attribute or element of the form "LocalVariableUnused". This would allow surrounding tools to group and report error and warning messages by type without complex regular expression matching.
It would be trivial to add this into the batch compiler. Don't hesitate to provide a patch.
OK, I'll work on a patch/suggested implementation. It probably won't be until December that I'll get a chance to do this. Would you prefer my patch to be based on the CVS HEAD at that time, or against the latest 3.1 milestone or integration build? Could you also please point me at some existing code in Eclipse that shows your preferred techniques for writing XML.
A patch against HEAD would be better. You can see an example of writing XML in org.eclipse.jdt.internal.core.JavaProject#encodeClasspath(...)
Any news on that front?
I expect to work on this during my Christmas break over the next two weeks. Look for an update in the first week in January when I return.
ok, if you haven't started yet, I'd like to provide a first implementation that you could review.
That would be great!
Do you think such output would be sufficient?
<?xml version="1.0" encoding="UTF-8"?>
<compiler name="Eclipse Java Compiler" version="0.529, pre-3.1.0 milestone-4">
  <problems>
    <problem start="78" end="81" severity="ERROR" line="3" source="C:\tests_sources\Test.java" id="IncompatibleReturnType">
      <message value="The return type is incompatible with Writer.append(char), PrintWriter.append(char)"/>
      <arguments>
        <argument value="java.io.Writer.append(char), java.io.PrintWriter.append(char)"/>
      </arguments>
    </problem>
    <problem start="78" end="81" severity="ERROR" line="3" source="C:\tests_sources\Test.java" id="IncompatibleReturnType">
      <message value="The return type is incompatible with Writer.append(CharSequence, int, int), PrintWriter.append(CharSequence, int, int)"/>
      <arguments>
        <argument value="java.io.Writer.append(CharSequence, int, int), java.io.PrintWriter.append(CharSequence, int, int)"/>
      </arguments>
    </problem>
    <problem start="78" end="81" severity="ERROR" line="3" source="C:\tests_sources\Test.java" id="IncompatibleReturnType">
      <message value="The return type is incompatible with Writer.append(CharSequence), PrintWriter.append(CharSequence)"/>
      <arguments>
        <argument value="java.io.Writer.append(CharSequence), java.io.PrintWriter.append(CharSequence)"/>
      </arguments>
    </problem>
    <problem start="287" end="297" severity="WARNING" line="11" source="C:\tests_sources\Test.java" id="UnnecessaryCast">
      <message value="Unnecessary cast from String to String"/>
      <arguments>
        <argument value="java.lang.String"/>
        <argument value="java.lang.String"/>
      </arguments>
    </problem>
  </problems>
  <problem_summary problems="4" errors="3" warnings="1"/>
  <command_line>
    <argument value="C:\tests_sources\Test.java"/>
    <argument value="-1.5"/>
    <argument value="-source"/>
    <argument value="1.4"/>
    <argument value="-g"/>
    <argument value="-d"/>
    <argument value="c:\tests_sources"/>
    <argument value="-verbose"/>
    <argument value="-classpath"/>
    <argument value="C:\tests_sources"/>
    <argument value="-log"/>
    <argument value="c:\log.xml"/>
    <argument value="-warn:+uselessTypeCheck"/>
  </command_line>
</compiler>
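As a rough illustration of the grouping use case from the original request, the sketch below counts problems per `id` attribute using the JDK's DOM parser, with no regular expressions. The class and method names (`ProblemLogGrouping`, `countById`) are made up for this example, and the element and attribute names are taken from the draft output above, so they may not match what is finally released.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of JDT: groups problems in a compiler log by type.
public class ProblemLogGrouping {

    /** Counts <problem> elements per value of their "id" attribute. */
    public static Map<String, Integer> countById(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Assumption: every problem is a <problem> element carrying an "id" attribute,
        // as in the draft output shown in this comment.
        NodeList problems = doc.getElementsByTagName("problem");
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < problems.getLength(); i++) {
            Element problem = (Element) problems.item(i);
            counts.merge(problem.getAttribute("id"), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // A cut-down log in the draft format; messages and arguments are omitted.
        String xml = "<compiler name=\"Eclipse Java Compiler\"><problems>"
            + "<problem severity=\"ERROR\" line=\"3\" id=\"IncompatibleReturnType\"/>"
            + "<problem severity=\"ERROR\" line=\"3\" id=\"IncompatibleReturnType\"/>"
            + "<problem severity=\"WARNING\" line=\"11\" id=\"UnnecessaryCast\"/>"
            + "</problems></compiler>";
        System.out.println(countById(xml));
    }
}
```

A surrounding tool could key its report sections on the returned map, which is the kind of grouping that would otherwise need message matching.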
Created attachment 16918 [details] Apply on HEAD Here is the corresponding implementation. We can still discuss if we want more information. The log is generated only in case of errors or warnings.
This is very much what I was looking for - thanks! A couple of questions: 1. I presume the 'start' and 'end' attributes are character or byte locations of the error in the given source line? What if the error extends over more than one line? (The current messages produced by the compiler sometimes include more than one line of source.) 2. You say the log is only produced in case of errors or warnings - wouldn't that make it a little harder to script? And wouldn't the compiler and command line information be useful in the event of a successful compiler run? I would suggest producing an XML log if a command line flag was given to request it, regardless of the number of errors or warnings.
I suggest the following format:
<compiler ....>
  <sources>
    <source path="......">
      <problem_summary problems="4" errors="3" warnings="1">
        <problem ...>
      </problem_summary>
      <classfile path="...."/>
      <tasks>
        <task message="...."/>
      </tasks>
    </source>
  </sources>
  <command-line> ... </command-line>
  <stats ... />
</compiler>
This would make it possible to retrieve every source file compiled, all the errors for each source file, and all the class files generated for each source file. Would this be good enough?
Shouldn't the classpath and options also be surfaced? If they only appear in the command line, they have to be decoded from it.
The command-line part would include the whole command line argument with this format:
<command_line>
  <argument value="C:\tests_sources\Test.java"/>
  <argument value="-1.5"/>
  <argument value="-source"/>
  <argument value="1.4"/>
  <argument value="-g"/>
  <argument value="-d"/>
  <argument value="c:\tests_sources"/>
  <argument value="-verbose"/>
  <argument value="-classpath"/>
  <argument value="C:\tests_sources"/>
  <argument value="-log"/>
  <argument value="c:\log.xml"/>
  <argument value="-warn:+uselessTypeCheck"/>
</command_line>
Sorry if this was unclear. Is this enough? If not, let me know what you expect.
Created attachment 16996 [details] New patch to apply on HEAD
Created attachment 16997 [details] Corresponding xml file The xml log is generated as soon as the log file name ends with ".xml". The source start/source end values include the characters on multiple lines if the error spans more than one line. The line value is the line number where the problem starts. Hope this is close to what you want. If yes, I will release that first draft shortly.
It all looks very good, and is just what I was looking for - but I still do not fully understand the 'start' and 'end' attributes on an error. Suppose I see one of your examples: <problem start="78" end="81" severity="ERROR" line="3" ...> From reading the XML, how do I distinguish between an error that starts at position 78 of line 3 and ends at position 81 of line 3, vs. an error that starts at position 78 of line 3 and ends at position 81 of line 4 or 5? Do I do so by counting the number of lines in the detailed_message element - that is, do I assume the first line shown in the detailed message is the start line, and the last line before the ^^^^ indicators is the last line of the error? That's possible, but seems a little fragile. Perhaps the start and end attributes should contain the corresponding line numbers: <problem start="3.78" end="4.81" severity="ERROR" ...> That's a little harder to parse in the most common single-line error case, but better than parsing the detailed_message in the more general multi-line case.
sourceStart and sourceEnd are character positions in the source code. They are not relative to the corresponding line. They are absolute positions in the source code. The first character of the source file is 0 and the last one is file.length - 1. So you don't know if there is a new line in the middle, but if you extract the characters between sourceStart (inclusive) and sourceEnd (exclusive) from the source code, you get the piece of code that is causing the problem. Isn't this enough?
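To make the absolute-offset convention concrete, here is a minimal sketch of how a log consumer might recover the offending text and work out which lines it covers. The names `ProblemSpan`, `extract` and `lineOf` are invented for this example. It follows the description above and treats the end position as exclusive. If the released log turns out to report an inclusive end, the caller would pass end + 1.

```java
// Hypothetical helper, not part of JDT: works on absolute 0-based offsets into a source file.
public class ProblemSpan {

    /** Returns the source text from start (inclusive) to endExclusive. */
    public static String extract(String source, int start, int endExclusive) {
        if (start < 0 || endExclusive < start || endExclusive > source.length()) {
            throw new IllegalArgumentException("offsets out of range: " + start + ".." + endExclusive);
        }
        return source.substring(start, endExclusive);
    }

    /**
     * Returns the 1-based line number containing the given offset.
     * Lines are split on '\n' only; a file using lone '\r' separators would need extra handling.
     */
    public static int lineOf(String source, int offset) {
        int line = 1;
        for (int i = 0; i < offset && i < source.length(); i++) {
            if (source.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        String source = "class A {\n  String s = (String) \"\";\n}\n";
        String cast = "(String) \"\"";
        int start = source.indexOf(cast);
        int end = start + cast.length(); // exclusive end, per the description above

        System.out.println(extract(source, start, end));
        // Comparing the line of the first and last character shows whether the problem spans lines.
        System.out.println(lineOf(source, start) + " to " + lineOf(source, end - 1));
    }
}
```

With this, a reader who needs to know whether a problem crosses a line break can derive it from the source file instead of parsing the detailed message.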
Thanks - now I understand. Yes, that is perfectly acceptable! As far as I am concerned, you can resolve the bug - will it get into M5?
Hopefully yes. I will write a DTD for the corresponding format. I will close this PR when everything is released in HEAD.
Maybe rename start/end into charStart/charEnd to be more obvious.
Created attachment 17043 [details] Apply on HEAD Latest patch. I changed the name. I also include an internal DTD in each log file. I could use an external DTD, but I didn't have a URL to specify. Maybe we can provide the DTD on the JDT/Core web page in the development section?
Created attachment 17044 [details] Example of log files that can be successfully validated
Created attachment 17046 [details] Apply on HEAD This patch makes the log file point to an external DTD file called compiler.dtd that is located in the same folder as the log file. It makes the log file a bit smaller.
Created attachment 17047 [details] Example of log file
Created attachment 17048 [details] DTD file
Created attachment 17056 [details] New DTD file
Created attachment 17057 [details] New patch to apply on HEAD This should be the final one. Let me know if this fits your expectations. If yes, it will be released shortly, once I have run some benchmarks and confirmed that the performance is acceptable.
The sample output all looks good to me. I have not actually tried to build and run the patched compiler myself, but feel no specific need to do so.
First draft has been released. Fixed in HEAD. I will reopen if major problems are found.
Created attachment 17405 [details] New DTD file While working on converting this xml to html, I realized that I introduced unnecessary complexity in the problem element. The problem element should contain the problem_source and the message as attributes and not as nested elements. This is a proposal and it has not been released yet. What do you think?
I also don't know what to do with the problem_source. The idea of this entry was to provide the source that is causing the problem, but without any context I don't find it useful. In the batch compiler, we do provide some context by underlining the corresponding part of the line. For example, we provide this:
(at line 14)
    return (String) "";
           ^^^^^^^^^^^
Unnecessary cast from String to String
In order to let users render the context however they want, I'd like to provide the following information inside the problem_source:
return #(String) ""#;
So I don't preserve the underlines. The relevant part of the line is between the '#' characters. This also allows converters to parse that string and extract what they want. An HTML converter could then underline the guilty part of the line using HTML tags, whereas a TXT converter would underline that part of the source code using '^'. Do you have a better idea?
I thought there was an issue with white space normalization in XML attributes - that is, white space was always normalized in attributes, whereas that was controllable in elements? Since you might not want white space normalization in source and messages, do you really want those as attributes? As for marking the source, I agree that some context is useful. I have no objection to your proposal in principle, though if you use '#' as the marker, how would an actual # character be shown? One alternative to marking the source would be to provide an attribute indicating the start position in the file of the source string. By subtracting this from the start position of the error, a reader could find the relative position of the error in the given string.
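The offset-subtraction alternative could look roughly like the sketch below. It assumes the log supplies the context string together with the absolute file offset at which that string starts. That offset is a hypothetical attribute here (called `contextStart`), and `ContextUnderline` and `underline` are likewise invented names. The problem's end position is treated as exclusive.

```java
// Hypothetical converter helper, not part of JDT: rebuilds the '^' underline from offsets.
public class ContextUnderline {

    /**
     * Builds an underline for the problem range inside a single-line context string.
     *
     * @param context             the source line provided as context
     * @param contextStart        absolute file offset of context.charAt(0) (assumed attribute)
     * @param problemStart        absolute file offset where the problem starts
     * @param problemEndExclusive absolute file offset just past the problem
     */
    public static String underline(String context, int contextStart,
                                   int problemStart, int problemEndExclusive) {
        // Subtracting the context's start offset gives positions relative to the context string.
        int from = problemStart - contextStart;
        int to = problemEndExclusive - contextStart;
        if (from < 0 || to < from || to > context.length()) {
            throw new IllegalArgumentException("problem range lies outside the context");
        }
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < from; i++) {
            // Keep tabs so the carets stay aligned under the source when printed.
            marks.append(context.charAt(i) == '\t' ? '\t' : ' ');
        }
        for (int i = from; i < to; i++) {
            marks.append('^');
        }
        return marks.toString();
    }

    public static void main(String[] args) {
        String context = "return (String) \"\";";
        System.out.println(context);
        System.out.println(underline(context, 280, 287, 298));
    }
}
```

Because the source text is passed through untouched, a literal '#' in the code needs no escaping, and an HTML converter could use the same two relative positions to wrap the range in tags instead of printing carets.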
Ok, I didn't know that. So I'll leave them as they are, meaning they will remain elements and not attributes. I'll try to find a better solution for positions in the context. Adding two attributes to the problem_source element could be a solution. This would not pollute the problem itself, and since these values would be meaningful only in the context of the source, I think they are good candidates for attributes in the problem_source element. The positions would be relative to the source specified in the element.
I updated the DTD in HEAD to include the number of tasks in the problem summary. It is now:
<!ATTLIST problem_summary
  problems CDATA #REQUIRED
  errors CDATA #REQUIRED
  warnings CDATA #REQUIRED
  tasks CDATA #REQUIRED
>
instead of:
<!ATTLIST problem_summary
  problems CDATA #REQUIRED
  errors CDATA #REQUIRED
  warnings CDATA #REQUIRED
>
Previously, the number of warnings included the number of tasks. This was not consistent with the file format, which makes the distinction between tasks and warnings.
Verified with 3.1 M5 candidate (I20040215-2300)
Is there any plan to update the file jdt-core-home/howto/batch compile/batchCompile.html with information on the XML reports, accessibility rules, and other options added to the batch compiler in 3.1?
Yes, we will work on the doc before 3.1 is out.