- invoke the JDT compiler in a Chinese locale with the -log option
- the log file header indicates UTF-8 encoding
- try to read the file as UTF-8 (e.g. using Jazz's jdtCompileLogPublisher Ant task)
- it fails, complaining:

    XML parsing error: "1 字节 UTF-8 序列的无效字节 1。" at line "2".

which roughly translates as "Invalid byte 1 of 1-byte UTF-8 sequence".

Line 2 is the date line, e.g. the first few lines of a log in English are:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- 07/05/09 1:31:24 EDT PM -->
    <!DOCTYPE compiler PUBLIC "-//Eclipse.org//DTD Eclipse JDT 3.2.003 Compiler//EN" "http://www.eclipse.org/jdt/core/compiler_32_003.dtd">
    <compiler copyright="Copyright IBM Corp 2000, 2007. All rights reserved." name="Eclipse Java Compiler" version="0.780_R33x, 3.3.1">
    <command_line>

The problem appears to be in Main$Logger.setLog, where it does:

    this.log = new PrintWriter(new FileOutputStream(logFileName, false));

This uses the default encoding. It should instead wrap the stream in a writer with an explicit UTF-8 encoding:

    this.log = new PrintWriter(new OutputStreamWriter(new FileOutputStream(logFileName, false), Util.UTF_8));

This is blocking TVT testing of Jazz, but we are looking into a workaround, e.g. specifying -Dfile.encoding=UTF-8 on the command line.
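To illustrate the suspected mismatch between the declared and the actual encoding, here is a minimal standalone sketch. It is not the JDT code: the class and method names are hypothetical, ISO-8859-1 stands in for a non-UTF-8 platform default, and the date text is made up. It writes an XML header that declares UTF-8, followed by a date comment, through a PrintWriter bound to a given charset, then checks whether the resulting bytes are well-formed UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LogEncodingSketch {
    static final String HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    // Writes the header and a date comment using the given writer charset
    // and returns the raw bytes that would end up in the log file.
    static byte[] writeLog(String dateLine, Charset writerCharset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter log = new PrintWriter(new OutputStreamWriter(out, writerCharset));
        log.print(HEADER);
        log.print('\n');
        log.print("<!-- " + dateLine + " -->");
        log.flush();
        return out.toByteArray();
    }

    // Strict UTF-8 decoding: returns false on the first malformed byte sequence,
    // which is roughly what an XML parser trusting the header would hit.
    static boolean decodesCleanlyAsUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical locale-formatted date containing a non-ASCII character.
        String dateLine = "7 d\u00e9c. 2009";
        System.out.println(decodesCleanlyAsUtf8(writeLog(dateLine, StandardCharsets.UTF_8)));
        // ISO-8859-1 is an assumed stand-in for a non-UTF-8 default encoding.
        System.out.println(decodesCleanlyAsUtf8(writeLog(dateLine, StandardCharsets.ISO_8859_1)));
    }
}
```

With an explicit UTF-8 writer the bytes match the header's declaration; with the stand-in default encoding the first non-ASCII character on the date line is no longer valid UTF-8.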
Note: I'd expect the date line to be in the locale-specific format, which would likely use double-byte characters in the Chinese ('zh') locale.

I also noticed that the line that writes the date:

    this.log.println("<!-- " + new String(dateFormat.format(date).getBytes(), Util.UTF_8) + " -->");//$NON-NLS-1$//$NON-NLS-2$

converts between encodings incorrectly: it converts to bytes using the default encoding, then back to a string using UTF-8. There's no need for this conversion. It should just do:

    this.log.println("<!-- " + dateFormat.format(date) + " -->");//$NON-NLS-1$//$NON-NLS-2$
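The round trip described above can be reproduced in isolation. This is only a sketch: the class name, the helper, and the sample date are hypothetical, and the platform default encoding is passed in explicitly (ISO-8859-1 as an assumed stand-in) so the behaviour does not depend on the machine running it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DateRoundTripSketch {
    // Mimics new String(text.getBytes(), "UTF-8"), with the platform default
    // encoding made an explicit parameter instead of being implicit.
    static String roundTrip(String text, Charset platformDefault) {
        byte[] bytes = text.getBytes(platformDefault);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical locale-formatted date with one non-ASCII character.
        String date = "7 d\u00e9c. 2009";
        // Harmless only when the default encoding happens to be UTF-8.
        System.out.println(roundTrip(date, StandardCharsets.UTF_8).equals(date));
        // With any other default, the non-ASCII character is corrupted.
        System.out.println(roundTrip(date, StandardCharsets.ISO_8859_1).equals(date));
    }
}
```

The conversion is a no-op when the default encoding is UTF-8 and lossy otherwise, which is why dropping it and passing the formatted string straight to the writer is the safer choice.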
Hm, the use of the default encoding for the PrintWriter might not be the problem. The name of the log file we're using (in the Ant script) is declared as:

    <property name="compileLog" value="${java.io.tmpdir}/compilelog.xml"/>

The Logger code tries to handle XML files differently:

    int index = logFileName.lastIndexOf('.');
    if (index != -1) {
        if (logFileName.substring(index).toLowerCase().equals(".xml")) { //$NON-NLS-1$
            this.log = new GenericXMLWriter(new OutputStreamWriter(new FileOutputStream(logFileName, false), Util.UTF_8), Util.LINE_SEPARATOR, true);

which looks good to me. We're invoking the Ant javac task with:

    <javac destdir="${build.output}" failonerror="false" debug="on" debuglevel="2" includes="**/*.java, *.java" srcdir="${workingDir}">
        <compilerarg line="-log ${compileLog}"/>
    </javac>

It may be that the expansion of ${java.io.tmpdir} is confusing things (though it works OK for me on WinXP in the English Canada locale). I'll dig further.
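For reference, the extension check quoted above boils down to the following standalone helper. This is a sketch, not the Logger code: the class and method names are hypothetical, and Locale.ROOT is an addition of mine (the quoted code calls toLowerCase() without a locale) to keep the comparison independent of the default locale.

```java
import java.util.Locale;

public class LogKindSketch {
    // Returns true when the text after the last '.' is "xml", ignoring case,
    // which is the condition under which the XML writer would be selected.
    static boolean isXmlLog(String logFileName) {
        int index = logFileName.lastIndexOf('.');
        return index != -1
                && logFileName.substring(index).toLowerCase(Locale.ROOT).equals(".xml");
    }

    public static void main(String[] args) {
        System.out.println(isXmlLog("/tmp/compilelog.xml"));
        System.out.println(isXmlLog("compilelog.txt"));
    }
}
```

A path such as the expanded ${java.io.tmpdir}/compilelog.xml takes the XML branch as long as its last dot is the one before "xml".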
Turns out we were running an older version of the compiler (from 3.3). It looks like the main issue with the encoding was fixed in 3.4. Earlier versions (3.2 and 3.3) use the default encoding:

    this.log = new GenericXMLWriter(new FileOutputStream(logFileName, false), Util.LINE_SEPARATOR, true);

You might still want to consider the minor issue in comment 1.
Reducing severity to minor, as the only remaining problem is the date encoding.
Created attachment 134997: Proposed fix
The patch fixes the problem mentioned in comment 1. David, please review.
Patch looks good.
Released for 3.5RC1. Code verification is required in order to verify this fix.
Verified using I20090513-2000