Created attachment 274203
Reproducer

For a while we have relied on setting the file.encoding property in order to compile files under various encodings with Eclipse (I am aware that it's publicly documented that users should not set this). With Java 9, that's no longer possible due to differences in how file.encoding is treated on MacOSX (and I believe also Solaris SPARC). Notably, only encodings in the Basic Encoding Set are respected. I'm not sure if this is intentional because the behavior varies based on platform.

This used to work under java 8:

$ cat Test.java
import java.nio.charset.Charset;

public class Test {
    public static void main(String[] args) {
        System.err.println("Default charset is: " + Charset.defaultCharset());
    }
}
$ /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home/bin/javac Test.java
$ /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home/bin/java Test
Default charset is: UTF-8
$ /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home/bin/java -Dfile.encoding=Shift_JIS Test
Default charset is: Shift_JIS

But under java 9:

$ java --version
java 9.0.4
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
$ java -Dfile.encoding=Shift_JIS Test
Default charset is: UTF-8
$ java -Dfile.encoding=CESU-8 Test
Default charset is: CESU-8

Note that CESU-8 is in the Basic Encoding Set so file.encoding respects the value.
I also validated this by recompiling OpenJDK with Shift JIS moved to the Basic Encoding Set:

$ ../jdk-9-modified/bin/java -version
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-adhoc.rulch.openjdk)
OpenJDK 64-Bit Server VM (build 9-internal+0-adhoc.rulch.openjdk, mixed mode)
$ ../jdk-9-modified/bin/java -Dfile.encoding=Shift_JIS Test
Default charset is: Shift_JIS

(and for sanity, an unmodified JDK I compiled):

$ ../jdk-9-unmodified/bin/java -Dfile.encoding=Shift_JIS Test
Default charset is: UTF-8

I'm on MacOSX 10.12.6:

$ uname -a
Darwin rulch-mac 16.7.0 Darwin Kernel Version 16.7.0: Tue Jan 30 11:27:06 PST 2018; root:xnu-3789.73.11~1/RELEASE_X86_64 x86_64

Also note that this behavior is *not* present on a JDK distribution on Linux (ubuntu 16.04):

$ uname -a
Linux d-ubuntu16x64-19 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ java -version
java version "9.0.4"
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
$ java -Dfile.encoding=Shift_JIS Test
Default charset is: Shift_JIS

Anyways, enough about the JRE behavior differences. I eventually tracked down the real cause of the failure to the reliance on the default charset when a CompilationUnit is created with a null encoding during path processing in eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java::processPathEntries

Specifically, addNewEntry is invoked with a null customEncoding argument and then eventually a CompilationUnit is created with that same value.
The addNewEntry method (@ org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java):

protected void addNewEntry(ArrayList<FileSystem.Classpath> paths, String currentClasspathName,
        ArrayList<String> currentRuleSpecs, String customEncoding,
        String destPath, boolean isSourceOnly,
        boolean rejectDestinationPathOnJars) {

What I've done to fix this for ourselves is when customEncoding is null, I substitute in the value from CompilerOptions.OPTION_Encoding. That may also be null if the option is not specified, but then it is appropriate to rely on the default charset.

Using the attached reproducer:
1) Extract via tar -xf reproducer.tar.gz
2) Make sure java on path is a java 9 JRE for the sanity test in the script
3) set JDT_CORE_JAR envvar to point at an org.eclipse.jdt.core jar
4) run test.sh script

EG:

rulch@rulch-mac: ~/work/testing/native-ecj-shift-jis/src/subdir
$ tar -xf reproducer.tar.gz
rulch@rulch-mac: ~/work/testing/native-ecj-shift-jis/src/subdir
rulch@rulch-mac: ~/work/testing/native-ecj-shift-jis/src/subdir
$ JDT_CORE_JAR=/Users/rulch/work/prevent/packages/eclipse/org.eclipse.jdt.core-3.13.102-SNAPSHOT.jar ./test.sh
*snipped output*

And as a reference, here is a snippet of the output from the failing invocations:

----------
1. ERROR in /Users/rulch/work/testing/native-ecj-shift-jis/src/subdir/test/src/表丕表/丕表丕/AAA示言示zzz.java (at line 2)
	import b示丕.BBB示丕yyy;
	^^^^^^^^^^^^
The import b示丕.BBB示丕yyy cannot be resolved
----------
2. ERROR in /Users/rulch/work/testing/native-ecj-shift-jis/src/subdir/test/src/表丕表/丕表丕/AAA示言示zzz.java (at line 5)
	BBB示丕yyy.printBBB示丕yyy();
	^^^^^^^^
BBB示丕yyy cannot be resolved
----------
----------
3. ERROR in /Users/rulch/work/testing/native-ecj-shift-jis/src/subdir/test/src/示丕/b示丕/BBB示丕yyy.java (at line 1)
	package b����;
	^^^^
Syntax error on tokens, delete these tokens
----------
4. ERROR in /Users/rulch/work/testing/native-ecj-shift-jis/src/subdir/test/src/示丕/b示丕/BBB示丕yyy.java (at line 2)
	public class BBB����yyy {
	^^^
The public type BBB must be defined in its own file
----------
*more errors*
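For readers who want to see why the third and fourth errors show replacement characters, here is a minimal sketch, not part of the reproducer. It assumes the source line was "package b示丕;" saved as Shift_JIS and that the runtime provides the Shift_JIS charset; the class and method names are hypothetical. It decodes the same bytes once with the right charset and once with UTF-8, which is roughly what happens when the compiler falls back to a UTF-8 default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JDT or the attached reproducer.
final class DecodeDemo {
    /** Encodes text as Shift_JIS, standing in for a source file saved in that encoding. */
    static byte[] encodeShiftJis(String text) {
        return text.getBytes(Charset.forName("Shift_JIS"));
    }

    /** Decodes raw bytes with the given charset; malformed input is replaced with U+FFFD. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumed content of the package declaration in the reproducer.
        String line = "package b\u793A\u4E15;";
        byte[] onDisk = encodeShiftJis(line);

        // Correct charset: the identifier survives.
        System.out.println("as Shift_JIS: " + decode(onDisk, Charset.forName("Shift_JIS")));
        // Wrong charset: the Shift_JIS bytes are not valid UTF-8.
        System.out.println("as UTF-8:     " + decode(onDisk, StandardCharsets.UTF_8));
    }
}
```

Decoded with the wrong charset, the package name no longer forms a valid identifier, which is consistent with the "Syntax error on tokens" message above.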
Reference with a list of encodings in the Basic Encoding Set and Extended Encoding Set: https://docs.oracle.com/javase/9/intl/supported-encodings.htm
This looks to still be an issue in R4_10_maintenance. Note that in the giant wall of text that is the original comment, I have a suggested fix. A quote of the relevant portion:

(In reply to Ryan Ulch from comment #0)
> I eventually tracked
> down the real cause of the failure to the reliance on the default charset
> when a CompilationUnit is created with a null encoding during path
> processing in
> eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java::
> processPathEntries
>
> Specifically, addNewEntry is invoked with a null customEncoding argument and
> then eventually a CompilationUnit is created with that same value. The
> addNewEntry method (@
> org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java):
>
> protected void addNewEntry(ArrayList<FileSystem.Classpath> paths,
>         String currentClasspathName,
>         ArrayList<String> currentRuleSpecs, String customEncoding,
>         String destPath, boolean isSourceOnly,
>         boolean rejectDestinationPathOnJars) {
>
>
> What I've done to fix this for ourselves is when customEncoding is null, I
> substitute in the value from CompilerOptions.OPTION_Encoding. That may also
> be null if the option is not specified, but then it is appropriate to rely
> on the default charset.
And a potential fix to remove the reliance on the default charset:

$ git diff
diff --git a/org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java b/org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java
index ec77b2ada8..453bea519e 100644
--- a/org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java
+++ b/org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java
@@ -1576,7 +1576,9 @@ protected void addNewEntry(ArrayList<FileSystem.Classpath> paths, String current
 		ArrayList<String> currentRuleSpecs, String customEncoding,
 		String destPath, boolean isSourceOnly,
 		boolean rejectDestinationPathOnJars) {
-
+	if (customEncoding == null) {
+		customEncoding = this.options.get(CompilerOptions.OPTION_Encoding);
+	}
 	int rulesSpecsSize = currentRuleSpecs.size();
 	AccessRuleSet accessRuleSet = null;
 	if (rulesSpecsSize != 0) {
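The lookup order the patch aims for can be sketched in isolation as follows. This is only an illustration under assumptions, not JDT code: the class name, the resolve method, and the OPTION_ENCODING_KEY constant are hypothetical stand-ins (the real patch reads CompilerOptions.OPTION_Encoding from this.options), and the sketch returns a Charset where the compiler passes the encoding name along as a String.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone illustration of the fallback order; not JDT code.
final class EncodingFallback {
    // Stand-in for CompilerOptions.OPTION_Encoding; the key string is made up.
    static final String OPTION_ENCODING_KEY = "example.encoding";

    /**
     * Picks the charset for a source entry:
     * 1. the per-entry custom encoding, if given;
     * 2. otherwise the globally configured encoding option, if set;
     * 3. otherwise the JVM default charset.
     */
    static Charset resolve(String customEncoding, Map<String, String> options) {
        String name = customEncoding;
        if (name == null) {
            name = options.get(OPTION_ENCODING_KEY);
        }
        return name != null ? Charset.forName(name) : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        // Nothing configured: falls through to the default charset.
        System.out.println(resolve(null, options));

        options.put(OPTION_ENCODING_KEY, "Shift_JIS");
        // Global option set, no per-entry encoding: the option wins over the default.
        System.out.println(resolve(null, options));
        // A per-entry encoding still takes precedence over the global option.
        System.out.println(resolve("UTF-8", options));
    }
}
```

With this order, the value of file.encoding only matters when neither a per-entry encoding nor the encoding option is supplied.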
Ryan, we use Gerrit to submit and review patches. See for example https://wiki.eclipse.org/Platform_UI/How_to_Contribute
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. -- The automated Eclipse Genie.