Bug 535135 - Encodings in the Extended Encoding Set not respected for imports when executing batch compiler under JRE9 on MacOSX
Status: NEW
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 4.8
Hardware: PC Mac OS X
Importance: P3 normal
Target Milestone: ---
Assignee: JDT-Core-Inbox CLA
Whiteboard: stalebug
Reported: 2018-05-25 13:30 EDT by Ryan Ulch CLA
Modified: 2023-01-07 09:53 EST (History)
Attachments
Reproducer (1.24 KB, application/x-gzip)
2018-05-25 13:30 EDT, Ryan Ulch CLA

Description Ryan Ulch CLA 2018-05-25 13:30:27 EDT
Created attachment 274203
Reproducer

For a while we have relied on setting the file.encoding system property to compile files in various encodings with Eclipse (I am aware that it is publicly documented that users should not set this property). With Java 9 that no longer works, because file.encoding is treated differently on MacOSX (and, I believe, Solaris SPARC): only encodings in the Basic Encoding Set are respected. I'm not sure whether this is intentional, since the behavior varies by platform. This used to work under Java 8:

$ cat Test.java
import java.nio.charset.Charset;

public class Test {
    public static void main(String[] args) {
        System.err.println("Default charset is: " + Charset.defaultCharset());
    }
}

$ /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home/bin/javac Test.java

$ /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home/bin/java Test
Default charset is: UTF-8

$ /Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home/bin/java -Dfile.encoding=Shift_JIS Test
Default charset is: Shift_JIS

But under Java 9:

$ java --version
java 9.0.4
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)

$ java -Dfile.encoding=Shift_JIS Test
Default charset is: UTF-8

$ java -Dfile.encoding=CESU-8 Test
Default charset is: CESU-8

Note that CESU-8 is in the Basic Encoding Set, so file.encoding respects the value. I also validated this by recompiling OpenJDK with Shift_JIS moved to the Basic Encoding Set:

$ ../jdk-9-modified/bin/java -version
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-adhoc.rulch.openjdk)
OpenJDK 64-Bit Server VM (build 9-internal+0-adhoc.rulch.openjdk, mixed mode)

$ ../jdk-9-modified/bin/java -Dfile.encoding=Shift_JIS Test
Default charset is: Shift_JIS

And for sanity, the same test with an unmodified JDK I compiled:
$ ../jdk-9-unmodified/bin/java -Dfile.encoding=Shift_JIS Test
Default charset is: UTF-8

I'm on MacOSX 10.12.6:

$ uname -a
Darwin rulch-mac 16.7.0 Darwin Kernel Version 16.7.0: Tue Jan 30 11:27:06 PST 2018; root:xnu-3789.73.11~1/RELEASE_X86_64 x86_64

Also note that this behavior is *not* present with a JDK distribution on Linux (Ubuntu 16.04):

$ uname -a
Linux d-ubuntu16x64-19 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ java -version
java version "9.0.4"
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)

$ java -Dfile.encoding=Shift_JIS Test
Default charset is: Shift_JIS
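
The comparison made by hand above (requested file.encoding value versus the charset the JVM actually reports) can be sketched as a small probe. This is only an illustrative sketch: the class and method names are made up here, and it assumes the file.encoding property still reports the requested value on the affected platform, which the runs above do not confirm.

```java
import java.nio.charset.Charset;

final class CharsetProbe {
    /**
     * Returns true when the requested encoding name resolves to the same
     * charset as the effective one. Comparing Charset objects rather than
     * strings means an alias of the canonical name still matches.
     */
    static boolean matches(String requested, Charset effective) {
        if (requested == null || !Charset.isSupported(requested)) {
            return false;
        }
        return Charset.forName(requested).equals(effective);
    }

    public static void main(String[] args) {
        String requested = System.getProperty("file.encoding");
        Charset effective = Charset.defaultCharset();
        System.err.println("file.encoding property: " + requested);
        System.err.println("Default charset is: " + effective);
        System.err.println("Request honored: " + matches(requested, effective));
    }
}
```

Running it with and without -Dfile.encoding=Shift_JIS on each platform would give the same information as the Test class, plus an explicit yes/no on whether the request was honored.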

Anyway, enough about the JRE behavior differences. I eventually tracked the real cause of the failure down to a reliance on the default charset: a CompilationUnit is created with a null encoding during path processing in
org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java::processPathEntries

Specifically, addNewEntry is invoked with a null customEncoding argument, and eventually a CompilationUnit is created with that same value. The addNewEntry method (in org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java):

protected void addNewEntry(ArrayList<FileSystem.Classpath> paths, String currentClasspathName,
        ArrayList<String> currentRuleSpecs, String customEncoding,
        String destPath, boolean isSourceOnly,
        boolean rejectDestinationPathOnJars) {


To fix this for ourselves, I substitute the value of CompilerOptions.OPTION_Encoding whenever customEncoding is null. That value may also be null if the option is not specified, but in that case it is appropriate to rely on the default charset.
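
The fallback order described here (explicit per-entry encoding, then the compiler's encoding option, then the default charset) can be sketched in isolation. This is not the JDT code: the class, the resolve method, and the OPTION_ENCODING key are hypothetical stand-ins, and a plain Map stands in for the compiler's options map.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

final class EncodingFallback {
    // Hypothetical stand-in for CompilerOptions.OPTION_Encoding.
    static final String OPTION_ENCODING = "example.option.encoding";

    /** Picks the first non-null encoding: per-entry, then option, then JVM default. */
    static String resolve(String customEncoding, Map<String, String> options) {
        if (customEncoding != null) {
            return customEncoding;
        }
        String fromOptions = options.get(OPTION_ENCODING);
        if (fromOptions != null) {
            return fromOptions;
        }
        // Last resort only: this is the value that differs across platforms.
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(OPTION_ENCODING, "Shift_JIS");
        System.out.println(resolve(null, options));
        System.out.println(resolve("UTF-16", options));
        System.out.println(resolve(null, new HashMap<>()));
    }
}
```

One difference from the fix described above: the sketch returns the default charset's name explicitly, whereas the described change leaves the value null and lets later code fall back to the default.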

Using the attached reproducer:
1) Extract it via tar -xf reproducer.tar.gz
2) Make sure the java on the PATH is a Java 9 JRE, for the sanity test in the script
3) Set the JDT_CORE_JAR environment variable to point at an org.eclipse.jdt.core jar
4) Run the test.sh script

For example:
rulch@rulch-mac: ~/work/testing/native-ecj-shift-jis/src/subdir
$ tar -xf reproducer.tar.gz
rulch@rulch-mac: ~/work/testing/native-ecj-shift-jis/src/subdir
$ JDT_CORE_JAR=/Users/rulch/work/prevent/packages/eclipse/org.eclipse.jdt.core-3.13.102-SNAPSHOT.jar ./test.sh
*snipped output*

And as a reference, here is a snippet of the output from the failing invocations:

----------
1. ERROR in /Users/rulch/work/testing/native-ecj-shift-jis/src/subdir/test/src/表丕表/丕表丕/AAA示言示zzz.java (at line 2)
	import b示丕.BBB示丕yyy;
	       ^^^^^^^^^^^^
The import b示丕.BBB示丕yyy cannot be resolved
----------
2. ERROR in /Users/rulch/work/testing/native-ecj-shift-jis/src/subdir/test/src/表丕表/丕表丕/AAA示言示zzz.java (at line 5)
	BBB示丕yyy.printBBB示丕yyy();
	^^^^^^^^
BBB示丕yyy cannot be resolved
----------
----------
3. ERROR in /Users/rulch/work/testing/native-ecj-shift-jis/src/subdir/test/src/示丕/b示丕/BBB示丕yyy.java (at line 1)
	package b����;
	         ^^^^
Syntax error on tokens, delete these tokens
----------
4. ERROR in /Users/rulch/work/testing/native-ecj-shift-jis/src/subdir/test/src/示丕/b示丕/BBB示丕yyy.java (at line 2)
	public class BBB����yyy {
	             ^^^
The public type BBB must be defined in its own file
----------
*more errors*
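
The replacement characters in errors 3 and 4 are what one would expect if Shift_JIS bytes were decoded as UTF-8. A minimal sketch of that effect, assuming the source files are stored as Shift_JIS and that the JVM ships the Shift_JIS charset (it is in the Extended Encoding Set); the class name is made up, and the identifier is written with Unicode escapes so the example does not depend on its own file encoding:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Decodes raw file bytes with the given charset, as a compiler would. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "b" followed by U+793A and U+4E15, the package name from the errors above.
        String original = "b\u793A\u4E15";
        Charset sjis = Charset.forName("Shift_JIS");
        byte[] onDisk = original.getBytes(sjis);

        String right = decode(onDisk, sjis);
        String wrong = decode(onDisk, StandardCharsets.UTF_8);

        System.out.println("Decoded as Shift_JIS round-trips: " + right.equals(original));
        // Malformed UTF-8 input is replaced with U+FFFD by String's decoder.
        System.out.println("Decoded as UTF-8 contains U+FFFD: " + wrong.contains("\uFFFD"));
    }
}
```

Once the identifier has been decoded this way, the package and type names no longer match what the importing file refers to, which is consistent with the unresolved imports in errors 1 and 2.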
Comment 1 Ryan Ulch CLA 2018-05-25 16:01:47 EDT
Reference with a list of encodings in the Basic Encoding Set and Extended Encoding Set:
https://docs.oracle.com/javase/9/intl/supported-encodings.htm
Comment 2 Ryan Ulch CLA 2019-01-23 12:43:41 EST
This still appears to be an issue in R4_10_maintenance. Note that the giant wall of text that is the original comment includes a suggested fix. The relevant portion:

(In reply to Ryan Ulch from comment #0)
> I eventually tracked
> down the real cause of the failure to the reliance on the default charset
> when a CompilationUnit is created with a null encoding during path
> processing in
> eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java::
> processPathEntries
> 
> Specifically, addNewEntry is invoked with a null customEncoding argument and
> then eventually a CompilationUnit is created with that same value. The
> addNewEntry method (@
> org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.
> java):
> 
> protected void addNewEntry(ArrayList<FileSystem.Classpath> paths, String currentClasspathName,
>         ArrayList<String> currentRuleSpecs, String customEncoding,
>         String destPath, boolean isSourceOnly,
>         boolean rejectDestinationPathOnJars) {
> 
> 
> What I've done to fix this for ourselves is when customEncoding is null, I
> substitute in the value from CompilerOptions.OPTION_Encoding. That may also
> be null if the option is not specified, but then it is appropriate to rely
> on the default charset.

And a potential fix to remove the reliance on the default charset:

$ git diff
diff --git a/org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java b/org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java
index ec77b2ada8..453bea519e 100644
--- a/org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java
+++ b/org.eclipse.jdt.core/batch/org/eclipse/jdt/internal/compiler/batch/Main.java
@@ -1576,7 +1576,9 @@ protected void addNewEntry(ArrayList<FileSystem.Classpath> paths, String current
                ArrayList<String> currentRuleSpecs, String customEncoding,
                String destPath, boolean isSourceOnly,
                boolean rejectDestinationPathOnJars) {
-
+    if (customEncoding == null) {
+        customEncoding = this.options.get(CompilerOptions.OPTION_Encoding);
+    }
        int rulesSpecsSize = currentRuleSpecs.size();
        AccessRuleSet accessRuleSet = null;
        if (rulesSpecsSize != 0) {
Comment 3 Andrey Loskutov CLA 2019-01-24 08:45:01 EST
Ryan, we use Gerrit to submit and review patches.

See for example
https://wiki.eclipse.org/Platform_UI/How_to_Contribute
Comment 4 Eclipse Genie CLA 2021-01-14 13:00:15 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.
Comment 5 Eclipse Genie CLA 2023-01-07 09:53:09 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.