Bug 578796 - Move to utf-8 encoding by default across all OS configurations
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: IDE
Version: 4.23
Hardware: All All
Importance: P3 enhancement
Target Milestone: 4.24 M2
Assignee: Andrey Loskutov CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy
Depends on: 479450 479451 516583
Blocks:
Reported: 2022-02-17 03:25 EST by Sravan Kumar Lakkimsetti CLA
Modified: 2022-08-12 16:57 EDT

Description Sravan Kumar Lakkimsetti CLA 2022-02-17 03:25:18 EST
With JEP 400 (https://openjdk.java.net/jeps/400), Java is making UTF-8 the default charset for all standard Java APIs.

Given this, we should move to UTF-8 encoding by default.

See bug 516583 for previous discussions.

With Java expecting UTF-8, we will definitely have problems on Windows when we move to Java 18 and above. My suggestion is to move to UTF-8 in 4.24, when we start supporting Java 18.
Comment 1 Lars Vogel CLA 2022-02-17 03:57:10 EST
+1 as PMC member
Comment 2 Thomas Wolf CLA 2022-02-17 04:08:34 EST
In the past I have used Eclipse instances running on machines that were explicitly set to some non-UTF-8 locale for reasons unrelated to Eclipse.

Bear in mind that in such an environment you have to be careful wherever you communicate with external processes. These may produce non-UTF-8 on their stdout, and expect non-UTF-8 on their stdin.

In JGit and EGit we already made sure we never rely on any default encoding; and where JGit or EGit call external programs, we always make sure we use the locale-dependent encoding obtained the way JEP-400 recommends.

See https://git.eclipse.org/r/c/jgit/jgit/+/185995 .
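As a rough illustration of the point about external processes (not the JGit/EGit code), here is a minimal Java sketch. It assumes Java 17 or later, where the `native.encoding` system property mentioned in JEP 400 is available; the class name `ExternalProcessEncoding` and its helpers are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class ExternalProcessEncoding {

    /** Platform encoding for process I/O: "native.encoding" if usable, else the JVM default. */
    static Charset nativeCharset() {
        String name = System.getProperty("native.encoding");
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported charset name: fall back to the JVM default below.
            }
        }
        return Charset.defaultCharset();
    }

    /** Reads a stream fully and decodes it with an explicitly chosen charset. */
    static String readAll(InputStream in, Charset charset) {
        try {
            return new String(in.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Re-run the current JVM with "-version" purely as a stand-in external program.
        String java = ProcessHandle.current().info().command().orElse("java");
        Process process = new ProcessBuilder(java, "-version")
                .redirectErrorStream(true) // "-version" prints to stderr
                .start();
        // Decode with the platform encoding, not with the (possibly UTF-8) JVM default.
        String output = readAll(process.getInputStream(), nativeCharset());
        process.waitFor();
        System.out.print(output);
    }
}
```

The same charset would be used when writing to the process's stdin. Whether a given external tool really uses the platform encoding is an assumption that has to be checked per tool.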
Comment 3 Andrey Loskutov CLA 2022-02-17 04:32:51 EST
Switching the *workspace* default to UTF-8 while projects have no explicit encoding set would cause a problem: in existing workspaces that used a "wrong" default encoding, projects inheriting that setting would have their existing data interpreted in a different encoding.

See bug 516583 comment 9 and bug 479450 comment 12 for what is needed to have UTF-8 enabled by default for *new* projects, and bug 479451 for migrating existing projects.
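To make the risk concrete, a small plain-Java sketch (no Eclipse API involved; the class name `ImplicitEncodingSwitch` is hypothetical) shows what happens when bytes written under one implicit encoding are later read as UTF-8. ISO-8859-1 is used here only as a stand-in for a non-UTF-8 system encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ImplicitEncodingSwitch {

    /** Encodes text with one charset and decodes the same bytes with another. */
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe"; // "Grüße"
        Charset oldDefault = StandardCharsets.ISO_8859_1; // stand-in for the old implicit encoding

        // Same charset for writing and reading: the text survives.
        System.out.println(reinterpret(original, oldDefault, oldDefault).equals(original)); // true

        // The default silently switched to UTF-8: the non-ASCII characters are lost.
        System.out.println(reinterpret(original, oldDefault, StandardCharsets.UTF_8).equals(original)); // false
    }
}
```

Pure ASCII content is unaffected, which is why such a switch can go unnoticed until a file with non-ASCII characters is opened or saved.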

So to switch workspace default to explicit UTF-8 from implicit system encoding we must make sure that:

1) All new projects have explicit UTF-8 encoding set (bug 479450).

2) All existing projects in the workspace without an explicit encoding set either get the "previous" implicit system encoding set explicitly (how? is this automatic assignment safe?) OR get an error marker saying that an explicit encoding should be set on the project (bug 479451), and ideally a quick fix to do so.

3) Change the code to explicitly set workspace encoding to UTF-8 if not yet changed by user.

For 1) an old patch is available, but as commented, it would lead to many test failures in various component tests that make assumptions about project content after project creation, so some effort is needed to fix all of them.

For 2) no patch is available, but I think that can be done automatically on project opening. This should be doable.

For 3) there is neither a bug nor a patch yet. I haven't assessed how much effort that would be, but I assume not that much.
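The interplay of the three steps can be sketched with a toy model in plain Java. This is not the Eclipse resources API: `EncodingResolutionModel` and its methods are hypothetical names, and the model assumes the simple inheritance order project setting, then workspace setting, then system encoding. Pinning the previous encoding is only one of the two options named in step 2.

```java
import java.util.Optional;

final class EncodingResolutionModel {

    /** Effective encoding: explicit project setting, else explicit workspace setting, else system encoding. */
    static String effective(Optional<String> project, Optional<String> workspace, String system) {
        return project.or(() -> workspace).orElse(system);
    }

    /** One option for step 2: give a project the encoding it effectively used so far, explicitly. */
    static Optional<String> pinPrevious(Optional<String> project, Optional<String> workspace, String system) {
        return Optional.of(effective(project, workspace, system));
    }

    /** Step 3: set the workspace default to UTF-8 only if the user has not chosen one. */
    static Optional<String> switchWorkspaceDefault(Optional<String> workspace) {
        return workspace.isPresent() ? workspace : Optional.of("UTF-8");
    }

    public static void main(String[] args) {
        String system = "windows-1252";               // assumed system encoding
        Optional<String> workspace = Optional.empty(); // user never set a workspace encoding
        Optional<String> legacy = Optional.empty();    // project without explicit encoding

        // Before the switch the project inherits the system encoding.
        System.out.println(effective(legacy, workspace, system)); // windows-1252

        // Switching the workspace default alone changes the project's effective encoding.
        System.out.println(effective(legacy, switchWorkspaceDefault(workspace), system)); // UTF-8

        // Pinning first keeps existing data readable after the switch.
        Optional<String> pinned = pinPrevious(legacy, workspace, system);
        System.out.println(effective(pinned, switchWorkspaceDefault(workspace), system)); // windows-1252
    }
}
```

The model only shows why step 2 has to happen before step 3; it says nothing about how markers or quick fixes would be implemented.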
Comment 4 Andrey Loskutov CLA 2022-04-14 12:45:46 EDT
(In reply to Andrey Loskutov from comment #3)
> Switching *workspace* default to UTF8 without explicit project encoding set
> would lead to the problem, that existing workspaces that were using "wrong"
> encoding by default with projects inheriting this setting would change
> encoding of existing data.
> 
> See bug 516583 comment 9 and bug 479450 comment 12 for things that are
> needed to have utf-8 enabled by default for *new* projects, and bug 479451
> for the existing to migrate.
> 
> So to switch workspace default to explicit UTF-8 from implicit system
> encoding we must make sure that:
> 
> 1) All new projects have explicit UTF-8 encoding set (bug 479450).

Done (an explicit encoding is now always set, though not necessarily UTF-8).

> 2) All existing projects in the workspace without explicit encoding set are
> either get "previous" implicit system encoding explicitly set (how? is this
> automatic assignment safe?) OR get an error marker that explicit encoding
> should be set on project (bug 479451), and ideally a quick fix to do so.

Done (warning marker + quick fix)

> 3) Change the code to explicitly set workspace encoding to UTF-8 if not yet
> changed by user.

See bug 516583; a PR is up.
Comment 5 Andrey Loskutov CLA 2022-04-16 03:23:22 EDT
Bug 516583 is now fixed in build I20220415-1800.

Sravan, anything else missing?
Comment 6 Sravan Kumar Lakkimsetti CLA 2022-04-16 03:29:39 EDT
(In reply to Andrey Loskutov from comment #5)
> Bug 516583 is fixed now with I20220415-1800 build.
> 
> Sravan, anything else missing?

My wish is to convert existing projects to UTF-8. Since we already handle the different encodings, I don't think that is necessary now.

I think we can close this now.
Comment 7 Andrey Loskutov CLA 2022-04-16 03:33:08 EDT
(In reply to Sravan Kumar Lakkimsetti from comment #6)
> My wish is to convert existing projects to utf-8. Since we already handle
> the different encodings I don't think that would be necessary now. 

I think this shouldn't be done automatically.

> I think we can close this now

OK.
Comment 8 Andrey Loskutov CLA 2022-04-16 03:34:57 EDT
One thing is missing: the N&N (New and Noteworthy) entry.
Comment 9 Andrey Loskutov CLA 2022-04-30 15:28:06 EDT
(In reply to Andrey Loskutov from comment #8)
> One thing is missing - N&N entry.

Added via https://github.com/eclipse-platform/www.eclipse.org-eclipse-news/pull/16