Bug 391791 - Switch HTML files to UTF-8 encoding
Summary: Switch HTML files to UTF-8 encoding
Status: RESOLVED FIXED
Alias: None
Product: Orion (Archived)
Classification: ECD
Component: Client
Version: unspecified
Hardware: PC Windows 7
Importance: P3 normal
Target Milestone: 6.0 M2
Assignee: Ken Walker CLA
QA Contact:
URL:
Whiteboard:
Keywords: helpwanted
Depends on:
Blocks:
 
Reported: 2012-10-12 09:53 EDT by John Arthorne CLA
Modified: 2014-06-16 12:18 EDT
Description John Arthorne CLA 2012-10-12 09:53:25 EDT
Most or all of our HTML pages specify ISO-8859-1, but this should really be UTF-8 to support non-Latin locales.
Comment 1 Paul Webster CLA 2014-01-31 16:44:34 EST
Hi Hongying.

What about starting with 3 to 5 HTML files in orion.client/bundles/org.eclipse.orion.client.ui/web?

PW
Comment 2 Hongying Zhang CLA 2014-02-03 00:08:16 EST
(In reply to Paul Webster from comment #1)
> Hi Hongying.
> 
> What about starting with 3 to 5 html files in
> orion.client/bundles/org.eclipse.orion.client.ui/web
> 
> PW

Hi Paul:)

I've tried switching the encoding to UTF-8 manually for a small number of files, and I think it worked :) But doing it that way seems very time-consuming and inefficient, because there are so many HTML files :(
Here is one possible approach I am thinking about:

I can use OutputStreamWriter to re-encode the content of those files as UTF-8. However, I think I am also supposed to change the meta tags in those files to have charset='utf-8'. So while I am using the OutputStreamWriter to write each file line by line, I can search for that tag and modify it to have the right charset.
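
The approach described above could be sketched roughly as follows. This is only an illustration, not the change that was eventually committed: the class name `Utf8Converter`, the method names, and the regular expression are all assumptions, and the sketch assumes the source files really are ISO-8859-1 and that the charset declaration fits on a single line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: re-encodes one file from ISO-8859-1 to UTF-8
// and rewrites its charset declaration along the way.
final class Utf8Converter {

    // Replaces "charset=ISO-8859-1" (any case, optional spaces and quote) with utf-8.
    static String fixCharset(String line) {
        return line.replaceAll("(?i)(charset\\s*=\\s*[\"']?)iso-8859-1", "$1utf-8");
    }

    // Reads source as ISO-8859-1 and writes target as UTF-8, line by line.
    // Note: readLine() drops the original line terminators, so every line
    // is written back with a single '\n'.
    static void convert(Path source, Path target) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                 Files.newInputStream(source), StandardCharsets.ISO_8859_1));
             Writer out = new OutputStreamWriter(
                 Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(fixCharset(line));
                out.write('\n');
            }
        }
    }
}
```

Writing to a separate target path, rather than overwriting in place, keeps the original file available for comparison until the result has been checked.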

Does that sound right to you?

Also, I am not sure how to test whether I succeeded in changing the encoding.
Is it enough to have <meta charset='utf-8'> in the file and to get something like "temp.php: text/x-php; charset=utf-8" when calling file -I on an html/php file?
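
One way to check both conditions from the question above programmatically is sketched below. The class `Utf8Check` and its method names are hypothetical. The sketch decodes the bytes strictly as UTF-8, so malformed input is reported instead of being silently replaced, and separately looks for a utf-8 charset declaration in the markup. Note that a pure-ASCII file passes the strict decode as well, since ASCII is a subset of UTF-8, and a tool such as `file` may well describe such a file as us-ascii rather than utf-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper for verifying a converted file.
final class Utf8Check {

    private static final Pattern UTF8_DECLARATION =
        Pattern.compile("charset\\s*=\\s*[\"']?utf-8", Pattern.CASE_INSENSITIVE);

    // True if the bytes form well-formed UTF-8 (strict decode, no replacement).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the markup contains a charset declaration naming utf-8.
    static boolean declaresUtf8(String html) {
        return UTF8_DECLARATION.matcher(html).find();
    }
}
```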


Thank you so much:)

Best Wishes,
Hongying Zhang
Comment 3 Paul Webster CLA 2014-02-04 11:49:44 EST
(In reply to Hongying Zhang from comment #2)
> I can use OutputStreamWriter to change the encoding of the content of those
> files to UTF8, however, I think I am also supposed to change the meta tags
> in those files to have "charset = 'utf8'"...so when I am using the
> OutputStreamWriter to write the files line by line I can search for that tag
> and try to modify it to have the right charset...

I talked to John A, and the files themselves are probably plain English ASCII. ASCII bytes are already valid UTF-8, so I don't think we need to convert the file contents. We just need to update the header with charset=utf-8.
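
The assumption that a file is plain ASCII can be confirmed before skipping the conversion. The following is a minimal sketch with a hypothetical class `AsciiCheck`: a file whose bytes are all below 0x80 decodes to the same text under ISO-8859-1 and UTF-8, which is why only the declared charset would need to change.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether re-encoding a file would change anything.
final class AsciiCheck {

    // True if every byte is in the 7-bit ASCII range.
    // Java bytes are signed, so values 0x80..0xFF appear as negative numbers.
    static boolean isPureAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    // True if the bytes decode to the same string under both charsets.
    static boolean decodesIdentically(byte[] bytes) {
        String latin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        String utf8 = new String(bytes, StandardCharsets.UTF_8);
        return latin1.equals(utf8);
    }
}
```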

I would still pick one directory at a time, convert the files in it, and make sure you can submit a pull request.

I realize this devolves into a search/replace exercise, so it's just practice until Ken and you can find something larger in scope.  I've asked John A to suggest something for this weekend.

PW
Comment 4 Akihiko Takajo CLA 2014-04-22 01:12:30 EDT
(In reply to Paul Webster from comment #3) 
> I talked to John A and the files themselves are probably full of english
> ASCII, so I don't think we need to sanitize the files themselves.  So we
> need to update the header with charset=utf-8"
> 
> I would still pick a directory at a time, convert them, and make sure you
> can enter a pull request.
> 
> I realize this devolves into a search/replace exercise, so it's just
> practice until Ken and you can find something larger in scope.  I've asked
> John A to suggest something for this weekend.
> 
> PW

Before I start testing (search/replace, etc.) with a non-Latin language, I hope the charset is updated to UTF-8.
Is there a plan to update https://orion.eclipse.org?
Comment 5 Ken Walker CLA 2014-04-30 13:28:47 EDT
Updated the client repo .html files with the following commit:

http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=3ed3d7ec0701e084454fcec7cd0a6ce36e720241
Comment 6 Akihiko Takajo CLA 2014-05-05 03:17:15 EDT
Verified that orion.eclipse.org has been updated to UTF-8.
Comment 7 Anthony Hunter CLA 2014-06-16 12:18:16 EDT
(In reply to Akihiko Takajo from comment #6)
> verified to be updated to UTF-8 on orion.eclipse.org

Marking resolved then.