Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
Hi Konrad:
The files I am viewing do contain the expected <META HTTP-EQUIV="Content-Type" ...>
element specifying the code page.
Another data point to consider: I only see corrupted characters when I view
the help system through a proxied URL rather than connecting directly to the
help system's port (e.g. http://<hostname>:<port>/help/ works fine, but
http://<hostname>/infocenter shows corrupted characters for non-Latin-1
encodings).
This probably explains why a colleague of mine couldn't reproduce the
problem (and why he thought I was crazy, heh).
I'm running the Eclipse help system on a Linux machine, proxied through
Apache 1.3.26. I just noticed that Apache 1.3.27 has been released with the
following bug fixes:
<quote>
The following bugs were found in Apache 1.3.26 and have been fixed in
Apache 1.3.27:
mod_proxy fixes:
The cache in mod_proxy was incorrectly updating the
Content-Length value from 304 responses when doing validation.
Fix a problem in proxy where headers from other modules were
added to the response headers when this was already done in the
core already.
</quote>
I wondered whether Apache 1.3.26 was adding a charset parameter to the
Content-Type header of documents returned through the proxy, so I used wget
to request a Russian document (which is encoded in windows-1251). The wget
output is below: in the first case (proxied URL) the web server adds a
"charset=iso-8859-1" parameter, which does not appear in the second case
(connecting directly to the help system port).
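To make the comparison concrete, here is a minimal Python sketch that extracts the charset parameter from the two Content-Type values in the wget output below. It only parses header strings with the standard library's email.message.Message; it does not fetch anything, and header_charset is a hypothetical helper name.

```python
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None when absent
    return msg.get_content_charset()


# Content-Type values copied from the two wget responses
proxied = "text/html; charset=iso-8859-1"
direct = "text/html"

print(header_charset(proxied))  # iso-8859-1
print(header_charset(direct))   # None
```

Only the proxied response carries an explicit charset; the direct response leaves the browser free to use the document's own META declaration.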
I'll see if I can upgrade to Apache 1.3.27 and repeat the test (hopefully
with better results!). If Apache 1.3.27 turns out to solve the problem, this
will probably be a useful warning to document in the 'Installing the help
system as an infocenter' topic.
wget output:
dan@daniels:~$ wget -S --header='Accept-Language: ru' http://daniels.hostname.com/prod/infocenter/topic/com.prod.doc/core/filename.htm
--08:45:53--  http://daniels.hostname.com/prod/infocenter/topic/com.prod.doc/core/filename.htm
           => `filename.htm.1'
Resolving daniels.hostname.com... done.
Connecting to daniels.hostname.com[9.26.162.217]:80... connected.
HTTP request sent, awaiting response...
1 HTTP/1.1 200 OK
2 Date: Thu, 08 May 2003 12:45:53 GMT
3 Server: Apache Tomcat/4.0.6 (HTTP/1.1 Connector)
4 Content-Type: text/html; charset=iso-8859-1
5 Cache-Control: max-age=10000
6 X-Cache: MISS from daniels.hostname.com
7 Connection: close
[ <=> ] 5,581 5.32M/s
08:45:53 (5.32 MB/s) - `filename.htm.1' saved [5581]
dan@daniels:~$ vim filename.htm.1
dan@daniels:~$ wget -S --header='Accept-Language: ru' http://daniels.hostname.com:8084/help/topic/com.prod.doc/core/filename.htm
--08:46:38--  http://daniels.hostname.com:8084/help/topic/com.prod.doc/core/filename.htm
           => `filename.htm.2'
Resolving daniels.hostname.com... done.
Connecting to daniels.hostname.com[9.26.162.217]:8084... connected.
HTTP request sent, awaiting response...
1 HTTP/1.1 200 OK
2 Content-Type: text/html
3 Date: Thu, 08 May 2003 12:46:38 GMT
4 Server: Apache Tomcat/4.0.6 (HTTP/1.1 Connector)
5 Cache-Control: max-age=10000
6 Connection: close
[ <=> ] 5,581 5.32M/s
08:46:38 (5.32 MB/s) - `filename.htm.2' saved [5581]
Dan
--
Dan Scott
Konrad Kolosowski/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
07/05/2003 05:45 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
Hi Dan.
The help system makes no assumption about the encoding documents come in. I
think your problem might be that some documents do not specify their
encoding, so the browser has to resort to auto-detection. (Eclipse
translations specify it in the document head; for Chinese, for example:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=big5">.) A
particular browser's auto-detection may look at the containing frameset
document to guess the encoding.
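The following minimal Python sketch illustrates how a browser settles on an encoding. HTML 4.01 gives an HTTP "charset" parameter priority over a META declaration, so a charset added by a server or proxy overrides the document's own META element; with neither, the browser auto-detects. The helpers meta_charset and effective_charset and the simplified regex are assumptions for illustration, not a real HTML parser and not the help system's code.

```python
import re
from email.message import Message

# Simplified pattern for <META HTTP-EQUIV="Content-Type" CONTENT="...; charset=...">.
# An assumption for illustration, not a full HTML parser.
META_CHARSET = re.compile(rb"<meta[^>]+charset=([A-Za-z0-9_\-]+)", re.IGNORECASE)


def meta_charset(html_bytes):
    """Return the charset declared in a META element, lower-cased, or None."""
    m = META_CHARSET.search(html_bytes)
    return m.group(1).decode("ascii").lower() if m else None


def effective_charset(content_type, html_bytes):
    """HTTP charset parameter first, then META; None means auto-detection."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or meta_charset(html_bytes)


# A tiny Russian page encoded in windows-1251 that declares its own charset
page = ('<html><head><META HTTP-EQUIV="Content-Type" '
        'CONTENT="text/html; charset=windows-1251"></head>'
        "<body>Привет</body></html>").encode("cp1251")

for ct in ("text/html", "text/html; charset=iso-8859-1"):
    enc = effective_charset(ct, page)
    body = page.decode(enc)
    print(enc, "Привет" in body)
# windows-1251 True
# iso-8859-1 False
```

With the bare "text/html" header the META declaration wins and the text decodes correctly; once "charset=iso-8859-1" is present, the same bytes decode to Latin-1 gibberish ("Ïðèâåò").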
If the charset is specified as above and you still see the problem, open a
bug against help and we will investigate it.
Konrad Kolosowski
Eclipse Help System
Dan Scott/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
05/07/2003 05:13 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
Hi:
I'm experiencing some strangeness with national-language (NL) content in the
help system. I have Russian documents (navigation and help files) encoded in
the windows-1251 code page that sometimes display as gibberish.
It looks to me like the UTF-8 encoding of the frameset document (index.jsp)
is interfering with the browser's interpretation of the encodings of the
individual frames.
This problem occurs in both Mozilla 1.3.1 and Internet Explorer 6. Is this
a known limitation of the help system or of browsers?
I suppose a workaround would be to convert all of our help content to UTF-8
before generating the doc plugins... yikes.
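The conversion workaround could look roughly like this minimal Python sketch, assuming the help sources are HTML files in windows-1251 that carry a META charset declaration. recode_html and convert_tree are hypothetical helpers, the regex is a simplification, and navigation XML files would also need their own encoding declaration updated, which this sketch does not do.

```python
import re
from pathlib import Path

# Simplified pattern for the charset value in a META Content-Type declaration
# (an assumption, not a full HTML parser).
META_CHARSET = re.compile(r"(<meta[^>]+charset=)[A-Za-z0-9_\-]+", re.IGNORECASE)


def recode_html(data, source_encoding="windows-1251"):
    """Re-encode HTML bytes as UTF-8 and rewrite the first META charset to match."""
    # Raises UnicodeDecodeError if the bytes don't fit the source encoding
    text = data.decode(source_encoding)
    text = META_CHARSET.sub(r"\g<1>utf-8", text, count=1)
    return text.encode("utf-8")


def convert_tree(root, source_encoding="windows-1251"):
    """Convert every .htm/.html file under root in place (hypothetical helper)."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in (".htm", ".html"):
            path.write_bytes(recode_html(path.read_bytes(), source_encoding))
```

Decoding strictly means a file that is not really windows-1251 fails loudly instead of being silently mangled; run it on a copy of the doc plug-in sources before generating the plug-ins.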
Dan
--
Dan Scott
_______________________________________________
platform-help-dev mailing list
platform-help-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/platform-help-dev