Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?

Hi Konrad:

The files I am viewing do contain the expected <meta HTTP-EQUIV blah>
element specifying the code page.

Another data point to consider: the only time I see corrupted characters is
when I'm viewing the help system through a proxied URL, rather than viewing
the help system directly through the port (e.g.
http://<hostname>:<port>/help/ works fine, but http://<hostname>/infocenter
shows corrupted characters for non-latin1 encodings).

This probably explains why a colleague of mine couldn't reproduce the
problem (and why he thought I was crazy, heh).

I'm running the Eclipse help system on a Linux machine proxied through
Apache 1.3.26. Just noticed that Apache 1.3.27 has been released with the
following bug fix:



<quote>


The following bugs were found in Apache 1.3.26 and have been fixed in
Apache 1.3.27:
      mod_proxy fixes:
            The cache in mod_proxy was incorrectly updating the
            Content-Length value from 304 responses when doing validation.
            Fix a problem in proxy where headers from other modules were
            added to the response headers when this was already done in the
            core already.
</quote>

I wondered whether Apache 1.3.26 was adding a charset header to the
returned document in the proxied help system, so I played with wget asking
for a Russian document (which is encoded in windows-1251). The wget output is
below; but you can clearly see that in the first case (proxied URL) the web
server is adding a "charset=iso-8859-1" header, which we don't see in the
second case (connecting directly to help system port).
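To illustrate why that header matters: browsers generally give a charset in the HTTP Content-Type header priority over the one in a META element, so a wrong header is enough to garble the page. Here is a minimal Python sketch of the effect. The sample word and variable names are my own, not taken from the help content, and "cp1251" is Python's codec name for windows-1251.

```python
# Hypothetical sample text; any Cyrillic string shows the same effect.
original = "Справка"

# The bytes as they sit in a windows-1251 encoded help file.
raw = original.encode("cp1251")

# What a client shows if it trusts a "charset=iso-8859-1" HTTP header.
wrong = raw.decode("iso-8859-1")

# What it shows when the declared charset matches the real encoding.
right = raw.decode("cp1251")

print(repr(wrong))  # accented Latin letters instead of Cyrillic
print(right)        # the original Russian text
```

The bytes on the wire are identical in both cases; only the label the client applies to them differs.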

I'll see if I can upgrade to Apache 1.3.27 to reproduce the test (but
hopefully see better test results!). If it turns out that Apache 1.3.27
solves the problem, this will probably be a useful warning to document in
the 'Installing the help system as an infocenter' topic.

wget output:

dan@daniels:~$ wget -S --header='Accept-Language: ru'
http://daniels.hostname.com/prod/infocenter/topic/com.prod.doc/core/filename.htm
--08:45:53--
http://daniels.hostname.com/prod/infocenter/topic/com.prod.doc/core/filename.htm
           => `filename.htm.1'
Resolving daniels.hostname.com... done.
Connecting to daniels.hostname.com[9.26.162.217]:80... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.1 200 OK
 2 Date: Thu, 08 May 2003 12:45:53 GMT
 3 Server: Apache Tomcat/4.0.6 (HTTP/1.1 Connector)
 4 Content-Type: text/html; charset=iso-8859-1
 5 Cache-Control: max-age=10000
 6 X-Cache: MISS from daniels.hostname.com
 7 Connection: close

    [ <=>
] 5,581          5.32M/s

08:45:53 (5.32 MB/s) - `filename.htm.1' saved [5581]

dan@daniels:~$ vim filename.htm.1
dan@daniels:~$ wget -S --header='Accept-Language: ru'
http://daniels.hostname.com:8084/help/topic/com.prod.doc/core/filename.htm
--08:46:38--
http://daniels.hostname.com:8084/help/topic/com.prod.doc/core/filename.htm
           => `filename.htm.2'
Resolving daniels.hostname.com... done.
Connecting to daniels.hostname.com[9.26.162.217]:8084... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.1 200 OK
 2 Content-Type: text/html
 3 Date: Thu, 08 May 2003 12:46:38 GMT
 4 Server: Apache Tomcat/4.0.6 (HTTP/1.1 Connector)
 5 Cache-Control: max-age=10000
 6 Connection: close

    [ <=>
] 5,581          5.32M/s

08:46:38 (5.32 MB/s) - `filename.htm.2' saved [5581]
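
The difference between the two responses above comes down to the Content-Type line. As a rough sketch, this is how the declared charset could be pulled out of each value for comparison in Python. The function name declared_charset is made up for illustration; the two header values are copied from the wget output.

```python
from email.message import Message

def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset()

proxied = "text/html; charset=iso-8859-1"  # header 4 of the proxied response
direct = "text/html"                       # header 2 of the direct response

print(declared_charset(proxied))
print(declared_charset(direct))
```

With no charset in the header, as in the direct case, the browser is left to use the META element inside the document.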


Dan

--
Dan Scott



Konrad Kolosowski/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
07/05/2003 05:45 PM
Please respond to platform-help-dev

To:       platform-help-dev@xxxxxxxxxxx
cc:
Subject:  Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?



Hi Dan.

There is no assumption about which encoding documents come in.  I think
your problem might be that some documents do not specify their encoding
correctly, and the browser has to resort to auto-detection.  For example,
the Eclipse translations for Chinese have this in the head:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=big5">
A particular browser's auto-detection may look at the containing frameset
document to guess.
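
One way to check whether a given help document carries such a declaration is to scan it for the META element. Below is a minimal Python sketch using only the standard library. The class and function names are hypothetical, and it assumes the markup has already been read into a string (the declaration itself is plain ASCII, so any ASCII-compatible decoding is enough for this scan).

```python
from email.message import Message
from html.parser import HTMLParser

class MetaCharsetFinder(HTMLParser):
    """Record the charset from the first META HTTP-EQUIV Content-Type tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        content = attrs.get("content")
        if (attrs.get("http-equiv") or "").lower() != "content-type" or not content:
            return
        msg = Message()
        msg["Content-Type"] = content
        self.charset = msg.get_content_charset()

def meta_charset(html_text):
    """Return the declared charset, or None if the document declares none."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset

sample = ('<html><head><META HTTP-EQUIV="Content-Type" '
          'CONTENT="text/html; charset=big5"></head></html>')
print(meta_charset(sample))
```

A document for which this returns None is one where the browser would have to guess.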

If the charset is specified as above and you still see the problem, open a
bug against help and we will investigate it.

Konrad Kolosowski
Eclipse Help System




Dan Scott/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
05/07/2003 05:13 PM
Please respond to platform-help-dev

To:       platform-help-dev@xxxxxxxxxxx
cc:
Subject:  [platform-help-dev] Is UTF-8 encoding assumed for all languages?






Hi:

I'm experiencing some strangeness with NL content in the help system. I
have Russian documents (navigation and help files) encoded in the
windows-1251 code page that sometimes display as gibberish.

It looks to me like the frameset document (index.jsp) encoding of UTF-8 is
interfering with the browser's interpretation of the encodings in the
individual frames.

This problem occurs in both Mozilla 1.3.1 and Internet Explorer 6. Is this
a known limitation of the help system or of browsers?

I suppose a workaround would be to convert all of our help content to UTF-8
before generating the doc plugins... yikes.

Dan
--
Dan Scott

_______________________________________________
platform-help-dev mailing list
platform-help-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/platform-help-dev


