Re: [platform-help-dev] Re: query criteria encoding (was: Lucene analyzers for double-byte languages?)

Erik,

If you do not need to support Internet Explorer 5.0, then changing
occurrences of escape() to encodeURIComponent() (and changing the parameter
name) makes sense.  There are, however, more occurrences of escape() that
can take an NL string as a parameter in the org.eclipse.help.webapp project,
for example doAdvancedSearch() in advanced.jsp and the code that sets the
home page title in toc.jsp.

I think your solution will work, but I am not sure.  You should test not only
with double-byte characters (Japanese), but also with single-byte non-ASCII
characters (German).  Verify both the simple search and the search from the
advanced dialog.
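
A minimal sketch of such a check, runnable in any JavaScript engine that still provides the legacy escape() function. The sample strings and the helper name compareEncodings are assumptions chosen for illustration; they do not come from the help content or the webapp source.

```javascript
// Illustrative sample inputs (assumptions): one double-byte Japanese string
// and one single-byte non-ASCII German string.
var samples = {
  japanese: "\u30A2\u30AF",   // two katakana characters
  german: "M\u00FCller"       // contains u-umlaut (U+00FC)
};

// Encode one value with both functions so the outputs can be compared.
function compareEncodings(value) {
  return {
    legacy: escape(value),             // %XX for Latin-1, %uXXXX above U+00FF
    utf8: encodeURIComponent(value)    // %XX sequences over the UTF-8 bytes
  };
}

var ja = compareEncodings(samples.japanese);
var de = compareEncodings(samples.german);
console.log(ja.legacy, ja.utf8);
console.log(de.legacy, de.utf8);
```

The German case matters because escape() emits a single Latin-1 byte for it, while encodeURIComponent() emits two UTF-8 bytes, so a server-side decoder that assumes one form will mis-decode the other even though neither contains a %u sequence.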

Konrad Kolosowski
Eclipse Help System



Erik Hennum/Oakland/IBM@IBMUS
Sent by:  platform-help-dev-admin@eclipse.org
07/22/2003 06:27 PM
Please respond to platform-help-dev

To:       platform-help-dev@xxxxxxxxxxx
cc:
Subject:  [platform-help-dev] Re: query criteria encoding (was: Lucene analyzers for double-byte languages?)

Hi, Konrad:

For what it's worth, I've confirmed that the problem with the encoding of
double-byte search criteria also occurs with Internet Explorer and Eclipse
2.0.2 on WebSphere Application Server Express V5.0 (as on Tomcat 4.1.*).

Here's the top of the error log:

java.net.MalformedURLException: unknown protocol: search
      at java.net.URL.<init>(URL.java(Compiled Code))
      at java.net.URL.<init>(URL.java(Inlined Compiled Code))
      at java.net.URL.<init>(URL.java(Compiled Code))
      at org.eclipse.help.internal.HelpApplication.openConnection(HelpApplication.java:33)
      at org.eclipse.help.internal.HelpApplication.run(HelpApplication.java:163)
      at java.lang.reflect.Method.invoke(Native Method)
      at org.eclipse.help.servlet.Eclipse.openConnection(Eclipse.java:59)
      at org.eclipse.help.servlet.EclipseConnector.openConnection(EclipseConnector.java:148)
      at org.eclipse.help.servlet.EclipseConnector.openStream(EclipseConnector.java:39)
      at org.eclipse.help.servlet.ContentUtil.loadXML(ContentUtil.java:120)
      at org.eclipse.help.servlet.ContentUtil.loadSearchResults(ContentUtil.java:83)
      at org.apache.jsp._search_5F_results._jspService(_search_5F_results.java:104)
      at com.ibm.ws.webcontainer.jsp.runtime.HttpJspBase.service(HttpJspBase.java:89)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java(Compiled Code))

When I changed search.jsp to use encodeURIComponent() instead of escape()
and to hardcode the parameter name to suppress special decoding on the
Eclipse backend, the double-byte query  worked.  The modified lines:

//    advancedDialog = window.open("advanced.jsp?<%=searchWordParName%>="+escape(document.getElementById("searchWord").value), "advancedDialog", "resizeable=no,height="+h+",width="+w );
      advancedDialog = window.open("advanced.jsp?searchWord="+encodeURIComponent(document.getElementById("searchWord").value), "advancedDialog", "resizeable=no,height="+h+",width="+w );
...
//    parent.doSearch("<%=searchWordParName%>="+escape(searchWord)+"&maxHits="+maxHits);
      parent.doSearch("searchWord="+encodeURIComponent(searchWord)+"&maxHits="+maxHits);

If I needed to fix the query encoding problem with the least amount of
change (and only needed to support Internet Explorer 6 and Netscape 6), would
this approach be appropriate for Eclipse 2.0.2?
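
One way to keep such a change small is to build the query string in a single place. The sketch below is only an illustration: the function name buildSearchQuery is hypothetical and does not exist in search.jsp, while the parameter names searchWord and maxHits are the ones used in the modified lines above.

```javascript
// Hypothetical helper (not part of search.jsp): builds the query string
// that would be handed to parent.doSearch().
function buildSearchQuery(searchWord, maxHits) {
  // encodeURIComponent() percent-encodes the UTF-8 bytes of the value and
  // also escapes reserved characters such as "&" and "=", so the search
  // term cannot be mistaken for an additional parameter.
  return "searchWord=" + encodeURIComponent(searchWord) +
         "&maxHits=" + maxHits;
}

console.log(buildSearchQuery("\u30A2\u30AF", 500));
console.log(buildSearchQuery("a&b", 500));
```

With a helper like this, the simple search and the advanced dialog could share the same encoding rule instead of each calling escape() or encodeURIComponent() on its own.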


Thanks,


Erik Hennum
ehennum AT us.ibm.com




                      "Konrad Kolosowski"

                      <konradk@xxxxxxxxxx>
                      Sent by:  platform-help-dev-admin@eclipse.org
                      07/17/2003 08:25 PM
                      Please respond to platform-help-dev

                      To:       platform-help-dev@xxxxxxxxxxx
                      cc:
                      Subject:  Re: [platform-help-dev] Lucene analyzers for double-byte languages?

Erik,

Thanks for the suggestions.  For 3.0, we have already changed to use the
encodeURIComponent() JavaScript 1.5 function.  Since it does not exist in IE
earlier than 5.5, older IE browsers will now be redirected to a basic
implementation of the help UI (the same as presented to Netscape 4 users in
Eclipse >= 2.1).  That basic UI, without any DHTML, does not use JavaScript
and simply submits a form that is correctly encoded by the browser.  Hence
the problem is gone for 3.0, and we could eliminate all methods for decoding
parameters from the URLUtil class.  Had this work not been done already, we
would have been forced to do it, since in the 3.0 stream we have upgraded
Tomcat to 4.1 (4.1.24).  It turns out we were initially too ambitious in
creating fancy UIs for browsers that do not have the necessary support in
place.
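
The message does not say how the redirect to the basic UI is decided. Purely as an illustration of the idea, the sketch below uses capability detection on the scripting environment; the function name chooseHelpUI, the labels "advanced" and "basic", and the simulated environments are all assumptions, not the 3.0 implementation.

```javascript
// Hypothetical sketch: pick a help UI from the capabilities of the
// scripting environment instead of from the browser name.
function chooseHelpUI(env) {
  return typeof env.encodeURIComponent === "function" ? "advanced" : "basic";
}

// Simulated environments (assumptions for illustration only).
var modernBrowser = { encodeURIComponent: encodeURIComponent };
var oldBrowser = {};   // stands in for an engine without JavaScript 1.5 support

console.log(chooseHelpUI(modernBrowser));
console.log(chooseHelpUI(oldBrowser));
```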

Eclipse 2.1.x uses Tomcat 4.0.y, and no additional application server is
required to set up an infocenter, so no problem is visible there.

Hence it looks like we only need to find out which application servers
work correctly with infocenter 2.0.2.

Konrad Kolosowski
Eclipse Help System




Erik Hennum/Oakland/IBM@IBMUS
Sent by:  platform-help-dev-admin@eclipse.org
07/17/2003 07:21 PM
Please respond to platform-help-dev

To:       platform-help-dev@xxxxxxxxxxx
cc:       platform-help-dev@xxxxxxxxxxx, platform-help-dev-admin@xxxxxxxxxxx
Subject:  Re: [platform-help-dev] Lucene analyzers for double-byte languages?

Hi, Konrad:

Right as usual!

The search works fine in Netscape 7.1 with the following URL for the search
results:

http://localhost:8080/help_2_0_2/search_results.jsp?searchWord=%E3%82%A2%E3%82%AF%E3%82%BB%E3%82%B7%E3%83%93%E3%83%AA%E3%83%86%E3%82%A3%E3%83%BC&maxHits=500




The search succeeds in Internet Explorer 5.5 on Tomcat 4.0.6 with the
following URL for the search results:

http://localhost:8080/help_2_0_2/search_results.jsp?searchWordJS13=%u30A2%u30AF%u30BB%u30B7%u30D3%u30EA%u30C6%u30A3%u30FC&maxHits=500

The search fails in Internet Explorer 5.5 on Tomcat 4.1.10 with the same
URL for the search results.
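
The two URL forms above are consistent with the same nine-character search term being passed through encodeURIComponent() in one case and through escape() in the other. The sketch below reproduces both query strings; the variable names are illustrative, and the term is written with \u escapes recovered by decoding the URLs.

```javascript
// The search term, recovered by decoding the URLs quoted above.
var term = "\u30A2\u30AF\u30BB\u30B7\u30D3\u30EA\u30C6\u30A3\u30FC";

var base = "http://localhost:8080/help_2_0_2/search_results.jsp?";

// UTF-8 percent-encoding, matching the URL seen with Netscape 7.1.
var utf8Url = base + "searchWord=" + encodeURIComponent(term) + "&maxHits=500";

// Legacy escape() with %uXXXX sequences, matching the URL seen with IE 5.5.
var legacyUrl = base + "searchWordJS13=" + escape(term) + "&maxHits=500";

console.log(utf8Url);
console.log(legacyUrl);
```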

The "java.io.CharConversionException: isHexDigit" Tomcat exception seems to
be triggered in Tomcat 4.1.10 by the InitServlet dispatch when the Internet
Explorer 5.5 searchWordJS13 parameter includes double-byte characters, that
is, by the following line:

application.getRequestDispatcher("/servlet/org.eclipse.help.servlet.InitServlet").include(request,response);




For what it's worth, the encodeURI() JavaScript function doesn't seem to
trigger the Tomcat error (though URLUtil isn't set up to decode it).
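
A simplified model of why one form is rejected and the other is not: a strict percent-decoder expects exactly two hexadecimal digits after every "%", which %uXXXX violates and encodeURI() output satisfies. The code below is an assumption made for illustration, not Tomcat's actual UDecoder logic, and the function names are hypothetical.

```javascript
// Simplified model of a strict percent-decoder check (an assumption for
// illustration, not Tomcat's code): every "%" must be followed by exactly
// two hexadecimal digits.
function isHexDigit(ch) {
  return /^[0-9A-Fa-f]$/.test(ch);
}

function checkPercentEncoding(value) {
  for (var i = 0; i < value.length; i++) {
    if (value.charAt(i) !== "%") continue;
    var hi = value.charAt(i + 1);
    var lo = value.charAt(i + 2);
    if (!isHexDigit(hi) || !isHexDigit(lo)) {
      throw new Error("isHexDigit: bad escape at index " + i);
    }
    i += 2;   // skip the two digits that were just validated
  }
  return true;
}

// Returns true when the strict check accepts the value, false otherwise.
function isAccepted(value) {
  try {
    return checkPercentEncoding(value);
  } catch (e) {
    return false;
  }
}

var sampleTerm = "\u30A2\u30AF";
console.log(isAccepted(escape(sampleTerm)));      // %u30A2%u30AF
console.log(isAccepted(encodeURI(sampleTerm)));   // %E3%82%A2%E3%82%AF
```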

Let me see if WebSphere Application Server has the same problem or whether
fixing the encoding of our source documents was enough to fix the initial
problem that we discovered there.  Tomcat 4.1 isn't a release platform for
us, so if the problem is isolated to it, we don't need to fix it.


Thanks for the suggestions,


Erik Hennum
ehennum AT us.ibm.com




                      "Konrad Kolosowski"

                      <konradk@xxxxxxxxxx>
                      Sent by:  platform-help-dev-admin@eclipse.org
                      07/17/2003 02:50 PM
                      Please respond to platform-help-dev

                      To:       platform-help-dev@xxxxxxxxxxx
                      cc:
                      Subject:  Re: [platform-help-dev] Lucene analyzers for double-byte languages?

Erik,

In Eclipse < 3.0, search requests are encoded using the JavaScript escape()
function to support older browsers.  What is worse, escape() produces
different results on different browsers.  On IE and Netscape it does not
support encoding all characters and results in a non-standard encoding.  Try
searching help from Mozilla (which encodes correctly); I think the search
should work.  You can use a TCP/IP monitor to record the URLs that the
different browsers are sending to the infocenter.

Since some browsers use a non-standard encoding, help does not call the
server API to obtain URL parameters, but contains custom code for parsing
the requests.  No problem with that was observed when running the internal
Tomcat or running the infocenter on Tomcat 4.0.x.  You are running the
infocenter on Tomcat 4.1, with the Coyote connector, which might parse the
request when not asked to do so, and fail.  Try setting up the infocenter on
Tomcat 4.0.x and see if it eliminates the exception for Japanese searches.
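
To illustrate what custom parsing of this kind involves, the sketch below reads a parameter from the raw query string and decodes escape()-style values by hand. It is a hypothetical model written in JavaScript for brevity; the function names are assumptions, and it is not the actual help system code.

```javascript
// Hypothetical sketch of custom query parsing (not the help system's code).

// Decode escape()-style text: %uXXXX is a UTF-16 code unit and %XX is a
// Latin-1 code point.
function decodeLegacyEscape(value) {
  return value.replace(/%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})/g,
    function (match, unicodeHex, byteHex) {
      return String.fromCharCode(parseInt(unicodeHex || byteHex, 16));
    });
}

// Return the still-encoded value of one parameter from the raw query
// string, or null when the parameter is absent.
function getRawParameter(queryString, name) {
  var pairs = queryString.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    if (eq !== -1 && pairs[i].substring(0, eq) === name) {
      return pairs[i].substring(eq + 1);
    }
  }
  return null;
}

var rawQuery = "searchWordJS13=%u30A2%u30AF&maxHits=500";
var decoded = decodeLegacyEscape(getRawParameter(rawQuery, "searchWordJS13"));
console.log(decoded === "\u30A2\u30AF");
```

Parsing like this only helps if nothing else decodes the query string first, which is why a container that parses parameters on its own can fail before the custom code is reached.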

Konrad Kolosowski
Eclipse Help System



Dorian Birsan/Toronto/IBM@IBMCA
Sent by:  platform-help-dev-admin@eclipse.org
07/17/2003 03:48 PM
Please respond to platform-help-dev

To:       platform-help-dev@xxxxxxxxxxx
cc:
Subject:  Re: [platform-help-dev] Lucene analyzers for double-byte languages?

Erik,

You may be hitting a number of bugs fixed in 2.1, this one being the most
likely candidate:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=25935.
Basically, it is about the machine locale having to match the document
locale, unless UTF-8.

Also https://bugs.eclipse.org/bugs/show_bug.cgi?id=30138 could affect the
results you see.

-Dorian



Erik Hennum/Oakland/IBM@IBMUS
Sent by:  platform-help-dev-admin@eclipse.org
07/17/2003 02:55 PM
Please respond to platform-help-dev

To:       platform-help-dev@xxxxxxxxxxx
cc:
Subject:  Re: [platform-help-dev] Lucene analyzers for double-byte languages?

Hi, Dorian:

Part of the problem seems to have been in the encoding, which was
ISO-8859-1, so all of the Japanese characters were represented as text
entities (suboptimal, to put it mildly).  I understand that text entities
aren't indexed.

I tried the Shift-JIS encoding, but the Tomcat console reported parse
errors during indexing.

So, I switched to UTF-8 encoding, which had some benefit.  Searching for a
Japanese string now creates the fulltext search index without errors.
However, searching on a Japanese string still doesn't match anything, while
searching on an embedded English string does.

Regarding declaring the locale, I've been testing as follows:

*  The order of locale preferences in my browser is:  ja, en-us, en.

*  The only documents are localized Japanese documents in
...\eclipse\plugins\our.plugin.doc\nl\ja
  (I removed the English default documents from the plugin on my latest
tests to eliminate any potential for matching the wrong language.)

*  The localized Japanese pages for this plugin are displaying correctly
when I navigate within InfoCenter Eclipse 2.0.2

*  If I search on an English string embedded in the Japanese documents (and
now on a Japanese string, too), the search indexes are created in

   ...\jakarta-tomcat-4.1.10\work\Standalone\localhost\help\.metadata\.plugins\org.eclipse.help\nl\ja

Doesn't this imply that Eclipse is receiving the correct locale?  If not,
what else needs to be done to select the correct locale?

By the way, I was mistaken in reporting that
"java.io.CharConversionException: isHexDigit" is thrown by the search.
Watching more carefully, I see that it was thrown earlier when I select the
"book" for the localized plugin in the table of contents.  Despite the
exception, the table of contents for the plugin displays correctly.

Because we'd like the user to be able to select the locale via the browser
preference, I don't think we would want to hard-code the locale in a proxy
web application.

Do you have any suggestions of other things to try?


Thanks in advance,


Erik Hennum
ehennum AT us.ibm.com




                     "Dorian Birsan"

                     <birsan@xxxxxxxxxx>
                     Sent by:  platform-help-dev-admin@eclipse.org
                     07/17/2003 05:34 AM
                     Please respond to platform-help-dev

                     To:       platform-help-dev@xxxxxxxxxxx
                     cc:
                     Subject:  Re: [platform-help-dev] Lucene analyzers for double-byte languages?

Erik, other groups have successfully tested the 2.0.2 infocenter on many
languages, including those you mentioned, so something must be different
in your setup.
Since things work fine in the stand-alone, it appears that the locale
passed to the infocenter is not correct. Unlike the stand-alone, the
infocenter picks up the locale from the request, not from the host machine.
So you must ensure your browser sends the appropriate locale.
An alternative to detecting the browser locale is to proxy the infocenter
by another webapp, and have a dispatcher servlet that wraps the incoming
request, changes the locale to a desired locale and then delegates to the
real infocenter.
There is a fix in 3.0 for some locale-related issues, but in your case
the problem is likely caused by your browser not sending the expected
locale.
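
For reference, a browser's language preferences normally reach the server in the Accept-Language request header. The sketch below orders the entries of such a header by q-value, as a rough model of how a preferred locale could be derived from a request. The function name and the sample header are assumptions, and this is not the infocenter's actual code.

```javascript
// Hypothetical sketch: order the languages of an Accept-Language header
// by their q-values, highest first.
function parseAcceptLanguage(header) {
  var entries = header.split(",").map(function (part, index) {
    var pieces = part.trim().split(";");
    var q = 1.0;   // entries without an explicit q-value default to 1
    for (var i = 1; i < pieces.length; i++) {
      var m = /^\s*q=([0-9.]+)\s*$/.exec(pieces[i]);
      if (m) q = parseFloat(m[1]);
    }
    return { tag: pieces[0].trim().toLowerCase(), q: q, index: index };
  });
  entries.sort(function (a, b) {
    // Fall back to header order when two q-values are equal.
    return b.q - a.q || a.index - b.index;
  });
  return entries.map(function (e) { return e.tag; });
}

// Sample header (an assumption) for a browser that prefers Japanese.
var preferred = parseAcceptLanguage("ja,en-us;q=0.7,en;q=0.3");
console.log(preferred[0]);
```

Recording the actual header sent by each browser is a quick way to confirm whether the expected locale is really the first preference.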

-Dorian



Erik Hennum/Oakland/IBM@IBMUS
Sent by:  platform-help-dev-admin@eclipse.org
07/16/2003 10:19 PM
Please respond to platform-help-dev

To:       platform-help-dev@xxxxxxxxxxx
cc:
Subject:  Re: [platform-help-dev] Lucene analyzers for double-byte languages?

Hi, Dorian:

That's reassuring!

And, in fact, when I try the search using the standalone version of Eclipse
2.0.2, I can select a Japanese string from a topic, copy it into the search
field, run the search, and return a list of search results.  When I
select a search result, the matched Japanese string is correctly
highlighted in the displayed topic.

However, if I do the same thing in the InfoCenter version of Eclipse 2.0.2,
the query string doesn't seem to get to the backend Eclipse web
application. The status bar shows a correct-looking URL for a time, but
nothing comes back to change the prompt in the search results frame.  The
Tomcat console reports the error "java.io.CharConversionException:
isHexDigit" during conversion of parameters (I've appended the full
exception).

The Java topics do contain some untranslated English strings.  If I query
on an English string in the InfoCenter version, the search succeeds,
listing the Japanese topics in the search results frame and highlighting
the matched string in the displayed topic.

Could there be an issue with encoding, transmitting, and decoding the query
string via the web server / servlet container?

I noticed that escape is deprecated in ECMAScript v3 (equivalent to
Netscape 6 JavaScript 1.5 or IE 5.5 JScript 5.5) and so tried the
encodeURIComponent() JavaScript function in search.jsp but got the same
result.

I'm trying to confirm that InfoCenter Eclipse 2.0.2 on WebSphere
Application Server is also failing to pass the query string through to the
Eclipse web application.  Our tester reports the following message:

  There was an error in your action:
  Java.lang.IllegalArgumentException

Any suggestions on how we might fix this problem?  We're quite late in a
release cycle.


Thanks,


Erik Hennum
ehennum AT us.ibm.com


java.io.CharConversionException: isHexDigit
      at org.apache.tomcat.util.buf.UDecoder.convert(UDecoder.java:124)
      at org.apache.tomcat.util.buf.UDecoder.convert(UDecoder.java:87)
      at org.apache.tomcat.util.http.Parameters.processParameters(Parameters.java:408)
      at org.apache.tomcat.util.http.Parameters.processParameters(Parameters.java:495)
      at org.apache.tomcat.util.http.Parameters.handleQueryParameters(Parameters.java:278)
      at org.apache.coyote.tomcat4.CoyoteRequest.parseRequestParameters(CoyoteRequest.java:1920)
      at org.apache.coyote.tomcat4.CoyoteRequest.getParameterNames(CoyoteRequest.java:942)
      at org.apache.coyote.tomcat4.CoyoteRequest.getParameterMap(CoyoteRequest.java:922)
      at org.apache.coyote.tomcat4.CoyoteRequestFacade.getParameterMap(CoyoteRequestFacade.java:193)
      at org.apache.catalina.core.ApplicationHttpRequest.setRequest(ApplicationHttpRequest.java:525)
      at org.apache.catalina.core.ApplicationHttpRequest.<init>(ApplicationHttpRequest.java:125)
      at org.apache.catalina.core.ApplicationDispatcher.wrapRequest(ApplicationDispatcher.java:921)
      at org.apache.catalina.core.ApplicationDispatcher.doInclude(ApplicationDispatcher.java:547)
      at org.apache.catalina.core.ApplicationDispatcher.include(ApplicationDispatcher.java:498)
      at org.apache.jsp.search_results_jsp._jspService(search_results_jsp.java:48)
      at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:136)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
      at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:202)
      at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:289)
      at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:240)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)
      at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
      at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
      at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
      at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
      at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
      at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
      at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2397)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
      at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
      at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:170)
      at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:171)
      at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
      at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
      at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
      at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
      at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
      at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
      at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:223)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:405)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:380)
      at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:508)
      at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:533)
      at java.lang.Thread.run(Thread.java:513)




                    "Dorian Birsan"

                    <birsan@xxxxxxxxxx>
                    Sent by:  platform-help-dev-admin@eclipse.org
                    07/16/2003 10:25 AM
                    Please respond to platform-help-dev

                    To:       platform-help-dev@xxxxxxxxxxx
                    cc:
                    Subject:  Re: [platform-help-dev] Lucene analyzers for double-byte languages?

Erik,

Search should work in the languages you listed, as Eclipse provides a
default analyzer.
The English and German analyzers are a bit smarter, as they deal with
stemming, stop words, etc.
You could certainly pick up third-party analyzers and contribute them as
plugins in your product (that's the intention of the analyzer extension
point).

-Dorian



Erik Hennum/Oakland/IBM@IBMUS
Sent by:  platform-help-dev-admin@eclipse.org
07/16/2003 12:51 PM
Please respond to platform-help-dev

To:       platform-help-dev@xxxxxxxxxxx
cc:
Subject:  [platform-help-dev] Lucene analyzers for double-byte languages?

Help Developers:

With regard to

 http://dev.eclipse.org/mhonarc/lists/platform-help-dev/msg00082.html

has there been any success in creating Lucene analyzers for Japanese,
Traditional Chinese, Simplified Chinese, and Korean?

Our need is to enable search in these languages on Eclipse 2.0.2

If there aren't any analyzers, would we need to hack the JSPs to disable or
hide the search UI?  (Merely to confirm.)

I did notice in looking at the Lucene site that an analyzer is available
for Simplified Chinese:

 http://marc.theaimsgroup.com/?l=lucene-dev&m=100705753831746&q=p3

I presume that would need a wrapper to plug into the
org.eclipse.help.luceneAnalyzer extension point.


Thanks,


Erik Hennum
ehennum AT us.ibm.com


_______________________________________________
platform-help-dev mailing list
platform-help-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/platform-help-dev


