Bug 397865 - Search term containing - causes 400 error when self hosting
Summary: Search term containing - causes 400 error when self hosting
Status: CLOSED WONTFIX
Alias: None
Product: Orion (Archived)
Classification: ECD
Component: Server (show other bugs)
Version: 2.0   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Anthony Hunter CLA
QA Contact:
URL:
Whiteboard:
Keywords: helpwanted
: 398546 419281 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-01-10 09:48 EST by Mark Macdonald CLA
Modified: 2017-01-10 15:37 EST (History)
7 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Macdonald CLA 2013-01-10 09:48:49 EST
1. Start a self-hosting site and log in to it
2. Press Ctrl+Shift+F and search for "git-*.css"
3. You get a "Search failed" message.

The failing request looks like

> GET /filesearch?sort=NameLower%20asc&rows=100&start=0&q=NameLower:git\-*.css*+Location:/file*

It returns
> {"HttpCode":400,
>  "DetailedMessage":"Illegal character in query at index 86: http://localhost:8080/filesearch?sort=NameLower%20asc&rows=100&start=0&q=NameLower:git\\-*.css*+Location:/file*",
>  "Message":"Could not create target URI\t",
>  "Severity":"Error",
>  "Code":0}

There seems to be a superfluous backslash preceding the - in the query.
Comment 1 Mark Macdonald CLA 2013-01-18 16:52:42 EST
*** Bug 398546 has been marked as a duplicate of this bug. ***
Comment 2 Mark Macdonald CLA 2013-01-30 11:38:34 EST
This is strange because in bug 390732, I explicitly stopped HostedSiteServlet from trying to decode the query string, to prevent query-mangling of this sort. So /filesearch should receive exactly the same query when invoked through the self-hosting proxy as it does normally.
Comment 3 Mark Macdonald CLA 2013-01-30 14:14:00 EST
The problem is: the backslash in the query string is actually a raw backslash, not a %5C (as required by the spec [1]). 

My claim about HostedSiteServlet not decoding the query string was wrong, since it relies on java.net.URI when constructing the target URI to proxy to, and java.net.URI does not allow invalid URIs to be constructed. When their parser sees the backslash, it barfs and we return a 400 error. So effectively, invalid URIs cannot pass through HostedSiteServlet.

Since the use of java.net.URI is pervasive in HostedSiteServlet, rather than rip apart the servlet, I will try to fix the client-side code (fileImpl.js) such that its calls to /filesearch are correctly URI-encoded.

And obviously, there was no "superfluous backslash" as I suggested in Comment 0, that was just me misunderstanding how JSON represents a single backslash.

[1] http://www.ietf.org/rfc/rfc2396.txt
Comment 4 Mark Macdonald CLA 2013-01-30 18:30:16 EST
These are the client side changes: http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=44de1cf2c1263880881dda2bcc0d4658638488fc

The query is now URL encoded. For example:

Old request:
> /filesearch?sort=NameLower%20asc&rows=100&start=0&q=NameLower:git\-*+Location:/file*

New request:
> /filesearch?sort=NameLower%20asc&rows=100&start=0&q=NameLower%3Agit%5C-*%2BLocation%3A%2Ffile*

I also had to change SearchServlet on the server, because the code there relied on the client sending a wrongly-encoded query (i.e. was not decoding the "q" parameter). I tested the cases from bug 389844 (scoped search in projects/folders that have spaces in their names) and they seem to still work. Will revert if it turns out I've overlooked anything and busted this somehow.

Server-side changes:
http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=2e21b7273e0cfe8b5435adaf9ec3d46e7aac45e8
Comment 5 Mark Macdonald CLA 2013-01-30 19:35:57 EST
My initial server-side changes caused a bunch of test failures. I was not able to fix them all, so I reverted both parts. I give up.
Comment 6 John Arthorne CLA 2013-01-31 10:17:40 EST
I'll take a look.
Comment 7 Mark Macdonald CLA 2013-01-31 10:45:50 EST
FWIW, the additional changes I had to make to fix some tests are in this branch. 
http://git.eclipse.org/c/orion/org.eclipse.orion.server.git/commit/?h=search_400
I changed the delimiter to just a "+" because whitespace now appears inside folder names and shouldn't be interpreted as a Solr query delimiter.

The remaining failure is in testPathWithSpaces.
Comment 8 Mark Macdonald CLA 2013-05-28 09:49:49 EDT
The right thing to do, to fix this bug, is change HostedSiteServlet to stop using java.net.URI and use java.net.URLs instead, which don't perform any encoding.

This will require rewriting most of the servlet (and possibly also RemoteURLProxyServlet, a helper class).
Comment 9 John Arthorne CLA 2013-05-28 10:11:48 EDT
(In reply to comment #8)
> The right thing to do, to fix this bug, is change HostedSiteServlet to stop
> using java.net.URI and use java.net.URLs instead, which don't perform any
> encoding.
> 
> This will require rewriting most of the servlet (and possibly also
> RemoteURLProxyServlet, a helper class).

I strongly disagree with switching from URI to URL. URL tolerates both encoded and unencoded strings but it is lossy and ambiguous. If you have %20 in a URL you can't tell if it's an encoded space char or an unencoded % char followed by the digits 2 and 0. Similarly a # could indicate start of the URL fragment or it could indicate an unencoded # character. URL will seem easier but will lead to situations where you can't consistently manipulate them. There are other problems with URL, such as the fact the URL#equals method requires network access and is therefore extremely slow.

URI is completely unforgiving strict interpretation of the URI spec, which makes it a bit harder to work with, but on the other hand you always know it is correctly encoded.
Comment 10 Mark Macdonald CLA 2013-05-28 10:33:29 EDT
(In reply to comment #9)
> URI is completely unforgiving strict interpretation of the URI spec, which
> makes it a bit harder to work with, but on the other hand you always know it
> is correctly encoded.

Sigh. In that case, we're going to have to fix all the other Orion servlets to demand correctly URI-encoded parameters, and the client side to provide them.

And it means the site hosting feature will never really be a transparent proxy, since it will reject incorrectly-encoded URLs (which browsers seem to be comfortable sending, despite the specs).
Comment 11 Mark Macdonald CLA 2013-10-11 15:25:09 EDT
*** Bug 419281 has been marked as a duplicate of this bug. ***
Comment 12 libing wang CLA 2013-10-11 15:57:05 EDT
I will try to figure this out in 5.0 time frame.
Comment 14 John Arthorne CLA 2015-05-05 16:30:30 EDT
This open bug report had a target milestone in the past. The target milestone has been removed. Please target for a date in the future or leave the target blank if it is not known.
Comment 15 Michael Rennie CLA 2017-01-10 15:37:14 EST
Closing as part of a mass clean up of inactive bugs. Please reopen if this problem still occurs or is relevant to you. For more details see:

https://dev.eclipse.org/mhonarc/lists/orion-dev/msg04002.html