Community
Participate
Working Groups
1. Start a self-hosting site and log in to it 2. Press Ctrl+Shift+F and search for "git-*.css" 3. You get a "Search failed" message. The failing request looks like > GET /filesearch?sort=NameLower%20asc&rows=100&start=0&q=NameLower:git\-*.css*+Location:/file* It returns > {"HttpCode":400, > "DetailedMessage":"Illegal character in query at index 86: http://localhost:8080/filesearch?sort=NameLower%20asc&rows=100&start=0&q=NameLower:git\\-*.css*+Location:/file*", > "Message":"Could not create target URI\t", > "Severity":"Error", > "Code":0} There seems to be a superfluous backslash preceding the - in the query.
*** Bug 398546 has been marked as a duplicate of this bug. ***
This is strange because in bug 390732, I explicitly stopped HostedSiteServlet from trying to decode the query string, to prevent query-mangling of this sort. So /filesearch should receive exactly the same query when invoked through the self-hosting proxy as it does normally.
The problem is: the backslash in the query string is actually a raw backslash, not a %5C (as required by the spec [1]). My claim about HostedSiteServlet not decoding the query string was wrong, since it relies on java.net.URI when constructing the target URI to proxy to, and java.net.URI does not allow invalid URIs to be constructed. When their parser sees the backslash, it barfs and we return a 400 error. So effectively, invalid URIs cannot pass through HostedSiteServlet. Since the use of java.net.URI is pervasive in HostedSiteServlet, rather than rip apart the servlet, I will try to fix the client-side code (fileImpl.js) such that its calls to /filesearch are correctly URI-encoded. And obviously, there was no "superfluous backslash" as I suggested in Comment 0, that was just me misunderstanding how JSON represents a single backslash. [1] http://www.ietf.org/rfc/rfc2396.txt
These are the client side changes: http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=44de1cf2c1263880881dda2bcc0d4658638488fc The query is now URL encoded. For example: Old request: > /filesearch?sort=NameLower%20asc&rows=100&start=0&q=NameLower:git\-*+Location:/file* New request: > /filesearch?sort=NameLower%20asc&rows=100&start=0&q=NameLower%3Agit%5C-*%2BLocation%3A%2Ffile* I also had to change SearchServlet on the server, because the code there relied on the client sending a wrongly-encoded query (i.e. was not decoding the "q" parameter). I tested the cases from bug 389844 (scoped search in projects/folders that have spaces in their names) and they seem to still work. Will revert if it turns out I've overlooked anything and busted this somehow. Server-side changes: http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=2e21b7273e0cfe8b5435adaf9ec3d46e7aac45e8
My initial server-side changes caused a bunch of test failures. I was not able to fix them all, so I reverted both parts. I give up.
I'll take a look.
FWIW, the additional changes I had to make to fix some tests are in this branch. http://git.eclipse.org/c/orion/org.eclipse.orion.server.git/commit/?h=search_400 I changed the delimiter to just a "+" because whitespace now appears inside folder names and shouldn't be interpreted as a Solr query delimiter. The remaining failure is in testPathWithSpaces.
The right thing to do, to fix this bug, is change HostedSiteServlet to stop using java.net.URI and use java.net.URLs instead, which don't perform any encoding. This will require rewriting most of the servlet (and possibly also RemoteURLProxyServlet, a helper class).
(In reply to comment #8) > The right thing to do, to fix this bug, is change HostedSiteServlet to stop > using java.net.URI and use java.net.URLs instead, which don't perform any > encoding. > > This will require rewriting most of the servlet (and possibly also > RemoteURLProxyServlet, a helper class). I strongly disagree with switching from URI to URL. URL tolerates both encoded and unencoded strings but it is lossy and ambiguous. If you have %20 in a URL you can't tell if it's an encoded space char or an unencoded % char followed by the digits 2 and 0. Similarly a # could indicate start of the URL fragment or it could indicate an unencoded # character. URL will seem easier but will lead to situations where you can't consistently manipulate them. There are other problems with URL, such as the fact the URL#equals method requires network access and is therefore extremely slow. URI is completely unforgiving strict interpretation of the URI spec, which makes it a bit harder to work with, but on the other hand you always know it is correctly encoded.
(In reply to comment #9) > URI is completely unforgiving strict interpretation of the URI spec, which > makes it a bit harder to work with, but on the other hand you always know it > is correctly encoded. Sigh. In that case, we're going to have to fix all the other Orion servlets to demand correctly URI-encoded parameters, and the client side to provide them. And it means the site hosting feature will never really be a transparent proxy, since it will reject incorrectly-encoded URLs (which browsers seem to be comfortable sending, despite the specs).
*** Bug 419281 has been marked as a duplicate of this bug. ***
I will try to figure this out in 5.0 time frame.
Another case: https://orion.eclipse.org/cfapi/orgs?Target={%22Url%22:%22https://api.ng.bluemix.net%22} WORKS! -- http://sk.orion.eclipse.org:8080/cfapi/orgs?Target={%22Url%22:%22https://api.ng.bluemix.net%22} FAILS!
This open bug report had a target milestone in the past. The target milestone has been removed. Please target for a date in the future or leave the target blank if it is not known.
Closing as part of a mass clean up of inactive bugs. Please reopen if this problem still occurs or is relevant to you. For more details see: https://dev.eclipse.org/mhonarc/lists/orion-dev/msg04002.html