Bug 339422 - URIUtil#toURI(URL) encodes properly encoded URLs again
Summary: URIUtil#toURI(URL) encodes properly encoded URLs again
Status: CLOSED WONTFIX
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Components (show other bugs)
Version: 3.7   Edit
Hardware: PC Windows 7
: P3 major (vote)
Target Milestone: ---   Edit
Assignee: equinox.components-inbox CLA
QA Contact:
URL:
Whiteboard: stalebug
Keywords:
: 337479 (view as bug list)
Depends on:
Blocks: 337479
  Show dependency tree
 
Reported: 2011-03-09 15:28 EST by Markus Keller CLA
Modified: 2020-12-07 02:00 EST (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Keller CLA 2011-03-09 15:28:29 EST
HEAD, found through bug 338212
 
URIUtil#toURI(URL) encodes properly encoded URLs again. But its Javadoc says:

 * Returns the URL as a URI. This method will handle URLs that are
 * not properly encoded (for example they contain unencoded space characters).

This sounds like the method will only touch URLs that are not properly encoded, but in fact it re-encodes everything. If that was the intention, then this needs to be clarified in the Javadoc, and clients should be advised to use URL#toURI() (>= 1.5) or "new URI(url.toExternalForm())" if the URL is already correct. Better yet, they should avoid dealing with incorrect URLs like those from java.io.File#toURL() at all.

I find it quite misleading that chains of toURI(URL) and toURL(URI) are not stable:

package org.eclipse.core.runtime;

import java.io.File;
import java.net.URI;
import java.net.URL;

class URIUtilTest {

	public static void main(String[] args) throws Exception {
		File file = new File("C:\\Program Files\\");
		System.out.println(file);
		System.out.println(file.exists());

		URI uri = file.toURI();
		System.out.println("uri: " + uri);

		File file2 = URIUtil.toFile(uri);
		System.out.println(file2);
		System.out.println(file2.exists());

		System.out.println("---");

		URL url = URIUtil.toURL(uri);
		System.out.println("url: " + url);

		// Bug: Properly encoded %20 is converted to %2520:
		URI uri2 = URIUtil.toURI(url);
		System.out.println("uri2: " + uri2);

		URL url2 = URIUtil.toURL(uri2);
		System.out.println("url2: " + url2);

		URI uri3 = URIUtil.toURI(url2);
		System.out.println("uri3: " + uri3);

		File file3 = URIUtil.toFile(uri3);
		System.out.println(file3);
		System.out.println(file3.exists());
	}
}
Comment 1 John Arthorne CLA 2011-03-09 15:37:14 EST
I will work on the javadoc. We can't handle both properly encoded and unencoded URLs with a single method. I.e., if the URL contains "%20" this is either a properly encoded space character, or an unencoded string containing the three chars (%, 2, 0). We can't reliably determine which that is.
Comment 2 John Arthorne CLA 2011-03-28 15:09:38 EDT
There is no perfect answer for all cases, but it bothers me that this method treats file: URLs different from any other URL. For non-file URLs, it starts with the assumption that the URL is a well-formed URI, and if it fails then it assumes the URL is not encoded. For file: URLs, it always assumes the URL is not properly encoded.

However the more I think about it, I think we should leave it alone. Since java.io.File#toURL always returns unencoded URLs, it is most likely that if someone has a file: URL then it needs to be encoded. For non file URLs I like the fact that we prefer well-formed URI's, since it encourages clients to do the right thing, and generally transition their code to well-formed encoded URIs.

I would be happy to add javadoc here saying java.net.URL should be avoided at all costs because it is inherently ambiguous and doesn't conform to RFC specified behaviour for URLs and URIs.
Comment 3 John Arthorne CLA 2011-03-28 15:10:49 EDT
*** Bug 337479 has been marked as a duplicate of this bug. ***
Comment 4 Markus Keller CLA 2011-03-29 08:47:14 EDT
(In reply to bug 337479 comment #2)
java.io.File is not supposed to know about fragments, so all # in a file path should be considered file name characters.

java.io.File#toURL() is indeed broken and has been deprecated in 1.6. Since it loses information, nobody should use that method any more (also pre 1.6).

We have to decide what URIUtil#toURI(URL) is meant for. If it is meant for "fixing" unencoded URLs, then the Javadoc should tell explicitly that this is not a recommended method, and that clients should instead try to avoid File#toURL() in the first place and use the File#toURI(). Or if they need a URL, they should use file.toURI().toURL().

But I also see some special handling for UNC paths. I'm not sure what bugs exactly the implementation should fix there. But if the UNC special cases are also required for properly encoded URLs, then we need a separate API that only fixes the UNC problems but does not encode the URL again.

UNC handling in the wiki also needs a closer look:
http://wiki.eclipse.org/Eclipse/UNC_Paths needs to be updated, since it recommends the broken File#toURL() and unconditionally recommends URIUtil.

(In reply to comment #2)
> I would be happy to add javadoc here saying java.net.URL should be avoided at
> all costs ...
I don't think URL is the problem. File#toURL() is broken.
Comment 5 Eclipse Genie CLA 2018-12-17 05:33:57 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.
Comment 6 Eclipse Genie CLA 2020-12-07 02:00:24 EST
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug.

If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.