Bug 220605 - IMetadataRepositoryManager will only load an old style update site if "site.xml" is part of the URL
Summary: IMetadataRepositoryManager will only load an old style update site if "site.x...
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2
Version: unspecified
Hardware: PC Windows XP
Importance: P3 critical
Target Milestone: 3.4 M6
Assignee: P2 Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: contributed
Duplicates: 220606 220607
Depends on:
Blocks:
 
Reported: 2008-02-27 12:06 EST by Ray Braithwood CLA
Modified: 2009-06-02 00:54 EDT
CC List: 5 users

See Also:


Attachments
proposed fix to allow update sites with and without 'site.xml' at the end (5.03 KB, patch)
2008-02-27 16:09 EST, Ray Braithwood CLA
proposed patch (27.68 KB, patch)
2008-03-03 17:07 EST, Simon Kaegi CLA
proposed patch v2 (28.03 KB, patch)
2008-03-03 18:30 EST, Simon Kaegi CLA
site map screenshot (5.67 KB, image/png)
2009-05-29 09:26 EDT, Andrey Loskutov CLA

Description Ray Braithwood CLA 2008-02-27 12:06:15 EST
I cannot use p2 to retrieve the contents of an old style (uses site.xml rather than content.xml) update site without appending site.xml to the end of the repository URL.

for example:
http://beust.com/eclipse/ will not load (TestNG's update site)
http://beust.com/eclipse/site.xml will load

upon looking at the code I found this in UpdateSiteMetadataRepositoryFactory line 26

if (!location.getPath().endsWith("site.xml")) //$NON-NLS-1$
   return null;

I thought p2 would automatically try both styles of update sites without needing the suffix in the URL.
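The check quoted above rejects any location whose path does not end in site.xml. A minimal sketch of the more lenient handling requested here, assuming a hypothetical helper `siteXmlLocation` (not part of p2) that derives the site.xml URI from either form of the URL. It does not handle locations carrying a query string.

```java
import java.net.URI;

// Hypothetical helper, not p2 API: accepts an update-site location with or
// without a trailing "site.xml" and returns the URI of the site.xml file.
class SiteLocations {

    static final String SITE_XML = "site.xml";

    static URI siteXmlLocation(URI location) {
        String path = location.getPath();
        if (path != null && (path.equals(SITE_XML) || path.endsWith("/" + SITE_XML))) {
            return location; // the URL already names site.xml (the old, required form)
        }
        // Treat the location as a directory so resolution keeps its last path segment.
        String s = location.toString();
        URI dir = s.endsWith("/") ? location : URI.create(s + "/");
        return dir.resolve(SITE_XML);
    }

    public static void main(String[] args) {
        // Both forms from the report map to the same site.xml.
        System.out.println(siteXmlLocation(URI.create("http://beust.com/eclipse/")));
        System.out.println(siteXmlLocation(URI.create("http://beust.com/eclipse/site.xml")));
    }
}
```

Both calls print `http://beust.com/eclipse/site.xml`, so the factory could accept either URL while still reading the same file.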
Comment 1 Simon Kaegi CLA 2008-02-27 12:20:57 EST
Yes, the discovery process needs work. If you have the time to take a look I'd be interested in working with any patches you can provide.
Comment 2 Ray Braithwood CLA 2008-02-27 12:37:09 EST
*** Bug 220607 has been marked as a duplicate of this bug. ***
Comment 3 Ray Braithwood CLA 2008-02-27 12:38:02 EST
*** Bug 220606 has been marked as a duplicate of this bug. ***
Comment 4 Ray Braithwood CLA 2008-02-27 13:00:19 EST
Sure thing.  I'll let you know what I find.
Comment 5 Susan McCourt CLA 2008-02-27 13:18:24 EST
Note that as part of bug #204184, the validation code has moved to UpdateSiteMetadataRepositoryFactory.validateAndLoad(URL location, IProgressMonitor monitor, boolean doLoad) throws ProvisionException.
Comment 6 Susan McCourt CLA 2008-02-27 13:21:59 EST
Another note:  when you relax the validation code, you'll also need to alter the suffix information for the factory extension in the plugin.xml
Comment 7 Ray Braithwood CLA 2008-02-27 16:09:28 EST
Created attachment 90931 [details]
proposed fix to allow update sites with and without 'site.xml' at the end

Here's a little patch that allows me to use p2 with update sites that do not have site.xml in the location. I have also made it backward compatible in that you can still have site.xml at the end. I will caveat these changes: I have only tested them with the part of the p2 world I care about (mainly loading a repository and retrieving a list of the InstallableUnits). I most likely missed something, so larger-scale testing is necessary.
Comment 8 Simon Kaegi CLA 2008-03-03 12:26:08 EST
Thanks Ray.
On the p2 call we talked about your contribution and it's likely something we're going to require for our initial integration with the SDK this week. I'll take a look today and commit something along the lines of your patch.
Comment 9 Simon Kaegi CLA 2008-03-03 17:07:20 EST
Created attachment 91450 [details]
proposed patch

This patch builds on Ray's and provides similar path calculation logic for artifact repositories. In addition, there are a few changes in the repo managers to favour trying the native artifact/metadata repositories before trying site.xml (or other variants).
Comment 10 John Arthorne CLA 2008-03-03 17:54:35 EST
From testing this patch, the only case that had trouble was loading an artifact repository whose URL was missing both site.xml and a trailing slash:

http://download.eclipse.org/tools/mylyn/update/e3.4

Using the admin UI, I can load a metadata repository with this URL, but loading it as an artifact repository fails.
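The report does not say exactly where the artifact-repository path calculation went wrong, but a missing trailing slash is a common cause of this kind of failure: relative URL resolution treats the last path segment as a file name and drops it. A small standard-library demonstration, using artifacts.xml only as an example file name:

```java
import java.net.URI;

class TrailingSlashDemo {
    public static void main(String[] args) {
        URI noSlash = URI.create("http://download.eclipse.org/tools/mylyn/update/e3.4");
        URI withSlash = URI.create("http://download.eclipse.org/tools/mylyn/update/e3.4/");

        // Without a trailing slash, "e3.4" is replaced during resolution:
        // prints http://download.eclipse.org/tools/mylyn/update/artifacts.xml
        System.out.println(noSlash.resolve("artifacts.xml"));

        // With the trailing slash, "e3.4" is treated as a directory and kept:
        // prints http://download.eclipse.org/tools/mylyn/update/e3.4/artifacts.xml
        System.out.println(withSlash.resolve("artifacts.xml"));
    }
}
```

Appending "/" to the repository location before resolving any index file name avoids probing the parent directory by mistake.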
Comment 11 Simon Kaegi CLA 2008-03-03 18:30:01 EST
Created attachment 91457 [details]
proposed patch v2

Patch updated.
Good catch. Thanks John.
Comment 12 Simon Kaegi CLA 2008-03-04 00:07:37 EST
Fixed in HEAD.

Thanks Ray. Could you re-try your use cases with the code from HEAD?
Comment 13 Ray Braithwood CLA 2008-03-04 10:28:13 EST
It actually failed when reading Checkstyle's update site (http://eclipse-cs.sourceforge.net/update). The problem here is that the site.xml defines absolute URLs for the features rather than relative ones.

<description url="http://eclipse-cs.sourceforge.net/update"> Eclipse Checkstyle Plug-in Update Site </description>
<feature url="http://downloads.sourceforge.net/eclipse-cs/com.atlassw.tools.eclipse.checkstyle_4.3.3-feature.jar?use_mirror=mesh" id="com.atlassw.tools.eclipse.checkstyle" version="4.3.3">
    <category name="Checkstyle"/>
</feature>
<feature url="http://downloads.sourceforge.net/eclipse-cs/com.atlassw.tools.eclipse.checkstyle_4.4.0-feature.jar?use_mirror=mesh" id="com.atlassw.tools.eclipse.checkstyle" version="4.4.0">
    <category name="Checkstyle"/>
</feature>

I'll also show you the start of the stack trace I got:
java.io.FileNotFoundException: http://eclipse-cs.sourceforge.net/update/http://downloads.sourceforge.net/eclipse-cs/com.atlassw.tools.eclipse.checkstyle_4.4.0-feature.jar?use_mirror=mesh
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1147)
	at java.net.URL.openStream(URL.java:1007)
	at org.eclipse.equinox.internal.p2.updatesite.metadata.UpdateSiteMetadataRepository.parseFeature(UpdateSiteMetadataRepository.java:292)
	at org.eclipse.equinox.internal.p2.updatesite.metadata.UpdateSiteMetadataRepository.loadFeaturesFromSiteFeatures(UpdateSiteMetadataRepository.java:248)
	at org.eclipse.equinox.internal.p2.updatesite.metadata.UpdateSiteMetadataRepository.<init>(UpdateSiteMetadataRepository.java:97)
	at org.eclipse.equinox.internal.p2.updatesite.metadata.UpdateSiteMetadataRepositoryFactory.load(UpdateSiteMetadataRepositoryFactory.java:43)
	at org.eclipse.equinox.internal.p2.metadata.repository.MetadataRepositoryManager.loadRepository(MetadataRepositoryManager.java:395)
	at org.eclipse.equinox.internal.p2.metadata.repository.MetadataRepositoryManager.loadRepository(MetadataRepositoryManager.java:311)
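The URL in the FileNotFoundException shows the feature URL being appended to the site location as a plain string. A sketch of resolving each feature `url` attribute with `java.net.URI.resolve` instead, which returns an absolute reference unchanged and resolves a relative one against the directory containing site.xml. `featureLocation` is a hypothetical helper and the relative feature path is invented for contrast; the issue itself is tracked in bug 219904.

```java
import java.net.URI;

class FeatureUrlDemo {

    // Hypothetical helper: resolve a site.xml <feature url="..."> value
    // against the location of site.xml itself.
    static URI featureLocation(URI siteXml, String featureUrl) {
        // Absolute references come back unchanged; relative ones are
        // resolved against the directory that contains site.xml.
        return siteXml.resolve(featureUrl);
    }

    public static void main(String[] args) {
        URI siteXml = URI.create("http://eclipse-cs.sourceforge.net/update/site.xml");
        String absolute = "http://downloads.sourceforge.net/eclipse-cs/"
                + "com.atlassw.tools.eclipse.checkstyle_4.4.0-feature.jar?use_mirror=mesh";
        String relative = "features/com.atlassw.tools.eclipse.checkstyle_4.4.0.jar"; // invented example

        // String concatenation reproduces the bogus URL from the stack trace.
        System.out.println("http://eclipse-cs.sourceforge.net/update/" + absolute);
        System.out.println(featureLocation(siteXml, absolute));
        System.out.println(featureLocation(siteXml, relative));
    }
}
```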
Comment 14 Ray Braithwood CLA 2008-03-04 10:33:44 EST
The above problem seems to be the only issue (absolute URL in site.xml); all of the other update sites I tried are giving me what I expect.  Good work and thanks for the fix.
Comment 15 Simon Kaegi CLA 2008-03-04 11:16:19 EST
The absolute URL issue you mention is currently being tracked in bug 219904. I've added you to the CC list.
Comment 16 Andrey Loskutov CLA 2009-05-29 03:10:10 EDT
I see a regression in 3.5 RC2 (3.5 M7 worked fine):

http://andrei.gmxhome.de/eclipse will not load
http://andrei.gmxhome.de/eclipse/site.xml will load.

Should I open a new bug or can you reopen this one?

This is a MAJOR regression since "classic" update sites no longer work with Eclipse 3.5 when using the old URL.
Comment 17 Andrey Loskutov CLA 2009-05-29 03:11:11 EDT
P.S.
Stack trace from error log:

org.eclipse.equinox.internal.provisional.p2.core.ProvisionException: No repository found at http://andrei.gmxhome.de/eclipse.
at org.eclipse.equinox.internal.p2.repository.helpers.AbstractRepositoryManager.fail(AbstractRepositoryManager.java:380)
at org.eclipse.equinox.internal.p2.repository.helpers.AbstractRepositoryManager.loadRepository(AbstractRepositoryManager.java:606)
at org.eclipse.equinox.internal.p2.metadata.repository.MetadataRepositoryManager.loadRepository(MetadataRepositoryManager.java:92)
at org.eclipse.equinox.internal.p2.metadata.repository.MetadataRepositoryManager.loadRepository(MetadataRepositoryManager.java:88)
at org.eclipse.equinox.internal.provisional.p2.ui.operations.ProvisioningUtil.loadMetadataRepository(ProvisioningUtil.java:88)
at org.eclipse.equinox.internal.provisional.p2.ui.QueryableMetadataRepositoryManager.doLoadRepository(QueryableMetadataRepositoryManager.java:55)
at org.eclipse.equinox.internal.provisional.p2.ui.QueryableRepositoryManager.loadRepository(QueryableRepositoryManager.java:195)
at org.eclipse.equinox.internal.provisional.p2.ui.QueryableRepositoryManager.loadAll(QueryableRepositoryManager.java:108)
at org.eclipse.equinox.internal.p2.ui.sdk.PreloadingRepositoryHandler$2.run(PreloadingRepositoryHandler.java:71)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Comment 18 Henrik Lindberg CLA 2009-05-29 06:32:58 EDT
From the stacktrace it is possible to see that you are getting a "not found" that was caught earlier in the provisioning process. Could you try this again after having either pressed "Test Connection", or after a restart of Eclipse - both will clear the repository cached state.
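Henrik's suggestion rests on p2 remembering that a location was not found and failing fast on later attempts. A minimal sketch of such a negative cache, assuming a simple in-memory set keyed by URI; `NotFoundCache`, the `Loader` interface, and `forget` are hypothetical stand-ins rather than p2 internals, with `forget` playing the role of "Test Connection" or a restart.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a "not found" cache; not p2's implementation.
class NotFoundCache {

    interface Loader {
        boolean exists(URI location); // stands in for the real network probe
    }

    private final Set<URI> notFound = new HashSet<>();

    boolean load(URI location, Loader loader) {
        if (notFound.contains(location)) {
            return false; // fail fast from the cache, no network access
        }
        boolean found = loader.exists(location);
        if (!found) {
            notFound.add(location); // remember the failure for later attempts
        }
        return found;
    }

    void forget(URI location) {
        notFound.remove(location); // clear the cached state for this location
    }

    public static void main(String[] args) {
        NotFoundCache cache = new NotFoundCache();
        URI site = URI.create("http://andrei.gmxhome.de/eclipse");
        int[] probes = {0};
        // Hypothetical loader: the site is missing on the first probe, present afterwards.
        Loader loader = location -> ++probes[0] > 1;
        System.out.println(cache.load(site, loader)); // false, and remembered
        System.out.println(cache.load(site, loader)); // false again, served from the cache
        cache.forget(site);
        System.out.println(cache.load(site, loader)); // true
    }
}
```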

Comment 19 Andrey Loskutov CLA 2009-05-29 06:43:01 EDT
The first time I press "Test Connection" it says "invalid repository location: /eclipse/artifacts.xml", but does not log anything. If I then try to use the created update site from the update dialog, it reports the stack trace provided above.

I've tried with a freshly unzipped Eclipse instance and a new workspace, so there can't be any caches at all.
Comment 20 Henrik Lindberg CLA 2009-05-29 08:45:25 EDT
I took a look at your update site and there seem to be strange things going on.

One of the first requests will be to get compositeContent.jar. On some requests, I got a jar file with some contents, but on subsequent attempts I got back a 0 byte file.

You also have to provide proper 404 responses for files that do not exist. Your site returns 300 ("multiple choices") and then lists a single alternative with a similar name.

p2 will try to figure out what kind of site you have by asking for particular files, starting with compositeContent.jar, then compositeContent.xml, then content.jar, etc., until, as the final step, it asks for "site.xml" if no other index file was found. So, returning 404 for a non-existing file is essential.

I suspect that an HTTP status of 300 will be handled as a "read error" as opposed to a 404, and that it will then stop the search, but I am not sure without further investigation.

In any case you need to make sure that:
- the site responds with either the requested content, or a 404.
- that content is not delivered with 0 size
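To make the probing order concrete, here is a hypothetical model of the search described in this comment; it is not p2's implementation. It includes only the index files named above (the real sequence has more steps, elided as "etc."), and it encodes two assumptions from this thread: a present but empty index file is treated as a broken repository, and, per the unconfirmed suspicion above, any status other than 200 or 404 (such as 300) also stops the search. `SiteProbe`, `Response`, and `discover` are illustrative names, and the byte counts are made up.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of update-site discovery; not p2's code.
class SiteProbe {

    record Response(int status, long length) {}

    // Only the files named in the comment; the real sequence has more steps.
    static final List<String> PROBE_ORDER = List.of(
            "compositeContent.jar", "compositeContent.xml", "content.jar", "site.xml");

    // Returns the index file that was found, or throws if the search must stop.
    static String discover(Map<String, Response> site) {
        for (String file : PROBE_ORDER) {
            Response r = site.getOrDefault(file, new Response(404, 0));
            if (r.status() == 404) {
                continue; // genuinely missing: try the next kind of index file
            }
            if (r.status() == 200 && r.length() > 0) {
                return file; // usable index file found
            }
            // Empty file or unexpected status (e.g. 300): report a broken repository.
            throw new IllegalStateException(
                    file + ": status " + r.status() + ", " + r.length() + " bytes");
        }
        throw new IllegalStateException("No repository found");
    }

    public static void main(String[] args) {
        // Only site.xml exists; every other probe gets a proper 404.
        System.out.println(discover(Map.of("site.xml", new Response(200, 1234))));
        // An empty compositeContent.jar stops the search before site.xml is tried.
        try {
            discover(Map.of("compositeContent.jar", new Response(200, 0),
                            "site.xml", new Response(200, 1234)));
        } catch (IllegalStateException e) {
            System.out.println("stopped: " + e.getMessage());
        }
    }
}
```

In this model a missing file costs one extra request, while an empty compositeContent.jar or a 300 response ends the search before site.xml is ever requested, which is consistent with the symptom reported in comment 16.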



Comment 21 Andrey Loskutov CLA 2009-05-29 09:26:13 EDT
Created attachment 137638 [details]
site map screenshot

(In reply to comment #20)
> I took a look at your update site and there seem to be strange things going
> on.
> 
> One of the first requests will be to get compositeContent.jar. On some
> requests, I got a jar file with some contents, but on subsequent attempts I got
> back a 0 byte file.

The file is just 0 bytes long. There is no other magic, see the picture.

> You also have to provide proper 404 responses for files that do not exist.
> Your site returns 300 ("multiple choices") and then lists a single alternative
> with a similar name. 

For files which do not exist, it returns 300 if there is a file with a similar name, but this is a global site policy of gmx (a free hoster) which I can't change.

> p2 will try to figure out what kind of site you have by asking for particular
> files, starting with compositeContent.jar, then compositeContent.xml, then
> content.jar, etc., until, as the final step, it asks for "site.xml" if no other
> index file was found.  So, returning 404 for a non-existing file is essential.

This is not true for Eclipse before 3.5 RC2. With 3.4 or 3.5 M7 the site AS IS works just fine, you can try it anytime.

> In any case you need to make sure that:
> - the site responds with either the requested content, or a 404.
> - that content is not delivered with 0 size

Unfortunately, this would lead to thousands of 404s in the server log, which "hide" the real issues, so returning 0-size files was a practical solution until RC2...

The point is: there seem to be some changes in how p2 handles "bad" sites. In my opinion it should always try the "classic" site.xml mechanism as the last resort, which worked perfectly until RC2 (or RC1). With the new code the fallback solution doesn't work anymore, at least not in my case. I guess I'm not alone with "bad" site content.
Comment 22 Henrik Lindberg CLA 2009-05-29 09:46:02 EDT
(In reply to comment #21)
> Created an attachment (id=137638) [details]
> site map screenshot
> 
> (In reply to comment #20)
> > I took a look at your update site and there seem to be strange things going
> > on.
> > 
> > One of the first requests will be to get compositeContent.jar. On some
> > requests, I got a jar file with some contents, but on subsequent attempts I got
> > back a 0 byte file.
> 
> The file is just 0 bytes long. There is no other magic, see the picture.
> 
I think you should remove this file - it indicates to p2 that this site is a composite repo.

> > You also have to provide proper 404 responses for files that do not exist.
> > Your site returns 300 ("multiple choices") and then lists a single alternative
> > with a similar name. 
> 
> For files which do not exist, it returns 300 if there is a file with a similar
> name, but this is a global site policy of gmx (a free hoster) which I can't change.
> 
I suggest you find a different hosting provider.
I have opened Bug 278383 to fix this in 3.6. You still need to remove the empty files.

> > p2 will try to figure out what kind of site you have by asking for particular
> > files, starting with compositeContent.jar, then compositeContent.xml, then
> > content.jar, etc., until, as the final step, it asks for "site.xml" if no other
> > index file was found.  So, returning 404 for a non-existing file is essential.
> 
> This is not true for Eclipse before 3.5 RC2. With 3.4 or 3.5 M7 the site AS IS
> works just fine, you can try it anytime.
> 
Unfortunately, the changes you suggest would have a negative impact on those who have updated to p2-enabled sites.

> > In any case you need to make sure that:
> > - the site responds with either the requested content, or a 404.
> > - that content is not delivered with 0 size
> 
> Unfortunately, this would lead to thousands of 404s in the server log, which "hide"
> the real issues, so returning 0-size files was a practical solution until RC2...
> 
You do need to remove the empty files. Otherwise there is no way to differentiate between a broken composite repo and a non-existing composite repo.

> The point is: there seem to be some changes in how p2 handles "bad"
> sites. In my opinion it should always try the "classic" site.xml mechanism
> as the last resort, which worked perfectly until RC2 (or RC1). With the new
> code the fallback solution doesn't work anymore, at least not in my case. I
> guess I'm not alone with "bad" site content.
> 
It will find the site.xml if you remove the empty files. The handling of "bad sites" is indeed an improvement - you have masked your old update manager site with a broken p2 site to try to trick it. As the error handling has been improved (as people want to be told that their p2 content is broken and not that it could not find a site.xml) you are affected by this improvement.

The use of "300 multiple choices" is just silly - and I strongly suggest you move to some other hosting arrangement that generates 404 for non existing files.

Comment 23 Andrey Loskutov CLA 2009-05-29 10:03:30 EDT
Thanks Henrik, I see that I have no choice... I will definitely not change my (free) hoster, but I will delete the 0 size files :-(
Comment 24 Henrik Lindberg CLA 2009-05-29 10:39:35 EDT
(In reply to comment #23)
> Thanks Henrik, I see that I have no choice... I will definitely not change my
> (free) hoster, but I will delete the 0 size files :-(
> 
I don't know if that is enough since it still returns HTTP 300 for compositeContent.jar
Comment 25 Henrik Lindberg CLA 2009-05-30 10:37:13 EDT
(In reply to comment #24)
> I don't know if that is enough since it still returns HTTP 300 for
> compositeContent.jar
> 
I opened Bug 278383 to deal with the HTTP 300 status code.
Comment 26 Henrik Lindberg CLA 2009-06-01 20:42:50 EDT
(In reply to comment #25)
> I opened Bug 278383 to deal with the HTTP 300 status code.
> 
As you may have seen, Bug 278383 was just fixed and a patch was released to HEAD; it will be in 3.5 RC4.
While testing, it was also reported that your site loaded just fine, so maybe you found some workaround for the HTTP 300 problem.

Comment 27 Andrey Loskutov CLA 2009-06-02 00:54:31 EDT
(In reply to comment #26)
> While testing it was also reported that your site loaded just fine - so maybe
> you found some work around for the HTTP 300 problem.

Thank you for quick fix!
I've just deleted all the zero length files, as you've proposed.
... going to count 404 errors ... ;-)