Bug 233201 - Some servers return a webpage instead of the requested file.
Summary: Some servers return a webpage instead of the requested file.
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2 (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.4 RC3   Edit
Assignee: Pascal Rapicault CLA
QA Contact:
URL:
Whiteboard:
Keywords: contributed
: 233357 (view as bug list)
Depends on: 234518
Blocks:
  Show dependency tree
 
Reported: 2008-05-21 08:36 EDT by Benno Baumgartner CLA
Modified: 2008-05-30 09:41 EDT (History)
8 users (show)

See Also:
jeffmcaffer: review+
simon_kaegi: review+


Attachments
log (46.51 KB, text/plain)
2008-05-21 08:36 EDT, Benno Baumgartner CLA
no flags Details
file which leads to the error (1.14 KB, application/x-java-vm/java-beans)
2008-05-22 10:14 EDT, Benno Baumgartner CLA
no flags Details
ZipVerification processing step (4.64 KB, patch)
2008-05-27 13:37 EDT, Pascal Rapicault CLA
no flags Details | Diff
Patch (7.98 KB, patch)
2008-05-28 22:07 EDT, Pascal Rapicault CLA
no flags Details | Diff
patch to add artifact descriptor for zip download content (4.02 KB, text/plain)
2008-05-29 09:15 EDT, Benson Wong CLA
wong: review?
Details
Another patch (11.63 KB, patch)
2008-05-29 15:41 EDT, Pascal Rapicault CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Benno Baumgartner CLA 2008-05-21 08:36:48 EDT
Created attachment 101240 [details]
log

I20080520-2000

I'm trying to install mylyn through Software Updates dialog from the site:
http://download.eclipse.org/tools/mylyn/update/e3.4

After a while I got an error dialog telling me that "Failed to prepare partial IU: [R]org.eclipse.mylyn 2.3.2.v20080402-2100."

I've attached my complete log since I have no idea what that means.
Comment 1 John Arthorne CLA 2008-05-21 19:14:06 EDT
*** Bug 233357 has been marked as a duplicate of this bug. ***
Comment 2 Pascal Rapicault CLA 2008-05-21 22:41:02 EDT
When this error occurs, please immediately navigate to your plugins or features folder and see if the referenced file is available on disk and if it is valid.
I suspect that we are downloading a file expecting it to be a bundle but it is not (e.g. we get a webpage with a message saying the file is missing).
Comment 3 Pascal Rapicault CLA 2008-05-22 07:21:19 EDT
Another thing to do in this case is to try to contact the server using your web browser and download the file. Most of the time features will be under <url>/features/ and plugins under <url>/plugins/
Comment 4 Benno Baumgartner CLA 2008-05-22 10:14:28 EDT
Created attachment 101496 [details]
file which leads to the error

And this is the files content:

<HTML><HEAD><TITLE>Authentication Proxy Login Page</TITLE><script language="JavaScript"><!-- Begin
var pxypromptwindow1;var pxysubmitted = false;function doreload() {if(pxypromptwindow1.closed) {window.location.reload(true);} else {reloadtimeout=setTimeout("doreload()", 500);}}function submitreload() {if(pxysubmitted == false) {pxypromptwindow1=window.open('', 'pxywindow1','resizable=no,width=300,height=300,scrollbars=yes');reloadtimeout=setTimeout("doreload()", 1000);pxysubmitted = true;return true;} else {alert("This page can not be submitted twice.");return false;}}// -->
</script>
</HEAD>
<BODY onLoad="document.AuthenForm.uname.focus()" BGCOLOR="#FFFFFF" LINK="#ffcc00" ALINK="#ffffff" VLINK="#ffcc00" ><H1>SWG RTP B500 BSO: Enter your IBM intranet credentials  <BR><BR>  HTTPS Authentication<BR><BR><p><FORM name="AuthenForm" method=post action="https://9.42.96.226:443" target="pxywindow1"><input type=hidden name=au_pxytimetag value="281014010">Username: <input type=text name=uname><BR><BR>Password: <input type=password name=pwd><BR><BR><input type=submit name=ok value=OK onClick="return submitreload()"></H1></FORM></script></BODY></HTML>
Comment 5 Benno Baumgartner CLA 2008-05-22 10:15:20 EDT
(In reply to comment #3)
> Another thing to do in this case is to try to contact the server using your web
> browser and download the file. Most of the time features will be under
> <url>/features/ and plugins under <url>/plugins/
> 

How can I find out from which server this file is coming from? It could be any mirror, right?
Comment 6 Pascal Rapicault CLA 2008-05-22 10:57:43 EDT
Thx for getting that !!! This is really informative. You could get this information by looking at the progress view while the dl is on-going.
Comment 7 Pascal Rapicault CLA 2008-05-22 11:04:45 EDT
This file seems to come from the eclipse mirrors that we have in IBM RTP.
Comment 8 Pascal Rapicault CLA 2008-05-22 22:32:19 EDT
I have discussed the issue with Jeff and Matt F. Unfortunately in these kinds of situations nothing is returned in the http header indicating a failure (at least that we can think of).

Here are some potential solutions we have discussed:
- Add an MD5 in the artifacts.jar, this way we can detect the failure. This will not work when connecting to an old style update site since we don't have md5.
- Verify the size of the bytes, again this will not work for an old style update site.
- Do some wacko stuffs checking that we are not getting something starting with <html>

A few other things that needs to be verified:
- why is not this error caught by the signature verifier
- how was update manager handling this issue
Comment 9 Kevin McGuire CLA 2008-05-23 11:53:39 EDT
I think solutions that aren't backwards compatible with existing update sites are reasonable. Also header checks that the thing being returns is the right shape seems reasonable.
Comment 10 Simon Kaegi CLA 2008-05-23 14:08:18 EDT
Here's the repsonse we get from http://www.developer.com/java/ent/article.php/3663041/site.xml for a missing file. ... a blank screen

--

HTTP/1.1 200 OK
Date: Fri, 23 May 2008 17:46:57 GMT
Server: Apache
Transfer-Encoding: chunked
Content-Type: text/html

1  


0

--

Content-Type: text/html might be a good indicator. If a resource really is there a Last-Modified (also Cache-Control and ETag for http requests) header is normally present so this might also help determine if we're getting an error message.


Here's another url that we're likely to run into that will fail with a 200 ok.

http://downloads.sourceforge.net/somebadurl
Comment 11 Pascal Rapicault CLA 2008-05-27 09:47:02 EDT
At the core, this problem occurs because our transport layer is content agnostic. Therefore since we don't know what we expect we just trust whatever the server gives us back.
What Simon and I thought about is to define a content type in the artifact descriptor, thus allowing us to have a processing step validating the file as it is being downloaded.
This processing step would be set by the metadata generator and thus would cover both the case of old update sites and artifact repos.
Comment 12 Pascal Rapicault CLA 2008-05-27 13:37:02 EDT
Created attachment 102200 [details]
ZipVerification processing step

Attached is a first stab at a zip file verification processing step.
Comment 13 Pascal Rapicault CLA 2008-05-28 22:07:27 EDT
Created attachment 102553 [details]
Patch

This patch will verify any artifact descriptor which has a content type defined on it. 
The patch for the generation of the artifact markup will be done a bit later.
Comment 14 Benson Wong CLA 2008-05-29 09:15:37 EDT
Created attachment 102630 [details]
patch to add artifact descriptor for zip download content
Comment 15 Jeff McAffer CLA 2008-05-29 09:27:20 EDT
I'm ok with the direction but have some questions on the actual patch
1) should it not be enough to throw an exception from write()?  Under what cases does write() throw an exception yet we continue to try an write to the stream.

2) In SAR we add the zip verifier.  what about a GZ verifier (are we going to do one of those) and a foo verifier?  This likely needs to be extensible.  What is the intention here?

3) what is the end user experience. They get an error presumably.  Can we give them any help?  "Invalid archive" does not say much.  Perhaps something like "Content type of <URL here> does not match repository description".  Maybe even something hinting that they should try the URL manually or that there may be a firewall in the middle.
Comment 16 Kevin McGuire CLA 2008-05-29 11:38:59 EDT
(In reply to comment #15)

> 3) what is the end user experience. They get an error presumably.  Can we give
> them any help?  "Invalid archive" does not say much.  Perhaps something like
> "Content type of <URL here> does not match repository description".  Maybe even
> something hinting that they should try the URL manually or that there may be a
> firewall in the middle.

Good point.  If there's a firewall you get what a 404 error?  Unreachable vs. malformed is important data for user to diagnose.

Comment 17 Pascal Rapicault CLA 2008-05-29 13:56:35 EDT
> 1) should it not be enough to throw an exception from write()?  
 I wanted to be extra careful here and also set the status since on close the status may still be obtained by the ProcessingStepHandler. Also I'm not sure that throwing the exception is what is best for the end user experience (see point 3)
  
2)This likely needs to be extensible.  What is the intention here?
  I agree and we will see that post 3.4.

> 3) what is the end user experience. They get an error presumably.  Can we give
them any help?  
  Currently we just say "invalid archive" because in the step we don't have enough context, and there is no one catching / rethrowing the exception and enriching it with more context. (This is probably a more general problem that we don't want to tackle now).
  If instead of throwing the exception we were getting a status, then the proper information would be propagated back to the user (like it is done when we encounter a tampered signed jar file).

Also note that most real 404s are handled differently and we end up providing a better message should we need to show it.
Comment 18 Jeff McAffer CLA 2008-05-29 14:23:18 EDT
(In reply to comment #17)
> > 1) should it not be enough to throw an exception from write()?  
>  I wanted to be extra careful here and also set the status since on close the
> status may still be obtained by the ProcessingStepHandler. Also I'm not sure
> that throwing the exception is what is best for the end user experience (see
> point 3)

Sorry, I was a little unclear.  I was on about the "valid" logic.  It seems to be prepared to handle cases where write() is called and valid == -1.  But that should only happen if soeone calls write after an exceptoin has been thrown.  That is confusing to me.  Having said that, it is not harmful to have in there...

> 2)This likely needs to be extensible.  What is the intention here?
>   I agree and we will see that post 3.4.

The other part of my question was what to do about GZ files?  Are you going to add a GZVerifier?

> > 3) what is the end user experience. They get an error presumably.  Can we give
> them any help?  
>   Currently we just say "invalid archive" because in the step we don't have
> enough context, and there is no one catching / rethrowing the exception and

Interesting that the steps have so little context. Good thing on one hand, not so good on the other.  Perhaps we could get something from the OutputStream in the future...  In the more immediate term then how about we go for an error like "Downloaded stream not a valid archive.  Check the artifact repo".  Again, this is assuming that at some point the user is going to see this message.  If they are not, what are they going to see?

Either way, please confirm what the user experience will be.

>   If instead of throwing the exception we were getting a status, then the
> proper information would be propagated back to the user (like it is done when
> we encounter a tampered signed jar file).

We are setting a status.  the issue is (as you point out) that we don't have enough info to make a decent status.
Comment 19 Pascal Rapicault CLA 2008-05-29 15:30:58 EDT
> The other part of my question was what to do about GZ files?  Are you going to
add a GZVerifier?
   Nothing since for now the only GZ files we transfer are pack.gz and an invalid content will fail during the unpacking. Therefore we are safe.
   Long term I think the validation of content could be done independently of the content type using MD5, as described in a previous comment.


>   Currently we just say "invalid archive" because in the step we don't have
> enough context, and there is no one catching / rethrowing the exception and
  I take that back I just ran additional tests and it turns out that the error is properly reported to the user. We say "transfer exception" and indicate the source and in the details we have something saying "invalid archive".
  The only problem with throwing the exception is that since it is wrapped into a job, then it gets shown to the user whereas there may be no problem since we can eventually go and get the file from the another repo.

>>   If instead of throwing the exception we were getting a status, then the proper information would be propagated back to the user (like it is done when we encounter a tampered signed jar file).
> We are setting a status.  the issue is (as you point out) that we don't have
enough info to make a decent status.
  Here I was referring to 404s.
Comment 20 Pascal Rapicault CLA 2008-05-29 15:41:35 EDT
Created attachment 102724 [details]
Another patch

This patch changes the following things:
 - Do not throw an exception because otherwise the user would get confused
 - Remove unnecessary property from the artifact descriptor
 - Move the super.close at the end of the ZipVerifier on Simon's notice that when ArtifactOutputStream was calling back at the ZipVerifier for status check the status would not have been updated.
Comment 21 Jeff McAffer CLA 2008-05-29 16:19:55 EDT
Now that we are not throwing an exception in write() will write be called multiple times after the stream is found to be invalid?  It looks like it.  There is no need to have the valid == -1 test and set the status.  The status should already be set from the byte pattern test (end).  Rather, make is look like 
+	public void write(int b) throws IOException {
+		getDestination().write(b);
+		if (valid >=0 && valid <= 3 && b != ZIP_HEADER[valid++]) {
+			valid = -1;
+			setStatus(new Status(IStatus.ERROR, Activator.ID, Messages.ZipVerifierStep_invalid_archive));
+		}
+	}
Comment 22 Simon Kaegi CLA 2008-05-29 17:11:28 EDT
+1 with a minimal check for -1.
I've verified for the update site and generated case for a simpleartifact repo to validate the portion in Ben's patch.

The new error message is fine:
--
An error occurred while collecting items to be installed
  Problems downloading artifact: osgi.bundle,BrandingPlugin,1.0.0.
    Downloaded stream not a valid archive. Check the server.
--

I also re-tested with:
if (valid == -1)
return;


Comment 23 Pascal Rapicault CLA 2008-05-29 18:02:52 EDT
Patch from Ben and from from Pascal released in HEAD.
Comment 24 Pascal Rapicault CLA 2008-05-29 18:05:26 EDT
really closing
Comment 25 Benno Baumgartner CLA 2008-05-30 03:51:15 EDT
(In reply to comment #22)
> The new error message is fine:
> --
> An error occurred while collecting items to be installed
>   Problems downloading artifact: osgi.bundle,BrandingPlugin,1.0.0.
>     Downloaded stream not a valid archive. Check the server.
> --

Just a quick question: Does the fix for this bug just change the error message, or will I now be able to actually install mylyn? It will select another mirror automatically if it downloads an invalid archive, right?
Comment 26 Pascal Rapicault CLA 2008-05-30 09:41:54 EDT
> It will select another mirror automatically if it downloads an invalid archive, right?
  Yes