Bug 131956 - Mirroring system expectations seem too high
Summary: Mirroring system expectations seem too high
Status: RESOLVED INVALID
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: Website
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: phoenix.ui CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 128187 128556 131026 131031
 
Reported: 2006-03-15 10:49 EST by Eclipse Webmaster CLA
Modified: 2006-04-03 18:13 EDT

Description Eclipse Webmaster CLA 2006-03-15 10:49:11 EST
I've seen a few bugs open about mirrors - some don't have file X, some return 404's, some have corrupted files.

I feel that the expectations are set too high for mirror sites. Mirrors are free hardware, free bandwidth and free sysadmin resources that are generally reliable, but not 100% - and this is true for any project, be it Eclipse, Apache or anything.

If the expectation is that mirrors should be 100% accurate, then public, non-Foundation-hosted mirrors should be replaced with an elaborate, Foundation-hosted and Foundation-funded global mirroring system. The cost of this is astronomical, as mirror sites currently handle 93% of the zip/gz downloads.

Otherwise, it should be generally acknowledged by the community as a whole that mirrors are what they are - gifts of bandwidth, servers and human resources by people and organizations we don't know, and that their unpredictable nature should be expected by all who use them - be it a human via a web page, or a piece of software.

D.
Comment 1 Doug Schaefer CLA 2006-03-15 13:05:37 EST
I agree with you, Denis. Expectations need to be set to keep costs in check. I think the root of the problem with the latest episode of Callisto vs. the Mirror Monsters was that the update manager responded poorly in the face of broken mirrors. (I'll stop the puns right here). The user experience could reflect poorly on the quality of Eclipse, which is a cause of concern.

I also think we dove down the update site route for Callisto without thinking very far ahead about the effects of a large update site with many files on the mirror system, although we had suspicions. But sometimes it's a better lesson to try and fail. We probably need to go back to the table and rethink this thing a little.
Comment 2 Alex Blewitt CLA 2006-03-15 16:02:28 EST
A working mirror is great. A broken mirror is bad; 7 years bad luck, anyone?

What about pinging the mirror sites daily to find out whether they've got the files hosted? It need only be a HEAD request. Or even update the update manager, so that when trying to download a file from a mirror, it does a HEAD check first, and if the file isn't there, falls back to another mirror.
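The HEAD-check-then-fall-back idea can be sketched roughly as follows. This is only an illustration, not how the update manager works: `mirror_has_file` and `pick_mirror` are hypothetical names, and the sketch assumes HTTP mirrors that answer HEAD requests with an ordinary status code.

```python
import urllib.error
import urllib.request


def mirror_has_file(url, timeout=10):
    """Return True if an HTTP HEAD request for url answers with status 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # 404s, refused connections and timeouts all count as "not available".
        return False


def pick_mirror(mirror_bases, path, has_file=mirror_has_file):
    """Return the first URL whose mirror passes the check, or None."""
    for base in mirror_bases:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        if has_file(url):
            return url
    return None
```

The check is passed in as a parameter so that a caller (or a test) can substitute a cached or simulated check instead of hitting the network for every file.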

But also, I think that it's Eclipse fighting the way it does things. Having to download an SDK (which contains help, source for the entire platform, PDE stuff for people who don't even want PDE ...) Ideally, all you'd really need is to have the bootstrap RCP-type client, and then pull down the features that you were interested in.

The problem is, the build stuff generates one Uber-ZIP file, and you can't download individual plugins as they're updated. Even when you're downloading WTP, you have to download a fresh Eclipse install because the dependencies are screwed ...

The Callisto release will really help matters. Instead of having to download everything, you can pick what you want. Of course, it will also require that the individual projects make things easier to split apart (e.g. JDT without the PDE, or without help, etc.)

There's also the problem of people not being able to access FTP sites (and really, there's no point in FTP for downloads any more; HTTP has surpassed it in capability) which results more often than not (for me) in downloading it from Eclipse.org, simply because I know that's an HTTP site.

I put a proposal in to change the update manager to allow plugins to be downloaded individually, rather than as the Uber-zip files, but it's not going to be considered until Eclipse really changes direction in the 4.x releases at the earliest. I think that ultimately, viewing Eclipse as an OSGi-bundle updating platform is probably going to be the solution, and looking at an Eclipse repository of plugins rather than a place where ZIPs are downloaded is going to be the future.

But of course by then, more people will be using Eclipse so you'll still have the same level of traffic ;-)

You do a good job guys, keep it up.
Comment 3 Eclipse Webmaster CLA 2006-03-15 16:20:35 EST
(In reply to comment #2)
> There's also the problem of people not being able to access FTP sites (and
> really, there's no point in FTP for downloads any more; HTTP has surpassed it
> in capability) which results more often than not (for me) in downloading it from
> Eclipse.org, simply because I know that's an HTTP site.

Are you talking about FTP access from Update Manager? They have a way of filtering out FTP sites from the mirrorsURL. Have you opened a bug for this? If not, then I'll hack some rewrite rules that shift you to our own FTP server ... just for you  :)

D.
Comment 4 Alex Blewitt CLA 2006-03-15 18:05:58 EST
Re comment #3; Last time I mentioned it, you said you'd see what you could do ...

http://www.eclipsezone.com/forums/thread.jspa?messageID=91988404&#91988404

Raising a bug seems like a good first step :-)
Comment 5 Darrell Kundel CLA 2006-03-15 21:03:33 EST
how about an automated script for making bittorrent links for _all_ content available from eclipse - notably the webtools project & dependencies like emf?  oftentimes there's simply not a seed for the eclipse content that i want to download so i'm more dependent on fast mirrors.  bittorrent only works for popular files - but for those, like new releases, it is probably the best way.  since you're probably gonna get hammered when 3.2 comes out and also the rest of the 'train', it might be a good time to start planning...
Comment 6 David Williams CLA 2006-03-15 23:30:36 EST
Let's see ... this is a bug about expectations being wrong ... hmm, who's responsible for fixing that wetware?! :)

Just kidding ... I did want to voice my thanks for all the patient education taking place, such as https://dev.eclipse.org/committers/help/howdoi.php#downloads.put

I've googled some for "file mirror reliability" to educate myself more; does anyone know of especially good references to help "fix" our "expectations" -- or is it a hopeless task for us mere mortals? :)

While not exactly the same thing being discussed here, I find statistics such as following link very interesting -- it is instructive to see how much variety and variability there is.
http://ftp.iasi.roedu.net/mirrors-status/

Comment 7 Eclipse Webmaster CLA 2006-03-16 11:26:16 EST
(In reply to comment #5)
> bittorrent only works for
> popular files - but for those, like new releases, it is probably the best way. 

No it isn't.  The most popular files (during a new Eclipse release) are already available in BitTorrent, with a link displayed prominently on www.eclipse.org/downloads/ - and only a very small fraction of our users pick the torrents. Until a BitTorrent client is incorporated in Firefox/MSIE (without the need for users to download a plugin/application), I won't invest any time whatsoever in a permaseed solution for BitTorrent - it's just not the bandwidth savior BitTorrent fanboys make it out to be.

D.
Comment 8 Kim Moir CLA 2006-03-16 17:36:10 EST
Just wondering, does eclipse.org have any statistics on the reliability of its mirrors? If so, could we limit the list of mirrors that the Callisto update site refers to to the most reliable ones?
Comment 9 Eclipse Webmaster CLA 2006-03-16 19:56:28 EST
(In reply to comment #8)
> Just wondering, does eclipse.org have any statistics 

I haven't made any effort to measure reliability since Jan. 2005.  I've invested a lot of time and energy in our current mirror system, and I hesitate to invest even more for little yield.

I think a major issue is that mirrors sync the TIME file first (before gigabytes of new files, which could take hours) so they report themselves as being up-to-date without that being true.  I've added a 3-hr delay between the mirror's TIMEstamp and the file's timestamp to compensate for this.  Mirrors will take longer to show up on the list for new files, but the listed mirrors have a better chance of having the file.
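As a rough illustration of that 3-hour compensation (not the actual download.php logic, which is not shown in this bug), the listing rule could look like the sketch below. The function name and the way the two timestamps are obtained are assumptions.

```python
from datetime import datetime, timedelta

SYNC_GRACE = timedelta(hours=3)  # the 3-hour delay described above


def mirror_likely_has_file(mirror_time, file_time, grace=SYNC_GRACE):
    """True if the mirror's TIME stamp is at least `grace` newer than the file.

    mirror_time: timestamp the mirror reports via its TIME file
    file_time:   timestamp of the file on the master site
    """
    return mirror_time - file_time >= grace
```

A mirror whose TIME stamp is only an hour newer than the file is left off the list, on the assumption that it may still be copying the large files that follow the TIME file.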


BTW: download.php *is* open-source, so patches are welcome.

D.
Comment 10 Ed Burnette CLA 2006-03-16 19:59:54 EST
Mirrors are always going to be unreliable unless you pay for them, à la Akamai.
In the face of that reality, the software needs to be robust enough to tolerate
that with retries, resumes, checksums, etc.
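A minimal sketch of the retry-and-checksum part of that idea follows. It assumes a published SHA-256 digest for each file and a hypothetical `fetch` callable supplied by the client; resuming partial downloads is left out to keep the example short.

```python
import hashlib


def fetch_verified(mirrors, fetch, expected_sha256, attempts_per_mirror=2):
    """Try each mirror in turn; return the first payload whose SHA-256 matches.

    `fetch` is a hypothetical callable that takes a mirror name and returns
    bytes; it may raise OSError on a network failure.
    """
    for mirror in mirrors:
        for _ in range(attempts_per_mirror):
            try:
                data = fetch(mirror)
            except OSError:
                continue  # transient failure: retry the same mirror
            if hashlib.sha256(data).hexdigest() == expected_sha256:
                return data
            break  # corrupted copy: retrying this mirror is unlikely to help
    raise RuntimeError("no mirror delivered a valid copy")
```

Network errors are retried on the same mirror, while a checksum mismatch moves straight to the next one, since a corrupted file on a mirror usually stays corrupted until its next sync.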

Here's wishing that Google will use those tractor-trailer data centers for free
and reliable mirrors/caching someday.
Comment 11 Florian Priester CLA 2006-03-17 01:53:11 EST
Perhaps the expectation (if users of free open source software are allowed
to have any) is not so much that "mirrors should be 100% accurate".

Rather, it's that the Eclipse website should not present someone who's
trying to download a file with a list of mirrors which is -at least for
some types of files- rather regularly useless. Here's a typical example:

http://download.eclipse.org/eclipse/downloads/index.php
3.2 Stream Nightly Build
swt-N20060315-0010-win32-win32-x86.zip

Results:

Selected mirror (bold label "Download from:"): 404 Not Found

Mirror #1:  An error has occurred. [...] 404 File not found.
Mirror #2:  Sorry, there is a problem accessing this item. It may not exist.
Mirror #3:  Object not found! Error 404
Mirror #4:  404 Not Found
Mirror #5:  404 Not Found
Mirror #6:  Object not found! The requested URL was not found on this server.
Mirror #7:  530-Sorry, I'm now too busy. 530 Login incorrect.
Mirror #8:  404 Not Found
Mirror #9:  550 [...] No such file or directory.
Mirror #10: 550 [...] No such file or directory

Main Download Site (eclipse.org): Has the file.
Comment 12 Alex Blewitt CLA 2006-03-17 03:57:07 EST
Re: comment #11, I agree that this is the kind of thing that can annoy users. But I didn't think that many of the mirrors took nightly builds?  Of course, if they don't carry nightly builds they probably shouldn't be shown in the list :-)
Comment 13 Aaron Digulla CLA 2006-03-17 04:55:15 EST
The root of the problem is the mirror dialog in Eclipse.

Basically, the main site should know which mirror has which files (and can actually deliver them).

What I envision is something along these lines:

- User wants feature X
- Eclipse asks the main site for X
- Main site tells Eclipse which mirrors have X
- Eclipse chooses the best mirror based on the user prefs
- Eclipse downloads X
- Eclipse tells the main site about success/failure.
- If there was a failure, Eclipse will try with the next mirror
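
The steps above can be modelled in a few lines. This is only a sketch of the proposed flow: `MainSite`, `install_feature` and the `download` callable are hypothetical names, and no such registry service exists in the text of this bug.

```python
class MainSite:
    """Hypothetical registry that tracks which mirror claims which feature."""

    def __init__(self, holdings):
        self.holdings = holdings  # mirror name -> set of feature ids
        self.reports = []         # (mirror, feature, succeeded) tuples

    def mirrors_for(self, feature):
        return [m for m, features in self.holdings.items() if feature in features]

    def report(self, mirror, feature, succeeded):
        self.reports.append((mirror, feature, succeeded))


def install_feature(site, feature, download, preferred=()):
    """Walk the candidate mirrors, preferred ones first, reporting each outcome."""
    candidates = site.mirrors_for(feature)
    # Stable sort: preferred mirrors move to the front, the rest keep their order.
    candidates.sort(key=lambda m: m not in preferred)
    for mirror in candidates:
        succeeded = download(mirror, feature)
        site.report(mirror, feature, succeeded)
        if succeeded:
            return mirror
    return None
```

Because every attempt is reported back, the main site accumulates per-mirror success and failure data as a side effect of normal use.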

This would also allow to find any existing plugin, if we forced all plugin writers to register their work on the main site.

Anyone ever tried to install GMF? You just have to add five(!) non-standard update sites, install the various parts in the correct order (which you will get right after attempt 3) and when you finished this, you'll download and install the ZIPs because the update won't work.
Comment 14 Eclipse Webmaster CLA 2006-03-17 08:26:03 EST
(In reply to comment #11)

> http://download.eclipse.org/eclipse/downloads/index.php
> 3.2 Stream Nightly Build
> swt-N20060315-0010-win32-win32-x86.zip

This is a bug with download.php - we recently excluded Nightly builds from our RSYNC configuration to save disk space (and bandwidth) but the mirror list doesn't know about the excluded directories.  See bug 132324.
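
One way a mirror list could take excluded directories into account is sketched below. It is an assumption-laden illustration, not the fix made in bug 132324: the directory names are hypothetical, and plain directory-prefix matching stands in for rsync's real exclude pattern syntax.

```python
def is_mirrored(path, excluded_dirs):
    """True unless path lies inside one of the directories excluded from sync."""
    normalized = "/" + path.strip("/") + "/"
    return not any(
        normalized.startswith("/" + d.strip("/") + "/") for d in excluded_dirs
    )


def download_sources(path, mirrors, excluded_dirs, main_site="download.eclipse.org"):
    """Offer mirrors only for files that the sync configuration actually ships."""
    return list(mirrors) if is_mirrored(path, excluded_dirs) else [main_site]
```

For a file under an excluded directory, the list collapses to the main download site instead of showing mirrors that cannot have the file.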

D.
Comment 15 Eclipse Webmaster CLA 2006-03-17 08:27:15 EST
(In reply to comment #13)

Your comments are not unlike Ed's, comment 10. Thanks.

D.
Comment 16 David Williams CLA 2006-04-02 06:46:54 EDT
I just wanted to document here, for those tracking the general nature of mirrors, that some of the bugs, that gave rise to *this* bug, (such as bug 131031 and bug 131026) have turned out to be real bugs (not just some mysterious unreliability).

That is, with persistence, they were tracked down, fixed, and the "reliability" of the mirroring system was greatly improved (at least, for those parts I use :) 

So, I just wanted to remind readers that while some expectations may be too high in some cases, we should also not set our expectations too low ... and we should continue to find ways to improve the reliability and predictability of the mirror system. 

Here are a few concrete thoughts that I hope are constructive, if someone from the community has the time and energy to pursue them:

We should have the ability to check, occasionally, the contents of a mirror against its counterpart in the master copy -- in addition to just its general availability.
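
A content check of this kind could compare checksum manifests, as in the sketch below. The manifest format and function names are assumptions made for illustration; in practice the mirror's manifest would have to be built from files fetched from the mirror.

```python
import hashlib


def manifest(files):
    """Map each path to the SHA-256 hex digest of its content (bytes)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def compare_to_master(master, mirror):
    """Return (missing, corrupted) path lists, each sorted."""
    missing = sorted(p for p in master if p not in mirror)
    corrupted = sorted(p for p in master if p in mirror and mirror[p] != master[p])
    return missing, corrupted
```

Separating "missing" from "corrupted" matters because the first often just means the mirror has not finished syncing, while the second points at a real problem.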

Also, if someone sees a download web page that says some build is available, but when they drill down and try to download something it says "not found" or "access forbidden", a bug should be opened against that component's downloads web page, asking them to check that a resource is_readable before they display it, and also, before displaying that a "high level" directory is available, to check all the contents of that directory to be sure they are available too. I know we in WTP have fixed up a few places but, alas, not all yet.

So, to emphasize, I'm not saying mirrors should be expected to be 100%, but I think for a while there, they were a lot lower than that because of some of those bugs that have been fixed. Lastly, I'm just one voice, but I do think the reliability of the mirroring system should occasionally be measured (say, once a week?). Then, with those ongoing baseline measurements, some sudden change in the numbers might indicate a special problem that could be corrected earlier than it otherwise would have been.  Not to mention ... it would help us know what our expectations should be.
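
The baseline idea could be as simple as the following sketch. The weekly probe results, the 10-point tolerance and the function names are all assumptions chosen for illustration.

```python
from statistics import mean


def success_rate(results):
    """Fraction of successful probes; results is a non-empty list of booleans."""
    return sum(results) / len(results)


def sudden_drop(history, current, tolerance=0.10):
    """True if current falls more than `tolerance` below the mean of history."""
    return bool(history) and current < mean(history) - tolerance
```

With weekly rates of 0.90, 0.92 and 0.91 on record, a new measurement of 0.88 would pass quietly, while 0.60 would be flagged for a closer look.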


Comment 17 Aaron Digulla CLA 2006-04-02 15:32:37 EDT
What is the waterlevel for opening a bug? I tried to download 3.0M6 yesterday but the mirror only had the index page, yet.

Maybe it would be better if we dug out the idea to spread Eclipse as a small internet updater (like the internet installer of Mozilla) which then connects to the download sites and does its magic.

Then, mirrors could be updated. *After* the update, they would report their new status back ("I'm ready, now!") and that would solve many of these problems.
Comment 18 Ed Burnette CLA 2006-04-02 16:55:45 EDT
Here's an idea I'm trying to spread around: have Eclipse Foundation members, especially 'strategic' ones, provide reliable mirrors on their own servers.
Comment 19 Eclipse Webmaster CLA 2006-04-03 15:28:35 EDT
(In reply to comment #16)
> I just wanted to document here, for those tracking the general nature of
> mirrors, that some of the bugs, that gave rise to *this* bug, (such as bug
> 131031 and bug 131026) have turned out to be real bugs (not just some
> mysterious unreliability).

Bug 131026 was a bug with newly introduced functionality (rsync exclusions) and bug 131031 is not technically fixed - I have no way of making sure mirror admins configure their Apache server correctly.  Actually, 2 of the 3 bad ones were IBM internal mirrors which I cannot access to see if they're "ok".

In my mind, the Eclipse mirroring is no more robust than it was in January: any mirror can be broken at any point in time.

Actually, what gave rise to this bug was the expectation that the mirroring system should be able to compensate for shortcomings in Eclipse's Update Manager.  Frankly, compared to most other OSS outfits, our mirror system totally rocks.



> reliability of the mirroring system should occasionally be measured (say, once
> a week?). 

Agreed. But how do I check IBM's fullmoons (and other internal-only mirrors)?  Because if those stop working, or behave erratically as per Bug 131031 comment 6, you know who'll be on the receiving end of a bug  ;)

If we can solve that, and if someone writes said script to "measure" mirrors, I'm open to running it.  Keep in mind that we currently poll each public mirror every hour and remove dead mirrors automatically.
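
For readers unfamiliar with this kind of polling, one pass of it might look like the sketch below. This is not the Foundation's actual script: the consecutive-failure threshold, the `is_alive` callable and the function name are assumptions.

```python
def poll_once(mirrors, is_alive, failures, max_failures=3):
    """Run one polling pass and return the mirrors that stay on the list.

    failures maps mirror -> consecutive failed polls and is updated in place.
    """
    kept = []
    for mirror in mirrors:
        if is_alive(mirror):
            failures[mirror] = 0
        else:
            failures[mirror] = failures.get(mirror, 0) + 1
        if failures[mirror] < max_failures:
            kept.append(mirror)
    return kept
```

Requiring several consecutive failures before removal is a design choice made here so that a single missed poll does not drop an otherwise healthy mirror.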

(In reply to comment #17)
> What is the waterlevel for opening a bug? I tried to download 3.0M6 yesterday
> but the mirror only had the index page, yet.

If you're accessing web pages on mirror sites, then you should contact the mirror site admin. Although, because Eclipse builds are about 2-3 Gigabytes, it does take a while to fetch the entire file set.
 

(In reply to comment #18)
> Here's an idea I'm trying to spread around: have Eclipse Foundation members,
> especially 'strategic' ones, provide reliable mirrors on their own servers.

I have +1 and -1, -1 and -1:
+1: Sounds like a good idea
-1: Solves nothing: bug 131031 states two mirrors by IBM and both were misconfigured (by no fault of the admin)
-1: The Foundation members already pay for all the IT infrastructure. Are we to expect them to pay for bandwidth that we can otherwise get for free? Perhaps I'm naive in being grateful for all they pay, plus for giving us a rackfull of kick-ass servers for FREE.
-1: Some unrelated companies appreciate OSS and like to Give Back to The Community, and providing a mirror is an easy (and effective) way for them to do so. At least that's the message I got from this page: http://mirrors.playboy.com/

D.
Comment 20 Alex Blewitt CLA 2006-04-03 15:47:03 EDT
For the record, I go out of my way *not* to use the mirrors any more. I can't connect to FTP servers, and a number of the mirror links point to FTP URLs instead of HTTP URLs. The only one I know for sure that supports HTTP is the Eclipse.org one.

I can't even right-click to look at the file links, because every link is a link to an HTML page with an HTTP-redirect to allow a file to be downloaded. So, I just assume that everything else except from the mirror pages is FTP, and the only one that supports HTTP is the main mirror one.

This applies both to the Update Manager and the downloads from http://downloads.eclipse.org.

Apache's mirroring system is in fact much better than Eclipse's, because they actually render the page with links direct to a specific mirror. If I don't like the mirror, I can change it, but all of the files actually have links that I can right-click on and do 'Save As' instead of having to spawn multiple browser windows that I need to close almost immediately afterwards.

It's good to ask for feedback, and it's good that you're getting it. 

Re: finding out if a mirror is 'live'; you can have scripts that poll for known files and check the header codes via an HTTP HEAD request. If you get a 404, don't prompt that in the list. If you get a 200, then the file exists on the mirror server. Doesn't seem that difficult to achieve.

Of course, I've mentioned this in passing before, and each time it goes unnoticed :-)
Comment 21 Kim Moir CLA 2006-04-03 16:32:12 EDT
Denis, I don't think the expectation is that you should measure the performance of internal mirrors such as those at IBM. If you have an issue with an IBM mirror, let us know and we will have it fixed. We don't expect you to fix issues with our servers being misconfigured, although it may take some time to determine where the problem lies :-) (see bug 131031).

The reason that IBM has internal mirrors is that a long time ago, when eclipse.org was young and didn't have many external mirrors, IBMers downloading Eclipse would kill the servers with the increase in HTTP sessions and bandwidth utilization.  Strategic developers may contribute $, servers and developers, but they also consume a lot of code from the various eclipse.org projects. If they have the resources, it's another way they can give back to our community.

As for other OSS contributors, bug 129944 has already been closed :-). Why aren't there large US universities hosting eclipse as they do for fedora?  They have lots of bandwidth. Have they been asked to become mirrors?

http://fedora.redhat.com/Download/mirrors.html

Comment 22 Eclipse Webmaster CLA 2006-04-03 16:44:57 EDT
(In reply to comment #20)
> For the record, I go out of my way *not* to use the mirrors any more.

That's too bad - you lose your right to complain when Eclipse.org is slow  ;)


> Apache's mirroring system is in fact much better than Eclipse's, because they
> actually render the page with links direct to a specific mirror. If I don't
> like the mirror, I can change it, but all of the files actually have links that
> I can right-click on and do 'Save As' instead of having to spawn multiple
> browser windows that I need to close almost immediately afterwards.

Our projects (and the Board) asked for download statistics, so I couldn't do what Apache does without messing with the URL. Besides, it's all subjective.  Personally, I don't like the way Apache does it, and that's why I borrowed ideas from MySQL  ;)


> Re: finding out if a mirror is 'live'; you can have scripts that poll for known
> files and check the header codes via an HTTP HEAD request. If you get a 404,
> don't prompt that in the list. If you get a 200, then the file exists on the
> mirror server. Doesn't seem that difficult to achieve.

There are about 25,000 files on download.eclipse.org, and it's perfectly acceptable that not all mirrors mirror all the files (see http://www.eclipse.org/downloads/mir_request.php).  What you asked is not difficult at all, we just don't have a) the time to code this and b) anything even close to a requirement to do this.

If it doesn't seem difficult to achieve, then by all means, feel free to contribute some code. I'll even put your picture on the righthand side!


> Of course, I've mentioned this in passing before, and each time it goes
> unnoticed :-)

I've asked you to open a bug so we can get comments and votes, and you never have.  See this page that describes how best to get something done by the webmaster:
http://wiki.eclipse.org/index.php/Webmaster_FAQ#I_asked_the_webmaster_to_do_something.__When_will_it_get_done.3F


(In reply to comment #21)
> Why aren't there large US universities hosting eclipse as they do for fedora?  

We occasionally go mirror shopping, but I can't seem to get replies from large Universities with gigabits of bandwidth.  Matt will be going on another mirror shopping spree for Callisto.


Bah, I'm closing this as INVALID.  No matter what I do, I suck, so I'll just come to terms with it  ;)

D.
Comment 23 Alex Blewitt CLA 2006-04-03 17:57:19 EDT
Re: comment 22: actually, you said "I'll see what I can do"

http://www.eclipsezone.com/eclipse/forums/t63246.html#91989036

I did ask whether you filed it as a bug, but you never responded to that. More than happy to help out and file a bug, but you didn't leave the ball in my court with the bug request; you dropped it.
Comment 24 Alex Blewitt CLA 2006-04-03 18:01:22 EDT
Bug 134630 raised to ensure the ball doesn't get dropped again.
Comment 25 Alex Blewitt CLA 2006-04-03 18:13:05 EDT
Bug 134634 added to request that links are direct links to downloadable files, not indirection through HTML pages with http-equiv hacks.