I've seen a few bugs open about mirrors - some don't have file X, some return 404s, some have corrupted files. I feel that the expectations are set too high for mirror sites. Mirrors are free hardware, free bandwidth and free sysadmin resources that are generally reliable, but not 100% - and this is true for any project, be it Eclipse, Apache or anything. If the expectation is that mirrors should be 100% accurate, then public, non-Foundation-hosted mirrors should be replaced with an elaborate, Foundation-hosted and Foundation-funded global mirroring system. The cost of this is astronomical, as mirror sites currently handle 93% of the zip/gz downloads. Otherwise, it should be generally acknowledged by the community as a whole that mirrors are what they are - gifts of bandwidth, servers and human resources by people and organizations we don't know, and that their unpredictable nature should be expected by all who use them - be it a human via a web page, or a piece of software. D.
I agree with you, Denis. Expectations need to be set to keep costs in check. I think the root of the problem with the latest episode of Callisto vs. the Mirror Monsters was that the update manager responded poorly in the face of broken mirrors. (I'll stop the puns right here). The user experience could reflect poorly on the quality of Eclipse, which is a cause for concern. I also think we dove down the update site route for Callisto without thinking very far ahead about the effects of a large update site with many files on the mirror system, although we had suspicions. But sometimes it's a better lesson to try and fail. We probably need to go back to the table and rethink this thing a little.
A working mirror is great. A broken mirror is bad; 7 years bad luck, anyone? What about pinging the mirror sites daily to find out if they've got the files hosted? It only need be a HEAD command. Or even update the update manager, so that when trying to download a file from a mirror, it does a HEAD check first, and if that fails, falls back to another mirror. But also, I think that it's Eclipse fighting the way it does things. Having to download an SDK (which contains help, source for the entire platform, PDE stuff for people who don't even want PDE ...) Ideally, all you'd really need is to have the bootstrap RCP-type client, and then pull down the features that you were interested in. The problem is, the build stuff generates one Uber-ZIP file, and you can't download individual plugins as they're updated. Even when you're downloading WTP, you have to download a fresh Eclipse install because the dependencies are screwed ... The Callisto release will really help matters. Instead of having to download everything, you can pick what you want. Of course, it will also require that the individual projects make things easier to split apart (e.g. JDT without the PDE, not without help etc.) There's also the problem of people not being able to access FTP sites (and really, there's no point in FTP for downloads any more; HTTP has surpassed it in capability) which results more often than not (for me) in downloading it from Eclipse.org, simply because I know that's an HTTP site. I put a proposal in to change the update manager to allow plugins to be d/l individually, rather than the Uber-zip files, but it's not going to be considered until Eclipse really changes direction in the 4.x releases at the earliest. I think that ultimately, viewing Eclipse as an OSGi-bundle updating platform is probably going to be the solution, and looking at an Eclipse repository of plugins rather than a place where ZIPs are downloaded is going to be the future. 
But of course by then, more people will be using Eclipse so you'll still have the same level of traffic ;-) You do a good job guys, keep it up.
(In reply to comment #2)
> There's also the problem of people not being able to access FTP sites (and
> really, there's no point in FTP for downloads any more; HTTP has surpassed it
> in capability) which results more often than not (for me) in downloading it from
> Eclipse.org, simply because I know that's an HTTP site.

Are you talking about FTP access from Update Manager? They have a way of filtering out FTP sites from the mirrorsURL. Have you opened a bug for this? If not, then I'll hack some rewrite rules that shift you to our own FTP server ... just for you :)

D.
Re comment #3; Last time I mentioned it, you said you'd see what you could do ... http://www.eclipsezone.com/forums/thread.jspa?messageID=91988404 Raising a bug seems like a good first step :-)
how about an automated script for making bittorrent links for _all_ content available from eclipse - notably the webtools project & dependences like emf? often times there's simply not a seed for the eclipse content that i want to download so i'm more dependent on fast mirrors. bittorrent only works for popular files - but for those, like new releases, it is probably the best way. since you're probably gonna get hammered when 3.2 comes out and also the rest of the 'train', it might be a good time to start planning...
Let's see ... this is a bug about expectations being wrong ... hmm, who's responsible for fixing that wetware?! :) Just kidding ... I did want to voice my thanks for all the patient education taking place, such as https://dev.eclipse.org/committers/help/howdoi.php#downloads.put I've googled some for "file mirror reliability" to educate myself more; anyone know of especially good references to help "fix" our "expectations" -- or is it a hopeless task for us mere mortals? :) While not exactly the same thing being discussed here, I find statistics such as the following link very interesting -- it is instructive to see how much variety and variability there is. http://ftp.iasi.roedu.net/mirrors-status/
(In reply to comment #5)
> bittorrent only works for
> popular files - but for those, like new releases, it is probably the best way.

No it isn't. The most popular files (during a new Eclipse release) are already available in BitTorrent, with a link displayed prominently on www.eclipse.org/downloads/ - and only a very small speck of our users pick the torrents. Until a bittorrent client is incorporated in FireFox/MSIE (without the need for users to download a plugin/application), I won't invest any time whatsoever in a permaseed solution for bittorrent - it's just not the bandwidth savior BitTorrent fanboys make it out to be.

D.
Just wondering, does eclipse.org have any statistics on the reliability of its mirrors? If so, could we limit the list of mirrors that the Callisto update site refers to, keeping only the most reliable ones?
(In reply to comment #8)
> Just wondering, does eclipse.org have any statistics

I haven't made any effort in measuring reliability after Jan. 2005. I've invested a lot of time and energy in our current mirror system, and I hesitate to invest even more for little yield. I think a major issue is that mirrors sync the TIME file first (before gigabytes of new files, which could take hours) so they report themselves as being up-to-date without that being true. I've added a 3-hr delay between the mirror's TIMEstamp and the file's timestamp to compensate for this. Mirrors will take longer to show up on the list for new files, but the listed mirrors have a better chance of having the file. BTW: download.php *is* open-source, so patches are welcome.

D.
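For readers following along, here is a minimal Python sketch of the 3-hour delay rule described in the comment above. It is not taken from download.php; the function name `mirror_is_eligible` and its parameters are hypothetical, and it assumes the rule means "list a mirror for a file only once the mirror's TIME stamp is at least three hours newer than the file".

```python
from datetime import datetime, timedelta

# Assumed grace period, based on the 3-hour figure mentioned in the comment.
SYNC_GRACE = timedelta(hours=3)


def mirror_is_eligible(mirror_time_stamp: datetime,
                       file_timestamp: datetime,
                       grace: timedelta = SYNC_GRACE) -> bool:
    """Return True only if the mirror's TIME stamp is at least `grace`
    newer than the file, so a sync that has only just started does not
    count as having the file yet."""
    return mirror_time_stamp - file_timestamp >= grace
```

The trade-off is the one stated in the comment: new files show up on fewer mirrors at first, but the mirrors that are listed are more likely to actually hold the file.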
Mirrors are always going to be unreliable unless you pay for them, à la Akamai. In the face of that reality, the software needs to be robust enough to tolerate that with retries, resumes, checksums, etc. Here's wishing that Google will use those tractor-trailer data centers for free and reliable mirrors/caching someday.
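A rough Python sketch of what "retries plus checksums" could look like on the client side. Everything here is an assumption for illustration: the function name `fetch_verified`, the injected `fetch` callable, and the use of SHA-256 are hypothetical and are not how Update Manager works.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_verified(mirrors: Iterable[str],
                   fetch: Callable[[str], bytes],
                   expected_sha256: str,
                   attempts_per_mirror: int = 2) -> Optional[bytes]:
    """Try each mirror in order and return the first payload whose
    SHA-256 digest matches `expected_sha256`, or None if none does."""
    for mirror in mirrors:
        for _ in range(attempts_per_mirror):
            try:
                data = fetch(mirror)
            except OSError:
                continue  # transient network failure: retry the same mirror
            if hashlib.sha256(data).hexdigest() == expected_sha256:
                return data
            break  # corrupted copy: retrying this mirror will not help
    return None
```

Passing `fetch` in as a parameter keeps the retry and verification logic separate from the transport, so the same loop could wrap HTTP, FTP, or a resumable downloader.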
Perhaps the expectation (if users of free open source software are allowed to have any) is not so much that "mirrors should be 100% accurate". Rather, it's that the Eclipse website should not present someone who's trying to download a file with a list of mirrors which is -at least for some types of files- rather regularly useless.

Here's a typical example:

http://download.eclipse.org/eclipse/downloads/index.php
3.2 Stream Nightly Build
swt-N20060315-0010-win32-win32-x86.zip

Results:
Selected mirror (bold label "Download from:"): 404 Not Found
Mirror #1: An error has occurred. [...] 404 File not found.
Mirror #2: Sorry, there is a problem accessing this item. It may not exist.
Mirror #3: Object not found! Error 404
Mirror #4: 404 Not Found
Mirror #5: 404 Not Found
Mirror #6: Object not found! The requested URL was not found on this server.
Mirror #7: 530-Sorry, I'm now too busy. 530 Login incorrect.
Mirror #8: 404 Not Found
Mirror #9: 550 [...] No such file or directory.
Mirror #10: 550 [...] No such file or directory
Main Download Site (eclipse.org): Has the file.
Re: comment #11, I agree that this is the kind of thing that can annoy users. But I didn't think that many of the mirrors took nightly builds? Of course, if they don't carry nightly builds they probably shouldn't be shown in the list :-)
The root of the problem is the mirror dialog in Eclipse. Basically, the main site should know which mirror has which files (and can actually deliver them). What I envision is something along these lines:

- User wants feature X
- Eclipse asks the main site for X
- Main site tells Eclipse which mirrors have X
- Eclipse chooses the best mirror based on the user prefs
- Eclipse downloads X
- Eclipse tells the main site about success/failure.
- If there was a failure, Eclipse will try with the next mirror

This would also make it possible to find any existing plugin, if we forced all plugin writers to register their work on the main site. Anyone ever tried to install GMF? You just have to add five(!) non-standard update sites, install the various parts in the correct order (which you will get right after attempt 3) and when you've finished this, you'll download and install the ZIPs because the update won't work.
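The request/report flow listed in the comment above could be sketched in Python roughly as follows. This is only an illustration: `MainSite`, `install`, and the `download` and `preference` callables are hypothetical names, and dropping a mirror from the catalog after a single failure report is an assumption, not something the comment specifies.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


class MainSite:
    """Hypothetical main-site index of which mirrors carry which feature."""

    def __init__(self, catalog: Dict[str, Set[str]]):
        self.catalog = catalog                          # feature -> mirrors
        self.reports: List[Tuple[str, str, bool]] = []  # client feedback

    def mirrors_for(self, feature: str) -> List[str]:
        return sorted(self.catalog.get(feature, set()))

    def report(self, mirror: str, feature: str, ok: bool) -> None:
        self.reports.append((mirror, feature, ok))
        if not ok:
            # Assumption: stop advertising a mirror that failed to deliver.
            self.catalog.get(feature, set()).discard(mirror)


def install(feature: str,
            site: MainSite,
            download: Callable[[str, str], bool],
            preference: Callable[[str], object]) -> Optional[str]:
    """Ask the site for mirrors, try them in the user's preferred order,
    report every outcome, and return the mirror that worked (or None)."""
    for mirror in sorted(site.mirrors_for(feature), key=preference):
        ok = download(mirror, feature)
        site.report(mirror, feature, ok)
        if ok:
            return mirror
    return None
```

Because every attempt is reported back, the main site's view of each mirror improves as clients use it, which is the point of the last two steps in the list.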
(In reply to comment #11)
> http://download.eclipse.org/eclipse/downloads/index.php
> 3.2 Stream Nightly Build
> swt-N20060315-0010-win32-win32-x86.zip

This is a bug with download.php - we recently excluded Nightly builds from our RSYNC configuration to save disk space (and bandwidth) but the mirror list doesn't know about the excluded directories. See bug 132324.

D.
(In reply to comment #13) Your comments are not unlike Ed's, comment 10. Thanks. D.
I just wanted to document here, for those tracking the general nature of mirrors, that some of the bugs that gave rise to *this* bug (such as bug 131031 and bug 131026) have turned out to be real bugs (not just some mysterious unreliability). That is, with persistence, they were tracked down, fixed, and the "reliability" of the mirroring system was greatly improved (at least, for those parts I use :) So, I just wanted to remind readers that while some expectations may be too high in some cases, we should also not set our expectations too low ... and we should continue to find ways to improve the reliability and predictability of the mirror system. Here are a few concrete thoughts, that I hope are constructive -- if someone from the community had time and effort to pursue: We should have the ability to check, occasionally, the contents of a mirror against its counterpart in the master copy -- in addition to just its general availability. Also, if someone sees a download web page that says some build is available, but then they drill down and try to download something, and it then says "not found" or "access forbidden", a bug should be opened against that component's downloads web page, asking them to check if a resource is_readable before they display it, and also, before displaying that a "high level" directory is available, that they check all the contents of that directory to be sure its contents are available too. I know we in WTP have fixed up a few places, but, alas, not all yet. So, to emphasize, I'm not saying mirrors should be expected to be 100%, but I think for a while there, they were a lot lower than that because of some of those bugs that have been fixed. Lastly, I'm just one voice, but I do think the reliability of the mirroring system should occasionally be measured (say, once a week?). Then, with those ongoing baseline measurements, some sudden change in the numbers might indicate a special problem that could be corrected earlier than it otherwise would have been. Not to mention ... it would help us know what our expectations should be.
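As a small illustration of the baseline idea in the comment above, a weekly measurement could be compared against earlier weeks like this. The function name `sudden_drop`, the success-rate representation (a fraction between 0 and 1), and the 10-point tolerance are all assumptions chosen for the sketch.

```python
from statistics import mean
from typing import Sequence


def sudden_drop(history: Sequence[float],
                latest: float,
                tolerance: float = 0.10) -> bool:
    """Return True if the latest weekly success rate falls more than
    `tolerance` below the mean of the earlier measurements."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return mean(history) - latest > tolerance
```

A check like this only flags that something changed; finding out which mirror or which directory is responsible would still need the per-file comparison suggested in the comment.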
What is the waterlevel for opening a bug? I tried to download 3.0M6 yesterday but the mirror only had the index page so far. Maybe it would be better if we dug out the idea to spread Eclipse as a small internet updater (like the internet installer of Mozilla) which then connects to the download sites and does its magic. Then, mirrors could be updated. *After* the update, they would report their new status back ("I'm ready, now!") and that would solve many of these problems.
Here's an idea I'm trying to spread around: have Eclipse Foundation members, especially 'strategic' ones, provide reliable mirrors on their own servers.
(In reply to comment #16)
> I just wanted to document here, for those tracking the general nature of
> mirrors, that some of the bugs, that gave rise to *this* bug, (such as bug
> 131031 and bug 131026) have turned out to be real bugs (not just some
> mysterious unreliability).

Bug 131026 was a bug with newly introduced functionality (rsync exclusions) and bug 131031 is not technically fixed - I have no way of making sure mirror admins configure their Apache server correctly. Actually, 2 of the 3 bad ones were IBM internal mirrors which I cannot access to see if they're "ok". In my mind, the Eclipse mirroring is no more robust than it was in January: any mirror can be broken at any point in time. Actually, what gave rise to this bug was the expectation that the mirroring system should be able to compensate for shortcomings in Eclipse's Update Manager. Frankly, compared to most other OSS outfits, our mirror system totally rocks.

> reliability of the mirroring system should occasionally be measured (say, once
> a week?).

Agreed. But how do I check IBM's fullmoons (and other internal-only mirrors)? Because if those stop working, or behave erratically as per Bug 131031 comment 6, you know who'll be on the receiving end of a bug ;) If we can solve that, and if someone writes said script to "measure" mirrors, I'm open to running it. Keep in mind that we currently poll each public mirror every hour and remove dead mirrors automatically.

(In reply to comment #17)
> What is the waterlevel for opening a bug? I tried to download 3.0M6 yesterday
> but the mirror only had the index page, yet.

If you're accessing web pages on mirror sites, then you should contact the mirror site admin. Although, because Eclipse builds are about 2-3 Gigabytes, it does take a while to fetch the entire file set.

(In reply to comment #18)
> Here's an idea I'm trying to spread around: have Eclipse Foundation members,
> especially 'strategic' ones, provide reliable mirrors on their own servers.

I have +1 and -1, -1 and -1:

+1: Sounds like a good idea
-1: Solves nothing: bug 131031 states two mirrors by IBM and both were misconfigured (by no fault of the admin)
-1: The Foundation members already pay for all the IT infrastructure. Are we to expect them to pay for bandwidth that we can otherwise get for free? Perhaps I'm naive in being grateful for all they pay, plus for giving us a rackfull of kick-ass servers for FREE.
-1: Some unrelated companies appreciate OSS and like to Give Back to The Community, and providing a mirror is an easy (and effective) way for them to do so. At least that's the message I got from this page: http://mirrors.playboy.com/

D.
For the record, I go out of my way *not* to use the mirrors any more. I can't connect to FTP servers, and a number of the mirror sites are to FTP URLs instead of HTTP URLs. The only one I know for sure that supports HTTP is the Eclipse.org one. I can't even right-click to look at the file links, because every link is a link to an HTML page with an HTTP-redirect to allow a file to be downloaded. So, I just assume that everything else except from the mirror pages is FTP, and the only one that supports HTTP is the main mirror one. This applies both to the Update Manager and the downloads from http://downloads.eclipse.org. Apache's mirroring system is in fact much better than Eclipse's, because they actually render the page with links direct to a specific mirror. If I don't like the mirror, I can change it, but all of the files actually have links that I can right-click on and do 'Save As' instead of having to spawn multiple browser windows that I need to close almost immediately afterwards. It's good to ask for feedback, and it's good that you're getting it. Re: finding out if a mirror is 'live'; you can have scripts that poll for known files and check the header codes via an HTTP HEAD request. If you get a 404, don't prompt that in the list. If you get a 200, then the file exists on the mirror server. Doesn't seem that difficult to achieve. Of course, I've mentioned this in passing before, and each time it goes unnoticed :-)
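The HEAD-based liveness check suggested in the comment above might look something like this in Python. It is a sketch only: the helper names `head_status` and `mirrors_with_file` are hypothetical, the mirror base URLs are supplied by the caller, and mirrors that only speak FTP are not handled.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def head_status(url: str, timeout: float = 10.0) -> int:
    """Issue an HTTP HEAD request and return the status code,
    or 0 if the server could not be reached at all."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # e.g. 404 when the file is missing
    except OSError:
        return 0          # DNS failure, refused connection, timeout


def mirrors_with_file(bases: Iterable[str],
                      path: str,
                      status_of: Callable[[str], int] = head_status) -> List[str]:
    """Keep only the mirror base URLs that answer 200 for `path`."""
    return [base for base in bases
            if status_of(base.rstrip("/") + "/" + path.lstrip("/")) == 200]
```

Since a HEAD request transfers no file body, polling a handful of known files per mirror stays cheap, although checking every file on every mirror would still add up.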
Denis, I don't think the expectation is that you should measure the performance of internal mirrors such as those at IBM. If you have an issue with an IBM mirror, let us know and we will have it fixed. We don't expect you to fix issues with our servers being misconfigured, although it may take some time to determine where the problem lies :-) (bug 131031). The reason that IBM has internal mirrors is that a long time ago, when eclipse.org was young and didn't have many external mirrors, IBMers downloading eclipse would kill the servers with the increase in http sessions and bandwidth utilization. Strategic developers may contribute $, servers and developers, but they also consume a lot of code from the various eclipse.org projects. If they have the resources, it's another way they can give back to our community. As for other OSS contributors, bug 129944 has already been closed :-). Why aren't there large US universities hosting eclipse as they do for fedora? They have lots of bandwidth. Have they been asked to become mirrors? http://fedora.redhat.com/Download/mirrors.html
(In reply to comment #20)
> For the record, I go out of my way *not* to use the mirrors any more.

That's too bad - you lose your right to complain when Eclipse.org is slow ;)

> Apache's mirroring system is in fact much better than Eclipse's, because they
> actually render the page with links direct to a specific mirror. If I don't
> like the mirror, I can change it, but all of the files actually have links that
> I can right-click on and do 'Save As' instead of having to spawn multiple
> browser windows that I need to close almost immediately afterwards.

Our projects (and the Board) asked for download statistics, so I couldn't do what Apache does without messing with the URL. Besides, it's all subjective. Personally, I don't like the way Apache does it, and that's why I borrowed ideas from MySQL ;)

> Re: finding out if a mirror is 'live'; you can have scripts that poll for known
> files and check the header codes via an HTTP HEAD request. If you get a 404,
> don't prompt that in the list. If you get a 200, then the file exists on the
> mirror server. Doesn't seem that difficult to achieve.

There are about 25,000 files on download.eclipse.org, and it's perfectly acceptable that not all mirrors mirror all the files (see http://www.eclipse.org/downloads/mir_request.php). What you asked is not difficult at all, we just don't have a) the time to code this and b) anything even close to a requirement to do this. If it doesn't seem difficult to achieve, then by all means, feel free to contribute some code. I'll even put your picture on the righthand side!

> Of course, I've mentioned this in passing before, and each time it goes
> unnoticed :-)

I've asked you to open a bug so we can get comments and votes, and you never have. See this page that describes how best to get something done by the webmaster: http://wiki.eclipse.org/index.php/Webmaster_FAQ#I_asked_the_webmaster_to_do_something.__When_will_it_get_done.3F

(In reply to comment #21)
> Why aren't there large US universities hosting eclipse as they do for fedora?

We occasionally go mirror shopping, but I can't seem to get replies from large Universities with gigabits of bandwidth. Matt will be going on another mirror shopping spree for Callisto.

Bah, I'm closing this as INVALID. No matter what I do, I suck, so I'll just come to terms with it ;)

D.
Re: comment 22: actually, you said "I'll see what I can do" http://www.eclipsezone.com/eclipse/forums/t63246.html#91989036 I did ask whether you filed it as a bug, but you never responded to that. More than happy to help out and file a bug, but you didn't leave the ball in my court with the bug request; you dropped it.
Bug 134630 raised to ensure the ball doesn't get dropped again.
Bug 134634 added to request that links are direct links to downloadable files, not indirection through HTML pages with http-equiv hacks.