Bug 297805 - Mirror ranking needs to be improved
Summary: Mirror ranking needs to be improved
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2
Version: 3.6
Hardware: All All
Importance: P3 normal
Target Milestone: 3.6
Assignee: Thomas Hallgren CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-12-15 02:54 EST by Thomas Hallgren CLA
Modified: 2011-06-16 10:34 EDT
CC List: 8 users

See Also:


Attachments
Patch to improve mirror selection (10.75 KB, patch)
2010-02-03 13:32 EST, Thomas Hallgren CLA

Description Thomas Hallgren CLA 2009-12-15 02:54:55 EST
Mirrors may have temporary and fairly short outages. They may be incomplete in some respect, or just be under very heavy load for a short period of time. In some cases an artifact may be physically missing even though it is listed in artifacts.xml (which is then the case on all mirrors).

Any of those failures increases the failure count for the mirror in question, and that is close to fatal, since the failure count takes precedence over both download speed and geographic location. Once the count has been increased, the mirror is deemed bad for the remainder of the download (which can span thousands of artifacts).
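To make the precedence problem concrete, here is a minimal Java sketch of a strictly lexicographic ranking of the kind described above. The `Mirror` record, its fields, and the choice to compare speed before `initialRank` are hypothetical and not taken from the p2 sources; the sketch only shows that a single extra failure outranks any speed or proximity advantage.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LexicographicRanking {
    // Failure count is compared first; speed and geographic rank only break ties.
    // Putting speed before initialRank is an assumption made for this sketch.
    static final Comparator<Mirror> ORDER =
            Comparator.comparingInt(Mirror::failureCount)
                    .thenComparing(Comparator.comparingLong(Mirror::bytesPerSecond).reversed())
                    .thenComparingInt(Mirror::initialRank);

    // Returns a new list, best mirror first.
    static List<Mirror> rank(List<Mirror> mirrors) {
        List<Mirror> sorted = new ArrayList<>(mirrors);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        Mirror nearButFailedOnce = new Mirror("http://near.example", 1, 5_000_000, 1);
        Mirror farButClean = new Mirror("http://far.example", 0, 50_000, 40);
        // A single failure pushes the 100x faster, closer mirror behind the slow one.
        System.out.println(rank(List.of(nearButFailedOnce, farButClean)).get(0).url());
    }
}

// Hypothetical snapshot of one mirror's statistics; names are illustrative, not p2's.
record Mirror(String url, int failureCount, long bytesPerSecond, int initialRank) {}
```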

I think the algorithm could be improved by periodically retrying mirrors whose initialRank value indicates that they are geographically close. I also think that we should have a fixed ratio between a high transfer rate and the failure count. Let's say that a 5 times higher transfer rate is worth one failure. Perhaps a successful transfer should reset the failure count, or at least cut it in half, so that failures are forgiven by subsequent good behavior.
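One way to express the proposed exchange rate is to give each mirror a score on a logarithmic scale, so that one failure costs exactly a factor of 5 in transfer rate, and to halve the failure count on each successful transfer. The following is a minimal sketch under those assumptions; `MirrorStats`, `score()`, and `recordSuccess()` are hypothetical names, and the periodic retry of geographically close mirrors is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical mutable mirror statistics; not the p2 MirrorSelector API.
final class MirrorStats {
    // Exchange rate suggested above: a 5x higher transfer rate offsets one failure.
    static final double SPEED_PER_FAILURE = 5.0;

    // Higher score ranks first.
    static final Comparator<MirrorStats> BEST_FIRST =
            Comparator.comparingDouble(MirrorStats::score).reversed();

    final String url;
    long bytesPerSecond;
    int failureCount;

    MirrorStats(String url, long bytesPerSecond, int failureCount) {
        this.url = url;
        this.bytesPerSecond = bytesPerSecond;
        this.failureCount = failureCount;
    }

    // score = log_5(speed) - failures, so one extra failure costs exactly a factor of 5 in speed.
    double score() {
        return Math.log(Math.max(bytesPerSecond, 1)) / Math.log(SPEED_PER_FAILURE) - failureCount;
    }

    void recordFailure() {
        failureCount++;
    }

    // The "cut it in half" variant: each successful transfer halves the failure count.
    void recordSuccess() {
        failureCount /= 2;
    }

    public static void main(String[] args) {
        MirrorStats near = new MirrorStats("http://near.example", 10_000_000, 1);
        MirrorStats far = new MirrorStats("http://far.example", 100_000, 0);
        List<MirrorStats> mirrors = new ArrayList<>(List.of(near, far));
        mirrors.sort(BEST_FIRST);
        // Being 100x faster outweighs one failure, so the near mirror stays on top.
        System.out.println(mirrors.get(0).url);
    }
}
```

With integer halving, a single failure is forgiven by the next successful transfer, while a mirror with 3 failures needs two successes to get back to 0.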

The fact that a mirror gets punished for problems that exist on all mirrors (the out-of-sync artifacts) is exceptionally bad, since it hits the top-ranked mirrors first and thus puts them out of commission. I think that is what's causing bug 297408.
Comment 1 Thomas Hallgren CLA 2009-12-15 02:57:46 EST
The text "bug 297408" was supposed to show up as a link. Perhaps Bugzilla fails to recognize it when it is split by a line break.
Comment 2 Pascal Rapicault CLA 2009-12-31 05:08:10 EST
If I remember correctly, the MayInstall project had some interesting ranking (and even abort-and-restart) strategies. We may be able to get some code from there. Jed, could you point us to the code if it is publicly available?
Thomas, Henrik, do you want to take ownership of this bug?
Comment 3 Thomas Hallgren CLA 2010-01-27 09:53:37 EST
I'll spend some time on this now.
Comment 4 Thomas Hallgren CLA 2010-02-03 13:32:07 EST
Created attachment 158083
Patch to improve mirror selection

This patch changes the mirror selection in the following ways.

1. A timer is registered that decrements the failure count after some period of time. The delay depends on the number of failures: the first failure is reset after 30 seconds, the second after 5 minutes. If the total failure count exceeds 2, the timer is not reset.

2. When comparing speed and failure counts, the comparison uses ratios. A failure ratio is considered twice as important as a speed ratio.

3. FileNotFoundExceptions are treated separately. Some files cannot be found on any mirror, and it's bad to punish the most popular mirror with a failure for that. At least 5 FileNotFoundExceptions are needed before the failure count is increased. This threshold also reflects the fact that a not-found response is generally very quick compared with a connection failure (most often a socket timeout).
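The three rules in the patch can be approximated as follows. This is a hedged sketch, not the committed code: the class name `RankedMirror`, the lazily applied deadlines used in place of a real timer, the `+1` smoothing, and the log-based reading of "a failure ratio is twice as important as a speed ratio" are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical mirror state modelling the three rules; not the committed p2 code.
// A real timer is replaced by deadlines that are applied lazily whenever the count is read.
final class RankedMirror {
    static final long FIRST_DECAY_MS = 30_000;    // first failure forgiven after 30 seconds
    static final long SECOND_DECAY_MS = 300_000;  // second failure forgiven after 5 minutes
    static final int NOT_FOUND_PER_FAILURE = 5;   // 5 FileNotFoundExceptions count as one failure
    static final double FAILURE_WEIGHT = 2.0;     // failure ratio weighs twice the speed ratio

    final String url;
    final long bytesPerSecond;
    private int failures;                          // current, decaying failure count
    private int totalFailures;                     // cumulative, never decremented
    private int notFoundCount;
    private final Deque<Long> decayDeadlines = new ArrayDeque<>(); // ascending, in ms

    RankedMirror(String url, long bytesPerSecond) {
        this.url = url;
        this.bytesPerSecond = bytesPerSecond;
    }

    void recordFailure(long nowMs) {
        failures++;
        totalFailures++;
        if (totalFailures == 1) {
            decayDeadlines.addLast(nowMs + FIRST_DECAY_MS);
        } else if (totalFailures == 2) {
            decayDeadlines.addLast(nowMs + SECOND_DECAY_MS);
        }
        // Once the total exceeds 2, no decay is scheduled: the failure sticks.
    }

    // Not-found errors are often shared by all mirrors, so they only count in batches.
    void recordNotFound(long nowMs) {
        if (++notFoundCount >= NOT_FOUND_PER_FAILURE) {
            notFoundCount = 0;
            recordFailure(nowMs);
        }
    }

    // Applies every decay whose deadline has passed, then returns the current count.
    int failureCount(long nowMs) {
        while (!decayDeadlines.isEmpty() && decayDeadlines.peekFirst() <= nowMs) {
            decayDeadlines.pollFirst();
            failures--;
        }
        return failures;
    }

    // Positive if a should be preferred over b. Adding 1 avoids division by zero (assumption).
    static double preference(RankedMirror a, RankedMirror b, long nowMs) {
        double speedRatio = (a.bytesPerSecond + 1.0) / (b.bytesPerSecond + 1.0);
        double failureRatio = (b.failureCount(nowMs) + 1.0) / (a.failureCount(nowMs) + 1.0);
        return Math.log(speedRatio) + FAILURE_WEIGHT * Math.log(failureRatio);
    }

    public static void main(String[] args) {
        RankedMirror fast = new RankedMirror("http://fast.example", 4_999);
        RankedMirror slow = new RankedMirror("http://slow.example", 999);
        fast.recordFailure(0);
        // At t=0 the 5x faster mirror still wins despite one failure (penalty is a factor of 4).
        System.out.println(preference(fast, slow, 0) > 0);
    }
}
```

Under this reading, a mirror with one failure needs more than 4 times the transfer rate of a failure-free mirror to stay ahead (because 2 ln 2 = ln 4), and the preference reduces to a difference of per-mirror scores, so it can safely back a sort comparator.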
Comment 5 Thomas Hallgren CLA 2010-02-05 17:12:43 EST
I committed this to HEAD.
Comment 6 Martin W. Kirst CLA 2011-06-16 10:34:27 EDT
This new mirror ranking implementation causes Bug 317785
and breaks compatibility with JRE7.
See Bug 317785 for more details.

I will commit Unit-Test and Patch soon.