Re: [p2-dev] Mirror ranking

I'm sure this algorithm can be improved. Please enter a bug report for it. I like the idea of resetting or lowering the failure count if we later have a successful transfer. How it behaved on the most recent transfer is generally going to be much more interesting than historical behaviour.


Thomas Hallgren <thomas@xxxxxxx>
Sent by: p2-dev-bounces@xxxxxxxxxxx

12/09/2009 01:40 PM

Please respond to
P2 developer discussions <p2-dev@xxxxxxxxxxx>

P2 developer discussions <p2-dev@xxxxxxxxxxx>
[p2-dev] Mirror ranking

I mirrored Helios today and it basically took forever. After a few
hours, I was beginning to wonder what was going on and luckily, the
process ran in a debugger. I found that the top ranked mirror was the
one at eclipse.org. That surprised me since I know that I have a fast
mirror in Sweden that serves up a copy of Helios.

First I checked if this mirror was included in the list served up by the
mirror request to Eclipse.org. It was. Next, I stopped the debugger and
patched the URL for entry number zero in my mirrors list with the URL of
that mirror. I resumed, and the processing went much faster, so the
mirror was actually OK.

So why did download.eclipse.org move to the top of the list? It's
supposed to be right at the bottom. The algorithm for sorting the
mirrors looks like this:

        public int compareTo(Object o) {
            if (!(o instanceof MirrorInfo))
                return 0;
            MirrorInfo that = (MirrorInfo) o;
            //less failures is better
            if (this.failureCount != that.failureCount)
                return this.failureCount - that.failureCount;
            //faster is better
            if (this.bytesPerSecond != that.bytesPerSecond)
                return (int) (that.bytesPerSecond - this.bytesPerSecond);
            //trust that initial rank indicates geographical proximity
            return this.initialRank - that.initialRank;
        }

A failure count of one deems the mirror forever worse than one with a
failure count of zero, even if that mirror is a hundred times faster. I
think that was what caused my problem. All mirrors in the list have a
failureCount of 1 and a byte-count of -1, except two:
download.eclipse.org (initialRank = 55) and one other (initialRank = 10).
After some initial failure, the rest were never given a second chance.
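
To make the effect concrete, here is a minimal sketch that reproduces the
ordering. The MirrorInfo class below is a hypothetical stand-in that only
models the three fields used by compareTo; the mirror names and transfer
rates are made up for illustration, and treating -1 as "never measured" is
an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the class quoted above, not the real p2 code.
class MirrorInfo implements Comparable<MirrorInfo> {
    final String name;
    final int initialRank;
    int failureCount;
    long bytesPerSecond = -1; // assumption: -1 means "never measured"

    MirrorInfo(String name, int initialRank) {
        this.name = name;
        this.initialRank = initialRank;
    }

    @Override
    public int compareTo(MirrorInfo that) {
        // Same three-step ordering as the quoted snippet.
        if (this.failureCount != that.failureCount)
            return this.failureCount - that.failureCount;
        if (this.bytesPerSecond != that.bytesPerSecond)
            return (int) (that.bytesPerSecond - this.bytesPerSecond);
        return this.initialRank - that.initialRank;
    }
}

class RankingDemo {
    // Returns the mirror names in ranked order, best first.
    static List<String> rank() {
        MirrorInfo nearby = new MirrorInfo("nearby-mirror", 1);
        nearby.failureCount = 1; // failed once early on, never retried

        MirrorInfo other = new MirrorInfo("mirror-10", 10);
        other.bytesPerSecond = 50_000; // made-up rate

        MirrorInfo home = new MirrorInfo("download.eclipse.org", 55);
        home.bytesPerSecond = 100_000; // made-up rate

        List<MirrorInfo> mirrors = new ArrayList<>();
        mirrors.add(nearby);
        mirrors.add(other);
        mirrors.add(home);
        Collections.sort(mirrors);

        List<String> names = new ArrayList<>();
        for (MirrorInfo m : mirrors)
            names.add(m.name);
        return names;
    }
}
```

With these values, the mirror with the best initialRank ends up last,
because the failure count is compared before anything else.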

My guess is that something went wrong at the very beginning that caused
all mirrors except download.eclipse.org and node number 10 to fail. Not
sure what that was. That however, moved download.eclipse.org to the top
and node number 10 to second place. And although I have mirrors 100
times faster close by, they are never consulted again. I'm downloading
about 3,800 artifacts.

Mirrors may have temporary and fairly short outages. They may be
incomplete in some respect, or just be under very heavy load for a short
period of time. I think the algorithm could be improved by adding a
periodic retry on mirrors whose initialRank value indicates that they
are geographically close. I also think that we should have a ratio
between transfer rate and failure count. Let's say that a 5 times
higher transfer rate is worth one failure. Perhaps a successful transfer
should reset the failure count, or at least cut it in half so that
failures are forgiven by subsequent good behavior.
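
One possible shape for the ratio and the forgiveness rule is sketched
below. This is only an illustration of the idea, not a patch against p2:
the RankedMirror class, the recordFailure/recordSuccess methods, and the
UNMEASURED_RATE guess for mirrors that have no measurement yet are all
hypothetical. The periodic retry is not covered here.

```java
// Hypothetical sketch of the proposed ranking, not the real p2 MirrorInfo.
class RankedMirror implements Comparable<RankedMirror> {
    // From the proposal: a 5 times higher transfer rate is worth one failure.
    static final double RATE_PER_FAILURE = 5.0;
    // Assumed placeholder rate (bytes/second) for unmeasured mirrors.
    static final double UNMEASURED_RATE = 10_000.0;

    final String name;
    final int initialRank;
    int failureCount;
    long bytesPerSecond = -1; // assumption: -1 means "never measured"

    RankedMirror(String name, int initialRank) {
        this.name = name;
        this.initialRank = initialRank;
    }

    void recordFailure() {
        failureCount++;
    }

    // A successful transfer updates the rate and halves the failure count,
    // so earlier failures are gradually forgiven.
    void recordSuccess(long measuredBytesPerSecond) {
        bytesPerSecond = measuredBytesPerSecond;
        failureCount /= 2;
    }

    // Transfer rate discounted by a factor of 5 for every recorded failure.
    double effectiveRate() {
        double rate = bytesPerSecond > 0 ? bytesPerSecond : UNMEASURED_RATE;
        return rate / Math.pow(RATE_PER_FAILURE, failureCount);
    }

    @Override
    public int compareTo(RankedMirror that) {
        // Higher effective rate sorts first.
        int byRate = Double.compare(that.effectiveRate(), this.effectiveRate());
        if (byRate != 0)
            return byRate;
        // Fall back to the initial rank (assumed geographical proximity).
        return Integer.compare(this.initialRank, that.initialRank);
    }
}
```

Under this scheme, a mirror with one failure still outranks a failure-free
mirror if it is more than five times faster, and the comparison uses
Double.compare and Integer.compare rather than subtraction, which avoids
overflow when casting a large rate difference to int.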

One question that I don't know the answer to at this point is what
happens when an artifact is missing although it should be there
according to the artifact repository. Will the mirror get punished for
that? If so, that's not good: the artifact will be missing on all
mirrors, but only the best one will be punished.

What do you think?

- thomas
