Improved Mirroring Support in Eclipse Install/Update

By Branko Tripkovic and Dejan Glozic
11/07/2005

Background

The current mirroring support in Eclipse Install/Update was added at the request of the Eclipse Foundation just before Eclipse 3.1 went live, to help with traffic spikes at release time. However, the implementation leaves a lot to be desired. It requires too much user interaction, and there is currently no way around this (bug 97806). In addition, there is no standard way for the user to select the correct mirror.

At the moment we support mirrors through the optional mirrorsURL attribute of the site tag in site.xml, which points to an XML file containing the update site's mirror definitions. Each mirror in this file is defined by a mirror tag whose attributes give the mirror's name and URL. When Update detects the mirrorsURL attribute in the site tag of site.xml, it downloads the file and pops up a dialog with a list of mirrors for the user to select from.
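To make the shape of the mirrors definition file concrete, here is a minimal Java sketch that reads such a file into a list. The attribute names label and url, the mirrors root element, and the MirrorListParser and MirrorEntry names are assumptions made for illustration; this is not the actual Update parsing code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Install/Update.
class MirrorListParser {

    /** Simple value holder for one mirror entry. */
    static final class MirrorEntry {
        final String label;
        final String url;

        MirrorEntry(String label, String url) {
            this.label = label;
            this.url = url;
        }
    }

    /**
     * Parses mirror definitions from an XML string.
     * Assumption: each mirror tag carries "label" and "url" attributes.
     */
    static List<MirrorEntry> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("mirror");
            List<MirrorEntry> mirrors = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                mirrors.add(new MirrorEntry(e.getAttribute("label"), e.getAttribute("url")));
            }
            return mirrors;
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("Cannot parse mirror definitions", e);
        }
    }
}
```

The entries are returned in file order, which is the order the current dialog would present to the user.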

Goals

We would like to improve the mirroring support in Eclipse Install/Update and have the following goals:

  1. Automate mirror selection, or at least help the user select the most appropriate mirror. Ideally no user input would be required; however, the system does not always have enough information to make that possible. We want to make this process as simple as possible for both end users and service providers. We also want the algorithm that determines the best available mirror not to be too CPU- or memory-intensive.
     
  2. Provide ISVs building applications on top of the Eclipse platform (e.g. RCP applications) with a way to contribute alternate mirror sorting algorithms. This will allow companies to address specific concerns such as national rules, business policies, network performance, etc.
     
  3. Propose a server-side alternative to the client side mirror selection.

Options

There are several possible options for improving mirror handling. They can first be classified by where mirror determination takes place (client or server). Client-side choices can be further classified by the algorithm used to determine the best mirror.

Client-side options

Geographic proximity algorithm

As its name suggests, this algorithm ranks mirrors by their geographic proximity to the client. Proximity is usually determined by country.

Issues:

Link cost algorithm

This algorithm works by measuring the cost (number of hops) of connecting to each mirror. It is a better algorithm since it measures the actual cost rather than estimating it.

Issues:

Server-side options

Mirroring support is usually put on the server. There are several reasons for this. It is much easier to change the mirror selection logic. It is also much easier to update the data that the mirroring logic uses to select the correct mirrors. Finally, servers usually know more about networks and other servers than clients do. The work needed in Update to support server-side mirror selection is very limited, since Update only has to understand one of the two standard ways in which 'redirect' instructions are issued by the server. These two standard ways are:

  1. one of the 300-series HTTP status codes (probably 300) issued by the server to redirect traffic to another site
  2. a meta tag in the HTML header that defines the redirect (mirror) site

Proposed Solutions

Client-side

If we decide to go with the client-side solution, we will have to define an extension point to support plugging in algorithms from other sources, and extend the definitions of the site tag in site.xml and the mirror tag in the mirrors definition file to support this. It would be done as follows:

Extension Point

A new extension point will be added to the Install/Update Core plug-in:

<extension point="org.eclipse.update.core.mirrorSorter">
    <sorter
        id="com.example.xyz.MirrorSorter"
        class="com.example.xyz.update.XYZMirrorSorter"
    />
</extension>

The extension point allows a mirror sorter to be registered under a unique identifier, which is referenced from site.xml through the newly added sorter-id attribute.

All extender classes must implement the following IMirrorSorter interface, which operates on IMirror objects:

public interface IMirrorSorter {
    /**
     * Accepts a list of mirrors as defined in the definition file and sorts them based on
     * preferred use, with the best mirrors at the head of the list.
     * @param candidates mirrors to sort
     * @return the ordered list of mirrors (the best mirrors first)
     */
    public IMirror[] sort(IMirror[] candidates);
}

public interface IMirror {
    /**
     * Returns the URL as a string (as defined in the mirror definition file)
     * @return mirror URL as a string
     */
    public String getAddress();

    /**
     * Returns the mirror label (as defined in the mirror definition file)
     * @return mirror label
     */
    public String getLabel();

    /**
     * Returns mirror property value given the property name
     * @param name the name of the mirror property
     * @return value of the named mirror property or <code>null</code> if not defined
     */
    public String getProperty(String name);
}
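As an illustration of what a contributed sorter might look like, the following sketch moves mirrors from the client's own country to the head of the list. It is only an example under stated assumptions: the property name "country", the CountryFirstMirrorSorter and SimpleMirror classes, and the use of the default locale are all hypothetical, and the two interfaces are repeated in reduced form only so that the sketch compiles on its own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Reduced copies of the proposed interfaces, repeated so the sketch is self-contained.
interface IMirror {
    String getAddress();
    String getLabel();
    String getProperty(String name);
}

interface IMirrorSorter {
    IMirror[] sort(IMirror[] candidates);
}

/** Hypothetical mirror with a single "country" property, used only for this example. */
class SimpleMirror implements IMirror {
    private final String address;
    private final String label;
    private final String country;

    SimpleMirror(String address, String label, String country) {
        this.address = address;
        this.label = label;
        this.country = country;
    }

    public String getAddress() { return address; }
    public String getLabel() { return label; }
    public String getProperty(String name) {
        return "country".equals(name) ? country : null;
    }
}

/** Hypothetical sorter: mirrors whose "country" property matches the client's country come first. */
class CountryFirstMirrorSorter implements IMirrorSorter {
    private final String clientCountry;

    /** No-argument constructor, as an extension registry would typically require. */
    CountryFirstMirrorSorter() {
        this(Locale.getDefault().getCountry());
    }

    CountryFirstMirrorSorter(String clientCountry) {
        this.clientCountry = clientCountry;
    }

    public IMirror[] sort(IMirror[] candidates) {
        IMirror[] sorted = candidates.clone(); // leave the caller's array untouched
        // Arrays.sort on objects is stable, so file order is kept within each group.
        Arrays.sort(sorted, Comparator.comparingInt(this::rank));
        return sorted;
    }

    /** 0 for a mirror in the client's country, 1 for everything else (including no property). */
    private int rank(IMirror mirror) {
        return clientCountry.equalsIgnoreCase(mirror.getProperty("country")) ? 0 : 1;
    }
}
```

Because the sort is stable, a service provider can still express a preference among same-country mirrors simply by the order in which they appear in the definition file.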

Augmented Site Tag Definition in site.xml:

<!ATTLIST site
    type          CDATA #IMPLIED
    url           CDATA #IMPLIED
    mirrorsURL    CDATA #IMPLIED
    sorter-id     CDATA #IMPLIED
>

where sorter-id is the identifier of a mirror sorting class registered through the mirrorSorter extension point. If sorter-id is not present, the default sorter provided by Install/Update will be used (the exact identifier is yet to be defined).

Augmented Mirror Tag Definition in mirrors definition file:

<!ELEMENT mirror (property*)>
<!ELEMENT property EMPTY>
<!ATTLIST property
    name          CDATA #REQUIRED
    value         CDATA #REQUIRED
>

where the property element carries sorter-specific information that can be used to determine a suitable mirror.

Install/Update default mirror sorter implementation

An issue with both of the client-side options above is that they assume that all mirrors are created equal, i.e. that they have the same performance. In our case, this is not true. We can try to alleviate this issue by assigning ratings to mirrors and using them in our calculations as follows:

  1. Divide the world into large regions (East Coast USA, Central USA, West Coast USA, Europe, East Asia), assign countries and states/provinces to these regions, and assign a time zone to each region.
  2. Put this information in a comma-separated value (CSV) file. The format would be:
         Region, Country, State/Province, Time Zone
  3. Make a weighted graph of the world's adjacent regions. This graph would be put in a CSV file. The format would be:
         Region, Region, Weight
     Weight is an approximate network round-trip time (closely related to geographic distance, but not the same), expressed as an integer from 1 to 5.
  4. The user will be allowed to set his/her location, country and geographic region in the preferences. If the user does not provide this information, the system-provided country and time zone will be used in our calculations.
  5. Allow the user to choose between the mutually exclusive "Select the mirror automatically" and "Always prompt" options in the preferences.
  6. The service provider will be required to supply a mirror definition file with the following additional properties defined for each mirror: region, country, state/province, time zone, rating (actual property names to be defined later).
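The weighted region graph described in the list above could be loaded and queried roughly as sketched below. This is one possible reading, not a specification: the RegionGraph class, the assumption that links are bidirectional, and the choice of Dijkstra's algorithm to combine weights across non-adjacent regions are ours, and the weights used in the usage note are invented.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: holds the "Region, Region, Weight" graph in memory.
class RegionGraph {
    private final Map<String, Map<String, Integer>> edges = new HashMap<>();

    /** Parses lines of the form "Region, Region, Weight"; links are assumed to be bidirectional. */
    static RegionGraph fromCsv(String csv) {
        RegionGraph graph = new RegionGraph();
        for (String line : csv.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            String a = fields[0].trim();
            String b = fields[1].trim();
            int weight = Integer.parseInt(fields[2].trim());
            graph.edges.computeIfAbsent(a, k -> new HashMap<>()).put(b, weight);
            graph.edges.computeIfAbsent(b, k -> new HashMap<>()).put(a, weight);
        }
        return graph;
    }

    /** Dijkstra's algorithm: cheapest total weight between two regions, or -1 if unreachable. */
    int cost(String from, String to) {
        Map<String, Integer> settled = new HashMap<>();
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<>(Map.Entry.comparingByValue());
        queue.add(new AbstractMap.SimpleEntry<>(from, 0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> head = queue.poll();
            String region = head.getKey();
            int total = head.getValue();
            if (settled.containsKey(region)) {
                continue; // already reached through a path that is at least as cheap
            }
            settled.put(region, total);
            if (region.equals(to)) {
                return total;
            }
            Map<String, Integer> neighbours =
                    edges.getOrDefault(region, Collections.emptyMap());
            for (Map.Entry<String, Integer> e : neighbours.entrySet()) {
                if (!settled.containsKey(e.getKey())) {
                    queue.add(new AbstractMap.SimpleEntry<>(e.getKey(), total + e.getValue()));
                }
            }
        }
        return -1;
    }
}
```

With only five regions the graph is tiny, so the cost of every region pair could be computed once at startup, which keeps the sorter cheap in both CPU and memory.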

    Based on this information we will sort mirrors in the following manner:
     

Install/Update will loop through the list of mirrors obtained from the sorter until one of them responds. This allows us to skip unresponsive mirrors.
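The loop over the sorted list can be sketched as follows. The MirrorFailover class, the probe parameter and the five-second timeouts are assumptions for illustration; how Update would actually test a mirror is not defined in this proposal.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.function.Predicate;

// Hypothetical helper, not part of Install/Update.
class MirrorFailover {

    /** Returns the first mirror URL for which the probe succeeds, or null if none responds. */
    static String firstResponsive(String[] sortedMirrorUrls, Predicate<String> probe) {
        for (String url : sortedMirrorUrls) {
            if (probe.test(url)) {
                return url;
            }
        }
        return null;
    }

    /** One possible probe: an HTTP HEAD request with short timeouts (values are assumptions). */
    static boolean respondsToHead(String url) {
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
            connection.setRequestMethod("HEAD");
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            int status = connection.getResponseCode();
            connection.disconnect();
            return status >= 200 && status < 400;
        } catch (IOException | ClassCastException e) {
            // Unreachable host, timeout, malformed URL, or a non-HTTP URL.
            return false;
        }
    }
}
```

A caller would combine the two as MirrorFailover.firstResponsive(urls, MirrorFailover::respondsToHead). Keeping the probe separate from the loop makes it possible to swap in a different check without touching the iteration.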

This solution does not address the case where the user and his/her gateway are not in physical proximity to each other.

Server side

As previously mentioned, the beauty of the server-side solution is that we do not have to define the mirror selection algorithm in advance. It can be changed on the fly, and on each site independently. However, we should decide in advance which method of redirecting we will use (although more than one can be supported at the same time). There are several ways to do this that are in use on the web:

  1. Using HTTP status codes and new location interpreted by clients
  2. Using meta tag in HTML
  3. Using JavaScript

Since the second solution is HTML-specific and the third requires a JavaScript interpreter, we suggest using the HTTP 300 status code with a sorted list of the provided mirrors. Sorting can be done using the client's IP address and an IP-to-geographic-location database, using a client-provided location (sent in a GET request issued by the client with 'country' and either 'time zone' or 'state/province' as parameters), or in some other way devised by the site implementing mirroring support. Whenever client-provided location information is used, a server-side algorithm similar to the one presented for the client-side solution can be applied. Our proposal is for Update to pass the location information at all times and let the server decide whether to use it. This way we will not dictate the choice of algorithms in advance.

Provided here is a PHP script with configuration files that sorts mirrors based on the client's country and time zone. Country and time zone are expected to be provided as GET parameters. We have adapted the default mirror sorter described above for the client-side solution (the differences are that mirrors are kept in the CSV file and that we did not implement mirror rating). As an added bonus, this script can be used to generate sorted mirror lists for the current implementation of mirror support. We also provide patches for the org.eclipse.update.core and org.eclipse.update.ui plug-ins of the Update component that enable the script to be used in this manner. These patches add an "Automatically select mirror" checkbox to the Update preferences page. When it is checked, Update will automatically select the mirror from the mirror list. Also, the timeZone and countryCode parameters are added to every mirror request.
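On the client side, adding the two parameters to a mirror request could look like the sketch below. The parameter names timeZone and countryCode come from the patches described above; the MirrorRequestUrl class, the use of a time zone ID as the timeZone value, and the example URL in the usage note are assumptions, since the exact value format is not specified here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not taken from the actual patches.
class MirrorRequestUrl {

    /** Appends countryCode and timeZone as GET parameters to the given mirrors URL. */
    static String withLocation(String mirrorsUrl, String countryCode, String timeZone) {
        // Reuse an existing query string if the URL already has one.
        String separator = mirrorsUrl.contains("?") ? "&" : "?";
        return mirrorsUrl + separator
                + "countryCode=" + encode(countryCode)
                + "&timeZone=" + encode(timeZone);
    }

    /** Uses the system-provided country and time zone (assumed here to be sent as a zone ID). */
    static String withSystemLocation(String mirrorsUrl) {
        return withLocation(mirrorsUrl,
                Locale.getDefault().getCountry(),
                TimeZone.getDefault().getID());
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, withLocation("http://example.org/mirrors.php", "CA", "America/Toronto") yields http://example.org/mirrors.php?countryCode=CA&timeZone=America%2FToronto. A server that does not care about location can simply ignore both parameters.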

Server side solutions can also resolve issues of a user's location differing from their gateway location. For example, it is well known what IP address ranges are used by certain large organizations and where their internet gateways are located and this can be included in the decision making process.

Conclusion

We recommend that the server-side solution be selected for handling mirrors in the Update component.

Note: All files contained herein are provided as is. If they are to be used in production environments, error checking and other miscellaneous improvements should be added.