The Eclipse Install/Update design groups artifacts into units called features, which are published on a remote server as an update site. A feature consists of the feature manifest file, NL property files, images, licenses, copyrights and other resources placed in a single JAR archive. When directed at an update site, Eclipse Update Manager must download each of these JARs and parse its manifest in order to perform activities such as site browsing, searching and dependency checking.
This approach works reasonably well for moderately sized update sites, but we are facing scalability problems with the upcoming Callisto update site, which contains hundreds if not thousands of features. Each feature JAR is small, but opening a connection and downloading even a small JAR has a cost, and that cost adds up. Even worse, users pay this price BEFORE they even decide whether they want to install anything from the site. A solution is needed to reduce the number of connections required simply to browse or search the update site.
Once the features to install have been selected, Update needs to physically download the plug-in JARs onto the user's machine. At this point, payload size ceases to be trivial - a full Callisto download is several hundred megabytes. A technique to reduce the payload size would benefit users who are downloading the full Callisto set.
We propose a site enhancement utility that can run on any update site and produce artifacts that address the problems mentioned above. In addition to the utility, the Install/Update code will be enhanced so that Update is capable of consuming these artifacts. The performance enhancements are optional, and Install/Update should continue to perform as it does today in their absence.
The core of the solution is the use of update site digests, which merge all the information needed for browsing and searching into one file (digest.xml). This file is compressed to reduce its size and is downloaded over one connection instead of many separate connections for individual features. Once the install decision has been made, the use of the Pack200 utility (a part of J2SE 5.0) on plug-in and fragment archives will make the payload smaller and faster to transport.
The utility will be a command-line tool fully driven by the content of the site.xml file:
<utility-name> site.xml
The file will contain additional attributes that take advantage of the performance enhancements. The role of the utility will be to generate required artifacts according to the specification in site.xml. This specification is accomplished using additional attributes of the element 'site':
Example:
<site digest="digests/digest.xml" digestLocales="en_US,ja_JP,de_DE,fr_FR" pack200="true"> ....</site>
The utility will use the digest and pack200 attributes as input. If digest is present, the utility will generate the digest files (the default one as well as one for each supported locale) and add or update the digestLocales attribute so that it lists all the locales for which a digest has just been generated. If pack200 is present, the utility will call pack200.exe on each plug-in or fragment archive (say, com.example.xyz.jar) and generate a packed version (com.example.xyz.jar.pack.gz).
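As a rough illustration of how a tool might read these attributes from site.xml, here is a minimal sketch using the standard Java DOM parser. The class name SiteOptions and its fields are hypothetical and are not part of the proposal. The sketch also assumes that pack200 is enabled only when its value is "true", as in the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical holder for the performance-related attributes of the 'site' element. */
final class SiteOptions {
    final String digest;               // value of the digest attribute, or null when absent
    final List<String> digestLocales;  // entries of digestLocales, empty when absent
    final boolean pack200;             // assumption: enabled only when pack200="true"

    SiteOptions(String digest, List<String> digestLocales, boolean pack200) {
        this.digest = digest;
        this.digestLocales = digestLocales;
        this.pack200 = pack200;
    }

    /** Parses the root 'site' element of the given site.xml content. */
    static SiteOptions parse(String siteXml) {
        Element site;
        try {
            site = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("site.xml could not be parsed", e);
        }
        String digest = site.hasAttribute("digest") ? site.getAttribute("digest") : null;
        List<String> locales = new ArrayList<>();
        // getAttribute returns "" for a missing attribute, so empty entries are skipped.
        for (String locale : site.getAttribute("digestLocales").split(",")) {
            String trimmed = locale.trim();
            if (!trimmed.isEmpty()) {
                locales.add(trimmed);
            }
        }
        boolean pack200 = Boolean.parseBoolean(site.getAttribute("pack200"));
        return new SiteOptions(digest, locales, pack200);
    }
}
```

A site.xml without any of these attributes yields a null digest, an empty locale list and pack200 set to false, which matches the requirement that the enhancements are optional.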
It is the responsibility of the utility caller to ensure that it is run regularly in order to keep the generated content in sync with the source. Failure to do so will result in stale content and in browsing or installation errors. The digest generator portion of the utility can be simple and always generate all the files, provided this does not take too much time. The pack200 portion, on the other hand, must be incremental in order to avoid repacking JARs that have not changed.
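One way the incremental requirement could be met is by comparing file timestamps. The following sketch assumes this approach; the proposal does not say how changes are detected. The class and method names are hypothetical.

```java
import java.io.File;

/** Hypothetical helper that decides which archives the pack200 step must process. */
final class PackPlanner {

    /** Maps an archive to its packed sibling, e.g. com.example.xyz.jar -> com.example.xyz.jar.pack.gz. */
    static File packedFileFor(File jar) {
        return new File(jar.getParentFile(), jar.getName() + ".pack.gz");
    }

    /**
     * Assumption: an archive needs packing when no packed version exists
     * or when the packed version is older than the archive itself.
     */
    static boolean needsPacking(File jar) {
        File packed = packedFileFor(jar);
        return !packed.exists() || packed.lastModified() < jar.lastModified();
    }
}
```

A timestamp check is cheap but can be fooled by tools that do not preserve modification times; comparing content hashes would be a more robust, if slower, alternative.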
The goal of the update site digests is to minimize the number of files that need to be downloaded in order to browse or search the update site. Digests are made by parsing all the features referenced in site.xml and generating merged content in one XML file. The DTD of the file is an implementation detail. Suffice it to say that support will be added to Update Manager to download the file, expand it, parse it and use it to create light features sufficient to populate the UI or perform dependency checks. The digest will have a processing instruction listing the digest version, to allow future enhancements while retaining backward compatibility.
Once the choice has been made, the features selected for installation will be fully downloaded into regular feature objects. The gain depends on the ratio of features downloaded for searching to features actually needed for the installation. In update sites such as those for Callisto, this ratio is typically 100:1 or higher, so the gain is large. Conversely, digests are not needed for small and simple sites.
If Pack200 is indicated in site.xml, Eclipse Update will first try to download "com.example.xyz.jar.pack.gz" when downloading the "com.example.xyz.jar" archive. If found, it will be downloaded and unpacked on the client machine. The rest of the process will be as usual. If the file is not present, "com.example.xyz.jar" will be downloaded instead. A site that has only some of its archives packed may therefore actually slow down the installation due to the redundant connection attempts. For this reason, there are no options on the utility - if the pack200 option is present, the tool will traverse all the plug-ins and fragments that are present at the site and generate a packed version of each.
Since the packed version of an archive needs to be unpacked on the client, the client must have unpack200.exe available. Update Manager will check whether it can unpack Pack200 archives before downloading them, and will fall back to the normal archives if it cannot. For this reason, update sites must have normal versions of the archives in addition to the packed versions.
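The selection logic described above can be sketched as follows. This is only an illustration: the ArchiveSelector class is hypothetical, and the existsOnSite predicate stands in for a real connection attempt to the update site.

```java
import java.util.function.Predicate;

/** Hypothetical sketch of how a client might pick which archive to download. */
final class ArchiveSelector {

    /**
     * Returns the file name to download for the given plug-in or fragment archive.
     *
     * @param jarName      normal archive name, e.g. com.example.xyz.jar
     * @param sitePack200  whether site.xml indicates Pack200 support
     * @param canUnpack    whether the client has a usable unpack200 executable
     * @param existsOnSite placeholder for probing the site for a file
     */
    static String choose(String jarName, boolean sitePack200, boolean canUnpack,
                         Predicate<String> existsOnSite) {
        if (sitePack200 && canUnpack) {
            String packed = jarName + ".pack.gz";
            // This probe is the extra connection attempt that is wasted
            // when the packed version is missing from the site.
            if (existsOnSite.test(packed)) {
                return packed;
            }
        }
        // Fall back to the normal archive, which the site must always provide.
        return jarName;
    }
}
```

Checking canUnpack before probing the site avoids downloading a packed archive that the client could not expand.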
The location of the unpack200 executable can be specified to Update using the system property "org.eclipse.update.jarprocessor.pack200". The value can be one of:
If the property is not set, Update will look for unpack200.exe first in java.home/bin, then on the system path. If that fails, unpacking will not be used.
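The default lookup order for the case where the property is not set could be sketched like this. The Unpack200Locator class is hypothetical, and the choice of executable name per operating system is an assumption. Handling of the property itself is left out, since its accepted values are not listed here.

```java
import java.io.File;
import java.util.regex.Pattern;

/** Hypothetical sketch of the default unpack200 lookup (system property not set). */
final class Unpack200Locator {

    /**
     * Looks for exeName first in javaHome/bin, then in each directory of pathValue.
     * Returns null when nothing is found, meaning unpacking will not be used.
     */
    static File find(String javaHome, String pathValue, String exeName) {
        File candidate = new File(new File(javaHome, "bin"), exeName);
        if (candidate.isFile()) {
            return candidate;
        }
        if (pathValue != null) {
            for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
                if (dir.isEmpty()) {
                    continue;
                }
                candidate = new File(dir, exeName);
                if (candidate.isFile()) {
                    return candidate;
                }
            }
        }
        return null;
    }

    /** Convenience overload using the running JVM and the PATH environment variable. */
    static File find() {
        // Assumption: the executable carries the .exe suffix only on Windows.
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        String exeName = windows ? "unpack200.exe" : "unpack200";
        return find(System.getProperty("java.home"), System.getenv("PATH"), exeName);
    }
}
```

A null result from find() corresponds to the case above in which unpacking is not used and the normal archives are downloaded instead.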