platform-update-home/doc/working/site-performance-enhancement.html



Revision 1.1
Mon Apr 3 22:06:04 2006 UTC by dejan
Branch: MAIN
<h1>Update Site Performance Enhancement Utility Proposal</h1>
<h2>The problem</h2>
<p>The Eclipse Install/Update design groups artifacts into units called <b>
features</b>, which are published on a remote server at an <b>update site</b>. A feature 
consists of the feature manifest file, NL property files, images, licenses, 
copyrights and other resources placed in a single JAR archive. When directed at 
an update site, the Eclipse Update Manager must download each of these JARs and 
parse its manifest in order to perform activities such as site browsing, 
searching and dependency checking.</p>
<p>This approach works reasonably well for moderately sized update sites, but we are 
facing scalability problems with the upcoming Callisto update site, which contains 
hundreds if not thousands of features. Each feature JAR is small, but 
opening a connection and downloading it is costly, and the cost adds up. Even 
worse, users pay this price BEFORE they even decide whether they want to 
install anything from the site. A solution is needed to reduce the number of 
connections required simply to browse or search the update site.</p>
<p>Once the features to install have been selected, Update needs to physically 
download plug-in JARs onto the user's machine. At this point, payload size ceases to 
be trivial - a full Callisto download is several hundred megabytes. A technique 
to reduce the payload size would benefit users who are downloading the full 
Callisto set.</p>
<h2>The solution</h2>
<p>We propose a <b>site enhancement utility</b> that can run on any update site 
and produce artifacts that address the problems described above. In 
addition to the utility, the Install/Update code will be enhanced so that 
Update is capable of consuming these artifacts. The performance 
enhancements are optional, and Install/Update should continue to work as it does today 
in their absence.</p>
<p>The core of the solution is to use <b>update site digests</b> to merge all 
the information needed for browsing and searching into one file (digest.xml) 
that is compressed to reduce its size and downloaded over one connection instead of many 
separate connections for individual features. Once the install decision has been 
made, plug-in and fragment archives packed with the <b>Pack200</b> utility (a part of 
J2SE 5.0) will make the payload smaller and faster to transport.</p>
<h3>The performance enhancement utility</h3>
<p>The utility will be a command-line tool fully driven by the 
content of the site.xml file:</p>
<blockquote>
	<pre>&lt;utility-name&gt; site.xml</pre>
</blockquote>
<p>The file will contain additional attributes that take advantage of the 
performance enhancements. The role of the utility will be to generate the required 
artifacts according to the specification in site.xml. This specification is 
expressed through additional attributes of the 'site' element:</p>
<ul>
	<li><b>digest</b> - an optional attribute that points at the site digest 
	file. When the utility finds this attribute, it will generate the update 
	site digests based on all the features referenced in the file. In addition 
	to the default digest (e.g. 'digests/digest.xml') there may be 
	locale-specific versions (e.g. 'digests/digest_en_US.xml').</li>
	<li><b>digestLocales</b> - a comma-separated list of locales for which a 
	digest file exists. The existence of this list prevents Update Manager from 
	making multiple trips to the server and opening connections to 
	files that are missing. This list must be exhaustive, i.e. it must match the existing digest 
	files. If a particular locale version of the digest is on the server but the 
	locale does not appear in this list, it will not be used. The value of this 
	attribute is generated by the utility based on all the different digest 
	locales.</li>
	<li><b>pack200</b> - a boolean attribute indicating that the site contains 
	archives packed using pack200.exe.</li>
</ul>
<p>Example:</p>
<blockquote>
	<pre>&lt;site digest=&quot;digests/digest.xml&quot; digestLocales=&quot;en_US,ja_JP,de_DE,fr_FR&quot; pack200=&quot;true&quot;&gt;
....</pre>
	<pre>&lt;/site&gt;</pre>
</blockquote>
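<p>As a rough illustration of how a client might use these attributes, the 
sketch below picks the digest file to request for a given locale. The function 
name <code>digest_path_for_locale</code> is hypothetical, and falling back to the 
default digest for unlisted locales is an assumption rather than something this 
proposal specifies.</p>

```python
import posixpath
import xml.etree.ElementTree as ET


def digest_path_for_locale(site_xml, locale):
    """Return the digest path to request for `locale`, or None if no digest is declared."""
    site = ET.fromstring(site_xml)
    digest = site.get("digest")
    if digest is None:
        return None  # site is not enhanced; browse features the usual way

    # Only locales listed in digestLocales are trusted to exist on the server,
    # so no connection is ever opened for a locale-specific file that is missing.
    raw = site.get("digestLocales", "")
    listed = [name.strip() for name in raw.split(",") if name.strip()]
    if locale in listed:
        stem, ext = posixpath.splitext(digest)
        return f"{stem}_{locale}{ext}"

    # Assumption: unlisted locales fall back to the default digest.
    return digest
```

<p>For the example site.xml above, a request for <code>ja_JP</code> would map to 
'digests/digest_ja_JP.xml', while an unlisted locale would map to 
'digests/digest.xml'.</p>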
<p>The utility will use the <b>digest</b> and <b>pack200</b> attributes as input. If
<b>digest</b> is present, the utility will generate the digest files (the 
default one as well as one for each supported locale) and add or update the 
<b>digestLocales</b> attribute so that it lists all the locales for which a digest has just 
been generated. If <b>pack200</b> is present, the utility will call 
pack200.exe on each plug-in or fragment archive (say, com.example.xyz.jar) and 
generate a packed version (com.example.xyz.jar.gz).</p>
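<p>The behaviour described above could be sketched roughly as follows. This is 
not the utility itself: <code>plan_enhancements</code> is a hypothetical name, the 
lists of archives and locales are passed in rather than discovered on disk, and 
the sketch only computes which files would be produced instead of actually 
calling pack200.exe. Treating <code>pack200="true"</code> as the enabling value is 
also an assumption.</p>

```python
import posixpath
import xml.etree.ElementTree as ET


def plan_enhancements(site_xml, plugin_archives, generated_locales):
    """Return (plan, updated_site_xml) for the attributes found on <site>."""
    site = ET.fromstring(site_xml)
    plan = {"digests": [], "packed": {}}

    digest = site.get("digest")
    if digest is not None:
        stem, ext = posixpath.splitext(digest)
        # Default digest plus one file per locale that was generated.
        plan["digests"] = [digest] + [
            f"{stem}_{locale}{ext}" for locale in generated_locales
        ]
        # Record exactly the locales that now exist, so the list stays exhaustive.
        site.set("digestLocales", ",".join(generated_locales))

    if site.get("pack200") == "true":
        # Each archive gets a packed sibling with a .gz suffix.
        plan["packed"] = {jar: jar + ".gz" for jar in plugin_archives}

    return plan, ET.tostring(site, encoding="unicode")
```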
<p>It is the responsibility of the utility caller to ensure that it is run 
regularly in order to keep the generated content in sync with the source. 
Failure to do so will result in stale content and browsing or installation 
errors.</p>
<h3>Update site digests</h3>
<p>The goal of the update site digests is to minimize the number of files that 
need to be downloaded in order to browse or search the update site. Digests are 
made by parsing all the features referenced in site.xml and generating merged 
content in one XML file. The DTD of the file is an implementation detail. Suffice 
it to say that support will be added to Update Manager to download the file, 
expand it, parse it and use it to create lightweight features sufficient to populate the 
UI or perform dependency checks. The digest will have a processing instruction 
listing the digest version to allow future enhancements while retaining backward 
compatibility. </p>
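<p>Since the digest format is left as an implementation detail, the following is 
only a sketch of the merging idea. The <code>digest</code> root element, the 
<code>&lt;?digest version="..."?&gt;</code> processing instruction and the function 
name <code>build_digest</code> are all made up for illustration; the feature 
manifests are passed in as strings rather than extracted from feature JARs.</p>

```python
import xml.etree.ElementTree as ET


def build_digest(feature_manifests, version="1.0"):
    """Merge feature manifest documents into a single digest document (as text)."""
    root = ET.Element("digest")  # hypothetical root element
    for manifest in feature_manifests:
        # Each manifest becomes one child, keeping its attributes and children.
        root.append(ET.fromstring(manifest))
    body = ET.tostring(root, encoding="unicode")
    # Hypothetical processing instruction carrying the digest version.
    return f'<?digest version="{version}"?>\n{body}'
```

<p>A client could then parse this one document and read identifiers, versions 
and dependencies for every feature without opening a connection per feature.</p>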
<p>Once the choice has been made, features selected for installation will be 
fully downloaded into regular feature objects. The gain comes from the ratio of 
features that are downloaded for searching to those actually needed for the 
installation process. In update sites such as those for Callisto, this ratio is 
typically 100:1 or higher. Conversely, digests are not needed 
for small and simple sites, where the ratio is low.</p>
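<p>The gain can be made concrete with a simple count of connections. The sketch 
below is a simplified model under stated assumptions: one connection for 
site.xml, one per feature JAR, one for the digest, and plug-in archive downloads 
left out because they are the same in both cases. The function name and the 
figures in the usage note are illustrative only.</p>

```python
def connections_needed(features_on_site, features_installed, use_digest):
    """Count connections needed to browse a site and fetch the selected features."""
    site_xml = 1
    if use_digest:
        # One digest download for browsing, then only the selected feature JARs.
        return site_xml + 1 + features_installed
    # Without a digest, every feature JAR is downloaded just to browse the site.
    return site_xml + features_on_site
```

<p>For a site with 500 features of which 5 are installed, this model gives 501 
connections without a digest and 7 with one.</p>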
<h3>Support for Pack200</h3>
<p>If Pack200 is indicated in site.xml, Eclipse Update will first try to 
download &quot;com.example.xyz.jar.gz&quot; when it needs the &quot;com.example.xyz.jar&quot; 
archive. If found, it will be downloaded and unpacked on the client machine. The 
rest of the process will be as usual. If the file is not present, &quot;com.example.xyz.jar&quot; 
will be downloaded instead. A missing packed file may actually slow down the installation due to 
the redundant connection attempt. For this reason, the utility offers no 
per-archive options - if the pack200 attribute is present, the tool will traverse all the 
plug-ins and fragments that are present at the site and generate a packed 
version of each.</p>
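<p>The download order described above can be sketched as follows. To keep the 
example self-contained, the network and the unpacker are passed in as plain 
callables: <code>fetch</code> is assumed to return the file's bytes or 
<code>None</code> when the file is missing, and <code>unpack</code> stands in for 
running unpack200. Both names, and <code>download_archive</code> itself, are 
hypothetical.</p>

```python
def download_archive(name, fetch, unpack, site_uses_pack200):
    """Return the bytes of archive `name`, preferring the packed version."""
    if site_uses_pack200:
        packed = fetch(name + ".gz")
        if packed is not None:
            return unpack(packed)
        # Packed file is missing: this attempt was a wasted connection.

    data = fetch(name)
    if data is None:
        raise FileNotFoundError(name)
    return data
```

<p>When the packed file is missing, two requests are made for one archive, which 
is why the utility packs every plug-in and fragment rather than a subset.</p>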
<p>Since the packed version of an archive needs to be unpacked on the client, the 
client must have unpack200.exe on the system path. Update 
Manager will check whether it can unpack Pack200 archives before downloading them. 
Because some clients will not be able to, update sites must keep the normal versions of the archives in 
addition to the packed versions.</p>
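<p>A minimal sketch of such a capability check, assuming that finding an 
<code>unpack200</code> executable on the system path is sufficient evidence that 
the client can unpack. The function names are hypothetical, and the lookup is 
injectable so the logic can be exercised without the tool installed.</p>

```python
import shutil


def can_unpack_pack200(which=shutil.which):
    """Return True if an unpack200 executable is found on the system path."""
    return which("unpack200") is not None


def archive_name_to_request(jar_name, site_uses_pack200, client_can_unpack):
    """Choose between the packed and the normal version of an archive."""
    if site_uses_pack200 and client_can_unpack:
        return jar_name + ".gz"
    # Clients without unpack200 rely on the normal archive being on the site.
    return jar_name
```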