The current feature-based update sites can get huge quickly when small
numbers of changes are scattered across a large number of features.
This kind of situation happens easily when releasing a dot-level or a
patchlevel upgrade--ie: 1.0.0 to 1.0.1--where lots of small bugs are
The purpose of this email is to open a dialog and to outline our solution. Apologies in advance for those who are less comfortable with expressing algorithms in formal math. Time constraints have dictated communication using the most compact and efficient means possible.
The algorithm we developed worked as follows:
The build server maintains copies of all production versions of the application. In
the general case, then, we need an algorithm to upgrade a given user
from a version V(n) to a version V(n+delta) where delta is the number
of versions that have elapsed since the user last upgraded. We want
this algorithm to have the following properties:
Send a minimal number of bits across the wire
Be as simple as possible
Be verifiably correct. We couldn't afford to have 40k-70k of broken clients if the algorithm malfunctioned.
to the previous: Have safeguards built-in so that if a bad update site
is ever generated, the whole thing fails fast--before *any* bits are
ever sent to clients.
You will notice some conflicts in the
requirements above. In particular, the requirement to send the fewest
number of bits over the wire conflicts with the requirements to be as
simple as possible and to be verifiably correct. The result was that
our algorithm sacrifices some efficiency in the number of bits it sends
over the wire in order to be simpler and more verifiably correct.
result: In our application, going from version 1.0.0 to version 1.0.1
using update site features resulted in about a 10 meg download. After
applying our algorithm, the download size shrunk to about 3 megs. This
is still large for some of our users who utilize a dial-up connection,
but it is doable, whereas the 10 meg download would simply be
Here is the algorithm:
First, we simplify
the problem. Instead of upgrading users from an arbitrary version V(n)
to V(n+delta), we only upgrade them from V(n) to V(n+1). Going to
V(n+delta) is simply an iteration of the algorithm over delta steps. This simplification works fine for our user population. If it would be acceptable for the general Eclipse population is an open question.
we notice that sending JAR'd plugins is very expensive. Suppose only
two classes in a large plugin change: the whole plugin must still be
Our algorithm then is simple. We dynamically generate
a function F where F(P(V(n))) = P(V(n+1)). F is really just a zipped
collection of files that encodes just the files that changed between
the two plugin versions.
This is accomplished using the following algorithm. For a given plugin P in two versions V(n) and V(n+1):
Unpack P(V(n)) and P(V(n+1)) into temporary directories.
files that correspond to each other in the two plugins, MD5 hash them,
and if the hashes are different, put the file from P(V(n+1)) into F.
Include new files from P(V(n+1)) in F.
Create a text file manifest listing files in P(V(n)) that are no longer included in P(V(n+1)) and include that text file in F.
Package all of the files in F in a zip file.
I believe that the algorithm for applying this function F to P(V(n)) to get a P(V(n+1)) is self-evident.
approach has advantages over binary-diffing the two plugin jars P(V(n))
and P(V(n+1)): If the order of the files contained in the plugin
changes between versions, a naive binary diff algorithm will not
achieve nearly the same amount of "compression" that we achieve.
we have described a simple algorithm that is able to dramatically
reduce download bandwidth requirements over using update sites.
However, we have not described how we achieved the "fail-fast"
requirement. If a flawed update site is ever generated, we want to
detect this automatically at build time rather than after 50,000
clients have already downloaded it and broken their installs.
Here is the algorithm we use:
After a build has completed, we have the following artifacts on our build server:
P(V(n)) - the old version P(V(n+1)) - the original new version generated by PDEBuild F - the function (basically an intelligent binary diff) we generated that we can apply to P(V(n)) to get P(V(n+1))
The approach we take to validate F is the following:
Apply F to P(V(n)) on the build server using the same code that is running on the clients.
Let's call the result of applying F to P(V(n)), P(V(n+1))' (notice the prime there). So formally, F(P(V(n))) = P(V(n+1))'
We need to prove that P(V(n+1))' is the same as P(V(n+1)), the version originally generated by PDEBuild.
do this, we simply unpack P(V(n+1))' and P(V(n+1)) into temporary
directories and perform an MD5 hash on all files, making sure that all
files present in one are also present in the other and that their hashes are equal.
F to P(V(n)) on the build server using the same code as is running on
the clients produces a P(V(n+1))' identical to P(V(n+1)) on the server,
then it should work on the clients as well.
implementation, all update site building on the build server was done
in Ruby. The effort to implement the server side of this algorithm in
Ruby was about three days. The effort to implement the client-side
(fully unit-tested, pair-programmed, one pair programming team working
on it full-time) in Java was about a month. This was due to the hassle
of working with Java input and output streams. However, the result was
that the Java code worked entirely in memory, piping input streams into
output streams and so on until only the final results were written to
disk. Thus, the Java solution would be immune to breakage resulting
from something on the file system being changed in a way we did not