RE: [provisioning-dev] Improved update generation/distribution algorithm
Hi David:

Mighty interesting! Would the code be available somewhere for review?

Cordially,
--
Cheers,
Philippe
http://easyeclipse.org - http://phpeclipse.net - http://eclipse.org/atf
The current feature-based update sites can get huge quickly when small numbers of changes are scattered across a large number of features. This kind of situation happens easily when releasing a dot-level or patch-level upgrade (e.g., 1.0.0 to 1.0.1) where lots of small bugs are fixed.
The purpose of this email is to open a dialog and to outline our solution. Apologies in advance to those who are less comfortable with algorithms expressed in formal math; time constraints have dictated communicating in the most compact and efficient way possible.
The algorithm we developed worked as follows:

The build server maintains copies of all production versions of the application. In the general case, then, we need an algorithm to upgrade a given user from a version V(n) to a version V(n+delta), where delta is the number of versions that have elapsed since the user last upgraded. We want this algorithm to have the following properties:
- Send a minimal number of bits across the wire.
- Be as simple as possible.
- Be verifiably correct. We couldn't afford to have 40,000-70,000 broken clients if the algorithm malfunctioned.
- Related to the previous: have safeguards built in so that if a bad update site is ever generated, the whole thing fails fast, before *any* bits are ever sent to clients.
You will notice some conflicts in the requirements above. In particular, the requirement to send the fewest bits over the wire conflicts with the requirements to be as simple as possible and to be verifiably correct. As a result, our algorithm sacrifices some efficiency in the number of bits it sends over the wire in order to be simpler and easier to verify.
The result: in our application, going from version 1.0.0 to version 1.0.1 using update site features resulted in about a 10 MB download. After applying our algorithm, the download size shrank to about 3 MB. This is still large for those of our users on dial-up connections, but it is doable, whereas the 10 MB download would simply be prohibitive.
Here is the algorithm:

First, we simplify the problem. Instead of upgrading users from an arbitrary version V(n) to V(n+delta), we only upgrade them from V(n) to V(n+1). Going to V(n+delta) is simply an iteration of the algorithm over delta steps. This simplification works fine for our user population; whether it would be acceptable for the general Eclipse population is an open question.
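
To make the iteration concrete, here is a minimal Java sketch of the outer loop. The names fetchPatch and applyOneStep are hypothetical stand-ins for the transport and patch-application machinery described later in this message, not our actual code:

    import java.io.File;

    public class IterativeUpgrade {

        // Hypothetical stand-in: locate the patch F that takes V(from) to V(to).
        static File fetchPatch(int from, int to) {
            return new File("patches/F-" + from + "-to-" + to + ".zip");
        }

        // Hypothetical stand-in: apply F(P(V(n))) = P(V(n+1)) for each
        // plugin in the install; the real steps are described below.
        static void applyOneStep(File installDir, File patch) {
        }

        // Upgrading V(current) to V(target) is just the single-step
        // algorithm iterated delta = target - current times.
        public static void upgrade(File installDir, int current, int target) {
            for (int v = current; v < target; v++) {
                applyOneStep(installDir, fetchPatch(v, v + 1));
            }
        }
    }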
Second, we notice that sending JAR'd plugins is very expensive. Suppose only two classes in a large plugin change: the whole plugin must still be shipped.
Our algorithm, then, is simple. We dynamically generate a function F where F(P(V(n))) = P(V(n+1)). F is really just a zipped collection encoding only the files that changed between the two plugin versions.
This is accomplished using the following algorithm, sketched in code below. For a given plugin P in two versions V(n) and V(n+1):

- Unpack P(V(n)) and P(V(n+1)) into temporary directories.
- For files that correspond to each other in the two plugins, MD5-hash them, and if the hashes differ, put the file from P(V(n+1)) into F.
- Include new files from P(V(n+1)) in F.
- Create a text-file manifest listing the files in P(V(n)) that are no longer included in P(V(n+1)), and include that manifest in F.
- Package all of the files in F in a zip file.
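
To make this concrete, here is an illustrative Java sketch of the diff-generation step. Our server-side implementation was actually in Ruby, and names like buildDelta and the DELETED.txt manifest name are assumptions for illustration:

    import java.io.IOException;
    import java.nio.file.*;
    import java.security.MessageDigest;
    import java.util.*;
    import java.util.stream.Stream;

    public class DeltaBuilder {

        static byte[] md5(Path file) throws Exception {
            return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
        }

        // All regular files under root, as paths relative to root.
        static Set<Path> relativeFiles(Path root) throws IOException {
            Set<Path> result = new TreeSet<>();
            try (Stream<Path> s = Files.walk(root)) {
                s.filter(Files::isRegularFile).forEach(p -> result.add(root.relativize(p)));
            }
            return result;
        }

        // oldDir = unpacked P(V(n)), newDir = unpacked P(V(n+1)),
        // deltaDir = staging area for F, zipped in a later step.
        public static void buildDelta(Path oldDir, Path newDir, Path deltaDir) throws Exception {
            Files.createDirectories(deltaDir);
            Set<Path> oldFiles = relativeFiles(oldDir);
            Set<Path> newFiles = relativeFiles(newDir);

            // Changed or brand-new files: ship the V(n+1) copy in F.
            for (Path rel : newFiles) {
                boolean changed = !oldFiles.contains(rel)
                        || !MessageDigest.isEqual(md5(oldDir.resolve(rel)), md5(newDir.resolve(rel)));
                if (changed) {
                    Path target = deltaDir.resolve(rel);
                    Files.createDirectories(target.getParent());
                    Files.copy(newDir.resolve(rel), target, StandardCopyOption.REPLACE_EXISTING);
                }
            }

            // Files dropped in V(n+1): list them so the client can delete them.
            List<String> deleted = new ArrayList<>();
            for (Path rel : oldFiles) {
                if (!newFiles.contains(rel)) {
                    deleted.add(rel.toString());
                }
            }
            Files.write(deltaDir.resolve("DELETED.txt"), deleted);
        }
    }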
I believe the algorithm for applying this function F to P(V(n)) to get P(V(n+1)) is self-evident.
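
For completeness, here is a rough sketch of that apply step, under the same illustrative assumptions as the generation sketch above:

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class DeltaApplier {

        // deltaDir is the unpacked F; pluginDir is the unpacked P(V(n)).
        // After this runs, pluginDir should match an unpacked P(V(n+1)).
        public static void apply(Path deltaDir, Path pluginDir) throws IOException {
            // 1. Delete the files named in the deletion manifest.
            Path manifest = deltaDir.resolve("DELETED.txt");
            if (Files.exists(manifest)) {
                for (String rel : Files.readAllLines(manifest)) {
                    Files.deleteIfExists(pluginDir.resolve(rel));
                }
            }

            // 2. Copy every changed or new file from F over P(V(n)).
            List<Path> files;
            try (Stream<Path> s = Files.walk(deltaDir)) {
                files = s.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path p : files) {
                Path rel = deltaDir.relativize(p);
                if (rel.toString().equals("DELETED.txt")) continue; // manifest, not payload
                Path target = pluginDir.resolve(rel);
                Files.createDirectories(target.getParent());
                Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }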
This approach has advantages over binary-diffing the two plugin JARs P(V(n)) and P(V(n+1)): if the order of the files contained in the plugin changes between versions, a naive binary diff algorithm will not achieve nearly the same amount of "compression" that we achieve.
So we have described a simple algorithm that dramatically reduces download bandwidth requirements relative to plain update sites. However, we have not yet described how we achieved the fail-fast requirement. If a flawed update site is ever generated, we want to detect this automatically at build time rather than after 50,000 clients have already downloaded it and broken their installs.
Here is the algorithm we use. After a build has completed, we have the following artifacts on our build server:

- P(V(n)): the old version
- P(V(n+1)): the original new version generated by PDEBuild
- F: the function (basically an intelligent binary diff) we generated, which we can apply to P(V(n)) to get P(V(n+1))
The approach we take to validate F is the following (a sketch follows the list):

- Apply F to P(V(n)) on the build server, using the same code that is running on the clients.
- Call the result of applying F to P(V(n)) P(V(n+1))' (notice the prime there). So formally, F(P(V(n))) = P(V(n+1))'.
- We need to prove that P(V(n+1))' is the same as P(V(n+1)), the version originally generated by PDEBuild.
- To do this, we simply unpack P(V(n+1))' and P(V(n+1)) into temporary directories and MD5-hash all files, making sure that every file present in one is also present in the other and that their hashes are equal.
- If applying F to P(V(n)) on the build server, using the same code as is running on the clients, produces a P(V(n+1))' identical to P(V(n+1)) on the server, then it should work on the clients as well.
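
Here is an illustrative Java sketch of the tree-comparison step (names like hashTree and identical are assumptions, not our actual code):

    import java.nio.file.*;
    import java.security.MessageDigest;
    import java.util.*;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class BuildVerifier {

        // MD5 of every regular file under root, keyed by relative path.
        static Map<String, byte[]> hashTree(Path root) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            List<Path> files;
            try (Stream<Path> s = Files.walk(root)) {
                files = s.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            Map<String, byte[]> hashes = new TreeMap<>();
            for (Path p : files) {
                hashes.put(root.relativize(p).toString(), md.digest(Files.readAllBytes(p)));
            }
            return hashes;
        }

        // True only if both trees hold exactly the same files with equal
        // hashes; any difference means the update site must not ship.
        public static boolean identical(Path rebuilt, Path original) throws Exception {
            Map<String, byte[]> a = hashTree(rebuilt);
            Map<String, byte[]> b = hashTree(original);
            if (!a.keySet().equals(b.keySet())) {
                return false;
            }
            for (String rel : a.keySet()) {
                if (!MessageDigest.isEqual(a.get(rel), b.get(rel))) {
                    return false;
                }
            }
            return true;
        }
    }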
In our implementation, all update site building on the build server was done in Ruby. The effort to implement the server side of this algorithm in Ruby was about three days. The effort to implement the client side (fully unit-tested, pair-programmed, one pair-programming team working on it full-time) in Java was about a month, largely due to the hassle of working with Java input and output streams. However, the result was that the Java code worked entirely in memory, piping input streams into output streams and so on, until only the final results were written to disk. Thus, the Java solution would be immune to breakage resulting from something on the file system being changed in a way we did not expect.
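
As an illustration of that in-memory style (an assumption of how such code might look, not our actual client code), a zip entry can be consumed entirely in memory, with nothing intermediate touching the file system:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    public class InMemoryUnzip {

        // Reads a single named entry out of a zip stream without touching
        // the file system; returns null if the entry is absent.
        public static byte[] readEntry(InputStream zipStream, String entryName) throws IOException {
            try (ZipInputStream zin = new ZipInputStream(zipStream)) {
                for (ZipEntry e; (e = zin.getNextEntry()) != null; ) {
                    if (e.getName().equals(entryName)) {
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        byte[] buf = new byte[8192];
                        for (int n; (n = zin.read(buf)) != -1; ) {
                            out.write(buf, 0, n);
                        }
                        return out.toByteArray();
                    }
                }
            }
            return null;
        }
    }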
Let me know if you have any questions.

Regards,

Dave Orme