AW: AW: AW: [provisioning-dev] Improved update generation/distribution algorithm

Pascal,
 
The delta mechanism is independent of the transport layer. But we currently only use HTTP/S.
 
Stefan

________________________________

From: provisioning-dev-bounces@xxxxxxxxxxx on behalf of Pascal Rapicault
Sent: Wed 02.05.2007 20:51
To: Developer discussions for provisioning (Update Manager) initiatives
Cc: provisioning-dev@xxxxxxxxxxx; provisioning-dev-bounces@xxxxxxxxxxx
Subject: Re: AW: AW: [provisioning-dev] Improved update generation/distribution algorithm



Stefan,

Does your delta mechanism only work with HTTP/HTTPS, or is it transport
agnostic?
Thx

PaScaL




                                                                          
             "Liebig, Stefan "                                            
             <Stefan.Liebig@co                                            
             mpeople.de>                                                To
             Sent by:                  <provisioning-dev@xxxxxxxxxxx>     
             provisioning-dev-                                          cc
             bounces@eclipse.o                                            
             rg                                                    Subject
                                       AW: AW: [provisioning-dev] Improved
                                       updategeneration/distributionalgori
             05/02/2007 05:23          thm                                
             AM                                                           
                                                                          
                                                                          
             Please respond to                                            
                "Developer                                                
              discussions for                                             
               provisioning                                               
                 \(Update                                                 
                 Manager\)                                                
               initiatives"                                               
             <provisioning-dev                                            
               @eclipse.org>                                              
                                                                          
                                                                          




Dave,

To be more specific: I took the sources of JUnit 3.8.1, created an
Eclipse project for it and let Eclipse create the jar(s) for me.

'Original' size of the jar: 119,959 bytes

Then I made some modifications and generated the deltas
between the jars:

- fixed an unnecessary cast in junit.swingui.TestTreeModel
  Size of changed TestTreeModel.class: 4959 bytes
  Resulting delta to original: 1534 bytes

- added a simple bean with 4 properties (setters/getters)
  Jar size: 120,604 bytes
  Bean.class size: 1204 bytes
  Resulting delta to original: 1173 bytes

- changed the name of an interface method from
  junit.framework.Test.run() to junit.framework.Test.lauf();
  this affects 7 classes
  Jar size: 120,020 bytes
  Resulting delta to original: 7889 bytes

We transmit (download) each delta of a *changed* resource with a single
HTTP/S request/response. The above deltas are the raw data; you have to
add some bytes for the HTTP protocol overhead and a few bytes of type
information so that the client can distinguish between different patch
types.
If the server is asked for a delta (patch) but is no longer able to
generate it because the original version is no longer available (removed
by an admin to save space, ...), the server responds with a compressed
copy of the latest version. The type information tells the client how to
handle the 'patch' (not necessarily a delta!).
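
For illustration only, here is a rough client-side sketch in Java of what
such a request/response could look like. The URL layout and the
"X-Patch-Type" header are invented for this sketch; they are not our
actual protocol.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical client-side sketch: one HTTP request per changed resource.
public class DeltaClient {

    public static void fetchPatch(String serverBase, String resourceName,
                                  String oldMd5, String newMd5) throws Exception {
        URL url = new URL(serverBase + "/delta?resource=" + resourceName
                + "&from=" + oldMd5 + "&to=" + newMd5);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String patchType = conn.getHeaderField("X-Patch-Type");
        try (InputStream body = conn.getInputStream()) {
            if ("delta".equals(patchType)) {
                // body is a binary delta to apply to the installed old version
            } else {
                // body is a compressed copy of the complete latest version,
                // sent when the server can no longer generate a delta
            }
        }
    }
}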

I hope that this information is helpful.

Stefan


From: provisioning-dev-bounces@xxxxxxxxxxx on behalf of David J. Orme
Sent: Fri 27.04.2007 17:43
To: Developer discussions for provisioning
Subject: Re: AW: [provisioning-dev] Improved update
generation/distribution algorithm

Thanks for the data.  What I was hoping you could do was give specific data
on the mailing list about a delta you ran: the size of the application,
the version numbers, the size of the download, and so on.  It's probably
more important to know the type of upgrade (a minor version, a patch, etc.)
and the size of the code base than exact numbers, although exact numbers
are useful too.

We have one use-case; you have another.  The more use-cases we can discuss
on the list, the better the solution we eventually evolve is likely to be.

BTW, in the abstract, I *really* like your algorithm.  It sounds simpler
than ours and as you point out it can deal with upgrading JREs, not just
plugins/bundles.


Regards,

Dave Orme
----- Original Message -----
From: Stefan Liebig <Stefan.Liebig@xxxxxxxxxxxx>
To: provisioning-dev@xxxxxxxxxxx
Sent: Friday, April 27, 2007 0:27:41 AM GMT-0800
Subject: AW: [provisioning-dev] Improved update
generation/distribution algorithm

Well, that depends on the changes you make between the two versions of a
resource.
As we are able to update complete JREs, we have measured a patch data size
of about 24 MB when updating from JRE 1.5.0.10 (~70 MB) to JRE 1.6.0 (~87 MB).

I can 'calculate' the patch size between two jars for you. You may send
them to me, or you can choose some 'well-known' jars which are publicly
available.

Stefan

From: provisioning-dev-bounces@xxxxxxxxxxx on behalf of David J. Orme
Sent: Thu 26.04.2007 17:39
To: Developer discussions for provisioning
Subject: Re: [provisioning-dev] Improved update
generation/distribution algorithm

This is really interesting too.  What kind of space reduction measurements
have you gotten?


Regards,

Dave
----- Original Message -----
From: Stefan Liebig <Stefan.Liebig@xxxxxxxxxxxx>
To: provisioning-dev@xxxxxxxxxxx
Sent: Thursday, April 26, 2007 3:41:50 AM GMT-0800
Subject: Re: [provisioning-dev] Improved update generation/distribution
algorithm

I think the approach taken by us (
http://wiki.eclipse.org/index.php/GPW_Lightning_Retrospectives#Smartup_software_update
) is somewhat similar. At least we have the same requirements. It would be
nice if we could join our efforts. To that end, I will give you a brief
outline of our mechanism/algorithm.
First, our algorithm updates resources from V(n) to V(n+delta) with
delta > 0 in a single step. Resources might be jars, zips, dlls, docs, ...
Since our solution is not (yet) Eclipse based, we do not have plugin jars
like Eclipse has. But this is not a general restriction of our mechanism.

Our mechanism is based on delta encodings (
http://en.wikipedia.org/wiki/Delta_encoding).
In our particular case we have decided to use the xdelta algorithm (
http://en.wikipedia.org/wiki/Xdelta), which works quite well for many
types of data. However, it performs very badly when 'diffing' compressed
data like zips and jars.
Because of that we unpack zips (in memory) into something similar to a tar
and perform the 'binary diffing' on the different versions of these 'tars'.
The resulting delta (or patch) is then zipped (it really gets smaller!) and
sent over the wire. 'Applying' this delta to the 'old' version creates the
'new' version of the 'tarred' resource. The resulting version is then
converted back into a zip.
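
To make the 'unpack and diff' step more concrete, here is a rough Java
sketch of flattening a zip/jar into a single uncompressed byte stream
before handing it to the binary differ. It is illustrative only; our real
'tar-like' format also records entry sizes and metadata so that the
archive can be rebuilt afterwards.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Flatten a zip/jar into one uncompressed byte stream (entry name plus
// entry bytes) so that xdelta sees raw class files instead of deflate data.
public class ZipFlattener {

    public static byte[] flatten(InputStream zippedData) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zip = new ZipInputStream(zippedData)) {
            ZipEntry entry;
            byte[] buffer = new byte[8192];
            while ((entry = zip.getNextEntry()) != null) {
                out.write(entry.getName().getBytes("UTF-8"));
                out.write(0); // crude separator between name and content
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
        }
        return out.toByteArray();
    }
}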
For this mechanism to work properly, it is required that versions of
resources can be uniquely identified. For this identification we use a
combination of the resource name and the MD5 hash of the resource. The
resource name is similar to the artifactId in Maven's project.xml, i.e.
the name of the resource without any version-specific parts in it.
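
A minimal sketch of computing such an identifier in Java (illustrative,
not our actual code):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

// Version-independent resource identifier: resource name + MD5 of the bytes.
public class ResourceId {

    public static String idFor(String resourceName, File file) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return resourceName + ":" + hex;
    }
}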

The simplified update flow is:
- the client somehow gets the information that a resource should be
  updated to a newer one
- it therefore asks the update server for the delta between the old and
  the new version
- the update server generates the delta (or has it already generated and
  cached) and sends it back
- the client applies this delta to the old version and gets the new version
This happens for all required resources; a rough sketch follows below.
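
In Java-like terms the flow looks roughly like this (all types are
invented for illustration and do not correspond to our actual API):

// Rough sketch of the client-side update flow described above.
public class UpdateFlow {

    interface Patch {
        boolean isDelta();
        byte[] applyTo(byte[] oldBytes);   // delta case
        byte[] fullResource();             // fallback case (complete zipped resource)
    }

    interface UpdateServer {
        boolean hasNewerVersion(String resourceId);
        Patch requestDelta(String resourceId);
    }

    interface InstalledResource {
        String id();                       // resource name + MD5, see above
        byte[] bytes();
        void replaceWith(byte[] newBytes);
    }

    public static void updateAll(UpdateServer server, Iterable<InstalledResource> resources) {
        for (InstalledResource old : resources) {
            if (!server.hasNewerVersion(old.id())) {
                continue;                                 // already up to date
            }
            Patch patch = server.requestDelta(old.id());  // generated or taken from the cache
            byte[] newBytes = patch.isDelta()
                    ? patch.applyTo(old.bytes())          // apply the delta locally
                    : patch.fullResource();               // server sent the complete resource
            old.replaceWith(newBytes);
        }
    }
}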

Our update server is an active server component realized as a servlet. It
manages all resources in their different versions. The server is capable of
generating deltas in advance and keeping them in a cache for faster
response times. If a requested delta is not available, it is generated on
demand.
If a delta is requested for a 'very' old version of a resource which is
no longer available on the server (because it has been removed by an
administrator), the server will respond with a zipped version of the
complete resource. Every delta/patch sent to the client has a type that
tells the client how to handle the 'delta'.
The situation that the server does not have a base (old) version for
generating a delta may also be the result of a manipulation of a resource
on the client side. The identification of such a manipulated resource
(resource name + MD5 hash) will most likely not reference a resource on the
server. This case will also result in returning a complete (zipped)
resource.
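
For illustration, here is a sketch of that server-side decision in Java.
The type names (ResourceStore, DeltaEncoder, ...) are invented; the real
implementation is a servlet and is organized differently.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Decide between a cached/generated delta and the full-resource fallback.
public class DeltaServer {

    interface ResourceStore {
        byte[] get(String resourceId);     // null if that version is no longer available
        byte[] latestZipped(String name);  // complete, compressed latest version
    }

    interface DeltaEncoder {
        byte[] encode(byte[] base, byte[] target);  // e.g. xdelta on the flattened form
    }

    public static class TypedPatch {
        public final String type;   // "delta" or "full"
        public final byte[] data;
        public TypedPatch(String type, byte[] data) { this.type = type; this.data = data; }
    }

    private final ResourceStore store;
    private final DeltaEncoder encoder;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<String, byte[]>();

    public DeltaServer(ResourceStore store, DeltaEncoder encoder) {
        this.store = store;
        this.encoder = encoder;
    }

    public TypedPatch patchFor(String name, String oldId, String newId) {
        byte[] base = store.get(oldId);
        if (base == null) {
            // base version removed by an admin, or manipulated on the client:
            // no delta possible, fall back to the complete zipped resource
            return new TypedPatch("full", store.latestZipped(name));
        }
        String key = oldId + "->" + newId;
        byte[] delta = cache.get(key);
        if (delta == null) {
            delta = encoder.encode(base, store.get(newId));  // generate on demand
            cache.put(key, delta);                           // keep for faster responses
        }
        return new TypedPatch("delta", delta);
    }
}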

As I already mentioned, it seems that Dave and we have similar requirements
but solve them differently. The challenge will be to provide, with the new
provisioning, a base into which different mechanisms can be plugged. There
might also be a generalized delta layer into which Dave's and our
solutions could be plugged.

Comments, questions?

Stefan

From: provisioning-dev-bounces@xxxxxxxxxxx on behalf of David J. Orme
Sent: Thu 26.04.2007 00:22
To: provisioning-dev@xxxxxxxxxxx
Subject: [provisioning-dev] Improved update generation/distribution
algorithm

The current feature-based update sites can get huge quickly when small
numbers of changes are scattered across a large number of features.  This
kind of situation happens easily when releasing a dot-level or a patch-level
upgrade, i.e. 1.0.0 to 1.0.1, where lots of small bugs are fixed.

The purpose of this email is to open a dialog and to outline our solution.
Apologies in advance for those who are less comfortable with expressing
algorithms in formal math.  Time constraints have dictated communication
using the most compact and efficient means possible.

The algorithm we developed worked as follows:

The build server maintains copies of all production versions of the
application.  In the general case, then, we need an algorithm to upgrade a
given user from a version V(n) to a version V(n+delta) where delta is the
number of versions that have elapsed since the user last upgraded.  We want
this algorithm to have the following properties:
      - Send a minimal number of bits across the wire
      - Be as simple as possible
      - Be verifiably correct.  We couldn't afford to have 40k-70k broken
        clients if the algorithm malfunctioned.
      - Related to the previous: have safeguards built in so that if a bad
        update site is ever generated, the whole thing fails fast, before
        *any* bits are ever sent to clients.
You will notice some conflicts in the requirements above.  In particular,
the requirement to send the fewest number of bits over the wire conflicts
with the requirements to be as simple as possible and to be verifiably
correct.  The result was that our algorithm sacrifices some efficiency in
the number of bits it sends over the wire in order to be simpler and more
verifiably correct.

The result: In our application, going from version 1.0.0 to version 1.0.1
using update site features resulted in about a 10 meg download.  After
applying our algorithm, the download size shrank to about 3 megs.  This is
still large for some of our users who are on dial-up connections, but it
is doable, whereas the 10 meg download would simply be prohibitive.

Here is the algorithm:

First, we simplify the problem.  Instead of upgrading users from an
arbitrary version V(n) to V(n+delta), we only upgrade them from V(n) to
V(n+1).  Going to V(n+delta) is simply an iteration of the algorithm over
delta steps.  This simplification works fine for our user population.
Whether it would be acceptable for the general Eclipse population is an
open question.
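
In code the iteration itself is trivial; here is a minimal sketch where
upgradeOneStep() stands in for the per-version mechanism described below:

// Reduce the V(n) -> V(n+delta) upgrade to delta single-step upgrades.
public class SteppedUpgrade {

    interface Upgrader {
        void upgradeOneStep(int fromVersion);  // applies V(from) -> V(from+1)
    }

    public static void upgrade(Upgrader upgrader, int installed, int latest) {
        for (int v = installed; v < latest; v++) {
            upgrader.upgradeOneStep(v);  // iterate the single-step algorithm
        }
    }
}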

Second, we notice that sending JAR'd plugins is very expensive.  Suppose
only two classes in a large plugin change: the whole plugin must still be
shipped.

Our algorithm then is simple.  We dynamically generate a function F where
F(P(V(n))) = P(V(n+1)).  F is really just a zipped collection of files that
encodes just the files that changed between the two plugin versions.

This is accomplished using the following algorithm.  For a given plugin P
in two versions V(n) and V(n+1):
   1. Unpack P(V(n)) and P(V(n+1)) into temporary directories.
   2. For files that correspond to each other in the two plugins, MD5 hash
      them, and if the hashes are different, put the file from P(V(n+1))
      into F.
   3. Include new files from P(V(n+1)) in F.
   4. Create a text file manifest listing files in P(V(n)) that are no
      longer included in P(V(n+1)) and include that text file in F.
   5. Package all of the files in F in a zip file.
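
Here is a rough Java sketch of steps 2 through 5, assuming step 1 has
already unpacked both plugin versions into oldDir and newDir.  The manifest
name DELETED_FILES.txt is invented for this sketch, and our actual
build-side implementation was written in Ruby, so treat this purely as
illustration.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Build F: a zip containing changed/new files plus a list of deleted files.
public class DeltaBuilder {

    public static void buildF(File oldDir, File newDir, File deltaZip) throws Exception {
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(deltaZip))) {
            // steps 2 and 3: changed and new files from P(V(n+1)) go into F
            addChangedAndNew(oldDir, newDir, "", zip);
            // step 4: list files that exist in P(V(n)) but not in P(V(n+1))
            StringBuilder deleted = new StringBuilder();
            collectDeleted(oldDir, newDir, "", deleted);
            zip.putNextEntry(new ZipEntry("DELETED_FILES.txt"));
            zip.write(deleted.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } // step 5: the resulting zip file is F
    }

    private static void addChangedAndNew(File oldBase, File newCurrent, String prefix,
                                         ZipOutputStream zip) throws Exception {
        File[] children = newCurrent.listFiles();
        if (children == null) return;
        for (File newFile : children) {
            String relPath = prefix + newFile.getName();
            if (newFile.isDirectory()) {
                addChangedAndNew(oldBase, newFile, relPath + "/", zip);
            } else {
                File oldFile = new File(oldBase, relPath);
                if (!oldFile.isFile() || !md5(oldFile).equals(md5(newFile))) {
                    zip.putNextEntry(new ZipEntry(relPath));
                    Files.copy(newFile.toPath(), zip);
                    zip.closeEntry();
                }
            }
        }
    }

    private static void collectDeleted(File oldCurrent, File newBase, String prefix,
                                       StringBuilder deleted) {
        File[] children = oldCurrent.listFiles();
        if (children == null) return;
        for (File oldFile : children) {
            String relPath = prefix + oldFile.getName();
            if (oldFile.isDirectory()) {
                collectDeleted(oldFile, newBase, relPath + "/", deleted);
            } else if (!new File(newBase, relPath).isFile()) {
                deleted.append(relPath).append('\n');
            }
        }
    }

    private static String md5(File file) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}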
I believe that the algorithm for applying this function F to P(V(n)) to get
a P(V(n+1)) is self-evident.
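
For completeness, though, here is a matching sketch of applying F to an
unpacked P(V(n)), under the same assumptions as the previous block
(invented manifest name; illustrative rather than our actual client code):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Apply F to the unpacked old plugin directory to obtain P(V(n+1)).
public class DeltaApplier {

    public static void apply(File deltaZip, File pluginDir) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new FileInputStream(deltaZip))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                if ("DELETED_FILES.txt".equals(entry.getName())) {
                    // remove files that no longer exist in P(V(n+1))
                    BufferedReader reader = new BufferedReader(
                            new InputStreamReader(zip, StandardCharsets.UTF_8));
                    String line;
                    while ((line = reader.readLine()) != null) {
                        new File(pluginDir, line).delete();
                    }
                } else {
                    // changed or new file: overwrite the old copy
                    File target = new File(pluginDir, entry.getName());
                    target.getParentFile().mkdirs();
                    Files.copy(zip, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        // pluginDir now holds P(V(n+1)) and can be re-jarred if needed
    }
}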

This approach has advantages over binary-diffing the two plugin jars
P(V(n)) and P(V(n+1)): If the order of the files contained in the plugin
changes between versions, a naive binary diff algorithm will not achieve
nearly the same amount of "compression" that we achieve.

So we have described a simple algorithm that is able to dramatically reduce
download bandwidth requirements over using update sites.  However, we have
not described how we achieved the "fail-fast" requirement.  If a flawed
update site is ever generated, we want to detect this automatically at
build time rather than after 50,000 clients have already downloaded it and
broken their installs.

Here is the algorithm we use:

After a build has completed, we have the following artifacts on our build
server:

P(V(n)) - the old version
P(V(n+1)) - the original new version generated by PDEBuild
F - the function (basically an intelligent binary diff) we generated that
we can apply to P(V(n)) to get P(V(n+1))

The approach we take to validate F is the following:
   1. Apply F to P(V(n)) on the build server using the same code that is
      running on the clients.
   2. Let's call the result of applying F to P(V(n)), P(V(n+1))'  (notice
      the prime there).  So formally, F(P(V(n))) = P(V(n+1))'
   3. We need to prove that P(V(n+1))' is the same as P(V(n+1)), the
      version originally generated by PDEBuild.
   4. To do this, we simply unpack P(V(n+1))' and P(V(n+1)) into temporary
      directories and perform an MD5 hash on all files, making sure that
      all files present in one are also present in the other and that their
      hashes are equal.
   5. If applying F to P(V(n)) on the build server using the same code as
      is running on the clients produces a P(V(n+1))' identical to
      P(V(n+1)) on the server, then it should work on the clients as well.
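
That directory comparison is straightforward; here is a sketch in Java
(illustrative; the actual build-side check was part of our Ruby tooling):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Fail-fast check: both unpacked trees must contain the same files
// with equal MD5 hashes, in both directions.
public class FailFastCheck {

    public static boolean sameContents(File dirA, File dirB) throws Exception {
        Map<String, String> a = hashTree(dirA, "", new HashMap<String, String>());
        Map<String, String> b = hashTree(dirB, "", new HashMap<String, String>());
        return a.equals(b);  // same paths present and every hash equal
    }

    private static Map<String, String> hashTree(File dir, String prefix,
                                                Map<String, String> result) throws Exception {
        File[] children = dir.listFiles();
        if (children == null) return result;
        for (File file : children) {
            String relPath = prefix + file.getName();
            if (file.isDirectory()) {
                hashTree(file, relPath + "/", result);
            } else {
                result.put(relPath, md5(file));
            }
        }
        return result;
    }

    private static String md5(File file) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}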
In our implementation, all update site building on the build server was
done in Ruby.  The effort to implement the server side of this algorithm in
Ruby was about three days.  The effort to implement the client-side (fully
unit-tested, pair-programmed, one pair programming team working on it
full-time) in Java was about a month.  This was due to the hassle of
working with Java input and output streams.  However, the result was that
the Java code worked entirely in memory, piping input streams into output
streams and so on until only the final results were written to disk.  Thus,
the Java solution would be immune to breakage resulting from something on
the file system being changed in a way we did not expect.
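
For what it's worth, the "everything in memory" style looks roughly like
this (illustrative only, not our actual code): intermediate archives are
held in byte arrays and re-read through streams, so only the final result
ever touches the file system.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

// In-memory piping: read any stream into a byte array and reopen it as a zip.
public class InMemoryPiping {

    public static byte[] drain(InputStream in) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    public static ZipInputStream reopenAsZip(byte[] zipBytes) {
        // no temporary file is needed to inspect or transform the archive again
        return new ZipInputStream(new ByteArrayInputStream(zipBytes));
    }
}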

Let me know if you have any questions.


Regards,

Dave Orme
_______________________________________________
provisioning-dev mailing list
provisioning-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/provisioning-dev


