Re: [eclipse-mirrors] Re: [carsten@xxxxxxxxx: [ftpsync-eclipse-ftp@ftp] (28382) rsync ERROR on 2010.05.28-09:29:48]

On 05/31/2010 10:03 AM, Carlos Carvalho wrote:
Eclipse Webmaster (Denis Roy) (webmaster@xxxxxxxxxxx) wrote on 31 May 2010 09:11:
 >1. You will occasionally get permissions errors on your RSYNC.  We have 
 >almost 1000 committers that can put files in the download area.  We set 
 >the umasks to allow for world-readable by default, but with 1000 
 >committers, something is bound to happen.  If you can ignore the 
 >permissions errors, I'd suggest you do so.

There are several issues here. First, permission errors can only be
ignored to a limited extent.

Carlos,

While not 100% ideal, many mirrors have been running rsync against us with its output sent to /dev/null for many years.  I don't see why others can't do the same, but I'm sure you will tell me.  If there is a major malfunction with our mirror system, I will notice it quickly enough and fix it.
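
For mirror admins who would rather not discard everything, here is a minimal sketch of a wrapper that tolerates only the "partial transfer" outcomes and still reports anything else.  It relies on rsync's documented exit codes 23 (partial transfer due to error, which is what unreadable files produce) and 24 (source files vanished).  The function names and the rsync options chosen are assumptions made for illustration, not what any particular mirror actually runs.

```python
import subprocess
import sys

# Exit codes documented in the rsync man page.
PARTIAL_TRANSFER_ERROR = 23  # some files could not be transferred (e.g. permission denied)
VANISHED_SOURCE_FILES = 24   # source files disappeared while rsync was running

TOLERATED = {0, PARTIAL_TRANSFER_ERROR, VANISHED_SOURCE_FILES}


def is_tolerable(exit_code):
    """Return True for rsync outcomes a mirror can reasonably ignore."""
    return exit_code in TOLERATED


def run_mirror(source, dest):
    """Run rsync quietly (hypothetical helper).

    Returns 0 for tolerated outcomes; otherwise prints rsync's stderr
    and returns rsync's own exit code.
    """
    result = subprocess.run(
        ["rsync", "-a", source, dest],
        stdout=subprocess.DEVNULL,  # same effect as "> /dev/null"
        stderr=subprocess.PIPE,
        text=True,
    )
    if is_tolerable(result.returncode):
        return 0
    sys.stderr.write(result.stderr)
    return result.returncode
```

Called from cron, run_mirror() stays silent on permission errors but still surfaces real failures such as an unreachable host.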


Second issue, your repository construction is wrong. You should not
allow committers direct access to it, exactly because it's impossible
to make sure permissions (and other things) will be correct. You
should pull commits from a staging area to the master archive. This is
pretty obvious and is what's done by the vast majority of software
distributions...
  
Please be cautious if you're comparing Eclipse.org to Fedora, or other one-project shops where a single dedicated build team produces output for a single (or very few) project(s).  Eclipse is closer in concept to Sourceforge and Apache, where each project does its own little thing.  When you say "you should pull commits", who is "you" exactly?

I had a look at Apache's environment[1], and we seem to be set up similarly.  Their mirror is much, much smaller, however -- max. 40G.  When Eclipse.org was that size we could run fix scripts every 10 minutes.  But despite the similarities, Eclipse.org is conceptually different:

- Most of the 200+ Eclipse projects are related to a single platform project -- Eclipse itself.
- Eclipse.org produces binaries[2] that users can download and install, and for multiple OS platforms, not just source tarballs.  This consumes tons of space.
- You can "download Eclipse" as a product, but you can't really "download Apache" or "download Sourceforge".
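
As an illustration of the kind of "fix script" mentioned above, here is a rough sketch that walks a download tree and adds the missing read bits.  This is not our actual script; the function name is made up, and the choice to add group and other read (plus search permission on directories) is an assumption about what "world-readable" should mean here.

```python
import os
import stat

READ_BITS = stat.S_IRGRP | stat.S_IROTH    # g+r, o+r for files
SEARCH_BITS = stat.S_IXGRP | stat.S_IXOTH  # g+x, o+x so directories can be entered


def make_world_readable(root):
    """Add read bits to files and read+search bits to directories under root.

    Symlinks are skipped.  Returns the number of entries whose mode changed.
    Must run as the owner of the files (or as root) for chmod to succeed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            if os.path.islink(path):
                continue
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            wanted = mode | READ_BITS
            if os.path.isdir(path):
                wanted |= SEARCH_BITS
            if wanted != mode:
                os.chmod(path, wanted)
                changed += 1
    return changed
```

The cost is one stat per entry on every pass, which is why a 10-minute schedule stops being practical as the tree grows.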

Of course, one could conceive a system where the downloads repository is protected, and projects must indirectly submit content to be replicated to it.  Perhaps that's what we need to do.  But there are definite blockers to enabling this, cost/benefit being one of them (inertia is costly to overcome in large organizations).  I've opened a bug against the Eclipse Architecture council to assist in resolving the issue.  But there are many other issues we can resolve before exploring this route.
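
To make the idea of a protected repository concrete, here is a minimal sketch of a "pull from staging" step: projects write to a staging directory, and a single privileged job copies the files into the master archive with fixed permissions.  The directory layout, the function name and the 0644/0755 modes are all assumptions for illustration; a real system would also need validation, locking and cleanup of the staging area.

```python
import os
import shutil

FILE_MODE = 0o644  # assumed policy: owner-writable, world-readable
DIR_MODE = 0o755   # assumed policy: world-searchable directories


def promote(staging, master):
    """Copy every regular file under staging into master with fixed modes.

    Symlinks and special files are skipped.  Returns the promoted paths,
    relative to master.
    """
    promoted = []
    for dirpath, _dirnames, filenames in os.walk(staging):
        rel = os.path.relpath(dirpath, staging)
        target_dir = os.path.normpath(os.path.join(master, rel))
        os.makedirs(target_dir, exist_ok=True)
        os.chmod(target_dir, DIR_MODE)
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            dst = os.path.join(target_dir, name)
            shutil.copyfile(src, dst)  # copies contents only, not the source mode
            os.chmod(dst, FILE_MODE)
            promoted.append(os.path.relpath(dst, master))
    return promoted
```

Because the copy never carries over the committer's mode bits, whatever umask a committer used in staging has no effect on what the mirrors see.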

Third, your cluster is not suitable for mirroring. It is very good for
building but inefficient for mirroring because of the content
duplication in ram. The archive master should be a single machine with
at least 16GB of ram, or the repo should be split among the different
machines.
  
Since February 2010, download.eclipse.org has been residing on a single machine with 64G of RAM.  If you have not seen any difference in mirroring efficiency since, then perhaps the inefficiencies are not as substantial as you claim.


[1] http://apache.org/dev/mirrors.html
[2] http://download.eclipse.org/eclipse/downloads/drops/S-3.6RC3-201005271700/index.php#EclipseSDK
