It's been assumed we'd not move "basebuilder" to Git, since it's a large amount of binary data and would not lend itself well to Git. But ... since I've been working on bug 395776 anyway, I've been doing some experiments ... and wonder if for safety/security it would make sense to convert now ... presumably converting only a subset of the past few releases, etc., to save space. I thought I'd open this bug just to discuss and document things I've learned (or am learning) about its size, etc., if nothing else.
Here's what I have learned so far.

The CVS repo of basebuilder is about 4G. Converting to Git takes it down to about 2G, and removing old tags, etc., takes it down to about 1.5G. That's using the same tools and filters as attached to bug 395776.

Obviously not something you'd want to work with on a daily basis (especially wouldn't want hundreds of people cloning it, for bandwidth reasons if nothing else), but ... working with it on my local machines was reasonable. It took only about 10 minutes or so to clone (over my wireless connection).

I think it's worth investigating if/how it performs on git.eclipse.org ... to see if it can provide a transition over the next few months and/or is feasible for extra long term use for patch builds of old releases, or something. I'll be sure to mark it as "experimental and temporary" until we know more.

Another potential bottleneck is that (I think) the feasible way to use it from Git is to use CGit and simply get an archive of the tagged version we want. That'd be about 100M. If CGit does a good job of caching, that should be ok ... but ... if it doesn't, that'd be another problem.

Webmasters, adding you to CC so you'll know I'm doing this experimenting. Let me know if over the next few days you notice anything on your end, such as some form of slow-down or inefficiencies.
I meant to add, as a point of comparison, the swt.binaries repo is about 0.5G.
Created attachment 224405: scripts and output from the conversion

The scripts are the same as attached to bug 395776, but these are the logs from converting basebuilder.

To repeat some of what I said in bug 395776: things tagged or branched as "temp" or "test" were filtered out, as were most things prior to 2008. I think most major tags from about Eclipse 3.4 forward were kept, but many marked "M1, M2, etc." were removed.

I'm sure many other branches/tags could be removed if we decide to keep this around for archival purposes, and presumably (if I understand Git correctly) that would free up more unnecessary stuff to be garbage collected eventually.
From my initial "tests", to clone the repo over my broadband connection takes about 45 minutes. To clone it on the build machine itself is fast, less than a minute.

To use cgit to get one tagged version, as we would for a build, takes only about a minute on my broadband and about 5 seconds on the build machine. That one tagged version is about 60M. For example,

wget --no-verbose -O basebuilder-R38M6PlusRC3D.zip http://git.eclipse.org/c/platform/eclipse.platform.releng.basebuilder.git/snapshot/eclipse.platform.releng.basebuilder-R38M6PlusRC3D.zip 2>&1

Since the times seem reasonable (except to clone the whole thing over broadband), I plan to test this in a build (Friday nightly), as I think it's feasible for at least a transitional step. Unless anyone objects. The change in build scripts is minimal.

One thought occurs to me ... I know some repos are cloned to github, but am not sure how that's determined (automatically? or do projects sign up?) ... but this would be one that should not be, because of its size, and because it is not useful or necessary ... it's just a "reproducible build" repo, not something for someone to contribute patches for, or fork. Is there a "blacklist" for github?
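The snapshot URL in the wget example follows a regular pattern, so a build script could derive it from the tag name. What follows is only a minimal sketch, assuming the pattern seen in that single URL holds for other tags; `snapshot_url` and the two constants are illustrative names, not part of the actual build scripts.

```python
CGIT_BASE = "http://git.eclipse.org/c"  # CGit root, taken from the wget example
REPO_PATH = "platform/eclipse.platform.releng.basebuilder.git"


def snapshot_url(tag, repo_path=REPO_PATH, base=CGIT_BASE):
    """Build a CGit snapshot URL for one tagged version (hypothetical helper)."""
    # In the example URL the archive is named after the repository
    # (without the ".git" suffix) followed by "-<tag>.zip".
    repo_name = repo_path.rsplit("/", 1)[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return f"{base}/{repo_path}/snapshot/{repo_name}-{tag}.zip"


if __name__ == "__main__":
    print(snapshot_url("R38M6PlusRC3D"))
```

Called with the tag R38M6PlusRC3D, this reproduces the URL used in the wget command.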
(In reply to comment #4)
> One thought occurs to me ... I know some repos are cloned to github, but am
> not sure how that's determined (automatically? or do projects signup?) ...
> but this would be one that should not be, because of its size, and because
> not useful or necessary ... its just a "reproducible build" repo, not
> something for someone to contribute patches for, or fork. Is there a
> "blacklist" for github?

The repositories to mirror come from the project metadata in the portal. If we don't want it cloned, we should just not list that repository in our portal metadata. That sounds reasonable because, like you said, it's not a real source repository and not meaningful for dash statistics, etc.

The portal metadata is aggregated into a single list that is consumed by github:
http://eclipse.org/projects/git-repos.php
I've converted the build scripts to use the Git version (via CGit) and a test build went ok. (The unit tests take a slightly different code path, so I will let that be tested during the nightly build.)

FWIW, I compared fetching 10 tagged versions with both cvs and cgit and, if anything, cgit was a little faster. Over my broadband, either method took between 2 and 3 minutes. On build.eclipse.org, the cgit method took about 10 seconds, and the cvs export took about 20 seconds (per tagged version). So that's encouraging.

I used this sample of tagged versions:

R38M6PlusRC3D
R38M6PlusRC3C
R36_RC4
R37_M7
R35_RC4
r34x_v20120319
r35x_v20120319
r36x_v20120306
r36x_v20120306
R3_7_maintenance (a branch)

For all, the cvs and git versions were the same except for two minor things I hadn't seen before. First, the ".cvsignore" file was not transitioned for sub-directories (I know they are for _main_ directories or modules). Second (and it makes obvious sense), a few files had "cvs/rcs directives" such as $Id$. Those are literally '$Id$' in git checkouts, but are "filled in" (with filename, date, author) when checked out from cvs. In other words, no significant differences.

I also confirmed the differences between R38M6PlusRC3D and R38M6PlusRC3C were as expected ... that one is recent enough that I can recall we changed only the jdt.core plugin.

So, I'm going to consider this "done for now".
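A comparison like the one described has to treat an expanded keyword (as CVS writes it) and the bare keyword (as it appears in the Git checkout) as equal. The sketch below shows one way to do that. It is not the procedure actually used for this bug; the function names and the short list of keywords are assumptions for illustration.

```python
import re

# Matches an RCS keyword in either form: "$Id$" or "$Id: file,v 1.2 ... $".
# Only a few common keywords are listed here (an assumption; extend as needed).
_KEYWORD = re.compile(r"\$(Id|Revision|Date|Author|Header)(?::[^$\n]*)?\$")


def collapse_keywords(text):
    """Return text with expanded RCS keywords reduced to the bare form."""
    return _KEYWORD.sub(r"$\1$", text)


def same_content(cvs_text, git_text):
    """Compare two versions of a file, ignoring keyword expansion only."""
    return collapse_keywords(cvs_text) == collapse_keywords(git_text)
```

Files named ".cvsignore" would have to be skipped separately, since they are missing from sub-directories on the Git side.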
From the Foundation Portal metadata, I removed

/cvsroot/eclipse/org.eclipse.releng.basebuilder

I also removed the following, since there isn't such a repo ... not sure if there was and now isn't, or if someone is getting ready to create it?

/gitroot/platform/eclipse.platform.releng.binaries.git

That leaves these three, in the Portal, under eclipse.platform.releng:

/gitroot/platform/eclipse.platform.releng.eclipsebuilder.git
/gitroot/platform/eclipse.platform.releng.git
/gitroot/platform/eclipse.platform.releng.maps.git

The eclipse.platform section seems accurate for the important ones, if with some redundancy:

/gitroot/platform/eclipse.platform.common.git
/gitroot/platform/eclipse.platform.debug.git
/gitroot/platform/eclipse.platform.git
/gitroot/platform/eclipse.platform.releng.eclipsebuilder.git
/gitroot/platform/eclipse.platform.releng.git
/gitroot/platform/eclipse.platform.resources.git
/gitroot/platform/eclipse.platform.runtime.git
/gitroot/platform/eclipse.platform.swt.git
/gitroot/platform/eclipse.platform.team.git
/gitroot/platform/eclipse.platform.text.git
/gitroot/platform/eclipse.platform.ua.git
/gitroot/platform/eclipse.platform.ui.git

The actual repos in the platform section of gitroot are as follows:

/gitroot/platform
eclipse.platform.common.git
eclipse.platform.debug.git
eclipse.platform.git
eclipse.platform.news.git
eclipse.platform.releng.aggregator.git
eclipse.platform.releng.basebuilder.git
eclipse.platform.releng.buildtools.git
eclipse.platform.releng.eclipsebuilder.git
eclipse.platform.releng.git
eclipse.platform.releng.maps.git
eclipse.platform.resources.git
eclipse.platform.runtime.git
eclipse.platform.swt.binaries.git
eclipse.platform.swt.git
eclipse.platform.team.git
eclipse.platform.text.git
eclipse.platform.ua.git
eclipse.platform.ui.git
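Since the portal metadata decides which repositories get mirrored, it can help to see which repos in gitroot are not listed there. The sketch below does that with a plain set difference over the names given in this comment. The variable names are illustrative, and nothing here queries the portal itself.

```python
# Repos listed in the portal metadata (both sections, after the removals above).
portal = set("""
eclipse.platform.releng.eclipsebuilder.git eclipse.platform.releng.git
eclipse.platform.releng.maps.git eclipse.platform.common.git
eclipse.platform.debug.git eclipse.platform.git
eclipse.platform.resources.git eclipse.platform.runtime.git
eclipse.platform.swt.git eclipse.platform.team.git
eclipse.platform.text.git eclipse.platform.ua.git eclipse.platform.ui.git
""".split())

# Repos actually present under /gitroot/platform.
actual = portal | set("""
eclipse.platform.news.git eclipse.platform.releng.aggregator.git
eclipse.platform.releng.basebuilder.git eclipse.platform.releng.buildtools.git
eclipse.platform.swt.binaries.git
""".split())

unlisted = sorted(actual - portal)  # present in gitroot, absent from the portal
stale = sorted(portal - actual)     # listed in the portal, absent from gitroot

if __name__ == "__main__":
    for name in unlisted:
        print("not in portal:", name)
    print("stale portal entries:", stale)
```

With the lists above, five repos are not in the portal, including basebuilder (intentionally, so that it is not mirrored), and there are no stale portal entries.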
For the record, I did have to tweak the getBaseBuilder.xml ant script for Hudson. Lesson learned: always use absolute file paths, by prefixing nearly all directories/files with ${WORKSPACE}, and do not assume the "current directory". I confirmed it still works from the current directory (technically, Ant's ${basedir}), though, if ${WORKSPACE} is not specified by the Hudson environment.
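The same rule, sketched outside Ant: anchor every relative path at WORKSPACE when the environment provides it, and fall back to a default directory otherwise. This is an illustration only, not the contents of the ant script. `resolve` is a hypothetical helper, and the process's current directory stands in for Ant's ${basedir}.

```python
import os


def resolve(relative_path, env=None):
    """Anchor a relative path at WORKSPACE, else at the current directory."""
    env = os.environ if env is None else env
    # Hudson sets WORKSPACE for its jobs; outside Hudson it is usually unset.
    base = env.get("WORKSPACE") or os.getcwd()
    return os.path.abspath(os.path.join(base, relative_path))
```

Passing the environment as a parameter keeps the fallback easy to check without running under Hudson.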
One final tweak. In the getBaseBuilder.xml script I put in a chmod so 'eclipse' will be executable. I'm not sure if this bit was "lost in translation" or lost during the ant unzip, but in either case it needs to be set when "checked out", since the files are currently checked out by 'e4Build' but later (for test summary processing) executed by a committer id.
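Whichever step drops the bit, the repair is the same: after extraction, add execute permission for every class of user, so that an id other than the one that extracted the files can run the launcher. A minimal sketch of that step for POSIX systems follows; `make_executable` is a hypothetical helper, not the Ant chmod task used in the script.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group, and others, keeping other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Setting the bit for group and others, not just the owner, is what matters here, since the file is extracted by one id and executed by another.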
> FWIW, I compared fetching 10 tagged versions with both cvs and cgit and if
> anything cgit was a little faster. Over my broadband, either method took
> between 2 to 3 minutes. On build.eclipse.org, the cgit method took between
> about 10 seconds, and the cvs export took about 20 seconds (per tagged
> version). So that's encouraging.

That is definitely good news. Thanks.
cc'ing Thanh. Thanh, have a quick read of comment 0 through comment 4. When a specific portion of the repo is needed, the cGit web interface (http://git.eclipse.org/c) provides some nice tools for that.