Bug 395777 - Investigate converting basebuilder to Git
Summary: Investigate converting basebuilder to Git
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 4.3
Hardware: PC Linux
Importance: P3 normal
Target Milestone: 4.3 M4
Assignee: David Williams CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-04 21:44 EST by David Williams CLA
Modified: 2012-12-13 12:19 EST

See Also:


Attachments
scripts and output from the conversion (351.30 KB, application/zip)
2012-12-06 17:26 EST, David Williams CLA
Description David Williams CLA 2012-12-04 21:44:54 EST
It's been assumed we'd not move "basebuilder" to Git, since it's a large amount of binary data and would not lend itself well to Git. 

But ... since I've been working on bug 395776 anyway, I've been doing some experiments ... and wonder if for safety/security it would make sense to convert now ... presumably convert only a subset of the past few releases, etc., to save space. 

I thought I'd open this bug just to discuss and document things I've learned (or, am learning) about its size, etc., if nothing else.
Comment 1 David Williams CLA 2012-12-06 16:41:35 EST
Here's what I have learned so far. 

The CVS repo of basebuilder is about 4G. 

Converting to Git takes it down to about 2G, and removing old tags, etc., takes it down to about 1.5G. That's using the same tools and filters as attached to bug 395776. 

Obviously not something you'd want to work with on a daily basis (especially wouldn't want hundreds of people cloning it, for bandwidth reasons if nothing else), but ... working with it on my local machines was reasonable. Just took about 10 minutes or so to clone (over my wireless connection). 

I think it's worth investigating if/how it performs on git.eclipse.org ... to see if it can provide a transition over the next few months and/or is feasible for extra long term use for patch builds of old releases, or something. I'll be sure to mark it as "experimental and temporary" until we know more. 

Another potential bottleneck is that (I think) the feasible way to use it from git is to use CGit and simply get an archive of the tagged version we want. That'd be about 100M. If CGit does a good job of caching, that should be ok ... but ... if it doesn't, that'd be another problem. 

Webmasters, adding you to CC so you'll know I'm doing this experimenting. Let me know if over the next few days you notice anything on your end, such as some form of slow-down or inefficiencies.
Comment 2 David Williams CLA 2012-12-06 16:53:04 EST
I meant to add, as point of comparison, the swt.binaries repo is about 0.5G.
Comment 3 David Williams CLA 2012-12-06 17:26:54 EST
Created attachment 224405
scripts and output from the conversion

scripts are the same as attached to bug 395776, but these are the logs from converting basebuilder. 

To repeat some of what was said in bug 395776: things tagged or branched as "temp" or "test" were filtered out, as were most things prior to 2008. I think most major tags from about Eclipse 3.4 forward were kept, but many marked "M1, M2, etc." were removed. I'm sure many other branches/tags could be removed if we decide to keep this around for archival purposes, and presumably (if I understand Git correctly) that would free up more unnecessary objects to be garbage collected eventually.
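
A minimal sketch of how further tag removal and the follow-up garbage collection might look, assuming a local clone. The name patterns and the helper names `is_droppable_tag` and `prune_tags` are illustrative assumptions, not the filter rules used in the attached scripts.

```shell
#!/usr/bin/env bash
# Return success if a tag name looks like one described as filtered out.
# This is a crude substring match and an assumption, not the real rule set
# (it would also match an unrelated name such as "latest").
is_droppable_tag() {
    case "$1" in
        *temp*|*test*) return 0 ;;
        *) return 1 ;;
    esac
}

# Delete matching tags in the repository at $1, then let Git discard the
# objects that are no longer reachable from any remaining ref.
prune_tags() {
    local repo="$1" tag
    git -C "$repo" tag -l | while read -r tag; do
        if is_droppable_tag "$tag"; then
            git -C "$repo" tag -d "$tag"
        fi
    done
    # Expire reflog entries so they do not keep old objects alive.
    git -C "$repo" reflog expire --expire=now --all
    git -C "$repo" gc --prune=now
}
```

Calling `prune_tags /path/to/clone` would run it against a clone; objects only become collectable once no branch or tag still references them.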
Comment 4 David Williams CLA 2012-12-07 01:29:17 EST
From my initial "tests", to clone the repo over my broadband connection takes about 45 minutes. To clone it on the build machine itself is fast, less than a minute. 

To use cgit to get one tagged version, as we would for a build, takes only about a minute on my broadband and about 5 seconds on the build machine. That one tagged version is about 60M. For example, 

wget --no-verbose -O basebuilder-R38M6PlusRC3D.zip http://git.eclipse.org/c/platform/eclipse.platform.releng.basebuilder.git/snapshot/eclipse.platform.releng.basebuilder-R38M6PlusRC3D.zip 2>&1
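
The same fetch can be parameterized by tag. The following is only a sketch: the URL layout is copied from the wget line above, while the helper names `snapshot_url` and `fetch_snapshot` are hypothetical.

```shell
#!/usr/bin/env bash
# Base URL and repository name taken from the wget example above.
CGIT_BASE="http://git.eclipse.org/c/platform/eclipse.platform.releng.basebuilder.git"
REPO_NAME="eclipse.platform.releng.basebuilder"

# Print the cgit snapshot URL for a tag or branch name.
snapshot_url() {
    local ref="$1"
    printf '%s/snapshot/%s-%s.zip\n' "$CGIT_BASE" "$REPO_NAME" "$ref"
}

# Download one snapshot into the given directory (default: current directory).
fetch_snapshot() {
    local ref="$1" dest_dir="${2:-.}"
    wget --no-verbose -O "$dest_dir/basebuilder-$ref.zip" "$(snapshot_url "$ref")" 2>&1
}
```

For example, `fetch_snapshot R38M6PlusRC3D /tmp` would issue the same request as the command above.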

Since the times seem reasonable (except to clone the whole thing over broadband) I plan to test this in a build (Friday nightly), as I think it's feasible for at least a transitional step. Unless anyone objects. The change in build scripts is minimal. 

One thought occurs to me ... I know some repos are cloned to github, but am not sure how that's determined (automatically? or do projects sign up?) ... but this would be one that should not be, because of its size, and because it is not useful or necessary ... it's just a "reproducible build" repo, not something for someone to contribute patches for, or fork. Is there a "blacklist" for github?
Comment 5 John Arthorne CLA 2012-12-07 09:04:42 EST
(In reply to comment #4)
> One thought occurs to me ... I know some repos are cloned to github, but am
> not sure how that's determined (automatically? or do projects signup?) ...
> but this would be one that should not be, because of its size, and because
> not useful or necessary ... its just a "reproducible build" repo, not
> something for someone to contribute patches for, or fork. Is there a
> "blacklist" for github?

The repositories to mirror come from the project metadata in the portal. If we don't want it cloned, we should just not list that repository in our portal metadata. That sounds reasonable because, like you said, it's not a real source repository and not meaningful for dash statistics, etc. The portal metadata is aggregated into a single list that is consumed by github:

http://eclipse.org/projects/git-repos.php
Comment 6 David Williams CLA 2012-12-07 15:53:28 EST
I've converted the build scripts to use the Git version (via CGit) and a test build went ok (the unit tests take a slightly different code path, so I will let that be tested during the nightly build). 

FWIW, I compared fetching 10 tagged versions with both cvs and cgit and if anything cgit was a little faster. Over my broadband, either method took between 2 and 3 minutes. On build.eclipse.org, the cgit method took about 10 seconds, and the cvs export took about 20 seconds (per tagged version). So that's encouraging. 

I used this sample of tagged versions: 

R38M6PlusRC3D
R38M6PlusRC3C
R36_RC4
R37_M7
R35_RC4
r34x_v20120319
r35x_v20120319
r36x_v20120306
r36x_v20120306
R3_7_maintenance (a branch)
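
A comparison like this could be scripted roughly as follows. This is a sketch only: `time_fetches` is a hypothetical helper, and the fetch command it wraps (a cgit snapshot download or a cvs export) is left to the caller.

```shell
#!/usr/bin/env bash
# Run a fetch command once per tag and print "tag elapsed-seconds".
# The first argument names the command or shell function to run; it is
# called with the tag as its only argument.
time_fetches() {
    local fetch_cmd="$1"; shift
    local tag start end
    for tag in "$@"; do
        start=$(date +%s)
        "$fetch_cmd" "$tag" >/dev/null 2>&1
        end=$(date +%s)
        printf '%s %s\n' "$tag" "$((end - start))"
    done
}
```

Passing the same tag list once with a cgit-based fetch function and once with a cvs-based one gives two columns of timings to compare. The resolution is whole seconds, which is coarse but enough for differences on the order of those reported here.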

For all of them, the cvs and git versions were the same except for two minor things I hadn't seen before. First, the ".cvsignore" file was not transitioned for sub-directories (I know they are for _main_ directories or modules). Second, I learned (and it makes obvious sense) that a few files had "cvs/rcs directives" such as $Id$; those are literally '$Id$' in git checkouts, but "filled in" (with filename, date, author) when checked out from cvs. In other words, no significant differences. 

Also confirmed the differences between R38M6PlusRC3D and R38M6PlusRC3C were as expected ... that is new enough that I can recall we changed only the jdt.core plugin. 
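
One way to make such a comparison while skipping the two expected differences is a recursive diff. This is a sketch, assuming a diff that supports `-x` and `-I` (GNU diff does); `compare_trees` is a hypothetical helper, not part of the build scripts.

```shell
#!/usr/bin/env bash
# Compare a CVS export with an unpacked Git snapshot, ignoring the two
# expected differences:
#   -x .cvsignore  skips files named .cvsignore in either tree
#   -I '\$Id'      ignores changed lines that all contain an RCS $Id keyword
compare_trees() {
    local cvs_dir="$1" git_dir="$2"
    diff -r -x .cvsignore -I '\$Id' "$cvs_dir" "$git_dir"
}
```

The function exits with status 0 when nothing else differs and prints the remaining differences otherwise.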

So, I'm going to consider this "done for now".
Comment 7 David Williams CLA 2012-12-07 15:57:56 EST
From the Foundation Portal metadata, I removed

/cvsroot/eclipse/org.eclipse.releng.basebuilder

I also removed the following, since there isn't such a repo ... not sure if there was one and now isn't, or if someone is getting ready to create it? 
/gitroot/platform/eclipse.platform.releng.binaries.git

That leaves these three, in the Portal, under
eclipse.platform.releng
/gitroot/platform/eclipse.platform.releng.eclipsebuilder.git
/gitroot/platform/eclipse.platform.releng.git
/gitroot/platform/eclipse.platform.releng.maps.git

The eclipse.platform section seems accurate for the important ones, if with some redundancy: 
/gitroot/platform/eclipse.platform.common.git
/gitroot/platform/eclipse.platform.debug.git
/gitroot/platform/eclipse.platform.git
/gitroot/platform/eclipse.platform.releng.eclipsebuilder.git
/gitroot/platform/eclipse.platform.releng.git
/gitroot/platform/eclipse.platform.resources.git
/gitroot/platform/eclipse.platform.runtime.git
/gitroot/platform/eclipse.platform.swt.git
/gitroot/platform/eclipse.platform.team.git
/gitroot/platform/eclipse.platform.text.git
/gitroot/platform/eclipse.platform.ua.git
/gitroot/platform/eclipse.platform.ui.git

The actual repos in platform section of gitroot are as follows: 
/gitroot/platform
eclipse.platform.common.git
eclipse.platform.debug.git
eclipse.platform.git
eclipse.platform.news.git
eclipse.platform.releng.aggregator.git
eclipse.platform.releng.basebuilder.git
eclipse.platform.releng.buildtools.git
eclipse.platform.releng.eclipsebuilder.git
eclipse.platform.releng.git
eclipse.platform.releng.maps.git
eclipse.platform.resources.git
eclipse.platform.runtime.git
eclipse.platform.swt.binaries.git
eclipse.platform.swt.git
eclipse.platform.team.git
eclipse.platform.text.git
eclipse.platform.ua.git
eclipse.platform.ui.git
Comment 8 David Williams CLA 2012-12-07 23:50:05 EST
For the record, I did have to tweak the getBaseBuilder.xml ant script for Hudson. 

Lesson learned: always use absolute file paths by prefixing nearly all directories/files with ${WORKSPACE}, and do not assume "current directory". I confirmed it still works for the current directory, though, if ${WORKSPACE} is not specified by the Hudson environment (technically, Ant's ${basedir}).
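
The same fallback idea, expressed as a shell analogue rather than the actual Ant change (which is not shown in this bug). This is a sketch; the `basebuilder` directory name and both helper names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Use Hudson's WORKSPACE when it is set and non-empty, otherwise fall back
# to the current directory, so every path is built from one absolute root.
resolve_workspace() {
    printf '%s\n' "${WORKSPACE:-$PWD}"
}

# Hypothetical target directory, always derived from the resolved root.
basebuilder_dir() {
    printf '%s/basebuilder\n' "$(resolve_workspace)"
}
```

With `WORKSPACE=/tmp/job` set, `basebuilder_dir` prints `/tmp/job/basebuilder`; with it unset, the path is rooted at the current directory.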
Comment 9 David Williams CLA 2012-12-08 11:04:22 EST
One final tweak. In the getBasebuilder.xml script I put in a chmod so 'eclipse' will be executable. I'm not sure if this bit was "lost in translation" or lost during the ant unzip, but in either case it needs to be set when "checked out" since the files are currently checked out by 'e4Build' but later (for test summary processing) executed by a committer id.
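
In shell terms the fix amounts to restoring the execute bit after unpacking. This is a sketch, not the Ant change itself: the helper names are hypothetical, and the launcher's path inside the archive is an assumption that must be supplied by the caller.

```shell
#!/usr/bin/env bash
# Make a file executable for all users, since the account that unpacks
# the archive may differ from the account that later runs the launcher.
restore_exec_bit() {
    chmod a+x "$1"
}

# Unpack a snapshot zip into $2, then restore the execute bit on the
# launcher. $3 is the launcher path relative to $2 (assumed, not known).
unpack_basebuilder() {
    local zip="$1" dest="$2" launcher="${3:-eclipse}"
    unzip -q -o "$zip" -d "$dest" || return 1
    restore_exec_bit "$dest/$launcher"
}
```

Using `a+x` rather than `u+x` matters here because the file is written by one user id and executed by another.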
Comment 10 Denis Roy CLA 2012-12-11 10:30:13 EST
> FWIW, I compared fetching 10 tagged versions with both cvs and cgit and if
> anything cgit was a little faster. Over my broadband, either method took
> between 2 to 3 minutes. On build.eclipse.org, the cgit method took between
> about 10 seconds, and the cvs export took about 20 seconds (per tagged
> version). So that's encouraging. 

That is definitely good news.  Thanks.
Comment 11 Denis Roy CLA 2012-12-11 10:31:58 EST
cc'ing Thanh.   Thanh, have a quick read of comment 0 through comment 4.  When a specific portion of the repo is needed, the cGit web interface (http://git.eclipse.org/c) provides some nice tools for that.