Bug 345471 - Granularity of Eclipse and Equinox git repos
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 3.7
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.8 M2
Assignee: Kim Moir CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 345479 349150
Reported: 2011-05-11 13:12 EDT by Kim Moir CLA
Modified: 2011-08-10 16:14 EDT
CC List: 23 users

See Also:


Description Kim Moir CLA 2011-05-11 13:12:40 EDT
In the arch call today, I mentioned the testing that DJ and I were conducting to run the build from git as described in bug 344152.  John remarked that the granularity of the Eclipse/Equinox git repos (currently just mirrored, not migrated) might need to be revisited before the actual migration.  This bug is to discuss and decide on the granularity of the Eclipse and Equinox repositories before we migrate.  Today they are divided into the following repos: platform, pde, jdt, equinox, and p2, as you can see at this link.

http://dev.eclipse.org/git/index.html
Comment 1 John Arthorne CLA 2011-05-11 13:19:36 EDT
I think the answer is one repository per component (sub-project). I.e., each unique committer group would have one repo. So we would have repos like "SWT", "Platform UI", "Platform Resources", "JDT Core", "JDT UI", etc.  This is the minimum number of repositories we can possibly have, since multiple committer groups within a single git repository is not feasible.
Comment 2 Pascal Rapicault CLA 2011-05-11 13:43:31 EDT
Following the groups seems to lead to too fine a granularity to me. I think the people we are making committers are reasonable enough that they will not go and change source code they don't know or own.
I would have gone for coarser repos like JDT (including all JDT-related things) and PDE. For the platform itself it is not clear how to split it.
Comment 3 Boris Bokowski CLA 2011-05-11 14:04:21 EDT
I am with John, at least for the Platform. I wouldn't want to have to clone all of Platform Text, Ant, Debug, Resources, SWT, etc. if I am working on Platform UI.

Also, the "one unix-group, one Git repository" is a simple enough rule of thumb that it could actually work.
Comment 4 David Carver CLA 2011-05-31 15:04:41 EDT
(In reply to comment #3)
> I am with John, at least for the Platform. I wouldn't want to have to clone all
> of Platform Text, Ant, Debug, Resources, SWT, etc. if I am working on Platform
> UI.
> 
> Also, the "one unix-group, one Git repository" is a simple enough rule of thumb
> that it could actually work.

You wouldn't necessarily have to clone those others if they had git repos and you could install the necessary bits.

So I'd be for Pascal's approach, with something like:

org.eclipse.jdt.git
org.eclipse.pde.git
Comment 5 Boris Bokowski CLA 2011-05-31 15:16:55 EDT
(In reply to comment #4)
> (In reply to comment #3)
> > I am with John, at least for the Platform. I wouldn't want to have to clone all
> > of Platform Text, Ant, Debug, Resources, SWT, etc. if I am working on Platform
> > UI.
> 
> You wouldn't necessarily have to clone those others if they had git repos and
> you could install the necessary bits.

If all of Eclipse Platform was in a single Git repository, a Git clone operation would give me all these other components in source form, including their history. This is way too much. I stand by my opinion that the granularity should not be coarser than the Eclipse project structure and its associated Unix groups.
Comment 6 Paul Webster CLA 2011-06-01 09:55:05 EDT
(In reply to comment #3)
> 
> Also, the "one unix-group, one Git repository" is a simple enough rule of thumb
> that it could actually work.

This also makes perfect sense to me, at least for the Eclipse Project.

PW
Comment 7 Thomas Watson CLA 2011-06-01 09:58:28 EDT
(In reply to comment #6)
> (In reply to comment #3)
> > 
> > Also, the "one unix-group, one Git repository" is a simple enough rule of thumb
> > that it could actually work.
> 
> This also makes perfect sense to me, at least for the Eclipse Project.
> 
> PW

+1. I think this is the simplest way forward.  A question: if we decide to go more fine- or coarse-grained in the future, how hard is it to change later?  Is this something we can work through and change during the Juno release, but after that we are pretty much set in stone?
Comment 8 Paul Webster CLA 2011-06-01 10:28:03 EDT
(In reply to comment #7)
> +1. I think this is the simplest way forward.  A question: if we decide to go
> more fine- or coarse-grained in the future, how hard is it to change later?  Is
> this something we can work through and change during the Juno release, but
> after that we are pretty much set in stone?

Depends on what you mean by "hard" :-)

With git's ability to re-write history, you can stitch 2 repos together, even changing their directory location within the repo [1] while keeping the commit information generally intact.

But if you need to reproduce Juno builds, you wouldn't be able to move them very far, would you?  Or conversely, you would have to leave an abandoned "big" repo to rebuild parts of Juno *and* move the history with you to the smaller ones so you could find out stuff.

[1] You can take repo1/proj1 and repo2/proj2 and create bigRepo/bundles/proj1 and bigRepo/bundles/proj2.

PW
Comment 9 Thomas Watson CLA 2011-06-03 10:29:19 EDT
I think the decision we make here has an impact on the solution to bug 345670.  If we have more than one project per repo, how will Eclipse-BundleSource headers work?

As I understand it, you can only clone complete git repositories.  So if a user of PDE imports a single bundle at some specific version from a repo, what has to happen?

- The complete repo where that bundle lives has to be cloned and then the single project from the cloned repo needs to be imported into the workspace.

- Now let's imagine the user selects another bundle which exists in the same repository, but its tagged version is not available in the previously cloned repo.  Would we now need to create another clone of the repo from the necessary commit tag and then import that project?

I'm probably missing something in git.  There probably is a good way to tag things so that this works nicely?
Comment 10 John Arthorne CLA 2011-06-03 11:36:34 EDT
(In reply to comment #9)
> I think the decision we make here has an impact on the solution to bug 345670.
> If we have more than one project per repo, how will Eclipse-BundleSource headers
> work?

Yes and no... The Eclipse-BundleSource header for git would need to work for any kind of git repository layout. The solution to that problem shouldn't be tailored to the particular repository layout used by the Eclipse & Equinox projects (this bug). I don't think we should be constraining our project layout to simplify the implementation of Eclipse-BundleSource for Git. 


> Would we now need to create another clone of the repo from the necessary
> commit tag and then import that project?

Projects are imported from a Git clone's working copy. Typically there is only one working copy per clone, and that working copy contains the contents of a single branch/tag. So yes, we would need a separate clone per distinct branch. So this would be somewhat expensive, but I suspect it's also a rare case. For example a case where a user wants one bundle from 3.7 and another bundle from 3.6 in their workspace at the same time. I think we should continue the discussion of how to implement Eclipse-BundleSource for Git in bug 348040.
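
The cost described above can be sketched in a few lines. This is only an illustration under the assumption stated in this comment (one working copy per clone, one branch or tag per working copy). The function name `clones_needed`, the bundle-to-repository pairings, and the tag names are hypothetical and not part of PDE or any Git tooling.

```python
from collections import defaultdict


def clones_needed(requests):
    """Group requested bundles by (repository, tag).

    `requests` is an iterable of (bundle, repository, tag) tuples.
    Each distinct (repository, tag) pair needs its own clone, because a
    clone's single working copy holds the contents of one branch or tag.
    """
    clones = defaultdict(list)
    for bundle, repository, tag in requests:
        clones[(repository, tag)].append(bundle)
    return dict(clones)


# Hypothetical import requests: two bundles at one tag, one at an older tag.
requests = [
    ("org.eclipse.jface", "eclipse.platform.ui", "R3_7"),
    ("org.eclipse.ui.workbench", "eclipse.platform.ui", "R3_7"),
    ("org.eclipse.ui.ide", "eclipse.platform.ui", "R3_6"),
]

plan = clones_needed(requests)
for (repository, tag), bundles in sorted(plan.items()):
    print(f"clone {repository} @ {tag}: {', '.join(bundles)}")
```

Bundles that share both repository and tag reuse one clone; only the mixed-version case pays for an extra one.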
Comment 11 John Arthorne CLA 2011-06-03 11:37:53 EDT
(In reply to comment #10)
> I think we should continue the
> discussion of how to implement Eclipse-BundleSource for Git in bug 348040.

I meant bug 345670. Hopefully Orion search indexer performance is a totally unrelated problem ;)
Comment 12 Deepak Azad CLA 2011-06-10 00:09:28 EDT
(In reply to comment #3)
> I am with John, at least for the Platform. I wouldn't want to have to clone all
> of Platform Text, Ant, Debug, Resources, SWT, etc. if I am working on Platform
> UI.
> 
> Also, the "one unix-group, one Git repository" is a simple enough rule of thumb
> that it could actually work.

+1. I don't want to clone everything if I am just working on JDT/UI.
Comment 13 Dani Megert CLA 2011-06-10 02:40:37 EDT
+1 to have one Git repo per ACL (Unix group).
Comment 14 John Arthorne CLA 2011-06-10 08:55:34 EDT
Note we will also need another repository for common things: map files and documentation. I think the easiest solution is for the platform, JDT, and PDE doc to all be in this single repository with an appropriate access control list so all committers can write to it.
Comment 15 Jeff McAffer CLA 2011-06-10 09:58:32 EDT
Adding Wayne to this discussion.  Seems like this is the sort of thing that has/will come up in other projects and he might have some perspective.

Similarly, I see Denis is already on the bug.  Any feedback on this from the Webmaster point of view?

For my vote?  I like the "simplest possible" approach of one repo per ACL as that does indeed seem simple.  I wonder what others are doing.
Comment 16 Wayne Beaton CLA 2011-06-10 10:06:52 EDT
(In reply to comment #15)
> Adding Wayne to this discussion.  Seems like this is the sort of thing that
> has/will come up in other projects and he might have some perspective.

FWIW, I've been monitoring it (I even Tweeted about it).

Like many other things, it's a balancing act. I've seen projects set up a single repository for each bundle. That's probably too extreme. Dividing it up along ACL lines is probably as granular as I'd like to see (essentially one repository per Eclipse Project). Even at that level of granularity, I suspect that some of the repository clones will still be huge.

It may be worth experimenting to see how huge before you make a decision.

You may consider further dividing by functional areas or something, e.g. subsets that people working in particular functional areas need to have.
Comment 17 John Arthorne CLA 2011-06-10 10:15:36 EDT
(In reply to comment #16)
> Even at that level of granularity, I suspect
> that some of the repository clones will still be huge.
> 
> It may be worth experimenting to see how huge before you make a decision.

Some of our repositories are indeed huge. The Eclipse TLP CVS repository is 15GB. One thing that really bloats our CVS repository is our current practice of checking compiled code into our repository in several cases (compiled native libraries, base builder). We are looking at using a p2 repository for binary artifacts going forward, and omitting all binaries during our CVS->Git export. This should greatly help with keeping the size down. It will still be interesting to see the Git repository sizes before we make any final decision. 

Kim just wanted to get some consensus beforehand, because the migration step is going to be quite complicated and we don't want to change our minds half way through if we can avoid it!
Comment 18 David Carver CLA 2011-06-10 10:28:05 EDT
(In reply to comment #17)
> (In reply to comment #16)
> > Even at that level of granularity, I suspect
> > that some of the repository clones will still be huge.
> > 
> > It may be worth experimenting to see how huge before you make a decision.
> 
> Some of our repositories are indeed huge. The Eclipse TLP CVS repository is
> 15GB. One thing that really bloats our CVS repository is our current practice
> of checking compiled code into our repository in several cases (compiled native
> libraries, base builder). We are looking at using a p2 repository for binary
> artifacts going forward, and omitting all binaries during our CVS->Git export.
> This should greatly help with keeping the size down. It will still be
> interesting to see the Git repository sizes before we make any final decision. 
> 
> Kim just wanted to get some consensus beforehand, because the migration step is
> going to be quite complicated and we don't want to change our minds half way
> through if we can avoid it!

+1 for stopping the practice of checking binaries into the source repository.  There are better ways, and p2 repos would be the recommendation, especially if you are not going to use maven and maven.eclipse.org to share artifacts.
Comment 19 Denis Roy CLA 2011-06-10 10:40:42 EDT
> Similarly, I see Denis is already on the bug.  Any feedback on this from the
> Webmaster point of view?

Thanks.  For sure, one repo per unix group cuts down on administrivia and complexity.  The fewer extended ACLs we create, the easier it is on everyone.

(In reply to comment #16)
> repository per Eclipse Project). Even at that level of granularity, I suspect
> that some of the repository clones will still be huge.

The git mirrors can provide early clues... Projects like AJDT have a long history, and they are correspondingly quite big.  FWIW, I ran an aggressive compaction on the git mirror repos just yesterday.

21M     org.eclipse.actf
430M    org.eclipse.ajdt
500K    org.eclipse.albireo
3.2M    org.eclipse.amalgam
40M     org.eclipse.amp
9.8M    org.eclipse.ant
72K     org.eclipse.apogee
9.0M    org.eclipse.atf
1.6M    org.eclipse.babel
235M    org.eclipse.birt
84K     org.eclipse.blinki
4.7M    org.eclipse.bpel
274M    org.eclipse.cdt
2.9M    org.eclipse.cloudfree
15M     org.eclipse.cobol
3.4M    org.eclipse.compare
24M     org.eclipse.core
2.2M    org.eclipse.corona
316M    org.eclipse.cosmos
136K    org.eclipse.cvs
2.0G    org.eclipse.dash
50M     org.eclipse.datatools
7.8M    org.eclipse.dd
14M     org.eclipse.debug
56M     org.eclipse.dltk
39M     org.eclipse.e4
55M     org.eclipse.ecf
81K     org.eclipse.edt
81K     org.eclipse.egl
276M    org.eclipse.emf
124M    org.eclipse.epf
100M    org.eclipse.epp
863M    org.eclipse.equinox
68M     org.eclipse.equinox.p2
127M    org.eclipse.ercp
16M     org.eclipse.esl
30M     org.eclipse.examples
973K    org.eclipse.fproj
22M     org.eclipse.gef
54M     org.eclipse.gmf
8.0M    org.eclipse.gmp
2.0M    org.eclipse.gmt
17M     org.eclipse.gyrex
9.8M    org.eclipse.help
6.9M    org.eclipse.hibachi
72M     org.eclipse.higgins
225M    org.eclipse.hyades
1.6M    org.eclipse.ide4edu
247M    org.eclipse.jdt
13M     org.eclipse.jface
376K    org.eclipse.jsch
23M     org.eclipse.jwt
3.0M    org.eclipse.ltk
40M     org.eclipse.m2m
42M     org.eclipse.m2t
2.7M    org.eclipse.maynstall
93M     org.eclipse.mdt
27M     org.eclipse.mtj
89M     org.eclipse.mylyn
4.7M    org.eclipse.nab
276K    org.eclipse.ofmp
465M    org.eclipse.ohf
213M    org.eclipse.orbit
15M     org.eclipse.osgi
89M     org.eclipse.pde
37M     org.eclipse.pdt
72K     org.eclipse.pdtincubato
71M     org.eclipse.phoenix
121M    org.eclipse.platform
6.4M    org.eclipse.pmf
246M    org.eclipse.ptp
67M     org.eclipse.rap
1.5G    org.eclipse.releng
81K     org.eclipse.remus
25M     org.eclipse.riena
81K     org.eclipse.sapphire
1.5M    org.eclipse.scripting
55M     org.eclipse.sdk
2.5M    org.eclipse.search
344K    org.eclipse.soc
266M    org.eclipse.swt
24M     org.eclipse.team
2.3M    org.eclipse.test
1.3M    org.eclipse.text
27M     org.eclipse.tigerstripe
24M     org.eclipse.tm
107M    org.eclipse.tmf
3.3M    org.eclipse.tml
7.4M    org.eclipse.tomcat
141M    org.eclipse.tptp
1.8M    org.eclipse.ua
71M     org.eclipse.ui
464K    org.eclipse.uml2
9.9M    org.eclipse.update
2.4M    org.eclipse.vcm
43M     org.eclipse.ve
1.4M    org.eclipse.webdav
403M    org.eclipse.webtools
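
For anyone who wants to compare candidate groupings, a listing like the one above can be totalled with a short script. This is a minimal sketch that assumes the sizes are `du -h` style values in binary units (K, M, G); the function names are hypothetical, and the sample contains only a few rows copied from the listing.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '863M' or '2.0G' to bytes (binary units assumed)."""
    number, unit = text[:-1], text[-1]
    return int(float(number) * UNITS[unit])


def parse_listing(listing):
    """Return {repository: size_in_bytes} from 'SIZE NAME' lines."""
    sizes = {}
    for line in listing.strip().splitlines():
        size, name = line.split()
        sizes[name] = parse_size(size)
    return sizes


# A few rows copied from the listing above.
sample = """
863M    org.eclipse.equinox
68M     org.eclipse.equinox.p2
247M    org.eclipse.jdt
89M     org.eclipse.pde
121M    org.eclipse.platform
266M    org.eclipse.swt
"""

sizes = parse_listing(sample)
total_mib = sum(sizes.values()) / UNITS["M"]
largest = max(sizes, key=sizes.get)
print(f"largest: {largest}, total: {total_mib:.0f}M")
```

These are mirror sizes after compaction, so they only hint at what the migrated repositories would look like.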
Comment 20 Kim Moir CLA 2011-06-10 16:13:42 EDT
I looked at the projects in our existing CVS repos and sorted them by unix group to create a first draft of how our git repos might be organized.

http://wiki.eclipse.org/Platform-releng/Git_Migration_Granularity
Comment 21 Thomas Watson CLA 2011-06-15 11:47:00 EDT
(In reply to comment #20)
> I looked at the projects in our existing CVS repos and sorted them by unix
> group to create a first draft of how our git repos might be organized.
> 
> http://wiki.eclipse.org/Platform-releng/Git_Migration_Granularity

Under Framework I see this:

(from compendium)
rt.equinox.framework org.eclipse.osgi.services
rt.equinox.framework org.eclipse.osgi.util

At first I thought this was a mistake because I had always thought these bundles were under the rt.equinox.bundles committer group and would have gone into the Equinox Bundles git repo.  But it appears this is not the case.  So I am just confirming that I think this is fine and we can include the above bundles in the Equinox Framework git repo along with the other projects in the rt.equinox.framework committer group.
Comment 22 Kim Moir CLA 2011-06-20 13:52:16 EDT
Last night, I looked at the list and decided that this would be the initial list of repositories that we require:

/gitroot/jdt/eclipse.jdt.core.git
/gitroot/jdt/eclipse.jdt.debug.git
/gitroot/jdt/eclipse.jdt.ui.git
/gitroot/jdt/eclipse.jdt.git
/gitroot/platform/eclipse.platform.git
/gitroot/platform/eclipse.platform.debug.git
/gitroot/platform/eclipse.platform.releng.git
/gitroot/platform/eclipse.platform.resources.git
/gitroot/platform/eclipse.platform.runtime.git
/gitroot/platform/eclipse.platform.swt.git
/gitroot/platform/eclipse.platform.team.git
/gitroot/platform/eclipse.platform.text.git
/gitroot/platform/eclipse.platform.ua.git
/gitroot/platform/eclipse.platform.ui.git
/gitroot/pde/eclipse.pde.git
/gitroot/pde/eclipse.pde.build.git
/gitroot/pde/eclipse.pde.ui.git
/gitroot/pde/eclipse.pde.incubator.git

/gitroot/equinox/rt.equinox.bundles.git
/gitroot/equinox/rt.equinox.framework.git
/gitroot/equinox/rt.equinox.p2.git
/gitroot/equinox/rt.equinox.incubator.git
/gitroot/equinox/rt.equinox.security.git

Paul, John and I had some hallway conversations about this.  Paul had some concerns about the one repo per unix group approach so I'll let him update this bug with his proposal.
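
The repository list in this comment follows a regular pattern, which can be sketched as below. The sketch assumes that each repository is named after its unix group and that the directory under /gitroot is the second dot-separated part of that name; this is inferred from the paths listed here, not a documented convention. The function names are hypothetical, and the last group/project pairing is illustrative only.

```python
from collections import defaultdict


def repo_path(unix_group):
    """Derive a repository path from a unix group name.

    Assumes the layout /gitroot/<area>/<group>.git, where <area> is the
    second dot-separated part of the group name.
    """
    area = unix_group.split(".")[1]
    return f"/gitroot/{area}/{unix_group}.git"


def group_projects(assignments):
    """Map each repository path to the sorted projects of its unix group."""
    repos = defaultdict(list)
    for unix_group, project in assignments:
        repos[repo_path(unix_group)].append(project)
    return {path: sorted(projects) for path, projects in repos.items()}


assignments = [
    ("rt.equinox.framework", "org.eclipse.osgi.services"),
    ("rt.equinox.framework", "org.eclipse.osgi.util"),
    ("eclipse.platform.ui", "org.eclipse.jface"),  # illustrative pairing
]

for path, projects in sorted(group_projects(assignments).items()):
    print(path, projects)
```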
Comment 23 Wayne Beaton CLA 2011-06-20 14:04:39 EDT
(In reply to comment #22)

> Paul, John and I had some hallway conversations about this.  Paul had some
> concerns about the one repo per unix group approach so I'll let him update this
> bug with his proposal.

Finer-grained repos make sense to me. We cannot, however, split a repo across UNIX groups.
Comment 24 Paul Webster CLA 2011-06-20 14:31:10 EDT
I fully support encapsulating any given repo within one unix group :-)

In looking through eclipse.platform.ui, it seems we have more than one buildable unit.

1. jface, core commands, and databinding
2. workbench and ui (rest of RCP)
3. ide and ide support (application, etc, based on core.resources)
4. Eclipse 4 stuff, which depends on EMF

I can put them all into 1 repo, eclipse.platform.ui.git (my initial tests place the .git repo at about 85M), but it might make sense to put the projects in 3 repos:

1) jface+commands+databinding
2) workbench+ide
3) Eclipse 4

Managing this by unix group could lead to:

#Option 1
/gitroot/platform/eclipse.platform.ui/org.eclipse.jface.git
/gitroot/platform/eclipse.platform.ui/org.eclipse.ui.git
/gitroot/platform/eclipse.platform.ui/org.eclipse.eclipse4.git  <- name still TBD

or a unix-group like container at the gitroot:

#Option 2
/gitroot/eclipse.platform.ui/org.eclipse.jface.git
/gitroot/eclipse.platform.ui/org.eclipse.ui.git
/gitroot/eclipse.platform.ui/org.eclipse.eclipse4.git  <- name still TBD

This could still fit into the proposal in comment #22 as:

#Option 3
/gitroot/platform/org.eclipse.jface.git
/gitroot/platform/org.eclipse.ui.git
/gitroot/platform/org.eclipse.eclipse4.git  <- name still TBD

The difference between Options 1 and 2 and Option 3 is that growing new repos in Options 1 and 2 can be done by the developers, similar to how we already manage other git repos (like /gitroot/e4).  In Option 3, all 3 repos would have to be created by a webmaster so that they have the correct unix group permission.  If we need another repo, we'll have to submit a bug (I'm not saying it's a bad thing, but it is different from how we manage e4, for example).

PW
Comment 25 John Arthorne CLA 2011-06-20 15:05:39 EDT
(In reply to comment #24)
> I can put them all into 1 repo, eclipse.platform.ui.git (my initial tests place
> the .git repo at about 85M), but it might make sense to put the projects in 3
> repos:
> 
> 1) jface+commands+databinding
> 2) workbench+ide
> 3) Eclipse 4

The advantage of more repositories is that someone who knows they only want to work on a subset can check out less stuff. On the other hand, my fairly short experience is that working with multiple git repositories can be a real pain. For example, if you have a change that spans multiple repositories, you have to do the branch/fetch/merge/commit/push dance for each repository separately. You don't have a single atomic commit that can be merged/cherry-picked across remotes in a single step, etc. In Orion we have two repositories, but after six months I kinda wished we had only created one because of all the extra workflow steps introduced by having two repositories. Maybe this is one of the reasons other big projects like the Linux kernel use a single Git repository. Considering that someone working on Platform UI might also need to clone SWT, Runtime, Equinox, and possibly others, we should try to avoid increasing the number of clones unnecessarily.

In the end it is the Platform UI committers that will feel this pain though, so we can do whatever you guys want. Maybe bring it up at your next Platform UI planning call?
Comment 26 James Blackburn CLA 2011-06-20 15:36:47 EDT
(In reply to comment #25)
> On the other hand, my fairly short
> experience is that working with multiple git repositories can be a real pain.

Agreed.  I've spent a couple of years with CDT cloned at the project level, and I ended up manipulating the clones using some crafted bash to run cgit over all of them at once.  It's much easier with fewer repos...

The other thing to bear in mind is that if you decide at a later date that you want to split out some content from the main repository, this can be easily done in git.  It's more painful to recombine a number of repos.

85M doesn't sound too bad -- this is the starting size of the CDT repo too.
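
To make the per-repository overhead concrete, here is a minimal sketch of the kind of helper that runs the same git step over several clones. It only builds the command lines and does not execute anything. The function name `plan`, the clone directory names, and the choice of steps are assumptions for illustration; the only git feature relied on is the `-C <path>` option.

```python
import shlex


def plan(repo_dirs, git_args):
    """Return the command lines needed to run one git step in every clone.

    Nothing is executed; the function only builds the commands, so it is
    safe to run anywhere.
    """
    return [shlex.join(["git", "-C", repo, *git_args]) for repo in repo_dirs]


# Hypothetical local clone directories.
clones = ["eclipse.platform.ui", "eclipse.platform.swt", "eclipse.platform.runtime"]
steps = [["fetch", "origin"], ["merge", "origin/master"], ["push", "origin", "master"]]

commands = [cmd for step in steps for cmd in plan(clones, step)]
print(len(commands), "commands for a change that spans", len(clones), "clones")
for cmd in commands:
    print(cmd)
```

Every extra clone multiplies the number of commands, and none of them gives a single atomic commit across repositories.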
Comment 27 Paul Webster CLA 2011-06-20 16:37:06 EDT
(In reply to comment #25)
> Considering that someone working on Platform UI might also need to clone SWT,
> Runtime, Equinox, and possibly others, we should try to avoid increasing the
> number of clones unnecessarily.

OK, that's a fairly convincing argument as well :-)

I'll bring the discussion up at our Platform UI call, but now I'm leaning towards one eclipse.platform.ui.git repo.

PW
Comment 28 Kim Moir CLA 2011-06-20 21:58:01 EDT
I opened bug 349891 to create the directories for the git repos for Eclipse and Equinox so we can do a full test build.  We can sort out the platform ui repo after they discuss it in their planning call.
Comment 29 Paul Webster CLA 2011-06-21 11:37:40 EDT
(In reply to comment #28)
> I opened bug 349891 to create the directories for the git repos for Eclipse and
> Equinox so we can do a full test build.  We can sort out the platform ui repo
> after they discuss it in their planning call.

Platform UI will just follow the convention:
/gitroot/platform/eclipse.platform.ui.git


PW
Comment 30 Kim Moir CLA 2011-08-10 16:14:04 EDT
I think this bug can be closed. We're making progress with the git migration with eight of 25 git repos transitioned so far.