Bug 354181 - migrate jdt.core to git
Summary: migrate jdt.core to git
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.8
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.8 M3
Assignee: Jay Arthanareeswaran CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 345479
Reported: 2011-08-08 16:10 EDT by Kim Moir CLA
Modified: 2011-10-24 01:17 EDT

See Also:


Attachments
list of jdt.core committers past and present for git repository (1.21 KB, text/plain)
2011-08-08 16:12 EDT, Kim Moir CLA
list of jdt.core cvs tags/branches that are bogus in git (936.92 KB, text/plain)
2011-08-09 11:16 EDT, Kim Moir CLA

Description Kim Moir CLA 2011-08-08 16:10:06 EDT
This is the bug that will track the testing and migration of the jdt.core bundles from CVS to git.
Comment 1 Kim Moir CLA 2011-08-08 16:12:28 EDT
Created attachment 201102 [details]
list of jdt.core committers past and present for git repository

list of jdt.core committers I scraped from cvs files and eclipse ldap.  Let me know if there are any corrections required.  This list is used by the cvs2git.options file so previous commits are properly attributed in the new git repository.
Comment 2 Jay Arthanareeswaran CLA 2011-08-09 00:41:47 EDT
(In reply to comment #1)
> Created attachment 201102 [details]
> list of jdt.core committers past and present for git repository
> 
> list of jdt.core committers I scraped from cvs files and eclipse ldap.  Let me
> know if there are any corrections required.  

The list looks alright to me.
Comment 3 Dani Megert CLA 2011-08-09 02:26:53 EDT
(In reply to comment #2)
> (In reply to comment #1)
> > Created attachment 201102 [details] [details]
> > list of jdt.core committers past and present for git repository
> > 
> > list of jdt.core committers I scraped from cvs files and eclipse ldap.  Let me
> > know if there are any corrections required.  
> 
> The list looks alright to me.

I'm fine if we transfer inactive owners but access must only be granted to the following active committers:

oliviert ssankaran pmulet jeromel daudel kent ffusier ajain jarthanaree skandula wharley sherrmann kmoir mkeller
Comment 4 Kim Moir CLA 2011-08-09 09:53:03 EDT
git fast-import is failing with:

kmoir@build:~/migrationtestjdtcore/eclipse.jdt.core> cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
fatal: Branch name doesn't conform to GIT standards: refs/tags/v_0_128_01_(1*0_stream/candidate_129)
fast-import: dumping crash report to .git/fast_import_crash_4077

Perhaps we should ask the webmaster to help us rename the branch in CVS?
Comment 5 Kim Moir CLA 2011-08-09 11:15:18 EDT
Some existing jdt.core tags and branches contain characters that aren't legal in git ref names:

http://www.kernel.org/pub/software/scm/git/docs/git-check-ref-format.html

I'll attach a list of the problematic tags. Not sure if it is a problem to delete these tags and branches in CVS - do we still need them?  This has to be done to allow the git migration to proceed.
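
A rough way to pre-screen tag and branch names before running the import is sketched below. It is only an approximation of the main rules documented for git-check-ref-format (control characters are not checked), and the function name `is_valid_ref` is made up for illustration. Where git is available, `git check-ref-format <name>` is the authoritative check.

```sh
# Approximate check of a ref name against the main git-check-ref-format rules.
# Returns 0 if the name looks acceptable, 1 otherwise.
is_valid_ref() {
    case "$1" in
        # characters git forbids anywhere in a ref name
        *"*"*|*"?"*|*"["*|*"~"*|*"^"*|*":"*|*"\\"*|*" "*) return 1 ;;
        # forbidden sequences
        *".."*|*"@{"*|*"//"*) return 1 ;;
        # forbidden endings (whole name or a path component)
        */|*.|*.lock|*.lock/*) return 1 ;;
        # no path component may start with a dot
        .*|*/.*) return 1 ;;
    esac
    return 0
}

# Example: the tag from the crash report fails because of the "*".
# is_valid_ref 'refs/tags/v_0_128_01_(1*0_stream/candidate_129)' || echo "illegal"
```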
Comment 6 Kim Moir CLA 2011-08-09 11:16:28 EDT
Created attachment 201151 [details]
list of jdt.core cvs tags/branches that are bogus in git
Comment 7 Dani Megert CLA 2011-08-09 11:20:07 EDT
I would expect that the migration scripts can take care of this, e.g. by replacing the offending chars with another one.
Comment 8 Jay Arthanareeswaran CLA 2011-08-09 11:27:42 EDT
Looks like we have issues with multiple chars, '*', ':' etc.

Did we face this problem in other projects too? And if so, what was the approach taken?
Comment 9 Kim Moir CLA 2011-08-09 11:36:00 EDT
No, I haven't run into this problem when running test migrations for other projects. Perhaps jdt.core just used this convention at some point.

Dani, I just found some options in the cvs2git.options file to map regular expressions to new values when doing the migration.  I'll try mapping the illegal "*" to "_".  However, this means I have to rerun the cvs2git export, which takes about 70 minutes. Very time consuming :-(
Comment 10 Kim Moir CLA 2011-08-11 10:12:15 EDT
Your test git repo is ready

ssh://userid@git.eclipse.org/gitroot/jdt/eclipse.jdt.core.git 

It's pretty large 
kmoir@build:/gitroot/jdt> du -sh eclipse.jdt.core.git/
929M    eclipse.jdt.core.git/

Let me know if you have any issues and when you'd like to schedule the real migration.  Commits will continue in CVS until we schedule a date for the actual migration.
Comment 11 Deepak Azad CLA 2011-08-11 10:32:26 EDT
(In reply to comment #10)
> It's pretty large 
> kmoir@build:/gitroot/jdt> du -sh eclipse.jdt.core.git/
> 929M    eclipse.jdt.core.git/

jeez, that is large! bug 345471 comment 19 says the size of org.eclipse.jdt was 247M, I wonder what happened here.

I am curious to know how large some of the other git repositories are. I also wonder if it would make sense to split the jdt core repo in two, e.g. separate the tests into another repo.
Comment 12 Dani Megert CLA 2011-08-11 10:34:12 EDT
(In reply to comment #11)
> (In reply to comment #10)
> > It's pretty large 
> > kmoir@build:/gitroot/jdt> du -sh eclipse.jdt.core.git/
> > 929M    eclipse.jdt.core.git/
> 
> jeez, that is large! bug 345471 comment 19 says the size of org.eclipse.jdt was
> 247M, I wonder what happened here.
That was maybe the feature repo.


> I am curious to know how large are some of the other git repositories. I also
> wonder if it would make sense to spit the jdt core repo in two e.g. separate
> the tests in another repo.
-1. We decided to split along the ACLs. We should keep it that way.
Comment 13 Denis Roy CLA 2011-08-11 10:37:45 EDT
Have you tried running git gc --aggressive on it?
Comment 14 Kim Moir CLA 2011-08-11 10:58:32 EDT
>>Have you tried running git gc --aggressive on it?

Yes, this is with git gc --aggressive 

(which took an hour to run :-)

jdt.core has a number of large binary workspaces that they use for testing.  There are many copies of them in the repository.
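
To see which files in the history account for the size, one option is to list the largest blobs. The sketch below assumes its input comes from `git cat-file --batch-check` with the format shown in the usage comment; the helper name `largest_blobs` is hypothetical. The sizes it ranks are uncompressed object sizes, so they indicate where the bulk is rather than the exact space used inside the .pack file.

```sh
# Read "<type> <sha> <size> <path>" lines on stdin and print the N largest
# blobs (default 10), biggest first.
largest_blobs() {
    awk '$1 == "blob"' | sort -k3,3nr | head -n "${1:-10}"
}

# Usage inside a clone (requires git):
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     largest_blobs 20
```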
Comment 15 Paul Webster CLA 2011-08-11 13:03:47 EDT
(In reply to comment #10)
> Your test git repo is ready
> 
> ssh://userid@git.eclipse.org/gitroot/jdt/eclipse.jdt.core.git 

Kim, we discovered you need to remove the "core" entry from the .gitignore file.

See bug 354429, bug 354448, etc.

> 
> It's pretty large 
> kmoir@build:/gitroot/jdt> du -sh eclipse.jdt.core.git/
> 929M    eclipse.jdt.core.git/

SWT included a second repo for its binaries, to help contain the size.  The bulk of your space seems to come from:

155516  jdt-core-home
47448   org.eclipse.jdt.core.tests.model
35312   org.eclipse.jdt.core.tests.performance

We didn't migrate platform-ui-home; maybe jdt-core-home could be moved by itself or removed?  Wasn't that information migrated over to the org.eclipse CVS repo?

PW
Comment 16 Kim Moir CLA 2011-08-11 14:21:30 EDT
Okay, I see that now that I have pulled the latest migration scripts. Thanks Paul.  

Yes, I'll exclude jdt-core-home from the real migration.  I just grouped the bundles by group ownership to select the ones that needed to be migrated.  I forgot that those ones were migrated to the org.eclipse CVS repo.
Comment 17 Deepak Azad CLA 2011-08-12 00:19:47 EDT
(In reply to comment #15)
> SWT included a second repo for its binaries, to help contain the size.  The
> bulk of your space seems to come from:
> 
> 155516  jdt-core-home
> 47448   org.eclipse.jdt.core.tests.model
> 35312   org.eclipse.jdt.core.tests.performance

This is 238276 (or 238M), which still leaves about 700M, which is also large.
Comment 18 Jay Arthanareeswaran CLA 2011-08-12 01:23:58 EDT
(In reply to comment #17)
> This is 238276 (or 238M), which still leaves about 700M, which is also large.

It's probably due to the long history and numerous branches we have in the repository. The .pack file is taking up most of the disk space. I will see if cloning with fewer branches makes a difference and report back.
Comment 19 Dani Megert CLA 2011-08-12 03:33:30 EDT
(In reply to comment #18)
> (In reply to comment #17)
> > This is 238276 (or 238M), which still leaves about 700M, which is also large.
> 
> It's probably due to the long history and numerous branches we have in the
> repository. The .pack file is taking most of the memory. I will see if cloning
> with fewer branches makes a difference and report.
R*maintenance branches need to be kept.
Comment 20 Jay Arthanareeswaran CLA 2011-08-12 04:55:48 EDT
I tried with master and 3 R_* branches but that didn't make a difference to the size. It could be something else that is contributing to the '.pack' file and if it's the versions and tags, I don't think we can do much here. Even splitting into multiple projects may not help. But I am not a Git expert and could be wrong.
Comment 21 Paul Webster CLA 2011-08-12 10:37:30 EDT
(In reply to comment #17)
> > 155516  jdt-core-home
> > 47448   org.eclipse.jdt.core.tests.model
> > 35312   org.eclipse.jdt.core.tests.performance
> 
> This is 238276 (or 238M), which still leaves about 700M, which is also large.

Each folder contributes about 1.6 to 2 times that amount (especially with binaries) because it's in that folder *and* in the .git directory.

The clone will always get the entire .git directory, which is about .33 to .5 the size of your cloned repo.  Branches are just files (in theory :-) with a hash in them pointing to the tip commit.

PW
Comment 22 Jay Arthanareeswaran CLA 2011-08-16 02:47:17 EDT
I quickly looked at the clone and this is how I find the space to be occupied:

Size on disk:      1.22 GB
JDT/Core projects: 159 MB
jdt-core-home :    151 MB
.git folder :      939 MB

So, if we get rid of the jdt-core-home, we can save about 200 - 300 MB (according to Paul). Even after that, the clone will take nearly a GB.

Olivier, is there something else such as binaries etc. that we can move to another repository? I personally don't like it as it might require changes in the projects' set-up.
Comment 23 Jay Arthanareeswaran CLA 2011-08-16 05:47:47 EDT
By the way, I did some sanity tests on the test repository, looked for tags & branches, tried committing, pushing and such - things seem alright. Is there anything else that needs testing in particular?
Comment 24 Dani Megert CLA 2011-08-16 06:12:01 EDT
>By the way, I did some sanity tests on the test repository, looked for tags &
>branches, tried committing, pushing and such 
Did you also try for maintenance branches? How about releasing into the map file?
Comment 25 Jay Arthanareeswaran CLA 2011-08-16 06:29:53 EDT
(In reply to comment #24)
> Did you also try for maintenance branches? How about releasing into the map
> file?

Yes, I did the tests on R3_7* also. I didn't try the release part, I will do that now. I suppose the Releng tool doesn't support Git yet. Is it to be done manually?
Comment 26 Dani Megert CLA 2011-08-16 07:35:07 EDT
(In reply to comment #25)
> (In reply to comment #24)
> > Did you also try for maintenance branches? How about releasing into the map
> > file?
> 
> Yes, I did the tests on R3_7* also. I didn't try the release part, I will do
> that now. I suppose the Releng tool doesn't support Git yet.
Right
> Is it to be done manually?
I saw some messages going by that talk about a script being available but didn't look at those yet.
Comment 27 Kim Moir CLA 2011-09-23 11:48:20 EDT
From Olivier

Run this command after migration
git filter-branch --index-filter 'git rm --cached --ignore-unmatch org.eclipse.jdt.core.tests.performance/full-source-R3_0.zip org.eclipse.jdt.core.tests.performance/GenericTypeTest.java org.eclipse.jdt.core.tests.performance/EclipseVisitorBug.java org.eclipse.jdt.core/jdtcore.jar org.eclipse.jdt.core/jdtcoresrc.zip org.eclipse.jdt.core.tests.performance/compiler-R3_0.zip org.eclipse.jdt.apt.tests/perf-test-project.zip' --tag-name-filter cat -- --all
rm -rf .git/refs/original/
rm -rf .git/logs/
git gc --prune=now
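
Since the list of paths to purge is long, the index filter can also be assembled from arguments. This is a minimal sketch, not the script used for the migration: `build_index_filter` is a made-up helper, and it assumes none of the paths contains a single quote. Note that the git rm option is spelled `--ignore-unmatch`, with two dashes.

```sh
# Print a "git rm" command line suitable for --index-filter, with each
# path wrapped in single quotes.
build_index_filter() {
    printf 'git rm --cached --ignore-unmatch'
    for p in "$@"; do
        printf " '%s'" "$p"
    done
    printf '\n'
}

# Usage (rewrites history, so run it on a scratch clone first):
#   git filter-branch \
#     --index-filter "$(build_index_filter org.eclipse.jdt.core/jdtcore.jar org.eclipse.jdt.core/jdtcoresrc.zip)" \
#     --tag-name-filter cat -- --all
```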


Also this project should be moved to its own repo
plugin@org.eclipse.jdt.core.tests.binaries=v_C12,:pserver:anonymous@dev.eclipse.org:/cvsroot/eclipse,,jdt-core-home/org.eclipse.jdt.core.tests.binaries
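
For readers unfamiliar with map file entries, the line above can be split into its parts as sketched here. The field meanings (type@id = tag, CVS root, password, path in the repository) are my reading of the usual PDE Build CVS map layout and should be treated as an assumption; `parse_map_entry` is a hypothetical helper.

```sh
# Split a CVS map file entry of the assumed form
#   <type>@<id>=<tag>,<cvsroot>,<password>,<path>
# into labelled fields. The (usually empty) password field is read but not printed.
parse_map_entry() {
    key=${1%%=*}      # everything before the first "="
    value=${1#*=}     # everything after the first "="
    IFS=, read -r tag repo password path <<EOF
$value
EOF
    printf 'type=%s\nid=%s\ntag=%s\nrepo=%s\npath=%s\n' \
        "${key%%@*}" "${key#*@}" "$tag" "$repo" "$path"
}

# Usage:
#   parse_map_entry 'plugin@org.eclipse.jdt.core.tests.binaries=v_C12,:pserver:anonymous@dev.eclipse.org:/cvsroot/eclipse,,jdt-core-home/org.eclipse.jdt.core.tests.binaries'
```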
Comment 28 Kim Moir CLA 2011-09-25 16:06:14 EDT
Note:

Getting rid of tags with * 

sed 's/1\*0/1_0/' < git-dump.dat > git-dump.dat.new
sed 's/2\*0/2_0/' < git-dump.dat.new > git-dump.dat.new2
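
A more targeted variant would rewrite "*" only on lines that name a ref, leaving commit messages and other data untouched. The sketch below assumes the dump uses the standard fast-import commands (`commit`, `reset`, `from`, `merge`, `tag`); `sanitize_ref_lines` is a made-up name. A commit message line that happens to start with one of these commands would still be rewritten, so the result should be checked before importing.

```sh
# Replace "*" with "_" on fast-import command lines that carry a ref name.
# Reads the stream on stdin and writes the rewritten stream to stdout.
sanitize_ref_lines() {
    sed -e '/^commit refs\//s/\*/_/g' \
        -e '/^reset refs\//s/\*/_/g' \
        -e '/^from refs\//s/\*/_/g' \
        -e '/^merge refs\//s/\*/_/g' \
        -e '/^tag /s/\*/_/g'
}

# Usage:
#   sanitize_ref_lines < git-dump.dat > git-dump.dat.new
```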
Comment 29 Kim Moir CLA 2011-09-26 10:33:50 EDT
Okay, your repos are ready. I ran most of the migration last night so you could check them today since this is one of the largest repos to migrate.

ssh://userid@git.eclipse.org/gitroot/jdt/eclipse.jdt.core.git 
ssh://userid@git.eclipse.org/gitroot/jdt/eclipse.jdt.core.binaries.git 

kmoir@build:/home/local/data/eclipse/eclipse.jdt.core.deploy> du -sh  /gitroot/jdt/eclipse.jdt.core.git/
163M    /gitroot/jdt/eclipse.jdt.core.git/

Let me know if you see any issues.
Comment 30 Olivier Thomann CLA 2011-09-26 10:41:07 EDT
I'll check them immediately. Please don't release anything to these repos before they are validated.
Comment 31 Olivier Thomann CLA 2011-10-04 10:37:13 EDT
This can be closed as FIXED.
Comment 32 Srikanth Sankaran CLA 2011-10-24 01:17:58 EDT
We have had many successful builds after the migration to git and there are no
known/open issues from the JDT/Core p.o.v. (other than that some inertia-bound
luddites (I know of at least one ;-)) haven't stopped bemoaning the attempts
to (mixing metaphors) move their cheese that ain't broken). These elements will
hopefully come to terms with the new world order soon :))

This task is marked as verified, complete as of 3.8 M3.