From: higgins-dev-bounces@xxxxxxxxxxx [mailto:higgins-dev-bounces@xxxxxxxxxxx] On Behalf Of Jim Sermersheim
Sent: Monday, November 19, 2007 7:37 PM
To: 'Higgins (Trust Framework) Project developer discussions'
Subject: Re: [higgins-dev] Research on how Higgins could convert from CVS to SVN
Does anyone have reservations about making the switch? If not, what are the
next steps? We'll need to agree on a time when everyone stops using CVS
for an hour or so.
>>> "Jim Sermersheim" <jimse@xxxxxxxxxx> 11/15/07 2:53 PM >>>
So, I tried importing from this repository into Eclipse (using Subclipse) and it
works perfectly. File histories are preserved and version compares
work. I pulled down IdAS and all dependency projects, ran my handy
AWK script to link all the dependency libs to my common deps folder, and all
the projects built without error.
I didn't have to enter the password twice from within Eclipse.
Jim
>>> "Andrew Hodgkinson" <ahodgkinson@xxxxxxxxxx> 11/15/07 1:36 PM >>>
All,
The trial conversion of the Higgins repository (296 MB) from CVS to SVN
completed without any errors. Creating the SVN dump file (which can
subsequently be imported into SVN using 'svnadmin load') took a little over 4
minutes. The cvs2svn statistics are somewhat interesting:
cvs2svn Statistics:
------------------
Total CVS Files:        9880
Total CVS Revisions:    25451
Total CVS Branches:     5362
Total CVS Tags:         33151
Total Unique Tags:      27
Total Unique Branches:  6
CVS Repos Size in KB:   239215
Total SVN Commits:      2245
First Revision Date:    Wed Oct 12 13:21:57 2005
Last Revision Date:     Thu Nov 15 08:39:48 2007
------------------
Timings (seconds):
------------------
80.9 pass1 CollectRevsPass
0.0 pass2 CollateSymbolsPass
14.6 pass3 FilterSymbolsPass
0.1 pass4 SortRevisionSummaryPass
0.6 pass5 SortSymbolSummaryPass
14.3 pass6 InitializeChangesetsPass
6.6 pass7 BreakRevisionChangesetCyclesPass
6.6 pass8 RevisionTopologicalSortPass
3.5 pass9 BreakSymbolChangesetCyclesPass
7.1 pass10 BreakAllChangesetCyclesPass
11.2 pass11 TopologicalSortPass
6.6 pass12 CreateRevsPass
0.2 pass13 SortSymbolsPass
0.2 pass14 IndexSymbolsPass
99.3 pass15 OutputPass
252.1 total
real 4m12.295s
user 2m59.771s
sys 0m11.901s
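To see where the conversion time goes, the per-pass timings above can be tallied in a few lines. This is only an illustrative sketch: the dictionary and the `summarize` helper are names made up here, and the numbers are copied by hand from the report rather than parsed from cvs2svn output.

```python
# Per-pass timings in seconds, copied by hand from the cvs2svn report above.
PASS_TIMINGS = {
    "CollectRevsPass": 80.9,
    "CollateSymbolsPass": 0.0,
    "FilterSymbolsPass": 14.6,
    "SortRevisionSummaryPass": 0.1,
    "SortSymbolSummaryPass": 0.6,
    "InitializeChangesetsPass": 14.3,
    "BreakRevisionChangesetCyclesPass": 6.6,
    "RevisionTopologicalSortPass": 6.6,
    "BreakSymbolChangesetCyclesPass": 3.5,
    "BreakAllChangesetCyclesPass": 7.1,
    "TopologicalSortPass": 11.2,
    "CreateRevsPass": 6.6,
    "SortSymbolsPass": 0.2,
    "IndexSymbolsPass": 0.2,
    "OutputPass": 99.3,
}


def summarize(timings):
    """Return the summed time and (name, seconds, share) tuples, slowest first."""
    total = sum(timings.values())
    shares = sorted(
        ((name, secs, secs / total) for name, secs in timings.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return total, shares


total, shares = summarize(PASS_TIMINGS)
print(f"sum of passes: {total:.1f} s")
for name, secs, share in shares[:3]:
    print(f"{name:<20} {secs:6.1f} s  {share:5.1%}")
```

The listed passes add up to about 251.8 seconds, slightly under the reported total of 252.1, presumably because each pass is rounded to a tenth of a second. The first and last passes (collecting CVS revisions and writing the output) account for most of the run.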
Importing the dump file to create the SVN repository required approximately 21
minutes on my server. If you would like to check out a working copy to verify
that your component was migrated correctly, I've put a copy of the repository
on cards.bandit-project.org. Assuming SVN is installed on your
client, you can check out a copy using the following command:
svn checkout svn+ssh://higgins@xxxxxxxxxxxxxxxxxxxxxxxx/home/higgins/svn/org.eclipse.higgins/trunk destdir
The password is higgins$test. You can also pull
down a copy that includes all of the branches by using the following command:
svn checkout svn+ssh://higgins@xxxxxxxxxxxxxxxxxxxxxxxx/home/higgins/svn/org.eclipse.higgins destdir
Note that you will be required to enter the password twice. This
is a quirk of using the svn+ssh scheme.
Thanks,
Andy
>>> "Daniel Sanders" <dsanders@xxxxxxxxxx> 11/14/07 1:50 PM >>>
I don't tend to think that the single versioning thing is a big issue. What
it means is that for a given sub-directory tree (which might represent a
project), the version history might include changes for version numbers 2, 3,
5, 8, 10, 16, and 30, but not for any of the other numbers. As long
as you can see what version numbers actually apply to a particular
sub-directory tree (and you can), I don't know why that version number list
necessarily has to be perfectly contiguous. It is also easy to
find out the highest version number where a change actually occurred in a
sub-directory tree. Are there other specific concerns you have about
this, or limitations you perceive? BTW, any given file in an SVN
repository will also have a sparsely populated version history, so even if you
do have an SVN repository per project, the version history for any object in
the repository will still be sparse. If we accept that we are
going to have sparse version histories for single files, why would it matter
that the same is true for sub-directories that represent a particular project?
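The point about sparse revision numbers can be illustrated with a toy model. The sketch below is not Subversion code: `ToyRepository` and the project names `idas` and `sts` are hypothetical, and the model only mimics the idea of one repository-wide revision counter with each project seeing just the revisions that touched it.

```python
from collections import defaultdict


class ToyRepository:
    """Toy model of a repository with a single global revision counter."""

    def __init__(self):
        self.head = 0
        # Top-level project directory -> revisions that changed something in it.
        self.history = defaultdict(list)

    def commit(self, paths):
        """Record one atomic commit touching the given paths; return its revision."""
        self.head += 1
        for project in sorted({path.split("/")[0] for path in paths}):
            self.history[project].append(self.head)
        return self.head

    def log(self, project):
        """Revisions in which the project's subtree actually changed."""
        return list(self.history[project])

    def last_changed(self, project):
        """Highest revision that touched the project, or None if untouched."""
        revisions = self.history[project]
        return revisions[-1] if revisions else None


repo = ToyRepository()
repo.commit(["idas/Context.java"])                    # revision 1
repo.commit(["sts/Token.java"])                       # revision 2
repo.commit(["idas/Context.java", "sts/Token.java"])  # revision 3, spans both
repo.commit(["idas/Entity.java"])                     # revision 4

print("idas:", repo.log("idas"), "last changed:", repo.last_changed("idas"))
print("sts: ", repo.log("sts"), "last changed:", repo.last_changed("sts"))
```

Each project's list has gaps, yet it is still ordered and its last entry gives the most recent change to that subtree, which is the property described above.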
>>> "Jim Sermersheim" <jimse@xxxxxxxxxx> 11/14/2007 12:34 PM >>>
With Directory versioning or Versioned metadata, is it possible to have a
repository containing a number of different "projects" (each in its own
directory) where each project (subdirectory tree) can have its own versioning?
All the repositories I've used have a single version for the repository.
>>> "Andrew Hodgkinson" <ahodgkinson@xxxxxxxxxx> 11/14/07 11:18 AM >>>
I'm in the process of setting up a local Subversion repository
so that I can attempt a trial run of the CVS conversion script against the
Higgins repository. I'll report the results on tomorrow's conference call.
In the meantime, for those of you who aren't familiar with Subversion, I found
the following section from "Version Control with Subversion" very informative:
When discussing the features that Subversion brings to the
version control table, it is often helpful to speak of them in terms of how
they improve upon CVS's design. Subversion provides:
Directory versioning
CVS only tracks the history of individual files, but
Subversion implements a "virtual" versioned filesystem that
tracks changes to whole directory trees over time. Files and directories are
versioned.
True version history
Since CVS is limited to file versioning, operations such as
copies and renames (which might happen to files, but which are really changes to
the contents of some containing directory) aren't supported in CVS.
Additionally, in CVS you cannot replace a versioned file with some new thing of
the same name without the new item inheriting the history of the old (perhaps
completely unrelated) file. With Subversion, you can add, delete, copy, and
rename both files and directories. And every newly added file begins with a
fresh, clean history all its own.
Atomic commits
A collection of modifications either goes into the repository
completely, or not at all. This allows developers to construct and commit
changes as logical chunks, and prevents problems that can occur when only a
portion of a set of changes is successfully sent to the repository.
Versioned metadata
Each file and directory has a set of properties (keys and their
values) associated with it. You can create and store any arbitrary key/value
pairs you wish. Properties are versioned over time, just like file contents.
Choice of network layers
Subversion has an abstracted notion of repository access,
making it easy for people to implement new network mechanisms. Subversion can
plug into the Apache HTTP Server as an extension module. This gives Subversion
a big advantage in stability and interoperability, and instant access to
existing features provided by that server: authentication, authorization, wire
compression, and so on. A more lightweight, standalone Subversion server
process is also available. This server speaks a custom protocol which can be
easily tunneled over SSH.
Consistent data handling
Subversion expresses file differences using a binary
differencing algorithm, which works identically on both text (human-readable)
and binary (human-unreadable) files. Both types of files are stored equally
compressed in the repository, and differences are transmitted in both
directions across the network.
Efficient branching and tagging
The cost of branching and tagging need not be proportional to
the project size. Subversion creates branches and tags by simply copying the
project, using a mechanism similar to a hard-link. Thus these operations take
only a very small, constant amount of time.
Hackability
Subversion has no historical baggage; it is implemented as a
collection of shared C libraries with well-defined APIs. This makes Subversion
extremely maintainable and usable by other applications and languages.
>>> "Mary Ruddy" <mary@xxxxxxxxxxxxxxxxx> 11/09/07 9:20 AM >>>
When the Higgins project was started, Eclipse only
offered CVS, not SVN. So even though SVN has advantages, we had to use
CVS. SVN is now available to projects on request.
SVN has some features that will give us more control over the build process.
For example: Andy "used to use CVS on another project and moved to
SVN. Originally his project was hesitant, but they found that
it made doing nightly builds much easier, as a nightly build can be kicked off
on a particular revision. Didn't need to worry about tagging, while
letting developers check in ahead of the builds. Also can get
atomic commits (all or nothing). Also able to use SVN revision in the file
name of a resulting build so that if someone subsequently reported a bug, we
could go back to the exact source for the build".
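The idea of putting the SVN revision in a build's file name can be sketched as follows. The naming scheme (`component-rNNNN.zip`) and both helper functions are hypothetical, chosen only for illustration; they are not taken from Andy's project or from any Higgins build script.

```python
def build_artifact_name(component: str, revision: int, extension: str = "zip") -> str:
    """Embed the repository revision in the artifact name (hypothetical scheme)."""
    if revision < 1:
        raise ValueError("revision must be a positive integer")
    return f"{component}-r{revision}.{extension}"


def revision_from_artifact(name: str) -> int:
    """Recover the revision from a name produced by build_artifact_name."""
    stem = name.rsplit(".", 1)[0]          # drop the file extension
    return int(stem.rsplit("-r", 1)[1])    # text after the last "-r"


name = build_artifact_name("higgins", 2245)
print(name)
print(revision_from_artifact(name))
```

Given a bug report against such a file, the recovered number could then be passed to `svn checkout -r` to obtain the exact source tree that produced the build.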
On the Higgins developers call yesterday, we discussed the
pros and cons. Dev notes to follow. We had a guest speaker on the
dev call from the Financial Services Technology Consortium and we agreed
to let him review the notes on his presentation to ensure
accuracy, so the notes are delayed.
Andy was nominated to research preparations for doing a dry run of the
conversion as part of formally preparing for any actual
conversion. More to follow.
Below is the overview information I got on the process from Matt
Ward, one of the Eclipse webmasters:
Basically a project needs to decide on its developers list, and then the PL
sends in a request to have the repository moved from CVS to SVN.
We use the cvs2svn tool, but there are a couple of caveats from the
documentation:
1) CVS doesn't record complete information about your project's history.
For example, CVS doesn't record what file modifications took place within the
same CVS commit. Therefore, cvs2svn attempts to infer from CVS's incomplete
information what /really/ happened in the history of your repository. So the
second goal of cvs2svn is to reconstruct as much of your CVS repository's
history as possible.
2) One of the most important topics to consider when converting a repository
is the distinction between binary and text files. If you accidentally treat a
binary file as text *your repository contents will be corrupted*.
For more details check out http://cvs2svn.tigris.org/cvs2svn.html
-Matt.
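On the second caveat above, one way to spot-check a repository before conversion is to flag files that look binary. The sketch below is a generic heuristic (a NUL byte in the first few kilobytes), not the method cvs2svn itself uses; `looks_binary` and the sample size are assumptions made here for illustration, and the cvs2svn documentation remains the authority on how file types are actually decided.

```python
def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    """Heuristic: a NUL byte near the start of the content suggests a binary file."""
    return b"\x00" in data[:sample_size]


def looks_binary_file(path: str, sample_size: int = 8192) -> bool:
    """Apply the same heuristic to the first sample_size bytes of a file on disk."""
    with open(path, "rb") as handle:
        return looks_binary(handle.read(sample_size), sample_size)


# Small in-memory samples: a PNG-like header and a line of Java source.
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print(looks_binary(b"public class Example {}\n"))
```

A check like this can miss binary formats that happen to contain no NUL bytes near the start, so it is best treated as a first pass to build a list of files to review by hand.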