Re: [egit-dev] [JGIT] jgit clone file:// question

Marek Zawirski <marek.zawirski@xxxxxxxxx> wrote:
> Mark Struberg wrote:
>> If I do a $> git clone file://
>> git (under Linux) is smart enough to _not_ copy all the repo blobs but
>> only create hardlinks instead.

Uh, wrong.

`git clone /path` hard-links the files under the object directory,
but falls back to a straight file copy if the link(2) system call
fails, such as because we are crossing a filesystem boundary, or
the OS is unable to support it.  It also links (or copies) all objects
in the source repository, even those which are unreachable and could
be pruned.

`git clone file:///path` bypasses the hard link approach above.
Instead it spawns a `git-upload-pack` process against /path and
runs the native Git transport protocol over a pipe.  The new clone
is a repacked copy of the source, shares nothing on disk, and will
not have unreachable objects as they were not packed onto the pipe.

>> I know this is not possible under windows and 
>> surely not with Java. But is there any 'softlinking' which will be 
>> used, or will all files be copied over to the new clone when using JGit?
> AFAIK there will be no "softlinking".

Correct.  Java does not support hard links or soft links.  Soft links
for objects in a Git repository are incredibly dangerous.  The source
repository could remove the object and break the other repositories.
Hard links work because the filesystem ensures the disk blocks stay
in use until the last link is removed.

Because we don't have hard links available from Java we can't
emulate the `git clone /path` behavior.  So we only implement the
`git clone file://` behavior.
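
For comparison, here is a minimal sketch of the link-then-copy fallback
that `git clone /path` applies to each file.  It assumes a runtime that
provides java.nio.file.Files.createLink (Java 7 and later), which is not
something we can rely on here.  The class and method names are made up
for illustration and are not part of JGit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not JGit code.
class LinkOrCopy {
    /**
     * Try to create dst as a hard link to src; if linking fails
     * (different filesystem, unsupported by the OS, and so on),
     * fall back to a plain byte copy.
     *
     * @return true if a hard link was created, false if src was copied
     */
    static boolean linkOrCopy(Path src, Path dst) throws IOException {
        try {
            // Argument order is (new link, existing file).
            Files.createLink(dst, src);
            return true;
        } catch (IOException | UnsupportedOperationException | SecurityException e) {
            // Same fallback as native git: straight file copy.
            Files.copy(src, dst);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("link-or-copy");
        Path src = Files.write(dir.resolve("object"), new byte[] {1, 2, 3});
        boolean linked = linkOrCopy(src, dir.resolve("object-clone"));
        System.out.println(linked ? "hard-linked" : "copied");
    }
}
```

Whether the link succeeds depends on the filesystem, so a caller should
not assume either outcome; both leave dst with the same contents as src.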

We could look into using JNA-POSIX, but its license is incompatible:
it's GPL, and some of our downstream consumers are EPL and can't
link to it.  Eclipse might have something in one of their own
bundles, but it's quite likely to be EPL, which is incompatible
with our downstream consumers who use GPL.  :-)

I want to write an optional JNI layer to accelerate certain functions
in JGit (like pack data access), but it's really on the back burner
for me.  We could also put a link(2) implementation into that module.

> LocalTransport class uses packfile protocol for fetch even for local
> connections (i.e. it communicates with JGit UploadPack thread or
> "git upload-pack" program). I am not sure what clone impl. is used
> in the maven scenario you mentioned, but I guess it uses
> FetchConnection, so it is protocol-based.

Right.  We're running a thread in the background to execute
UploadPack, and copying data from the one repository to the
other via that pure-Java pipe between the two threads.

Incidentally it was this implementation that discovered the multi_ack
protocol deadlocks if the pipe buffer isn't over 2048 bytes.  :-)
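
To make the shape of that concrete, here is a rough sketch of one
thread feeding another through an in-memory pipe with an explicit
buffer size.  It is not the LocalTransport code: the class name, method
name and thread name are invented, and it only moves bytes in one
direction, so it does not reproduce the deadlock.  It just shows where
the pipe buffer size is chosen (java.io.PipedInputStream takes it as a
constructor argument).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical illustration, not JGit code.
class PipeCopySketch {
    /**
     * Sends payload from a background "server" thread to the calling
     * thread through a pipe whose buffer holds pipeBufferSize bytes.
     */
    static byte[] copyThroughPipe(final byte[] payload, int pipeBufferSize)
            throws IOException, InterruptedException {
        final PipedOutputStream serverOut = new PipedOutputStream();
        PipedInputStream clientIn = new PipedInputStream(serverOut, pipeBufferSize);

        // Stand-in for the thread that would run UploadPack.
        Thread server = new Thread(() -> {
            try (PipedOutputStream out = serverOut) {
                // write() blocks whenever the pipe buffer is full and
                // the reader has not drained it yet.
                out.write(payload);
            } catch (IOException e) {
                // Reader went away; nothing more to send.
            }
        }, "upload-pack-sketch");
        server.start();

        // Stand-in for the fetching side: read until the writer closes.
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        int n;
        while ((n = clientIn.read(buf)) != -1) {
            received.write(buf, 0, n);
        }
        server.join();
        clientIn.close();
        return received.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[10000];
        byte[] copy = copyThroughPipe(data, 4096);
        System.out.println("copied " + copy.length + " bytes");
    }
}
```

With a pipe in each direction, a side that writes more than the buffer
can hold before its peer starts reading will block, which is why the
buffer size matters for a chatty protocol.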