Re: [jgit-dev] RFC: Optimized "single-commit" push

I'm facing a similar problem when pushing a single annotated tag into
a big repository. I reported it some time ago:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=484944

Push is slow because PackWriter marks everything reachable from the
have set, including trees and blobs. If the repository is big and has
a wide and deep tree, this requires a lot of reads from the disk. In
my case it can take around an hour on a loaded server. I wonder if it
would be reasonable to add the ability to plug an implementation of
the following interface into PackWriter:

    public interface PreparePack {

      void preparePack(ProgressMonitor monitor,
                       Repository repository,
                       PackWriter writer,
                       Set<ObjectId> want,
                       Set<ObjectId> have) throws IOException;
    }

The default implementation can do what PackWriter does now and cover
the most general case. A custom implementation can use application
knowledge and avoid marking everything reachable from the have set.
When pushing an annotated tag for a commit, the app might traverse
only the commits reachable from the have set, without marking trees,
to check whether the tagged commit is already in the remote
repository. If it is, it is safe to write only the tag object. If the
app cannot determine what to write into the pack, it can fall back to
the default implementation.
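
To make the tag-only case concrete, here is a minimal sketch of the
check described above. It is a toy model, not JGit code: commits are
plain strings, `parents` is a hypothetical map from a commit id to its
parent ids, and the names TagPushSketch, reachableViaCommitsOnly and
objectsToPack are made up for illustration. A real PreparePack
implementation would presumably walk commits with RevWalk instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model of the "annotated tag only" fast path. All names here are
// hypothetical; nothing in this class is part of the JGit API.
public class TagPushSketch {

    // Walks parent links only (no trees, no blobs) starting from the
    // have set and reports whether the target commit is reachable.
    static boolean reachableViaCommitsOnly(Map<String, List<String>> parents,
                                           Set<String> have,
                                           String target) {
        Deque<String> queue = new ArrayDeque<>(have);
        Set<String> seen = new HashSet<>(have);
        while (!queue.isEmpty()) {
            String commit = queue.poll();
            if (commit.equals(target)) {
                return true;
            }
            for (String parent : parents.getOrDefault(commit, List.of())) {
                if (seen.add(parent)) {
                    queue.add(parent);
                }
            }
        }
        return false;
    }

    // Returns the objects to pack when the fast path applies, or an
    // empty Optional to signal "fall back to the default implementation".
    static Optional<List<String>> objectsToPack(Map<String, List<String>> parents,
                                                Set<String> have,
                                                String tagId,
                                                String taggedCommit) {
        if (reachableViaCommitsOnly(parents, have, taggedCommit)) {
            // The remote already has the tagged commit, so the tag
            // object alone is enough.
            return Optional.of(List.of(tagId));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // History c3 -> c2 -> c1; the remote advertises c3.
        Map<String, List<String>> parents = Map.of(
                "c3", List.of("c2"),
                "c2", List.of("c1"),
                "c1", List.of());
        Set<String> have = Set.of("c3");
        System.out.println(objectsToPack(parents, have, "tag-v1", "c2"));
        System.out.println(objectsToPack(parents, have, "tag-v2", "unknown"));
    }
}
```

The empty Optional plays the role of "cannot determine what to write":
the caller would then run the existing, fully general preparation.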


On Mon, Jun 20, 2016 at 12:18 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Fri, Jun 17, 2016 at 8:59 AM, Justin Santa Barbara
> <justin@xxxxxxxxxxxx> wrote:
>> I am using git as a versioned store in a project I'm working on, with
>> a fairly big repo, and using JGit as both the git client (which is
>> actually a RESTful server) and the git server (using
>> jgit.http.server).  Performance is generally good with frequent server
>> GCs and sufficient memory, but a push will sometimes take a long time
>> in the "counting objects" phase (30 seconds or more).
>>
>> Because of my use-case though, the problem is constrained: I am
>> pushing a single commit to the remote server, and there are only a
>> handful of changed files (typically one).  I created an experimental
>> patch that detects this case and optimizes it by directly comparing
>> the new commit's tree to the base commit's tree:
>>
>> https://github.com/justinsb/jgit/commit/9db165e88d162c7f052f6c58784c16d4cd830b3e
>
> Huh. Interesting approach.
>
> Thing is, the PackWriter should already be doing this if you passed it
> wants=[commit], have=[commit.getParent(0)]. I suspect it's getting long
> counting times because there are other things in the have collection
> from the server and this is costing more time to enumerate.
>
>> There are limitations however, which is why I gated it behind a
>> boolean option.  The biggest is that if the new files are already
>> available on the server on a different branch, we won't reuse them
>> (e.g. cherry-picks).
>>
>> A few questions I would love some feedback on:
>>
>> 1. Is this something that might be considered for inclusion into jgit?
>> 2. Should I instead figure out a way to expose the
>> PackWriter.preparePack(Iterator<RevObject>) method, perhaps by passing
>> a list containing the known set of objects when doing the push? I
>> imagine that would be more general and thus more welcome in jgit
>> (though obviously harder to use!)
>
> If we do any of these things, I'd prefer the
> preparePack(Iterator<RevObject>) option as it offers more flexibility
> to callers to construct a pack the way they want.
>
> But see my comment above, I really think something is wrong here, as
> the algorithm you implemented is what PackWriter should be doing
> itself for the single have/want case.
>
>
>> 3. Am I doing something obviously wrong to cause a slow 'counting
>> objects' phase? (I expect it is just the repo size - it is currently
>> about 250k objects.)
>>
>> Many thanks,
>> Justin
>> _______________________________________________
>> jgit-dev mailing list
>> jgit-dev@xxxxxxxxxxx
>> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
>> https://dev.eclipse.org/mailman/listinfo/jgit-dev