Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] Implementing shallow clones

Matt Fischer <mattfischer84@xxxxxxxxx> wrote:
> 
> The biggest problem I ran into is that I don't think wrapping the
> RevCommits using the createCommit() function of RevWalk is going to
> work.

Sure it will.  You mentioned that C Git does this by adding the
depth into the "->util" field of the struct commit.  The way we do
the same thing in JGit is by overriding createCommit() on RevWalk
and returning a subclass of RevCommit which has the additional data
values we need.  This gives us more flexibility as we can allocate
any amount of storage we need inline with the object itself.

> The reason is that you seed the RevWalk with start commits
> which you get from elsewhere, so those commits haven't yet been
> wrapped.

Somewhat true.

Our API permits the caller to pass us anything to markStart()
and markUninteresting().  But the calling convention requires
that the RevCommit passed *MUST* come from the same RevWalk.
Well behaved callers already do that.  If they are crossing
RevWalk pools they pass a RevCommit from the one pool into
the other pool's parseCommit() method, which will ensure the
proper createCommit() implementation is used.

If its really an issue to you, override markStart() to type
check the input argument:

  class ShallowWalk extends RevWalk {
    @Override
    public void markStart(RevCommit c) {
      if (!(c instanceof ShallowCommit))
        throw new IllegalArgumentException("Only ShallowCommit supported");
      super.markStart(c);
    }

    @Override
    protected ShallowCommit createCommit(AnyObjectId id) {
      return new ShallowCommit(id);
    }
  }

  class ShallowCommit extends RevCommit {
    int depth = -1;

    ShallowCommit(AnyObjectId id) {
      super(id);
    }

    @Override
    public void reset() {
      super.reset();
      depth = -1;
    }
  }

> You could assume that all such commits would have depth 0,
> and treat them as such if they aren't wrapped, but these commits may
> have been parsed already, meaning their parents have been constructed,
> and those haven't been wrapped either.

You can't assume depth of 0 just because it was passed to
markStart().  We only want a depth of 0 on things that were directly
reachable from a ref.  So the caller probably needs to denote for
us which commits were at depth 0.

> RevCommit, and having its setter function automatically propagate the
> depth down to any parents which have already been parsed.  This is
> hacky and ugly, but at least it illustrates the pattern that will have
> to exist in a correct implementation.

True.  This is similar to the carryFlags code which dives down into
the parents when they are present.  Note that we take an optimization
for a string-of-pearls here by avoiding recursion when there is
only one parent.  This helps to keep our stack from exploding on
really large histories... but we still run a risk of stack overflow.

Its cleaner to just set the depth as the commits come out of the topo
sort generator.  We don't have to worry about stack depth, because
we just take this commit's depth and set it into every parent.
By defaulting the depth to -1 we'll promote to the smallest depth
if it hasn't been set yet:

  for (RevCommit p : c.parents)
    p.depth = Math.min(c.depth, p.depth);

That's a pretty simple pass to insert in front.  You may not
even need to mess around with the StartGenerator.  Though a
RevSort.REVERSE in front of this pass would totally confuse the
results since parents would emit before their children.  So we
might still need to inject this pass into StartGenerator anyway.
 
> As far as getting the pack built correctly, I went ahead with my
> previous proposal to add a SHALLOW flag to RevWalk,

I'm pretty against adding new reserved flags.  Those bits are
really precious.  We only have 32 of them, and right now RevWalk
reserves 6.  That leaves applications only 26 bits for their own use.
There are some algorithms like `git branch --contains` that benefit
from having a lot of bits available to them.

I am however OK with a ShallowWalk that extends RevWalk with just
doing its own flag reservation in its constructor:

  final RevFlag SHALLOW = newFlag("SHALLOW");

The generator would just need to pull this flag off the ShallowWalk
instance to find out what bit it was assigned.  That way we only
lose the bit in a shallow walker, which is a lot less likely to
need a lot of application specific flag bits.

-- 
Shawn.


Back to the top