[jgit-dev] Re: WorkingTreeIterator

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

[jgit-dev] Re: WorkingTreeIterator - the writing side

From: "Shawn O. Pearce" <spearce@xxxxxxxxxxx>
Date: Wed, 25 Aug 2010 08:08:02 -0700
Delivered-to: jgit-dev@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/jgit-dev>
List-help: <mailto:jgit-dev-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/jgit-dev>, <mailto:jgit-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/listinfo/jgit-dev>, <mailto:jgit-dev-request@eclipse.org?subject=unsubscribe>
User-agent: Mutt/1.5.17+20080114 (2008-01-14)

Christian Halstrick <christian.halstrick@xxxxxxxxx> wrote:
> On Wed, Aug 25, 2010 at 04:22, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
> ...
> > This isn't fully generic.  One of the things I don't like
> > is writeEntryBlob is defined in terms of an ObjectLoader.
> > That is intentional because the implementation can invoke
> > ObjectLoader.copyTo(OutputStream) and possibly avoid some unnecessary
> > copying during output.  However its awkward to use for your merge
> > code for example, its a tad difficult to write out a conflicted
> > file contents.
> 
> Not sure how big the problem is. Whenever merge produces new
> conflict-free content there is theoretically no need to directly write
> it to the working tree. The new content has to be dumped to the
> object-database which will produce the content-sha1 and that sha1 goes
> into the index.

Yes, that is true.  The writeEntryBlob API I proposed is easy for
this case, the content is already stored as a blob and thus readily
available through an ObjectLoader.

> The problem is mainly on files with conflicts because those content
> does not go into the object database and therefore I have to write it
> in the moment the conflict is detected.

Right, that was what I was trying to say.

> One idea for a solution: The use-case that while iterating over the
> working tree I want to modify it and see the effects of this
> modification during the walk is not needed. Why don't we teach the
> working tree iterator to write new content always into .tmp files
> beside the original files.

You can't assume "Foo.tmp" for file "Foo" because the project might
actually also have a "Foo.tmp" file in the same directory.  :-)

But yes, I was proposing that writeEntryBlob()'s implementation
on a filesystem would be to output to a new temporary file, then
atomic rename over the old file to replace it when fully written.

The only problem with creating all of the temporary files up front
is how do you handle files going into new directories?  Ideally you
would defer creating them until the "commit" phase, because you don't
have the parent directory they reside in available any earlier.
And you can't make them at the top level and move them down,
some filesystems like AFS and Coda refuse to rename files across
directories.  And really big projects might do some strange thing
like having their working directory span mount points.

I guess if the parent directory already exists we could build up
all of the temporary files and then later commit them, but if the
directory is new defer creation until the end.  But I just don't
see any advantage to preparing the temporary files earlier.

> For all modifications without new content
> (deletions, tree creations, etc) we store the metadata in lists. In
> the end we have something like a "commit()" which renames the .tmp
> files, deletes files and creates folders.

Oh, I think I see what you want.

During the earlier phase where you are walking all of the trees
you want to get some sort of "future commit promise" from the
WorkingTreeIterator that you can add to the "modifiedFile" list.
Later when you commit the merge, you can loop through that list
and ask each item to commit its change to the working directory,
or if you abort, to ensure its state is cleaned up.

Maybe that is what we do:

  class WorkingTreeIterator {
    WritePromise writeEntryBlob(FileMode mode, ObjectReader reader, ObjectId id);
  }

  abstract class WritePromise {
    abstract void commit() throws IOException;
    abstract void abort() throws IOException;
  }

Then instead of a list of paths, you have a list of WritePromise
instances that you just need to loop through.  How the underlying
working tree manages these is up to itself.  It could create a
temporary file immediately and have commit() be rename and abort()
delete the temporary file.  Or it could just remember the arguments
and perform the checkout during commit().

> This would even move the
> task of .tmp-file handling from multiple places in jgit into a central
> place.

Teach LockFile how to use a random temporary name as an option from
its hard-coded .lock suffix:

  public LockFile(File f, FS fs) {
    this(f, new File(f.getParentFile(), f.getName() + SUFFIX), fs);
  }

  public LockFile(File dst, File tmp, FS fs) {
    this.ref = dst;
    this.lck = tmp;
    this.fs = fs;
  }

  public boolean lock() {
    boolean ok;
      if (lck == null) {
      lck = File.createTempFile("._tmp", "", ref.getParentFile());
      ok = true;
    } else {
      lck.getParentfile().mkdirs();
      ok = lck.createNewFile();
    }
    if (ok) {
      ... rest of current lock() method
    }
  }

Problem solved for the local filesystem IO case.

> > If you have the contents as a byte[], you can hand it off to
> > ObjectLoader.SmallObject and let that be the argument.  Or,
> > implement ObjectLoader yourself and use ObjectStream.Filter and
> > pass in a normal InputStream.
> 
> When I do a content merge I deal with a MergeFormatter which currently
> expects me to specify a OutputStream (and not a byte[]). Ideally the
> WorkingTreeIterator also gives me the possibility to retrieve an
> OutputStream for an object.

Flip MergeFormatter on its head.  Make it be an InputStream instead
of taking an OutputStream as an argument.

In particular EGit using an IFile can use setContents() to apply an
InputStream onto a working tree file.  But it can't do that with an
OutputStream.  So just rewrite the looping code to use two stages of
iterators, one for the outer loop and another for the inner loop.
Extend InputStream, and now MergeFormatter is actually just an
InputStream that you can read the conflicted result from.

:-)

-- 
Shawn.

Follow-Ups:
- [jgit-dev] Re: WorkingTreeIterator - the writing side
  - From: Christian Halstrick

References:
- [jgit-dev] WorkingTreeIterator - the writing side
  - From: Christian Halstrick
- [jgit-dev] Re: WorkingTreeIterator - the writing side
  - From: Shawn O. Pearce
- [jgit-dev] Re: WorkingTreeIterator - the writing side
  - From: Christian Halstrick

Prev by Date: [jgit-dev] Re: WorkingTreeIterator - the writing side
Next by Date: [jgit-dev] Re: [egit-dev] Junit versions
Previous by thread: [jgit-dev] Re: WorkingTreeIterator - the writing side
Next by thread: [jgit-dev] Re: WorkingTreeIterator - the writing side
Index(es):
- Date
- Thread

Breadcrumbs