Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
[jgit-dev] Re: WorkingTreeIterator - the writing side

Hi,

On Wed, Aug 25, 2010 at 17:08, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:

> But yes, I was proposing that writeEntryBlob()'s implementation
> on a filesystem would be to output to a new temporary file, then
> atomic rename over the old file to replace it when fully written.

great, then our ideas differ not too much.

> The only problem with creating all of the temporary files up front
> is how do you handle files going into new directories?  Ideally you
> would defer creating them until the "commit" phase, because you don't
> have the parent directory they reside in available any earlier.
> And you can't make them at the top level and move them down,
> some filesystems like AFS and Coda refuse to rename files across
> directories.  And really big projects might do some strange thing
> like having their working directory span mount points.

Got that point regarding not being able to rename across directories. And
the biggest problem I see is when the iterator iterates over
d/
d/x (file)

and while iterating we tell the iterator that the new workingtree
should look like
d/
d/x/
d/x/y (file)

There is no chance that we create the dir d/x directly (because there
is still the file d/x existing).

My idea here: handle dir's like files. Whenever you tell the iterator
to delete the file d/x and create
a new dir d/x it will just remember it has to do the deletion and
create a new dir d/_x.tmp (of
course the name has to be unique, but java i.o.File takes care about
that). When you tell the
iterator later we need a new file d/x/y we create d/_x.tmp/_y.tmp (we
could even skip choosing
a temporary file name for files already in a temporary directory)

...
> During the earlier phase where you are walking all of the trees
> you want to get some sort of "future commit promise" from the
> WorkingTreeIterator that you can add to the "modifiedFile" list.
> Later when you commit the merge, you can loop through that list
> and ask each item to commit its change to the working directory,
> or if you abort, to ensure its state is cleaned up.
>
> Maybe that is what we do:
>
>  class WorkingTreeIterator {
>    WritePromise writeEntryBlob(FileMode mode, ObjectReader reader, ObjectId id);
>  }
>
>  abstract class WritePromise {
>    abstract void commit() throws IOException;
>    abstract void abort() throws IOException;
>  }

First I liked this WritePromise. But when looking at our concrete
use cases (DirCacheCheckout, ResolveMerger) it looks like we always
want to abort or commit all of the modifications at once. So, why confront
the user of WorkingTreeIterator with the WritePromise class? Why isn't
abort() and commit() on WorkingTreeIterator sufficient? If we don't want to
pollute WorkingTreeIterator with concepts not needed by all read-only users
let's have a subclass or even only interface which collects all write-related
stuff. FileTreeIterator would be a WorkingTreeIterator also implementing
the interface.

>> This would even move the
>> task of .tmp-file handling from multiple places in jgit into a central
>> place.
>
> Teach LockFile how to use a random temporary name as an option from
> its hard-coded .lock suffix:
...
> Problem solved for the local filesystem IO case.

>From the name I never thought that LockFile would fit here, but I'll
try that out.
Anyhow, if we agree on WritePromises or commit(),abort() on
WorkingTreeIterator then
most of the burden of .tmp file handling is already removed from
WorkingTreeIterator
users.

>> When I do a content merge I deal with a MergeFormatter which currently
>> expects me to specify a OutputStream (and not a byte[]). Ideally the
>> WorkingTreeIterator also gives me the possibility to retrieve an
>> OutputStream for an object.
>
> Flip MergeFormatter on its head.  Make it be an InputStream instead
> of taking an OutputStream as an argument.
>
> In particular EGit using an IFile can use setContents() to apply an
> InputStream onto a working tree file.  But it can't do that with an
> OutputStream.  So just rewrite the looping code to use two stages of
> iterators, one for the outer loop and another for the inner loop.
> Extend InputStream, and now MergeFormatter is actually just an
> InputStream that you can read the conflicted result from.

that's a good idea. Since the Formatter is not used so much we are free
to change him and a InputStream may fit better here.
One thing I am concerned about: After you have called format() on a
MergeFormatter taking an OutputStream the MergeFormatter resouces
(e.g. the merge result) can be released. In the InputStream variant resources
needed by the formatter can only be freed after the last byte in in the
inputstream is read. When the expensive resources of the formatter are
freed depends on how well the formatter user behaves. E.g. a merge strategy
delaying reading the merge results returned by formatters up to the point in
time we know the merge will succeed whould consume enourmous amount
of memory (but would be very fast in detecting failing merges because no
new conent is really written to the filesystem).

Ciao
  Chris


Back to the top