Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] NPE in getAuthorIdent() on parsed RevCommit object

Dariusz Luksza <dariusz.luksza@xxxxxxxxx> wrote:
> 
> In EGit synchronization we have strange problem with
> RevCommit.getAuthorIdent(). It occurs *only* on oldest commit that is
> include in Synchronization view.
> 
> First thing that I've discover was, that
> remoteCommit.getParent(0).getAuthorIdent() in some cases was throw NPE
> even that remoteCommit.getParent(0).getId() returns proper value.
> Therefore I've use RevWalk.parseCommit(remoteCommit.getParrent(0)) to
> re-parse commit object. After re-parsing getAuthorIdent() returns
> proper value (there were no more NPE). Reference to remote commit
> parent is stored in private final field so that it can be obtained in
> various place of synchronization.

Not entirely strange to NPE.  RevWalk can (and does) discard the
byte[] buffer that holds the commit data that is used to obtain
this information.  If that's discarded, *poof*, no data, and we
NPE on access.
 
> But re-parsing commit object didn't solve this problem. NPE was still
> thrown in same place! For some reasons reference to author ident is
> lost before we achieve this trouble spot. Currently I come up with
> workaround for this issue by re-parsing commit exactly in trouble spot
> but this isn't efficient.

Yea, you really need to make sure its parsed right before you do
the access.  Traversals can cause a discard, especially if you have
the default of setRetainBody(false), or if the commit in question
ever got tagged with the UNINTERSTING flag (e.g. its reachable from
something that was passed to markUninteresting).

When you can't be certain the body is present, the correct pattern
becomes:

  revWalk.parseBody(revCommit);
  revCommit.getAuthorIdent()


We may need to stick a warning label on RevWalk.  Its designed
to be *fast*, not user-friendly to the application developer.
When there is a choice between ease-of-use and raw-throughput,
RevWalk favors raw-throughput.

Consequently we don't check to see if the object has the data it
needs on hand to answer getAuthorIdent().  Nor do we try to load it
when its missing.  We assume its there, because there are valid use
cases where you can prove its going to be present and bypassing the
checks saves time when dealing with 60k commits.  If you can't prove
it, you should make the extra call to verify the data is loaded.
That is the parseBody method.

If the body is already load, parseBody() basically only cost a few
machine instructions (load the flag field, do a boolean and with
the constant '1', check for != 0, return).  So its fairly cheap,
but if you can prove its not necessary, its good to avoid it.  :-)
 
-- 
Shawn.


Back to the top