Re: [jgit-dev] RevSort#COMMIT_TIME_DESC confusion

>> What I'm looking for is a quick and reliable check whether a certain
>> TARGET commit is reachable from another SRC commit. Currently I'm doing
>> a RevWalk with RevSort#COMMIT_TIME_DESC starting at SRC and stopping
>> once I either encounter TARGET or another commit X with commit-time(X) <
>> commit-time(TARGET).
> 
> Don't do this. Instead use RevWalk.isMergedInto(TARGET, SRC). The
> algorithm doesn't terminate until it finds the merge base between the
> two branches, which means it works even when there is clock skew.

That's definitely a better solution, thanks for pointing it out. I'll
probably need some more general methods:

RevWalk.isAnyMergedTo(List<> BASES, TIP)

and

RevWalk.isMergedToAny(BASE, List<> TIPS)

MergeBaseGenerator looks rather specialized to me, and its Javadoc states:
"The maximum number of starting commits is bounded by the number of free
flags available in the RevWalk when the generator is initialized." So
I'd probably implement this as two utility methods, each creating its own
RevWalk. Should I keep them local to our project or submit them as a
patch? In the latter case, any suggestions where to place these methods?
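A minimal sketch of the two proposed helpers, written against a toy in-memory commit graph (a map from commit id to parent ids) instead of JGit so that it runs standalone. The class and method names are hypothetical and only follow the proposal above. In JGit, each helper could instead loop over RevWalk.isMergedInto(base, tip), which costs one walk per pair; the sketch does a single walk per call and never reads commit times, so clock skew cannot change the answer.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers over a toy commit graph: commit id -> parent ids.
// Not part of JGit; names follow the proposal in the mail.
final class MergedIntoHelpers {
    private MergedIntoHelpers() {}

    // Is any commit in `bases` reachable from `tip`? One walk starting at tip.
    static boolean isAnyMergedInto(Map<String, List<String>> parents,
                                   Collection<String> bases, String tip) {
        return walkReaches(parents, List.of(tip), new HashSet<>(bases));
    }

    // Is `base` reachable from any commit in `tips`? One walk starting at all tips.
    static boolean isMergedIntoAny(Map<String, List<String>> parents,
                                   String base, Collection<String> tips) {
        return walkReaches(parents, tips, Set.of(base));
    }

    // Follows parent links from every start commit (a commit counts as
    // reachable from itself) and stops at the first target found.
    // Commit times are never consulted.
    private static boolean walkReaches(Map<String, List<String>> parents,
                                       Collection<String> starts, Set<String> targets) {
        Deque<String> pending = new ArrayDeque<>(starts);
        Set<String> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            String id = pending.pop();
            if (!seen.add(id)) {
                continue; // already visited via another path
            }
            if (targets.contains(id)) {
                return true;
            }
            for (String parent : parents.getOrDefault(id, List.of())) {
                pending.push(parent);
            }
        }
        return false;
    }
}
```

For example, with the history C -> B -> A and E -> A (child -> parent), isAnyMergedInto(g, List.of("D", "B"), "C") is true, while isMergedIntoAny(g, "C", List.of("A", "E")) is false.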

--
Best regards,
Marc Strapetz
=============
syntevo GmbH
http://www.syntevo.com
http://blog.syntevo.com




On 12.08.2011 19:05, Shawn Pearce wrote:
> On Fri, Aug 12, 2011 at 01:55, Marc Strapetz <marc.strapetz@xxxxxxxxxxx> wrote:
>> I was relying on RevSort#COMMIT_TIME_DESC to report RevCommits always in
>> descending order, however this is not the case for certain repositories,
>> i.e. the commit time of a parent may be more recent than the entry's
>> commit time itself (does anyone know why that can happen?). IMHO this
>> behavior should be documented in the javadocs.
> 
> The sorting is not absolute. Giving you a fully sorted result means
> you would have to wait up to a full minute on the linux-2.6
> repository before the first commit can be returned, as the entire
> project history must be loaded into memory and sorted. This is simply
> too expensive to perform most of the time. So COMMIT_TIME_DESC
> approximates by running everything through a priority queue sorted by
> commit time, descending. But when there is clock skew across commits,
> yes, they can arrive out of order in the result.
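The toy model below (not JGit's actual queue implementation) mimics the approximation just described: commits are popped newest-first from a priority queue, and a commit's parents only enter the queue after the commit itself has been emitted. The graph and timestamps are hypothetical; commit C was made on a machine with a fast clock, so it is newer than its own child B.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Toy model of an approximate newest-first walk (not JGit's implementation).
final class ApproxTimeOrderWalk {
    static final class Commit {
        final String id;
        final long time;            // committer time, seconds since the epoch (UTC)
        final List<String> parents;

        Commit(String id, long time, List<String> parents) {
            this.id = id;
            this.time = time;
            this.parents = parents;
        }
    }

    // Emits commits in the order a priority queue hands them out.
    static List<String> walk(Map<String, Commit> graph, String start) {
        PriorityQueue<Commit> queue =
                new PriorityQueue<>((a, b) -> Long.compare(b.time, a.time)); // newest first
        Set<String> queued = new HashSet<>();
        List<String> emitted = new ArrayList<>();
        queue.add(graph.get(start));
        queued.add(start);
        while (!queue.isEmpty()) {
            Commit c = queue.poll();
            emitted.add(c.id); // returned immediately, before older history is loaded
            for (String p : c.parents) {
                if (queued.add(p)) {
                    queue.add(graph.get(p));
                }
            }
        }
        return emitted;
    }

    // Hypothetical history: M merges A and B; B's parent C has a skewed clock.
    static Map<String, Commit> skewedExample() {
        return Map.of(
                "M", new Commit("M", 100, List.of("A", "B")),
                "A", new Commit("A", 80, List.of()),
                "B", new Commit("B", 60, List.of("C")),
                "C", new Commit("C", 90, List.of()));
    }
}
```

walk(skewedExample(), "M") yields M, A, B, C, whose times 100, 80, 60, 90 are not in descending order: C could not enter the queue until B had already been emitted.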
> 
> If you really want all commits sorted by time, you need to run the
> RevWalk until you have buffered all results in your own data
> structure, then sort that. It's the only way to eliminate the clock
> skew. But it's so expensive to perform that nobody does this.
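Buffering and sorting could be sketched as follows. The Entry type is a hypothetical stand-in for what a RevWalk would hand back (an id plus the committer time); the point is that nothing can be returned until the entire walk has been drained. In the sample data, an approximate walk produced C (time 90) after B (time 60); after sorting, C moves ahead of A.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "buffer everything, then sort by commit time".
final class BufferThenSort {
    // Hypothetical stand-in for a walked commit: id plus committer time (UTC seconds).
    static final class Entry {
        final String id;
        final long time;

        Entry(String id, long time) {
            this.id = id;
            this.time = time;
        }
    }

    static List<String> newestFirst(Iterable<Entry> walkOutput) {
        List<Entry> all = new ArrayList<>();
        for (Entry e : walkOutput) {
            all.add(e); // the whole history must be loaded before anything is returned
        }
        // List.sort is stable, so equal times keep the walk's order.
        all.sort((a, b) -> Long.compare(b.time, a.time));
        List<String> ids = new ArrayList<>(all.size());
        for (Entry e : all) {
            ids.add(e.id);
        }
        return ids;
    }
}
```

Given the approximate order M (100), A (80), B (60), C (90), newestFirst returns M, C, A, B.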
> 
> On Fri, Aug 12, 2011 at 05:04, Marc Strapetz <marc.strapetz@xxxxxxxxxxx> wrote:
>>> Could it be a cross-timezone issue, if local time is used instead of UTC?
>>
>> Looks like you are right, though I don't understand the reason for this
>> effect. These are the offending commits (taken from the IDEA community
>> repository):
>>
>> commit d4f3d4c655295e2b1cf1d90374f8b8e18fdc3dac
>> tree 02d25f4ce8b2b45ef5e8af6fadbd1d328cb16f22
>> parent 676abb30545bf63409ab061b2fdcd021736896be
>> author Sergey Evdokimov <sergey.evdokimov@xxxxxxxxxxxxx> 1300978514 +0300
>> committer Sergey Evdokimov <sergey.evdokimov@xxxxxxxxxxxxx> 1301056269 +0300
>>
>>    Add method 'toString()' to IntArrayList.
>>
>> commit 676abb30545bf63409ab061b2fdcd021736896be
>> tree d93631c534c20a088cb2e4fb5a5c6c2dfea108ac
>> parent bea282d766d21e636752d0f50d603f23e4f868f3
>> author peter <peter@xxxxxxxxxxxxx> 1301056144 +0100
>> committer peter <peter@xxxxxxxxxxxxx> 1301056346 +0100
>>
>>    once the first calculation is finished, don't move the lookup
>>
>> 1301056269 is slightly before 1301056346, whereas the order according
>> to local time would be correct. So does RevSort#COMMIT_TIME_DESC
>> guarantee the correct order based on local time?
> 
> This is *not* a timezone problem. The times are stored in UTC. The
> timezones next to them are advisory, so you can format the local time
> of the committer and know if it was 3 AM in their timezone, or 3 PM
> when they wrote that commit. Since the parent has a newer commit
> timestamp, this is clock skew. The systems that created these commits
> have very different settings on their system clocks. It's a common
> problem in distributed systems.
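To make the distinction concrete, this sketch converts the two committer timestamps quoted above with java.time. The stored seconds are UTC; the +0300 and +0100 suffixes only change how the wall-clock time is displayed. The class and method names are illustrative, not part of JGit.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Illustrative helpers for reading a Git committer timestamp: seconds since
// the epoch (UTC) plus an advisory offset such as "+0300".
final class CommitTimestamps {
    private CommitTimestamps() {}

    // The instant itself; this is what any ordering must compare.
    static Instant instant(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds);
    }

    // The committer's local wall-clock time, for display only.
    static LocalDateTime wallClock(long epochSeconds, String offset) {
        return LocalDateTime.ofInstant(instant(epochSeconds), ZoneOffset.of(offset));
    }

    // True if the parent claims to be newer than its child, i.e. clock skew.
    static boolean isSkewed(long childSeconds, long parentSeconds) {
        return parentSeconds > childSeconds;
    }
}
```

For the quoted commits, instant(1301056269) is 2011-03-25T12:31:09Z and instant(1301056346) is 2011-03-25T12:32:26Z, so the child d4f3d4c was committed 77 seconds before its parent 676abb3 and isSkewed(1301056269, 1301056346) is true. The wall-clock values (15:31:09 at +0300 versus 13:32:26 at +0100) only look ordered because the offsets differ.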
> 
> There have been a number of discussions on the Git mailing list about
> clock skew in commits. Clock skew happens. Git tries to deal with some
> clock skew by having a slop bucket as it traverses the history, but
> sometimes the clocks are just too far off and some optimizations do
> break.
> 
>> What I'm looking for is a quick and reliable check whether a certain
>> TARGET commit is reachable from another SRC commit. Currently I'm doing
>> a RevWalk with RevSort#COMMIT_TIME_DESC starting at SRC and stopping
>> once I either encounter TARGET or another commit X with commit-time(X) <
>> commit-time(TARGET).
> 
> Don't do this. Instead use RevWalk.isMergedInto(TARGET, SRC). The
> algorithm doesn't terminate until it finds the merge base between the
> two branches, which means it works even when there is clock skew.
> 
>> Now, according to the example above, that doesn't work
>> correctly. If it's about timezones, I could run until commit-time(X) <
>> commit-time(TARGET) - 24 hours. However, I'm wondering whether the order
>> of commit times is reliable at all. Can they arbitrarily jump back and
>> forth, or are there some restrictions on the order of timestamps in Git
>> repositories?
> 
> They aren't reliable. They can be any value. And no current
> implementation of Git enforces a rule like "commit time of descendant
> must be >= commit time of parent". We discussed doing this on the Git
> mailing list a week or two ago, but it hasn't been coded yet for any
> implementation.
> 
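Applying the heuristic from the original question to the two quoted commits shows how it fails. The sketch below is a toy model with hypothetical class and method names, and history truncated after 676abb3: a newest-first walk from SRC that gives up at the first commit older than TARGET, compared with a plain parent-link traversal. With SRC = d4f3d4c and TARGET = its own parent 676abb3, the heuristic stops immediately because SRC's committer time is older than TARGET's.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Toy comparison of the timestamp-cutoff heuristic with plain reachability.
final class CutoffVsReachability {
    static final class Commit {
        final String id;
        final long time;            // committer time, UTC seconds
        final List<String> parents;

        Commit(String id, long time, List<String> parents) {
            this.id = id;
            this.time = time;
            this.parents = parents;
        }
    }

    // The two commits from the thread; older history is left out.
    static Map<String, Commit> threadExample() {
        return Map.of(
                "d4f3d4c", new Commit("d4f3d4c", 1301056269L, List.of("676abb3")),
                "676abb3", new Commit("676abb3", 1301056346L, List.of()));
    }

    // Heuristic from the question: newest-first walk from src that stops at the
    // first commit older than target. Unsafe under clock skew.
    static boolean reachableByCutoff(Map<String, Commit> g, String src, String target) {
        long targetTime = g.get(target).time;
        PriorityQueue<Commit> queue =
                new PriorityQueue<>((a, b) -> Long.compare(b.time, a.time));
        Set<String> queued = new HashSet<>();
        queue.add(g.get(src));
        queued.add(src);
        while (!queue.isEmpty()) {
            Commit c = queue.poll();
            if (c.id.equals(target)) {
                return true;
            }
            if (c.time < targetTime) {
                return false; // assumes nothing older can lead to target
            }
            for (String p : c.parents) {
                Commit parent = g.get(p);
                if (parent != null && queued.add(p)) {
                    queue.add(parent);
                }
            }
        }
        return false;
    }

    // Plain graph reachability: follows parent links, never reads timestamps.
    static boolean reachable(Map<String, Commit> g, String src, String target) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.push(src);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            if (!seen.add(id)) {
                continue;
            }
            if (id.equals(target)) {
                return true;
            }
            Commit c = g.get(id);
            if (c != null) {
                for (String p : c.parents) {
                    pending.push(p);
                }
            }
        }
        return false;
    }
}
```

reachableByCutoff(threadExample(), "d4f3d4c", "676abb3") returns false while reachable(...) returns true. Widening the cutoff by a fixed margin would only hide this particular case; since commit times can be any value, no fixed margin is safe.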

