Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] "instable" diff algorithms lead to wrong merge results

5 nov 2010 kl. 00:26 skrev Christian Halstrick:

> Hi,
> 
> I am having hard times to make the merge results of JGit and C-Git
> similar. During long investigations I detected very interesting
> problems caused by the underlying diff algorithms and I am asking
> myself whether C Git developers are aware of this problem. I think
> that certain diff algorithms can lead to really wrong merge results
> (wrong in the eyes of a human reader of the merge result). Here is the
> problem:
> 
> Imagine you have a diff algorithm which is how I call it "not stable".
> In an "not stable" diff algorithm you can never be sure which EDITS a
> diff algorithm will produce for a isolated (surrounded by large blocks
> of common lines) difference. Example (each line of content has as
> prefix the line-number and a colon):
> 
> Lines 100-104 of Content BASE :
> ====
> ...
> 100: a() {
> 101: }
> 102:
> 103: b() {
> 104: }
> ====
> 
> Lines 100-107 of Content OURS :
> ====
> ...
> 100: a() {
> 101: }
> 102:
> 103: x() {
> 104: }
> 105:
> 106: b() {
> 107: }
> ====
> 
> Let's look at  two alternative diffs which can be reported by an diff
> algorithm when comparing BASE and OURS:
> a) OURS inserts its lines 101-103 before line 101 of BASE ( thats and
> Insert Edit(101,101,101,103) )
> b) OURS inserts its lines 103-105 before line 103 of BASE ( thats and
> Insert Edit(103,103,103,105) )
> 
> My fear is that at least for MyersDiff whether a) or b) is reported is
> determine on how lines 0-99 look like. If this is true the following
> can happen: Imagine there is also a content THEIRS which differs from
> OURS in lines 0-99 but which is exactly similar to OURS regarding
> lines 100-107. The diff between BASE and OURS returns a) and the one
> between BASE and THEIRS return b). During a merge operation where we
> merge together OURS and THEIRS with common base BASE. Since OURS and
> THEIRS do the same modification to BASE in lines 100-107 Semanticly I
> expect that every good merge algorithm will not report a conflict and
> simply take the new content regarding lines 100-107. But because of
> the "not stable" diff the EDITS a) and b) are detected as
> non-overlapping diffs which can be applied independently. We end up
> with content were x() is added twice!!!
> 
> Lines 100-110 of the Merge Result :
> ====
> ...
> 100: a() {
> 101: }
> 102:
> 103: x() {
> 104: }
> 105:
> 106: x() {
> 107: }
> 108:
> 109: b() {
> 110: }
> ====
> 
> Am I missing something?

I think you are. You cannot resolve the merge without conflict. In the case b there is a conflict, 
but since both sides does the exact same edit, the conflict can be easily and reliably be resolved automatically anyway. In a) there is 
an overlap between the inserted regions from ours and theirs at line 103 [the one with x()]. This assumes the merges
has identified the lines 103-105 as the lines to grab from THEIRS.

The main problem with your explanation, I think, is that you cannot reliably edit just "after" or "before", but rather "between".

Regardless of that, identifying and resolving conflicts gets much harder if similar merge problems yield different conflicts. It
simply gets harder to learn from patterns, so I think repeatability is important.


-- robin



Back to the top