Bug 418149 - JGit should support merge drivers
Summary: JGit should support merge drivers
Status: NEW
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 3.0.2   Edit
Hardware: PC Mac OS X
: P3 enhancement with 1 vote (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 418150 418151
  Show dependency tree
 
Reported: 2013-09-27 03:30 EDT by Mikaël Barbero CLA
Modified: 2020-10-02 11:28 EDT (History)
10 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mikaël Barbero CLA 2013-09-27 03:30:22 EDT
As stated in this thread http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg02289.html, we are planning to add programmatic support for merge driver within JGit.

It requires some refactoring within ResolveMerger class to extract the text-only merge support and to implement it as a default merge driver.

This bug aims at gathering ideas and patchset around this support.
Comment 1 Laurent Goubet CLA 2013-10-22 06:08:12 EDT
I had planned to communicate a little earlier about the work in progress and some things I had trouble to understand... but ultimately thought it would be easier to talk with a patch proposed.

https://git.eclipse.org/r/#/c/17634 aims at adding the support for custom merge drivers into JGit.

The main thing I do not like with this patch is that the MergeResults are leaked to outter APIs, even though they are specific to textual merges. I've added a number of FIXMEs on the problematic lines, such as in https://git.eclipse.org/r/#/c/17634/2/org.eclipse.jgit/src/org/eclipse/jgit/merge/ResolveMerger.java at line 895.

As far as I can tell, the MergeResults are never used outside of tests except for one single use in org.eclipse.jgit.pgm.Merge.run() where they are only used to tell which files have failed (the actual failing sequences are never used, only the file names). I'd have removed this whole thing altogether if not for that single use and the fact that it is exposed as public API, thus requiring a bump in the version number if removed. What are your thoughts about this? Should it be removed? Deprecated one way or another? This is currently forcing me to call the content merging on all files (even binary) at line 622 of the ResolveMerger.

I plan on adding a few unit tests for the binary merger before this review can be pushed, but since the main code is done I'd like for your feedback on this.
Comment 2 Robin Stocker CLA 2013-11-15 10:58:22 EST
Reviewed the change just now. While reviewing, I started to wonder if this is the right approach for supporting logical model merge.

The thing is, the merge driver will be called and in case of model merge, I assume it will have to show a UI and wait for user input. While this is happening, the merge will be blocked. It will also happen on a file-to-file basis. Is this not problematic for model operations, which may need to operate on more than one file? What if the conflict spans two files which are connected, wouldn't you want to resolve that in one step instead of two?

Would it not be better to handle this in EGit, after JGit has finished merging? We could add an extension point for this that asks whether custom merge tools want to get involved. Note that it is still perfectly possible to get ours/base/theirs version of the files from the Git index at this point. If custom merge tools where also supported e.g. from the staging view, the user could also decide in what order they want to merge instead of this being determined by JGit.
Comment 3 Laurent Goubet CLA 2013-11-18 03:26:16 EST
(In reply to Robin Stocker from comment #2)
> Reviewed the change just now. While reviewing, I started to wonder if this
> is the right approach for supporting logical model merge.

It is not :).

> The thing is, the merge driver will be called and in case of model merge, I
> assume it will have to show a UI and wait for user input. While this is
> happening, the merge will be blocked. It will also happen on a file-to-file
> basis. Is this not problematic for model operations, which may need to
> operate on more than one file? What if the conflict spans two files which
> are connected, wouldn't you want to resolve that in one step instead of two?

Even more than this, since it is not the responsibility of the drivers to determine what should or should not be merged but that of the strategy. The merge drivers are only called when a textual conflict is detected on a given file. On the contrary, model merges could (and should) happen on files that are considered by Git to be "trivial merges" (i.e. on which it will never call the merge drivers in the first place).

> Would it not be better to handle this in EGit, after JGit has finished
> merging? We could add an extension point for this that asks whether custom
> merge tools want to get involved. Note that it is still perfectly possible
> to get ours/base/theirs version of the files from the Git index at this
> point. If custom merge tools where also supported e.g. from the staging
> view, the user could also decide in what order they want to merge instead of
> this being determined by JGit.

Actually, model merging should be handled by EGit *before* JGit gets its way with the files. A file that has been added in theirs (not present in ours or base) would be checked out directly by JGit, even though that file could be part of a larger model that is in conflict, a file that has no textual conflict could have a model conflict (and reversely)... However, when a file is *not* a part of a larger model, we do wish to delegate to the default mergers.

The subject of this bug is to add support for merge drivers into JGit. Even though I noticed too late that this would not really help us with model support, it is still an important part of the Git logic (being able not to corrupt binary files when merging remains a good thing :p).

As far as "model" merging is concerned, I planned to implement a custom implementation of the ResolveMerger that would be aware of models (thus, implemented in EGit and used from EGit, unknown from JGit). This does force me to iterate twice over the files "to be merged" though : a first pass to detect models, a second to actually merge by delegating to the proper mergers : model merging for the models, default merge drivers for the remainder. That should be discussed on another bug though. I'll try and open one asap with the first draft of the logic I think of implementing (no code yet, but at least I can describe what I think I need for comments).

I'll look at the review for this bug now :).
Comment 4 Louis Burton CLA 2015-05-14 06:47:01 EDT
Thanks for all the hard work on this bug Laurent.

I notice that after a lot of hard work it seems the code review was abandoned in February of this year, and that progress stalled due to priorities elsewhere. I see Christian fed back a few comments, but mostly minor things such as parameter naming or stream initialising.

Obviously you know the problems and complications, but I still hope to see the ability for merge drivers one day to help enable use of drivers such as :
https://github.com/ralfth/pom-merge-driver

Which will in turn help the open source jgitflow project which uses jgit.
Comment 5 Shahim Essaid CLA 2015-12-11 12:49:49 EST
I'd like to second the above comment. I have a custom merge and diff driver for OWL files and it would be very helpful if JGit is able to use these drivers, as defined in the .gitattributes. Specifically, I would like for my Gerrit Code Review server, which uses JGit, to be able to show custom diffs, apply the custom merges, etc. but this requires that JGit supports this Git extension feature.

My custom diff and merge drivers work from the command-line and are Java based (i.e. no UI, and can be added to the classpath), and show console output that is very similar to the default diff/merge output. The only thing that is needed is a call to the commands defined in the .git/config for the drivers specified in .gitattributes. The commands will be on the $PATH. The command-line Git implementation defaults to the builtin drivers if a command for the custom driver is not found in .git/config and JGit could probably do the same instead of throwing exceptions. This way other JGit based application won't be affected if they come accross a repository that has a custom driver in .gitattributes but the driver is not installed on the local system, and not configured in .git/config. Also, it would be really nice if there is a mechanism in JGit to lookup these custom drivers in the current JVM classpath, provide a console IO callback, etc. before defaulting to an external process.

I haven't looked at the previous work yet but I'm willing to help where I can. Can someone please add a comment to describe the feasibility of doing this, what needs to be done, etc.?
Comment 6 Shahim Essaid CLA 2015-12-11 13:07:54 EST
I asked a related question on the Gerrit list here: https://groups.google.com/forum/#!topic/repo-discuss/GO89Va2LEY0
Comment 7 Christian Halstrick CLA 2015-12-14 04:38:18 EST
I can summarize what JGit learned in the last month regarding this feature.

One aspect of this is that we need to support .gitattribute files at all. JGit was not looking at these files some time ago but current versions of JGit at least recognize them and interpret a (very) limited set of possible attributes contained in them. What we currently support for "filters" [1]. You can define clean and smudge filters which JGit calls before adding/checkout. They have been introduced to support "git lfs" feature [2].I think Ivan is working on teaching JGit macros [3] and especially correct line ending handling. See his changes [4].

Another aspect is that we need to support merge drivers and the definition in .gitattributes which drivers to use for what kind of files. Both is not supported by JGit up to know. Currently JGit knows only about MergeStrategies. These are different implementations of an interface. Build-in we have OURS,THEIRS,SIMPLE_TWO_WAY_IN_CORE,RESOLVE,RECURSIVE. You can implement and register your own MergeStrategy and also set in in MergeCommand but that's not what you need. Since we now already support filters there is already a lot of code in JGit which can be reused for real merge drivers. There was even an attempt from Laurent to introduce merge drivers in JGit [6] but it came out that low-level merge drivers was not the right tools for implementing what they needed.

A third aspect is Gerrit. I can imagine that convincing a central gerrit server to execute your own merge/diff-drivers is a security-nightmare. I would ask this question on the gerrit ML [5] 

[1] http://git-scm.com/docs/gitattributes#__code_filter_code
[2] https://git-lfs.github.com/
[3] http://git-scm.com/docs/gitattributes#_using_macro_attributes
[4] https://git.eclipse.org/r/#/q/owner:ivan.motsch%2540bsiag.com+status:open
[5] https://groups.google.com/forum/#!forum/repo-discuss
[6] https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg02819.html