Mergiraf: a syntax-aware merge driver for Git
A central feature of Git is the ability to merge the contents of diverging revisions. It underpins not just the git merge command, but also rebase, cherry-pick and revert, for instance. Without it, no collaboration would be possible. And it generally works great.
One limitation is that it’s line-based. If the two sides touch neighbouring lines, we need to manually resolve a merge conflict. That happens even if the changes touch independent syntactic elements, because Git’s merging heuristic doesn’t know about syntax. Which raises the question: what if it did?
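To make the limitation concrete, here is a minimal Python sketch of the "neighbouring lines" rule. It is a simplified model and not Git's actual algorithm: the helper names (changed_ranges, touches, would_conflict) are made up for illustration, and it relies on difflib from the standard library to find which lines of the common ancestor each side modifies.

```python
import difflib


def changed_ranges(base, other):
    """Return the (start, end) ranges of base lines that `other` modifies."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def touches(r1, r2):
    """True when two ranges overlap or are adjacent (no unchanged line between)."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]


def would_conflict(base, ours, theirs):
    """Simplified model: conflict when both sides change neighbouring lines."""
    return any(
        touches(a, b)
        for a in changed_ranges(base, ours)
        for b in changed_ranges(base, theirs)
    )


base = ["int a = 1;", "int b = 2;", "int c = 3;"]
ours = ["int a = 10;", "int b = 2;", "int c = 3;"]            # edits line 1
theirs_adjacent = ["int a = 1;", "int b = 20;", "int c = 3;"]  # edits line 2
theirs_far = ["int a = 1;", "int b = 2;", "int c = 30;"]       # edits line 3

print(would_conflict(base, ours, theirs_adjacent))  # True in this model
print(would_conflict(base, ours, theirs_far))       # False in this model
```

In the first case the two sides edit unrelated declarations, yet a purely line-based view reports a conflict because no unchanged line separates the edits.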
There has been research on this topic and various prototypes of syntax-aware merging algorithms exist out there. Git offers an extension point to provide a custom “merge driver”, an executable taking the two diverging versions of a file together with their common ancestor and producing the merged file. Curiously, this extensibility does not seem to be used that much. I couldn’t find any open source merge driver that would do syntax-aware merging for common programming languages, and be reliable enough for daily use. I tried improving Spork, the most promising one I could find, but I quickly ran into issues that couldn’t be fixed easily in the existing code base.
So I made one: it’s called Mergiraf. It was fun to make and I hope it can be useful to others.
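For readers unfamiliar with the merge driver extension point mentioned above, the following Python sketch shows the general shape of such an executable. It is not how Mergiraf works: the driver name "trivial", the script name trivial_driver.py and the function trivial_three_way are hypothetical, and the merge logic only handles the cases where at most one side changed the file. What it does rely on is Git's documented contract: the driver receives the paths of the ancestor (%O), our version (%A) and their version (%B), writes the result over %A, and exits with a non-zero status if conflicts remain.

```python
import sys
from pathlib import Path

# Hypothetical registration, using Git's custom merge driver mechanism:
#
#   .git/config
#     [merge "trivial"]
#         driver = python3 trivial_driver.py %O %A %B
#
#   .gitattributes
#     *.txt merge=trivial


def trivial_three_way(base: bytes, ours: bytes, theirs: bytes):
    """Return the merged content, or None if both sides changed differently."""
    if ours == theirs:
        return ours      # both sides agree
    if ours == base:
        return theirs    # only their side changed
    if theirs == base:
        return ours      # only our side changed
    return None          # genuine divergence: give up


def main(argv):
    # argv[1:4] are the %O, %A and %B paths substituted by Git.
    base_path, ours_path, theirs_path = (Path(p) for p in argv[1:4])
    merged = trivial_three_way(
        base_path.read_bytes(), ours_path.read_bytes(), theirs_path.read_bytes()
    )
    if merged is None:
        return 1  # non-zero exit status tells Git that conflicts remain
    ours_path.write_bytes(merged)  # the result must overwrite the %A file
    return 0


# Only act when invoked with the three paths, as Git would do.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

A syntax-aware driver would replace trivial_three_way with something that parses the three versions and merges their syntax trees, falling back to conflict markers when it cannot decide.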
I have tried to put the focus on usability, in contrast to academic prototypes which generally focus on quantitative evaluations on benchmarks. That means:
- being fast enough for interactive use: I don’t want to slow down my daily git commands for the sake of solving some conflicts once in a while,
- erring on the side of caution by producing conflict markers in tricky cases, as an incorrectly solved conflict can be difficult to detect and have serious consequences,
- offering ways out of situations where Mergiraf produces an incorrect merge (with utilities to review its work and report issues easily), because those can and will happen,
- comprehensive documentation written as a user manual, not a research paper,
- dog-fooding: I have been using it for a while for real development on other projects and it’s working well so far.
So, would you use this? I suspect most people don’t encounter enough conflicts in their daily work to bother installing it. But I think Mergiraf can be quite useful if you maintain a fork and regularly synchronize it with upstream, in which case conflicts often pop up. I actually find it helpful even to reorganize my own work, making it quite a bit easier to reorder commits on a branch for instance. Avoiding conflicts with myself, in a sense.
I have also tried setting up Mergiraf as a project that’s inviting to new contributors, for instance by writing a detailed tutorial about adding support for a new language and a GOVERNANCE.md file setting out how people can get involved. (Is it presumptuous to have a governance model for a tiny project where I’m the only contributor? More on that later.)
Many thanks to the Spork team for their help understanding and tinkering with Spork, to various friends who provided various sorts of feedback at various stages, and to Freya F-T for the illustrations in the documentation!
Feedback welcome - for instance as issues on Codeberg.