Contribution experience report: Spork

Welcome to my eighth contribution experience report. See the one about Git for some background on the initiative.

Spork is a tool to merge diverging versions of Java source files. It’s the result of a Master’s thesis and a scientific article has been published about it, so it clearly qualifies as research software.

It’s debatable whether this type of open source project is meant to accept external contributions at all. A published research paper is normally treated as a final research output, which isn’t meant to evolve (apart from possible corrections to fix serious issues), so arguably any accompanying software should be equally final. At the very least, for reproducibility purposes, the paper should point to a precise version of the software with which the experiments can be reproduced. But there are also many examples of major open source projects which started off as research artifacts and took on a life of their own afterwards.

My motivation to contribute

I had written my own little tool to fix import conflicts when merging Java files. It worked okay, but I wanted to see whether I could use a more principled and powerful existing tool instead. Spork seemed to be the state of the art, so it felt like a good candidate. Trying it out, I found various issues, and making small patches to address them felt like a refreshing distraction from working on OpenRefine.

First contact with the project

When I opened an issue about the lack of compatibility with Java 21, the original authors gave me helpful explanations of the difficulties involved in supporting it. Other issues about improvements to the algorithm itself were similarly well received: the issue tracker felt like a very fitting channel for in-depth discussions and for learning from the authors’ experience.

Development environment

It’s a Java project, so I was used to the tooling. The fact that it mixes Kotlin and Java in the same project was a bit of a hurdle: I had to switch from Eclipse to IntelliJ, because its out-of-the-box support for this set-up was better.

Finding my way into the code base

It’s a relatively small code base so that wasn’t much of an issue.

Testing infrastructure

Testing is mostly done via an end-to-end suite, with many examples of different merge scenarios. It feels fitting for a tool like this.

Reviewing experience

Just like the discussions on issues, the review feedback was very helpful. For instance, we had a pleasant joint investigation of the shortcomings of GraalVM, whose native executables don’t crash but behave differently from the tool running on a standard Java virtual machine. On another pull request, however, it turned out that the main developer had lost the mental context necessary to review my improvements to the algorithm, which is entirely understandable given the academic nature of the project.

Code formatting

I don’t remember any particular style being enforced.

Governance and roadmap

It feels odd to talk about governance and roadmap for a research prototype, right? Still, I think it would generally be useful to know whether the authors intend to keep developing an open source project beyond the publication of the associated article, or whether they have moved on. That’s one form of governance, answering the basic question: is this project open to contributions at all? Or is it better to improve on it by starting a separate code base, with its own associated research paper? I’m not sure whether this ought to be formalized, though.

After the main developer signalled he couldn’t review one of my contributions, another team member reached out to ask if I’d be up for joining the team and continuing development of the project, which I definitely appreciated.

Would I contribute again?

Despite the good experience on the human level, no, because I don’t think the prototype can be turned into a properly usable tool, for several reasons. The algorithm is tightly coupled to the Spoon Java parser, which restricts it to merging Java files only, even though it shouldn’t be very hard to generalize the abstract algorithm to other languages. The parser’s reliance on reflection also makes it hard to compile Spork into a native binary, which feels like a must to me if it is to be used as a Git merge driver.