Contribution experience report: JGit

Welcome to my fourth contribution experience report. I have done others for:

This episode is the first one about contributing to a library. JGit is a Java implementation of Git, covering many of the features of the original implementation in C. This makes it possible to use Git in Java programs without having to use separate processes and avoids license compatibility issues (Git itself is under GPLv2, JGit uses the Eclipse Distribution License which is similar to the new BSD license). JGit is used in pretty established projects such as the Eclipse IDE or Gerrit.

My motivation to contribute

In short: I wanted to fix a bug in the JGit library that I encountered while tweaking a git merge driver.

In my contribution report about Git, I explained my interest in custom merge drivers to support my work on OpenRefine. Writing my own little merge driver was easy enough to get rid of merge conflicts in import statements, but I figured out it would likely be more efficient to use an existing merge driver for Java files, to avoid re-inventing the wheel. Spork is the one that looks the most mature as far as I could tell, but it lacks support for the “diff3” conflict presentation mode, which I rely on heavily to understand how to solve conflicts. So I tried adding support for this in Spork. In some cases Spork falls back on a textual merge implemented in JGit. While writing tests for Spork I encountered a case where JGit itself generated an incorrect merge output. This is the bug I tried to solve in JGit.
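To illustrate what the “diff3” presentation adds, here is a minimal sketch using Git's own `git merge-file` command rather than Spork or JGit. The file names and contents are made up for the example, and the default style is forced explicitly so the comparison does not depend on the local Git configuration.

```shell
#!/bin/sh
# Sketch: compare the default and diff3 conflict styles with plain Git.
# File names and contents are invented for illustration.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Common ancestor, and two sides that both edit the same line differently.
printf 'int limit = 10;\n' > base.java
printf 'int limit = 20;\n' > ours.java
printf 'int limit = 30;\n' > theirs.java

# git merge-file exits with the number of conflicts, so a non-zero status is
# expected here; "|| true" keeps "set -e" from aborting the script.
default_style=$(git -c merge.conflictStyle=merge merge-file -p ours.java base.java theirs.java || true)
diff3_style=$(git merge-file -p --diff3 ours.java base.java theirs.java || true)

echo "--- default style ---"
echo "$default_style"
echo "--- diff3 style (adds the base version after the ||||||| marker) ---"
echo "$diff3_style"
```

In the diff3 output, the extra section between the `|||||||` and `=======` markers shows the common ancestor, which makes it easier to see what each side actually changed.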

First contact with the project

This was a rather messy process. To report the bug, I first landed on Eclipse’s Bugzilla instance and was super confused about being asked to “select a classification” for my bug. I had no idea which classification the JGit project belonged to. I looked at existing JGit bugs in Bugzilla for inspiration, but could not find a classification there either.

It somehow occurred to me that all those bugs were pretty old, with barely any recent activity. Via JGit’s support page I indeed found a link to the GitHub issues for JGit, which seem to be the new official bug tracker. So I opened issue #38 about my bug.

I also realized I needed to sign the Eclipse Contributor Agreement to submit patches, which I did. I don’t remember the exact experience but it felt rather baroque. My notes say: “OMG DCO”.

Development environment

As usual, the first step to contributing to a new project is to clone its git repository, and that’s straightforward, right? Well, not in this case.

I first landed on “https://git.eclipse.org/r/plugins/gitiles/jgit/jgit”, which back then said “repository moved to https://git.eclipse.org/r/jgit/jgit”. Which was odd because “https://git.eclipse.org/r/jgit/jgit” returned a bare “Not found” error in plain text. Still, things seemed to be actively reviewed in Gerrit, so I thought I’d try to use that. I found the documentation for Gerrit in Eclipse and followed it to set up my SSH key for upload.

Pushing my changes gave:

 ! [remote rejected]     HEAD -> refs/for/master (prohibited by Gerrit: project state does not permit write)

Not great! I then checked the CONTRIBUTING.md file and found a link to a new Gerrit instance, with more recent activity: https://eclipse.gerrithub.io/q/project:eclipse-jgit/jgit+status:open. That sounds promising! But in the same file, they also still pointed to https://bugs.eclipse.org/bugs/enter_bug.cgi?product=JGit to open issues, which says “Sorry, entering a bug into the product JGit has been disabled.” Okay, then let’s make a patch for that first…

I went on to switch to the new Gerrit instance. Do I need to add my SSH key again? No, it’s handled already when creating my account on GerritHub, fine.

I did yet another change of git remote with git remote set-url origin "ssh://wetneb@eclipse.gerrithub.io:29418/eclipse-jgit/jgit". Ah, I also forgot about the DCO, so I need to do git commit --amend -s.

Finally, I ran the magical git push origin HEAD:refs/for/master and success, I got a link back: https://review.gerrithub.io/c/eclipse-jgit/jgit/+/1177977
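The sign-off step can be tried out safely in a throwaway repository. The sketch below is only an illustration: the identity, file name and commit message are placeholders, and the final push is left as a comment because it needs a GerritHub account and a real JGit clone.

```shell
#!/bin/sh
# Sketch of the sign-off step, run in a throwaway repository.
# The identity, file name and commit message are placeholders.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.org"

echo "fix" > Fix.java
git add Fix.java
git commit -q -m "Fix merge output"

# Forgot the sign-off? Amend the commit: -s appends a Signed-off-by trailer
# and --no-edit keeps the existing commit message.
git commit -q --amend -s --no-edit

# Check that the trailer is present before pushing.
message=$(git log -1 --format=%B)
echo "$message"

# In the real JGit clone, the push for review was then
# (replace USER with your own GerritHub user name):
#   git remote set-url origin "ssh://USER@eclipse.gerrithub.io:29418/eclipse-jgit/jgit"
#   git push origin HEAD:refs/for/master
```

Pushing to refs/for/master rather than to the branch itself is what tells Gerrit to open a review instead of updating master directly.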

That was a really convoluted process, which took me a lot of time to figure out. I think I landed in the project at a rather unfortunate time, when they had just made the switch to a new forge and the documentation wasn’t quite up to date. Maybe forges need better ways to indicate that the project has moved to a different space.

It doesn’t help that because the bug tracker is now on GitHub, it would intuitively make sense to be able to submit pull requests there. Alas, pull requests apparently cannot be disabled on GitHub projects, so you need to resort to pull request templates to indicate where to submit changes. Despite that, there is currently an open pull request with what looks like a rather good contribution, which the author abandoned because they could not figure out how to submit it. Not ideal!

To summarize, I considered four different places in total to submit my changes:

Of course it makes sense for JGit to use Gerrit, which I have nothing against. There are some pretty nice features that I miss in GitHub, such as showing the merge conflicts with other pending patches.

With all this, I still haven’t made code changes! Let’s open the IDE. This is a project of the Eclipse Foundation, used by the Eclipse IDE itself, so you’d think that working on it via Eclipse should work rather well, no? Think again! I imported it in Eclipse successfully, but then noticed that Eclipse makes changes to its project settings which are tracked in the git repo. This means dozens of tracked files with unstaged changes, which can easily slip into a commit if using git commit -a. Maybe I’m not using the right version of Eclipse, or my environment isn’t configured correctly in another way? In any case that’s not very pleasant to work with.
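One way to keep such IDE-modified files out of a commit is to stage files explicitly instead of using git commit -a. The sketch below reproduces the situation in a throwaway repository; the settings path, file names and identity are invented for the example and are not JGit's actual layout.

```shell
#!/bin/sh
# Sketch: avoid committing IDE-modified settings by staging files explicitly.
# Paths, contents and identity are invented for illustration.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.org"

mkdir .settings
echo "formatter=old" > .settings/ide.prefs
echo "class Merge {}" > Merge.java
git add .
git commit -q -m "Initial commit"

# Simulate the IDE rewriting a tracked settings file while we edit real code.
echo "formatter=new" > .settings/ide.prefs
echo "class Merge { /* fix */ }" > Merge.java

# Review what is modified before committing anything.
git status --short

# Stage only the intended file; "git commit -a" would also pick up the settings.
git add Merge.java
git commit -q -m "Fix merge"

# The commit contains only Merge.java; the settings change stays unstaged.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_dirty=$(git status --short)
echo "committed: $committed"
echo "left in working tree: $still_dirty"
```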

Finding my way into the code base

Given that I noticed the bug from another project using this library, I already knew the entry point through which the bug was manifesting itself, so from there it was relatively easy to trace it back to the problematic code. The algorithm to merge text files is rather fiddly but I found JGit’s implementation to be still fairly readable, especially in comparison to Git itself.

Reviewing experience

My Gerrit patch didn’t attract a lot of attention initially. I reached out by email to the person who implemented diff3 support in JGit to get their opinion on my change, but did not get a reply. After two weeks, I tried to rope in some random folks I saw were reviewing other patches. I eventually attracted a reviewer who was kind enough to approve the patch even though they were not familiar with this part of the code base.

Actively pinging people to find reviewers for my contributions is something I often do in such contexts. I feel a bit bad about asking for more work from maintainers who are generally already quite stretched. I try to do it in a kind way, without pressuring people or making them feel bad about the delays. I guess the etiquette around this varies from project to project. The XZ incident did shine some light on the social dynamics at play in such situations.

Testing infrastructure

I am used to working with Eclipse, so it was pretty simple to run unit tests interactively from the IDE. The particular algorithm I wanted to fix was covered by a pretty good test suite, which I could imitate to add my own tests.

However, running the entire test suite from the command line with mvn clean test fails on my machine:

[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   RacyGitTests.testRacyGitDetection:57 expected:<...de:100644, time:t0, [length:1, content:a][b, mode:100644, time:t0, length:1], content:b]> but was:<...de:100644, time:t0, [smudged, length:0, content:a][b, mode:100644, time:t0, smudged, length:0], content:b]>
[INFO] 
[ERROR] Tests run: 5676, Failures: 1, Errors: 0, Skipped: 110

The test suite is known to have flaky tests like this one, which is a bit sad. Not being familiar with those tests and the code they cover, I’d just propose to mark them as “expected failures”, so that they don’t stand in the way of new contributors, but I suspect this wouldn’t be accepted as it would be better to fix the tests themselves. I don’t have the capacity to start investigating race conditions in those tests, as this can be quite time-consuming.

Code formatting

Eclipse automatically reformats code when I save, which is nice. I assume this must conform to the project’s settings. The downside of this integration is the problem I mentioned earlier of Eclipse updating its configuration files tracked in Git.

Governance and roadmap

Looking at the mailing list, the project is actively looking into bringing in more people and requiring two reviews before merging contributions. Maybe they’ll be interested in this blog post.

Concerning governance, I am not completely sure how it works. The Eclipse Foundation has general guidelines about the governance of the projects they host, but JGit’s own Governance page only lists the releases, which seems rather unrelated. The Getting Involved page does not say much about that either.

Would I contribute again?

JGit is a library that’s quite far upstream from my own projects, so it’s difficult to justify the time to contribute to it, but if I end up having more free time, I guess it could be fun to make some more contributions. In a sense, the fact that it’s a re-implementation of an existing tool means that you don’t need to make overly complex design decisions, which can be a nice thing. The rather tedious contribution experience I encountered so far is a bit chilling, but maybe it would get better now that I have passed those initial hurdles.