Downplaying the “Distributed” Dogma


Benjamin recently wrote about the current state of our effort to import our CVS repository into… something from this century.

His conclusion is spot on, although I think it… minimizes the head-banging he and I have been going through for a couple of weeks. My original characterization didn’t turn out to be far from the truth, it seems… except it’s me and my quad-core-P4-with-4-gigs-of-RAM sitting there, bloodied and bruised on the floor, not ClearCase. ;-)

I was somewhat surprised by the number of responses to Benjamin’s post that seemingly amounted to “Can’t you just use Subversion? Subversion works. And if you want distributed, use SVK.”

Well, the first issue with that is that cvs2svn[1] doesn’t seem to import the Mozilla CVS tree anymore: it hits the same error that Hg tends to hit[2], and while dying outright is arguably more correct, the approach taken by bzr and cvs2svn 1.3.0 (annotating and ignoring the error, so the import can actually continue) is much more satisfying.

The second issue is that the march towards a distributed version control system really isn’t about a distributed version control system; it’s about using a tool that supports merging algorithms that weren’t invented in the 80s, back when you never did branches anyway, because it was annoyingly difficult with the tools of the time.

During the original discussion, the main issue that limited Subversion’s advancement in the race was that it didn’t support any better merging functionality or techniques than its predecessor. It requires external tools to record which merges have been performed, and the actual algorithms used are the old ones we all love and/or hate.
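To make the "record which merges have been performed" point concrete, here is a minimal sketch of why it matters. It is a toy model, not how any particular tool is implemented: the history is a plain dictionary of parent links, and the revision names (`t1`, `b1`, `m1`, …) and the helper functions are all made up for illustration. The only assumption is that a merge starts from the newest common ancestor(s) of the two sides.

```python
def ancestors(parents, rev):
    """Return rev plus every revision reachable through parent links."""
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents.get(r, ()))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# Hypothetical history: trunk is t1 -> t2 -> m1 -> t3, the branch is t1 -> b1 -> b2,
# and m1 is the point where b1 was merged into trunk.

# Merge recorded in the history: m1 has two parents.
tracked = {
    "t1": [], "t2": ["t1"], "b1": ["t1"], "b2": ["b1"],
    "m1": ["t2", "b1"], "t3": ["m1"],
}

# Same content, but the tool forgot where m1 came from: only one parent.
untracked = {
    "t1": [], "t2": ["t1"], "b1": ["t1"], "b2": ["b1"],
    "m1": ["t2"], "t3": ["m1"],
}

print(merge_bases(tracked, "t3", "b2"))    # the second merge starts from b1
print(merge_bases(untracked, "t3", "b2"))  # the second merge starts over from t1
```

With the merge recorded, merging the branch a second time only has to consider what changed after `b1`. Without it, the tool goes back to `t1` and has to re-reconcile changes that were already merged, which is where the repeated conflicts (and the external bookkeeping) come from.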

Now don’t get me wrong: I use Subversion for all my personal stuff and I like it. I think it’s a great improvement over CVS (which I used for years and imported from) and in many (most?) cases, I would recommend it.

But when you’re going to be doing the kind of “agile”[3], disruptive, reconstructive work that Mozilla 2.0 requires, at a minimum, you need a tool that makes branching and merging easy. SVN does work for me (and lots of other people and projects) because I’m not faced with, for example, renaming nsIFrame::GetPresContext, a task where a branch makes a lot of sense and where I’d be doing hundreds of renames.

I contend that it’s not so much that we require (or necessarily even want) a “distributed” version control system. In fact, as a counterexample, Perforce is a [closed source] centralized VCS that has a lot of great features, including merging primitives that are awesome. Accurev is another (although I’ve never personally used it).

We just happen to be focused on “distributed” VCSs because those are the only open source offerings that have merging facilities that handle complicated situations and get the merging stuff right. This is likely because a distributed version control system isn’t worth anything if you can’t merge your work back in easily and [more importantly] reliably.

I’ll concede, of course, that once you have things like offline diff/commit and easy patch sharing among peers, all built-in-and-tracked-by the VCS, that’s (possibly addictive) icing on the cake.

But it’s not about the “distributed” part. It’s about the capable-merging part.[4]

Breaking code apart is easy. Putting it back together is hard.

We want and need a tool that intrinsically expects, is designed to handle, and expertly supports the latter.

[1] As of 1.5.0
[2] Which amounts to deleting files that don’t exist [possibly yet] on the branches they’re being deleted from.
[3] I hate using that [buzz] word.
[4] Coincidentally, Joel recently blogged about version control systems and large teams, and it seems the Windows team uses a model very similar to that of the 2.6 kernel developers, and possibly similar to what we’ll end up using. It seems that the combination of easy branching (which is easy) and easy merging (which is hard) is the only real way to scale a development project into the thousands.