The Incompetent Build Master

11/24/2010

Buildbot is the worst continuous integration system I have ever used.

There, I said it.

Now, I will admit: my sample size isn’t huge, but my trials with the tool have, on many an occasion, had me pining for Tinderbox1, which, to those who’ve used it, is probably saying a lot.

I had always assumed my experience with Buildbot was largely a function of the method by which it was crammed down (er, rolled out). Besides, Buildbot seemingly had a large and reportedly happy community of users, a reason constantly parroted as a justification for the ill-considered deployment.

But as I ranted over the years2 about Buildbot’s (many) shortcomings, I was surprised by how many echoes I got in return from other developers and users who… well… frankly hated their lives every time they had to interact with Buildbot.

My complaints fall into three general buckets:

  1. Buildbot’s unfortunate and stifling reliance on twistd; there isn’t a problem Buildbot doesn’t think the twistd hammer is large enough to bludgeon:

    • Each slave must be constantly connected to the master; if for any reason that connection is interrupted IN THE SLIGHTEST, your build will die. It is difficult to put into words how asinine and ill-considered this fundamental design flaw is, and how many hours of time it has wasted for developers, release engineers, and organizations using it.
    • (Due to the above) Buildbot’s bandwidth usage is incredibly high; it may not matter if your Buildbot master and slaves are across a switch, but across the Internet it can be more problematic3.
    • Using the same thread for build management and status reporting: if you want to use Buildbot’s status reporting4, it relies on twistd’s anemic web server. If Google decides to index you, this can cause your builds to fail5.
    • Speaking of this “web” “server,” and ignoring its lack of flexibility and scalability, I really wish I could password protect the administration interface: I know we’re all friends, but spam bots kicking off builds? I fail to understand why any author of a continuous integration system would consider that acceptable.
    • This twistd obsession requires users6 to install a number of libraries7 on all of their build machines. It’s annoying on Linux and Mac. You’ll want to make sure you have a noose handy if you have to install it on that “other” operating system.
  2. Buildbot’s design is conceptually inconsistent and counterintuitive, making developing code to interact with it incredibly tedious and annoying:
    • Buildbot provides sample code that builds (toy) GNU-style ./configure && make && make install8 apps, attempting to illustrate its ease. In reality, Buildbot makes trivial, cookie-cutter projects marginally easy to handle; anything mildly complicated or not considered by its designers borders on frustratingly impossible.
    • Buildbot makes multi-repository checkouts9 a nightmare to manage.
    • Build properties are cumbersome and mostly useless, yet they are the only way Buildbot allows you to express certain build-time differences; there’s no runtime evaluation to allow dynamic setting of a property, so I have to write scripts that emit the property? Huh?
    • If you integrate your own Python code directly with Buildbot10, buildbot reconfig won’t pick up code changes in your modules; the only way to do that is a buildbot restart. Oh, you had a four-hour build that was almost done? Too bad.
    • With its ISchedulers and IUpstreamSchedulers and ISourceStamps, this code reads more like someone trying to make their program use every example in the venerable design patterns book11 than like something solving real problems. It wouldn’t be such an annoyance if it didn’t get in the way of doing real work…
    • And what is this obsession with ensuring objects only contain the things Buildbot wants them to contain13? It’s like drawing up a beautiful UML diagram of your design-patterns-approved program, and then purposefully preventing developers from creating useful subclasses of it.
    • The worst of Buildbot’s design quirks: Buildbot blurs the line between “continuous integration system” and “build harness/system,” and leads your entire build system happily down a path of lock-in. Buildbot prompts you to put your entire automation configuration and process into its master.cfg files, and to express those processes in a meta-language14 that is meaningless outside of Buildbot. This means that if you ever want to develop an automation process, you must have a full master/slave setup15 available to do even the tiniest bits of development. Should you make this mistake17, migrating away from it incurs huge costs16.
    • If you think I’m the only one who has problems with Buildbot’s strange and wild inconsistencies, one of Buildbot’s poster-child users, Mozilla Corporation, apparently has these problems too.
  3. Buildbot discourages community participation:
    • Requiring random firewall ports to be opened so a slave can talk to a master is a non-starter in many organizations, both for slave admins and master admins.
    • Adding a slave to provide continuous integration for your random, under-loved platform now requires consent of a “buildbot master administrator.” It requires access to that master. Tinderbox made this trivial. It’s like Buildbot wants to make it as difficult as possible to let community members help a project with platforms that possibly only a minority of contributors care about.
    • Running that random, under-loved integration slave at home? Reference that bit above about all that constant bandwidth! Hope you’re not paying your ISP by the byte…
    • For quite a while, the Buildbot project seemed abandoned18. It now has a project maintainer, but both the original author and the project maintainer work for Mozilla Corporation. If Mozilla Corporation’s handling of XULRunner is any indicator, you’d better want to use Buildbot for exactly the same purposes Mozilla Corporation uses Buildbot for. There’s a large body of evidence to indicate that other uses will be deemphasized and patches for them ignored.

One might ask “Well, ok, so what would you recommend be used?”

I don’t have a clear answer; lots of people swear by Hudson. Tinderbox had its failings, but even its later revisions are, in my opinion, worth investigating before Buildbot.

I’d welcome others’ suggestions.

But based on lessons paid for in frustration, tears, and wasted time I could’ve spent on something productive for the engineering teams I support, I know very clearly which one I would not use.

_______________
1 Whose reputation as the first real continuous integration tool is largely forgotten
2 Often on Twitter
3 And for what? The eye-candy of a moving build log, which is amusing for all of 30 seconds?
4 Waterfall page, etc.
5 Due to twistd spitting out build logs to the Googlebot instead of reading your build slaves’ sockets
6 Often overworked, under-staffed release engineering teams
7 Twisted, Twisted Core, zope.interface
8 Buildbot is so sure every project is like that, there are even specific build steps written JUST. FOR. THAT.
9 Increasingly common with the proliferation of distributed version control
10 One actual benefit, but you’ll soon see why it’s not…
11 Brendan Eich’s oft-repeated dig at Gecko’s original authors of “My First Object-Oriented Rendering Engine12” comes to mind…
12 Page 155, Coders at Work
13 If I see another “Foo.__init__ got unexpected keyword argument(s)” error, I swear I’ll…
14 Which is Python
15 Either locally or in staging, or in production if you’re not so lucky
16 Of various kinds, not just temporal
17 As illustrated by Mozilla Corporation’s switch to Mozharness
18 As near as I can tell, this happened, ironically, around the time Mozilla Corporation switched their release engineering infrastructure to it.