Incompetent Build Master is [Still] Incompetent
Well, that apparently struck a nerve1.
Much has been said about those who argue on the Internet2, so I’ll state the obvious: I’m pretty sure my chance of convincing Buildbot’s developers that the tool they work on suffers from persistently poor engineering design decisions is about the same as their chance of convincing me to be an advocate for their tool3.
If you don’t have anything fanboy-riffic to say…
Let me start by saying I’m a bit disappointed in the Buildbot community: Amber Yust starts by characterizing my post as a “rant,” presumably to cast doubt on its credibility, before providing a single counterpoint.
The project’s current maintainer, Dustin Mitchell, called me a “curmudgeon” and purported to possess some definitive knowledge of when I last used Buildbot; I can guess where he got this notion, but it’s incorrect4.
All this after Buildbot developers called me “incompetent”5 and responded to points I never made; to anyone who assumed I had made them, it sure sounded like I was suggesting absurdities.
After the name calling was out of the way, every person who said anything6 did, at some point, intimate “Well, this guy has a point.”7
So before I respond to some of the specific points, it’s important to note that the Buildbot community’s response to criticism, at least parts of which they eventually agreed with, was to level personal mischaracterizations and verbally throw sand. I wonder if there’s a coincidence between this sort of apparently-acceptable behavior and the “Developer Recruitment” agenda line item on their upcoming summit…
That said:
- I am using an older version of Buildbot8; I haven’t invested much time looking into future revisions, since core design failures—the always-connected-slave problem, for instance—have yet to be addressed; right now, if I had that time, it would be utilized to replace Buildbot.
- With respect to features that are supposedly in later versions, I was very careful about the missing features I chose; I understand all software is a “work in progress,” but I picked those whose absence from any shipping version of a capable continuous integration system I felt to be egregious.
- My complaint about properties was poorly explained; I’ll use one of my patented footnotes, in what may be the longest yet in the history of this blog, to rectify this9.
- Yust claims that Buildbot does not “force” you to describe a build process in its venerable master.cfgs; she’s right, and I didn’t claim otherwise. I said it happily leads you to its “encode your entire build process in my non-portable master.cfg”-trough and dares you to drink. The sample code nudges you in this direction, and one need only look at Mozilla Corporation’s current11 effort to undo this costly mistake. I have yet to see a Buildbot deployment that hasn’t initially made this mistake, which makes sense: people copy the sample code and modify it.
- The suggestion to run Buildbot on port 80 largely misses the point that dealing with large, corporate IT departments often involves working with rules that don’t make immediate sense. Requiring custom ports running unaudited software to be open to the world is usually a non-starter.12,13
- Regarding the list of “It’s On the List”s14, frankly “I don’t care.”
Why?
This crazy assumption of a bulletproof network connection is a design error of Challenger-esque proportions, given the amount of time organizations and companies have wasted re-running builds that had to be restarted because of it.
For any engineer to say with a straight face “Well yah, it’s a problem,” and in the same breath say “it’s on a list,” and then not have any movement on it IN SEVEN YEARS is completely inexcusable from any reasonable engineering perspective.
The fact that there is such a list of what are, in some respects, similar calibers of engineering design failures, that are apparently on some list somewhere to eventually correct someday when maybe there’s time lends credence to my point that Buildbot advertising itself as a production continuous integration system should be reconsidered.
It is Yust’s conclusion, however, that really gets to the heart of the matter: “Buildbot is not a just-add-water CI server.”
I wholeheartedly agree.
The problem is: either through the (original?) authors’ claims, the verbiage on the website, the documentation, the way the community positions the tool, or some other factor, this is not the notion would-be users walk away with.
Buildbot was originally sold to me as “A better Tinderbox, with a bustling community.” This was right before Brian Warner all but abandoned it, which was fitting, since the other statement didn’t turn out to be particularly true either.
As a build/release engineer who has plenty of other things to be doing to support his organization’s software development and QA teams, I have ZERO INTEREST in customizing, tweaking, and effectively dumbing down a “distributed build manager” to be, in Yust’s words, “just a [continuous integration] server,” which is all I need it to be.
Compound that with these wacky, mostly-unstated requirements of a perfect network under ideal intra-company/project group relationships, and Buildbot is a poor choice that wastes a bunch of time that many release engineering teams, mine included, do not have.
If the Buildbot community’s position is “Buildbot is not a continuous integration server that is usable for standard software projects, in the real world, under real networking and systems conditions,” and is intended to be some redesigned distcc offering, just say that. Stop letting engineering managers and VPs think otherwise15.
And if your answers to your users’ complaints are always “Well, Buildbot wasn’t designed to do that,” as Brian Warner reportedly stated, then you, Buildbot community, need to do a better job of clarifying what, exactly, Buildbot is designed to be competent at.
(Bonus points for attempting that without the name calling or wholesale mischaracterizations.)
In either event, as a stressed, time-constrained release engineer, I need a “set it and forget it”-solution17 that gets my builds out the door without requiring me to deal with a lot of weird software dependencies, mysterious, unexplained assumptions, and long-lingering, still-uncorrected, painful design decisions.
And based on Yust’s own statements, one thing we apparently agree on: the system best suited for that purpose is not Buildbot.
_______________
1 I had planned to link to two blog posts, since it was implied another would be written, but I haven’t seen it appear anywhere yet…
2 Googling that phrase is left as an exercise for the reader
3 At least at this point in time
4 And irrelevant to any cogent argument
5 Points go to “exarkun” for the money shot quotation: “The guy is probably incompetent, most guys are.”
6 Save Dustin Mitchell
7 Which point people agreed I had, however, wasn’t consistent, which is fine; I’ll take what I can get…
8 0.7.8, to be exact
9 I’m really conflating runtime evaluation of code and properties; my complaint is that properties shouldn’t be necessary, but are if you want to control certain elements of the build process. This requires adding a step, and often the contents of that step is echoing the output of a command, which is also somewhat asinine. My particular problem was I wanted to set a port value that a tool picks up from the environment; I had code that changed this port based on the type of build and some other attributes, and set it in the environment for the step by providing a hash; unfortunately, the port number never changed, and it took a few hours to figure out that Buildbot doesn’t support this common use case without setting properties; this is a poor, counterintuitive design when you’re allowed to write and integrate your own Python modules into Buildbot.10
10 While I’m correcting errors, I kept referring to “twistd,” which is Twisted’s cute name for its daemon; everywhere I wrote twistd, I was referring to Twisted.
11 Quite painful, from what I hear
12 If you ever wondered why Mozilla Corporation’s Buildbot masters aren’t open to the world, that’s why; incidentally, that was one of the last deployment discussions I was involved in, and at the time, everyone agreed with me.
13 And no Mook, I don’t think the transport layer should be SMTP, though I always did admire the resiliency of that particular Tinderbox data transport design decision; I would be happy with what the rest of the world uses these days: HTTP.
14 Or “the points I made that they discovered they maybe sorta kinda agreed with after getting the personal attacks out of the way and reading what I had written”
15 Assuming, of course, they’re not planning on hiring Buildbot’s project maintainers16
16 Who are apparently already spoken for…
17 Does RonCo do software?
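To make footnote 9 a bit more concrete, here is a minimal sketch of the kind of trap described there, under one plausible reading of the problem: a value computed when the configuration is loaded is frozen for every later build, while a deferred value is computed per build. Nothing below is Buildbot's API; `Step`, `pick_port`, `TOOL_PORT`, and the port numbers are all hypothetical names invented for illustration.

```python
# Hypothetical stand-in for a build step; this is NOT Buildbot's API.
class Step:
    def __init__(self, command, env):
        self.command = command
        self.env = env  # either a plain dict or a callable returning a dict

    def resolve_env(self, build):
        # Deferred form: call the function at build time with the build's attributes.
        if callable(self.env):
            return self.env(build)
        # Eager form: the dict was fixed when the configuration was loaded.
        return self.env


def pick_port(build_type):
    # Hypothetical mapping from build type to the port a tool reads from its environment.
    return {"nightly": 8001, "release": 8002}.get(build_type, 8000)


# Evaluated exactly once, when the configuration file is loaded.
config_time_type = "nightly"
eager = Step(["make", "test"], env={"TOOL_PORT": str(pick_port(config_time_type))})

# Evaluated again for every build.
lazy = Step(["make", "test"],
            env=lambda build: {"TOOL_PORT": str(pick_port(build["type"]))})

release_build = {"type": "release"}
eager_env = eager.resolve_env(release_build)  # still carries the config-time port
lazy_env = lazy.resolve_env(release_build)    # reflects this build's type
```

In this sketch the eager step keeps handing out the "nightly" port no matter which build runs, which matches the "port number never changed" symptom; whether that is exactly what happens inside Buildbot 0.7.8 is an assumption on my part.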
hey preed. so what’s the replacement. (i really don’t have any involvement or experience with this tool, but having read to the end of this entertaining post, I was sort of let down that there wasn’t a “and that’s why I’m moving to X” conclusion.)
Asa,
See my previous post on the subject, specifically here.
I’d be curious to know where this notion comes from that one has to have a suggestion of alternatives before expecting their specific, concrete criticisms (which, I will point out again, many of Buildbot’s own developers share) to be taken seriously.
It seems like there’s been a push in the industry towards this model of “If you don’t have a solution, you can’t complain,” which seems to me to not only be misguided, but just a method people who aren’t interested in hearing any (valid) criticisms use to shut people (with valid criticism) up.
preed, I wasn’t suggesting your complaints weren’t OK without a solution. I was assuming you’d given up hope and were about to tell us all about the better world elsewhere.
Asa,
My misunderstanding, then.
As I mentioned in this post, for resource-constrained teams, it’s very difficult to move to another system wholesale; I would never [advocate to] jeopardize the schedule of a release just so that continuous integration systems could be switched. Unfortunately, whenever I do need to make a change involving Buildbot, I’ve learned to inflate my estimates by a healthy multiplier, due mostly to the design inconsistencies I’ve discussed.
Footnote 3 in this post is very relevant: if Amber Yust (and, indeed, the Buildbot community) consider their system to be a platform on which one would create a continuous integration system, then it would be interesting if they were to consider a “BuildbotLite” or “BuildMiniBot” or some such, with much stricter integration points (and constraints).
If Buildbot’s community were able to solve some of the longstanding problems even they agree with (the builds-failing-when-a-slave-can’t-talk-to-its-master being the foremost among those), and were able to provide a version that, to use Yust’s words, is a “Just-add-water CI server,” I would be entirely happy to reconsider it.
FWIW, I did try to buy into Buildbot’s design in the beginning, and it certainly has some aspects that are better than, for instance, Tinderbox. But after being burned time after time after time, I couldn’t, in good conscience, ignore these design defects and propagate the “Buildbot is wonderful and has no problems”-line.
As my initial blog post said, what was most surprising to me was that when I brought it up to others who’d worked with Buildbot, they tended to agree with me more often than not.
In some sense, that’s what these posts are about: pointing out that there’s a lot of fan-person-dom about Buildbot, but there are some really important drawbacks, and to blindly switch to it within an environment based on its reputation has, apparently, bitten a lot of people. Some organizations have the manpower and time to tend to these wounds; for those who don’t, it leaves a nasty scar.
I have a question: Would you still be so vehemently against Buildbot if it didn’t require persistent connections?
Hey Ben,
I believe the persistent connection issue is the most egregious of the poor design decisions, but I discuss some of the other items I consider to be bad in more depth in the previous post.
I spoke a little bit to what I’d like to see before seriously reevaluating Buildbot in my response to Asa above.
Frankly, I would also find it difficult to rely on a tool so core to my particular role when it is maintained by a community who apparently thinks it’s acceptable to respond to issues others raise either defensively (“That’s not what Buildbot was designed for!!”) or with ad hominem attacks.
This may explain why it’s taken years to solve problems even the community itself agrees are problems.
For what it’s worth, I don’t generally consider “rant” a derogatory term – I use it to describe any act of expressing gripes about a particular thing. It’s perfectly credible (in my usage) to rant about something!
Amber,
Someone recently told me a saying used at their office, which I liked: it’s whining if you’re not going to do anything about it and ranting if you are.
So I suppose by that criterion, this is a whine.
Having said that, I don’t think my assessment of your use of “rant” as a derogatory statement (ultimately irrelevant to your argument) is unreasonable, given where you and your colleagues immediately went in IRC; maybe if the channel’s initial response hadn’t been a bunch of personal attacks, I wouldn’t have assessed the use of “rant” so negatively, and more as it was intended.