Tuesday, April 3rd, 2012
My first experience with release engineering was almost fifteen years ago: I did a stint with Netscape’s release engineering team for a summer. I know I didn’t quite get why at the time, but I was hooked immediately.
My professional focus has been on build/release engineering ever since.
At various times, it’s been a difficult road to walk. I truly believe release engineering and configuration management to be an incredibly important part of the software development process that, when done right, can provide an incredible amount of value to everyone from the smallest startups to the largest multinational corporations.
But “RelEng” is often the last part of the process to be designed, and the last part of the team to be filled. It’s often under-resourced, and at that point, it’s usually interrupt-driven (by software that needs shipping). It’s also difficult to change this trajectory, because release engineers are generally unseen and unheard until they (or the infrastructure they’re responsible for) become the blocking factor to a ship date.
How can we approach complex and obstacle-ridden processes safely and successfully every time?
I think there’s a better way to approach these problems.
Friends and colleagues often giggle at me when I make repeated references to aviation in the context of release engineering. Being a pilot, I can’t help but call out the similarities as I see them. But I see so many similarities because I really do feel there is an aspect to that nature of work—whether it be air traffic control, 9-1-1 dispatch, or management of the utility grid—that we, as software engineering practitioners, can learn from and apply to make shipping software more simple, transparent, and predictable.
To that end, today, I’m launching a company: Release Engineering Approaches.
We provide a full range of build/release engineering consulting services including release engineering consulting, build engineering automation & tooling, and DevOps infrastructure support.
Our approach is to integrate those “operational” lessons and practices that so many other industries have (often painfully) learned, to make shipping whatever type of software you need to ship… as simple as possible. Every time.
If consistently and simply shipping your bits is something you struggle with, give us a shout and let’s discuss how we can help!
To new approaches!
aviation, continuous integration, organizational interactions, processes, release engineering, strategy
Preed on Build/Release Engineering, Releases, Releng Machinery, personal | 4 Comments »
Saturday, December 4th, 2010
Well, that apparently struck a nerve1.
Much has been said about those who argue on the Internet2, so I’ll state the obvious: I’m pretty sure my chance of convincing Buildbot’s developers that the tool they work on suffers from persistently poor engineering design decisions is about the same as their chance of convincing me to be an advocate for their tool3.
If you don’t have anything fanboy-riffic to say…
Let me start by saying I’m a bit disappointed in the Buildbot community: Amber Yust starts by characterizing my post as a “rant,” presumably to cast doubt on its credibility, before providing a single counterpoint.
The project’s current maintainer, Dustin Mitchell, called me a “curmudgeon” and purported to possess some definitive knowledge of when I last used Buildbot; I can guess where he got this notion, but it’s incorrect4.
All this after Buildbot developers called me “incompetent”5 and responded to points I never made; to anyone who assumed I had made them, it sure sounded like I was suggesting absurdities.
After the name calling was out of the way, every person who said anything6 did, at some point, intimate “Well, this guy has a point.”7
So before I respond to some of the specific points, it’s important to note that the Buildbot community’s response to criticism they eventually agreed with, at least in part, was to level personal mischaracterizations and verbally throw sand. I wonder if there’s a connection between this sort of apparently-acceptable behavior and the “Developer Recruitment” agenda line item on their upcoming summit…
That said:
- I am using an older version of Buildbot8; I haven’t invested much time looking into future revisions, since core design failures—the always-connected-slave problem, for instance—have yet to be addressed; right now, if I had that time, it would be utilized to replace Buildbot.
- With respect to features that are supposedly in later versions, I was very careful about the missing features I chose; I understand all software is a “work in progress,” but I picked those whose absence from any shipping version of a capable continuous integration system I felt to be egregious.
- My complaint about properties was poorly explained; I’ll use one of my patented footnotes, in what may be the longest yet in the history of this blog, to rectify this9.
- Yust claims that Buildbot does not “force” you to describe a build process in its venerable master.cfgs; she’s right, and I didn’t claim otherwise. I said it happily leads you to its “encode your entire build process in my non-portable master.cfg”-trough and dares you to drink. The sample code nudges you in this direction, and one need only look at Mozilla Corporation’s current11 effort to undo this costly mistake. I have yet to see a Buildbot deployment that hasn’t initially made this mistake, which makes sense: people copy the sample code and modify it.
- The suggestion to run Buildbot on port 80 largely misses the point that dealing with large, corporate IT departments often involves working with rules that don’t make immediate sense. Requiring custom ports running unaudited software to be open to the world is usually a non-starter.12,13
- Regarding the list of “It’s On the List”s14, frankly “I don’t care.”
Why?
The amount of time organizations and companies have wasted re-running builds that had to be restarted makes this crazy assumption of a bulletproof network connection a design error of Challenger-esque proportions.
For any engineer to say with a straight face “Well yah, it’s a problem,” and in the same breath say “it’s on a list,” and then not have any movement on it IN SEVEN YEARS is completely inexcusable from any reasonable engineering perspective.
The fact that there is such a list of what are, in some respects, similar calibers of engineering design failures, that are apparently on some list somewhere to eventually correct someday when maybe there’s time lends credence to my point that Buildbot advertising itself as a production continuous integration system should be reconsidered.
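For what it’s worth, tolerating a flaky link does not have to be complicated. Here is a minimal sketch of the idea, assuming a build that runs to completion locally and only needs the network to deliver its result afterwards. The `report_with_retry` function, its parameters, and the `send` callable are all hypothetical names invented for illustration; none of this is Buildbot’s API or design.

```python
import time


def report_with_retry(send, payload, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Deliver a finished build's result, tolerating a dropped connection.

    `send` is a hypothetical callable that transmits `payload` (for
    example, an HTTP POST wrapper). Connection failures are retried with
    exponential backoff; the build itself is never re-run.
    Returns True once `send` succeeds, or False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            send(payload)
            return True
        except (ConnectionError, TimeoutError):
            # Wait before the next try, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return False
```

The point of the sketch is only that a network blip costs a delayed status report, not a restarted build. The `sleep` parameter exists so the backoff can be exercised without actually waiting.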
It is Yust’s conclusion, however, that really gets to the heart of the matter: “Buildbot is not a just-add-water CI server.”
I wholeheartedly agree.
The problem is: either through the (original?) authors’ claims, the verbiage on the website, the documentation, the way the community positions the tool, or some other factor, this is not the notion would-be users walk away with.
Buildbot was originally sold to me as “A better Tinderbox, with a bustling community.” This was right before Brian Warner all but abandoned it, which was fitting, since the other statement didn’t turn out to be particularly true either.
As a build/release engineer who has plenty of other things to be doing to support his organization’s software development and QA teams, I have ZERO INTEREST in customizing, tweaking, and effectively dumbing down a “distributed build manager” to be, in Yust’s words, “just a [continuous integration] server,” which is all I need it to be.
Compound that with these wacky, mostly-unstated requirements of a perfect network under ideal intra-company/project group relationships, and Buildbot is a poor choice that wastes a bunch of time that many release engineering teams, mine included, do not have.
If the Buildbot community’s position is “Buildbot is not a continuous integration server that is usable for standard software projects, in the real world, under real networking and systems conditions,” and is intended to be some redesigned distcc offering, just say that. Stop letting engineering managers and VPs think otherwise15.
And if your answers to your users’ complaints are always “Well, Buildbot wasn’t designed to do that,” as Brian Warner reportedly stated, then you, Buildbot community, need to do a better job of clarifying what, exactly, Buildbot is designed to be competent at.
(Bonus points for attempting that without the name calling or wholesale mischaracterizations.)
In either event, as a stressed, time-constrained release engineer, I need a “set it and forget it”-solution17 that gets my builds out the door without requiring me to deal with a lot of weird software dependencies, mysterious, unexplained assumptions, and long-lingering, still-uncorrected, painful design decisions.
And based on Yust’s own statements, one thing we apparently agree on: the system best suited for that purpose is not Buildbot.
_______________
1 I had planned to link to two blog posts, since it was implied another would be written, but I haven’t seen it appear anywhere yet…
2 Googling that phrase is left as an exercise for the reader
3 At least at this point in time
4 And irrelevant to any cogent argument
5 Points go to “exarkun” for the money shot quotation: “The guy is probably incompetent, most guys are.”
6 Save Dustin Mitchell
7 Which point people agreed I had, however, wasn’t consistent, which is fine; I’ll take what I can get…
8 0.7.8, to be exact
9 I’m really conflating runtime evaluation of code and properties; my complaint is that properties shouldn’t be necessary, but are if you want to control certain elements of the build process. This requires adding a step, and often all that step does is echo the output of a command, which is also somewhat asinine. My particular problem was I wanted to set a port value that a tool picks up from the environment; I had code that changed this port based on the type of build and some other attributes, and set it in the environment for the step by providing a hash; unfortunately, the port number never changed, and it took a few hours to figure out that Buildbot doesn’t support this common use case without setting properties; this is a poor, counterintuitive design when you’re allowed to write and integrate your own Python modules into Buildbot.10
10 While I’m correcting errors, I kept referring to “twistd,” which is Twisted’s cute name for its daemon; everywhere I wrote twistd, I was referring to Twisted.
11 Quite painful, from what I hear
12 If you ever wondered why Mozilla Corporation’s Buildbot masters aren’t open to the world, that’s why; incidentally, that was one of the last deployment discussions I was involved in, and at the time, everyone agreed with me.
13 And no Mook, I don’t think the transport layer should be SMTP, though I always did admire the resiliency of that particular Tinderbox data transport design decision; I would be happy with what the rest of the world uses these days: HTTP.
14 Or “the points I made that they discovered they maybe sorta kinda agreed with after getting the personal attacks out of the way and reading what I had written”
15 Assuming, of course, they’re not planning on hiring Buildbot’s project maintainers16
16 Who are apparently already spoken for…
17 Does RonCo do software?
buildbot, continuous integration, tools
Preed on Build/Release Engineering, Releng Machinery | 8 Comments »
Wednesday, November 24th, 2010
Buildbot is the worst continuous integration system I have ever used.
There, I said it.
Now, I will admit: my sample size isn’t huge, but my trials with the tool have, on many an occasion, had me pining for Tinderbox1, which to those who’ve used it, is probably saying a lot.
I had always assumed my experience with Buildbot was largely a function of the method by which it was crammed down... er, rolled out. Besides, Buildbot seemingly had a large and reportedly happy community of users, a reason constantly parroted as a justification for the ill-considered deployment.
But as I ranted over the years2 about Buildbot’s (many) shortcomings, the echoes I got in return of other developers and users who… well… frankly hated their lives every time they had to interact with Buildbot surprised me.
My complaints fall into three general buckets:
- Buildbot’s unfortunate and stifling reliance on twistd; there isn’t a problem Buildbot doesn’t think the twistd hammer is large enough to bludgeon:
  - Each slave must be constantly connected to the master; if for any reason that connection is interrupted IN THE SLIGHTEST, your build will die. It is difficult to put into words how asinine and ill-considered this fundamental design flaw is, and how many hours of time it has wasted for developers, release engineers, and organizations using it.
  - (Due to the above) Buildbot’s bandwidth usage is incredibly high; it may not matter if your Buildbot master and slaves are across a switch, but across the Internet it can be more problematic3.
  - Using the same thread for build management and status reporting: if you want to use Buildbot’s status reporting4, it relies on twistd’s anemic web server. If Google decides to index you, this can cause your builds to fail5.
  - Speaking of this “web” “server” and ignoring the lack of flexibility and scalability, I really wish I could password protect the administration interface: I know we’re all friends, but spam bots kicking off builds? I fail to understand why any author of a continuous integration system would consider that acceptable.
  - This twistd obsession requires users6 to install a number of libraries7 on all of their build machines. It’s annoying on Linux and Mac. You’ll want to make sure you have a noose handy if you have to install it on that “other” operating system.
- Buildbot’s design is conceptually inconsistent and counterintuitive, making developing code to interact with it incredibly tedious and annoying:
  - Buildbot provides sample code that builds (toy) GNU-style ./configure && make && make install8 apps, attempting to illustrate its ease. In reality, Buildbot makes trivial, cookie-cutter projects marginally easy to handle; anything mildly complicated or not considered by its designers borders on frustratingly impossible.
  - Buildbot makes multi-repository checkouts9 a nightmare to manage.
  - Build properties are cumbersome and mostly-useless, but the only way Buildbot allows you to express certain build-time differences; there’s no runtime evaluation to allow dynamic setting of a property, and I have to write scripts that emit the property? Huh?
  - If you integrate your own Python code directly with Buildbot10, buildbot reload won’t pick up code changes in your modules; the only way to do that is a buildbot restart. Oh, you had a four-hour build that was almost done? Too bad.
  - With its ISchedulers and IUpstreamSchedulers and ISourceStamps, this code reads more like someone trying to make their program use every example in the venerable design patterns book11 than solving real problems. It wouldn’t be such an annoyance if it didn’t get in the way of doing real work…
  - And what is this obsession with ensuring objects only contain the things Buildbot wants them to contain13? It’s like drawing up this beautiful UML diagram of your design patterns-approved program, and then purposefully disallowing developers to create useful subclasses using it.
  - In what is the worst of Buildbot’s design quirks: Buildbot blurs the lines of “continuous integration system” and “build harness/system” and leads your entire build system happily down a path of lock-in. Buildbot prompts you to put your entire automation configuration and process into its master.cfg’s, and express those processes in a meta-language14 that is meaningless outside of Buildbot. It means if you ever want to develop an automation process, you must have available a full master/slave setup15 to do even the tiniest bits of development. Migrating away from this incurs huge costs16 should you make this mistake17.
  - If you think I’m the only one who has problems with Buildbot’s strange and wild inconsistencies, one of Buildbot’s poster-child users, Mozilla Corporation, apparently has these problems too.
- Buildbot discourages community participation:
  - Requiring random firewall ports to be opened so a slave can talk to a master is a non-starter in many organizations, both for slave admins and master admins.
  - Adding a slave to provide continuous integration for your random, under-loved platform now requires the consent of a “buildbot master administrator.” It requires access to that master. Tinderbox made this trivial. It’s like Buildbot wants to make it as difficult as possible to let community members help a project with platforms that possibly only a minority of contributors care about.
  - Running that random, under-loved integration slave at home? Reference that bit above about all that constant bandwidth! Hope you’re not paying your ISP by the byte…
  - For quite a while, the Buildbot project seemed abandoned18. It now has a project maintainer, but both the original author and the project maintainer work for Mozilla Corporation. If Mozilla Corporation’s handling of XULRunner is any indicator, you’d better want to use Buildbot for exactly the same purposes Mozilla Corporation uses Buildbot for. There’s a large body of evidence to indicate that other uses will be deemphasized and patches for them ignored.
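The lock-in and properties complaints are easier to see next to a counterexample. What follows is a minimal sketch, assuming a stand-alone driver script that owns the build process and that any continuous integration system (or a developer at a shell prompt) can invoke with a single command. Every name in it (`PORTS_BY_BUILD_TYPE`, `build_environment`, `run_steps`, `TOOL_PORT`, `BUILD_TYPE`) and the port numbers are made up for illustration; this is not Buildbot’s API, and it is not how any particular project does it.

```python
import os
import subprocess
import sys

# Hypothetical mapping; the build types and port numbers are invented.
PORTS_BY_BUILD_TYPE = {"debug": 8081, "release": 8082}


def build_environment(build_type, base_env=None):
    """Compute the environment for one build at run time.

    Copies the starting environment and adds values that depend on the
    build type, so no CI-specific property mechanism is involved.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["BUILD_TYPE"] = build_type
    env["TOOL_PORT"] = str(PORTS_BY_BUILD_TYPE[build_type])
    return env


def run_steps(steps, env):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in steps:
        result = subprocess.run(command, env=env)
        if result.returncode != 0:
            return result.returncode
    return 0


def main(argv):
    """Entry point: the build type is the first argument (default: debug)."""
    build_type = argv[1] if len(argv) > 1 else "debug"
    # The process lives here, in ordinary code, not in a CI configuration file.
    steps = [["./configure"], ["make"], ["make", "check"]]
    return run_steps(steps, build_environment(build_type))
```

A wrapper would call `sys.exit(main(sys.argv))`, and the continuous integration system’s only job becomes running that one command and recording the exit code. Developing or debugging the process then needs nothing more than a checkout and a Python interpreter.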
One might ask “Well, ok, so what would you recommend be used?”
I don’t have a clear answer; lots of people swear by Hudson. Tinderbox had its failings, but its descendants are, in my opinion, even worth investigating over Buildbot.
I’d welcome others’ suggestions.
But based on lessons paid for in frustration, tears, and wasted time I could’ve been working on something productive for the engineering teams I support, I know very clearly which one I would not use.
_______________
1 Whose reputation as the first real continuous integration tool is largely forgotten
2 Often on Twitter
3 And for what? The eye-candy of a moving build log, which is amusing for all of 30 seconds?
4 Waterfall page, etc.
5 Due to twistd spitting out build logs to the Googlebot instead of reading your build slaves’ sockets
6 Often overworked, under-staffed release engineering teams
7 twistd, twistd-core, zope interface
8 Buildbot thinks every project is so like that, there are even specific buildsteps written JUST. FOR. THAT.
9 Increasingly common with the proliferation of distributed version control
10 One actual benefit, but you’ll soon see why it’s not…
11 Brendan Eich’s oft-repeated dig at Gecko’s original authors of “My First Object-Oriented Rendering Engine12” comes to mind…
12 Page 155, Coders at Work
13 If I see another “Foo.__init__ got unexpected keyword argument(s)” error, I swear I’ll…
14 Which is Python
15 Either locally or in staging, or in production if you’re not so lucky
16 Of various kinds, not just temporal
17 As illustrated by Mozilla Corporation’s switch to Mozharness
18 Near as I can tell, this happened ironically around the time Mozilla Corporation switched their release engineering infrastructure to it.
buildbot, continuous integration, tools
Releng Machinery | 7 Comments »