Continuous Integration

Simply ship. Every time.

Incompetent Build Master is [Still] Incompetent

12/04/2010

Well, that apparently struck a nerve1.

Much has been said about those who argue on the Internet2, so I’ll state the obvious: I’m pretty sure my chance of convincing Buildbot’s developers that the tool they work on suffers from persistently poor engineering design decisions is about the same as their chance of convincing me to be an advocate for their tool3.


If you don’t have anything fanboy-riffic to say…

Let me start by saying I’m a bit disappointed in the Buildbot community: Amber Yust starts by characterizing my post as a “rant,” presumably to cast doubt on its credibility, before providing a single counterpoint.

The project’s current maintainer, Dustin Mitchell, called me a “curmudgeon” and purported to possess some definitive knowledge of when I last used Buildbot; I can guess where he got this notion, but it’s incorrect4.

All this after Buildbot developers called me “incompetent”5 and responded to points I never made; to anyone who assumed I had made them, it sure sounded like I was suggesting absurdities.

After the name calling was out of the way, every person who said anything6 did, at some point, intimate “Well, this guy has a point.”7

So before I respond to some of the specific points, it’s important to note that the Buildbot community’s response to criticism, at least parts of which they eventually agreed with, was to level personal mischaracterizations and verbally throw sand. I wonder if there’s a connection between this sort of apparently-acceptable behavior and the “Developer Recruitment” agenda line item on their upcoming summit.

That said:

  1. I am using an older version of Buildbot8; I haven’t invested much time looking into future revisions, since core design failures—the always-connected-slave problem, for instance—have yet to be addressed; right now, if I had that time, it would be utilized to replace Buildbot.
  2. With respect to features that are supposedly in later versions, I was very careful about the missing features I chose; I understand all software is a “work in progress,” but I picked those whose absence from any shipping version of a capable continuous integration system I felt to be egregious.
  3. My complaint about properties was poorly explained; I’ll use one of my patented footnotes, in what may be the longest yet in the history of this blog, to rectify this9.
  4. Yust claims that Buildbot does not “force” you to describe a build process in its venerable master.cfgs; she’s right, and I didn’t claim otherwise. I said it happily leads you to its “encode your entire build process in my non-portable master.cfg”-trough and dares you to drink. The sample code nudges you in this direction, and one need only look at Mozilla Corporation’s current11 effort to undo this costly mistake. I have yet to see a Buildbot deployment that hasn’t initially made this mistake, which makes sense: people copy the sample code and modify it.
  5. The suggestion to run Buildbot on port 80 largely misses the point that dealing with large, corporate IT departments often involves working with rules that don’t make immediate sense. Requiring custom ports running unaudited software to be open to the world is usually a non-starter.12,13
  6. Regarding the list of “It’s On the List”s14, frankly, I don’t care.

    Why?

    The amount of time organizations and companies have wasted re-running builds that had to be restarted due to this crazy assumption of a bulletproof network connection makes this a design error of Challenger-esque proportions.

    For any engineer to say with a straight face “Well yah, it’s a problem,” and in the same breath say “it’s on a list,” and then not have any movement on it IN SEVEN YEARS is completely inexcusable from any reasonable engineering perspective.

    The fact that there is such a list of what are, in some respects, similar calibers of engineering design failures, that are apparently on some list somewhere to eventually correct someday when maybe there’s time lends credence to my point that Buildbot advertising itself as a production continuous integration system should be reconsidered.
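The alternative to an always-connected slave can be sketched in a few lines. This is a minimal illustration, not Buildbot code: the build finishes locally regardless of the network, and only the result delivery is retried with exponential backoff. The names `deliver_with_retry`, `flaky_send`, and the payload shape are all hypothetical, and `send` stands in for whatever transport (an HTTP POST, say) a real system would use.

```python
import time


def deliver_with_retry(send, payload, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send(payload) until it succeeds or the retries run out.

    Returns True on delivery, False if every attempt raised ConnectionError.
    `send` is a hypothetical transport callable supplied by the caller.
    """
    for attempt in range(retries):
        try:
            send(payload)
            return True
        except ConnectionError:
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return False


if __name__ == "__main__":
    outbox = []
    failures = iter([True, True, False])  # the network is down for two attempts

    def flaky_send(payload):
        # Simulated transport: fails while the "network" is down.
        if next(failures):
            raise ConnectionError("master unreachable")
        outbox.append(payload)

    delivered = deliver_with_retry(
        flaky_send,
        {"build": "nightly", "status": "success"},
        sleep=lambda seconds: None,  # skip real waiting in the demo
    )
    print(delivered, outbox)
```

The point of the design is that a dropped connection costs a delayed report, not a four-hour build.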

It is Yust’s conclusion, however, that really gets to the heart of the matter: “Buildbot is not a just-add-water CI server.”

I wholeheartedly agree.

The problem is: either through the (original?) authors’ claims, the verbiage on the website, the documentation, the way the community positions the tool, or some other factor, this is not the notion would-be users walk away with.

Buildbot was originally sold to me as “A better Tinderbox, with a bustling community.” This was right before Brian Warner all but abandoned it, which was fitting, since the other statement didn’t turn out to be particularly true either.

As a build/release engineer who has plenty of other things to be doing to support his organization’s software development and QA teams, I have ZERO INTEREST in customizing, tweaking, and effectively dumbing down a “distributed build manager” to be, in Yust’s words, “just a [continuous integration] server,” which is all I need it to be.

Compound that with these wacky, mostly-unstated requirements of a perfect network under ideal intra-company/project group relationships, and Buildbot is a poor choice that wastes a bunch of time that many release engineering teams, mine included, do not have.

If the Buildbot community’s position is “Buildbot is not a continuous integration server that is usable for standard software projects, in the real world, under real networking and systems conditions,” and is intended to be some redesigned distcc offering, just say that. Stop letting engineering managers and VPs think otherwise15.

And if your answers to your users’ complaints are always “Well, Buildbot wasn’t designed to do that,” as Brian Warner reportedly stated, then you, Buildbot community, need to do a better job of clarifying what, exactly, Buildbot is designed to be competent at.

(Bonus points for attempting that without the name calling or wholesale mischaracterizations.)

In either event, as a stressed, time-constrained release engineer, I need a “set it and forget it”-solution17 that gets my builds out the door without requiring me to deal with a lot of weird software dependencies, mysterious, unexplained assumptions, and long-lingering, still-uncorrected, painful design decisions.

And based on Yust’s own statements, one thing we apparently agree on: the system best suited for that purpose is not Buildbot.

_______________
1 I had planned to link to two blog posts, since it was implied another would be written, but I haven’t seen it appear anywhere yet…
2 Googling that phrase is left as an exercise for the reader
3 At least at this point in time
4 And irrelevant to any cogent argument
5 Points go to “exarkun” for the money shot quotation: “The guy is probably incompetent, most guys are.”
6 Save Dustin Mitchell
7 Which point people agreed I had, however, wasn’t consistent, which is fine; I’ll take what I can get…
8 0.7.8, to be exact
9 I’m really conflating runtime evaluation of code and properties; my complaint is that properties shouldn’t be necessary, but are if you want to control certain elements of the build process. This requires adding a step, and often the contents of that step is echoing the output of a command, which is also somewhat asinine. My particular problem was I wanted to set a port value that a tool picks up from the environment; I had code that changed this port based on the type of build and some other attributes, and set it in the environment for the step by providing a hash; unfortunately, the port number never changed, and it took a few hours to figure out that Buildbot doesn’t support this common use case without setting properties; this is a poor, counterintuitive design when you’re allowed to write and integrate your own Python modules into Buildbot.10
10 While I’m correcting errors, I kept referring to “twistd,” which is Twisted’s cute name for its daemon; everywhere I wrote twistd, I was referring to Twisted.
11 Quite painful, from what I hear
12 If you ever wondered why Mozilla Corporation’s Buildbot masters aren’t open to the world, that’s why; incidentally, that was one of the last deployment discussions I was involved in, and at the time, everyone agreed with me.
13 And no Mook, I don’t think the transport layer should be SMTP, though I always did admire the resiliency of that particular Tinderbox data transport design decision; I would be happy with what the rest of the world uses these days: HTTP.
14 Or “the points I made that they discovered they maybe sorta kinda agreed with after getting the personal attacks out of the way and reading what I had written”
15 Assuming, of course, they’re not planning on hiring Buildbot’s project maintainers16
16 Who are apparently already spoken for…
17 Does RonCo do software?

The Incompetent Build Master

11/24/2010

Buildbot is the worst continuous integration system I have ever used.

There, I said it.

Now, I will admit: my sample size isn’t huge, but my trials with the tool have, on many an occasion, had me pining for Tinderbox11, which to those who’ve used it, is probably saying a lot.

I had always assumed my experience with Buildbot was largely a function of the method by which it was crammed down… er, rolled out. Besides, Buildbot seemingly had a large and reportedly happy community of users, a reason constantly parroted as a justification for the ill-considered deployment.

But as I ranted over the years2 about Buildbot’s (many) shortcomings, the echoes I got in return of other developers and users who… well… frankly hated their lives every time they had to interact with Buildbot surprised me.

My complaints fall into three general buckets:

  1. Buildbot’s unfortunate and stifling reliance on twistd; there isn’t a problem Buildbot doesn’t think the twistd hammer is large enough to bludgeon:

    • Each slave must be constantly connected to the master; if for any reason that connection is interrupted IN THE SLIGHTEST, your build will die. It is difficult to put into words how asinine and ill-considered this fundamental design flaw is, and how many hours of time it has wasted for developers, release engineers, and organizations using it.
    • (Due to the above) Buildbot’s bandwidth usage is incredibly high; it may not matter if your Buildbot master and slaves are across a switch, but across the Internet can be more problematic3.
    • Using the same thread for build management and status reporting: if you want to use Buildbot’s status reporting4, it relies on twistd’s anemic web server. If Google decides to index you, this can cause your builds to fail5.
    • Speaking of this “web” “server” and ignoring the lack of-flexibility and -scalability, I really wish I could password protect the administration interface: I know we’re all friends, but spam bots kicking off builds? I fail to understand why any author of a continuous integration system would consider that acceptable.
    • This twistd obsession requires users6 to install a number of libraries7 on all of their build machines. It’s annoying on Linux and Mac. You’ll want to make sure you have a noose handy if you have to install it on that “other” operating system.
  2. Buildbot’s design is conceptually inconsistent and counterintuitive, making developing code to interact with it incredibly tedious and annoying:
    • Buildbot provides sample code that builds (toy) GNU-style ./configure && make && make install8 apps, attempting to illustrate its ease. In reality, Buildbot makes trivial, cookie-cutter projects marginally easy to handle; anything mildly complicated or not considered by its designers borders on frustratingly impossible.
    • Buildbot makes multi-repository checkouts9 a nightmare to manage.
    • Build properties are cumbersome and mostly-useless, but the only way Buildbot allows you to express certain build-time differences; there’s no runtime evaluation to allow dynamic setting of a property, and I have to write scripts that emit the property? Huh?
    • If you integrate your own Python code directly with Buildbot10, buildbot reload won’t pick up code changes in your modules; the only way to do that is a buildbot restart. Oh, you had a four-hour build that was almost done? Too bad.
    • With its ISchedulers and IUpstreamSchedulers and ISourceStamps this code reads more like someone trying to make their program use every example in the venerable design patterns book11 than solving real problems. It wouldn’t be such an annoyance if it didn’t get in the way of doing real work…
    • And what is this obsession with ensuring objects only contain the things Buildbot wants them to contain13? It’s like drawing up this beautiful UML diagram of your design patterns-approved program, and then purposefully disallowing developers to create useful subclasses using it.
    • In what is the worst of Buildbot’s design quirks: Buildbot blurs the lines of “continuous integration system” and “build harness/system” and leads your entire build system happily down a path of lock-in. Buildbot prompts you to put your entire automation configuration and process into its master.cfgs, and express those processes in a meta-language14 that is meaningless outside of Buildbot. It means if you ever want to develop an automation process, you must have available a full master/slave setup15 to do even the tiniest bits of development. Migrating away from this incurs huge costs16 should you make this mistake17.
    • If you think I’m the only one who has problems with Buildbot’s strange and wild inconsistencies, one of Buildbot’s poster-child users, Mozilla Corporation, apparently has these problems too.
  3. Buildbot discourages community participation:
    • Requiring random firewall ports to be opened so a slave can talk to a master is a non-starter in many organizations, both for slave admins and master admins.
    • Adding a slave to provide continuous integration for your random, under-loved platform now requires consent of a “buildbot master administrator.” It requires access to that master. Tinderbox made this trivial. It’s like Buildbot wants to make it as difficult as possible to let community members help a project with platforms that possibly only a minority of contributors care about.
    • Running that random, under-loved integration slave at home? Reference that bit above about all that constant bandwidth! Hope you’re not paying your ISP by the byte…
    • For quite a while, the Buildbot project seemed abandoned18. It now has a project maintainer, but both the original author and the project maintainer work for Mozilla Corporation. If Mozilla Corporation’s handling of XULRunner is any indicator, you’d better want to use Buildbot for exactly the same purposes Mozilla Corporation uses Buildbot for. There’s a large body of evidence to indicate that other uses will be deemphasized and patches for them ignored.
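To make the lock-in complaint above concrete, here is a minimal sketch of the opposite approach: the build process lives in a standalone script that runs on a developer’s machine with no master/slave setup, and any continuous integration system only has to invoke it. Everything here is an assumption for illustration; `DEMO_STEPS`, `run_steps`, and `main` are hypothetical names, and the steps just print a line each so the sketch runs anywhere (a real project would list its own commands, such as its configure and make invocations).

```python
import subprocess
import sys

# Hypothetical build steps; each one is an ordinary command line.
DEMO_STEPS = [
    [sys.executable, "-c", "print('configure')"],
    [sys.executable, "-c", "print('compile')"],
    [sys.executable, "-c", "print('test')"],
]


def run_steps(steps, runner=subprocess.call):
    """Run each step in order; stop at the first nonzero exit code and return it."""
    for step in steps:
        exit_code = runner(step)
        if exit_code != 0:
            return exit_code
    return 0


def main():
    # The CI system's only job is to call this script and read its exit code.
    return run_steps(DEMO_STEPS)


if __name__ == "__main__":
    print("build exit code:", main())
```

A real script would hand the result of `main()` to `sys.exit` so the calling system sees the failure. The CI configuration then shrinks to a single "run the build script" step, which is portable to whatever tool replaces the current one.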

One might ask “Well, ok, so what would you recommend be used?”

I don’t have a clear answer; lots of people swear by Hudson. Tinderbox had its failings, but its successor revisions are, in my opinion, even worth investigating over Buildbot.

I’d welcome others’ suggestions.

But based on lessons paid for in frustration, tears, and wasted time I could’ve been working on something productive for the engineering teams I support, I know very clearly which one I would not use.

_______________
1 Whose reputation as the first real continuous integration tool is largely forgotten
2 Often on Twitter
3 And for what? The eye-candy of a moving build log, which is amusing for all of 30 seconds?
4 Waterfall page, etc.
5 Due to twistd spitting out build logs to the Googlebot instead of reading your build slaves’ sockets
6 Often overworked, under-staffed release engineering teams
7 twistd, twistd-core, zope interface
8 Buildbot thinks every project is so like that, there are even specific buildsteps written JUST. FOR. THAT.
9 Increasingly common with the proliferation of distributed version control
10 One actual benefit, but you’ll soon see why it’s not…
11 Brendan Eich’s oft-repeated dig at Gecko’s original authors of “My First Object-Oriented Rendering Engine12” comes to mind…
12 Page 155, Coders at Work
13 If I see another “Foo.__init__ got unexpected keyword argument(s)” error, I swear I’ll…
14 Which is Python
15 Either locally or in staging, or in production if you’re not so lucky
16 Of various kinds, not just temporal
17 As illustrated by Mozilla Corporation’s switch to Mozharness
18 Near as I can tell, this happened ironically around the time Mozilla Corporation switched their release engineering infrastructure to it.

Grand Decentral Station

11/16/2010

Github apparently had an outage over the weekend.

The Twittersphere was all… well… atwitter1 about it, so much so that you would’ve thought the Internet’s Subversion repository had gone down.

All I could keep coming back to every time I read a tweet complaining about the outage was: I thought the argument the big distributed version control proponents always made such a huge deal out of was the complete lack of a “central server” or “authority”.

“With git and mercurial2, developers are free of such antiquated shackles” is what I’ve been told over and over again.

But when the “hub” of “git” went down, a central code repository turns out to be a necessary link in the chain to getting one’s work done with distributed version control?

Color me confused.

Or maybe not.

In all of the distributed version control deployments I’ve seen, developers still rely on a centralized repository: whether it’s github, mozilla-central3, or Linus’ laptop4, if you don’t intend on making sure that flight has wifi, so people can keep pulling from you, someone other-than-you has to be the authoritative source of your source.

What is especially interesting is these deployments usually take it a step further, with online (centralized) personal repositories for actual sharing and a separate repository that is the “official” (central!) repository-of-record5.

So anyone claiming that decentralized version control totally negates the need for a central hub is trying to sell you something.

And anyone who really believes that illusion should learn the words to the Simon and Garfunkel classic “I Am a Rock” because without anywhere to push, that fancy distributed version control system repository sitting on your machine is just another nameless, impossible-to-find island.

_______________
1 Atweeted?
2 And monotone and bazaar and…
3 They’re not even trying to hide the non-distributedness of the repository!
4 Or wherever you’re supposed to pull from these days to figure out what the heck the authoritative kernel source is…
5 For the people who, y’know, don’t care about every feature, bug, and blue-sky branch of every developer involved in the project

The Cost of Failing to Brief Your Approach

10/28/2010

One of the blogs I follow, the Aviation Mentor, recently wrote a fascinating post about the Oakland VORTAC1.

For the av-geeks in the crowd, I definitely recommend the couple of minutes’ reading.

For those not here for the av-geekery, there was some road construction2 done at the Oakland airport about six years ago. Since that time, certain portions of the navigation signal provided by the VORTAC have been unreliable and/or unusable. The FAA has been trying to fix the problem ever since then, but each “solution” has caused another problem.

For instance, they recently “dopplerized”3 the VORTAC to improve accuracy and try to make the previously-unusable radials usable again. But that caused the navigation beacon to be unusable for high-altitude operations4.

This whole debacle may seem familiar to software engineers: someone or some project spec requires a change in the way the system was designed, and all sorts of unintended consequences, of varying degrees, result.

Sometimes, they’re not all even known when the initial change occurs, since they are either too subtle or disjoint to be immediately noticed, or because they’re caused later by a patch-to-a-patch-to-a-patch solution to solve a problem that was initially created by this forced change.

Configuration management really only exists to analyze and effectively manage change, so I’m not arguing that we should never make any changes to the systems on which we work. But this story is a great illustration of the myriad unintended consequences of failing to adequately study, weigh, and plan that change, and then execute it in a non-haphazard way, so as to minimize those “gotchas!”

The idea that we make the time to perform this analysis has seemingly become unpopular in recent years: it often gets characterized as “being the enemy of the good” or “stop energy.”

Like most carpenters, I’ve never understood why “measure twice, cut once” is demonized so. Maybe it’s because there’s an assumption that we, as software engineers, have an infinite amount of “wood” available.

Back to the Oakland VORTAC, investigations continue, but the unintended consequences of this probably-unnecessary and arguably-illegal initial construction have turned out to be many: various approach and departure procedures are no longer available unless you’re flying GPS-equipped aircraft, which apparently many cargo aircraft are not.

This causes many FedEx and UPS heavy jets to be required to fly over communities during early morning and late night hours, until they are high enough to receive vectors on course from air traffic control, when before they could have flown a defined departure procedure over the Bay.

In some cases, even that won’t help, and the procedure just isn’t available. As the blog post above mentions, this caused him and his student to be stranded, and they weren’t even trying to land in Oakland! But—you guessed it—the approach to that airport was unavailable because the missed approach segment was defined by (drum roll) the Oakland VORTAC.

All of this consternation because someone with a personal profit motive rammed through the addition of a $100 million “feature” that was tacked on without consideration of its impact on users.

In some sense, it should be comforting that it turns out this story is not only age-old, but can be observed in fields other than software development… but it isn’t.

_______________
1 And here, you thought aviation navigational aids were boring!
2 Certainly of an ethically questionable nature; the FBI investigated it to determine whether it was also legally questionable
3 I learned a new term today!
4 Which is important, since OAK defines six Jet airways.

The L Word[s]

10/01/2010

Today marked the end of an air traffic control phraseological era: the clearance instruction “position and hold,” used when controllers want a plane to enter the runway environment but hold for the takeoff clearance, is to be used no longer.

Most people probably wonder what the big deal about a few changed words is.

It’s important because when pilots and controllers talk to each other, they’re often not actually listening to specific words.


Particularly skilled use of holding instructions at KSFO.

When most non-pilots listen to their first snippets of air traffic control, they tell me they have trouble understanding what is being said. Aviation radios use amplitude modulation, so the tonal quality of the human voice is often reduced.

Because of this, pilots and controllers learn early in training to communicate through a known set of words and phrases. Oftentimes, what with the workload of flying the plane and a noisy cockpit, a pilot may only hear part of what was said, but can still decipher what was meant by listening for those key phrases, much like the human brain really only needs the right letters at the beginning and end of words to figure the word out.

That’s why changing phrases is a big deal.

What’s it being replaced with, you ask?

“Line up and wait,” which is the ICAO-standard phrase for that particular clearance instruction.

The reasoning behind changing it was to make the National Airspace System more “compatible” with the rest of the world1, and since being cleared onto an active runway is, y’know, one of the more important instructions, standardization is relevant.

(There was also an argument floated that it was easy to confuse “position and hold” with “position and roll,” but I don’t know that I buy this argument; a clearance of “position and roll” makes little operational sense, either for a pilot or a controller.)

It’s interesting to note that while the phrase is changing, the phrase notifying pilots approaching a runway that traffic is on it remains “traffic holding in position,” not “traffic lined up and waiting,” which seems an odd inconsistency.

It’s also been interesting to watch this change be announced, debated, and implemented. Some controllers have made the argument that the change in policy around the use of “position and hold,” in addition to the phraseology changes, actually reduces safety3, contrary to the main reason cited for the change.

And, of course, phraseology is near and dear to many pilots’ and controllers’ hearts, and some people will always be stubborn4.

No matter which phrase feels more correct6, observing this change’s roll out presents some interesting questions for those who make configuration management7 their business.

There were some real tradeoffs to be made in terms of trying to get all of the users of the National Airspace System to change on a timetable that’s useful.

The consistency benefits of using ICAO-standard phrases must be weighed against the inconsistencies created by the fact that we continue to use older, “hold”-based phrases to describe aircraft in a state of what used to be “position and hold.”

And making these sorts of “human changes” always becomes an exercise in human factors analysis, but of all of the operational phrases to toy with, the one keeping planes off of active runways when they shouldn’t be there is a… pricey one to tinker with.

Despite all of that, until we’ve had a few months of lining up to wait, we’ll all just have to take a position on it, and hold… for now.

_______________
1 And thus more familiar to foreign pilots2 operating in our airspace
2 Many of whom speak English as a second language already
3 The argument mainly centers around the fact that not being able to clear an aircraft onto the runway to hold before takeoff reduces an air traffic controller’s ability to predict the operational context in which they’re working
4 Although the “line up and wait” clearance, given its gravity, is required to be read back, the interaction above may be common until it’s drilled out of pilots5
5 You do NOT want to get into this type of argument with a controller on the frequency…
6 Or seems cooler
7 Which, at its core, is a guiding principle of the National Airspace System as well

[Redacted] Your Firefox

09/21/2010

RockYourFirefox.com recently reviewed TwitterFox er… TwitterNotifier er… Echofon!

It’s a Firefox extension to interface with Twitter, with a very unobtrusive and intuitive UI. That, along with a bunch of other nice features, made it a pretty slick addon.

Until about five weeks ago.

In mid-August, Naan Studio released an update (1.9.6.5) that basically breaks any non-Mac, non-Win32, unofficial build. The supposed reason was support for Twitter’s forced OAuth rollout, but since 1.9.6.4 had OAuth debug statements scrolling across my screen1, there seems to be… a less obvious reason for breaking this.

In any event, I posted a comment to the review a couple of days ago saying:

I echo these comments; Twitterfox/Echofon was a great addon… until Firefox 3.6 came out, and introduced a number of really annoying focus bugs that Naan Studio decided to totally ignore.

Then there was the recent OAuth debacle, which removed all support for not-just-Linux, but any non-Mozilla Corporation build of Firefox, it was the final WTF-straw. (I would really be interested in hearing from Naan why they purposefully broke OAuth for other platforms, since it had been working just fine before version 1.9.6.5.)

Maybe it’s better that they broke their own extension; I never noticed until I started poking around the 43 kb (yes, kilo) licensing PDF in their extension that has some pretty… “interesting,” shall we say, terms…

If you thought it was an open source/community driven extension, well… go read the licensing terms, and think again.

Unfortunately, user comments on RockYourFirefox.com are moderated2, and even though a comment posted a few hours earlier than mine can be seen on the post, mine seems to have gotten lost in the ether.

I’m hoping this was a mere oversight3, and that it’s not the case that only certain viewpoints are now allowed on AMO or RockYourFirefox reviews.

My hope is brightened by the fact that there are a number of other comments, in both the review’s comments and Echofon’s comments on AMO, from [Linux] users who had up to this point enjoyed Echofon.

The only difference I can see between those comments and mine was I did point out Echofon’s licensing restrictions might not be what most Firefox users are used to4.

I really wish Naan Studio would unbreak OAuth for non-Mozilla Corporation-official builds; I’ve been wracking my brain trying to find a decent Linux Twitter client7, and I still haven’t found anything as unobtrusive or functional as Echofon used to be.

As for Rockin’ Out with my Firefox Out, may I suggest to the reviewers that they pre-screen the comments on extensions that Mozilla is willing to put their brand behind promoting.

It just feels wrong to see an extension with so many obviously disgruntled and angry users in recent AMO comment history, who have up to this point been totally ignored, get a nod and recommendation from Mozilla.

_______________
1 And it continues to work fine in my copy of Firefox 3.5
2 They must get a ton of spam!
3 Maybe author Elise Allen is on vacation?
4 Including gems like “You may not: (1) reverse engineer, decompile, disassemble, or create derivative works of the Software,”5 “Use of the Software may involve the transmission of data over the Internet to naan studio, to Twitter, and, as discussed in Section 8 above, to other third-party Services, and you may not be notified in each instance of the transmission of information from your computer,” etc.
5 So if you were planning on fixing OAuth6 in releases > 1.9.6.5 and releasing that yourself, you can screw off…
6 Which is, in part, a binary XPCOM component. Nice.
7 I’ve tried microblog-purple, gwibber8, Yoono, and I tried to get AIR working to use TweetDeck, but that was an even worse joke than disabling OAuth
8 Which looked really cool, but it’s so beta it’s not even funny, and looks more like an experiment in using interesting APIs than writing something functional

“Documentation” Review: Coders at Work

09/16/2010

A couple of months ago now1, I finished Peter Seibel’s Coders at Work.

It’s a compilation of “interviews with some of the top programmers of our time,” including people like Jamie Zawinski, Brendan Eich, and Ken Thompson.

It was an accomplishment for a couple reasons: I am embarrassed to admit I don’t make as much time as I should to read these days, and since the book clocks in at a smidge over 600 pages, it wasn’t a small feat.

That’s a testament to the curation of the interviews, though, I think: it’s a very readable book, and the style with which Seibel interviewed these programmers is conversational and easy to follow, but without being one-dimensional or trite2.

As a programmer3 might expect, Seibel asks a set of standard questions of each programmer, but it’s not constrained at all; each interview is marked by the tangents of its protagonist, but he works in a set of core questions for all, including when and where did they get their start, do they feel more like an artist or a scientist, do they own Knuth’s books, and if so, how do they make use of them4, and how they feel about such things as agile and code ownership within software development shops.

The book also sets in motion an interesting, yet natural progression starting with programmers famous for their (concrete, practical applications of) work on the Web, moving through to computer languages and more formalized areas and ending with Mr. Computer Science himself, Knuth.

A few favorite highlights from the interviews include:

  • Coming to the realization of the myriad complexities5 that Brad Fitzpatrick was dealing with while we LiveJournal users were complaining left and right; I actually had a newfound respect for him after reading his interview7
  • Getting the back story as to why Brendan Eich seems to have such disdain for threads8
  • Ken Thompson (yeah, that guy) can’t check in any code written in C at Google.
  • My favorite answer given was from L. Peter Deutsch:

    Deutsch: [...] But that brings me to the other half, the other reason why I like Python syntax better, which is that Lisp is lexically pretty monotonous.

    Seibel: I think Larry Wall described it as a bowl of oatmeal with fingernail clippings in it.

    Deutsch: Well, my description of Perl is something that looks like it came out of the wrong end of a dog. I think Larry Wall has a lot of nerve talking about language design—Perl is an abomination as a language. But let’s not go there.

  • It was interesting to sort of notice which programmers immediately resonated with me, and which it took a while to understand. I sort of immediately identified with pretty much everything Peter Norvig had to say, but found Joe Armstrong’s explanations and general context for his answers totally confusing.
  • I was a bit disappointed that there’s only one woman—Fran Allen—interviewed in the book, but maybe that’s more a commentary on our industry than anything else9.

If you’re professionally and personally interested in the art and practice of computer science and programming and you’re looking for an approachable set of first-person accounts of the last fifty years of the industry, with a splash of humanity and a dose of trivia mixed in, I highly recommend this read.

_______________
1 Has it really already been that long?
2 It’s particularly a good pre-sleepy time read
3 Or social scientist/psychologist
4 If at all; he offers up “bookshelf decoration” as a valid option to that question
5 Including the reminder that programmers seldom tend to be good business people6
6 This by Fitzpatrick’s own admission
7 The almost-eight years working in the industry, providing a nice contrast to the college student I was back then, have nothing to do with it, I’m sure…
8 Apparently it had to do with his tenure at SGI and weird Heisenbugs caused by issues both in the kernel and on chips
9 Which is still disappointing; her points of view were particularly interesting, especially since she was working in the 60s and 70s

A New Theme for an Old Theme

09/09/2010

Anyone who has struck up a conversation with me about release engineering1 will tell you I make a lot of references and analogies to aviation and the operational characteristics of the national airspace system.

Most people have had the experience of flying2 and I find it an easy, familiar way to explain, using concrete terms, my professional aesthetic of software engineering3.

Reactions run the gamut, from head shaking in bemusement4 to confusion5 to awe6 to disbelief.

That last one—disbelief—is an attitude I’ve never understood.

The counterargument I hear most often is “That’s aviation and flying! This is software. And since we’re not building medical lasers or nuclear power plant controls, no one is gonna die. So don’t waste time and energy thinking about it that way.”

The aspect that those dismissing the analogy out of hand miss is that for a software company, customers are your “passengers”, and software your “plane.”

The release vehicle, meaning the infrastructure, support, and processes by which your customers board that plane (obtain your software) and fly to their destination (use it), is just as important to any software development organization as it is to any airline.

Or… at least it should be.

Discounting a model that prompts a context for thinking of your customers as passengers and your products as aircraft can easily lead to a sense of complacency7, sloppiness, and unnecessary risk taking.

Over time, these invariably lead to bits that were never meant to see the light of day getting into the hands of some customer(s).

In the best case, your teams will have a messy, but fixable, fire-drilled release to get out, one that causes customer confusion8 and lost sleep9 for your engineering and supporting teams.

In the worst case, you get some free advertising in the tech media, but probably not of the kind the marketing department is looking for.

On the other hand, using a model that by its very nature prompts consideration of customers and their (hopeful!) reliance on your products helps to set a tone for guiding discussions of tough tradeoffs, analyzing risk, and responding to quickly evolving solutions in a safe, reasoned manner, with repeatably successful outcomes.

But don’t take my word for it.

Other industries, including healthcare, manufacturing, and first responders, are adapting techniques from an industry that has seen explosive growth in the last 50 years, yet has gotten safer during that time.

Fifty thousand10 planes landing safely every day is a hard statistic to ignore.

***

I’m often asked what the background image on this blog is.

It’s a tweaked image of the terminal IFR chart for San Francisco International. It also has various build logs pasted into it.

Given the above explanation, hopefully it now makes some sense.

When I originally moved my blog, it was my first experience with WordPress and theming and such; my initial attempt… wasn’t great.

I need to give a shout out to Travis Forden, CSS Wizard Extraordinaire, who gutted all of my CSS hackery11 and table-based layouts and replaced them with the cool, sleek design you see.

I guess that’s two new themes for an old theme.

Thanks Travis!

_______________
1 n. masochist.
2 Some, even, with me!
3 A phrase I coined as a young, pre-collegiate whippersnapper, when asked “Why do you like being on the build team?”
4 “What a nerd!”
5 “How is pushing bits to an FTP site like a plane?”
6 “I really never thought of it that way; tell me more!”
7 “We’ve released this software how many times before?”
8 Especially if you did a big press push about how that release was an end-of-life release…
9 Or, if you prefer, totally wasted effort
10 Yes, you read that right.
11 I totally admit: I never really grokked CSS

If You [Can't] Build It, They Will [Still] Come

08/05/2010

Slashdot ran a Your Rights Online post a couple of months ago posing the question “Do Build Environments Give Companies an End Run Around the GPL?”

The upshot: are companies1 that base their device firmwares on Linux breaking the GPL by posting only their source code, but omitting details2 regarding the environment required to build that firmware, much less flash a device with the customized firmware?

It’s certainly an interesting question.

Ask any developer and they probably wouldn’t consider the build environment as at all related to GPL-compliance requirements. That’s likely because the vast majority of open source software builds on any standard GNU/Linux machine3; the context of “GPL-compliance” is version 2 of the GPL, released in 1991, when Linux-using embedded devices weren’t on anyone’s radar.

But as embedded and mobile consumer electronics companies have leveraged the wealth of open source software to bring products to market quickly, this has become a very real issue, and the keepers of the GPL, the Free Software Foundation, have realized that could be a problem.

One of the main issues is “tivoization”, named, obviously enough, after TiVo, which disallowed the execution of firmware containing modifications on its hardware. Such behavior is specifically restricted in version 3 of the GPL.

In the latest version of the GPL, running modified firmware on the device was specifically called out as an allowable use by the end user and/or developer.

This problem doesn’t just affect embedded devices.

A common starting point for an open source hacker poking at these types of products is to try to reproduce what a company ships to its users4. But these days, that may not even be possible in software.

Mozilla Corporation, for example, now produces some of its builds with profile-guided optimization. This requires building Firefox, running it in an instrumented fashion, and using that data to guide the optimization of hot spots.

To provide a good, real world set of runtime data, Mozilla tests with a static page set of about 200 websites on the web. But it can’t release this archived content—now used in the build process—due to copyright restrictions.

So, even though open source developers are building Firefox on Win32, it won’t match what Mozilla Corporation actually ships to millions of users5.
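For the curious, the instrument-train-optimize cycle described above looks roughly like the sketch below. To be clear about the assumptions: it uses GCC’s `-fprofile-generate` and `-fprofile-use` flags purely as an illustration, it is not Mozilla’s actual build configuration, and the file names, the `--load` argument, and the `pageset/` directory are all made up. It only prints the commands; it doesn’t run them.

```python
def pgo_commands(sources, binary, training_args, cc="gcc"):
    """Return the three command lines of a GCC-style PGO build.

    Nothing is executed; the caller decides what to do with the commands.
    """
    # Pass 1: build an instrumented binary that records profile data as it runs.
    instrumented = [cc, "-O2", "-fprofile-generate", "-o", binary, *sources]
    # Pass 2: exercise the instrumented binary with representative input.
    # (Hypothetical arguments; this is where a page set would come in.)
    training_run = ["./" + binary, *training_args]
    # Pass 3: rebuild, letting the recorded profile guide the optimizer.
    optimized = [cc, "-O2", "-fprofile-use", "-o", binary, *sources]
    return [instrumented, training_run, optimized]


# Hypothetical inputs, for illustration only.
for command in pgo_commands(["app.c"], "app", ["--load", "pageset/"]):
    print(" ".join(command))
```

The middle step is the catch: whoever doesn’t have the same training input can follow the same recipe and still end up with a differently optimized binary.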

To be sure, the issue of build systems, specifically, involves a very narrow interpretation of the GPL. Even in the cases the GPLv3 specifically addresses, there isn’t broad agreement among open source developers that such use should be disallowed7.

It may be annoying to open source hackers to be unable to fully experiment with custom firmwares on their shiny new embedded, consumer product whiz-bang device.

It may take more time to reverse engineer how to flash these devices8 to get firmwares on them.

This sort of “freedom” may require a tradeoff in terms of enjoyment or functionality of these devices: as manufacturers move more functionality into service offerings in the cloud, they can refuse access to these custom firmwares10.

But up to now, users have voted with their dollars and download clicks: they don’t care.

Most open source developers apparently don’t either.

_______________
1 who have the resources, but not the resolve
2 Either images or documentation
3 The venerable ./configure && make && make install triad
4 The so-called identity proof
5 To be fair, I cite Firefox as an example of this problem because it’s one I’m familiar with; Mozilla requires probably one of the most complex open source build environments around, and they’ve done a good job of, especially, documenting their build environments to the extent possible6 and making it easier to build on Win32.
6 Something I have no qualms asserting a large chunk of responsibility for having made happen
7 Linus Torvalds, for one, has stated that he doesn’t consider such restrictions on the use of the hardware to be a problem
8 And make no mistake, there are open source hackers out there, that totally get off on that sort of stuff, so they will find a way to do it9
9 Even if it nets them a few bricked devices
10 As Sony has done with its “Other Operating System” feature, and now routinely does with the Playstation Network and PS3 updates

The Widest Common Denominator

07/28/2010

One of the common issues raised on my build system1 code reviews is why I went to the effort to add hooks in a particular place or make certain things variables instead of just hard-coding the values.

“No one will ever change that,” the reviewing developer argues.

The ensuing explanation always reminds me of a story a mentor of mine once told me:

The scene: a meeting between various high-level engineering managers, including a few of the company’s founders2, and of course my mentor, the build/release team lead.

The topic? Whether or not to rewrite the aging build system… and if so, how?

With an increasing list of products, we’d outgrown the build system currently driving the builds3.

In the course of the meeting, various developer pain-points with the current system were vented; efficiency problems4 were detailed; limitations on the release-side, which were starting to affect business decisions, were described.

Eventually, everyone reached the same conclusion: this was one rewrite that really was necessary. Excitement from those who fought daily with the current system permeated the room, like a nerdy kid in a candy store being told “Hey, grab whatever you want and make an über-treat!”

In the way only an engineer noticing their opportunity to really solve a problem they understand5 can, discussion shifted immediately to implementation details: what should the new system be responsible for? What should the interfaces be? What are the high level components? What language should we use? [Should we reinvent the wheel?] Who should write it? How will we deploy it?

Granted, it was totally out of scope for the meeting, but as you might expect, such discussion was of much more interest to the engineers.

One of the issues raised was how one should be able to invoke the shiny, new, to-be-implemented build system. One of the engineers in the room said “Oh, that’s easy; you’ll just run this command; done.”

My mentor raised his hand and said “Well, if we’re going to hook this into the release tools, we’ll need to be able to call it in these couple of ways, since the current system supports that and we make heavy use of it.”

The engineer replied “No, we don’t need to support that; you can just change the interface in the tools; it’s really easy.”

Another engineer blurted out “No, we need that; I do my daily builds by using those targets. Besides, the performance team hooks in by calling that other target, too.”

The two engineers started arguing with each other about whether or not their use cases were valid. The conversation drew the other engineers in the room into the fray.

Like a panel from a fight in one of those old Peanuts comic strips, statements like “You build the product by doing that? Really?!! Why on god’s green earth would you ever do it that way” were heard to be uttered, as some of the smartest engineers in the company argued with each other about how they did their own builds.

My mentor, quiet as a mouse, just sat back in his chair… and smiled.

I can only imagine the smirk on his face was due largely to the fact that the discussion had turned into an argument about individual work-flow preferences. Those are differences in values, for which, try though one might, an engineer won’t come to a “correct” solution.

Because there isn’t one.

This is why so often in build system-related code, one will see myriad ./configure options despite the default being used 98% of the time; why the call to that common utility program can be overridden by the environment; why there are “prologue” and “fin” hooks that can be plugged into.

Good build engineers internalize early on in their career6 the necessity for this sort of code.

It’s not “over-engineering” or “useless complexity.” That target that makes no sense to you? We added that for the developer who’s now the CTO. The environmental override? Yah, that allowed one of my colleagues to get our product building on HP-UX in a couple of days instead of three weeks. Those weird “pre-” and “post-” hooks? The QA team uses them for automation. And the performance testing team. Oh, and us. We use those to do, y’know, releases.

These “structured layers of indirection” address the requirement that a build/release organization support the maximum number of developer and QA use-cases it reasonably can, as often as it is able to do so; they are such a ubiquitous solution, they’re probably better characterized as a “build system design pattern7.”
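For anyone who wants to see the shape of it, here is a minimal sketch of two of those layers, written in Python for brevity: a tool lookup that the environment can override, and a build step with “pre-” and “post-” hooks. It isn’t lifted from any real build system; `resolve_tool`, `BuildStep`, and the hook lists are hypothetical names invented for this illustration.

```python
import os
import shlex


def resolve_tool(name, default, env=None):
    """Return the command for a tool, letting the environment override the default."""
    env = os.environ if env is None else env
    # e.g. CC="gcc -m64" in the environment wins over the built-in default.
    return shlex.split(env.get(name, default))


class BuildStep:
    """A build action wrapped in optional pre- and post-hooks."""

    def __init__(self, name, action):
        self.name = name
        self.action = action
        self.pre_hooks = []   # run, in order, before the action
        self.post_hooks = []  # run, in order, after the action

    def run(self, context):
        for hook in self.pre_hooks:
            hook(context)
        result = self.action(context)
        for hook in self.post_hooks:
            hook(context)
        return result


# Illustrative usage: the default path is untouched unless someone plugs in.
log = []
step = BuildStep("compile", lambda ctx: log.append("compile: " + " ".join(ctx["cc"])))
step.pre_hooks.append(lambda ctx: log.append("pre: QA automation setup"))
step.post_hooks.append(lambda ctx: log.append("post: performance run"))
step.run({"cc": resolve_tool("CC", "cc", env={"CC": "gcc -m64"})})
print("\n".join(log))
```

With no hooks registered and nothing set in the environment, the step behaves exactly like the hard-coded version would have, which is the whole point: the indirection costs the common case nothing and saves the uncommon case weeks.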

To this day, when I find myself reminded of my mentor’s anecdote while explaining this pattern manifested in code, I often long for another engineer to randomly walk by, innocently eavesdropping at just the right point to blurt out to the other developer “Wait, you want to do what with our build system?!”

It really is the most tangible, likely-to-stick explanation available.

And it’s certainly more amusing for me.

_______________
1 Think makefiles and scripts used during the build itself
2 All still hard-core developers within the organization
3 Given that system had been in place since the company was the founders’ “summer project”, no one could claim it didn’t have a good run…
4 When a complete build of the product takes 5 or 6 hours, and is trending in the wrong direction, everyone starts paying attention to this
5 Or so they think…
6 Assuming they make a career of it
7 More on… well… more of these later. Hopefully…
