Simply ship. Every time.

Standard Library Rage Of The Now

06/02/2008

Probably one of the most common operations that build/release automation systems do is running other simple commands1, and making sure that they succeeded. Sometimes, you end up wanting to parse the output of the command you ran in your native language. Maybe you want to log it to a file, too.
At first, you might think this is an easy problem: most scripting languages give you a multitude of ways to accomplish this seemingly trivial and common operation.
Except… they mostly leave something out. A lot of the most commonly-used methods for this—backticks and system()—really mess it up, save for the trivialist, “No I don’t actually care if you ran the command I told you to run”-cases.
It turns out that if you want to call lots of commands in robust ways, it’s a somewhat complicated process.2
I’m generally defining “robust” as:

  • Handles command arguments safely and securely
  • You can pass it a timeout value, and it’ll do the right thing
  • If some external force kills your subprocess, it’ll do the right thing, and communicate as much information as possible about what happened
  • You can easily get access to standard out and standard error (SEPARATELY3), and manipulate them to your liking
  • Has other niceties, like letting you know how long the program took to execute and changing the working directory
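
As a rough illustration of that checklist (a sketch only, not the Perl code discussed in this post): modern Python 3's subprocess.run covers most of these points. The names run_command and CommandResult are made up for this example, and it assumes Python 3.7 or later, which postdates this post.

```python
import signal
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CommandResult:
    """Hypothetical result record; not part of any standard library."""
    returncode: Optional[int]  # None when the command timed out
    stdout: str
    stderr: str
    elapsed: float             # wall-clock seconds
    timed_out: bool
    killed_by: Optional[str]   # signal name if a signal ended the child (POSIX)


def _text(data) -> str:
    """Normalize captured output, which may be None, bytes, or str."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def run_command(argv: Sequence[str], timeout: Optional[float] = None,
                cwd: Optional[str] = None) -> CommandResult:
    """Run argv without a shell, capturing stdout and stderr separately."""
    start = time.monotonic()
    try:
        # A list argv with the default shell=False means no shell ever parses
        # the arguments, so quoting and injection problems do not arise.
        proc = subprocess.run(list(argv), cwd=cwd, timeout=timeout,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              text=True)
    except subprocess.TimeoutExpired as exc:
        # run() kills the child and collects its partial output before raising.
        return CommandResult(None, _text(exc.stdout), _text(exc.stderr),
                             time.monotonic() - start, True, None)
    killed_by = None
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means signal N ended the child.
        try:
            killed_by = signal.Signals(-proc.returncode).name
        except ValueError:
            killed_by = str(-proc.returncode)
    return CommandResult(proc.returncode, proc.stdout, proc.stderr,
                         time.monotonic() - start, False, killed_by)


if __name__ == "__main__":
    result = run_command([sys.executable, "-c", "print('hello')"], timeout=10)
    print(result.returncode, repr(result.stdout), f"{result.elapsed:.2f}s")
```

One caveat with this approach: the timeout only kills the direct child, so anything the child itself spawned may linger.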

None of Perl’s4 built-in constructs do this for you, which is why there’s a beautiful5 263 lines of Perl I wrote almost two years ago to do this correctly. And it took about nine months to get all the corner cases correct and all of the bugs ironed out.6
Here at the Nest, we mostly use Python for build stuff, so when the need for “running a subprocess and collecting its output” came up, as it invariably does, I thought to myself “Well, with the Python book coming in at almost 1,600 pages, there’s gotta be something that wraps all the ugly, operating system-level (and -specific!) functions to do this!”
It turns out there is: the subprocess module!
At first, I was ecstatic, but as I played around with it more, it turned out that it’s completely and utterly useless for arbitrary data output sizes.
And I don’t mean “arbitrary” in the “gigabytes of output”-sense, but “arbitrary” in the “whatever the size of your kernel’s pipe() buffer”-sense.
Oh, and no… there isn’t a desire to make the library actually… y’know… useful.
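
For anyone who hasn't been bitten by it: the usual way the pipe buffer hurts is a child that writes more than the buffer holds to one pipe while the parent is blocked reading the other (or sitting in wait()). Whether that is the exact failure complained about here is an assumption. The documented way around that particular deadlock is Popen.communicate(), which drains both pipes concurrently at the price of holding all the output in memory. The child script and the capture_both name below are invented for the example.

```python
import subprocess
import sys

# Child that writes 1 MiB to each stream, far more than a typical pipe buffer.
CHILD = (
    "import sys\n"
    "sys.stdout.write('o' * (1 << 20))\n"
    "sys.stderr.write('e' * (1 << 20))\n"
)


def capture_both(argv):
    """Collect stdout and stderr separately without stalling on a full pipe."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() services both pipes until EOF, then reaps the child.
    # Calling proc.wait() first, or reading one pipe to EOF before the other,
    # is the pattern that can hang once the unread pipe fills up.
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    rc, out, err = capture_both([sys.executable, "-c", CHILD])
    print(rc, len(out), len(err))
```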

Sigh.
Oh Lazy Web7 please tell me I’m missing something very obvious, and that there is a Python module out there for me.
I have high hopes that this is, indeed, 2008, and I don’t have to implement this myself.
Again.8
‘Cause I don’t wanna.
I really don’t wanna.
________________________________
1 Like “make” and “cp”
2 No pun inten… oh, sure, whatever. Sure. Pun intended.
3 Or not.
4 Let us say…
5 If I do say so myself…
6 Well… not exactly, ’cause I just found a minor bug while perusing it now…
7 Dear Lazy Web: I cannot link to you because you are not defined on Wikipedia or Urban Dictionary. Plz to be fixing so I can has. <3 preed
8 For the fourth time in the third language…

It’s all about the Chases

05/08/2008

Maybe I only notice these things because Gerv has so religiously run the “n00,000 Bugzilla bug”-contests…
Or maybe it was randomly running across Wil’s post about all of the web team’s commits the other day…
Or maybe… it’s just because I’m a dork about these sorts of things…
… buuut I was amused to realize the other day that here at the nest, we’re due to hit the 10k mark on both bugs (I just filed bug 9087) and commits (changeset 9200 just went in).
I’m betting we’ll probably hit 10k on both before 0.6 ships… although, I find myself wondering who will make it across the milestone first. Even though Bugzilla is behind at this point, I’m betting it’ll catch up to commits before we ship.
Any bets on when we’ll hit 20k?
(I was going to ask about the next-order-of-magnitude milestone, but… a Wilson party is probably a couple years away… at least.)

The Changing of the Guard

04/29/2008

Almost four weeks ago now1, dbaron took some time and separated out the various guts of the old Mozilla tools module, and divvied the contents of the directory2 into two new Mozilla modules: a Build & Release Tools module and a Code Analysis and Debugging Tools module.
I wanted to call this change out, because I think it’s a step in the right direction, for a lot of reasons; both of these new modules have seen a lot more activity in the last 2-3 years and I think it really helps to separate them out from a code management standpoint, but also from the standpoint of the way we think about these sets of tools: the guts of any organization’s build/release infrastructure and analysis/debugging tools should be first-class citizens, and this change helps to frame the way those parts of the code are thought about.
So, thank you, dbaron; it’s been a long time coming, and I appreciate you taking the time to sort that goop of code all out into [more] logical units.

***

A couple of days after the announcement, I brought up the fact that with Rob Helmer’s announcement, the Build & Release module would not have anyone at Mozilla Corporation who could offer reviews for the module, given that rhelmer, Chase and myself were listed as the owners, and there weren’t any peers yet.
This obviously makes little-to-no sense.
After a lively discussion about how to best remedy that, I also wanted to note that long-time Mozilla community member Nick Thomas3 is the new module owner for the Build/Release Tools module, with Chase, Rob Helmer, and myself staying around as peers.
Coop was also added as a module peer, whose omission was, let’s face it, somewhat ludicrous4. I’m happy to see that got corrected as well.
Congratulations, Nick; you follow a long line of Mozilla Build Engineers, and I know you’re going to [continue to] hit it out of the ballpark.
Let me know where to send the bottle of scotch.
_____________________
1 Wow… on my birthday, no less! What a present! :-)
2 which had mostly become an “island of misfit code”
3 also known as the “Build Engineer Formerly Known as CF”
4 No one escapes Mozilla Build/Release, Coop. No one. ;-)

Once More Unto the Bleat, Dear Friends

04/15/2008

I think it is pretty well known that I am a sheep, and have a natural tendency to follow the crowd.
At home:

[preed@underworld ~]$ uname -a && history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}'|sort -rn|head
Linux underworld 2.6.19.7 #14 SMP Mon Nov 26 12:18:50 PST 2007 i686 Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux
3870 mtail1
1110 mutt
965 j2
717 imtail3
472 ps
374 fg
313 mocp
142 screen
139 cd
112 mail4

For comparison’s sake, at work:

preed@preed-desktop:~$ uname -a && history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}'|sort -rn|head
Linux preed-desktop 2.6.22-14-generic #1 SMP Tue Feb 12 02:46:46 UTC 2008 x86_64 GNU/Linux
1043 svn
861 dir6
836 cd
267 more7
229 ssh
221 ls
200 vi
136 grep
134 fg
119 ps
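
For anyone who doesn't read awk, the tally in that one-liner can be unpacked as a short Python sketch. The function name top_commands is invented here, and the script assumes bash-style history output (an index, then the command) arriving on standard input. Unlike the awk version, it skips lines with no command field, and ties may come out in a different order than sort -rn gives.

```python
import sys
from collections import Counter


def top_commands(history_lines, n=10):
    """Count the second whitespace-separated field of each line; return the top n."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:          # field 1 is the history index
            counts[fields[1]] += 1    # field 2 is the command name
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical usage from an interactive shell: history | python3 top_commands.py
    for command, count in top_commands(sys.stdin):
        print(count, command)
```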

It would seem I pretty much am only using my workstation for reading email and listening to music. At least my work history has some semblance of related productivity tools.
Apparently, I also use a ton of bash aliases.
Huh.
I think I tend to run “mtail” and “ps” as nervous tics when I’m thinking about something…
__________________________
1 mtail is a bash alias to tail -n 12 ~/.fetchmail.log ~/.procmail.log
2 j is a bash alias to jobs
3 imtail is a bash alias to parse-timpslog -d ~/timpslog -a, which gives me a log from the AIM proxy I use
4 Not what you think; mail is aliased to mtail1,5
5 First recursive footnote! Booyah!!
6 dir is a bash alias to ls -laih --color; yah, yah… I know.
7 more is a bash alias to less -X -I

A First Timer, All Over Again

03/31/2008

[Editor's note: this was written on Thursday evening, but I spent my weekend moving1; by the time I got Internet Tubes installed at the new place, the weekend was over.]
Astute readers may have noticed we released Songbird 0.5 last night. It’s got some pretty cool fu in it, too! Definitely worth playing with.
A lot of people worked really hard on this release, putting in plenty long hours and a couple of long weekends.
The first release any release engineer does in a new environment is noteworthy. It usually feels scarier than it really is, and you end up taking a lot of notes, not only for the “gotchas,” but for the “Oh man, we gotta fix that”-s.
It’s weird that I vividly remember the first product I shipped2 at each place I’ve been a build engineer.
At VMware, it was a pretty disconnected, formal process involving creating tarballs of specific formats for another group3 to get them ready for the website. There was some weird rule about only being able to do so many releases at certain points in time, because the website could only handle so much traffic. There was also an ISO image that got created for those Enterprising individuals who only took their software in bits smushed onto plastic-form.
It was all very interesting, but being an enterprise product, the team seldom got any feedback, good or bad, until a couple of months later when “enough early adopters” had installed the new software to get an idea of what was and wasn’t broken. In some sense, it was4 something like sending your kid off to college… knowing you’d see them again for Christmas, but that’s about it.
Contrast that with my first Mozilla release, where the Community was abuzz with news of the release5, and the second I posted the bits, people started downloading them. I vividly remember sitting at my desk, ready to press Enter on the rsync command that would dump updates out to millions of users.
Chase was standing behind me. “Looks good,” he said, and patted me on the back. I rolled my chair back from my keyboard, and stuck my finger out to press the key. As my finger got close to the keyboard, I cringed, not unlike someone detonating a suitcase bomb, and not knowing if there were firecrackers in it, or C4.
When it was all said and done, it didn’t matter which it was; I had done it right6. Chase chuckled as he went back to sit at his desk. I think he knew it at the time, but pressing “return” got progressively easier with each subsequent release, even though it was updating more and more people.
At Songbird, the experience was much the same, and given that the technologies involved are similar, it was mostly an exercise in learning where Songbird’s engineers had moved the mines in the field that is Mozilla’s Release Process.7
Songbird’s requirements, of course, are very different than Firefox and Thunderbird’s, and it’s been very interesting to see how the same problem was addressed with different contextual constraints.
The handoff mechanics are similar to Firefox: updates get verified in a staging environment, installers then get posted, and then updates get pushed to production8. I had to learn to shut up when Redfive was explaining something to me, since lots of things really are the same in the Birdnest, but lots of things are not, and the reasons behind those deltas are really important. I didn’t even notice that I was doing the same bomb-in-suitcase maneuver until Redfive asked me about it as I was getting ready to run the push_production_all script.
Speaking of history repeating itself, we had a minor problem with AUS updates that turned out to be due to a typo on my part9. In a weird way, I felt right at home again. (It probably had something to do with having to read updater’s source code.)
I must admit, the one thing I love about release processes like Songbird’s and Mozilla’s is the fact that you run a command, and Stuff Happens ™.
Immediately.
People get the software, and they’re using it, and they want to tell you about that experience immediately. It’s much more like sending a kid off to kindergarten: you know you’ll get to hear about their day shortly, and it’s almost always either a really good day, or a really, really bad day10. Either way, it keeps you engaged, and you can work on helping make the next time around better.
Songbird does have one thing up on Mozilla’s release process: right after we shipped and updates started permeating the ‘Net, we all gathered around to savor a bottle of scotch.
Any release checklist that has that codified on it11 is definitely a place I can get on board with.
__________________________
1 More on that shortly…
2 Which is, of course, overloaded in our industry, but doubly so in ReleaseLand, since “shipping” usually involves being the person to see the product out the door
3 IT, as I remember
4 I can only imagine…
5 As they always are
6 Thanks to QA, as always, for helping me make sure I did get it right
7 Which, having implemented portions of that process, in policy and code, I say with an unwavering amount of love
8 All while other teams are busily working on things like release notes and website updates that I’m, frankly, quite happy not to have to work on
9 Yah, we’ve NEVER HEARD OF THAT BEFORE.
10 Don’t read too much into that analogy, though.
11 The closer the drinking is to the end of the checklist, of course, the better; this metric actually turns out to be a pretty good measure of the quality of an organization’s release process…

Head in the Clouds, Juliet

03/23/2008

It occurred to me this weekend that I haven’t written about going aloft in quite some time1, which is especially egregious given that I’m a birder these days.


Multi-hundred dollar bread-bowl of clam chowder.
And Mexican Coke.

We happened to get Good Friday off2, so I decided to reserve a plane for ten hours to go… somewhere, on the off chance that the weather would be good.
The weather turned out to be beautiful!
I didn’t have the time to really do any flight planning, so I ended up asking a close friend to head down to San Luis Obispo, the town of my alma mater, for a $600 bowl of clam chowder which, since I’ve flown the route more-fingers-than-I-have times, requires all of five minutes of flight planning3.
We ended up going down the coastal route instead of the inner-Salinas Valley route4, which provided great beach views for almost ninety minutes. Plus, I was able to enjoy the fifty-plus miles of visibility on the flight back by having my friend fly us home5.
My friend saw fit to capture the flight with his new camera, documenting both his expert pre-flight suggestion and a secret routine all pilots conduct before every flight, but that is an entirely unmentionable ritual of the industry.
The entire Flicker6 set is here.
______________________________
1 a couple of posts excluded from the count
2 which no one else in the known Government Holiday-world ™ seems to get off, but on the other hand, it’s smack dab in the middle of MLK-day and Memorial Day, so it turns out to be a better distribution of holiday-ness.
3 I mean, what’s easier than KNUQ VPLEX SNS SARDO—you want to be sure to avoid restricted-2504—PRB direct?!!
4 KNUQ VPLEX KMRY BSR MQO direct, roughly…
5 I was somewhat mean; I turned the GPS off, and said “if we get to the Oregon or Nevada borders, I’ll let ya know we’ve gone too far.”
6 Non-web two-point-oh spelling

What does God need with a starship?

03/03/2008

There is an interesting discussion1 going on right now in mozilla.dev.platform which asks a seemingly simple (but, if the number of responses is any indication, actually difficult) question: “Is ‘Gecko’ 1.9 only Firefox?”
It’s definitely required reading if you hack on anything built on the Mozilla platform that isn’t Firefox.
To be honest, I find it a little surprising that others find it surprising that the answer is anything but “Yes.”
There are a lot of principles discussed in the newsgroup posting, but in practice, ever since I was involved with Firefox releases, it is Firefox—not Thunderbird, not SeaMonkey, not Calendar, and not (arguably most appropriately) XULRunner— that has driven Gecko releases. The best determining factor of Gecko releases, i.e. when the Gecko platform version is bumped, was originally standardized as part of the Firefox release process, and is now codified by the release automation.2
I fully admit that I am interpreting the question from the (possibly too narrow) standpoint that I’m mostly familiar with: the Firefox release process. It is entirely possible that the triage process is different.3 But it has the benefit of being entirely devoid of a process4 with lots of human input and interaction which necessarily implies non-deterministic outcomes.
But the answer to “When is Gecko released?” is pretty binary. Since there are no separate XULRunner releases, just look at when the Gecko versions are bumped: for quite some time, it’s been at Firefox release time.
Immaterial of how we want to do things or would like to do them in a utopian world, that’s what the code in CVS does today, right now. It would seem the requirements and triage process mimic the code, namely in that Firefox drives the process.
Almost ten months ago now, it was stated pretty clearly that “XULRunner improvements should benefit a range of other applications but the focus of Mozilla employees will be on browsing.” There was a lot of discussion about what the material impact of the policy statement would be, including my own two cents on the matter.
I may be misunderstanding the posit of the original question posed in the newsgroup, but it would seem that this is a concrete, identifiable example of a question that directly arises from a policy of having a singular product and its release schedule drive the development of the platform.
If this isn’t the outcome developers on Mozilla’s non-Firefox projects expected, then it may be time to ask “where was the disconnect between what was said and what was heard?”
For me, the discussion raises some meta-questions that, to my mind, are not only more important than “Is ‘Gecko’ 1.9 only Firefox?” but whose answers also turn out to be more relevant than just what happens in the next N months, as lots of people work really hard to get an awesome Firefox 3.0 out the door:

  • Is there going to be a separate XULRunner 1.9.0 release before or during5 the Firefox 3.0 release time frame? If so, how are triage decisions for that release (being) made? Does the process include stakeholders for projects that aren’t-Firefox?
  • (I admit that this may be a totally stupid question on my part, but it’s totally unclear to me:) who is the final determiner of the answer to the above question? There’s a lot of really good discussion going on in the newsgroups, but who makes the final call?
    Everyone tells me drivers@ and staff@ are “gone,” and have been for quite some time. Does this mean the Firefox 3 release drivers6 will decide what a “Gecko 1.9” is? The Steering Committee? A vote of everyone who’s posted more than five times in mozilla.dev.planning?
  • If that answer is the Firefox 3 release drivers or the Steering Committee, as an open source project, is it relevant to ask if any of the lessons learned in previous days—when the Mozilla Project was largely dependent on a single company, i.e. AOL/Netscape, for engineering work—are applicable?7

Distinct from any discussion of the questions above, I stand ready, as I did ten months ago, to help in any way that I can. Like most people, I use and adore Firefox and agree with the argument that it’s Mozilla Corporation’s best lever to move the (Open) Web forward.
But I also use and love Thunderbird. I have used9, and relied upon, Calendar. And I believe that the XULRunner platform is the best lever the Mozilla Project has to future-proof against whatever develops on the Internet10, as well as provide a compelling answer to proprietary rich Internet application platforms from you-know-who.
Probably the most useful answer is the one to the question: tell me, an engineer with admittedly limited experience with the Mozilla build system, Tinderbox, and the Mozilla release process, how I can help the answer to the original question be a demonstrable, unequivocal “No.”
_______________________
1 Which was brought to my attention by a Rumbling Thunder post
2 rhelmer implemented the first version; I added the respin logic
3 Although, blocking1.9-flag triage was what prompted the question originally, so who knows
4 Which Brendan aptly describes as “fluid”
5 “after Firefox” is a possibility too, but arguably of limited utility
6 Is this a complete/up-to-date list? A couple of blocking1.9 flags have been set by people absent from the list.
7 The question is posed merely by the fact that those two groups are made up of entirely8 Mozilla Corporation employees. I’m not implying any judgment regarding people who work on those groups; I’m merely pattern matching their relational structure to the Project from previous eras.
8 or nearly entirely; the FFx3 drivers list has one non-MoCo-er on it, that I can find anyway
9 and with the number of meetings I’m now responsible for, need to set up again
10 Who knows what we’ll all be using in ten years. Gopher could make a comeback!

Twelve is the new ten

02/21/2008

One of the important roles of a build team is to help define, create, support, and use an official build environment.
In the case of a company working with a product that is open sourced, this can turn out to be a… colorful, shall we say, task, since it can often involve community members trying to build on “exotic”1 platforms like AIX and BeOS.
MoCo uses virtualization in the build farm (and on the IT side, too) for operating systems that work as guests2, the result of a decision that made a lot of sense back when I was trying to get various important build systems-that-had-no-defined-environment off of shaky, failing hardware.3
As machines have gotten faster (and these days, as they have more cores), virtualization provides an easy, reasonably performant way to create, isolate, and distribute official build environments.4
Therefore, it wasn’t a surprise to find VMs in production roles for build-related tasks here at the Nest.
One of the common complaints about virtualization is that a completely unmodified operating system image run under reasonable (or high) workloads tends to lose time.
Virtualization may seem somewhat magical, but there’s a pretty simple reason for why this happens.5 In the good ol’ days, VMs tended to only lose time, but now with the 2.6 Linux kernel trying to be smart6, VMs can gain time, too.
In our case, a whole lotta time.
The originally reported symptom of the issue was build deliverables with timestamps far in the future. After reading up on the current fixes to the time-syncing problem (which VMware tends to rev as operating systems change), I tried a number of the standard fixes, all of which had worked for me in the past. None of them worked.
We were still gaining two minutes for every ten minutes of wall clock time. That works out to almost one-and-a-half days per week the VM leapt into the future.
Something was obviously, seriously wrong with this machine. I’ve been working with VMs for quite a few years now, and I’d never seen clock skew this bad.
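
To put a number on that skew: two minutes gained per ten minutes of real time is a 20% gain, so a week of wall-clock time accumulates 0.2 × 7 = 1.4 days of drift. A small sketch of the arithmetic follows; the function name is made up for illustration, and exact fractions are used so no floating-point fuzz creeps in.

```python
from fractions import Fraction


def drift_days_per_week(gained_minutes: int, per_wall_minutes: int) -> Fraction:
    """Days of clock gain accumulated over one week of wall-clock time."""
    rate = Fraction(gained_minutes, per_wall_minutes)  # gain per unit of real time
    return rate * 7                                    # seven real days in a week


if __name__ == "__main__":
    days = drift_days_per_week(2, 10)
    print(float(days), "days per week, or", float(days * 24), "hours")
```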
This was a 64-bit VM, and it turns out that the kernel options you pass to fix time skew on 64-bit Linux VMs differs from the options for 32-bit. But even the “correct” options didn’t fix it.
After much Googling and forum trolling, someone suggested turning off the CPU frequency scaling daemon in the guest OS. This, of course, makes perfect sense, but the guest VM didn’t have it on.
Then it hit me: we’re using hosted virtualization. Originally, it didn’t even dawn on me to poke around the host OS’s settings, since MoCo uses the enterprise version of VMware7, and I was used to tackling VM problems from that angle. ESX runs VMware’s own proprietary (vm)kernel8, which I’m sure turns all the whizbang features on modern CPUs that would confuse the hypervisor off.
Five minutes of poking around, and I found the cause of the problem: powernowd. It was running on the host OS and “helpfully” scaling the CPU for us.
To understand why this totally screwed us, an analogy may be in order: imagine, if you will, that you must drive a car at thirty mph down a street or it’ll blow up. “Easy,” you think to yourself. But imagine that someone else is erratically controlling the gas pedal and you have to manage your speed by applying opposite braking power. This is the original virtualization timing problem.
Now imagine that the car—an eight cylinder gas guzzler—randomly decides how many cylinders to fire at any given time. It might decide to use just two, or it might decide to start using all eight.
Now what do you think your chances of maintaining a steady thirty mph would be?
A quick apt-get remove, and the next day, the builds were not… “from the future,” and the clock was still synced to the host (courtesy of vmware-guestd).
Sure, we’re using a few more electrons… but at least their deaths are for a good cause: keeping space-time from ripping apart.
In the virtual machine’s universe, at least.
____________________
1 These days, anyway…
2 [Insert complaint to Steve Jobs about lacking MacOSX Virtualization here.]
3 P2V to the rescue!
4 It also turns out to be cheaper and faster to have a SAN-backed, VMotion-capable build infrastructure than triply-redundant Mac Minis doing the same thing. I have to imagine it’s certainly more reliable, and less of a physical-management headache.
5 One of the core techniques involved in virtualization is playing with clock interrupts; this means you don’t get a constant stream of clock interrupts, but rather (especially under load), you may get one or two interrupts followed by none for an interval, followed by 3 or 4 to catch you up. Of course, if you lose these interrupts over time, you’ll lose time.
6 The 2.6 kernels contain a few algorithms to attempt to “recover” lost clock interrupts on physical hardware. This doesn’t work in a VM, but it can cause the kernel to count interrupts that never existed; thus, time moves quicker.
7 MoCo originally used hosted virtualization—GSX, to be specific, the product on which VMware Server is sorta-kinda based—as well; I whined (sorry Justin) until we tried out the server product.
8 Which, popular to contrary belief, is not a set of patches against RHEL; it merely uses RHEL to boot, and I’ve heard they can boot the vmkernel on its own in the labs in Palo Alto.

“Norcal Approach, Songbird 34, six for five, two-fifty to join the localizer.”

02/11/2008

Today was my first day at Songbird1 as the new Manager of the Build/Release Team.


San Francisco International Airport’s
Runway two-eight-right Category III
ILS Approach Procedure

Contrary to (popular?) irc.m.o legend, I didn’t die or get run over by a compiler.
I was, however, spending some quality (and deserved, I think) time on my couch with various DVD box sets and, post-Christmas, my new PS3.2
I must admit, even though I spent most of today setting up my workstation and email and wikis and all that (important) stuff, I’m worried I may have looked like I was a little bit out of it: I was so excited about today, I only got about ninety minutes of actual, useful, real sleep before I had to get up at 6 am.
(Oh, and now I’m finding out all the various ins and outs of CalTrain schedules. I think I need some more practice, as I totally got on the wrong train going home today. But hey, I say we all need a tour of the Silicon Valley corridor from time to time to keep our appreciation fresh.)
I’m really looking forward to the new opportunities and challenges working in an environment like Songbird presents, but I think I’m most stoked about:

  • Working on a project and product related to Mozilla and Mozilla’s myriad technologies.
  • Once again being able to work on an open source project. I’ve always believed it’s a privilege to be able to make a living working on open source. Plus, I think as software engineers and hackers, working in that context keeps us honest about our code… or at the very least humble. ;-)
  • Working on a project whose focus is on open media on the web, which I admit I’m just beginning to really feel like I grok. As an admitted die-hard Linux user who now-has-a-commute and is getting a shiny-new-iPod-for-said-commute, having an open application to get various media from various sources onto that iPod, on an open operating system, is… really important to me.

Admittedly, this is just my sense from a single day, one with way-not-enough-sleep. :-)
Anyway, I’ll be hanging out on IRC—even moreso now3—so if you need to find me… just holler, and you’ll get me.4
Still.
______________________
1 Or Pioneers of the Inevitable—a name my mother absolutely loved when she heard it—if you prefer…
2 Xbox-ers and Wii-ers: no making fun. I actually like it! And now that “Bluray has won,” I’m even more pleased with it.
3 Which admittedly wasn’t a ton the last couple months ’cause… well… come on… West Wing. And Arrested Development!
4 Or… Reedone of us…
