Twelve is the new ten
One of the important roles of a build team is to help define, create, support, and use an official build environment.
In the case of a company whose product is open source, this can turn out to be a… colorful, shall we say, task, since it often involves community members trying to build on “exotic”1 platforms like AIX and BeOS.
MoCo uses virtualization in the build farm (and on the IT side, too), running operating systems as guests2, the result of a decision that made a lot of sense back when I was trying to get various important build systems-that-had-no-defined-environment off of shaky, failing hardware.3
As machines have gotten faster (and these days, as they have more cores), virtualization provides an easy, reasonably performant way to create, isolate, and distribute official build environments.4
Therefore, it wasn’t a surprise to find VMs in production roles for build-related tasks here at the Nest.
One of the common complaints about virtualization is that a completely unmodified operating system image run under reasonable (or high) load tends to lose time.
Virtualization may seem somewhat magical, but there’s a pretty simple reason for why this happens.5 In the good ol’ days, VMs tended to only lose time, but now with the 2.6 Linux kernel trying to be smart6, VMs can gain time, too.
In our case, a whole lotta time.
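If the mechanism in footnotes 5 and 6 sounds too weird to be true, here’s a toy sketch of it in Python. Fair warning: this is a caricature for intuition’s sake, not Linux’s or VMware’s actual algorithms, and the numbers are invented. It models a guest that keeps time by counting timer ticks, a busy hypervisor that delivers those ticks in catch-up bursts, and a lost-tick heuristic that counts both the gap and the burst:

    import random

    HZ = 100                # ticks per second the guest expects
    expected = HZ * 60      # one minute of wall-clock time, in ticks

    backlog = 0             # ticks the hypervisor owes the guest
    naive = 0               # guest clock: just count ticks on arrival
    compensated = 0         # guest clock with 2.6-style lost-tick recovery

    random.seed(42)
    for _ in range(expected):
        backlog += 1
        if random.random() < 0.5:      # hypervisor gets around to our VM
            burst, backlog = backlog, 0
            naive += burst
            # The heuristic sees a long gap before the burst, assumes those
            # ticks were lost forever, and adds them back -- but the burst
            # then delivers them anyway, so they get counted twice.
            compensated += burst + (burst - 1)

    print("wall-clock ticks:   ", expected)
    print("naive guest clock:  ", naive)        # trails a little
    print("'smart' guest clock:", compensated)  # races ahead

Run it and the “smart” clock lands comfortably ahead of wall-clock time, while the naive one merely lags a little behind it.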
The originally reported symptom of the issue was build deliverables with timestamps far in the future. After reading up on the current fixes to the time-syncing problem (which VMware tends to rev as operating systems change), I tried a number of the standard fixes, all of which had worked for me in the past. None of them worked.
We were still gaining two minutes for every ten minutes of wall clock time. That works out to almost one-and-a-half days per week that the VM leapt into the future.
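(Show your work, right? Twelve minutes of VM time per ten minutes of wall clock is a 20% gain; a week has 10,080 minutes, so the VM jumps ahead by roughly 2,016 minutes a week, which is 33.6 hours, or 1.4 days.)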
Something was obviously, seriously wrong with this machine. I’ve been working with VMs for quite a few years now, and I’d never seen clock skew this bad.
This was a 64-bit VM, and it turns out that the kernel options you pass to fix time skew on 64-bit Linux VMs differ from the options for 32-bit. But even the “correct” options didn’t fix it.
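For the record, I mean boot-time kernel parameters in the guest’s grub.conf, along these lines (the exact flags varied by distro, kernel rev, and whichever revision of VMware’s timekeeping advice you caught; the kernel path and root device here are placeholders):

    # 32-bit guests, one commonly-suggested recipe of the era:
    kernel /vmlinuz-2.6.x ro root=/dev/sda1 clock=pit nosmp noapic nolapic
    # 64-bit guests:
    kernel /vmlinuz-2.6.x ro root=/dev/sda1 notsc divider=10

(divider= was a Red Hat-ism and notsc is x86_64-specific, hence the 32-bit/64-bit divergence.)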
After much Googling and forum trolling, I found a suggestion to turn off the CPU frequency scaling daemon in the guest OS. This, of course, made perfect sense, but the guest VM didn’t even have it running.
Then it hit me: we’re using hosted virtualization. Originally, it didn’t even dawn on me to poke around the host OS’s settings, since MoCo uses the enterprise version of VMware7, and I was used to tackling VM problems from that angle. ESX runs VMware’s own proprietary (vm)kernel8, which I’m sure turns off all the whizbang features on modern CPUs that would confuse the hypervisor.
Five minutes of poking around, and I found the cause of the problem: powernowd. It was running on the host OS and “helpfully” scaling the CPU for us.
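If you want to check whether your own host is quietly doing the same thing to you, the cpufreq corner of sysfs will tell you (paths assume a Linux host with cpufreq support; you’ll need root to change anything):

    # Who's deciding the CPU's clock speed right now?
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    # Pin it, if you'd rather not uninstall the daemon outright:
    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor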
To understand why this totally screwed us, an analogy may be in order: imagine, if you will, that you must drive a car at thirty mph down a street or it’ll blow up. “Easy,” you think to yourself. But imagine that someone else is erratically controlling the gas pedal and you have to manage your speed by applying opposite braking power. This is the original virtualization timing problem.
Now imagine that the car—an eight cylinder gas guzzler—randomly decides how many cylinders to fire at any given time. It might decide to use just two, or it might decide to start using all eight.
Now what do you think your chances of maintaining a steady thirty mph would be?
A quick apt-get remove, and the next day, the builds were not… “from the future,” and the clock was still synced to the host (courtesy of vmware-guestd).
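(For the curious: the knob that makes vmware-guestd do that syncing lives in the VM’s .vmx file; if memory serves, it’s:

    tools.syncTime = "TRUE"

and it only nudges a slow guest clock forward, which is part of why it couldn’t save us from a fast one.)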
Sure, we’re using a few more electrons… but at least their deaths are for a good cause: keeping space-time from ripping apart.
In the virtual machine’s universe, at least.
____________________
1 These days, anyway…
2 [Insert complaint to Steve Jobs about lacking MacOSX Virtualization here.]
3 P2V to the rescue!
4 It also turns out to be cheaper and faster to have a SAN-backed, VMotion-capable build infrastructure than triply-redundant Mac Minis doing the same thing. I have to imagine it’s certainly more reliable, and less of a physical-management headache.
5 One of the core techniques involved in virtualization is playing with clock interrupts; this means you don’t get a constant stream of clock interrupts, but rather (especially under load), you may get one or two interrupts followed by none for an interval, followed by three or four to catch you up. Of course, if you lose these interrupts over time, you’ll lose time.
6 The 2.6 kernels contain a few algorithms that attempt to “recover” lost clock interrupts on physical hardware. This doesn’t work in a VM, but it can cause the kernel to count interrupts that never occurred, so time moves faster.
7 MoCo originally used hosted virtualization—GSX, to be specific, the product on which VMware Server is sorta-kinda based—as well; I whined (sorry Justin) until we tried out the server product.
8 Which, contrary to popular belief, is not a set of patches against RHEL; it merely uses RHEL to boot, and I’ve heard they can boot the vmkernel on its own in the labs in Palo Alto.
s/elections/electrons/
What a Freudian slip…
But hey… maybe we’re using a few more of *those*, too…
Hah, that sounds like it must have been fun to debug :-/
This problem also might have been, in part at least, specific to the CPUs your physical machines run. VMware’s canonical timebase is the CPU’s timestamp counter (TSC). New CPUs tend to run the TSC at a constant rate, since more and more operating systems (very recent Linuxes, Mac OS X) also use the TSC as their primary timebase and expect its rate to remain constant.
For a while, though, CPUs ran the TSC at whatever rate the physical CPU clock ran at. CPU frequency scaling meant scaling the TSC frequency too. Naturally this will really confuse any OS that relies on the TSC as its timebase, and it will do a number on VMware’s timebase.
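(These days, a Linux box will at least tell you which camp a CPU falls into; the fixed-rate behavior shows up as a feature flag:

    grep constant_tsc /proc/cpuinfo

though I wouldn’t swear every vendor reports it faithfully.)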
(If I knew exactly which CPUs used which scaling method, I’d be working on a different team.)