Old Checksums Collide Hard
Our industry still insists on using MD5 checksums. And I don’t know why.
Since 2004, it’s been widely reported that “the security of the MD5 hash function is severely compromised.” When talking about collision attacks in hash functions, the answer to whether or not they’re broken—and thus useless for the sole purpose for which we employ them—is not unlike answering whether or not one is pregnant.
And yet, I still see MD5 checksums floating around all over the place, providing a false sense of security.
It’s interesting to speculate on why this might be.
Obviously, there’s the inertia argument: release infrastructure is one of the least exciting places for the business to invest in, so if you’re publishing checksums at all[1], changing the hash functions—“What do those even do? And does any customer use them?”—probably isn’t at the top of management’s list.
(It doesn’t help that OS vendors don’t make this easier either: OS X Mavericks still ships a command-line md5 tool, but nothing to easily calculate SHA1[2]. And you wanted to easily checksum a file on Windows? Good luck with that.)
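For what it’s worth, if there’s any scripting runtime on the machine, you can roll your own checksum tool. Here’s a minimal sketch in Python 3 using the standard hashlib module; the file name in the usage comment is just a placeholder:

```python
import hashlib
import sys

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Stream a file through the named hash and return its hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    # Usage (placeholder file name): python checksum.py some-release.tar.gz sha1
    path = sys.argv[1]
    algo = sys.argv[2] if len(sys.argv) > 2 else "sha256"
    print("%s  %s" % (algo, file_digest(path, algo)))
```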
Perhaps it has to do with the fact that (oddly?) most people just don’t seem to care to validate that the bits running on their machines are what the software company intended to ship? I guess they assume that nobody is tampering with their Internet traffic? It’s obviously a point of consternation when their machine goes rogue on them, but… that never actually happens, right?
Anyway, ever since 2006, I’ve been suggesting to my clients that they use SHA1 checksums, and forgo even publishing MD5. But that advice may not be correct anymore.
At the recent Chaos Computer Club conference in Germany, researcher Rüdiger Weis presented evidence (English translation) that SHA1 should be taken as effectively broken too.
The news is particularly interesting in light of the recent revelations regarding the NSA: part of his justification for treating SHA1 as compromised is that it belongs to the same family as SHA0[3], which the NSA developed without publishing a technical specification; SHA0, too, is considered broken. He also notes that the modification that turned SHA0 into SHA1 was made by the NSA without any technical explanation.
(Weis puts it more colorfully: “I would personally rather put a wager on Britney Spears’ virginity than on the security of SHA-1.”[4])
His recommendation for a hash function? SHA3[5]. Unfortunately, it’s less clear what we should actually do.
Security guru Bruce Schneier says SHA3 isn’t necessary, and that SHA512, a member of the SHA2 family, was as of 2012 “still looking good”[6]. It doesn’t help that even if we wanted to use SHA3, I couldn’t find a Linux utility implementing it; sha512sum, meanwhile, is implemented in coreutils 8.x.
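(As an aside, and as an illustration rather than a recommendation: Python’s hashlib already exposes SHA-512, and newer versions, 3.6 and up, include the SHA-3 variants as well, so a missing system utility isn’t necessarily a blocker.)

```python
import hashlib

payload = b"hello, world"  # stand-in data; in practice you'd stream a file

print(hashlib.sha512(payload).hexdigest())   # SHA-2 family, 512-bit digest
# SHA-3 support arrived in hashlib with Python 3.6:
if hasattr(hashlib, "sha3_512"):
    print(hashlib.sha3_512(payload).hexdigest())
```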
Maybe we’ll all end up moving toward what Gentoo does, and brute-forcing it by validating multiple checksums from multiple families; portage currently checks sha256, sha512, Whirlpool, and the file size.
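To give a flavor of that belt-and-suspenders approach, here’s a rough sketch of checking a file against several digests plus its size. Whirlpool isn’t guaranteed to be available in Python’s standard hashlib, so this sticks to the SHA-2 family, and the expected values are obviously placeholders:

```python
import hashlib
import os

def verify(path, expected_size, expected_digests, chunk_size=1 << 20):
    """Check the file size plus several independent digests, Gentoo-style."""
    if os.path.getsize(path) != expected_size:
        return False
    hashers = {name: hashlib.new(name) for name in expected_digests}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return all(hashers[name].hexdigest() == want.lower()
               for name, want in expected_digests.items())

# Placeholder values; in portage's case they would come from the Manifest.
# verify("some-package-1.0.tar.gz", 123456,
#        {"sha256": "<expected sha256>", "sha512": "<expected sha512>"})
```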
Either way, anyone care to wager on how long it’ll take for us to stop relying on SHA1 checksums to validate bits? (That is… if we ever do…)
1. Good on you!
2. Yes, I know you can use openssl for that purpose, but it’s still annoying.
3. And SHA2, for that matter.
4. Translation mine, with the help of Google Translate.
5. For encryption, he recommended against ECC, in favor of 256-bit AES, 4096-bit RSA with DHE, and a 512-bit hash function.
6. I guess he’s less worried about NSA backdooring than Weis.
OS X has a tool called shasum that will by default calculate a SHA1 checksum. It’s written in Perl and even supports SHA512.
Hey Youssuf,
Well, I learned something new today! Thanks for the tip!
I still wonder why Apple had to “Think Different” on this, and call it something other than sha[bitwidth]sum (even if it wasn’t from coreutils, which I would take no exception with).
Hmm, do the hashes published alongside the software actually protect against malicious tampering? I sort of assumed they were the moral equivalent of CRC32, really. After all, they’re distributed in the same directory as the files that might be tampered with… GPG seems more useful in this case, since (in theory) you might have access to the keys ahead of time, assuming you actually check the key ID for correctness (instead of trusting a different key that merely has the same name).
This, of course, isn’t really relevant to whether or not the hash functions are compromised.
Hey Mook,
Funny you should mention that; GPG keys were mentioned on Twitter as a solution to this.
So obviously, GPG (used correctly) would address this problem… sort of. There’s still the bit about verifying that the key was actually created by the software vendor. Mozilla GPG-signed (signs?) all of their Linux bits, but I’d imagine for largely historical reasons; it’s unclear that, beyond sitting on a PGP key server, the key is even validated by anyone useful.
To your point about publishing the hashes next to the bits, it’s still useful in a couple of contexts:
1. If you have a setup like Mozilla did for the longest time—volunteer mirrors—then it can still be useful to publish the checksum on your (SSL secured, hopefully) website, to validate against bits you download from a random mirror. This was especially useful in Mozilla’s case, due to the fact that bouncer somewhat obscured where you were actually downloading things from.
2. Another case (that’s much more common) is the handling of drops between partners. (I actually had a couple of paragraphs on this, but edited them out.) It’s often useful when communicating with partners to transfer the bits via FTP, Dropbox, or “other”, and then send a hash via email to confirm that the bits were received intact and contained what was intended.
In both examples, the key point is that the hash is used in a more-or-less “out of band” context from moving the bits around… and in all cases, a hash is a much lower barrier to entry than trying to set GPG up, import the key, and then validate the bits using it. (I can remember going back and forth and back and forth with a particular consumer electronics vendor you might be familiar with, trying to get their side to just send me a hash to validate that we had the right bits, since there was confusion on this point. Getting them to GPG-sign their drops would’ve been a total non-starter.)
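(A toy illustration of that partner-drop workflow, with made-up names: compute a digest of what actually landed on your side and compare it against the hex string that came over in the email.)

```python
import hashlib
import hmac

def matches_published_hash(path, published_hex, algorithm="sha256"):
    """Compare a locally computed digest against one received out of band."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    # compare_digest is a constant-time comparison; overkill here, but harmless.
    return hmac.compare_digest(h.hexdigest(), published_hex.strip().lower())

# e.g., the partner emailed "sha256: <hex digest>" for a drop named drop.zip:
# matches_published_hash("drop.zip", "<hex digest>")
```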
So in that regard, I believe hashes are still useful, and here to stay. But we should at least use uncompromised ones, as I’m sure you agree.
Debian has been using PGP/GPG signatures forever. The thing you need is some kind of web/tree of trust, which you don’t have for arbitrary software off the internet. Perhaps some kind of DNSSEC/HTTPS-rooted security mixed with signed downloads is the key, e.g.: I know this software I downloaded from somewhere is valid because it’s signed by your private SSL key.
Yah, the distros generally have this solved. And in their case, it’s an easy solution, since they can include the appropriate keys on the installation media, along with the package manager.
But it wasn’t always so; I can still remember back in the late ’90s/early 2000s, when you had to forage for specific RPMs, and no one bothered checking the signatures, usually because nothing was signed in the first place. (Think Red Hat 6.x/7.x era.) But even then, I never checked the checksums anyway.
It’s still useful for coordinating and it can play, I think, an important role for non-Linux operating systems.
PowerShell 4 (available on Windows 7, Windows 8.1, Server 2008 R2, Server 2012, and Server 2012 R2) has a core command for checksums, Get-FileHash. It supports several hash algorithms (SHA1, SHA256, SHA384, SHA512, MACTripleDES, MD5, and RIPEMD160) and defaults to SHA256.
Steven, you do realize that for basically every claim I make about Windows, I expect you to come along and say “Actually, with PowerShell…”… right?
(I do appreciate someone keeping me honest on the platform’s capabilities.)
That’s what I’m here for. Windows went off and grew up over the last few years (from a management standpoint…)