Old Checksums Collide Hard


Our industry still insists on using MD5 checksums. And I don’t know why.

Since 2004, it’s been widely reported that “the security of the MD5 hash function is severely compromised.” When it comes to collision attacks on hash functions, the question of whether or not they’re broken (and thus useless for the sole purpose for which we employ them) is not unlike the question of whether or not one is pregnant.

And yet, I still see MD5 checksums floating around all over the place, providing a false sense of security.

It’s interesting to speculate on why this might be.

Obviously, there’s the inertia argument: release infrastructure is one of the least exciting places for the business to invest, so if you’re publishing checksums at all[1], changing the hash functions (“What do those even do? And does any customer use them?”) probably isn’t at the top of management’s list.

(It doesn’t help that OS vendors don’t make this any easier: OS X Mavericks still ships a command-line md5 tool, but nothing to easily calculate SHA1[2]. And you wanted to checksum a file on Windows? Good luck with that.)
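Until the tooling catches up, a scripting language’s hashing library sidesteps the missing-tool problem on any OS. Here is a minimal sketch in Python, assuming only the standard `hashlib` module; the function name `checksum_file` is hypothetical, chosen for illustration. The file is read in chunks so that large release artifacts never have to fit in memory.

```python
import hashlib


def checksum_file(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of the file at `path` using `algorithm`.

    `algorithm` is any name hashlib.new() accepts, e.g. "md5", "sha1",
    "sha256" or "sha512". The file is read in fixed-size chunks so a
    large download is never loaded into memory all at once.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling `checksum_file("release.tar.gz", "sha512")` (the filename is just an example) should yield the same lowercase hex string that `sha512sum release.tar.gz` prints.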

Perhaps it’s because (oddly?) most people just don’t seem to care whether the bits running on their machines are what the software company intended to ship? I guess they assume that nobody is tampering with their Internet traffic? It’s obviously a point of consternation when their machine goes rogue on them, but… that never actually happens, right?

Anyway, ever since 2006, I’ve been suggesting to my clients that they use SHA1 checksums, and forgo even publishing MD5. But that advice may not be correct anymore.

At the recent Chaos Computer Club conference in Germany, researcher Rüdiger Weis presented evidence that SHA1 should be considered effectively broken too.

The news is particularly interesting in light of the recent revelations regarding the NSA: part of his justification for considering SHA1 compromised is that it belongs to the same family as SHA0[3], which was developed by the NSA without a published technical specification and which, too, is considered broken. He also notes that the NSA made the modification that turned SHA0 into SHA1 without any technical explanation.

(Weis puts it more colorfully: “I would personally rather put a wager on Britney Spears’ virginity than on the security of SHA-1.”[4])

His recommendation for a hash function? SHA3[5]. Unfortunately, what the rest of us should do is less clear.

Security guru Bruce Schneier says SHA3 isn’t necessary, and that SHA512, a member of the SHA2 family, was as of 2012 “still looking good.”[6] It doesn’t help that even if we wanted to use SHA3, I couldn’t find a Linux utility implementing it, whereas sha512sum ships with coreutils 8.x.
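Newer language runtimes have since narrowed the tooling gap; Python’s `hashlib`, for example, has included `sha3_512` since version 3.6. A minimal sketch, assuming Python 3.6 or later, that computes a SHA512 and a SHA3-512 digest of the same placeholder bytes. Both are 512-bit outputs, but they come from unrelated constructions (SHA512 is a Merkle-Damgård design, SHA3 is the Keccak sponge), so the two values have nothing in common.

```python
import hashlib

# Assumes Python 3.6+, where hashlib includes the SHA3 family natively.
data = b"placeholder bytes standing in for a release tarball"

sha2 = hashlib.sha512(data)    # SHA2 family (Merkle-Damgard construction)
sha3 = hashlib.sha3_512(data)  # SHA3 / Keccak (sponge construction)

# Both digests are 64 bytes long, printed as 128 hex characters,
# but their values are unrelated because the constructions differ.
print("sha512:  ", sha2.hexdigest())
print("sha3_512:", sha3.hexdigest())
```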

Maybe we’ll all end up moving toward what Gentoo does: brute-forcing the problem by validating multiple checksums from multiple hash families. Portage currently checks SHA256, SHA512, Whirlpool, and file size.
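The Gentoo approach is easy to sketch: compute several digests plus the file size, and accept the file only if every one of them matches what was published. This is a minimal Python sketch, not Portage’s actual Manifest format or code; the `expected` dictionary stands in for a hypothetical published manifest, and Whirlpool is left out because `hashlib` only exposes it when the underlying OpenSSL build provides it.

```python
import hashlib
import os

# Hash algorithms to check; both are always available in hashlib.
ALGORITHMS = ("sha256", "sha512")


def manifest_entry(path, chunk_size=65536):
    """Compute a Gentoo-style set of checks for one file:
    its size in bytes plus one hex digest per algorithm."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Feed the same bytes to every hasher in a single pass.
            for h in hashers.values():
                h.update(chunk)
    entry = {name: h.hexdigest() for name, h in hashers.items()}
    entry["size"] = os.path.getsize(path)
    return entry


def verify(path, expected):
    """Return True only if every check computed for `path` matches.

    `expected` is a hypothetical manifest dict such as
    {"size": 1234, "sha256": "...", "sha512": "..."}. A check that is
    missing from `expected` counts as a failure rather than being skipped.
    """
    actual = manifest_entry(path)
    return all(key in expected and expected[key] == value
               for key, value in actual.items())
```

An attacker would have to produce a file that collides under SHA256 and SHA512 at the same time while keeping the same size, so the combined check is at least as strong as the strongest single hash in the set.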

Either way, anyone care to wager on how long it’ll take for us to stop relying on SHA1 checksums to validate bits? (That is… if we ever do…)

  1. Good on you!
  2. Yes, I know you can use openssl for that purpose, but it’s still annoying.
  3. And SHA2, for that matter.
  4. Translation mine, with the help of Google Translate.
  5. For encryption, he recommended against ECC, in favor of 256-bit AES, 4096-bit RSA, DHE, and a 512-bit hash function.
  6. I guess he’s less worried about NSA backdooring than Weis is.