sean cassidy : How to Implement Crypto Poorly

This is a summary of the talk I gave at GrrCon '16.

We're always told: don't roll your own crypto!

This has always felt like a kind of abstinence-only education to me. Of course, the advice is correct: if you decide to use your own encryption mechanism instead of, say, TLS, you'll almost certainly do a worse job than the IETF. You'll certainly fail at making a better block cipher than Daemen and Rijmen did. But to me there was always a sort of "don't even learn about it" tone to this recommendation.

Is this recommendation effective? That is, do people or companies actually roll their own crypto? Are the crypto systems they made horribly broken? I decided to find out.

The Survey

I needed an area to survey to answer this question. Where can I find lots of examples of custom cryptography? Can I find a lot of common issues in those implementations?

It turns out that there is a lot of custom cryptography in one particular place: custom single sign-on implementations. I found 21 implementations from companies that offer some kind of custom single sign-on for their product.

Custom Single Sign-on

Single sign-on is any system which grants access to other systems by virtue of being authenticated against it. For instance, Facebook Connect is a popular single sign-on mechanism for many websites. Instead of registering with every website you use, you can sign in with Facebook and the website will get the user information it needs from Facebook directly on your behalf. OAuth2 and SAML 2.0 are examples of open standards that provide single sign-on.

But what if that's not quite what you want?

What if you want "a few lines of PHP" in order to have users be authenticated against your site? Best if it works with Wordpress and whatever weird Java 6 system some of your enterprise customers use. No need to worry about what a bearer token is and why you'd want to refresh it.

What if instead you made your own little crypto function that combined some secret and gave it to your customers, who could then authenticate their users to your service?

For instance, say Alice has a TODO list service that her customers buy. Alice buys Bob's helpdesk software so that her customers can file support tickets when they have a problem.

When one of Alice's customers wants to file a support ticket because their TODOs were missing, Alice computes something like this:

H(user's email, shared secret)

Where H is some kind of HMAC or hash function (or even something terrible only dreamt of in nightmares), and shared secret is a secret shared by Alice and Bob.

Alice then redirects the user to Bob's website, with the result of that computation and the user's email, like:

https://bob.example/alice/login?email=user@example.com&hash=59bcc3ad6775562f845953cf01624225

Bob then uses the same email address and the same shared secret, and hopefully comes up with the same hash value, 59bcc3ad6775562f845953cf01624225. If so, the user is successfully authenticated to Alice's support site, hosted by Bob. The user didn't need to register on Bob's website, so, to the user, it was seamless.

Since the user doesn't know the shared secret, the user can't compute the hash value themselves.
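A minimal sketch of this flow in Python, assuming H is HMAC-SHA256 (the description above leaves H unspecified). The helper names `compute_hash`, `make_login_url`, and `verify_login` are hypothetical, and the randomly generated secret stands in for a value Alice and Bob would really agree on out of band.

```python
import hashlib
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Stand-in for the secret shared by Alice and Bob (hypothetical value).
SHARED_SECRET = secrets.token_bytes(32)


def compute_hash(email: str, secret: bytes) -> str:
    """H(user's email, shared secret), assumed here to be HMAC-SHA256."""
    return hmac.new(secret, email.encode("utf-8"), hashlib.sha256).hexdigest()


def make_login_url(email: str, secret: bytes) -> str:
    """Alice's side: build the redirect URL for one of her users."""
    query = urlencode({"email": email, "hash": compute_hash(email, secret)})
    return "https://bob.example/alice/login?" + query


def verify_login(url: str, secret: bytes) -> bool:
    """Bob's side: recompute the hash and compare it in constant time."""
    params = parse_qs(urlsplit(url).query)
    email = params.get("email", [""])[0]
    supplied = params.get("hash", [""])[0]
    expected = compute_hash(email, secret)
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


url = make_login_url("user@example.com", SHARED_SECRET)
print(verify_login(url, SHARED_SECRET))  # True
```

A URL built with any other secret fails verification, which is the whole security argument of the scheme: only someone holding the shared secret can produce a matching hash.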

Common Flaws

The good thing about these custom single sign-on implementations is that they're simple. The bad thing is that they're often dangerously insecure. For example, this bug reported in Freshdesk resulted from the name and the email being concatenated. There are plenty of tricky little bugs that can impact these systems.
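The details of the Freshdesk bug aren't reproduced here, so the following is only a sketch of the general class of problem, not the vendor's code. If fields are concatenated with no delimiter or length prefix before being authenticated, different field values can produce the same input and therefore the same tag. The function names, the secret, and the length-prefix encoding are all illustrative assumptions.

```python
import hashlib
import hmac

SECRET = b"hypothetical shared secret"


def tag_concatenated(name: str, email: str) -> str:
    """Ambiguous: authenticates name + email with no field boundary."""
    message = (name + email).encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def tag_length_prefixed(name: str, email: str) -> str:
    """Unambiguous: each field is preceded by its 4-byte length."""
    message = b""
    for field in (name, email):
        data = field.encode("utf-8")
        message += len(data).to_bytes(4, "big") + data
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


# Two different (name, email) pairs that concatenate to the same string.
honest = ("bob", "admin@example.com")
forged = ("boba", "dmin@example.com")

print(tag_concatenated(*honest) == tag_concatenated(*forged))        # True
print(tag_length_prefixed(*honest) == tag_length_prefixed(*forged))  # False
```

Note that the ambiguity exists even though a real HMAC is used: the MAC faithfully authenticates the bytes it is given, so the encoding of the fields has to be unambiguous too.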

For this study, I picked seven flaws that I thought would be common problems with these custom SSO solutions, and examined each solution's publicly available documentation and example code for the problems. I didn't do a deep inspection of each implementation, but rather just enough to determine if the flaw was present or not.

No HMAC

Essentially, these single sign-on implementations are trying to pass an authenticated message through an untrusted third party, the user. The best way to do that is with a message authentication code (MAC).

An HMAC combines a hash function, a secret key, and a message in a secure way that resists length extension attacks and provides preimage resistance. Not using an HMAC or any kind of real message authentication opens up the SSO implementation to many different kinds of attacks.

Uses Obsolete Crypto Primitives

Does the implementation use known bad crypto primitives? For this, I counted HMAC-MD5 as bad, as MD5 is known bad, even though there are no known attacks against HMAC-MD5 specifically. As with some other flaws I studied, not all of the problems I wanted to identify were critical. I also wanted to study less important flaws to understand how fast or slow the adoption of new crypto primitives, like SHA-3, was.

Spoiler alert: no one used SHA-3.

Short Keys

Shared secret keys are often distributed in hexadecimal, like this:

35f7c022a53662e813952e4a7425533a

If you're not paying attention, you might do something like this, in Java:

String secretKey = "35f7c022a53662e813952e4a7425533a";
byte[] keyBytes = secretKey.getBytes();

This does not give you a 16-byte array like [0x35, 0xf7, … 0x3a]. Instead, it gives you a 32-byte array holding the UTF-8 representation of the string, like this: [0x33, 0x35, … 0x61], which is almost certainly not what you meant.

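If your cipher takes only the number of bytes it needs, it will leave some of the key material out! A cipher expecting 16 key bytes would take the first 16 hex characters, and each of those carries only 4 bits of the key. This means that if you think you're using a 128-bit key, you could be using only a 64-bit key. That's a massive reduction in the number of available keys.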
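The same mistake is easy to reproduce in Python. This short sketch uses the example key above and contrasts encoding the hex string with decoding it; the variable names are illustrative.

```python
hex_key = "35f7c022a53662e813952e4a7425533a"

# Wrong: encodes the hex *characters*, giving 32 bytes that each carry
# only 4 bits of key material.
wrong = hex_key.encode("utf-8")

# Right: decodes the hex digits into the 16 raw key bytes.
right = bytes.fromhex(hex_key)

print(len(wrong), wrong[:2].hex())  # 32 3335
print(len(right), right[:2].hex())  # 16 35f7

# A cipher that keeps only the first 16 bytes of `wrong` sees 16 hex
# characters, i.e. 16 * 4 = 64 bits of the original 128-bit key.
truncated = wrong[:16]
print(truncated)  # b'35f7c022a53662e8'
```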

Replay Attacks

Since the "authenticated message" is being passed to a potentially untrusted user, it's important to make sure that the message has some kind of expiration. One simple way is just to attach a timestamp that expires soon after the message is generated. Another way would be to use a nonce, which ensures that the message cannot be used more than once.

If a user can execute a replay attack, they could use another user's compromised SSO URL, or use an older URL of their own to stay logged in.
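A minimal sketch of the timestamp approach, assuming HMAC-SHA256 and a 60-second validity window (both are assumptions, as are the names `sign` and `verify`). The important detail is that the timestamp is part of the authenticated message, so the user can't simply edit it.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"hypothetical shared secret"
MAX_AGE_SECONDS = 60  # assumed validity window


def sign(email: str, timestamp: int) -> str:
    """MAC over both the timestamp and the email."""
    # The timestamp is digits only, so "|" cleanly separates the fields.
    message = f"{timestamp}|{email}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def verify(email: str, timestamp: int, tag: str, now: Optional[int] = None) -> bool:
    """Accept only an authentic message that is at most MAX_AGE_SECONDS old."""
    if now is None:
        now = int(time.time())
    expected = sign(email, timestamp)
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return False  # forged or altered message
    return 0 <= now - timestamp <= MAX_AGE_SECONDS  # reject stale or future-dated


issued = int(time.time())
tag = sign("user@example.com", issued)
print(verify("user@example.com", issued, tag))                     # True
print(verify("user@example.com", issued, tag, now=issued + 3600))  # False
```

A timestamp alone still allows replays inside the validity window. Closing that gap with a nonce means the verifier has to remember which nonces it has already seen, which is why many simple implementations skip it.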

Static Initialization Vector

Block ciphers have modes. These modes make it possible to use block ciphers on more than one block of data. These modes typically require an initialization vector (IV) that's random. Some modes, like CTR and CBC, require that the IV isn't reused, otherwise it will leak information. In CTR mode, IV reuse is particularly catastrophic, so much so that some crypto experts are recommending against CTR mode.

For SSO implementations that used a block cipher, I wanted to see if they made this classic error.
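To see why IV reuse leaks information in a counter-style mode, here is a toy sketch. Python's standard library has no AES, so the keystream below is built from SHA-256 as a stand-in for a real block cipher in CTR mode; it is for illustration only and not something to deploy. All names are hypothetical.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream: SHA-256(key || nonce || counter) blocks."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream (decryption is the same operation)."""
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key = os.urandom(16)
static_iv = bytes(16)  # the classic error: the same IV for every message

p1 = b"email=alice@example.com"
p2 = b"email=admin@example.com"
c1 = encrypt(key, static_iv, p1)
c2 = encrypt(key, static_iv, p2)

# With a reused IV the keystream cancels out: c1 XOR c2 == p1 XOR p2,
# so an attacker who knows p1 recovers p2 without ever seeing the key.
recovered = xor(xor(c1, c2), p1)
print(recovered == p2)  # True

# A fresh random IV per message, sent alongside the ciphertext, gives each
# message its own keystream and defeats this trick.
c3 = encrypt(key, os.urandom(16), p2)
```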

Known Plaintext

Usually it's best to limit what the attacker knows. Like the secret key. Best not to share that with your attacker. But sometimes even knowing (or controlling) the plaintext can help the attacker. With well designed crypto systems, this shouldn't matter at all. Attackers could encode whatever messages they want and not learn anything about other messages or the key. But many crypto systems are not well designed, so I kept track of which implementations had plaintexts that the attacker knew or controlled.

Random Crap

This category is a bit tongue-in-cheek, and actually came about after reviewing the implementations. I noticed a lot of weird stuff that absolutely has no effect on the crypto guarantees (or lack thereof) of the system. Twiddling bits, reversing strings, taking the MD5 of the SHA-1 of the MD5 of the SHA-1 of the key, and so on.

Survey Results

Here are the aggregate problems found of the 21 custom single sign-on implementations studied:

Several of the implementations that had the short keys problem used an HMAC that did not truncate the key, so those aren't so much vulnerabilities as sloppy programming. Similarly, using obsolete primitives is not always immediately exploitable, but it is no longer best practice.

One implementation used a block cipher (AES) in a mode that requires the IV to be used only once, and it failed to do so.

Only one implementation was free from all problems studied.

The response from vendors was disappointing. Of the 20 implementations that had problems, nearly half did not acknowledge my vulnerability report. Two claimed that the problems I found were not bugs. Only one implementation fixed the bugs I reported.

Custom cipher

Interestingly, one implementation decided that even traditional cryptography primitives, like MD5 or SHA-1 or AES were too fancy for them and wanted to make their own. Here it is, edited for clarity:

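import hashlib

# base36encode (not shown) converts an integer to an upper-case base 36 string.
def encrypt(plaintext, input_key):
    # The effective key is the 40-character hex digest of the input key.
    key = hashlib.sha1(input_key.encode("utf-8")).hexdigest()
    result = ''
    for i, character in enumerate(plaintext):
        val = ord(character)
        adder = ord(key[i % len(key)])
        # Add the two character codes, base 36 encode, reverse the digits.
        result = result + base36encode(val + adder)[::-1]

    return result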

This code has several obvious flaws. It operates on a per-character basis, which means there's no avalanche effect. It naively just adds together the ASCII values of a hex character of the hashed key (0-9a-f) and a plaintext character, and then base 36 encodes the sum. For some reason it reverses the resulting two characters, probably to add some mystery.

Here's a table of what a few iterations of this function do for a plaintext of ASCII zeroes and the key "hello" (the "Base 36" column shows the value before the two characters are reversed):

Plaintext  Key  Val  Adder  Addition  Base 36
0          a    48   97     145       41
0          a    48   97     145       41
0          f    48   102    150       46
0          4    48   52     100       2S

To reverse this, it's a simple matter of taking the ciphertext and the plaintext and undoing the operations that were performed to get the secret key. "Val" is the ASCII value of the plaintext, "Decimal" is the decimal value of the base 36 number, "Subtract" is what happens when you subtract those two, and the key is the ASCII representation.

Plaintext  Base 36  Val  Decimal  Subtract  Key
0          41       48   145      97        a
0          41       48   145      97        a
0          46       48   150      102       f
0          2S       48   100      52        4

Here's the code for exactly that:

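def get_key(plaintext, ciphertext):
    # Every plaintext character becomes exactly two base 36 digits.
    if len(ciphertext) != 2 * len(plaintext):
        return None

    key = ''
    for i, character in enumerate(plaintext):
        # Undo the reversal, decode base 36, subtract the plaintext value.
        base36 = ciphertext[i * 2: i * 2 + 2][::-1]
        value = int(base36, 36)
        key = key + chr(value - ord(character))

    return key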
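To check the attack end to end, here is a self-contained Python 3 sketch that reproduces the tables above. The `base36encode` helper is my assumption about what the vendor's helper does (upper-case digits, no padding), and the `.encode("utf-8")` call is added so the code runs on Python 3.

```python
import hashlib

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base36encode(number: int) -> str:
    """Assumed helper: non-negative integer to upper-case base 36."""
    if number == 0:
        return "0"
    out = ""
    while number:
        number, rem = divmod(number, 36)
        out = DIGITS[rem] + out
    return out


def encrypt(plaintext: str, input_key: str) -> str:
    key = hashlib.sha1(input_key.encode("utf-8")).hexdigest()
    result = ""
    for i, character in enumerate(plaintext):
        total = ord(character) + ord(key[i % len(key)])
        result += base36encode(total)[::-1]
    return result


def get_key(plaintext: str, ciphertext: str):
    if len(ciphertext) != 2 * len(plaintext):
        return None
    key = ""
    for i, character in enumerate(plaintext):
        base36 = ciphertext[i * 2: i * 2 + 2][::-1]
        key += chr(int(base36, 36) - ord(character))
    return key


ciphertext = encrypt("0000", "hello")
print(ciphertext)                   # 141464S2
print(get_key("0000", ciphertext))  # aaf4
```

What comes back is the SHA-1 hex digest of the secret rather than the secret itself, but the digest is all `encrypt` ever uses, so it is just as good to an attacker. A known plaintext of 40 characters or more recovers the entire digest.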

There's probably a more clever ciphertext-only attack that's possible because of how bad this cipher is, but I didn't do it because the attacker has access to the plaintext and the ciphertext in this attack. For anyone who has done any cryptanalysis or CTFs, this "encryption" function is a joke.

When the attacker gains the shared secret key, the attacker can then impersonate any user, including admins. This is a classic privilege escalation attack, done over single sign-on.

Takeaways

Should you roll your own crypto? No.

Do people roll their own anyway? Yes.

The standard recommendation from the security community about learning and implementing your own cryptography has been to avoid it. Let cryptographers do cryptography. However, it's clear from these results that people will implement their own crypto regardless.

This strikes me as similar to abstinence-only education. We've tried telling everyone not to write custom crypto code, but because of product demands, ignorant developers, or hubris, that hasn't been successful. I think it's time we try a different tactic: teach everyone crypto. Make it a standard part of being a software engineer. Everyone should at least know what an HMAC is and why it should be used, what an initialization vector is and how to handle one, and how to securely generate random numbers (hint: use /dev/urandom).

There should be simple, imitable implementations of various real world problems that cryptography solves. No one wants to make insecure systems. We need to make it harder to mess up.

And, should you learn cryptography? Yes.

Resources for learning cryptography

I covered this in an older blog post of mine, "So, you want to crypto", but it needs updating. So here's the updated version:

Courses

A good place to start is Cryptography I at Coursera. Your local university's cryptography course is also handy. Formal courses are good for a foundation in some of the mathematics behind cryptography.

Books

My favorite practical cryptography book is far and away Cryptography Engineering by Ferguson et al. It combines the best aspects of theoretical knowledge and practical experience. However, there are many good cryptography books. Some, like Introduction to Cryptography with Coding Theory, are completely on the theoretical/mathematical side, but also cover some historical ciphers, which sometimes pop up in CTFs.

Learn to break it

Learning to break cryptography is perhaps the most effective way to learn it. After you get a solid foundation from an introductory course or book, I'd recommend doing the Cryptopals Crypto Challenges. This will get you thinking about how crypto systems break, which is arguably the most important skill for designing and examining them.

Learn by imitation

Look at pre-existing solutions to problems. Most crypto protocols have had several versions where they fixed critical bugs. That's interesting because you can learn from their mistakes. Implementations that I'd recommend are: AWS Authentication, OAuth2, and Double Ratchet. Why were they designed the way they were? What flaws do they have? Is their complexity necessary? Are they too simple?
