The reactions to the latest major security hole in a popular operating system have been amusing to watch. If you haven't yet read Alex Langley's analysis, do it now, it's very good.
For reference, the bug that in non-TLS v1.2 SSL connections, the signature was not checked, allowing impersonations. This was because the code skipped to the end of the method always, without actually checking the hash result was correct.
What's the root cause of such a bug? I've been looking around today and here's a short list of reasons people gave:
- It's hard to write correct code in C
- Apple engineers are bad at their jobs
- The coding style allows omiting curly braces
- No formal code reviews at Apple
- Use of gotos
- Automated merge artifact
- Not making compiler warn about dead code
- Not using static analysis tools
- Lack of testing
All of these reasons1 likely played a role. Is there a single, dominant root cause?
The diff of the code isn't really helpful for figuring this out, as the bug on line 631 seems to arise from nowhere. Perhaps a merge artifact or a stupid copy/paste error.
Given that it's difficult or impossible to give a single root cause for this bug, what can we say to do about it? Lots of people are saying that it's the lack of curly braces around the code in question. People like Zed:
Clearly the dude who wrote this Apple SSL C code didn't read my C book https://t.co/dZodpR6Aox ALWAYS USE BRACES!— zedshaw (@zedshaw) February 23, 2014
This is the 5 Whys version of bike shedding. The root cause of this problem was not a lack of curly braces. It could not possibly have been as the insertion of the extra goto makes no sense with or without braces.
Why is it that we, as programmers, often complain most about coding style and not about correctness during code reviews? While bad coding style can hide bugs, it does not in-and-of-itself cause problems. We overvalue visual consistency and undervalue correctness. If we valued correctness higher, we would not have critical code that was almost completely untested.
Using a correct coding style will not prevent most bugs, although it will catch some. Using an easier language will reduce some types of bugs. Code reviews will catch more bugs than both of those. Static analysis will catch lots of bugs.
Good testing will catch many and prevent more bugs. This is why it's critical that certain projects, such as the new Python cryptography library, have 100% code coverage. This type of glaring error could not happen there.
Untested cryptography code is broken cryptography code.
This is why it's silly to be complaining about braces. In this case, braces did make the problem harder to detect, but that wasn't the root cause, nor is using braces a complete solution. It's just bike shedding.
Except gotos. Using gotos is perfectly acceptable when using them properly, which is to use them in error handling scenarios. ↩