The sociology of open-source security fixes, continued

Why a second anecdote about security bugs in open-source software

In a previous episode, having just reported a vulnerability in the open-source, connected program cpuminer, I needlessly worried about what to do. In fact, there was no urgency and no responsibility, as the vulnerability had already been advertised a couple of months earlier on the fulldisclosure mailing list.

The same thing happened again recently, with a different enough conclusion to warrant discussion.

tiny-AES128-C, a “small and portable implementation of … AES128”

So tiny-AES128-C is the name of this project on GitHub. There is no documentation apart from the implementation files themselves, but the header file says:

CBC enables AES128 encryption in CBC-mode of operation and handles 0-padding.

The words “handles 0-padding” mean that although AES works on 16-byte blocks, the library will handle a message length that is not a multiple of 16 by padding the last block with zeroes for you.

Or will it? Meanwhile the comment in the implementation file says:

NOTE: String length must be evenly divisible by 16byte (str_len % 16 == 0).
You should pad the end of the string with zeros if this is not the case.

When in doubt, use the source! No, not this source, the non-comment part of the source:

void AES128_CBC_encrypt_buffer(uint8_t* output, uint8_t* input, uint32_t length, …)
{
  intptr_t i;
  uint8_t remainders = length % KEYLEN;
              /* Remaining bytes in the last non-full block */

So despite some confusion in the documentation available in the form of comments, it looks like last non-full blocks are supported after all. But the attention of my colleague Vincent Benayoun was drawn to what the function AES128_CBC_encrypt_buffer does next:

  for(i = 0; i < length; i += KEYLEN)
  {
    XorWithIv(input);
    …
    input += KEYLEN;
    …
  }

A partial specification of XorWithIv, if the library contained one, might look like:

/*@ 
  requires \valid_read(Iv + (0 .. 15));
  requires \valid(buf + (0 .. 15));
  assigns buf[0..15] \from Iv[0..15], buf[0..15];
*/
void XorWithIv(uint8_t* buf);

In other words, XorWithIv assumes that the pointer passed to it points to a complete block, and exclusive-ors this block in place as part of CBC mode.
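
For completeness, the body of XorWithIv is essentially the obvious loop over one block. The sketch below is a plausible reconstruction, not necessarily the project's exact code; Iv is assumed to be a file-scope pointer to the current 16-byte chaining value, as in the specification above:

static void XorWithIv(uint8_t* buf)
{
  uint8_t i;
  for (i = 0; i < KEYLEN; ++i)  /* KEYLEN is 16, one full AES block */
  {
    buf[i] ^= Iv[i];            /* xor the block in place with the chaining value */
  }
}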

Concretely, say you are using tiny-AES128 to encrypt a 33-byte input buffer, like this:

uint8_t key[16] = …
uint8_t iv[16] = …
uint8_t input[33] = …
uint8_t output[48]; // rounded up to next multiple of block size
AES128_CBC_encrypt_buffer(output, input, 33 /*length*/, key, iv);

This would mean that AES128_CBC_encrypt_buffer will call the function XorWithIv successively with argument input, input + 16 and input + 32. In this last case, XorWithIv will replace bytes from input + 32 to input + 47 with the exclusive-or of their previous values with the last encrypted block.

To rephrase, an attacker who can cause the input length to be 33 can force the 15 bytes past the end of input to be xored with values they should never have been xored with: bytes of the second block of the encrypted message, which CBC mode specifies must be xored into the third input block before that block is encrypted.

If the attacker knows all of the initialization vector, key and the 33 bytes of contents of input, and can influence at least one of them, then the attacker can choose what at least some of the bytes between input + 33 and input + 47 will be xored with. A single computer and a bit of patience will let the attacker brute-force a 32-byte message whose AES image has, in 4 chosen bytes of the second block, exactly the values convenient for the attacker's purposes. One can imagine that pre-computed tables and massive computational resources would let a prepared attacker choose the values of between 6 and 15 bytes in the encrypted second block, xored in memory past the input buffer.
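
Until a fixed version is used, a caller can defend itself by never passing a length that is not a multiple of 16: copy or allocate the message in a buffer rounded up to the next block boundary, zero the tail, and pass the rounded-up length. The sketch below is my own code, not part of tiny-AES128-C; it assumes the header is named aes.h and that the trailing parameters are the key and the IV, as in the call above, and it requires input to point to a buffer with room for the rounded-up length:

#include <stdint.h>
#include <string.h>
#include "aes.h"   /* tiny-AES128-C header, assuming this name */

#define AES_BLOCK 16

/* Encrypts length bytes of input. Both input and output must have room
   for length rounded up to the next multiple of AES_BLOCK. */
void cbc_encrypt_padded(uint8_t* output, uint8_t* input, uint32_t length,
                        uint8_t* key, uint8_t* iv)
{
  uint32_t padded = (length + AES_BLOCK - 1) / AES_BLOCK * AES_BLOCK;
  memset(input + length, 0, padded - length);   /* zero the tail explicitly */
  AES128_CBC_encrypt_buffer(output, input, padded, key, iv);  /* always whole blocks */
}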

No need to report

As in the case of cpuminer, the vulnerability had already been found a few weeks earlier. This time, there wasn't even a need to inform the developer specifically, as a fix had been submitted to them on GitHub as a pull request, but so far, ignored.

Vincent wasn't even trying to find vulnerabilities in tiny-AES128-C: the goal was to demonstrate that it didn't have any. So he naturally continued his work with Andreas Wehrmann's fixed version.

Things became interesting when Vincent found that a small problem remained in that version for some input lengths. What to do? The author of tiny-AES128 had not even reacted to the very serious buffer overflow reported to them weeks earlier; would they do something about a comparatively less serious out-of-bounds read that can only result in denial of service?

The sociological experiment begins with Vincent submitting a pull request to Andreas Wehrmann, to fix the problem in his fork. And what happened next was that although Andreas had apparently done little more with his GitHub account than submit the tiny-AES128 bugfix he had found, and had not used GitHub since, the pull request was accepted within 24 hours.

Conclusion

In this second anecdote, the original author had abandoned their software, but a kind stranger on the Internet who had already provided a fix for the egregious bugs in it graciously accepted the responsibility for proof-reading and validating a fix for the issue that had been left over. Again, fixes happened only very quickly or not at all. Again, if you plan to depend on a piece of open-source software, the anecdote shows that you would do well to do your own Google search for known security vulnerabilities in it. You shouldn't assume that publicly known vulnerabilities have been fixed in the software component.

This anecdote also shows the other side of the GitHub-does-not-allow-private-reports coin. On the one hand, one would sometimes like to report security-sensitive issues privately. On the other hand, if the software has been abandoned by its author, it may be good to know that security issues have been left without resolution, or that an issue you were planning to give the author some time to fix had already been reported months ago.

 

Vincent was working on the RAPID project AUROCHS, funded by the Direction Générale de l'Armement, while he found both the known bug and the unknown bug in tiny-AES128 described in this post.

When in doubt, express intent, and leave the rest to the compiler

Expressing intent, illustrated on an example

I sometimes get asked what stylistic choices a C developer could make that would help their code work well with static analysis or formal verification. I usually do not have a good answer—the reader of this blog or of the previous one may have categorized me as a detail-oriented person; when thus solicited, my first instinct is to look for concrete examples, but stylistic choices in programming are full of trade-offs. I might provide an answer such as “Some verification tools may have difficulties with goto, so you could try to avoid goto as much as possible. Of course, the analyzer I work on has no difficulties with goto, it is only that other techniques exist with which goto is an inconvenience. But of course if you are implementing error-handling in a C function that allocates resources, then you should use goto, because there is not really a better way, unless…” This is about the time I notice that whoever asked has gone to get themselves another cocktail and a better discussion group to mingle with.

The general, non-detailed answer that I should give, and I am writing it down here for future reference, is that when in doubt, the developer should choose to express clearly the intent of their program.

Imagine that, while reviewing foreign code, you arrive at the snippet below, except that instead of being isolated inside a function, it is three lines of code within another hundred that you are trying to make sense of.

void * f2(int c, void * p1, void * p2) {
  void * r = p1;
  uintptr_t mask = -c;
  r = (void*)(((uintptr_t)r & mask) | ((uintptr_t)p2 & ~mask));
  return r;
}

What does the above code do?

The answer is that it is nonsensical. It mixes some bits from a pointer p1 with bits from a pointer p2. Perhaps we can assume that p1 and p2 point to blocks that are aligned to the same large power of two in memory, and this code is simply copying the offset of p2 inside its own block to a similar offset within the block inside which p1 points? That would depend on the value of c. You look at the code above the snippet and painfully infer that c can contain only 0 or 1. At least that explains it: the code, for c containing 0 or 1, is intended to be equivalent to the one below:

void * f1(int c, void * p1, void * p2) {
  void * r = p1;
  if (!c) r = p2;
  return r;
}

The developer’s idea may be that the f2 version is faster, because it doesn’t contain any expensive conditional branch. Let us take a look:

$ clang -O3 -S -fomit-frame-pointer -fno-asynchronous-unwind-tables cmov.c
$ cat cmov.s
…
_f2:                                    ## @f2
## BB#0:
	negl	%edi
	movslq	%edi, %rax
	andq	%rax, %rsi
	notq	%rax
	andq	%rdx, %rax
	orq	%rsi, %rax
	retq

The compiled code for f2 is beautiful: a straight sequence of arithmetic instructions that modern processors can fetch and decode well in advance to sustain a rhythm of one instruction per cycle or more. But not much more in this case, as most instructions depend immediately on the result of the preceding instruction. At least the code doesn’t contain any conditional branches that might get mis-predicted and inflict a penalty of several tens of cycles.

What about f1?

_f1:                                    ## @f1
## BB#0:
	testl	%edi, %edi
	cmoveq	%rdx, %rsi
	movq	%rsi, %rax
	retq

Looking at the compiled code for f1, it does not contain any expensive conditional branch either! It uses the “conditional move” instruction. Just after a testl %edi, %edi instruction, the cmoveq %rdx, %rsi instruction has the same meaning as if (%edi == 0) %rsi = %rdx; would have in C, but the execution flow does not depend on %edi, allowing efficient fetching, decoding and execution at one instruction per cycle.

When producing IA-32 code, you might need a commandline option to tell the compiler to emit this instruction, as it does not exist in every IA-32 processor in history. The CMOV instruction was only introduced in the Pentium Pro processor, in 1995, and you might encounter a processor that does not support it if you are in some kind of computing museum. Say “hi!” to the Missile Command arcade game for me.

Here, my compiler is producing 64-bit code by default (x86-64), and every x86-64 processor has the CMOV instruction, so I do not even need to think about whether I want 20-year-old processors to execute f2 branchlessly. The dilemma is solved for me by such an x86-64 processor not existing at all.

In the compilation of f2, the compiler needs to produce code that works for all values of c, because it has not been provided with the information that c contained only 0 or 1. If we do provide the compiler with that information, it can do a better job at compiling it to fast code:

void * f3(int c, void * p1, void * p2) {
  void * r = p1;
  c = !!c; // force c to be 0 or 1 only
  uintptr_t mask = -c;
  r = (void*)(((uintptr_t)r & mask) | ((uintptr_t)p2 & ~mask));
  return r;
}
$ clang -O3 -S -fomit-frame-pointer -fno-asynchronous-unwind-tables cmov.c
$ cat cmov.s
…
_f3:                                    ## @f3
## BB#0:
	testl	%edi, %edi
	cmoveq	%rdx, %rsi
	movq	%rsi, %rax
	retq

This is the same code as for f1! The pattern under discussion was actually common enough that the authors of Clang have had to invent an optimization to detect that the code implements a crude branchless conditional move and compile it to an efficient one. The optimization only works if the compiler can also infer that the variable c can only contain 0 or 1, though.

Are you trying to write constant-time code?

One, perhaps the only, justification for the pattern above is that the developer is trying to write code that executes in constant time. Some architectures do not have a “conditional move” instruction and it may be important that the snippet executes in constant time even on these architectures.

Parenthesis: the world is not only x86-64 and post-1995 IA-32, after all. There is also ARM. Though I hear that ARM’s 32-bit instruction set beats everyone else at who has more conditional instructions.

It is a leap of faith to write if (!c) r = p2; and hope that the compiler will generate a constant-time CMOV instruction for it. In fact, a naïve compiler, or one aiming for an unusual architecture, might generate an assembly conditional branch instruction just for the !c part!

Parenthesis: hoping that the compiler generates a CMOV instruction assumes that CMOV can be relied on to execute in constant time. This has been true historically (CMOV is really the hardware version of (((uintptr_t)r & mask) | ((uintptr_t)p2 & ~mask))), and can be expected to remain true for register operands (there is no point in making an instruction with low latency have variable execution time), but it could cease to be true in future generations of processors when one of the operands is a memory location.

It is a slightly narrower leap of faith to write r = (void*)(((uintptr_t)r & mask) | ((uintptr_t)p2 & ~mask)); and hope that the compiler will not generate code the execution time of which depends on c. But as the example of f3 above shows, there is no limit to the sophistication of the transformations a compiler may apply. Pray that the compiler doesn’t find a non-constant-time sequence of instructions that is faster than what you have written (or that it thinks is faster than what you have written). Or, if you are an atheist, revisit Pascal’s wager: you have nothing to lose, and you stand to win a chance of having a chance that the compiled code is constant-time!

The only strictly safe solution is not to write the code that has to be constant-time in C, because C provides no way to control execution time. A softer variant of the safe solution is to proof-read the assembly code generated by the compiler, perhaps in an automated fashion, grepping for any instruction the execution time of which varies. In the most recent generations of Intel processors, the execution time of integer division depends on its inputs, so this instruction should be avoided on principle, too.

Express intent!

Coming back to the initial exhortation, what do I expect from the developer?

  • if their intention is to assign either p1 or p2 to r as fast as possible, they should write if (!c) r = p2;, which any compiler worth its salt can transform to the most elegant sequence of instructions for the target architecture.
  • if their intention is to assign either p1 or p2 to r in constant time, they should write r = constant_time_select(c, p1, p2);, which expresses exactly what they are trying to do (a sketch of such a helper follows this list). The name constant_time_select can refer to a macro, an inline function, or even a normal function (we said we are aiming for constant time, not speed; if the function is not inlined, it will be easier to check that it was compiled to constant-time assembly code).
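
For illustration, here is one possible shape for such a helper. This is only a sketch of the idea, the same branchless idiom as above hidden behind a descriptive name, not code taken from any particular library:

#include <stdint.h>

/* Returns p1 if c is nonzero, p2 otherwise, without branching on c.
   The name and the body are illustrative; checking the generated assembly
   remains necessary for every compiler/target pair. */
static void * constant_time_select(int c, void * p1, void * p2)
{
  uintptr_t mask = -(uintptr_t)!!c;  /* all ones if c != 0, all zeroes otherwise */
  return (void*)(((uintptr_t)p1 & mask) | ((uintptr_t)p2 & ~mask));
}

If it is kept as a non-inlined function, the compiled code for this one function is short enough to be reviewed by hand.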

This matters to anyone who has to process the code. It matters to the human reader. It matters to C static analyzers loosely based on the principles of abstract interpretation. Our first reaction to the code early in this post was that it is nonsensical. This is the correct reaction for arbitrary values of c. An analyzer working its way through the code in which the snippet is included may not have inferred that c is always 0 or 1 here. Even if the analyzer has inferred this information, unless it contains special pattern-detection for the idiom at hand, it will attempt to extract the meaning of (void*)(((uintptr_t)r & mask) | ((uintptr_t)p2 & ~mask)) for general values of c, p1 and p2, because it gives meaning to programs by applying a small set of general rules designed to describe all cases. The analyzer will then, quite correctly, just like we did, arrive at the conclusion that the code is nonsense. If a function (or even a macro) with an explanatory name is used, then the operator can easily enough, for the purpose of functional verification, change the function’s implementation to its functional if-based equivalent.

It matters to other types of analyzers, too. Consider the popular testcase generator american fuzzy lop. afl measures, and aims at maximizing, coverage. With the branchless version based on bit manipulations, afl cannot know that there are two cases to cover. If the if-based equivalent is used, afl will know that there are two separate behaviors to test here, and can aim at producing testcases that exercise both.

Conclusion

I will leave the safety of my detail-oriented habits, go out on a limb, and state that expressing intent in code is good for everyone.

Acknowledgments: Ben Laurie, Stephen Canon and John Regehr provided some of the parenthetical notes in this post.

Handling security bugs is not the Open-Source community’s strong point

A funny story

Here is a funny story. Earlier this century, I moved into an apartment with electrical heating, so that during Winter, electricity is free(*) while I am home. Over the course of several years, I helped simulate proteins folding onto themselves or look for the optimal Golomb ruler of order 27. Last year, I decided to teach myself SIMD assembly by hacking an scrypt cryptocurrency miner. This is something you cannot do with protein-folding or Golomb-ruler-enumerating software: the results of these are centralized and the central authority, benevolent as it may be, does not trust you not to mess up the software if you got access to the source code. With cryptocurrencies, it’s every man, woman or fish for themselves, so if you can write a better miner, good for you. And if you introduce a bug and send bad results, you can trust your pool to promptly ban you (the protocol between pool and miner is designed to allow the pool to check the validity of proof-of-work submissions by miners, if only because rewards are distributed pro rata of the work done). Earlier this year, I downloaded the source code of a client and started tightening things a bit.

Earlier this week, while my eyes rested on a bit of code that was not part of the computational core, what looked like a serious security bug stared back. A malicious pool could send extra-long strings to a miner, who would, at the time of preparing a reply, incorporate these strings in its answer. The answer was prepared in a buffer of a fixed size, on the stack, and it did not seem like lengths were checked at any point. The buffer was large enough to contain any substrings of reasonable lengths, but the client did not check that the server only sent reasonable substrings to incorporate.
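
The pattern at fault looked roughly like the sketch below. The names and buffer size are invented, not the actual cpuminer code, but the shape is the same: server-supplied strings formatted with %s into a fixed-size stack buffer, with no length check anywhere on the path (the JSON format string is the one visible in the disassembly later in this post):

#include <stdio.h>

void submit_work(const char *user, const char *job_id, const char *xnonce2,
                 const char *ntime, const char *nonce)
{
  char s[345];  /* fixed-size buffer on the stack; size is illustrative */

  /* If the pool sends a job_id (or any other field) longer than expected,
     sprintf writes past the end of s. */
  sprintf(s, "{\"method\": \"mining.submit\", "
             "\"params\": [\"%s\", \"%s\", \"%s\", \"%s\", \"%s\"], \"id\":4}",
          user, job_id, xnonce2, ntime, nonce);

  /* ... send s to the pool ... */
}

A defensive version would either reject over-long fields when they are parsed from the pool's messages, or use snprintf and treat truncation as an error.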

(*) as long as it is 100% converted to heat, up to the point where the electric radiator’s thermostat remains off all the time

A bug report

Long story short, I reported the bug today. I only knew the author of the source code from their GitHub account, and GitHub only seems to have public bug reports, so my report was public, but who looks at bug reports for network-connected software on GitHub? Not the NSA I hope!

The bug was fixed in 40 minutes, faster than the nearest Chinese restaurant delivers takeouts on Saturday nights. This is an incredible response time, so I am going to say it again: on a Saturday, 40 minutes after reporting the bug, a fix was available for anyone to use. The fix looks sound, although Miod Vallat would like to point out that the result of malloc should be tested.

The problem now, of course, is the propagation of this bugfix. Cryptocurrency miners are Open-Source at its frictionless best. “Bazaar” does not even begin to describe the joyous mess of contributions. Anyone with an idea can fork an existing client, and many have. The code I started from was already a fork of a codebase known as “cpuminer”, and the version I used as starting point has also been used as the basis of video-card-targeting miners, which change the computing paradigm but reuse the network code. The ancestry goes all the way back to a Bitcoin-only client.

My initial idea was that I would wait for 90 days, and then look where the bugfix had propagated and make a blog post about that. I started to worry that despite reporting the bug to the place where I had found the software, I had handled the situation carelessly, so I started to investigate the means at my disposal to have the issue registered as a CVE.

The punchline

This is when I found that I didn’t have anything to do: the vulnerability I had stumbled upon had already been made public in 2014. On the plus side, this meant the study I wanted to make three months from now about the propagation of security fixes in decentralized Open-Source software could be done immediately.

So what is the current state of mining software, since a serious vulnerability was published three months ago? Until this afternoon, you could compile pooler-cpuminer from sources, using the absolute latest GitHub snapshot, and get software with a vulnerability that had been public for three months. If, last month or today, you google for a binary version of a miner for an Nvidia video card, you are likely to get an archive called something like CudaMinerOSX-2014-02-28, which predates the publication of the bug and appears to include it according to a quick disassembly:

$ otool -V -t ~/CudaMinerOSX-2014-02-28/CUDA-5.5/cudaminer.10.9
…
_submit_upstream_work:
00000001000031c0 pushq  %r14
00000001000031c2 pushq  %r13
00000001000031c4 movq   %rsi, %r13
00000001000031c7 pushq  %r12
00000001000031c9 pushq  %rbp
00000001000031ca movq   %rdi, %rbp
00000001000031cd pushq  %rbx
00000001000031ce subq   $0x190, %rsp
…
00000001000032ff leaq   0x3811a(%rip), %rcx     ## literal pool for: "{\"method\": \"mining.submit\", \"params\": [\"%s\", \"%s\", \"%s\", \"%s\", \"%s\"], \"id\":4}"
0000000100003306 movq   %rax, (%rsp)
000000010000330a xorl   %eax, %eax
000000010000330c leaq   0x20(%rsp), %rdi  ; First argument of sprintf
0000000100003311 movq   %r12, 0x10(%rsp)
0000000100003316 movq   %rbx, 0x8(%rsp)
000000010000331b callq  0x100039816             ## symbol stub for: ___sprintf_chk
…

The above ready-made binary is what I have been using to convert additional Watts to warmth. I downloaded a binary version because XCode no longer includes GCC and Clang lacks some critical component to compile CUDA code and computers are horrible. But would I have fared better if I had compiled from source? Apparently not, unless the lengths of work->job_id and xnonce2str are tested elsewhere.

Conclusion

Recent years have shown that the “many eyes” quip did not work for security bugs. One can see why it would work for functional bugs: with enough motivated users of the same software, you stand a good chance that the bug that could have annoyed you has already annoyed someone else who was able to fix it. This reasoning does not apply to security bugs. The one described here does not prevent any of the many clients from working with a pool that does not send 500-byte-long job ids.

The anecdote in this post shows another bleak trend: not only does Linus’ law not apply to security bugs, but at its most disorganized, the Open-Source community does not seem to be very good at propagating, even for known security bugs, either fixes from source to binary distributions or the information that a vulnerability exists from fork to fork.

Ack: in addition to pooler fixing the discussed bug crazy fast and Miod Vallat pointing out flaws in pooler’s patch even faster, John Regehr once provided a link to David Andersen’s story that I cannot find again now. David Maison and Laureline Patoz helped with the general tone of the post.

Printing an unsigned char with %x or %u is not absolutely correct

The C standard library function printf() takes a format string and subsequent arguments of various types. On an architecture where arguments are passed on the stack, the format string tells the printf() function what types it should interpret the blob of arguments with. On an architecture where printf()‘s arguments are passed by (possibly specialized) registers, the format string tells printf() in what registers it should look for them.

For this reason, the arguments of printf() after the first one should match the indications of the format string. The C11 standard leaves no room for deviation:

7.21.6.1:9 […] If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

A common C pattern used for printing bytes is the following:

#include <stdio.h>

unsigned char u = …;
printf("%02x", u);

The format %x is well-known for expecting an unsigned int. The problem in the above code snippet is that printf() is not passed an unsigned int. “Default argument promotions” are applied to printf()‘s arguments after the format string. On most architectures, an unsigned char is promoted to int, because int can represent all the values contained in unsigned char.

This mistake is quite harmless, because the types int and unsigned int are guaranteed to have the same representation, and it seems unlikely that an ABI would pass int and unsigned int in different registers. However, if you like writing for the C standard instead of for the couple of platforms that happen to be popular at the moment, you had better use the line below instead:

printf("%02x", (unsigned int) u);

This time, the type of the second argument clearly matches the format string in the first argument.
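
Since C99, another way to make the conversion specification and the argument agree is the hh length modifier, which tells printf() that the argument was an unsigned char before the default argument promotions were applied:

printf("%02hhx", u);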

An implementation of strings that you can trust to the end of the world

Last week, security researcher lcamtuf blogged about CVE-2014-8485, a vulnerability in the Unix strings command. The strings command is a popular way to get hints when stuck in the venerable ADVENT game. Some may also use it to reveal sequences of printable characters other than the verbs in ADVENT, but the truth is that running strings on a modern binary file is a depressing experience. On my computer, strings `which frama-c` shows that the 15MiB binary contains 111 copies of the string “volatile”, 53 copies of the identifier name “vlogic_ctor_info_decl”, and two copies of “verify_unmodified: %s has conflicting definitions. cur: %a”, a debug message not intended to ever be seen by the user. And I am only going through the Vs!

In fact, the entire Turbo Pascal compiler could probably fit in the space occupied by useless messages that start with “V” in Frama-C:

$ strings `which frama-c` | grep ^[vV] | wc
3638 4788 46623

Coming back to the vulnerability, it came as a shock to Unix users—unfortunately that vulnerability name was already taken. The purpose and function of the strings command is to scan binary files of unknown origin and/or mysterious design, and suddenly lcamtuf tells us that doing so could lead to arbitrary code execution!

This has prompted a series of very short implementations of the basic functionality that strings is supposed to provide, in Python and in C. Depressing references to the evolution of the sizes of binaries notwithstanding, there is no reason for strings to be written in C, except perhaps if you anticipate that you will need to run it from a boot disk. Python seems, on the face of it, like an excellent choice. Java or OCaml would also have been reasonable. Regardless, Twitter user nils offered at one point the C implementation below:

a;main(c){for(;~c;putchar(isprint(c=getchar())?a++,c:a<4|(a=0)?13:10));}

And at another point, athoshun proposed the shorter:

c;main(a){while(~c)putchar(isprint(c=getchar(++a))?c:(6>a)*3+9+(a=1));}

These two definitions are remarkably concise; they both rely on the same trick of moving some of the responsibility onto the terminal by clever use of the “carriage return” character 13 in ASCII. This means that these implementations fool the user into thinking that they work, but the illusion doesn't hold when the output is not directly interpreted by a terminal:

$ ./a.out < a.out
__PAGEZERO
…
dyld_stub_binder
$ ./a.out < a.out | more
^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M^M ^M^M^M^M^M^M^M^M^MH^M^M^M__PAGEZERO
…

Ignoring the non-standard prototype for main and the concise declarations a; and c; that are par for the C code golf course, the versions above are remarkable in that they both invoke undefined behavior in line 1:

warning: undefined multiple accesses in expression. assert \separated(&a, &a);

This is just a blip caused by the race to conciseness. The expression a<4|(a=0)?13:10 is intended as a shorter alternative to a<4||(a=0)?13:10, but unfortunately, | does not introduce the sequence point that is needed between the assignment a=0 and the use a<4. The situation is similar with (6>a)*3+9+(a=1) in the second implementation.

Other versions of strings, written by the same authors, that do not suffer from the unsequenced-side-effects undefined behavior, are:

a,c;main(){for(;c+1;putchar((isprint(c=getchar()))?a++,c:a<4||(a=0)?13:10));}

a=9;main(c){while(~c)putchar(isprint(c=getchar())?a/=2,c:(a=a?13:10));}

Are these safe? We need a specification for getchar() to find out:

/*@ ensures -1 ≤ \result ≤ 255 ; */
int getchar(void);

Without this specification, it looks like c+1 in the first implementation might be a signed arithmetic overflow (which is undefined behavior). With the specification, it is clear that it isn't, since c is initially 0 and then the result of getchar(), which is at most 255.

Once the specification is provided:
$ tis-analyzer good_string1.c

warning: signed overflow. assert a+1 ≤ 2147483647;

$ tis-analyzer good_string2.c

The first implementation still has a slight issue: tis-analyzer is unable to demonstrate that, with the information provided, the variable a does not eventually overflow. Looking at the code, we can see that this can only happen for sequences of printable characters longer than 2³¹ characters. A simple way to ensure that it doesn't is not to trust the outside world with files larger than 2GiB.
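
Alternatively, the counter can be made to saturate, in which case no input can make it overflow. Here is a readable (not golfed) variant of the first implementation with a saturating counter; this is my own sketch, not one of the cited authors' programs:

#include <ctype.h>
#include <stdio.h>

int main(void)
{
  int a = 0;  /* length of the current run of printable characters, saturated at 4 */
  int c = 0;
  while (c != EOF) {
    c = getchar();
    if (isprint(c)) {
      if (a < 4) a++;     /* saturating increment: no overflow possible */
      putchar(c);
    } else if (a < 4) {
      putchar('\r');      /* short run: let the terminal overwrite it */
    } else {
      a = 0;
      putchar('\n');      /* run of 4 or more printable characters: keep it */
    }
  }
  return 0;
}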

The second implementation is free of undefined behavior for as many characters as you want to feed it. Since the functionality it intentionally provides is not the remote execution of arbitrary code, you can pass any sequence of characters to it, confident in the knowledge that it will not accidentally provide this functionality through undefined behavior to the malicious author of a perplexing file. Consider this implementation recommended. In fact, I'll run it on frama-c right now:

$ ./a.out < `which frama-c`
…
… … … oh, no! … … …
_unlink
_usleep
_utime
_wait
_waitpid
_write
dyld_stub_binder

In conclusion, much of the software around us seems to be too complex for its own good: OpenSSL, Bash, strings, … but we do not have to use the complex versions. Simpler implementations of just the right functionality are also available or can be written—though aiming for less than 80 characters is a bit extreme. It is easier to trust that the simpler versions do what you want and only what you want. When necessary, the simpler implementations can be formally verified to reach yet greater levels of confidence.

Bash bug: failure of formal approaches to programming?

You may have heard about Shellshock, a long-standing, recently-discovered bug with serious security consequences.

Julien Vanegue commented about this security failure on Twitter:

The bash bug is another shiny example which seems to completely escape traditional methods of automated deductive program verification.

This was my initial reaction too. A previous security failure, Heartbleed, was a recognizable programming error that, to be found, required no explicit specification. An OpenSSL server, astutely manipulated by a distant adversary, read up to 64 KiB from a much smaller buffer, thus getting whatever data happened to lie next in the server’s memory, and sent that data back to the adversary. Reading beyond the bounds of the buffer was the programming error. This error is comparatively easy to recognize because such an operation is never part of the intentional behavior of any program. It may or may not create a security flaw: in the case of Heartbleed, the unfortunate follow-up to the out-of-bounds read was to send what had been read verbatim to the adversary. But out-of-bounds accesses are never good; an automatic verification tool doesn’t need to worry that one it thinks it has found may be normal; you don’t need to worry that you will remove necessary functionality if you fix it; and if you fix all of them, you can be confident that no security flaw based on such an out-of-bounds access remains in the software.
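
In schematic form, and heavily simplified (the function and variable names below are invented; the actual OpenSSL code is structured differently, and its response buffer was large enough for the copy: the problem was the read), the Heartbleed pattern was:

#include <string.h>

/* Schematic reconstruction of the flaw, not the actual OpenSSL code.
   'request' is the received heartbeat message of size request_len;
   'response' is the buffer in which the reply is built. */
size_t build_heartbeat_response(const unsigned char *request, size_t request_len,
                                unsigned char *response)
{
  unsigned int payload_len = (request[1] << 8) | request[2]; /* length field from the message itself */
  const unsigned char *payload = request + 3;

  /* request_len is deliberately unused here: that is the bug. Nothing checks
     that payload_len fits inside the received message, so the copy can read
     up to 64 KiB past the end of 'request', and that memory is sent back. */
  memcpy(response, payload, payload_len);
  return payload_len;
}

The missing comparison between the length field and the actual size of the received message is the whole bug.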

In contrast, the problem with Shellshock is that Bash, a program the sole purpose of which is to execute arbitrary commands, sometimes executes arbitrary commands. One would have to formalize the specification of what arbitrary commands Bash is and isn’t supposed to execute at a very fine level of detail before the incorrect behavior exploited by Shellshock stood out.

Now, I work at a company that sells products and services based on the sort of formal verification techniques that, used in isolation, aren’t practical to identify the Bash bug. And Julien used to work on techniques of the very same category, but he has recently widened his interests to include more facets of information security. So our common conclusion both saddens me and vindicates him. No wonder he commented before me.

Still, since Julien has commented and I have been thinking about the same subject, here are some additional remarks of my own:

  • First, there is a bit of survival bias in the observation that this flaw existed despite automatic verification techniques of all kind. Bugs are found routinely with these techniques, in both legacy and new code. People who find them fix them and move on to the next one. We don’t hear about them so much when they are found before the software is released, or when they are found as part of a collection of bugs found automatically.
  • TrustInSoft provides “verification kits” for Open-Source components. The first such verification kit is for PolarSSL, an SSL library providing the same kind of functionality as OpenSSL. One piece of information provided in the verification kit is a list of the library functions that PolarSSL, configured as in the kit, calls. This list does not include system, exec, or any system call that would similarly give great power to whoever was able to control its arguments. This confirms that executing arbitrary command is not part of the intended functionality of PolarSSL as configured. The verification kit also demonstrates why PolarSSL does not unintentionally provide this functionality, by showing that it doesn’t have the sort of undefined behavior that caused the problem in the case of Heartbleed. If a customer asked for a verification kit for Bash, we wouldn’t say that exec isn’t called from Bash, nor that we can confirm that there is no dataflow between user inputs and the arguments of exec. We would probably, instead, work with the customer to understand how Bash came to be part of the security equation and how to remove it.
  • Finally, is Shellshock a failure of all formal techniques and tools that have been created since people started to write programs that manipulate programs? No, quite the contrary. Formalisms exist to describe the shape of a well-formed program. These grammars are even called “formal grammars” to distinguish them from natural language grammars. The parsing bug we are talking about is typical of hand-written parsers. Using widely available tools to mechanically generate the parser from a formal grammar does not make bugs impossible, but it makes much less likely both unintended functionality like Shellshock and undefined behavior of the kind everyone has been looking for since we were reminded that Bash parses environment variables. In other words, Bash executing code after a closing brace is a sign that old, proven formal tools should be used more, not that they failed.
  • To conclude, Shellshock’s gravity should not be underestimated, because of its impact on old systems initially intended for a more benevolent Internet. But any recently designed system should already be avoiding Bash, or at the very least avoid feeding Bash any input that comes from the outside world. Here is another take on the same topic. Key quote: “A robustly engineered program should intersect with bash, or any other Turing-complete system, at as few points as possible”.

    A dangling pointer is indeterminate

    This blog post illustrates a lesser-known case of C undefined behavior: using the value of a dangling pointer in a way that most developers consider harmless, such as pointer arithmetic or as an operand of a comparison. Anyone who has ever had to debug an erratic C program knows that dangling pointers should not be dereferenced. In truth, according to the C standard, all further uses of the value of a pointer after the end of the lifetime of the pointed object are forbidden. The post provides examples of apparently harmless, but still invalid, uses of a dangling pointer. One is taken from the current snapshot of ntpd.

    A synthetic example

    Here is a first example:

    
    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    int main(int argc, char *argv[]) {
      char *p, *q;
      uintptr_t pv, qv;
      {
        char a = 3;
        p = &a;
        pv = (uintptr_t)p;
      }
      {
        char b = 4;
        q = &b;
        qv = (uintptr_t)q;
      }
      printf("Roses are red,\nViolets are blue,\n");
      if (p == q)
        printf ("This poem is lame,\nIt doesn't even rhyme.\n");
      else {
        printf("%p is different from %p\n", (void*)p, (void*)q);
        printf("%"PRIxPTR" is not the same as %"PRIxPTR"\n", pv, qv);
      }
    }
    

    What does the above program do? Many would see that it depends on whether the compiler allocates the variables a and b at the same address. Without optimization, a and b are likely to be allocated to different stack slots, and the program prints a beautiful poem:

    $ clang t.c && ./a.out 
    Roses are red,
    Violets are blue,
    0x7fff53c28bbf is different from 0x7fff53c28bbe
    7fff53c28bbf is not the same as 7fff53c28bbe
    

    Even at the minimum level of optimization, another compiler may allocate a and b at the same address, since they are never simultaneously alive:

    $ gcc t.c && ./a.out 
    Roses are red,
    Violets are blue,
    This poem is lame,
    It doesn't even rhyme.
    

    Finally, in some cases, one may get the output below:

    $ gcc -O2 t.c && ./a.out 
    Roses are red,
    Violets are blue,
    0x7fffb4e003bf is different from 0x7fffb4e003bf
    7fffb4e003bf is not the same as 7fffb4e003bf
    $ gcc -v
    …
    gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 
    

    The above result introduces a whole new branch of mathematics: the execution enters the branch where p != q, and then proceeds to display values for p and q that make the condition p == q true. This would make an excellent pitch for a special programming episode of The Twilight Zone.

    You may think that p and q being passed to function printf() is the compiler’s excuse for generating an executable that behaves strangely, but in fact this printf() can be commented out, leaving only pv and qv, which are integers computed while the addresses are in-scope, to display strange values. The undefined behavior justifying the above output is indeed only the condition p == q.

    An explanation

    The clause 6.2.4:2 of the C11 standard defines the lifetime of an object:

    The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

    The third sentence of this paragraph is the well-known dangling-pointer-dereference undefined behavior, cause of the weakness CWE-416. The subject of this blog post is the last sentence: the value of the dangling pointer itself. It is not only the pointed object but any copy of the value of a freed address that causes undefined behavior when used.
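
    A minimal illustration of the same rule with free(), before looking at a real-world occurrence (this tiny example is mine, not taken from any codebase):

    #include <stdlib.h>

    int main(void)
    {
      char *p = malloc(1);
      if (p == NULL)
        return 1;
      free(p);
      /* Undefined behavior: the value of p is indeterminate after free(),
         even though p is only compared here, never dereferenced. */
      if (p == NULL)
        return 2;
      return 0;
    }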

    Dangling-pointer undefined behavior in the wild

    A concrete instance of this kind of undefined behavior can be found in ntpd 4.2.7p446. At line 4421 of file ntpd/ntp_io.c, the definition of function delete_interface_from_list is:

    static void
    delete_interface_from_list(endpt *iface)
    {
        remaddr_t *unlinked;
    
        do {
            UNLINK_EXPR_SLIST(unlinked, remoteaddr_list, iface ==
              UNLINK_EXPR_SLIST_CURRENT()->ep, link,
              remaddr_t);
    
            if (unlinked != NULL) {
                DPRINTF(4, ("[...] %s [...] #%d %s [...]\n",
                  stoa(&unlinked->addr), iface->ifnum,
                  iface->name));
                free(unlinked);
            }
        } while (unlinked != NULL);
    }
    

    In this function, when the pointer unlinked is not null at the if statement, it is freed and, according to the standard, its value becomes indeterminate. The condition of the do while loop is thus undefined behavior on those iterations.

    In order to avoid this undefined behavior, this function could be rewritten by changing the do while loop into an infinite loop (using a while (1) construct) and exiting the loop with a break; statement when the pointer unlinked is known to be null (in the else branch of the conditional).

     static void
     delete_interface_from_list(endpt *iface)
     {
         remaddr_t *unlinked;
    
    -    do {
    +    while (1) {
             UNLINK_EXPR_SLIST(unlinked, remoteaddr_list, iface ==
               UNLINK_EXPR_SLIST_CURRENT()->ep, link,
               remaddr_t);
    
             if (unlinked != NULL) {
                 DPRINTF(4, ("[...] %s [...] #%d %s [...]\n",
                   stoa(&unlinked->addr), iface->ifnum,
                   iface->name));
                 free(unlinked);
    +        } else {
    +            break;
             }
    -    } while (unlinked != NULL);
    +    }
     }
    

    Conclusion

    The ntpd software package is a venerable program, and if a compiler optimization caused it to misbehave, the fact would likely have been noticed. There are no security consequences to this bug as far as one can tell.

    One can blame the compiler for the first example. This debate is not for us to arbitrate, but note that even if GCC’s behavior on the first example was reported as a GCC bug, this would be unlikely to result in a change: the well-meaning answer would be “this is undefined behavior anyway”.

    One can also wishfully believe that optimizing compilers such as GCC and Clang will never go as far as miscompiling the second example. Unfortunately, no one can tell how smart tomorrow’s compiler will be. An optimizing compiler can remove randomness from a pseudo-random number generator that uses uninitialized variables, or optimize a useful computation into an infinite loop. Because optimization based on undefined behavior is an exercise in absurdity, there is no telling what the consequences might be if a compiler became smart enough to notice that the function delete_interface_from_list uses a dangling pointer. One nightmare scenario is if the compiler inferred that the condition unlinked != NULL within the do is false for all defined values of unlinked, the only values that matter. Such a fact can be inferred with purely local reasoning, and justifies compiling the condition as “always false”. The consequence of this undefined behavior, if it caused this optimization, would be a resource leak where only the first matching element of the linked list would be removed, instead of all of them. With any secondary consequence that the non-removal of elements after the first one might have too.

    This bug turned up during the initial steps of a verification of ntpd with TrustInSoft Analyzer. Of all the bugs that can be found this way, this one is rather in the harmless-interesting quadrant: not every bug needs a blog post just to argue that a bug exists. Other bugs, such as buffer overflows, are boring and dangerous. If pushed to its conclusion, the verification process will allow us to be quite sure that neither boring nor interesting bugs remain within the chosen verification perimeter.

    Ack

    This post benefited from insights from my co-author Pascal Cuoq, as well as John Regehr, Harlan Stenn, Juergen Perlinger (who dug up this related discussion on comp.std.c), Lukas Biton, and Miod Vallat.

    An earlier version of this post contained an error in the reasoning about the possible consequences of the undefined behavior in ntpd. Reddit user rabidcow pointed out the error.

    Finding vulnerabilities in small, challenge-like C programs

    About your hosts

    I’m Pascal Cuoq, chief scientist at TrustInSoft. This is the first of a short series of technical essays, published here on a trial basis. The essays may resemble in style ones that I contributed over the years to the Frama-C blog.

    TrustInSoft is a young company, created to provide products and services based on Frama-C, working in close collaboration with CEA LIST, where Frama-C continues to be developed.

    Short, challenging programs

    We have received several short C programs, each embodying one difficult security vulnerability. These programs may start with:

    int main(int argc, char *argv[]) {

    Here we encounter the first difficulty. As long as we were analyzing safety-critical C code, the programs that were verified with Frama-C were well-understood sub-components of larger systems. They had been developed according to specifications written in advance, and our responsibility was to check that these sub-components worked according to their specifications. We could be verifying that the post-conditions were established before control was returned to the caller. We could also simply be verifying that the code did not contain implementation flaws leading to undefined behavior. Even in the latter case, we were making use of the sub-component’s specifications: we were only verifying the absence of undefined behavior when the sub-component was invoked according to the pre-conditions it had been designed to expect. The pre-conditions were simple and unambiguous, because the system designers of safety-critical systems value simplicity and unambiguity.

    In security, cheating is the rule. The vulnerabilities in the small examples we received are revealed by a malicious user invoking the program with a 2000-character-long argument, or with 500 arguments. It is a different world; but is there something that, with our safety-oriented habits, we can do?

    Assumptions about main()’s arguments

    The 5.1.2.2.1 section of the C11 standard, Program startup, describes the conditions that can be expected. Quoting the clause 2 from that section:

    If they are declared, the parameters to the main function shall obey the following constraints:

    • The value of argc shall be nonnegative.
    • argv[argc] shall be a null pointer.
    • If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup. The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment. If the host environment is not capable of supplying strings with letters in both uppercase and lowercase, the implementation shall ensure that the strings are received in lowercase.
    • If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.
    • The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

    Say that we are writing a privileged application (it could be a Unix program with the setuid bit set). How much of the above specification can we trust? The above clause says that argv contains “pointers to strings”, by which it means zero-terminated sequences of characters. However, a malicious user could perhaps pass a non-zero-terminated sequence of characters as an argument. Bear in mind that the malicious user of our hypothetical Unix system is not even limited to what eir shell allows: this user can also call the function execve() directly.

    The function execve() has the prototype:

    int execve(const char *path, char *const argv[], char *const envp[]);

    The argv argument has the same type and name as the argv argument passed to the main() function of the privileged program. This is worrisome: if the caller of execve() invokes it on purpose with an array of pointers to ill-formed strings, can a buffer overflow result in the program being executed? Can an argument contain an embedded '\0'? Can the caller maliciously make argc negative, in hope of making the executed program misbehave somehow?

    The fact that execve() expects an array of well-formed strings means that the behavior is undefined if execve() is passed an array of pointers that do not point to well-formed strings. But we are working in security now. We cannot simply tell collaborative sibling components of the system please not to do that: the caller of execve() is our adversary and will use any trick that works in order to derail execution of the program. The question is for us to determine whether the undefined behavior of calling execve() with malformed arguments is harmful to security. Especially in light of the fact that it is impossible to implement in pure C an execve() function that determines whether a pointer points to a well-formed string.

    How execve() would work if I was in charge

    I cannot tell you how your operating system does it—the applicable standards only seem to describe properties the language and the OS must have to make programs work, not security guarantees, or perhaps I am looking in the wrong place. But here is how execve() would work if I had to implement it.

    POSIX documents a size limit on the list of arguments, ARG_MAX. What execve() could do is:
    – reserve a counter argc and set it to 0.
    – reserve a memory area M of size ARG_MAX, and while it is not full:
    – obtain a pointer from argv. If the pointer is null, we are done.
    – otherwise read characters from that pointer and copy them to M until either '\0' has been encountered or M is full. If M becomes full, fail with E2BIG. Increment argc.

    If a segmentation violation occurs while reading from argv or while reading characters from a pointer obtained from argv, it means that execve()’s arguments were ill-formed. So much the better: they did not have consequences, in the sense that we avoided transferring control to the program with ill-formed main() arguments. As long as ARG_MAX is chosen smaller than INT_MAX, then argc cannot overflow during the above loop, because at least one character is copied to M for each time argc is incremented.
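
    In C-like form, and with invented names (a real execve() lives in the kernel and does not look like portable C), the loop described above could be sketched as:

    #include <errno.h>
    #include <stddef.h>
    #include <limits.h>   /* ARG_MAX, when the system defines it there */

    /* Copies the strings designated by argv into the fixed-size area M of
       size ARG_MAX. Returns the argument count, or -E2BIG if they do not fit.
       A fault while reading argv or the strings means the caller passed
       ill-formed arguments, and control is never transferred to the program. */
    static long copy_arguments(char *const argv[], char *M)
    {
      long argc = 0;
      size_t used = 0;
      for (;;) {
        const char *s = argv[argc];
        if (s == NULL)
          return argc;             /* argv was properly null-terminated */
        do {
          if (used == ARG_MAX)
            return -E2BIG;         /* argument list too long */
          M[used++] = *s;          /* at least one byte copied per argument */
        } while (*s++ != '\0');
        argc++;
      }
    }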

    As a side note, I did try to create a 2147483648-argument-long argv on the IL32P64 Unix-like operating system I routinely use, and to pass it to execve(). I was hoping that the invoked program would receive a negative argc, which would have been a minor security issue justifying further investigation inside privileged programs. It did not work: the execve() call failed instead. According to this table, GNU Hurd has an unlimited ARG_MAX, which almost motivates me to install it just to see if it is possible to make argc negative there.

    It should also be noted that many programs actually fail if invoked with argc == 0 and argv containing only a null pointer. Bearing in mind that this is within specifications, and that this is easy to achieve by calling execve() directly, finding a widely-used setuid program that exhibits interesting behavior when invoked with these values of argc and argv is left as an exercise to the reader.
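
    For the curious, invoking a program with argc == 0 takes only a few lines once execve() is called directly; the target path below is of course hypothetical:

    #include <unistd.h>

    int main(void)
    {
      char *argv[] = { NULL };  /* argc will be 0: argv[0] is already the terminating null pointer */
      char *envp[] = { NULL };
      execve("/usr/bin/some-setuid-program", argv, envp);  /* hypothetical path */
      return 1;                 /* reached only if execve() failed */
    }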

    An analysis entry point

    To summarize what we have seen so far, the guarantees offered by the C standard on the arguments of main() can be relied on.

    In order to analyze for security C programs with main() functions, using verification tools that reason from pre-conditions, one can always use an analysis entry point as follows:

    void analysis_entry_point(void) {
      int argc = nondeterministic_choice_between(0, N-1);
      char **argv = malloc((argc+1) * sizeof(char *));
      for (int i=0; i<argc; i++)
      {
        int size = nondeterministic_choice_between(0, N-1);
        argv[i] = malloc(size + 1);
        for (int j=0; j<size; j++)
          argv[i][j] = nondeterministic_choice_between(1, 255);
        argv[i][size] = 0;
      }
      argv[argc] = 0;
      main(argc, argv);
    }
    

    The above analysis entry point has no arguments and thus does not need to assume anything about them. It creates all possible values of argc and argv and passes them to the actual main() function. For a large enough value of N (depending on the operating system), the above piece of code creates all possible arguments that the program can effectively receive.

    This could be a good analysis entry point to use in some contexts, including when discussing the notion of analyzer soundness (absence of false negatives), which always seems to give interlocutors a hard time. Obviously, in practice, the above entry point contains a lot of difficulties for only theoretical gains, and it is better to obtain a more specific description of the arguments the program is supposed to expect, or to reverse-engineer, from the first lines of its main() function, interesting numbers of arguments to analyze the program with.

    Instead of a bit of program that builds all possible arguments the program may be invoked with, the allowable assumptions about main()’s argument can also be expressed as an ACSL contract for the program’s main() function. The contract may look something like this:

    /*@
      requires 0 ≤ argc < N ;
      requires \valid(argv + (0 .. argc));
      requires argv[argc] == \null;
      requires \forall integer i; 0 ≤ i < argc ==>
                 \valid_string(argv[i]) && \string_length(argv[i]) ≤ N;
    */
    
    int main( int argc, char ** argv ) {
      …

    The two versions express the same assumptions. The practical differences between one version and the other are not important. The important idea is that both versions capture any argument list a malicious user can throw at a program. If a privileged application is invoked with argc and argv arguments that are not captured by the formalizations above, it is a flaw in the operating system, not in the application. If the application misbehaves when invoked with argc and argv corresponding to our formalizations, then it is a flaw in the application. But we have a chance to detect it, and the first step before detecting it was to write down the hypotheses under which we are going to work.

    Conclusion

    Security is perhaps not too different from safety after all: in both fields, one must be aware of one’s assumptions. The aura of cybersecurity is that the villain wins by cheating, by circumventing the rules. Instead, a better way to look at the difference may be that in security the rules exist and are enforced seriously, but that they are more complex and that the average software developer is not aware of them as acutely as the developer of safety-critical software is aware of the rules that apply to em.

    The example of the arguments of main() is mostly useful to illustrate this. It comes up in ten-line programs submitted to us during technical discussions, but not so much in actual study of security-critical code. As a data point, when analyzing a generic SSL server based on PolarSSL, the question of main()’s arguments did not even come up. There were unknown inputs to verify that the code was robust to. These inputs originated from the network instead, through a reception function that had to be registered via PolarSSL’s API, as alluded to in this previous blog post about PolarSSL.

    Acknowledgements: Julien Cretin, Lukas Biton, Anne Pacalet, StackOverflow users Duck and Oli Charlesworth, and Twitter users lahosken, xexd and pestophagous.

    PolarSSL Verification Kit


    An SSL Stack free of flaws

    The PolarSSL library (now known as mbed TLS) is a dual-licensed (GPLv2 or proprietary) implementation of the SSL and TLS protocols, plus the respective cryptographic algorithms and support code required. Since SSL libraries play such a crucial role in internet security, TrustInSoft developed a report that mathematically proves PolarSSL is immune to the most common security flaws.

    The PolarSSL Verification Kit is a report that describes how to compile, configure, and use PolarSSL in order to remain immune to CWE 119 to 127, 369, 415, 416, 457, 476, 562, and 690. We used TrustInSoft Analyzer (recognized by NIST) to generate this report.

    Benefits

    The PolarSSL Verification Kit guarantees the quality of the PolarSSL stack source code. We describe a PolarSSL configuration and provide the artifacts necessary for PolarSSL’s formal verification. If you follow the Verification Kit’s recommendations, you can be sure that zero-day exploits similar to Heartbleed are not present in your system’s SSL stack.


    Technology

    Such a verification is possible because it relies on formal methods present in TrustInSoft Analyzer. This source code analyzer uses mathematical techniques to detect any flaw that might be present. For this reason, TrustInSoft Analyzer is used to validate mission-critical software in objects such as airplanes, military cryptographic systems, nuclear reactors, space probes, and trains.

    To get a PolarSSL Verification Kit for another version or for mbed TLS, please contact us.