An old quirky libksba bug

The libksba library, used by GnuPG, provides functions for parsing X.509 cryptographic certificates. I was testing libksba with TIS Interpreter a little over a year ago. One of the bugs I found then illustrates a point I would like to make now.

The bug

Consider this function, as it was present in libksba two years ago:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  gpg_error_t err;
  AsnNode n;
  char *algo;
  size_t nread;

  if (!cert)
    return NULL;  /* Ooops (can't set cert->last_error :-().  */

  if (!cert->initialized)
    {
       cert->last_error = gpg_error (GPG_ERR_NO_DATA);
       return NULL;
    }

  if (cert->cache.digest_algo)
    return cert->cache.digest_algo;

  n = _ksba_asn_find_node (cert->root, "Certificate.signatureAlgorithm");
  if (!n || n->off == -1)
    err = gpg_error (GPG_ERR_UNKNOWN_ALGORITHM);
  else
    err = _ksba_parse_algorithm_identifier (cert->image + n->off,
                                            n->nhdr + n->len, &nread, &algo);
  if (err)
    cert->last_error = err;
  else
    cert->cache.digest_algo = algo;

  return algo;
}

The source code above contains the bug. The bug can almost be found by inspection of the function, inferring the contents of types and the behavior of callees from the way they are used. I have only removed commented-out code from the original. If you think that you have identified the bug I found one year ago, but that it may depend on how the function ksba_cert_get_digest_algo is used, you may be on the right track. Here is how ksba_cert_get_digest_algo is invoked in tests/cert-basic.c:

  oid = ksba_cert_get_digest_algo (cert);
  s = get_oid_desc (oid);
  printf ("  hash algo.: %s%s%s%s\n",
          oid?oid:"(null)", s?" (":"",s?s:"",s?")":"");

If on the other hand, even with this clue, you are still finding the entire function ksba_cert_get_digest_algo too tedious to review, here is a synthetic view that makes that bug stand out:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  …
  char *algo;
  …
  n = _ksba_asn_find_node (cert->root, "Certificate.signatureAlgorithm");
  if (!n || n->off == -1)
    err = gpg_error (GPG_ERR_UNKNOWN_ALGORITHM);
  else
    err = _ksba_parse_algorithm_identifier (cert->image + n->off,
                                            n->nhdr + n->len, &nread, &algo);
  if (err)
    cert->last_error = err;
  else
    cert->cache.digest_algo = algo;

  return algo;
}

The bug is that the automatic variable algo is not initialized when it is declared. When _ksba_asn_find_node succeeds, &algo is passed as an argument to the function _ksba_parse_algorithm_identifier (without looking at that function, we can assume that it initializes algo in all cases). But when _ksba_asn_find_node fails, because the condition !n || n->off == -1 is true, the variable algo remains uninitialized up to the point where its “value” is returned as the result of the function.
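A minimal sketch of one possible fix, keeping the structure of the function (the actual fix applied upstream, mentioned below, may differ in its details): initialize algo, and make the error path return NULL explicitly.

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  …
  char *algo = NULL;  /* initialized: the error path cannot return garbage */
  …
  if (err)
    {
      cert->last_error = err;
      return NULL;  /* explicit: no algorithm identifier was parsed */
    }
  cert->cache.digest_algo = algo;
  return algo;
}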

The anecdote

The funny thing about this bug is that it is very easy to produce inputs for. I was using afl, an efficient and easy-to-use fuzzer, to generate testcases. This bug was the second one I found, right after I set up afl and TIS Interpreter to work together on libksba. It turned up just after afl generated an input that demonstrated a crash caused by an unrelated first bug. Whenever the target software crashes, afl generates a README.txt file to invite the user to report their success to afl’s author. This is smart timing: when using a fuzzer to find bugs, producing an input that causes a direct crash is one of the best possible outcomes. afl’s README file is thus a subtle declaration of victory: you only see it when afl has done a good job.

Of all the inputs pretending to be X.509 certificates, the README, a pure ASCII file intended for human consumption, happens to be one that triggers the uninitialized-variable-use bug described in the first part of this post. Let those who would not have used the shell command for i in findings_dir/crashes/* ; do run_in_tis_interpreter.sh $i ; done throw the first stone.

No great coincidence happened here, nor was the generated README.txt file designed to look fiendishly similar to an X.509 certificate. Any brute-force, purely random fuzzer would have instantly produced a file that triggered the bug shown. In good time, afl would have found one too—and, importantly, it would have recognized that such an input caused a different execution path to be followed inside libksba’s cert-basic test program than less obviously incorrect certificates, and would have set it apart for later inspection.

Epilog

The crash initially found by afl was fixed in a timely manner by Werner Koch in commit a7eed17, and the uninitialized-variable-use bug described in this post was fixed in commit 3f74c2c.

I never formally reported any of afl’s numerous findings to afl’s author, despite the instructions in the input file. Well, this post can be my report: all the bugs reported by me in libksba in 2016 were found using afl and TIS Interpreter together. This post is but an example to illustrate how well these two tools work together.

afl had been used on libksba before (for instance this 2014 vulnerability is listed as having been found using afl). But although the uninitialized-variable-use bug is very shallow and can be evidenced by throwing any README file at cert-basic, the bug was probably missed because it did not cause a crash. Barring weird optimizations, when executing a binary not instrumented to detect the use of uninitialized variables, the variable’s value is whatever happened to be on the stack. The stack is full of valid addresses: addresses of variables and buffers that have previously served, but also addresses of call sites to return to. Reading from any of these addresses, it is likely that a zero will be found before the end of the page (code contains plenty of null bytes because of how constants in instructions are typically encoded, and data contains plenty of null bytes too, of course). This means that running cert-basic directly does not reveal, via a crash, that something wrong happened. All that happens is that instructions or (possibly secret) data are printed as if they were a string. Since there are no a priori expectations for what cert-basic should print when passed an invalid file, this is difficult to notice.

One solution, of course, is to have used MemorySanitizer (MSan for short) to compile libksba’s cert-basic. One might speculate that the bug was not found earlier because MSan’s instrumentation is incompatible with ASan’s, and that users of afl, if they use a sanitizer at all, use ASan in order to find the most dangerous memory bugs.
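For the record, a sketch of what that would look like; the file names are hypothetical, and the important practical point is that MSan requires the entire program, library sources included, to be instrumented:

$ clang -fsanitize=memory -g tests/cert-basic.c src/*.c && ./a.out README.txt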

TIS Interpreter takes more time to run a test program than it takes to compile and execute this test program with MSan or ASan instrumentation, but TIS Interpreter finds all the problems that these two sanitizers find, and more.

afl can produce minimal test suites that exercise all the different execution paths it was able to identify in the target program. This feature enables TIS Interpreter to be used harmoniously together with it.

If the only way to use the two together was to run TIS Interpreter on every input that afl generated and discarded, then the collaboration between the two tools would be fruitless. The interesting inputs would be lost in a sea of uninteresting ones and TIS Interpreter’s comparatively slow speed would make it seem like it is not finding anything. But since afl can produce minimal test suites, covering a lot of different behaviors of a typical parser with a few hundred testcases, these few hundred testcases should definitely be run in the best detector of undefined behavior that exists, even if that detector uses up a few seconds per testcase. And, for C programs, the best detector of undefined behavior in terms of detected undefined behaviors is arguably TIS Interpreter.
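As a sketch of the resulting workflow (directory names are hypothetical; afl-cmin is the corpus-minimization script distributed with afl, and @@ is afl’s placeholder for the input file):

$ afl-cmin -i findings_dir/queue -o minimized -- ./cert-basic @@
$ for i in minimized/* ; do run_in_tis_interpreter.sh $i ; done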

Auditing zlib

zlib is a venerable multi-purpose compression library, first released in May 1995. The first time I installed GNU/Linux was in late 1995, so to me zlib has somehow always existed. Although now that I think about it, the zlib source code was distributed in the Unix compress format: had I been just a little bit more perceptive, I might have inferred that zlib had not existed forever, as a consequence of something else having come before it.

Together with New York based company Trail of Bits and under the auspices of Mozilla’s Secure Open Source program, TrustInSoft has completed an automated audit of zlib. Subtle issues were found, and fixed by zlib co-author Mark Adler.

Dan Guido makes a great argument for taking advantage of all that software tools can offer for auditing software. One of the tools used in this audit is tis-interpreter, to identify C undefined behaviors along the execution traces generated by Trail of Bits’s CRS.

How do you report bugs that you alone can see?

Do you remember the TV show The Invaders? It was enormously popular in France, much more than in the US where it originated. It tells the story of one David Vincent, who alone sees that currently working C programs have a serious defect and are at risk of getting translated to flawed binaries by evil C compilers.

David Vincent

I misremember. In the series, David Vincent is alone aware that Earth is being infiltrated by extra-terrestrial beings of human appearance.

Strict aliasing as some know it

Many will know “strict aliasing” as a set of rules that make it forbidden to pass the same address twice to the following C function:

int f(int *p, float *q) {
  *p = 1;
  *q = 2.0;
  return *p;
}

Indeed, the compiler confidently generates assembly code that assumes that the arguments are not some int variable’s address &x and (float*)&x:

f:                                      # @f
        movl    $1, (%rdi)
        movl    $1073741824, (%rsi)     # imm = 0x40000000
        movl    $1, %eax
        retq

This is in accordance with the rules described in the C standard. The function f in itself is fine and can be used validly with arguments that do not alias, but passing arguments that make *q write to an int variable causes undefined behavior.
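For example, the following caller, a minimal sketch, is where the undefined behavior lies:

int main(void) {
  int x;
  return f(&x, (float *)&x);  /* undefined: *q = 2.0 writes to an int object
                                 through an lvalue of type float */
}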

Strict aliasing as implemented in modern optimizing compilers

Not so many may know that compilers also assume that p and q do not alias in function g:

struct object { int id; };

struct thing { int id; int thing_stuff; };

int g(struct object *p, struct thing *q) {
  p -> id = 1;
  q -> id = 2;
  return p -> id;
}

The function g always returns 1, according to Clang and GCC:

g:                                      # @g
        movl    $1, (%rdi)
        movl    $2, (%rsi)
        movl    $1, %eax
        retq

It is not clear that the standard allows them to do that(*). Not long ago, GCC optimized the following version, showing that any pointer computed as the address of a struct member was assumed not to point into any other kind of struct, even when the lvalues used locally (plain int accesses) did not show any trace of the structs involved.

int g(struct object *p, struct thing *q) {
  int *int_addr1 = & p -> id;
  int *int_addr2 = & q -> id;
  *int_addr1 = 1;
  *int_addr2 = 2;
  return *int_addr1;
}

This function is also compiled as always returning 1:

g:
        movl    $1, (%rdi)
        movl    $1, %eax
        movl    $2, (%rsi)
        ret

(*) Here is an argument demonstrating that passing the same address for both arguments of g does not break strict aliasing rules, and therefore implying that the compiler should produce code that works in this case. I understand the author of the comment to be close to the matter at hand, being the implementer of the first type-based alias analysis in GCC.

How to report bugs

So we have been working on a pretty neat thing, and it is now just working well enough to give its first results. The thing is a very early version of a static analyzer that detects violations of strict aliasing as described in the standard, such as the first example above, and as actually implemented by compilers, such as the subsequent examples.

The results are diagnostics of strict aliasing violations in widely used open-source software. How would you go about reporting these to the maintainers of the software?

It seems important to report them: type-based alias analyses have, in the past, broken programs that their authors expected to work, and in fact they break them again every time a Gentoo user recompiles the Linux kernel without the option -fno-strict-aliasing. It is possible that these optimizations will not become more programmer-hostile than they are now (fingers crossed), and one may think that if they did not break a particular program (that violates the rules), they never will, but compiler implementers are hard at work on inter-procedural and link-time optimizations, all of which will make information available that wasn’t before, and allow strict aliasing optimizations to fire where they didn’t.

In the particular case of the bug being reported, we are quite sure of the analyzer’s findings, but the analyzer is a bit too experimental to release yet. Not that this would necessarily help: these alien-detecting sunglasses may, to a proud developer, seem like sunglasses with aliens painted on. The analyzer is the first of its kind, too, so there is little hope of confirming the findings with another comparable alien detector.

Pointer conversion misuses of the kind of the first example are easy to recognize: a float is being converted to an int containing its representation, or vice-versa, and one only needs to convince the maintainer of the software that there are better, if more verbose, ways to do this. On the other hand, the assumption that structs can be used in the way shown in the example above is rooted even deeper in the C programming ways. Not only will it be harder to convince developers that what the code is doing is dangerous, but since any genericity at all in C is only obtained through pointer conversions, it is not easy to show which among them are invalid, without seeming to appeal to authority. Note how in the program below, the pointer conversion takes place where the function g is invoked, which isn’t the place where the strict aliasing violation takes place (it takes place inside g). Just looking at the function main, it is not obvious that the pointer conversion is one that leads to a strict aliasing violation, as opposed to one that is the only way not to implement twenty different qsort functions.

#include <stdio.h>

struct object { int id; };

struct thing { int id; int thing_stuff; };

int g(struct object *p, struct thing *q) {
  p -> id = 1;
  q -> id = 2;
  return p -> id;
}

int main(void) {
  struct thing saucer;
  g((struct object*)&saucer, &saucer);
}

I have been reporting subtle errors, like use of uninitialized memory, by offering a patch that should not change the program’s behavior if the program wasn’t using uninitialized memory, and that makes it evident that it does. Here is one recent example. Perhaps the same approach can be used here. That is, for reporting a problem in the program above, one might offer the patch below, and hope that the maintainer is not already having a bad day before receiving the bug report.

$ diff -u orig.c instrumented.c
--- orig.c	2016-06-25 20:40:38.819893396 +0200
+++ instrumented.c	2016-06-25 20:35:25.253912642 +0200
@@ -6,7 +6,9 @@
 
 int g(struct object *p, struct thing *q) {
   p -> id = 1;
+  printf("writing 2  to  %p as thing\n", (void*) & q -> id);
   q -> id = 2;
+  printf("reading %d from %p as object\n", p -> id, (void*) & p -> id);
   return p -> id;
 }
 
$ ./a.out 
writing 2  to  0x7ffe3e7ebbd0 as thing
reading 2 from 0x7ffe3e7ebbd0 as object

Bear in mind that in real code, the two sites may be in different functions, or even different files. The patch above is pointing out the obvious only because the strict aliasing violation is obvious in this 6-line example.

Will this be convincing enough? I will let you know…

This blog post owes to the Cerberus project for pointing out the actual situation with modern C compilers and structs, John Regehr for writing a summary of all the ways strict aliasing-based optimizations break programs and proofreading this post, Loïc Runarvot for implementing the analysis prototype. Alexander Cherepanov suggested the use of the verb “to fire” to describe C compiler optimizations being triggered.

Trap representations and padding bits

The C programming language does not hide from you how the values you manipulate are represented. One consequence is that when padding happens, its presence may have observable effects in carelessly crafted programs. Padding is well-known to appear between members of a struct, and also possibly after the last member of a struct. The remaining space in a union, when the active member is not the widest one, is also considered padding. A C programmer who only cares about the usual x86 platforms might be excused for thinking that, for them, this is it. As for trap representations, these may be believed to be reserved for weird hardware that uses one’s complement or explicit parity bits.

Padding in structs and unions

Naïve attempts at making an x86-64 compiler take advantage of the unspecified nature of struct padding fail, as in example functions f, g and h, but the function i, provided by Alexander Cherepanov, shows that the padding of a struct x does not have to be copied along the rest of the struct after the two consecutive struct assignments y = x; z = y;:

#include <string.h>

int i(void)
{
  struct { char c; int i; } x, y, z;
  memset(&x, 1, sizeof x);
  memset(&z, 0, sizeof z);

  y = x;
  z = y;

  return *((char *)&z + 1);
}

The function i is optimized to return 0;, meaning that the memory dedicated to x is not copied to z in its entirety.

These occurrences of padding can be prevented, when programming for a specific architecture or a couple of architectures(*), with interstitial character array members, as sketched below. If, in a new project, you are implementing computations so sophisticated that they require you to define a struct, C may be the wrong language to use in 2016. However, if you are going to do it anyway, you might want to insert these explicit interstitial members yourself: nothing the compiler can do with these unused bytes is worse than what the compiler can do with padding. Clang has a warning to help with this.
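As a sketch, for a compilation platform where int is 4 bytes with 4-byte alignment (an assumption), the struct from the function i above could be written with its padding made explicit:

struct with_explicit_padding {
  char c;
  char padding[3];  /* occupies the bytes the compiler would otherwise pad */
  int i;
};

With this definition, struct assignments and memset calls cover every byte of the object, and no byte is left to the compiler’s discretion.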

Consequences

Since padding does not have to be copied in a struct assignment, and since any struct member assignment can, according to the standard, set padding to unspecified values, memcmp is in general the wrong way to compare structs that have padding. Sending an entire struct wholesale to an interlocutor across the network (by passing its address to write) can leak information that was not intended to get out.
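A sketch of the safer alternative, for a struct like the one in function i above: compare member by member, so that padding bytes never participate in the comparison.

struct s { char c; int i; };

int s_equal(const struct s *a, const struct s *b) {
  return a->c == b->c && a->i == b->i;
}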

The rest of this post discusses padding and trap representations in scalar types, which we show that our exemplar C programmer for usual x86 platforms might encounter after all. Also, padding in scalar types cannot be eliminated with just additional members, so the problem, if rarely noticed, is in some ways more annoying than struct padding.

Padding in scalar types

It would be easy to assume that on such an ordinary architecture as x86, padding only happens because of structs and unions, as opposed to scalar types. The x86 architecture is not that ordinary: its first historical floating-point type occupied 80 bits; many compilation platforms still make that floating-point type available as long double, for some reason as a type of 12 or 16 bytes total (respectively in 32-bit and 64-bit mode), including respectively 2 and 6 bytes of padding. Padding may or may not be modified when the value is assigned, as shown in another example from Alexander Cherepanov:

#include <string.h>

int f(void)
{
  long double x, y;
  memset(&x, 0, sizeof x);
  memset(&y, -1, sizeof y);

  y = x;
  return ((unsigned char *)&y)[10];
}

The function f above is compiled as if it were return 255;, although the entire memory assigned to x was set to 0 before x was copied to y with y = x;.

Trap representations

Trap representations are a particular case of padding bits. A symptom of padding bits is that a type t has fewer than 2^(CHAR_BIT ✕ sizeof(t)) distinct values. In the case of trap representations, some of the bit patterns are considered erroneous for type t. Accessing such erroneous representations with an lvalue of type t is undefined behavior.

The latest draft of the C11 standard contains this footnote(**):

53) Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types. All other combinations of padding bits are alternative object representations of the value specified by the value bits.

This footnote is in the context of the representation of integer types, which are made of value bits, padding bits, and in the case of signed integers, one sign bit (6.2.6.2).

“I have no access to no parity bits,” the ordinary programmer might think. “Even if redundant internal representations are part of the implementation of my high-performance computer, the interface presented to me is that of ordinary memory, where all bits are value bits (with sometimes a sign bit).”

In fact, as implemented by GCC and Clang, the _Bool type has two values and 254 trap representations. This is visible on the following example:

int f(_Bool *b) {
  if (*b)
    return 1;
  else
    return 0;
}

The function f in this example, as compiled, returns 123 when passed the address of a byte that contains 123. This value does not correspond to either of the two possible execution paths of the function. Undefined behavior is the only excuse the compiler has for generating code with this behavior, and this means that GCC and Clang, for the sake of this optimization, choose to interpret bytes containing a value other than 0 or 1 as trap representations for the type _Bool.
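One way to observe this, as a sketch (assuming, as is the case on these platforms, that sizeof(_Bool) is 1):

#include <string.h>

int f(_Bool *b);  /* the function above */

int main(void) {
  unsigned char byte = 123;
  _Bool b;
  memcpy(&b, &byte, 1);  /* forge a representation that is neither 0 nor 1 */
  return f(&b);          /* observed to return 123 with GCC and Clang */
}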


This post owes to Alexander Cherepanov’s examples, John Regehr’s encouragements and Miod Vallat’s remarks.

Footnotes:

(*) I would have recommended using fixed-width integer types from stdint.h, such as uint32_t, to make the layout fixed, but that does not solve the problem with floating-point types or pointer types.

(**) The latest C11 draft contains two very similar footnotes 53 and 54 for which it is not clear that both were intended to be present in the final revision.

A non-exhaustive list of ways C compilers break for objects larger than PTRDIFF_MAX bytes

One well-documented way in which a program may break in presence of objects of more than PTRDIFF_MAX bytes is by doing a pointer subtraction that results in an overflow:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *p = malloc(0x81000000);
  if (!p) abort();
  char *q = p + 0x81000000;
  printf("q - p: %td\n", q - p);
}

On the Ubuntu Linux version that I have conveniently at hand, this program, compiled to 32-bit code, produces:

$ gcc -m32 t.c && ./a.out 
q - p: -2130706432

Strange results are acceptable because the overflow in pointer subtractions is undefined behavior:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.

Do not stop reading if you already know all this! Things are going to get interesting in a minute!

Incidentally, when I tried this program on OS X 10.9 with clang -m32 t.c && ./a.out, I obtained a warning that malloc had failed, and my program aborted without doing the dangerous q - p pointer subtraction. In effect, a malloc function that rejects allocations of more than half the address space goes a long way towards preventing surprising behaviors. This is interesting because at least as late as OS X 10.5, I routinely allocated 2.5GiB from 32-bit processes. On the OS X platform, the decision was courageously made to break some 32-bit applications that used to work in order to prevent some vaguely worrying problems. But then again, preserving application compatibility was never the foremost preoccupation of the Mac platform.

Even if you already knew all this, keep reading.

Subtraction of pointers to wider types

On a 32-bit platform

GCC and Clang’s behavior on the following example is not excused by clause 6.5.6p9:

#include <stdlib.h>

int main(void) {  
  short *p, *q;

  p = malloc(0x70000000 * sizeof *p); // note: product does not overflow
  q = p + 0x70000000; // note: stays in-bounds
  return q - p; // note: does not overflow ptrdiff_t
}

The subtraction q - p in this program evaluates to -268435456. Wait, what?

The example can be made more striking by making the expected result of the pointer subtraction smaller:

#include <stdlib.h>

typedef char (*arrayptr)[0x70000000];

int main(void) {  
  arrayptr p, q;

  p = malloc(2 * sizeof *p);
  q = p + 2;
  return q - p;
}

Both Clang and GCC make this program return 0 when it is very obvious that the correct result is 2. As a psychological experiment, when these related examples are shown to C programmers, the average subject might reluctantly let themselves be convinced that -268435456 is 0x70000000, but they are quite certain that 2 should not be 0, and they express more surprise at the second example.

The answer to the riddle—but not really an excuse in itself for Clang and GCC’s behavior—is contained in the following decomposition of pointer subtraction into elementary steps. The “sarl” instruction implements a signed operation that does not produce the correct result when the value in %eax should be interpreted as a number above 2^31. When the element size is not a power of two, the compiler uses a division instruction instead, but the problem remains the same: the sequence generated by the compiler implements a signed division that does not work above 2^31.
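As an illustration, here is the arithmetic the generated code performs, transcribed into C (the function name is invented; the shift behaves like “sarl” because GCC and Clang define >> on negative values as an arithmetic shift):

int subtraction_as_compiled(void) {
  /* 0x70000000 shorts are 0xE0000000 bytes apart: */
  int byte_difference = (int)0xE0000000u;        /* subl: -536870912 */
  int element_difference = byte_difference >> 1; /* sarl $1: signed shift */
  return element_difference;                     /* -268435456, not 0x70000000 */
}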

Partially excusing GCC’s behavior, it is somewhat documented (in replies to bug reports) that 32-bit GCC does not work with malloc functions that succeed for requests of more than 2GiB of memory. As long as the malloc calls fail in the examples above, GCC is not doing anything wrong, because the program invokes undefined behavior as soon as p + … is computed. (It is illegal to do pointer arithmetic from a null pointer.)

And indeed, if we re-introduce the malloc success check that I surreptitiously removed from the initial example, GCC does the right thing. “The right thing” being making a call to malloc, which, as semi-documented, should fail for the requested size in order to be compatible with GCC.
It’s a shame that 32-bit Glibc’s malloc does not systematically fail for such requests, but that is not GCC’s problem. It is the problem of the programmer who links together code generated by GCC and code from Glibc (note that Glibc is not intended to be compiled with a compiler other than GCC, which is unfortunate, considering).

Clang, the most honey badger-like of compilers, don’t give a shit.

64-bit platforms

64-bit solves all these problems, because a malloc call requesting half the address space always fails, right?
It’s a shame that there was a difficult trade-off on 32-bit platforms, where the problem was fully realized only after some applications had gotten used to requesting 2.5GiB and getting away with it, but as even the smallest computers are moving to 64-bit, can we consider this problem solved by Moore’s law? 2^63-1 bytes ought to be enough for anybody?

Not really. The malloc call in the program below succeeds (the program would return 0 if the malloc invocation failed), despite Clang having the same half-the-address-space limitation for 64-bit as for 32-bit.

#include <stdlib.h>

typedef char (*arrayptr)[0x700000000000000];

int main(void) {  
  arrayptr p, q;

  p = malloc(32 * sizeof *p);
  if (!p) return 0;
  q = p + 32;
  return q - p;
}

The program not returning 0 indicates that malloc succeeded. The program returning a nonzero value other than 32 indicates that Clang is confused.

Comparisons of pointers

One might consider it an acceptable trade-off to let malloc allocate 2.5GiB on 32-bit platforms, if the developer knows that they never subsequently subtract pointers to the allocated zone: not as char*, and not as pointers to another type, since compilers are broken for these subtractions too, even when C11’s 6.5.6:9 clause perfectly defines the result.

Unfortunately, an optimizing compiler’s code appears to be littered with optimizations that assume object sizes are less than PTRDIFF_MAX. As an example, we all know that x86 Clang and GCC implement pointer comparison as an assembly unsigned comparison (“cmpl” is an agnostic comparison instruction, but here it is followed by “cmoval”, where the “a”, for “above”, indicates a conditional move based on an unsigned comparison). This means that it is allowed to compare pointers inside a putative 2.5GiB array on a 32-bit platform, right?

Not all pointer comparisons.

Because of this optimization.

Note the “g” in “cmovgl”. This “g” stands for “greater”: a mnemonic indicating that a signed comparison decides which string is returned. Determining what happens with offs1=0 and offs2=0x81000000, two values representable in a 32-bit size_t that, with a lenient malloc, can also be valid offsets into the same array, is left as an exercise for the reader.
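The linked example is not reproduced here, but for readers who would rather not do the exercise, a sketch of the kind of function being discussed (the names are invented):

const char *compare_offsets(char *p, size_t offs1, size_t offs2) {
  /* The compiler may simplify p + offs1 > p + offs2 into the signed
     comparison offs1 > offs2, hence the cmovgl: with offs1=0 and
     offs2=0x81000000, the simplified comparison is true although
     p + offs2 is in fact the larger address. */
  return p + offs1 > p + offs2 ? "greater" : "smaller";
}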

Conclusion

It is impossible to tell when the compiler will apply this or any other optimization that assumes that a pointer difference is always representable as a ptrdiff_t. Therefore, the only safe path on 32-bit platforms is to refrain from allocating more than 2GiB in a single malloc call; it might be a good idea if Glibc’s malloc was changed to implement this limitation. Compilers could do much, much better than they currently do. Nothing would prevent malloc(0x81000000) from causing a warning (I suggest “malloc call will always return NULL” or something equally direct). Beyond the friendliness of warnings, Clang’s elimination of malloc calls that, because of Clang’s limits, cannot succeed, as if they were calls that cannot fail, indicates a disturbing misunderstanding of classical logic.

Acknowledgements

Alexander Cherepanov and John Regehr provided most of the examples in this post, with additional remarks coming from Daniel Micay and Rich Felker. Matt Godbolt made the absolute best platform on which to investigate these examples and show them to the world.

Out-of-bounds pointers: a common pattern and how to avoid it

A quick tis-interpreter update

At long last, tis-interpreter was released as open-source last month. It is somewhat difficult to install, and as a remedy I will regularly prepare binary snapshots for Linux (and maybe OpenBSD, OS X). At the time of this writing, the first publicly available binary snapshot can be downloaded from the bottom of the tis-interpreter homepage.

The rest of this post is about extremely recent evolutions in tis-interpreter.

A newfound strict stance on pointer arithmetic

As tis-interpreter was getting easier to use and was getting used, the feedback from applying it led to what may seem like drastic changes. Here is one example.

Among other software components, John Regehr applied tis-interpreter to Google’s implementation of their WebP image format, and then to SQLite. The former is only a compressed format for static images, but the latter is a large library with an even larger testsuite. Both were worth looking at, but the latter was a significant effort.

In libwebp, among other problems, John reported a subtle bug in a pointer comparison. libwebp would check whether the size declared for a sub-object was consistent by comparing what would be the address to continue parsing at (after the sub-object, if it had the size it said it had) to the end pointer. This idiom is dangerous. tis-interpreter warned for the dangerous comparison, I wrote a short essay, John reported the bug, and it was the end of that story.

On the SQLite front, progress was slow. The very same “dangerous pointer comparison” warning was being emitted because of one-indexed arrays implemented as int *one_indexed = ptr - 1;, as explained in John’s post. John complained that the “pointer comparison” warning was emitted on an eventual one_indexed == NULL, arbitrarily far from the allocation of the one-indexed array, which was the real culprit, and in any case, the line he wished to change so as to suppress all warnings coming from that pointer in one swoop. After multiple requests, I finally implemented detection of invalid pointer arithmetic in tis-interpreter, so that the warning would be on ptr - 1 instead.

A surprise lurking in already visited code

This week, my colleague Anne Pacalet found herself going back to analyzing libwebp. She naturally started by using the latest version of tis-interpreter, with its all-new early warnings about invalid pointer arithmetic. She discovered that, apart from the already-reported and long-since-fixed end_ptr idiom where a pointer was compared after going out of bounds, other invalid pointers are computed inside libwebp. This happens when decompressing examples provided as testcases by Google itself. The line sporting the tis-interpreter warning is this one, and the surrounding code looks like this:

  for (j = 0; j < height; ++j) {
    for (i = 0; i < width; ++i) {
      const uint32_t alpha_value = alpha[i];
      dst[4 * i] = alpha_value;
      alpha_mask &= alpha_value;
    }
    alpha += alpha_stride;
    dst += dst_stride; // goes out of bounds on the last iteration
  }

Do not confuse this with a buffer overflow or a similar immediate problem! The pointer dst goes out of bounds when it is computed at the end of the last iteration, and it is never used after that. Besides, it may look like this last value of dst is a one-past-the-end pointer as allowed by the C standard, but that is only true if dst started at the very beginning of the allocated block.

The function in question is called from EmitAlphaRGB (through the WebPDispatchAlpha function pointer). At line 183, dst may have been incremented by 3 before being passed as argument, and in this case, the last value of dst is way out of bounds (it is four-past-the-end, but there is no “four-past-the-end” exception for pointer arithmetic in the C standard).

Fixing the problem

It may help both to communicate the problem and to convince the maintainers to remove it if it is reported with an example of a change that would keep the computation the same and remove the undefined behavior. We looked at different variations. I thought I had a solution that wasn't too ugly when I arrived at the following change:

-  for (j = 0; j < height; ++j) {
+  j = 0;
+  while (1) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
       dst[4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
+    if (j == height - 1) break;
+    ++j;
     alpha += alpha_stride;
     dst += dst_stride;
   }

The above change applied, the exit test is now after dst has been used in the iteration but before it is incremented, so that the ultimate undefined-behavior-invoking incrementation does not take place. The generated assembly for the code above seems fine, but there is a subtle difference: the original function never entered the loop if height was 0, whereas this one always does at least one iteration. This is something that would have to be discussed with the developers, but the problem does not seem worth a discussion. We could pre-emptively guard the loop with if (height>0), but then the source code gets uglier, at the very least, and the generated assembly code might be longer than what was generated for the original function.

Jamey Sharp's solution

Jamey Sharp provided a brilliant insight. His remark was: “what about just doing the obvious multiply and letting strength reduction turn it into this code? Easier to maintain and correct”

Translated to code, Jamey's remark amounts to this change:

   for (j = 0; j < height; ++j) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
-      dst[4 * i] = alpha_value;
+      dst[j * dst_stride + 4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
     alpha += alpha_stride;
-    dst += dst_stride;
   }

Indeed, we are removing more code than we are adding. It solves the undefined behavior problem because now, in the source code, j * dst_stride is only added to dst at the time of accessing the memory location (in the source code, dst + height * dst_stride is never computed). The beauty of this solution is that an optimizing compiler generates exactly the same assembly code, to the comma, for this version as it did for the original version:

$ diff -u old.s jamey.s
$

In other words, we stick to legal (and concise and readable) operations in the source code. The compiler does the tedious work of knowing that on the target platform, pointers are just integers and that a multiplication by the index inside a loop can be transformed into an addition. This is the correct division of labor between developer and computer.

Miod Vallat's solution

Miod Vallat pointed out that the solution above required you to trust that the compiler would always succeed at the strength-reduction optimization. Miod's solution instead expresses in the source code the fact that pointers are just integers. It should work fine for all the ordinary platforms we are used to, because of footnote 67 in the C11 standard.

   int i, j;
+  uintptr_t dst_int = (uintptr_t) dst;
 
   for (j = 0; j < height; ++j) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
-      dst[4 * i] = alpha_value;
+      ((uint8_t*)dst_int)[4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
     alpha += alpha_stride;
-    dst += dst_stride;
+    dst_int += dst_stride;
   }

With Miod's solution too, the compiler (GCC 5.2) produces the exact same assembly as before:

$ diff -u old.s miod.s
$

Conclusion

The new “bug” discussed here is, as far as I know, harmless with current compilers. In this particular instance, by removing the undefined pointer arithmetic, we are making libwebp safer from aggressive compiler optimizations for the next ten or fifteen years, but we are not solving any immediate problem. It is not always that clear-cut. The pointer comparison problem that had been reported earlier in libwebp, with current compilers and execution platforms, would never have noticeable consequences on the ordinary computers security researchers use to look for problems, but it could have consequences in a peculiar configuration.

The option remains to disable the warnings on the creation of invalid pointers: if that is what you want, use:

tis-interpreter.sh -no-val-warn-pointer-arithmetic-out-of-bounds …

Fiddly buffer overrun in OpenSSL

John’s blog is hosting a post, co-authored by me, about one of the more entertaining “bugs” reported by TrustInSoft in OpenSSL. In this case the behavior was intended, and looked like a good idea when the code was originally written. I see this anecdote as a continuation of the “sociology of open-source security fixes” series. Unlike the situation in the first two posts of that series, here the codebase has a long history, with some parts of it going back twenty years, but practices have changed and compilers have changed.

The incongruous pattern was present in both LibreSSL and OpenSSL, and has now been cleaned up from both. Many thanks to all maintainers of aging open-source software; it is a difficult task.

memcmp requires pointers to fully valid buffers

A suspicious pattern in open-source software

One bug recently found by John using tis-interpreter on a widely used open-source library involved the comparison of strings with memcmp. The unexpected condition was that memcmp was, in one case, called with a pointer to a buffer shorter than the length passed as third argument, breaking one of the two symmetrical pre-conditions in the function’s ACSL contract:

/*@ requires \valid_read(((char*)s1)+(0..n - 1));
  @ requires \valid_read(((char*)s2)+(0..n - 1));
  @ …
  @*/
int memcmp (const void *s1, const void *s2, size_t n);

A reason that may have made this use of memcmp look okay to the developer is that the buffers being passed to it always differed before the end of the shorter buffer was reached. That is, a memcmp implementation based on the following loop would not have caused any out-of-bounds read access:

for (size_t i=0; i<n; i++)
  if (((char*)s1)[i] != ((char*)s2)[i]) return …;

The first question raised was whether the pattern memcmp("a", "bc", 3) was problematic according to the letter of the C standard. If it was, the second question was whether the busy maintainer of one of the Open Source packages that make the Internet tick should be bothered with a bug report.

Reading between the lines of the C11 standard

I would like to be able to say that memcmp’s ACSL contract was the product of careful deliberation, but unfortunately this is not the case: many standard function contracts were written quickly in order to get most of the standard library covered, and have not been tested by time. Anyway, upon proofreading the relevant clause in the C11 standard, my feeling was that the ACSL formalization was, in this particular case, right, and that it was undefined behavior to pass as a memcmp argument a buffer that wasn’t fully valid, even if the implementation only sort of needs to read the buffer’s characters up to the first mismatch.

7.24.4:1 The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

7.24.4.1:2 The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2.310)

310) The contents of “holes” used as padding for purposes of alignment within structure objects are indeterminate. Strings shorter than their allocated space and unions may also cause problems in comparison.

My feeling is that the above clause 7.24.4.1:2 is faithfully formalized as the ACSL pre-conditions quoted at the beginning of the post. This feeling comes from the contrast with the memchr function:

7.24.5.1:2 The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s. The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found.

The sentence that I highlighted above is what the C standardization committee writes when they want to indicate that the function will not read—or will behave as if it did not read, anyway—beyond the match.

Parenthetical anecdote: not only is this sentence absent from the specification of memcmp, but there is an entire standardization committee, the POSIX one, which, seeing the sentence absent from the specification of memchr in the C99 standard, thought that this absence meant that memchr was only allowed to receive fully valid buffers, found this unacceptable, clarified the situation in their own standard, and invited the C11 committee to incorporate the change. So that’s actually two standardization committees close to the matter at hand who think that if a sentence like the one above is not part of a function’s specification, the function must only be called with fully valid input buffers.

This meant that the memcmp formal specification that caused the warning to be emitted did not need to be fixed. There remained the question of how best to explain to developers that they should avoid an invocation that, on the face of it, looks relatively harmless.

A problem that matters in practice

Here is the implementation of memcmp in Glibc. There are two distinct optimizations for long buffers, one that applies when both buffers start at the same offset modulo the word size, memcmp_common_alignment, and one that applies when they don’t, memcmp_not_common_alignment.

The function memcmp_common_alignment is relatively well-behaved: it reads from the two buffers aligned word by aligned word, and thus reads the entire words that contain differing bytes. If the caller passed buffers that aren’t valid after the differing byte, this amounts to reading out of bounds, but this sort of out-of-bounds access is not detected by the typical MMU, which works at the scale of the page. This optimization is extremely common in standard library implementations—you would find strlen doing the same thing, in a context where there is no ambiguity about the illegality of reading past the null character. This does not constitute a convincing argument for the doubting developer.
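A sketch of the strlen idiom alluded to, assuming an 8-byte unsigned long (the zero-byte test is the classic bit trick; only a libc implementation, compiled with the appropriate care, can permit itself this kind of code):

#include <stddef.h>
#include <stdint.h>

size_t wordwise_strlen(const char *s) {
  const char *p = s;
  /* Advance byte by byte up to an aligned address. */
  while ((uintptr_t)p % sizeof(unsigned long) != 0) {
    if (*p == '\0') return (size_t)(p - s);
    p++;
  }
  /* Read whole aligned words: the word containing the terminating null
     may extend past the end of the buffer, but an aligned word never
     straddles a page boundary, so the wide read cannot fault. */
  const unsigned long *w = (const unsigned long *)p;
  while (!((*w - 0x0101010101010101UL) & ~*w & 0x8080808080808080UL))
    w++;
  /* Locate the null byte inside the word that contains it. */
  p = (const char *)w;
  while (*p) p++;
  return (size_t)(p - s);
}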

The “not_common_alignment” case, however, tells a different story. To illustrate, the C program summarized below is made of two parts: first is Glibc’s implementation, lightly instrumented to show some of the addresses being accessed. Second is an input vector: t1 and t2 are declared as arrays of long long in order to force them to be aligned, and memcmp is passed t1 and (char*)t2+1 as arguments in order to force the “not_common_alignment” codepath to be followed inside memcmp’s implementation.

…
/* memcmp_not_common_alignment -- Compare blocks at SRCP1 and SRCP2
   with LEN `op_t' objects (not LEN bytes!).
   SRCP2 should be aligned for memory operations on `op_t',
   but SRCP1 *should be unaligned*.  */
static int
memcmp_not_common_alignment (srcp1, srcp2, len)
     long int srcp1;
     long int srcp2;
     size_t len;
{
  op_t a0, a1, a2, a3;
…
      a0 = ((op_t *) srcp1)[0];
      printf("just read %zu bytes from %p\n",
         sizeof(op_t), (void*)&((op_t *) srcp1)[0]);
…
      a1 = ((op_t *) srcp1)[1];
      printf("just read %zu bytes from %p\n",
        sizeof(op_t), (void*)&((op_t *) srcp1)[1]);
…
      a2 = ((op_t *) srcp1)[2];
      printf("just read %zu bytes from %p\n",
         sizeof(op_t), (void*)&((op_t *) srcp1)[2]);
…
      a3 = ((op_t *) srcp1)[3];
      printf("just read %zu bytes from %p\n",
         sizeof(op_t), (void*)&((op_t *) srcp1)[3]);
…
}

int
my_memcmp (s1, s2, len)
     const __ptr_t s1;
     const __ptr_t s2;
     size_t len;
{
…
  long int srcp1 = (long int) s1;
  long int srcp2 = (long int) s2;
…
      if (srcp1 % OPSIZ == 0)
        res = memcmp_common_alignment (srcp1, srcp2, len / OPSIZ);
      else
        res = memcmp_not_common_alignment (srcp1, srcp2, len / OPSIZ);
…
}

long long t1[] = { 0xFF07060504030201, 0x100f0e0d0c0b0a09 };
long long padding[2] = { 1 };
long long t2[] = {   0x0706050403020100, 0x0f0e0d0c0b0a0908,
            0x1716151413121110, 0x1f1e1d1c1b1a1918 };

int main(void) {
  printf("                    t1:%p\n"
         " pointer 'one past' t1:%p\n"
         "               padding:%p\n"
         "                    t2:%p\n"
         " pointer 'one past' t2:%p\n",
         (void*)t1, (void*)(&t1+1), (void*)padding,
         (void*)t2, (void*)(&t2+1));
         
  /* t1's validity is 16 bytes. One may think it's ok to compare it with the
  sequence of bytes at (char*)t2+1, since they differ in the 8th byte. */
  return my_memcmp(t1, (char*)t2+1, 31);
}

Compiled on the I32LP64 little-endian machine I am typing this on, the program shows:

$ clang memcmp.c && ./a.out 
                    t1:0x10733f020
 pointer 'one past' t1:0x10733f030
               padding:0x10733f030
                    t2:0x10733f040
 pointer 'one past' t2:0x10733f060
just read 8 bytes from 0x10733f028
just read 8 bytes from 0x10733f030

In other words, when passed the carefully (mis-)aligned buffers t1 and (char*)t2+1, although these buffers differ in the 8th byte, Glibc’s implementation of memcmp reads 8 bytes beyond the end of t1. By making the 16th byte differ instead of the 8th one, it is also possible to make Glibc’s implementation of memcmp read 16 bytes beyond the end of t1.

There is no particular reason why these 8 or 16 bytes should not be in the next, unmapped memory page. In fact Alexander Cherepanov modified the above example to make sure it crashed by accessing into an unmapped page:

/* prepare a copy of t1 located at the end of a page: */
size_t pagesize = sysconf(_SC_PAGESIZE);
char *p3 = mmap(NULL, pagesize * 2, PROT_READ|PROT_WRITE,
              MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (p3 == MAP_FAILED)
  return -1;
if (mprotect(p3 + pagesize, pagesize, PROT_NONE) != 0)
  return -2;

memcpy(p3 + (pagesize - 16), t1, 16);

printf("                    p3:%p\n"
       " pointer 'one past' p3:%p\n",
       (void*)p3, (void*)(p3 + pagesize));

/* expect a crash soon: */
fflush(stdout);

/* call memcmp with the prepared copy: */
printf("comparison:%d\n", my_memcmp(p3 + (pagesize - 16), (char*)t2+1, 31));

Conclusion

In conclusion, yes, some implementations of memcmp will crash when invoked with buffers that aren’t valid for the full length, even if they differ early. The circumstances are rare (probably the reason this bug was still there to be found in a library that had already been tested with all the available techniques) but outside the programmer’s control. The pattern described in this post should be reported as a bug when found.

Acknowledgements

In addition to John Regehr finding an instance of the bug and being embarrassed about reporting it, and Alexander Cherepanov insisting on the difference between word accesses that remain in the memory page or that reach out of it, this post also benefited from remarks from Rich Felker, Julien Cretin, Jed Davis, and Aaron Jacobs. Discussions around the issue have led to a series of bug reports in Glibc’s memchr, strncat and strnlen implementations.

tis-interpreter progress report

tis-interpreter is a specialized version of Frama-C for interpreting C programs and finding bugs in them. It has its own page. The development of tis-interpreter started from Frama-C’s Value Analysis plug-in, and discarded the possibility of analyzing the behaviors of the program for all inputs. Instead, tis-interpreter monitors the program’s behavior for one input at a time. In exchange for giving up on generality, tis-interpreter gains in precision: it never emits false positives. tis-interpreter only warns about illegal operations—out of bounds memory accesses, uses of uninitialized memory, …—that actually happen in the target program for at least one input, namely, the input that was provided for execution.

Not having to deal with false positives is life-changing, to a point that the reader may perhaps only fully apprehend if they have previously used Frama-C’s value analysis or a similar system. It is liberating: suddenly, the size of, and the difficulties in, the target codebase are no longer the user’s problem. One finds oneself wanting to tackle large, real programs. Programs that matter.

Which leads directly into the next limitation: real programs that matter use the standard library to access the environment, the time of day, and the filesystem. TrustInSoft had developed some of the foundations of the execution environment necessary to the analysis of real programs, and the Core Infrastructure Initiative is funding the development of the missing pieces of the puzzle and the release of the whole as Open-Source in 2016.

Until recently, progress was mostly of internal interest only. Now, development has reached a point where we are analyzing open-source library after library, and finding hitherto-unknown bugs in them. For this reason the latest progress report I sent to the CII may interest the reader of this blog, and I am republishing it here, lightly edited.

A- Improving the interpreter

Since the last quarter, John Regehr (who is visiting TrustInSoft on sabbatical), the TrustInSoft team and I have been implementing the detection of seemingly minor infringements to the C standard, such as calling a standard library function with an invalid pointer (say, NULL) together with a size of zero, or passing overlapping memory zones as arguments to standard functions that do not expect this. To illustrate the first family of issues, a C programmer might think that passing NULL together with a size of 0 is fine because the size of 0 means that no memory access will take place, but as of version 4.9, GCC started to use an obscure rule in the C standard for optimization purposes, and long story short, the BIND project was bitten when GCC 4.9 came out and BIND now has to be compiled with an explicit commandline option to disable this GCC behavior.
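A sketch of the kind of code that gets broken (the function is invented; the mechanism is real: GCC considers that memcpy’s pointer arguments are never null, so the test below may be deleted):

#include <string.h>

/* The author believed p could legitimately be NULL when len is 0.
   The caller guarantees len <= sizeof local. */
int process(char *p, size_t len) {
  char local[64];
  memcpy(local, p, len);  /* undefined if p is NULL, even when len == 0 */
  if (p == NULL)          /* GCC 4.9 may remove this test: after the call,
                             p is assumed to be non-null */
    return -1;
  return 0;
}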

In the last quarter, progress on libotr and libgrypt had been stumped by the latter’s use of va_list to implement homegrown variadic functions in C, and I had promised to look at SQLite instead. It turned out that SQLite uses va_list too. As a consequence we got started on support for this construct, and va_list is now well-supported enough that we have reported our first bugs in SQLite, including some bugs in SQLite’s use of va_list (!)

Testing the interpreter on the implementation of webp, a modern, very computationally intensive image format, revealed an inefficiency that crept up as the interpretation went on for a long time. This inefficiency is being fixed.

Finally, using the interpreter together with a file-based fuzzer such as afl meant that the file-access functions needed to be available in the interpreter too. The library of file-access functions has been systematically completed and tested with a dedicated file-access fuzzer.
These functions are written so as first to enable the execution of programs that use files, but not necessarily to detect all errors in the usage of file access functions. The easy-to-spot misuses of file-access functions are detected, and my hope is that, in the future, we will be able to harden the file-access functions to detect all wrong uses of them.

B- Applying the interpreter on existing tests

We took great advantage of pre-generated test suites that Michal Zalewski made available for download. Although these tests had already been executed either directly or under Valgrind/ASan/UBSan, executing them in tis-interpreter reveals additional subtle bugs that were reached by the test suites but were not recognized as bugs. As an example, we reported an erroneous pointer comparison in Google’s implementation of the webp image format. The pointer comparison usually behaves as intended, and behaved as intended when Michal generated and executed the testcase, but, depending on the memory layout, it can produce the wrong result and lead to an invalid pointer access further in the library.

The same pattern was present in other libraries: it turns out to be the most typical mistake that can remain in libraries that have already been closely inspected with previously available tools. So I wrote a short essay on this pattern and why it should be avoided.

Bugs were also reported against libpng, libjpg, OpenSSL, LibreSSL, libsodium, and SQLite.

We have started playing with afl ourselves and looking at ways to make it and tis-interpreter interact more closely.

tis-interpreter is not publicly available yet because we would rather find its more embarrassing bugs ourselves. The bugs can be quite puzzling, and could induce a trusting user to report as a bug in the target code what was a correct idiom and the symptom of a bug in tis-interpreter. No-one involved would gain from this happening.

Sometimes, bugs in tis-interpreter may confuse because of the unusual (for Frama-C) scale of the programs being tackled. Imagine that you were applying last month’s beta version of tis-interpreter to the following program:

#include <string.h>

… 11 823 lines of robust function definitions

int main(void) {

  … 5 367 lines of perfectly good code

  strlen(NULL);

  … 7 412 lines of harmless C

}

Last month’s tis-interpreter, because of a bug in the handling of the specific function call strlen(NULL), would only tell you that this program does not terminate, with absolutely no additional information. Previous experience with Frama-C’s Value Analysis might help you infer that it had found that something was wrong along the way and simply forgotten to say what, but even that in itself does not help identify either the strlen(NULL) bug in the target code or the strlen(NULL)-handling bug in tis-interpreter.

Needless to say, I omitted the resolution of this particularly shameful bug in the official progress report I sent the CII. This one can stay between us.