Outlining the language C programs should be written in

Rich Felker, maintainer of the musl libc, recently tweeted to ask:

Anyone want to guess what gcc generates for a==b?0:a-b ? Any ideas why?

The answer to the first question is that when a and b are two pointers to char, GCC generates a sequence of several instructions, including a “conditional move” instruction, for the source-level construct a==b?0:a-b:

movq    %rdi, %rax
movl    $0, %edx
subq    %rsi, %rax
cmpq    %rsi, %rdi
cmove   %rdx, %rax

Of these five instructions, just the first and the third would be sufficient to produce the same result. On the architecture targeted by GCC here, applying the SUB instruction to two pointers that satisfy a==b can only produce a result of zero. In the case where a is a one-past-the-end pointer to some object and b a pointer to the beginning of another object, the standard allows a==b to be true or false, but in that case the source-level construct a-b is undefined, so any value computed by SUB is a valid result too.

One way to make GCC generate only the first and the third instructions would be to write a-b. Unfortunately, the C standard leaves this expression undefined when a and b are both null pointers, whereas a==b is defined (and is true). If GCC documented that it allows a null pointer to be subtracted from another (and that the result is 0), this would be a safe alternative when targeting GCC. As far as I know, however, GCC does not document this behavior. Since GCC does not document it and the C standard leaves it undefined, GCC could implement a new optimization tomorrow that improves its SPECint score by 0.5% while breaking programs relying on this expression.
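For readers who want to reproduce the code generation, here is a compilable sketch of the two variants discussed (the function names are mine):

#include <stddef.h>

/* The construct from the tweet, with char* arguments: this is the version
   for which GCC emits the five-instruction sequence shown above. */
ptrdiff_t diff_guarded(char *a, char *b) {
  return a == b ? 0 : a - b;
}

/* The two-instruction alternative: same result on this target, but undefined
   by the standard when a and b are both null pointers. */
ptrdiff_t diff_plain(char *a, char *b) {
  return a - b;
}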

The situation is similar to that of implementing rotate in C in 2013. Shifting a uint32_t by 32-n is undefined if n can be 0, but, at the time, Clang would only optimize to the efficient ROL instruction the pattern (x << n) | (x >> (32-n)), to the exclusion of n ? (x << n) | (x >> (32-n)) : x or (x << n) | (x >> (-n & 31)). Sometimes, in the transition from C-as-it-was-written-in-1990 to modern C, the compiler forces developers to write circumlocutions to protect themselves from UB-based optimizations, and then punishes them for the circumlocutions.
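For reference, here is a sketch of the rotate idioms in question, assuming n stays between 0 and 31 (the function names are mine):

#include <stdint.h>

/* The naive pattern: undefined when n == 0, because x >> 32 is undefined
   for a 32-bit type. This was the only pattern Clang turned into ROL in 2013. */
uint32_t rotl_naive(uint32_t x, unsigned n) {
  return (x << n) | (x >> (32 - n));
}

/* Two circumlocutions that avoid the undefined shift when n == 0: */
uint32_t rotl_guarded(uint32_t x, unsigned n) {
  return n ? (x << n) | (x >> (32 - n)) : x;
}

uint32_t rotl_masked(uint32_t x, unsigned n) {
  return (x << n) | (x >> (-n & 31));
}

Current GCC and Clang recognize the masked form as a rotate; whether the compiler you target also recognizes the guarded form is worth checking on Compiler Explorer.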

Writing a sound static analyzer for C implies, first and foremost, formalizing the language that the analyzer will allow programs to use. “Just using the standard as reference” is illusory: in several places, the C standard is ambiguous, or even self-contradictory, enough that what matters is the consensual interpretation that compiler authors have arrived at. Sometimes the situation may be too obscure to even have been discussed. A question I asked myself in 2011 is somewhat related to Rich’s: when a is a null pointer and b is a valid pointer, a < b is left undefined by the standard, but compilers currently produce code that makes this expression evaluate to 1, and existing code relies on this. At the time, we chose to warn for pointer comparisons and to allow the subtraction of null pointers. Maybe the latter allowance is a decision that should be revisited now, since there exists at least one person in the world, Rich, who cares about this case.

Mandatory “no sane compiler would optimize this” post-scriptum

LLVM assuming that, since x << n appears in the source code, n must be less than 32 has already caused unintended behavior.

In order to break programs that rely on a-b when both a and b are null pointers, a compiler would only need to propagate, from any pointer subtraction involving them, the information that a and b are not null. This is no harder than assuming that a pointer is not null because it was passed to memcpy, even with a size that was not known to be nonzero, and GCC already does this.
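To make the memcpy comparison concrete, here is an illustrative sketch (the function is mine) of a null check that this assumption makes removable:

#include <string.h>

int copy_and_check(char *dst, const char *src, size_t n) {
  memcpy(dst, src, n);   /* n may be 0, so dst == NULL "works" in practice... */
  if (dst == NULL)       /* ...but GCC may assume dst is not null because it was
                            passed to memcpy, and delete this test */
    return -1;
  return 0;
}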

Differences between the B method and Frama-C in Formal Methods

My interest was piqued by a question on the questions-and-answers site Quora (the irritating one that otherwise tries to get you to register by limiting the number of answers you can view in a month; no, now you’re thinking of Medium; I mean the one that was doing this from the beginning). The question was about the differences between the B method and Frama-C. The existing answer at the time made some good points, but I thought its phrase “First you are supposed to write or find some C source codes” was backwards. So I wrote my own answer there.

An old quirky libksba bug

The libksba library, used by GnuPG, provides functions for parsing X.509 cryptographic certificates. I was testing libksba with TIS Interpreter a little over a year ago. One of the bugs I found then illustrates a point I would like to make now.

The bug

Consider this function, as it was present in libksba two years ago:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  gpg_error_t err;
  AsnNode n;
  char *algo;
  size_t nread;

  if (!cert)
    return NULL;  /* Ooops (can't set cert->last_error :-().  */

  if (!cert->initialized)
    {
       cert->last_error = gpg_error (GPG_ERR_NO_DATA);
       return NULL;
    }

  if (cert->cache.digest_algo)
    return cert->cache.digest_algo;

  n = _ksba_asn_find_node (cert->root, "Certificate.signatureAlgorithm");
  if (!n || n->off == -1)
    err = gpg_error (GPG_ERR_UNKNOWN_ALGORITHM);
  else
    err = _ksba_parse_algorithm_identifier (cert->image + n->off,
                                            n->nhdr + n->len, &nread, &algo);
  if (err)
    cert->last_error = err;
  else
    cert->cache.digest_algo = algo;

  return algo;
}

The source code above contains the bug. The bug can almost be found by inspection of the function, inferring the contents of types and the behavior of callees from the way they are used. I have only removed commented-out code from the original. If you think that you have identified the bug I found one year ago, but that it may depend on how the function ksba_cert_get_digest_algo is used, you may be on the right track. Here is how ksba_cert_get_digest_algo is invoked in tests/cert-basic.c:

  oid = ksba_cert_get_digest_algo (cert);
  s = get_oid_desc (oid);
  printf ("  hash algo.: %s%s%s%s\n",
          oid?oid:"(null)", s?" (":"",s?s:"",s?")":"");

If, on the other hand, even with this clue, you still find the entire function ksba_cert_get_digest_algo too tedious to review, here is a synthetic view that makes the bug stand out:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  …
  char *algo;
  …
  n = _ksba_asn_find_node (cert->root, "Certificate.signatureAlgorithm");
  if (!n || n->off == -1)
    err = gpg_error (GPG_ERR_UNKNOWN_ALGORITHM);
  else
    err = _ksba_parse_algorithm_identifier (cert->image + n->off,
                                            n->nhdr + n->len, &nread, &algo);
  if (err)
    cert->last_error = err;
  else
    cert->cache.digest_algo = algo;

  return algo;
}

The bug is that the automatic variable algo starts out uninitialized and is only ever initialized by _ksba_parse_algorithm_identifier. When _ksba_asn_find_node succeeds, &algo is passed as an argument to _ksba_parse_algorithm_identifier (without looking at that function, we can assume that it initializes algo in all cases). The fact remains that when _ksba_asn_find_node fails, because the condition !n || n->off == -1 is true, the variable algo is still uninitialized when its “value” is returned as the result of the function.
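A minimal fix, sketched below in the same elided style as the synthetic view above, is to make sure algo can never reach the return statement uninitialized; this is only an illustration, not necessarily the change that was eventually applied upstream:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  …
  char *algo = NULL;      /* initialized, so the error path below is harmless */
  …
  if (err)
    {
      cert->last_error = err;
      return NULL;        /* no longer returns an indeterminate value */
    }
  cert->cache.digest_algo = algo;
  return algo;
}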

The anecdote

The funny thing about this bug is that inputs for it are very easy to produce. I was using afl, an efficient and easy-to-use fuzzer, to generate testcases. This bug was the second one I found, right after setting up afl and TIS Interpreter to work together on libksba: it turned up just after afl generated an input demonstrating a crash caused by an unrelated first bug. Whenever the target software crashes, afl generates a README.txt file to invite the user to report their success to afl’s author. This is smart timing: when using a fuzzer to find bugs, producing an input that causes a direct crash is one of the best possible outcomes. afl’s README file is thus a subtle declaration of victory: you only see it when afl has done a good job.

Of all the inputs pretending to be X.509 certificates, the README, a pure ASCII file intended for human consumption, also happens to trigger the uninitialized-variable-use bug described in the first part of this post. Let those who would not have used the shell command for i in findings_dir/crashes/* ; do run_in_tis_interpreter.sh $i ; done throw the first stone.

No great coincidence happened here, nor was the generated README.txt designed to look fiendishly similar to an X.509 certificate. Any brute-force, purely random fuzzer would have instantly produced a file that triggered the bug shown. In good time, afl would have found one too, and, importantly, it would have recognized that such an input caused a different execution path to be followed inside libksba’s cert-basic test program than less obviously incorrect certificates, and would have set it apart for later inspection.

Epilog

The crash initially found by afl was fixed in a timely manner by Werner Koch in commit a7eed17, and the uninitialized-variable-use bug described in this post was fixed in commit 3f74c2c.

I never formally reported any of afl’s numerous findings to afl’s author, despite the instructions in the input file. Well, this post can be my report: all the bugs reported by me in libksba in 2016 were found using afl and TIS Interpreter together. This post is but an example to illustrate how well these two tools work together.

afl had been used on libksba before (for instance this 2014 vulnerability is listed as having been found using afl). But although the uninitialized-variable-use bug is very shallow and can be evidenced by throwing any README file at cert-basic, the bug was probably missed because it did not cause a crash. Barring weird optimizations, when executing a binary not instrumented to detect the use of uninitialized variables, the variable’s value is whatever happened to be on the stack. The stack is full of valid addresses: addresses of variables and buffers that have previously served, but also addresses of call sites to return to. Reading from any of these addresses, it is likely that a zero will be found before the end of the page (code contains plenty of null bytes because of how constants in instructions are typically encoded, and data contains plenty of null bytes too, of course). This means that running cert-basic directly does not reveal, via a crash, that something wrong happened. All that happens is that instructions or (possibly secret) data are printed as if they were a string. Since there are no a priori expectations for what cert-basic should print when passed an invalid file, this is difficult to notice.

One solution, of course, is to have used MemorySanitizer (MSan for short) to compile libksba’s cert-basic. One might speculate that the bug was not found earlier because MSan’s instrumentation is incompatible with ASan’s, and that users of afl, if they use a sanitizer at all, use ASan in order to find the most dangerous memory bugs.

TIS Interpreter takes more time to run a test program than it takes to compile and execute this test program with MSan or ASan instrumentation, but TIS Interpreter finds all the problems that these two sanitizers find, and more.

afl can produce minimal test suites that exercise all the different execution paths it was able to identify in the target program. This feature is what enables TIS Interpreter to be used harmoniously together with it.

If the only way to use the two together was to run TIS Interpreter on every input that afl generated and discarded, then the collaboration between the two tools would be fruitless. The interesting inputs would be lost in a sea of uninteresting ones and TIS Interpreter’s comparatively slow speed would make it seem like it is not finding anything. But since afl can produce minimal test suites, covering a lot of different behaviors of a typical parser with a few hundred testcases, these few hundred testcases should definitely be run in the best detector of undefined behavior that exists, even if that detector uses up a few seconds per testcase. And, for C programs, the best detector of undefined behavior in terms of detected undefined behaviors is arguably TIS Interpreter.

Auditing zlib

zlib is a venerable multi-purpose compression library, first released in May 1995. The first time I installed GNU/Linux was in late 1995, so to me zlib has somehow always existed. Although now that I think about it, the zlib source code was distributed in the Unix compress format: had I been just a little bit more perceptive, I might have inferred that zlib had not existed forever, as a consequence of something else having come before it.

Together with the New York-based company Trail of Bits and under the auspices of Mozilla’s Secure Open Source program, TrustInSoft has completed an automated audit of zlib. Subtle issues were found, and fixed by zlib co-author Mark Adler.

Dan Guido makes a great argument for taking advantage of all that software tools can offer for auditing software. One of the tools used in this audit was tis-interpreter, applied to identify C undefined behaviors along the execution traces generated by Trail of Bits’s CRS.

How do you report bugs that you alone can see?

Do you remember the TV show The Invaders? It was enormously popular in France, much more than in the US where it originated. It tells the story of one David Vincent, who alone sees that currently working C programs have a serious defect and are at risk of getting translated to flawed binaries by evil C compilers.

David Vincent

I misremember. In the series, David Vincent is alone aware that Earth is being infiltrated by extra-terrestrial beings of human appearance.

Strict aliasing as some know it

Many will know “strict aliasing” as a set of rules that make it forbidden to pass the same address twice to the following C function:

int f(int *p, float *q) {
  *p = 1;
  *q = 2.0;
  return *p;
}

Indeed, the compiler confidently generates assembly code that assumes that the arguments are not some int variable’s address &x and (float*)&x:

f:                                      # @f
        movl    $1, (%rdi)
        movl    $1073741824, (%rsi)     # imm = 0x40000000
        movl    $1, %eax
        retq

This is in accordance with the rules described in the C standard. The function f in itself is fine and can be used validly with arguments that do not alias, but passing arguments that make *q write to an int variable causes undefined behavior.
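When the intent really is to reinterpret a float’s representation as an int, a more verbose but well-defined way exists; a sketch, assuming (as on the platforms discussed here) that both types are 4 bytes wide:

#include <string.h>

int bits_of_float(float f) {
  int result;
  memcpy(&result, &f, sizeof result);  /* copies the representation; no lvalue
                                          of the wrong type is involved */
  return result;
}

With optimization enabled, GCC and Clang compile this memcpy to a single register move.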

Strict aliasing as implemented in modern optimizing compilers

Not so many may know that compilers also assume that p and q do not alias in function g:

struct object { int id; };

struct thing { int id; int thing_stuff; };

int g(struct object *p, struct thing *q) {
  p -> id = 1;
  q -> id = 2;
  return p -> id;
}

The function g always returns 1, according to Clang and GCC:

g:                                      # @g
        movl    $1, (%rdi)
        movl    $2, (%rsi)
        movl    $1, %eax
        retq

It is not clear that the standard allows them to do that(*). Not long ago, GCC also optimized the following version, showing that a pointer computed as the address of a struct member was assumed not to point into any other struct type, even when the lvalue used for the access did not show any trace of the struct:

int g(struct object *p, struct thing *q) {
  int *int_addr1 = & p -> id;
  int *int_addr2 = & q -> id;
  *int_addr1 = 1;
  *int_addr2 = 2;
  return *int_addr1;
}

This function is also compiled as always returning 1:

g:
        movl    $1, (%rdi)
        movl    $1, %eax
        movl    $2, (%rsi)
        ret

(*) Here is an argument demonstrating that passing the same address for both arguments of g does not break strict aliasing rules, and therefore implying that the compiler should produce code that works in this case. I understand the author of the comment to be close to the matter at hand, being the implementer of the first type-based alias analysis in GCC.

How to report bugs

So we have been working on a pretty neat thing, and it is now just working well enough to give its first results. The thing is a very early version of a static analyzer that detects violations of strict aliasing, both as described in the standard (as in the first example above) and as actually implemented by modern optimizing compilers (as in the subsequent examples).

The results are diagnostics of strict aliasing violations in widely used open-source software. How would you go about reporting these to the maintainers of the software?

It seems important to report them: type-based alias analyses have, in the past, broken programs that their authors expected to work, and in fact they break them again every time a Gentoo user recompiles the Linux kernel without the option -fno-strict-aliasing. It is possible that these optimizations will not become more programmer-hostile than they are now (fingers crossed), and one may think that if they did not break a particular program (that violates the rules), they never will, but compiler implementers are hard at work on inter-procedural and link-time optimizations, all of which will make available information that wasn’t available before, and allow strict aliasing optimizations to fire where they didn’t.

In the particular case of the bug being reported, we are quite sure of the analyzer’s findings, but the analyzer is a bit too experimental to release yet. Not that this would necessarily help: these alien-detecting sunglasses may, to a proud developer, seem like sunglasses with aliens painted on. The analyzer is the first of its kind, too, so there is little hope of confirming the findings with another comparable alien detector.

Pointer conversion misuses of the kind of the first example are easy to recognize: a float is being converted to an int containing its representation, or vice versa, and one only needs to convince the maintainer of the software that there are better, if more verbose, ways to do this. On the other hand, the assumption that structs can be used in the way shown in the examples above is rooted even deeper in C programming habits. Not only will it be harder to convince developers that what the code is doing is dangerous, but since any genericity at all in C is only obtained through pointer conversions, it is not easy to show which among them are invalid without seeming to appeal to authority. Note how, in the program below, the pointer conversion takes place where the function g is invoked, which isn’t the place where the strict aliasing violation occurs (it occurs inside g). Just looking at the function main, it is not obvious that this pointer conversion is one that leads to a strict aliasing violation, as opposed to one that is the only way not to implement twenty different qsort functions.

#include <stdio.h>

struct object { int id; };

struct thing { int id; int thing_stuff; };

int g(struct object *p, struct thing *q) {
  p -> id = 1;
  q -> id = 2;
  return p -> id;
}

int main(void) {
  struct thing saucer;
  g((struct object*)&saucer, &saucer);
}

I have been reporting subtle errors, like use of uninitialized memory, by offering a patch that should not change the program’s behavior if the program wasn’t using uninitialized memory, and that makes it evident that it does. Here is one recent example. Perhaps the same approach can be used here. That is, for reporting a problem in the program above, one might offer the patch below, and hope that the maintainer is not already having a bad day before receiving the bug report.

$ diff -u orig.c instrumented.c
--- orig.c	2016-06-25 20:40:38.819893396 +0200
+++ instrumented.c	2016-06-25 20:35:25.253912642 +0200
@@ -6,7 +6,9 @@
 
 int g(struct object *p, struct thing *q) {
   p -> id = 1;
+  printf("writing 2  to  %p as thing\n", (void*) & q -> id);
   q -> id = 2;
+  printf("reading %d from %p as object\n", p -> id, (void*) & p -> id);
   return p -> id;
 }
 
$ ./a.out 
writing 2  to  0x7ffe3e7ebbd0 as thing
reading 2 from 0x7ffe3e7ebbd0 as object

Bear in mind that in real code, the two sites may be in different functions, or even different files. The patch above is pointing out the obvious only because the strict aliasing violation is obvious in this 6-line example.

Will this be convincing enough? I will let you know…

This blog post owes to the Cerberus project for pointing out the actual situation with modern C compilers and structs, John Regehr for writing a summary of all the ways strict aliasing-based optimizations break programs and proofreading this post, Loïc Runarvot for implementing the analysis prototype. Alexander Cherepanov suggested the use of the verb “to fire” to describe C compiler optimizations being triggered.

Trap representations and padding bits

The C programming language does not hide from you how the values you manipulate are represented. One consequence is that when padding happens, its presence may have observable effects in carelessly crafted programs. Padding is well known to appear between members of a struct, and also possibly after the last member of a struct. The remaining space in a union when the active member is not the widest one is also considered padding. A C programmer who only cares about usual x86 platforms might be excused for thinking that, for them, this is it. As for trap representations, these may be believed to be reserved for weird hardware that uses one’s complement or explicit parity bits.

Padding in structs and unions

Naïve attempts at making an x86-64 compiler take advantage of the unspecified nature of struct padding fail, as in example functions f, g and h, but the function i, provided by Alexander Cherepanov, shows that the padding of a struct x does not have to be copied along with the rest of the struct after the two consecutive struct assignments y = x; z = y;:

int i(void)
{
  struct { char c; int i; } x, y, z;
  memset(&x, 1, sizeof x);
  memset(&z, 0, sizeof z);

  y = x;
  z = y;

  return *((char *)&z + 1);
}

The function i is optimized to return 0;, meaning that the memory dedicated to x was not copied to z in its entirety.

These occurrences of padding can be prevented when programming for a specific architecture, or a couple of architectures(*), with interstitial character array members. If, in a new project, you are implementing treatments so sophisticated that they require you to define a struct, C may be the wrong language to use in 2016. However, if you are going to do it anyway, you might want to insert these explicit interstitial members yourself: nothing the compiler can do with these unused bytes is worse than what the compiler can do with padding. Clang has a warning to help with this.
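A sketch of what such explicit interstitial members can look like, assuming a typical x86-64 ABI where int is 4 bytes wide and 4-byte aligned (the Clang warning alluded to above is presumably -Wpadded):

struct packet {
  char tag;
  char pad_after_tag[3];  /* was implicit padding; now ordinary members whose
                             contents survive assignment and memset like any other */
  int  value;
};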

Consequences

Since padding does not have to be copied in a struct assignment, and since any struct member assignment can, according to the standard, set padding to unspecified values, memcmp is in general the wrong way to compare structs that have padding. Sending an entire struct wholesale to an interlocutor across the network (by passing its address to write) can leak information that was not intended to get out.
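In other words, for a struct with padding, the comparison has to be written member by member; a sketch:

struct message { char tag; int value; };   /* typically 3 bytes of padding after tag */

/* Safe: never reads the padding bytes, whose contents are unspecified. */
int message_equal(const struct message *a, const struct message *b) {
  return a->tag == b->tag && a->value == b->value;
}

/* Unsafe in general: memcmp(a, b, sizeof *a) also compares the padding. */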

The rest of this post discusses padding and trap representations in scalar types, which, as we will show, our exemplar C programmer for usual x86 platforms might encounter after all. Also, padding in scalar types cannot be eliminated with just additional members, so the problem, if rarely noticed, is in some ways more annoying than struct padding.

Padding in scalar types

It would be easy to assume that on such an ordinary architecture as x86, padding only happens because of structs and unions, as opposed to scalar types. The x86 architecture is not that ordinary: its first historical floating-point type occupied 80 bits; many compilation platforms still make that floating-point type available as long double, for some reason as a type of 12 or 16 bytes in total (respectively in 32-bit and 64-bit mode), including respectively 2 and 6 bytes of padding. Padding may or may not be modified when the value is assigned, as shown in another example from Alexander Cherepanov:

int f(void)
{
  long double x, y;
  memset(&x, 0, sizeof x);
  memset(&y, -1, sizeof y);

  y = x;
  return ((unsigned char *)&y)[10];
}

The function f above is compiled as if it were return 255;, although the entire memory assigned to x was set to 0 before x was copied to y with y = x;.

Trap representations

Trap representations are a particular case of padding bits. A symptom of padding bits is that a type t has fewer than 2^(CHAR_BIT ✕ sizeof(t)) distinct values. In the case of trap representations, some of the bit patterns are considered erroneous for type t. Accessing such an erroneous representation with an lvalue of type t is undefined behavior.

The latest draft of the C11 standard contains this footnote(**):

53) Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types. All other combinations of padding bits are alternative object representations of the value specified by the value bits.

This footnote is in the context of the representation of integer types, which are made of value bits, padding bits, and in the case of signed integers, one sign bit (6.2.6.2).

“I have no access to no parity bits,” the ordinary programmer might think. “Even if redundant internal representations are part of the implementation of my high-performance computer, the interface presented to me is that of ordinary memory, where all bits are value bits (with sometimes a sign bit).”

In fact, as implemented by GCC and Clang, the _Bool type has two values and 254 trap representations. This is visible in the following example:

int f(_Bool *b) {
  if (*b)
    return 1;
  else
    return 0;
}

The function f in this example, as compiled, returns 123 when passed the address of a byte that contains 123. This value does not correspond to either of the two possible execution paths of the function. Undefined behavior is the only excuse the compiler has for generating code with this behavior, and this means that GCC and Clang, for the sake of this optimization, choose to interpret bytes containing a value other than 0 or 1 as trap representations for the type _Bool.
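A sketch of a driver for this example (the program has undefined behavior by construction; the point is precisely what compilers make of it):

#include <stdio.h>
#include <string.h>

int f(_Bool *b);   /* the function above, compiled separately */

int main(void) {
  _Bool b;
  unsigned char byte = 123;
  memcpy(&b, &byte, 1);     /* stores a representation that is neither 0 nor 1 */
  printf("%d\n", f(&b));    /* prints 123 when f is compiled to the code described above */
  return 0;
}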


This post owes to Alexander Cherepanov’s examples, John Regehr’s encouragements and Miod Vallat’s remarks.

Footnotes:

(*) I would have recommended using fixed-width integer types from stdint.h, such as uint32_t, to make the layout fixed, but that does not solve the problem for floating-point types or pointer types.

(**) The latest C11 draft contains two very similar footnotes, 53 and 54, and it is not clear that both were intended to remain in the final revision.

A non-exhaustive list of ways C compilers break for objects larger than PTRDIFF_MAX bytes

One well-documented way in which a program may break in the presence of objects of more than PTRDIFF_MAX bytes is by doing a pointer subtraction that results in an overflow:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *p = malloc(0x81000000);
  if (!p) abort();
  char *q = p + 0x81000000;
  printf("q - p: %td\n", q - p);
}

On the Ubuntu Linux version that I have conveniently at hand, this program, compiled to 32-bit code, produces:

$ gcc -m32 t.c && ./a.out 
q - p: -2130706432

Strange results are acceptable because the overflow in pointer subtractions is undefined behavior:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.

Do not stop reading if you already know all this! Things are going to get interesting in a minute!

Incidentally, when I tried this program on OS X 10.9 with clang -m32 t.c && ./a.out, I obtained a warning that malloc had failed, and my program aborted without doing the dangerous q - p pointer subtraction. In effect, a malloc function that rejects allocations of more than half the address space goes a long way towards preventing surprising behaviors. This is interesting because at least as late as OS X 10.5, I routinely allocated 2.5GiB from 32-bit processes. On the OS X platform, the decision was courageously made to break some 32-bit applications that used to work in order to prevent some vaguely worrying problems. But then again, preserving application compatibility was never the foremost preoccupation of the Mac platform.

Even if you already knew all this, keep reading.

Subtraction of pointers to wider types

On a 32-bit platform

GCC and Clang’s behavior on the following example is not excused by clause 6.5.6p9:

#include <stdlib.h>

int main(void) {  
  short *p, *q;

  p = malloc(0x70000000 * sizeof *p); // note: product does not overflow
  q = p + 0x70000000; // note: stays in-bounds
  return q - p; // note: does not overflow ptrdiff_t
}

The subtraction q - p in this program evaluates to -268435456. Wait, what?

The example can be made more striking by making the expected result of the pointer subtraction smaller:

#include <stdlib.h>

typedef char (*arrayptr)[0x70000000];

int main(void) {  
  arrayptr p, q;

  p = malloc(2 * sizeof *p);
  q = p + 2;
  return q - p;
}

Both Clang and GCC make this program return 0 when it is very obvious that the correct result is 2. As a psychological experiment, showing these related examples to C programmers, the average subject might reluctantly let themselves be convinced that 0x70000000 can come out as -268435456, but they are quite certain that 2 should not come out as 0, and they express more surprise at the second example.

The answer to the riddle, though not really an excuse in itself for Clang and GCC’s behavior, is contained in the following decomposition of pointer subtraction into elementary steps. The “sarl” instruction implements a signed operation that does not produce the correct result when the value in %eax should be interpreted as a number above 2^31. When the element size is not a power of two, the compiler uses a division instruction instead, but the problem remains the same: the sequence generated by the compiler implements a signed division that does not work above 2^31.
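In C terms, the 32-bit code generated for q - p with short *p, *q amounts to the sketch below; the arithmetic (signed) shift is where the result goes wrong once the byte offset reaches 2^31:

#include <stdint.h>
#include <stddef.h>

/* Roughly what the generated 32-bit code computes for q - p: */
ptrdiff_t short_ptr_diff(short *q, short *p) {
  intptr_t byte_offset = (intptr_t)q - (intptr_t)p;  /* 0xE0000000 read as -0x20000000 */
  return byte_offset >> 1;  /* signed shift: -0x20000000 / 2 = -0x10000000 = -268435456 */
}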

Partially excusing GCC’s behavior, it is somewhat documented (in replies to bug reports) that 32-bit GCC does not work with malloc functions that succeed for requests of more than 2GiB of memory. As long as the malloc calls fail in the examples above, GCC is not doing anything wrong, because the program invokes undefined behavior as soon as p + … is computed. (It is illegal to do pointer arithmetic from a null pointer.)

And indeed, if we re-introduce the malloc success check that I surreptitiously removed from the initial example, GCC does the right thing. “The right thing” being making a call to malloc, which, as semi-documented, should fail for the requested size in order to be compatible with GCC.
It’s a shame that 32-bit Glibc’s malloc does not systematically fail for such requests, but that is not GCC’s problem. It is the problem of the programmer who links together code generated from GCC and code from Glibc (note that Glibc is not intended to be compiled with a compiler other than GCC, which is unfortunate considering).

Clang, the most honey badger-like of compilers, don’t give a shit.

64-bit platforms

64-bit solves all these problems, because a malloc call requesting half the address space always fails, right?
It’s a shame that there was a difficult trade-off on 32-bit platforms, where the problem was fully realized only after some applications had gotten used to requesting 2.5GiB and getting away with it, but as even the smallest computers are moving to 64-bit, we can consider this problem solved by Moore’s law? 2^63-1 bytes ought to be enough for anybody?

Not really. The malloc call in the program below succeeds (the program would return 0 if the malloc invocation failed), despite Clang having the same half-the-address-space limitation for 64-bit as for 32-bit.

#include <stdlib.h>

typedef char (*arrayptr)[0x700000000000000];

int main(void) {  
  arrayptr p, q;

  p = malloc(32 * sizeof *p);
  if (!p) return 0;
  q = p + 32;
  return q - p;
}

The program not returning 0 indicates that malloc succeeded. The program returning a nonzero value other than 32 indicates that Clang is confused.

Comparisons of pointers

One might consider it an acceptable trade-off to let malloc allocate 2.5GiB on 32-bit platforms, if the developer knows that they never subsequently subtract pointers into the allocated zone. Neither as char*, nor as pointers to a wider type: compilers are broken for the latter subtractions too, even when C11’s clause 6.5.6p9 perfectly defines the result.

Unfortunately, an optimizing compiler’s own code appears to be littered with optimizations that assume object sizes are less than PTRDIFF_MAX. As an example, we all know that x86 Clang and GCC implement pointer comparison as an unsigned assembly comparison (“cmpl” is an agnostic comparison instruction, but here it is followed by “cmoval”, where the “a”, for “above”, indicates a conditional move based on an unsigned comparison). This means that it is allowed to compare pointers inside a putative 2.5GiB array on a 32-bit platform, right?

Not all pointer comparisons.

Because of this optimization.

Note the “g” in “cmovgl”. This “g” stands for “greater”: a mnemonic indicating that a signed comparison decides which string is returned. Determining what happens with offs1=0 and offs2=0x81000000, two values representable in a 32-bit size_t that, with a lenient malloc, can also be valid offsets into the same array, is left as an exercise for the reader.

Conclusion

It is impossible to tell when the compiler will apply this or any other optimization that assumes that a pointer difference is always representable as a ptrdiff_t. Therefore, the only safe path on 32-bit platforms is to refrain from allocating more than 2GiB in a single malloc call; it might be a good idea if Glibc’s malloc were changed to implement this limitation. Compilers could do much, much better than they currently do. Nothing would prevent malloc(0x81000000) from causing a warning (I suggest “malloc call will always return NULL” or something equally direct). Beyond the friendliness of warnings, for Clang to treat malloc calls that, because of Clang’s own limits, cannot succeed as calls that cannot fail indicates a disturbing misunderstanding of classical logic.

Acknowledgements

Alexander Cherepanov and John Regehr provided most of the examples in this post, with additional remarks coming from Daniel Micay and Rich Felker. Matt Godbolt made the absolute best platform on which to investigate these examples and show them to the world.

Out-of-bounds pointers: a common pattern and how to avoid it

A quick tis-interpreter update

At long last, tis-interpreter was released as open-source last month. It is somewhat difficult to install, and as a remedy I will regularly prepare binary snapshots for Linux (and maybe OpenBSD, OS X). At the time of this writing, the first publicly available binary snapshot can be downloaded from the bottom of the tis-interpreter homepage.

The rest of this post is about extremely recent evolutions in tis-interpreter.

A newfound strict stance on pointer arithmetic

As tis-interpreter was getting easier to use and was getting used, the feedback from applying it led to what may seem like drastic changes. Here is one example.

Among other software components, John Regehr applied tis-interpreter to Google’s implementation of their WebP image format, and then to SQLite. The former is only a compressed format for static images, but the latter is a large library with an even larger testsuite. Both were worth looking at, but the latter was a significant effort.

In libwebp, among other problems, John reported a subtle bug in a pointer comparison. libwebp would check whether the size declared for a sub-object was consistent by comparing what would be the address to continue parsing at (after the sub-object, if it had the size it said it had) to the end pointer. This idiom is dangerous. tis-interpreter warned about the dangerous comparison, I wrote a short essay, John reported the bug, and that was the end of that story.

On the SQLite front, progress was slow. The very same “dangerous pointer comparison” warning was being emitted because of one-indexed arrays implemented as int *one_indexed = ptr - 1;, as explained in John’s post. John complained that the “pointer comparison” warning was emitted on an eventual one_indexed == NULL, arbitrarily far from the allocation of the one-indexed array, which was the real culprit and, in any case, the line he wished to change so as to suppress all warnings coming from that pointer in one swoop. After multiple requests, I finally implemented detection of invalid pointer arithmetic in tis-interpreter, so that the warning would be emitted on ptr - 1 instead.

A surprise lurking in already visited code

This week, my colleague Anne Pacalet found herself going back to analyzing libwebp. She naturally started by using the latest version of tis-interpreter, with its all-new early warnings about invalid pointer arithmetic. She discovered that, apart from the reported and long since fixed end_ptr idiom where a pointer was compared after going out of bounds, other invalid pointers are computed inside libwebp. This happens when decompressing examples provided as testcases by Google itself. The line sporting the tis-interpreter warning is this one, and the surrounding code looks like this:

  for (j = 0; j < height; ++j) {
    for (i = 0; i < width; ++i) {
      const uint32_t alpha_value = alpha[i];
      dst[4 * i] = alpha_value;
      alpha_mask &= alpha_value;
    }
    alpha += alpha_stride;
    dst += dst_stride; // goes out of bounds on the last iteration
  }

Do not confuse this with a buffer overflow or a similar immediate problem! The pointer dst goes out of bounds when it is computed at the end of the last iteration, and it is never used after that. Besides, it may look like this last value of dst is a one-past-the-end pointer, as allowed by the C standard, but that is only true if dst started at the very beginning of the destination buffer.

The function in question is called from EmitAlphaRGB (through the WebPDispatchAlpha function pointer). At line 183, dst may have been incremented by 3 before being passed as an argument, and in this case, the last value of dst is way out of bounds (it is four-past-the-end, but there is no “four-past-the-end” exception for pointer arithmetic in the C standard).

Fixing the problem

It may help both to communicate the problem and to convince the maintainers to remove it if it is reported with an example of a change that would keep the computation the same and remove the undefined behavior. We looked at different variations. I thought I had a solution that wasn't too ugly when I arrived at the following change:

-  for (j = 0; j < height; ++j) {
+  j = 0;
+  while (1) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
       dst[4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
+    if (j == height - 1) break;
+    ++j;
     alpha += alpha_stride;
     dst += dst_stride;
   }

With the above change applied, the exit test now comes after dst has been used in the iteration but before it is incremented, so that the final, undefined-behavior-invoking incrementation does not take place. The generated assembly for the code above seems fine, but there is a subtle difference: the original function never entered the loop if height was 0, whereas this one always does at least one iteration. This is something that would have to be discussed with the developers, but the problem does not seem worth a discussion. We could pre-emptively guard the loop with if (height > 0), but then the source code gets uglier, at the very least, and the generated assembly code might be longer than what was generated for the original function.

Jamey Sharp's solution

Jamey Sharp provided a brilliant insight. His remark was: “what about just doing the obvious multiply and letting strength reduction turn it into this code? Easier to maintain and correct”

Translated to code, Jamey's remark amounts to this change:

   for (j = 0; j < height; ++j) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
-      dst[4 * i] = alpha_value;
+      dst[j * dst_stride + 4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
     alpha += alpha_stride;
-    dst += dst_stride;
   }

Indeed, we are removing more code than we are adding. It solves the undefined behavior problem because now, in the source code, j * dst_stride is only added to dst at the time of accessing the memory location (in the source code, dst + height * dst_stride is never computed). The beauty of this solution is that an optimizing compiler generates exactly the same assembly code, to the comma, for this version as it did for the original version:

$ diff -u old.s jamey.s
$

In other words, we stick to legal (and concise and readable) operations in the source code. The compiler does the tedious work of knowing that on the target platform, pointers are just integers and that a multiplication by the index inside a loop can be transformed into an addition. This is the correct division of labor between developer and computer.

Miod Vallat's solution

Miod Vallat pointed out that the solution above required you to trust that the compiler would always succeed at the strength-reduction optimization. Miod's solution instead expresses in the source code the fact that pointers are just integers. It should work fine for all the ordinary platforms we are used to, because of footnote 67 in the C11 standard.

   int i, j;
+  uintptr_t dst_int = (uintptr_t) dst;
 
   for (j = 0; j < height; ++j) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
-      dst[4 * i] = alpha_value;
+      ((uint8_t*)dst_int)[4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
     alpha += alpha_stride;
-    dst += dst_stride;
+    dst_int += dst_stride;
   }

With Miod's solution too, the compiler (GCC 5.2) produces the exact same assembly as before:

$ diff -u old.s miod.s
$

Conclusion

The new “bug” discussed here is, as far as I know, harmless with current compilers. In this particular instance, by removing the undefined pointer arithmetic, we are making libwebp safer from aggressive compiler optimizations for the next ten or fifteen years, but we are not solving any immediate problem. It is not always that clear-cut. The pointer comparison problem that had been reported earlier in libwebp would, with current compilers and execution platforms, never have noticeable consequences on the ordinary computers security researchers use to look for problems, but it could have consequences in a peculiar configuration.

The option remains to disable the warnings on the creation of invalid pointers: if that is what you want, use:

tis-interpreter.sh -no-val-warn-pointer-arithmetic-out-of-bounds …

Fiddly buffer overrun in OpenSSL

John’s blog is hosting a post, co-authored by me, about one of the more entertaining “bugs” reported by TrustInSoft in OpenSSL. In this case the behavior was intended, and it looked like a good idea when the code was originally written. I see this anecdote as a continuation of the “sociology of open-source security fixes” series. Unlike the situation in the first two posts of that series, here the codebase has a long history, with some parts of it going back twenty years, but practices have changed and compilers have changed.

The incongruous pattern was present in both LibreSSL and OpenSSL, and has now been cleaned up from both. Many thanks to all maintainers of aging open-source software; it is a difficult task.