Differences between the B method and Frama-C in Formal Methods

My interest was piqued by a question on the question-and-answer site Quora (the irritating one that tries to get you to register by limiting the number of answers you can view in a month otherwise; no, now you’re thinking of Medium; I mean the one that was doing this from the beginning). The question was about the differences between the B method and Frama-C. The existing answer at the time made some good points, but I thought its phrase “First you are supposed to write or find some C source codes” was backwards. So I wrote my own answer there.

Achieve ISO-26262 compliance with TrustInSoft Analyzer

In one year, over 60 million vehicles were recalled in the U.S. With more than 50% of recalls resulting from bad code, the stakes have never been higher. In today’s connected world, there’s no room for guesswork—you need safety guarantees. 

Software design and implementation validation expenses have risen dramatically since ISO-26262 went into effect. TrustInSoft’s award-winning technology provides key functions that will help you cost-effectively ensure ISO-26262 compliance and maintain unprecedented software safety guarantees.

Learn how TrustInSoft Analyzer allows you to Reduce Risk & Comply with ISO-26262

Download our white paper here.

This document describes how to use the TrustInSoft Analyzer platform in order to automatically verify the source code requirements of the ISO-26262 standard. It contains an extract of the ISO-26262 standard that lists all the requirements related to the source code as well as their recommendation tables for the required ASIL.

Variadic Functions in TIS Interpreter

Working on improving and applying TIS Interpreter, with funding from the CII, we ended up implementing support for program-defined variadic functions. Recently, while applying TIS Interpreter to the musl standard C library implementation, we found one slight violation of the C standard in the way musl defined functions of the printf family. This post describes program-defined variadic functions and the ways they can be misused, and argues that they are worth supporting in TIS Interpreter.

What are variadic functions and why should we care?

Variadic functions, found in several programming languages, accept a variable number of arguments. As an example, we can imagine a little variadic function add(arg1, arg2, …, argn) that takes some numbers as arguments, and computes their sum, arg1 + arg2 + … + argn, whatever the number of passed arguments actually is.

In C, variadic functions are not that popular, and a seasoned C programmer, anticipating what this is going to entail, would probably avoid specifying and implementing such an add function. Nevertheless, in some cases, the possibility of going beyond functions with a fixed number and type of arguments is convenient. When, in the 1990s, C was still considered and used as a high-level language, variadic functions were a reasonable solution in situations where the flexibility they provided made a difference compared with more cumbersome alternatives. An especially good example of such a situation is input and output formatting using format strings. Because of that, even though a contemporary C programmer would usually avoid defining new variadic functions, many are present in existing C software, notably in legacy code.

The printf function in C

Let us take a closer look at C’s ubiquitous printf function. Its declaration is part of the stdio.h standard library header file and it looks as follows:

int printf(const char * restrict format, ...);

The declaration states that the printf function takes one mandatory argument, format, followed by a variable number of additional arguments (the ellipsis ...).

The mandatory argument format here is the format string itself. The expected type of the remaining arguments is not established statically on the declaration level, but depends dynamically on the contents of this first argument. In the case of printf, this argument is a format string, and it is the conversion specifiers (the special patterns introduced with the % character) present in this format string that define the types and the number of arguments to be printed on the standard output. The conversion specifier %d expects an int argument and displays it in decimal style, the conversion specifier %c expects a char argument and displays it as an ASCII character, and so on; the list is long. The characters in the format string that are not conversion specifiers simply pass through unchanged.

A simple use of printf could be:

char c = 'X';
int i = 42;
printf("The %c is %d.", c, i);

This code snippet prints “The X is 42.” to the standard output when executed.

In a different context, the same printf function might be passed a double argument and a void* argument. Although this lax type handling has some advantages in the case of formatting functions (the same printf function can be used to print strings, floating-point values, pointers and integers), major dangers come with it. The main advantage is the flexibility that it offers: without it the whole printf function family could not exist in this form, and producing formatted outputs in C would be much more tedious. The danger is that this mechanism may give rise to a whole lot of significant type-safety related problems. If we follow section 7.21.6.1 of the ISO C11 standard dedicated to the fprintf function (of which printf is a specialised version), we discover that both failure to provide enough arguments and a mismatch in a provided argument’s type cause Undefined Behavior. CWE-134: Use of Externally-Controlled Format String shows that the security risks involved, information leaks and, through the %n conversion specifier, execution of attacker-chosen code, are real and should be taken seriously.
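To make the failure modes concrete, here is a minimal sketch (ours, not taken from the standard) of the two kinds of incorrect calls just mentioned; a compiler may warn about them, but executing either is Undefined Behavior:

#include <stdio.h>

int main(void) {
  printf("The %c is %d.\n", 'X'); /* too few arguments: Undefined Behavior */
  printf("%d\n", 3.14);           /* double passed where int is expected:
                                     Undefined Behavior */
  return 0;
}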

Home-brew variadic functions in C

The printf function is only the most ubiquitous example of a C function that takes a variable number of arguments. The C standard library not only contains several other variadic functions, but also provides the tools necessary for the programmer to define their own variadic functions. The ingredients are available in the stdarg.h header file.

Declaring, calling, and defining variadic functions

We have already seen with printf how to declare a variadic function. After providing a number of mandatory arguments, together with their types, we simply add an ellipsis (i.e. ...) after the last argument, which means additional arguments may follow:

int add(int n, ...);

We also know how to call such functions, as they are in fact called almost exactly like the normal ones. The fixed mandatory arguments are simply followed by the additional variable ones. Therefore the number of passed arguments can differ between different call sites of the same function:

int foo = add(3, 23, 12, 7);
int bar = add(2, 23, 19);

Defining variadic functions is a bit more complicated. The function’s mandatory arguments are accessed just the usual way, but a special mechanism is needed in order to access the variable ones. The stdarg.h header provides four variadic macros: va_start, va_arg, va_end, and va_copy, as well as the type va_list, which together make it possible to cycle through the additional arguments that have been passed to a variadic function:

  • The type va_list holds the information needed by the variadic macros. The traditional name of a variable of this type is ap, for argument pointer.
    Intuitively, we can imagine that an object of this type contains a pointer to the next variadic argument (or a list of remaining variadic arguments). Of course, as the interface is abstract, the actual implementation of this type is not known to the library user. The C standard only describes how it will behave if used correctly.
  • The macro va_start(va_list ap, parmN) initializes the va_list object ap.
    Intuitively, after invoking va_start the ap object points to the first variadic argument passed to the function.
    (Note: The parmN parameter here must be the same as the name of the last non-variadic argument in the function definition, that is, the argument just before the .... The necessity of passing this parameter hints at the low-level details of the implementation.)
  • The macro invocation va_arg(va_list ap, type) has the type and the value of the next variadic argument. Also, each invocation of va_arg on ap modifies ap so that subsequent arguments are returned in sequence. The type provided to the va_arg macro should match the actual type of the next argument.
    Intuitively, va_arg returns the argument that ap points to and makes ap point to the next one.
  • Finally, the macro va_end(va_list ap) deinitializes the ap object. Each va_list object should be deinitialized before the function returns.
  • For simplicity’s sake we will ignore the va_copy macro here, as it is not necessary for the examples in this post.

There is no built-in way (such as an additional variadic macro) to know from the inside of a variadic function how many arguments and of which types have been passed to it. The only way is to provide this information explicitly and consistently at each call site, and to retrieve it inside the variadic function. The usual approach is to encode it using the function’s mandatory arguments, exactly like in the case of printf‘s format string.
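The format string of printf is the most famous such convention, but not the only possible one. Another classic, sketched below with a name and convention of our own choosing, is to terminate the variadic arguments with a sentinel value, in the spirit of execl()’s terminating null pointer:

#include <stdarg.h>

/* Sums its variadic int arguments up to (and excluding) the first
   negative one, which serves as a sentinel. */
int add_until_negative(int first, ...) {
  va_list ap;
  va_start(ap, first);
  int sum = 0;
  for (int arg = first; arg >= 0; arg = va_arg(ap, int))
    sum += arg;
  va_end(ap);
  return sum;
}

/* add_until_negative(23, 12, 7, -1) == 42 */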

A full example

In order to see how this all works, let us implement the closest approximation that can be achieved in C of the add function that sums a variable number of arguments:

int add(int n, ...) {
  va_list ap;
  va_start(ap, n);
  int sum = 0;
  for (; n > 0; n--) {
    int next_arg = va_arg(ap, int);
    sum += next_arg;
  }
  va_end(ap);
  return sum;
}

The internal workings of function add are pretty straightforward:

  1. The beginning:
    • First we declare a va_list object ap and we intialize it using the va_start macro. Now, intuitively, ap points to the first variadic argument of add.
    • Also, we initialize the sum variable to zero.
  2. The middle:
    • n times we consume the next variadic argument, using the va_arg(ap, int) macro. Note that we expect each of the variadic arguments to be of type int and we expect their number to be at least n.
    • We add the subsequent arguments’ values to the variable sum.
  3. The end:
    • We deinitialize the ap variable using the va_end macro.
    • We return the sum variable’s value, which is now (obviously) equal to the sum of the n variadic arguments.

Now, in order to perform a correct call to add we must at least make sure that:

  • the number of variadic arguments that we pass to the function is not smaller than the value of the argument n passed in the same call (it is all right if more arguments are passed than consumed),
  • and that all these n arguments are of type int.

Examples of some correct calls would be:

add(3, 23, 12, 7); /* == 42 */
add(1, 42); /* == 42 */
add(3, 23, 12, 7, 'x', 13.0, "hello"); /* == 42 */
add(0); /* == 0 */
add(-42); /* == 0 */

Variadic macros and the C standard

Now, please notice the inconspicuous at least, in “we must at least make sure that”, which found its way to the previous paragraph. Well, these are by far not the only conditions necessary to call the add function correctly. In fact the section of the C11 standard concerning variadic functions and variadic macros is complex and it provides numerous ways to introduce Undefined Behavior into programs that use the stdarg.h library header.

The C11 standard section 7.16, entitled Variable arguments <stdarg.h>, defines the semantics of the four variadic macros, as well as the va_list type, and thus decides how home-brew functions with variable arguments behave, and when they may misbehave. There are many constraints concerning correct usage of these macros, some pretty straightforward and direct, some subtle. Violating most of these constraints seems completely innocuous in practice for all common compilation platforms, while breaking others causes visible problems on some or all common compilation platforms.

Correct sequence of variadic macros invocations

Several constraints concerning the variadic macros are concerned with the order in which these macros are supposed to be invoked on a va_list object. The allowed order is strictly defined by the standard. The following rules paraphrase, in a simplified and more casual way, what the standard says on this subject (note that we omit, again, the va_copy macro):

  • Each va_list object ap begins its life uninitialized.
  • If ap is not initialized it may be initialized using the va_start macro.
  • Once ap has been initialized, the va_arg macro may be invoked on it at most as many times as there are variadic arguments passed to the function.
  • If a va_list variable has been initialized, it must be deinitialized in the same function before the function returns, using the va_end macro.
  • This sequence can be repeated any number of times: after deinitializing ap with va_end we can initialize it again with va_start, and iterate on all the variadic arguments again (from the beginning) with va_arg, and then deinitialize it again with va_end.
  • If any other sequence of events happens, the behavior is undefined.

This ends up being a simple pattern, which resembles a finite state machine that validates how va_start, va_arg, and va_end (invoked on a given va_list variable) are allowed to be interwoven in a single function. Such a machine, for three variadic arguments passed in the call, would look like this:

Finite state machine validating variadic macros invocations for 3 arguments passed in the function call
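To make the allowed sequencing concrete, here is a small function of our own devising that legally traverses its variadic arguments twice, exercising the rule that a va_list may be re-initialized with va_start after va_end:

#include <stdarg.h>

/* Returns max - min of its n variadic int arguments (n >= 1),
   iterating over them twice. */
int spread(int n, ...) {
  va_list ap;
  int min = 0, max = 0;
  va_start(ap, n);
  for (int i = 0; i < n; i++) {
    int v = va_arg(ap, int);
    if (i == 0 || v < min) min = v;
  }
  va_end(ap);
  va_start(ap, n); /* legal: ap was deinitialized, so we may start over */
  for (int i = 0; i < n; i++) {
    int v = va_arg(ap, int);
    if (i == 0 || v > max) max = v;
  }
  va_end(ap);
  return max - min;
}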

Consuming too many variadic arguments

The interesting aspect of these rules is how differently visible in practice the effects of violating them are. For example, if we try to consume more variadic arguments than are available in the given call, we will run into trouble quite quickly. Of course, what happens exactly is compiler and platform dependent, but in most cases the underlying Undefined Behavior will manifest itself as reading some random data from memory. Let us see a simple program that simulates such a situation:

/* FILE not_enough_args.c */

#include "stdarg.h"
#include "stdio.h"

int f(int fst, ...) {
  va_list ap;
  va_start(ap, fst);
  int x = va_arg(ap, int);
  printf("%d\n", x);
  va_end(ap);
  return x;
}

int main() {
  f(11);
  return 0;
}

In this program, the function f tries to consume one int variadic argument while the call passes no variadic argument at all. On my machine, compiling it with gcc and then executing it repeatedly prints the following values, which seem pretty random indeed:

1691332760
-262231160
1531375144
-1205239608
133570616
985826840
-1260096984
1604313000
-883866920
-473417640
-121339624

The unpleasant case of the va_end macro

What may be more disturbing, disobeying many of the other mentioned constraints concerning the sequence of variadic macro invocations will usually have no visible effect during either the program’s compilation or its execution. In particular, all the rules which involve the va_end macro appear to be optional in practice. The va_end macro is translated by most compilers on most architectures to a do-nothing operation (a fact recognised directly in the C standard rationale, section 7.15.1.3: The va_end macro). This macro is supposed to perform a clean-up after initializing and using the va_list object, but in most stdarg.h implementations there is simply nothing to clean up; the macro is thus not needed at all and may just be ignored. You might ask, why was it included in the standard in the first place? Following the C rationale again, the explanation is that those implementations that need it probably need it badly.
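For illustration, here is a sketch (our reconstruction) of such a program: a variant of our add function with the final va_end simply removed. On most platforms it compiles silently and behaves exactly like the correct version, and it is nevertheless undefined; a program of this shape, under the name missing_va_end.c, is diagnosed by TIS Interpreter later in this post.

/* FILE missing_va_end.c (reconstructed for illustration) */

#include <stdarg.h>

int add(int n, ...) {
  va_list ap;
  va_start(ap, n);
  int sum = 0;
  for (; n > 0; n--)
    sum += va_arg(ap, int);
  return sum; /* Undefined Behavior: ap is never deinitialized with va_end */
}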

The variadic arguments’ types

There are also several constraints which concern the types of the variadic arguments. Basically, the type provided when invoking the va_arg(ap, type) macro should be compatible with the actual type of the next variadic argument (with several well-defined exceptions; see the C11 standard section 7.16.1.1p2 for details). The danger related to the underlying Undefined Behavior is quite serious in this case, and definitely of the harmless-looking, but waiting to stab you in the back one day kind. As the size of C types may vary with the compiler, the compilation options, and the platform, it is not hard to imagine situations where a given program works perfectly well on one configuration, because the two theoretically incompatible types happen to align well (and happen to be passed through the function call in exactly the same way), and fails miserably on another configuration, where the same two types in the same circumstances no longer behave alike. The variadic argument is then recovered incorrectly, and suddenly there we have a nice shiny bug…

Let us look at a simplistic example that showcases the problem. In the following code we call the variadic function f passing two arguments, both of type long and value 42, and then we attempt to consume one variadic argument of type long long:

/* FILE type_mismatch.c */

#include "stdarg.h"
#include "stdio.h"

void f(int hello, ...) {
  va_list ap;
  va_start(ap, hello);
  long long b = va_arg(ap, long long);
  printf("%lld\n", b);
  va_end(ap);
}

int main(void) {
  f(0, (long) 42, (long) 42);
  return 0;
}

On my machine when I compile this source code with gcc using two different target options I get different results upon execution:

  • Option -m64 selects a 64-bit x86-64 build. In this case the first variadic argument’s value 42 of type long is read correctly, as the sizes of the types long and long long match on this target: the program prints 42.
  • Option -m32 selects a 32-bit i386 build. In this case the argument is read incorrectly: the program prints 180388626474, which is definitely not the value we expected.

This example is simplistic indeed, but it shows exactly the disturbing property that we were just talking about: on one configuration it works perfectly fine, and on another it does not. Of course here it is pretty evident that the two concerned types will not always match and that something might go wrong. However, if this kind of mismatch is well hidden in a much larger program which has many execution paths and #define directives all around the place, the existence of a potential problem will not be so obvious anymore. Furthermore, no amount of testing on the 64-bit x86-64 build will ever throw any doubt at the Works on My Machine certificate that we might have given this piece of code. But compiling and running on a 32-bit system tells a different story.

Support of variadic functions in TIS Interpreter

In TIS Interpreter we have recently implemented support for variadic functions written using the variadic macros from stdarg.h. TIS Interpreter, developed thanks to the funding of the CII and available as Open-Source, is an abstract interpretation tool for C programs, capable of finding and identifying a very large set of Undefined Behaviors and C standard violations. Now, with its newly gained proficiency in this particular domain, it can also discover problems concerning variable argument handling, like the ones mentioned above.

Let us see how TIS Interpreter handles the examples that we have introduced so far:

    • Interpreting the example where too many variadic arguments are consumed produces the following warning:
not_enough_args.c:7:[value] warning: va_arg macro called when all the variadic arguments have been already used up; assert enough arguments
    • When we interpret the example where variadic argument types are not really matching we get:
type_mismatch.c:7:[value] warning: the actual type of the next variadic argument (long) does not match the type provided to the va_arg macro (long long); assert the type of each variadic arguments provided to a function matches the type given to the corresponding call to the va_arg macro
    • And for an example that we did not explicitly show, with a va_end macro invocation removed from an otherwise correct program (a reconstruction was sketched in the section on va_end above):
missing_va_end.c:13:[value] warning: local variable ap of type va_list in function add has been initialized using va_start or va_copy macro and has not been deinitialized by a matching va_end macro; assert va_list variable ap has been uninitialized using the va_end macro

All these warnings are extracted from the output of TIS Interpreter when we execute it directly on the code that we have seen here, simply like that:

$ tis-interpreter not_enough_args.c

The case of musl

Recently, the support for variadic functions proved very useful, as we were running musl code through TIS Interpreter. The musl library is an example of a perfect target for TIS Interpreter: it is an important and widely used Open Source component written in C, and it aims “to be correct in the sense of standards-conformance and safety”.

musl is a libc: an implementation of the standard library functionality described in the ISO C and POSIX standards. musl‘s main objectives are to be lightweight, fast, simple, free, and, as we have already emphasised, correct. It is a key component of Alpine Linux, a security-oriented, small, simple, resource-efficient Linux distribution, very well adapted to use in software containers. The features of Alpine Linux make it a frequent choice for use in Docker containers; rumour has it that it is even being considered as the default platform option in the official Docker image library. And, as Docker is an extremely popular (the world’s leading, according to its website) software container platform, musl thus happens to be a pretty widely deployed libc. Hence our interest in it.

What did we find?

As musl is high-quality software written with standards-conformance in mind (the POSIX standard for the interface it provides, the C standard for its assumptions with respect to the compiler), we did not expect to find many issues to report. And indeed, we only managed to encounter minor transgressions of the C standard. One of these, which was ultimately deemed important enough to be corrected, was present in the implementation of the printf and scanf families of variadic functions. In musl the implementation of these library functions is in fact based on the variadic macros from stdarg.h.

The issue was related to the type and value of the argument passed to the va_arg macro. Consider the program:

char dest[100];
int x = sprintf(dest, "%lld\n", -1LL);

These two lines are correct C. The type of the -1LL argument matches the format specifier %lld. Still, TIS Interpreter emits a warning when using this snippet to drive the sprintf implementation that was in musl at the time.

src/stdio/vfprintf.c:141: warning: although the type provided to the va_arg macro (unsigned long long) and the actual type of the next variadic argument (long long) are corresponding unsigned - signed variants of the same integer type, the actual value of the argument (signed) is negative thus it cannot be represented in the provided (unsigned) type

This warning refers to the Undefined Behavior that we have already come across earlier, described in C11 standard section 7.16.1.1:2:

(…) if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:

  • one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
  • one type is pointer to void and the other is a pointer to a character type.

So what happens here exactly? The next variadic argument at this point of the execution is the -1LL constant, a negative value of type long long. Deep inside the sprintf implementation, the va_arg macro at this moment expects an argument of type unsigned long long. Though these two types are not compatible, we fall into one of the two exception cases: one type is a signed integer type, the other type is the corresponding unsigned integer type. But this use of a type with different signedness is only valid if the argument’s value is representable in both the signed and the unsigned type, which is not the case for -1. Consuming -1LL with va_arg(…, unsigned long long) is undefined. And that is exactly what TIS Interpreter is warning about here.
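The distinction between the two cases is easy to restate in code. In the sketch below (ours, not musl’s), the first call falls under the signed/unsigned exception and the second does not:

#include <stdarg.h>

unsigned long long first_ull(int n, ...) {
  va_list ap;
  va_start(ap, n);
  unsigned long long v = va_arg(ap, unsigned long long);
  va_end(ap);
  return v;
}

/* first_ull(1, 1LL)  : defined; 1 is representable as both long long
                        and unsigned long long.
   first_ull(1, -1LL) : Undefined Behavior; -1 is not representable
                        as unsigned long long. */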

A short investigation led to the va_arg invocation that consumed this variadic argument. It was the one at line 141 of the vfprintf.c file, effectively expecting unsigned long long:

break; case ULLONG: arg->i = va_arg(*ap, unsigned long long);

The cause behind the issue is an optimization in the vfprintf.c file. Two symbols, LONG_IS_INT and ODD_TYPES, are defined conditionally and then employed, using #ifdef directives, to fiddle with the enum type related to handling the conversion specifiers and with the switch cases that select the correct va_arg invocation in the function pop_arg. Let us see exactly how ODD_TYPES makes us reach the switch case with the unsigned long long type:

First, the symbol ODD_TYPES is defined or not, depending on the representation of certain types on the platform:

#if SIZE_MAX != ULONG_MAX || UINTMAX_MAX != ULLONG_MAX
#define ODD_TYPES
#endif

Then ODD_TYPES decides if LLONG is an actual enumeration tag or just a synonym for the ULLONG tag:

enum {
  /* ... */
  PTR, INT, UINT, ULLONG,
  /* ... */
#ifdef ODD_TYPES
  LLONG, SIZET, IMAX, UMAX, PDIFF, UIPTR,
#else
#define LLONG ULLONG
/* other #define directives here... */
#endif
  /* ... */
};

Finally, inside the pop_arg function’s switch statement, the case corresponding to LLONG is conditionally avoided (as LLONG is in this situation just an alias for ULLONG, it would be a de facto second ULLONG case in the switch):

switch (type) {
  /* ... */
  break; case ULLONG: arg->i = va_arg(*ap, unsigned long long);
  /* ... */
#ifdef ODD_TYPES
  break; case LLONG: arg->i = va_arg(*ap, long long);
  /* ... */
#endif

So what purpose did these type-related shenanigans serve? This optimization can shave off a few bytes from the compiled code by unifying certain execution paths: if different types have the same underlying representation, they can all be treated in the same way. Unfortunately, as we have just seen, this optimization also introduces Undefined Behavior. After a discussion on the musl mailing list, a cleanup patch was applied by Rich Felker, the primary author of musl, in this commit.

Conclusion

In an ideal world, this kind of optimization would not need to exist at the C level in the first place. If the compiler recognised what was going on in this situation, i.e. that two or more execution paths are equivalent on a given architecture, these two switch cases could be merged at compile-time. Then the programmer could just stick to the C standard, express their intentions, and get an executable as small as when applying the hack discussed above.

Luckily, in these particular circumstances, the efficiency impact of removing this dubious optimization altogether was negligible, so the choice was easy to make. In other cases though, if the difference in efficiency were more substantial, it might be less clear whether sticking to the C standard is worth the price.

As a side-effect, this led to a short discussion about compilers producing efficient code from the type-safe and correct version that musl now uses exclusively. It is a pleasant convergence that a problem uncovered in musl with TIS Interpreter revealed an instance of a compilation challenge which was being worked on at the same time.

Acknowledgments: Shafik Yaghmour provided comments on an early version of this post, and it was edited by Pascal Cuoq. Joakim Sindholt provided the cleanup patch that was merged into the musl tree by Rich Felker.

An old quirky libksba bug

The libksba library, used by GnuPG, provides functions for parsing X.509 cryptographic certificates. I was testing libksba with TIS Interpreter a little over a year ago. One of the bugs I found then illustrates a point I would like to make now.

The bug

Consider this function, as it was present in libksba two years ago:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  gpg_error_t err;
  AsnNode n;
  char *algo;
  size_t nread;

  if (!cert)
    return NULL;  /* Ooops (can't set cert->last_error :-().  */

  if (!cert->initialized)
    {
       cert->last_error = gpg_error (GPG_ERR_NO_DATA);
       return NULL;
    }

  if (cert->cache.digest_algo)
    return cert->cache.digest_algo;

  n = _ksba_asn_find_node (cert->root, "Certificate.signatureAlgorithm");
  if (!n || n->off == -1)
    err = gpg_error (GPG_ERR_UNKNOWN_ALGORITHM);
  else
    err = _ksba_parse_algorithm_identifier (cert->image + n->off,
                                            n->nhdr + n->len, &nread, &algo);
  if (err)
    cert->last_error = err;
  else
    cert->cache.digest_algo = algo;

  return algo;
}

The source code above contains the bug. The bug can almost be found by inspection of the function, inferring the contents of types and the behavior of callees from the way they are used. I have only removed commented-out code from the original. If you think that you have identified the bug I found one year ago, but that it may depend on how the function ksba_cert_get_digest_algo is used, you may be on the right track. Here is how ksba_cert_get_digest_algo is invoked in tests/cert-basic.c:

  oid = ksba_cert_get_digest_algo (cert);
  s = get_oid_desc (oid);
  printf ("  hash algo.: %s%s%s%s\n",
          oid?oid:"(null)", s?" (":"",s?s:"",s?")":"");

If on the other hand, even with this clue, you are still finding the entire function ksba_cert_get_digest_algo too tedious to review, here is a synthetic view that makes that bug stand out:

const char *
ksba_cert_get_digest_algo (ksba_cert_t cert)
{
  …
  char *algo;
  …
  n = _ksba_asn_find_node (cert->root, "Certificate.signatureAlgorithm");
  if (!n || n->off == -1)
    err = gpg_error (GPG_ERR_UNKNOWN_ALGORITHM);
  else
    err = _ksba_parse_algorithm_identifier (cert->image + n->off,
                                            n->nhdr + n->len, &nread, &algo);
  if (err)
    cert->last_error = err;
  else
    cert->cache.digest_algo = algo;

  return algo;
}

The bug is that the automatic variable algo remains uninitialized when _ksba_asn_find_node is called. When that function succeeds, &algo is passed as an argument to the function _ksba_parse_algorithm_identifier (without looking at it, we can assume that that function does initialize algo in all cases). The fact remains that when _ksba_asn_find_node fails, because the condition !n || n->off == -1 is true, the variable algo remains uninitialized until its “value” is returned as the result of the function.
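A minimal way to neutralize this kind of bug is to initialize the variable at its declaration, so that the failure path returns NULL, which callers such as cert-basic already handle. This sketches the principle only, not necessarily the exact contents of the fix that was eventually applied:

  char *algo = NULL; /* initialized, so the error path returns NULL
                        instead of an indeterminate value */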

The anecdote

The funny thing about this bug is that it is very easy to produce inputs for. I was using afl, an efficient and easy-to-use fuzzer, to generate testcases. This bug was the second one I found, right after I set up afl and TIS Interpreter to work together on libksba. It turned up just after afl generated an input that demonstrated a crash caused by an unrelated first bug. Whenever the target software crashes, afl generates a README.txt file to invite the user to report their success to afl’s author. This is smart timing: when using a fuzzer to find bugs, producing an input that causes a direct crash is one of the best possible outcomes. afl’s README file is thus a subtle declaration of victory: you only see it when afl has done a good job.

The README, a pure ASCII file intended for human consumption, of all the inputs pretending to be X.509 certificates, happens to be one that also triggers the uninitialized-variable-use bug described in the first part of this post. Let those who would not have used the shell command for i in findings_dir/crashes/* ; do run_in_tis_interpreter.sh $i ; done throw the first stone.

No great coincidence happened here, nor was the generated README.txt file designed to look fiendishly similar to an X.509 certificate. Any brute-force purely random fuzzer would have instantly produced a file that triggered the bug shown. In good time, afl would have found one too; and, importantly, it would have recognized that such an input caused a different execution path to be followed inside libksba’s cert-basic test program than less obviously incorrect certificates, and would have set it apart for later inspection.

Epilog

The crash initially found by afl was fixed in a timely manner by Werner Koch in commit a7eed17, and the uninitialized-variable-use bug described in this post was fixed in commit 3f74c2c.

I never formally reported any of afl’s numerous findings to afl’s author, despite the instructions in the input file. Well, this post can be my report: all the bugs reported by me in libksba in 2016 were found using afl and TIS Interpreter together. This post is but an example to illustrate how well these two tools work together.

afl had been used on libksba before (for instance this 2014 vulnerability is listed as having been found using afl). But although the uninitialized-variable-use bug is very shallow and can be evidenced by throwing any README file at cert-basic, the bug was probably missed because it did not cause a crash. Barring weird optimizations, when executing a binary not instrumented to detect the use of uninitialized variables, the variable’s value is what happened to be on the stack. The stack is full of valid addresses: addresses of variables and buffers that have previously served, but also addresses of call sites to return to. Reading from any of these addresses, it is likely that a zero will be found before the end of the page (code contains plenty of null bytes because of how constants in instructions are typically encoded, and data contains plenty of null bytes too, of course). This means that running cert-basic directly does not reveal, via a crash, that something wrong happened. All that happens is that instructions or (possibly secret) data is printed as if it were a string. Since there are no a priori expectations for what cert-basic should print when passed an invalid file, this is difficult to notice.

One solution, of course, is to have used MemorySanitizer (MSan for short) to compile libksba’s cert-basic. One might speculate that the bug was not found earlier because MSan’s instrumentation is incompatible with ASan’s, and that users of afl, if they use a sanitizer at all, use ASan in order to find the most dangerous memory bugs.

TIS Interpreter takes more time to run a test program than it takes to compile and execute this test program with MSan or ASan instrumentation, but TIS Interpreter finds all the problems that these two sanitizers find, and more.

afl can produce minimal test suites that exercise all the different execution paths it was able to identify in the target programs. This feature enables TIS Interpreter to be used harmoniously together with it.

If the only way to use the two together was to run TIS Interpreter on every input that afl generated and discarded, then the collaboration between the two tools would be fruitless. The interesting inputs would be lost in a sea of uninteresting ones and TIS Interpreter’s comparatively slow speed would make it seem like it is not finding anything. But since afl can produce minimal test suites, covering a lot of different behaviors of a typical parser with a few hundred testcases, these few hundred testcases should definitely be run in the best detector of undefined behavior that exists, even if that detector uses up a few seconds per testcase. And, for C programs, the best detector of undefined behavior in terms of detected undefined behaviors is arguably TIS Interpreter.

Auditing zlib

zlib is a venerable multi-purpose compression library, first released in May 1995. The first time I installed GNU/Linux was in late 1995, so to me zlib has somehow always existed. Although now that I think about it, the zlib source code was distributed in the Unix compress format: had I been just a little bit more perceptive, I might have inferred that zlib had not existed forever, as a consequence of something else having come before it.

Together with New York based company Trail of Bits and under the auspices of Mozilla’s Secure Open Source program, TrustInSoft has completed an automated audit of zlib. Subtle issues were found, and fixed by zlib co-author Mark Adler.

Dan Guido makes a great argument for taking advantage of all that software tools can offer for auditing software. One of the tools used in this audit is tis-interpreter, to identify C undefined behaviors along the execution traces generated by Trail of Bits’s CRS.

How do you report bugs that you alone can see?

Do you remember the TV show The Invaders? It was enormously popular in France, much more than in the US where it originated. It tells the story of one David Vincent, who alone sees that currently working C programs have a serious defect and are at risk of getting translated to flawed binaries by evil C compilers.

David Vincent

I misremember. In the series, David Vincent is alone aware that Earth is being infiltrated by extra-terrestrial beings of human appearance.

Strict aliasing as some know it

Many will know “strict aliasing” as a set of rules that make it forbidden to pass the same address twice to the following C function:

int f(int *p, float *q) {
  *p = 1;
  *q = 2.0;
  return *p;
}

Indeed, the compiler confidently generates assembly code that assumes that the arguments are not some int variable’s address &x and (float*)&x:

f:                                      # @f
        movl    $1, (%rdi)
        movl    $1073741824, (%rsi)     # imm = 0x40000000
        movl    $1, %eax
        retq

This is in accordance with the rules described in the C standard. The function f in itself is fine and can be used validly with arguments that do not alias, but passing arguments that make *q write to an int variable causes undefined behavior.
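To see the assumption at work, we can call f with the aliasing arguments it was compiled to assume away. This sketch (ours) exhibits the Undefined Behavior, so the printed value depends on the compiler and the optimization level:

#include <stdio.h>

int f(int *p, float *q) {
  *p = 1;
  *q = 2.0;
  return *p;
}

int main(void) {
  int x;
  /* Undefined Behavior: *q writes to the int x through a float lvalue.
     With the assembly shown above, f returns 1 even though x now holds
     the representation of 2.0f. */
  printf("%d\n", f(&x, (float *)&x));
  return 0;
}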

Strict aliasing as implemented in modern optimizing compilers

Not so many may know that compilers also assume that p and q do not alias in function g:

struct object { int id; };

struct thing { int id; int thing_stuff; };

int g(struct object *p, struct thing *q) {
  p -> id = 1;
  q -> id = 2;
  return p -> id;
}

The function g always returns 1, according to Clang and GCC:

g:                                      # @g
        movl    $1, (%rdi)
        movl    $2, (%rsi)
        movl    $1, %eax
        retq

It is not clear that the standard allows them to do that(*). Not long ago, GCC optimized the following version, showing that any pointer computed as the address of a struct member was assumed not to point into any other struct, even when the lvalue used locally did not show any trace of this.

int g(struct object *p, struct thing *q) {
  int *int_addr1 = & p -> id;
  int *int_addr2 = & q -> id;
  *int_addr1 = 1;
  *int_addr2 = 2;
  return *int_addr1;
}

This function is also compiled as always returning 1:

g:
        movl    $1, (%rdi)
        movl    $1, %eax
        movl    $2, (%rsi)
        ret

(*) Here is an argument demonstrating that passing the same address for both arguments of g does not break strict aliasing rules, and therefore implying that the compiler should produce code that works in this case. I understand the author of the comment to be close to the matter at hand, being the implementer of the first type-based alias analysis in GCC.

How to report bugs

So we have been working on a pretty neat thing, and it is now just working well enough to give its first results. The thing is a very early version of a static analyzer that detects violations of strict aliasing as described in the standard, such as in the first example above, and as actually implemented by compilers, such as in the subsequent examples.

The results are diagnostics of strict aliasing violations in widely used open-source software. How would you go about reporting these to the maintainers of the software?

It seems important to report them: type-based alias analyses have broken programs that their authors expected to work in the past, and in fact they break them again every time a Gentoo user recompiles the Linux kernel without the option -fno-strict-aliasing. It is possible that these optimizations will not become more programmer-hostile than they are now (fingers crossed), and one may think that if they did not break a particular program (that violates the rules), they never will, but compiler implementers are hard at work on inter-procedural and link-time optimizations, all of which will make information available that wasn’t before, and allow strict aliasing optimizations to fire where they didn’t.

In the particular case of the bug being reported, we are quite sure of the analyzer’s findings, but the analyzer is a bit too experimental to release yet. Not that this would necessarily help: these alien-detecting sunglasses may, to a proud developer, seem like sunglasses with aliens painted on. The analyzer is the first of its kind, too, so there is little hope of confirming the findings with another comparable alien detector.

Pointer conversion misuses of the kind of the first example are easy to recognize: a float is being converted to an int containing its representation, or vice-versa, and one only needs to convince the maintainer of the software that there are better, if more verbose, ways to do this. On the other hand, the assumption that structs can be used in the way shown in the example above is rooted even deeper in the C programming ways. Not only will it be harder to convince developers that what the code is doing is dangerous, but since any genericity at all in C is only obtained through pointer conversions, it is not easy to show which among them are invalid, without seeming to appeal to authority. Note how in the program below, the pointer conversion takes place where the function g is invoked, which isn’t the place where the strict aliasing violation takes place (it takes place inside g). Just looking at the function main, it is not obvious that the pointer conversion is one that leads to a strict aliasing violation, as opposed to one that is the only way not to implement twenty different qsort functions.

#include <stdio.h>

struct object { int id; };

struct thing { int id; int thing_stuff; };

int g(struct object *p, struct thing *q) {
  p -> id = 1;
  q -> id = 2;
  return p -> id;
}

int main(void) {
  struct thing saucer;
  g((struct object*)&saucer, &saucer);
}

I have been reporting subtle errors, like use of uninitialized memory, by offering a patch that should not change the program’s behavior if the program wasn’t using uninitialized memory, and that makes it evident that it does. Here is one recent example. Perhaps the same approach can be used here. That is, for reporting a problem in the program above, one might offer the patch below, and hope that the maintainer is not already having a bad day before receiving the bug report.

$ diff -u orig.c instrumented.c
--- orig.c	2016-06-25 20:40:38.819893396 +0200
+++ instrumented.c	2016-06-25 20:35:25.253912642 +0200
@@ -6,7 +6,9 @@
 
 int g(struct object *p, struct thing *q) {
   p -> id = 1;
+  printf("writing 2  to  %p as thing\n", (void*) & q -> id);
   q -> id = 2;
+  printf("reading %d from %p as object\n", p -> id, (void*) & p -> id);
   return p -> id;
 }
 
$ ./a.out 
writing 2  to  0x7ffe3e7ebbd0 as thing
reading 2 from 0x7ffe3e7ebbd0 as object

Bear in mind that in real code, the two sites may be in different functions, or even different files. The patch above is pointing out the obvious only because the strict aliasing violation is obvious in this 6-line example.

Will this be convincing enough? I will let you know…

This blog post owes to the Cerberus project for pointing out the actual situation with modern C compilers and structs, John Regehr for writing a summary of all the ways strict aliasing-based optimizations break programs and proofreading this post, Loïc Runarvot for implementing the analysis prototype. Alexander Cherepanov suggested the use of the verb “to fire” to describe C compiler optimizations being triggered.

Trap representations and padding bits

The C programming language does not hide from you how the values you manipulate are represented. One consequence is that when padding happens, its presence may have observable effects in carelessly crafted programs. Padding is well-known to appear between members of a struct, and also possibly after the last member of a struct. The remaining space in a union when the active member is not the widest one is also considered padding. A C programmer that only cares for usual x86 platforms might be excused for thinking that, for them, this is it. As for trap representations, these may be believed to be reserved for weird hardware that use one’s complement or explicit parity bits.

Padding in structs and unions

Naïve attempts at making an x86-64 compiler take advantage of the unspecified nature of struct padding fail, as in the example functions f, g and h, but the function i, provided by Alexander Cherepanov, shows that the padding of a struct x does not have to be copied along with the rest of the struct after the two consecutive struct assignments y = x; z = y;:

#include <string.h>

int i(void)
{
  struct { char c; int i; } x, y, z;
  memset(&x, 1, sizeof x);
  memset(&z, 0, sizeof z);

  y = x;
  z = y;

  return *((char *)&z + 1);
}

The function i is optimized to return 0;, meaning that the memory dedicated to x was not copied to z in its entirety.

These occurrences of padding can be prevented when programming for a specific architecture, or a couple of architectures(*), with interstitial character array members. If, in a new project, you are implementing treatments so sophisticated that they require you to define a struct, C may be the wrong language to use in 2016. However, if you are going to do it anyway, you might want to insert these explicit interstitial members yourself: nothing the compiler can do with these unused bytes is worse than what the compiler can do with padding. Clang has a warning to help with this.
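For the struct used in the function i above, the explicit version might look like the following sketch, written for platforms where int is 4-byte aligned (the Clang warning alluded to is presumably -Wpadded):

struct no_hidden_padding {
  char c;
  char padding[3]; /* occupies the bytes the compiler would otherwise
                      insert as padding when int is 4-byte aligned */
  int i;
};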

Consequences

Since padding does not have to be copied in a struct assignment, and since any struct member assignment can, according to the standard, set padding to unspecified values, memcmp is in general the wrong way to compare structs that have padding. Sending an entire struct wholesale to an interlocutor across the network (by passing its address to write) can leak information that was not intended to get out.
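Concretely, the safe way to compare such structs is member by member, as in this sketch, so that padding bytes never participate in the comparison:

struct s { char c; int i; };

/* Unlike memcmp(&a, &b, sizeof a), this comparison cannot be affected
   by the unspecified contents of the padding bytes of struct s. */
int s_equal(const struct s *a, const struct s *b) {
  return a->c == b->c && a->i == b->i;
}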

The rest of this post discusses padding and trap representations in scalar types, which we show that our exemplar C programmer for usual x86 platforms might encounter after all. Also, padding in scalar types cannot be eliminated with just additional members, so the problem, if rarely noticed, is in some ways more annoying than struct padding.

Padding in scalar types

It would be easy to assume that in such an ordinary architecture as x86, padding only happens because of structs and unions, as opposed to scalar types. The x86 architecture is not that ordinary: its first historical floating-point type occupied 80 bits; many compilation platforms still make that floating-point type available as long double, for some reason as a type of 12 or 16 bytes in total (respectively in 32-bit and 64-bit mode), including respectively 2 and 6 bytes of padding. Padding may or may not be modified when the value is assigned, as shown in another example from Alexander Cherepanov:

#include <string.h>

int f(void)
{
  long double x, y;
  memset(&x, 0, sizeof x);
  memset(&y, -1, sizeof y);

  y = x;
  return ((unsigned char *)&y)[10];
}

The function f above is compiled as if it was return 255;, although the entire memory assigned to x was set to 0 before copying x to y with y = x;.

Trap representations

Trap representations are a particular case of padding bits. A symptom of padding bits is that a type t has fewer than 2^(CHAR_BIT ✕ sizeof(t)) distinct values. In the case of trap representations, some of the bit patterns are considered erroneous for type t. Accessing such erroneous representations with an lvalue of type t is undefined behavior.

The C11 standard latest draft contains this footnote(**):

53) Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types. All other combinations of padding bits are alternative object representations of the value specified by the value bits.

This footnote is in the context of the representation of integer types, which are made of value bits, padding bits, and in the case of signed integers, one sign bit (6.2.6.2).

“I have no access to no parity bits,” the ordinary programmer might think. “Even if redundant internal representations are part of the implementation of my high-performance computer, the interface presented to me is that of ordinary memory, where all bits are value bits (with sometimes a sign bit).”

In fact, as implemented by GCC and Clang, the _Bool type has two values and 254 trap representations. This is visible on the following example:

int f(_Bool *b) {
  if (*b)
    return 1;
  else
    return 0;
}

The function f in this example, as compiled, returns 123 when passed the address of a byte that contains 123. This value does not correspond to any of the two possible execution paths of the function. Undefined behavior is the only excuse the compiler has for generating code with this behavior, and this means that GCC and Clang, for the sake of this optimization, choose to interpret bytes containing a value other than 0 or 1 as trap representations for the type _Bool.
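Such a byte is not hard to come by. A sketch (assuming, as on GCC and Clang’s usual targets, that sizeof(_Bool) is 1):

#include <string.h>

int f(_Bool *b); /* the function from the example above */

int g(void) {
  unsigned char byte = 123;
  _Bool b;
  memcpy(&b, &byte, 1); /* b now holds one of the 254 trap representations */
  return f(&b);         /* Undefined Behavior when f reads *b */
}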


This post owes to Alexander Cherepanov’s examples, John Regehr’s encouragements and Miod Vallat’s remarks.

Footnotes:

(*) I would have recommended to use fixed-width integer types from stdint.h, such as uint32_t, to make the layout fixed, but that does not solve the problem with floating-point types or pointer types.

(**) The latest C11 draft contains two very similar footnotes 53 and 54 for which it is not clear that both were intended to be present in the final revision.

A non-exhaustive list of ways C compilers break for objects larger than PTRDIFF_MAX bytes

One well-documented way in which a program may break in presence of objects of more than PTRDIFF_MAX bytes is by doing a pointer subtraction that results in an overflow:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *p = malloc(0x81000000);
  if (!p) abort();
  char *q = p + 0x81000000;
  printf("q - p: %td\n", q - p);
}

On the Ubuntu Linux version that I have conveniently at hand, this program, compiled to 32-bit code, produces:

$ gcc -m32 t.c && ./a.out 
q - p: -2130706432

Strange results are acceptable because the overflow in pointer subtractions is undefined behavior:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.

Do not stop reading if you already know all this! Things are going to get interesting in a minute!

Incidentally, when I tried this program on OS X 10.9 with clang -m32 t.c && ./a.out, I obtained a warning that malloc had failed, and my program aborted without doing the dangerous q - p pointer subtraction. In effect, a malloc function that rejects allocations of more than half the address space goes a long way towards preventing surprising behaviors. This is interesting because at least as late as OS X 10.5, I routinely allocated 2.5GiB from 32-bit processes. On the OS X platform, the decision was courageously made to break some 32-bit applications that used to work in order to prevent some vaguely worrying problems. But then again, preserving application compatibility was never the foremost preoccupation of the Mac platform.

Even if you already knew all this, keep reading.

Subtraction of pointers to wider types

On a 32-bit platform

GCC and Clang’s behavior on the following example is not excused by clause 6.5.6p9:

#include <stdlib.h>

int main(void) {  
  short *p, *q;

  p = malloc(0x70000000 * sizeof *p); // note: product does not overflow
  q = p + 0x70000000; // note: stays in-bounds
  return q - p; // note: does not overflow ptrdiff_t
}

This program returns -268435456. Wait, what?

The example can be made more striking by making the expected result of the pointer subtraction smaller:

#include <stdlib.h>

typedef char (*arrayptr)[0x70000000];

int main(void) {  
  arrayptr p, q;

  p = malloc(2 * sizeof *p);
  q = p + 2;
  return q - p;
}

Both Clang and GCC make this program return 0 when it’s very obvious that the correct result is 2. As a psychological experiment, showing these related examples to C programmers, the average subject might reluctantly let themselves be convinced that -268435456 is 0x70000000, but they are quite certain that 2 should not be 0, and express more surprise for the second example.

The answer to the riddle (but not really an excuse in itself for Clang and GCC’s behavior) is contained in the following decomposition of pointer subtraction into elementary steps. The “sarl” instruction implements a signed operation that does not produce the correct result when the value in %eax should be interpreted as a number above 2^31. When the element size is not a power of two, the compiler uses a division instruction, but the problem remains the same: the sequence generated by the compiler implements a signed division that doesn’t work above 2^31.
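Here is a sketch of the computation that the generated code effectively performs for q - p when p and q are short pointers on a 32-bit platform (the conversion and the signed shift below are implementation-defined in portable C, but they mirror what the instructions do):

#include <inttypes.h>
#include <stdio.h>

int main(void) {
  uint32_t p_addr = 0x10000000;                   /* hypothetical base address */
  uint32_t q_addr = p_addr + 2 * 0x70000000u;     /* 0x70000000 shorts further */
  int32_t byte_diff = (int32_t)(q_addr - p_addr); /* 0xE0000000, i.e. negative */
  int32_t elem_diff = byte_diff >> 1;             /* signed shift: the "sarl" */
  printf("%" PRId32 "\n", elem_diff);             /* prints -268435456 */
  return 0;
}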

Partially excusing GCC’s behavior, it is somewhat documented (in replies to bug reports) that 32-bit GCC does not work with malloc functions that succeed for requests of more than 2GiB of memory. As long as the malloc calls fail in the examples above, GCC is not doing anything wrong, because the program invokes undefined behavior as soon as p + … is computed. (It is illegal to do pointer arithmetic from a null pointer.)

And indeed, if we re-introduce the malloc success check that I surreptitiously removed from the initial example, GCC does the right thing. “The right thing” being making a call to malloc, which, as semi-documented, should fail for the requested size in order to be compatible with GCC.
It’s a shame that 32-bit Glibc’s malloc does not systematically fail for such requests, but that is not GCC’s problem. It is the problem of the programmer who links together code generated from GCC and code from Glibc (note that Glibc is not intended to be compiled with a compiler other than GCC, which is unfortunate, considering).

Clang, the most honey badger-like of compilers, don’t give a shit.

64-bit platforms

64-bit solves all these problems, because a malloc call requesting half the address space always fails, right?
It’s a shame that there was a difficult trade-off on 32-bit platforms, where the problem was fully realized only after some applications had gotten used to requesting 2.5GiB and getting away with it, but as even the smallest computers are moving to 64-bit, we can consider this problem solved by Moore’s law? 2^63-1 bytes ought to be enough for anybody?

Not really. The malloc call in the program below succeeds (the program would return 0 if the malloc invocation failed), despite Clang having the same half-the-address-space limitation for 64-bit as for 32-bit.

#include <stdlib.h>

typedef char (*arrayptr)[0x700000000000000];

int main(void) {  
  arrayptr p, q;

  p = malloc(32 * sizeof *p);
  if (!p) return 0;
  q = p + 32;
  return q - p;
}

The program not returning 0 indicates that malloc succeeded. The program returning a nonzero value other than 32 indicates that Clang is confused.

Comparisons of pointers

One might consider it an acceptable trade-off to let malloc allocate 2.5GiB on 32-bit platforms, if the developer knows that they never subsequently subtract pointers to the allocated zone. Not as char* and not as pointers to another type, since compilers are broken for these subtractions too, even when C11’s 6.5.6:9 clause perfectly defines the result.

Unfortunately, an optimizing compiler’s code appears to be littered with optimizations that assume object sizes are less than PTRDIFF_MAX. As an example, we all know that X86 Clang and GCC implement pointer comparison as an assembly unsigned comparison (“cmpl” is an agnostic comparison instruction, but here it is followed by “cmoval”, where the “a”, for “above”, indicates conditional move based on an unsigned comparison). This means that it is allowed to compare pointers inside a putative 2.5GiB array on a 32-bit platform, right?

Not all pointer comparisons.

Because of this optimization.

Note the “g” in “cmovgl”. This “g” stands for “greater”: a mnemonic indicating that a signed comparison decides which string is returned. Determining what happens with offs1=0 and offs2=0x81000000, two values representable in a 32-bit size_t that, with a lenient malloc, can also both be valid offsets into the same array, is left as an exercise for the reader.
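For the exercise-averse, here is a hedged reconstruction of the kind of function involved (names and strings are illustrative, not the actual example):

#include <stddef.h>

const char *pick(char *p, size_t offs1, size_t offs2) {
  /* The compiler may rewrite this pointer comparison as a signed
     comparison of the offsets (hence the cmovgl).  With offs1 = 0 and
     offs2 = 0x81000000, the signed 32-bit view of offs2 is negative,
     so the signed comparison disagrees with the actual address order
     inside a lenient allocation of more than 2GiB. */
  return p + offs1 > p + offs2 ? "greater" : "not greater";
}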

Conclusion

It is impossible to tell when the compiler will apply this or any other optimization that assumes that a pointer difference is always representable as a ptrdiff_t. Therefore, the only safe path on 32-bit platforms is to refrain from allocating more than 2GiB in a single malloc call; it might be a good idea for Glibc’s malloc to be changed to implement this limitation. Compilers could do much, much better than they currently do. Nothing would prevent malloc(0x81000000) from causing a warning (I suggest “malloc call will always return NULL” or something equally direct). Beyond the friendliness of warnings, Clang’s elimination of malloc calls that, because of Clang’s own limits, cannot succeed, as if they were calls that cannot fail, indicates a disturbing misunderstanding of classical logic.

Acknowledgements

Alexander Cherepanov and John Regehr provided most of the examples in this post, with additional remarks coming from Daniel Micay and Rich Felker. Matt Godbolt made the absolute best platform on which to investigate these examples and show them to the world.

Out-of-bounds pointers: a common pattern and how to avoid it

A quick tis-interpreter update

At long last, tis-interpreter was released as open-source last month. It is somewhat difficult to install, and as a remedy I will regularly prepare binary snapshots for Linux (and maybe OpenBSD, OS X). At the time of this writing, the first publicly available binary snapshot can be downloaded from the bottom of the tis-interpreter homepage.

The rest of this post is about extremely recent evolutions in tis-interpreter.

A newfound strict stance on pointer arithmetic

As tis-interpreter was getting easier to use and was getting used, the feedback from applying it led to what may seem like drastic changes. Here is one example.

Among other software components, John Regehr applied tis-interpreter to Google’s implementation of their WebP image format, and then to SQLite. The former is only a compressed format for static images, but the latter is a large library with an even larger testsuite. Both were worth looking at, but the latter was a significant effort.

In libwebp, among other problems, John reported a subtle bug in a pointer comparison. libwebp would check whether the size declared for a sub-object was consistent by comparing the address at which parsing would continue (just after the sub-object, if it really had the size it claimed) to the end pointer. This idiom is dangerous. tis-interpreter warned for the dangerous comparison, I wrote a short essay, John reported the bug, and that was the end of that story.
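The idiom looks roughly like the following sketch (names are illustrative, not libwebp’s actual code):

#include <stdint.h>
#include <stddef.h>

/* Dangerous: when declared_size is corrupted or hostile, p + declared_size
   is computed (far) out of bounds, which is already undefined behavior
   before the comparison even takes place. */
int size_is_consistent(const uint8_t *p, size_t declared_size,
                       const uint8_t *end) {
  return p + declared_size <= end;
}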

On the SQLite front, progress was slow. The very same “dangerous pointer comparison” warning was being emitted because of one-indexed arrays implemented as int *one_indexed = ptr - 1;, as explained in John’s post. John complained that the “pointer comparison” warning was emitted on an eventual one_indexed == NULL, arbitrarily far from the allocation of the one-indexed array. The allocation was the real culprit and, in any case, the line he wished to change so as to suppress all warnings coming from that pointer in one swoop. After multiple requests, I finally implemented detection of invalid pointer arithmetic in tis-interpreter, so that the warning would be emitted on ptr - 1 instead.
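A hedged sketch of the idiom (not SQLite’s actual code):

#include <stdlib.h>

int *make_one_indexed(size_t n) {
  int *ptr = malloc(n * sizeof *ptr);
  if (!ptr) return NULL;
  /* The undefined behavior is the subtraction itself; tis-interpreter now
     warns here rather than at some arbitrarily distant later use, such as
     a comparison of the result to NULL. */
  return ptr - 1; /* elements 1..n of the result are in bounds */
}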

A surprise lurking in already visited code

This week, my colleague Anne Pacalet found herself going back to analyzing libwebp. She naturally started by using the latest version of tis-interpreter, with its all-new early warnings about invalid pointer arithmetic. She discovered that, apart from the previously reported and long since fixed end_ptr idiom, in which a pointer was compared after going out of bounds, other invalid pointers are computed inside libwebp. This happens when decompressing examples provided as test cases by Google itself. The line sporting the tis-interpreter warning is this one, and the surrounding code looks like this:

  for (j = 0; j < height; ++j) {
    for (i = 0; i < width; ++i) {
      const uint32_t alpha_value = alpha[i];
      dst[4 * i] = alpha_value;
      alpha_mask &= alpha_value;
    }
    alpha += alpha_stride;
    dst += dst_stride; // goes out of bounds on the last iteration
  }

Do not confuse this with a buffer overflow or a similar immediate problem! The pointer dst goes out of bounds when it is computed at the end of the last iteration, and it is never used after that. Besides, it may look like this last value of dst is a one-past-the-end pointer, as allowed by the C standard, but that is only true if dst started at the very beginning of the allocated block.

The function in question is called from EmitAlphaRGB (through the WebPDispatchAlpha function pointer). At line 183, dst may have been incremented by 3 before being passed as an argument, and in this case, the last value of dst is way out of bounds (it is four-past-the-end, and there is no “four-past-the-end” exception for pointer arithmetic in the C standard).

Fixing the problem

It may help both to communicate the problem and to convince the maintainers to remove it if the report comes with an example of a change that keeps the computation the same while removing the undefined behavior. We looked at different variations. I thought I had a solution that wasn't too ugly when I arrived at the following change:

-  for (j = 0; j < height; ++j) {
+  j = 0;
+  while (1) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
       dst[4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
+    if (j == height - 1) break;
+    ++j;
     alpha += alpha_stride;
     dst += dst_stride;
   }

With the above change applied, the exit test comes after dst has been used in the iteration but before it is incremented, so that the ultimate undefined-behavior-invoking incrementation never takes place. The generated assembly for the code above seems fine, but there is a subtle difference: the original function never entered the loop if height was 0, whereas this one always executes at least one iteration. This is something that would have to be discussed with the developers, even if the problem hardly seems worth a discussion. We could pre-emptively guard the loop with if (height > 0), as in the sketch below, but then the source code gets uglier, at the very least, and the generated assembly might end up longer than what was generated for the original function.
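For reference, the guarded variant would look something like this (a sketch only, reusing the declarations from the surrounding libwebp code):

  if (height > 0) {
    j = 0;
    while (1) {
      for (i = 0; i < width; ++i) {
        const uint32_t alpha_value = alpha[i];
        dst[4 * i] = alpha_value;
        alpha_mask &= alpha_value;
      }
      if (j == height - 1) break;
      ++j;
      alpha += alpha_stride;
      dst += dst_stride;
    }
  }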

Jamey Sharp's solution

Jamey Sharp provided a brilliant insight. His remark was: “what about just doing the obvious multiply and letting strength reduction turn it into this code? Easier to maintain and correct”

Translated to code, Jamey's remark amounts to this change:

   for (j = 0; j < height; ++j) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
-      dst[4 * i] = alpha_value;
+      dst[j * dst_stride + 4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
     alpha += alpha_stride;
-    dst += dst_stride;
   }

Indeed, we are removing more code than we are adding. It solves the undefined behavior problem because now, in the source code, j * dst_stride is only added to dst at the time of accessing the memory location (dst + height * dst_stride is never computed). The beauty of this solution is that an optimizing compiler generates exactly the same assembly code, down to the comma, for this version as it did for the original version:

$ diff -u old.s jamey.s
$

In other words, we stick to legal (and concise and readable) operations in the source code. The compiler does the tedious work of knowing that on the target platform, pointers are just integers and that a multiplication by the index inside a loop can be transformed into an addition. This is the correct division of labor between developer and computer.

Miod Vallat's solution

Miod Vallat pointed out that the solution above required you to trust that the compiler would always succeed at the strength-reduction optimization. Miod's solution instead expresses in the source code the fact that pointers are just integers. It should work fine for all the ordinary platforms we are used to, because of footnote 67 in the C11 standard.

   int i, j;
+  uintptr_t dst_int = (uintptr_t) dst;
 
   for (j = 0; j < height; ++j) {
     for (i = 0; i < width; ++i) {
       const uint32_t alpha_value = alpha[i];
-      dst[4 * i] = alpha_value;
+      ((uint8_t*)dst_int)[4 * i] = alpha_value;
       alpha_mask &= alpha_value;
     }
     alpha += alpha_stride;
-    dst += dst_stride;
+    dst_int += dst_stride;
   }

With Miod's solution too, the compiler (GCC 5.2) produces the exact same assembly as before:

$ diff -u old.s miod.s
$

Conclusion

The new “bug” discussed here is, as far as I know, harmless with current compilers. In this particular instance, by removing the undefined pointer arithmetic, we are making libwebp safer from aggressive compiler optimizations for the next ten or fifteen years, but we are not solving any immediate problem. It is not always that clear-cut. The pointer comparison problem that had been reported earlier in libwebp would, with current compilers and execution platforms, never have noticeable consequences on the ordinary computers security researchers use to look for problems, but it could have consequences in a peculiar configuration.

The option remains to disable the warnings on the creation of invalid pointers: if that is what you want, use:

tis-interpreter.sh -no-val-warn-pointer-arithmetic-out-of-bounds …