26.06.2016

How do you report bugs that you alone can see?

Strict aliasing in C


Do you remember the TV show The Invaders? It was enormously popular in France, much more than in the US where it originated. It tells the story of one David Vincent, who alone sees that currently working C programs have a serious defect and are at risk of getting translated to flawed binaries by evil C compilers.


I misremember. In the series, David Vincent alone is aware that Earth is being infiltrated by extra-terrestrial beings of human appearance.

Strict aliasing as some know it

Many will know “strict aliasing” as a set of rules that forbid passing the same address twice to the following C function:

int f(int *p, float *q) {
  *p = 1;
  *q = 2.0;
  return *p;
}

Indeed, the compiler confidently generates assembly code that assumes that the arguments are not some int variable’s address &x and (float*)&x:

f:                                      # @f
        movl    $1, (%rdi)
        movl    $1073741824, (%rsi)     # imm = 0x40000000
        movl    $1, %eax
        retq

This is in accordance with the rules described in the C standard. The function f in itself is fine and can be used validly with arguments that do not alias, but passing arguments that make *q write to an int variable causes undefined behavior.
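As a minimal sketch of both sides of this rule, the listing below calls f with two distinct objects, which is valid, and shows the usual replacement for a pointer cast when the goal is to obtain the bits of a float: copying the representation with memcpy. The names call_f_validly and float_bits are hypothetical, and the sketch assumes that float and uint32_t have the same size, which the static_assert checks at compile time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* The function from the text, unchanged. */
int f(int *p, float *q) {
  *p = 1;
  *q = 2.0;
  return *p;
}

/* Valid use: p and q designate distinct objects of the matching types,
   so the write through q cannot modify the int that p points to. */
int call_f_validly(void) {
  int i = 0;
  float x = 0.0f;
  return f(&i, &x);
}

/* Assumption of this sketch: float is 32 bits wide. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Hypothetical helper: copy the representation of a float into an
   integer instead of reading the float through an integer pointer. */
uint32_t float_bits(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
  return bits;
}
```

On a platform with IEEE 754 single-precision floats, float_bits(2.0f) should yield 0x40000000, the same immediate that appears in the assembly above.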

Strict aliasing as implemented in modern optimizing compilers

Not so many may know that compilers also assume that p and q do not alias in function g:

struct object { int id; };

struct thing { int id; int thing_stuff; };

int g(struct object *p, struct thing *q) {
  p -> id = 1;
  q -> id = 2;
  return p -> id;
}

The function g always returns 1, according to Clang and GCC:

g:                                      # @g
        movl    $1, (%rdi)
        movl    $2, (%rsi)
        movl    $1, %eax
        retq

It is not clear that the standard allows them to do that(*). Not long ago, GCC optimized the following version, showing that any pointer computed as the address of a struct was assumed not to point to any other struct even when the lvalue used locally did not show any trace of this.

int g(struct object *p, struct thing *q) {
  int *int_addr1 = & p -> id;
  int *int_addr2 = & q -> id;
  *int_addr1 = 1;
  *int_addr2 = 2;
  return *int_addr1;
}

This function is also compiled as always returning 1:

g:
        movl    $1, (%rdi)
        movl    $1, %eax
        movl    $2, (%rsi)
        ret

(*) Here is an argument demonstrating that passing the same address for both arguments of g does not break strict aliasing rules, which implies that the compiler should produce code that works in this case. I understand the author of the comment to be close to the matter at hand, being the implementer of the first type-based alias analysis in GCC.
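For comparison, here is a sketch of a layout in which the relationship between the two struct types is visible to the compiler. The struct embedding_thing and the functions g_embedded and demo_embedded are hypothetical names introduced for this illustration; they are not taken from any of the software discussed here. The idea is that the struct object is an actual first member of the larger struct, so a pointer to that member is a genuine struct object pointer and no pointer conversion is needed.

```c
struct object { int id; };

/* Hypothetical variant of struct thing: the struct object is a real
   member, placed first, instead of a look-alike leading field. */
struct embedding_thing { struct object base; int thing_stuff; };

int g_embedded(struct object *p, struct embedding_thing *q) {
  p->id = 1;
  q->base.id = 2;   /* may legitimately modify *p when p == &q->base */
  return p->id;
}

/* Both arguments designate the same storage, without any cast. */
int demo_embedded(void) {
  struct embedding_thing saucer = { { 0 }, 0 };
  return g_embedded(&saucer.base, &saucer);
}
```

As far as I can tell from the standard, both writes here go to the same int member of the same struct object, so a conforming compiler has to reload p->id and demo_embedded should return 2. Whether such a restructuring is acceptable in a given code base is a separate question.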

How to report bugs

So we have been working on a pretty neat thing, and it is now just working well enough to give its first results. The thing is a very early version of a static analyzer that detects violations of strict aliasing as described in the standard, such as the first example above, and as actually implemented by compilers, such as the subsequent examples.

The results are diagnostics of strict aliasing violations in widely used open-source software. How would you go about reporting these to the maintainers of the software?

It seems important to report them: type-based alias analyses have, in the past, broken programs that their authors expected to work, and in fact they break them again every time a Gentoo user recompiles the Linux kernel without the option -fno-strict-aliasing. It is possible that these optimizations will not become more programmer-hostile than they are now (fingers crossed). One may also think that if they have not broken a particular program that violates the rules, they never will. But compiler implementers are hard at work on inter-procedural and link-time optimizations, all of which will make available information that wasn’t available before, and allow strict aliasing optimizations to fire where they didn’t.

In the particular case of the bug being reported, we are quite sure of the analyzer’s findings, but the analyzer is a bit too experimental to release yet. Not that this would necessarily help: these alien-detecting sunglasses may, to a proud developer, seem like sunglasses with aliens painted on. The analyzer is the first of its kind, too, so there is little hope of confirming the findings with another comparable alien detector.

Pointer conversion misuses of the kind shown in the first example are easy to recognize: a float is being converted to an int containing its representation, or vice-versa, and one only needs to convince the maintainer of the software that there are better, if more verbose, ways to do this. On the other hand, the assumption that structs can be used in the way shown in the struct examples above is rooted even deeper in C programming habits. Not only will it be harder to convince developers that what the code is doing is dangerous, but since any genericity at all in C is only obtained through pointer conversions, it is not easy to show which among them are invalid without seeming to appeal to authority. Note how, in the program below, the pointer conversion takes place where the function g is invoked, which isn’t where the strict aliasing violation takes place (that happens inside g). Just looking at the function main, it is not obvious that the pointer conversion is one that leads to a strict aliasing violation, as opposed to one that is the only way to avoid implementing twenty different qsort functions.

#include <stdio.h>

struct object { int id; };

struct thing { int id; int thing_stuff; };

int g(struct object *p, struct thing *q) {
  p -> id = 1;
  q -> id = 2;
  return p -> id;
}

int main(void) {
  struct thing saucer;
  g((struct object*)&saucer, &saucer);
}

I have been reporting subtle errors, like use of uninitialized memory, by offering a patch that should not change the program’s behavior if the program wasn’t using uninitialized memory, and that makes it evident that it does. Here is one recent example. Perhaps the same approach can be used here. That is, for reporting a problem in the program above, one might offer the patch below, and hope that the maintainer is not already having a bad day before receiving the bug report.

$ diff -u orig.c instrumented.c
--- orig.c	2016-06-25 20:40:38.819893396 +0200
+++ instrumented.c	2016-06-25 20:35:25.253912642 +0200
@@ -6,7 +6,9 @@
 
 int g(struct object *p, struct thing *q) {
   p -> id = 1;
+  printf("writing 2  to  %p as thing\n", (void*) & q -> id);
   q -> id = 2;
+  printf("reading %d from %p as object\n", p -> id, (void*) & p -> id);
   return p -> id;
 }
 
$ ./a.out 
writing 2  to  0x7ffe3e7ebbd0 as thing
reading 2 from 0x7ffe3e7ebbd0 as object

Bear in mind that in real code, the two sites may be in different functions, or even different files. The patch above is pointing out the obvious only because the strict aliasing violation is obvious in this 6-line example.

Will this be convincing enough? I will let you know…

This blog post is indebted to the Cerberus project for pointing out the actual situation with modern C compilers and structs, to John Regehr for writing a summary of all the ways strict aliasing-based optimizations break programs and for proofreading this post, and to Loïc Runarvot for implementing the analysis prototype. Alexander Cherepanov suggested the use of the verb “to fire” to describe C compiler optimizations being triggered.
