16.07.2015

The libc cross-testing project

Cross-testing libc and bugs found

Cross-testing libc

In discussions with a number of other people, the idea of differential testing of libc implementations arose.

There are many libc implementations (musl libc is my favorite; the libc is one of the points on which the *BSD Unices differ, so they count for at least three; and of course there is Glibc). Some of them have testsuites, and some compare the results of their tests with the results obtained for the same tests with Glibc, but as far as I know no one is running every libc’s testsuite against every libc. Doing so is the first task of the libc cross-testing project.

Another idea is that a specialized fuzzer producing defined inputs for libc functions (inputs that do not invoke undefined behavior) is worth writing, because all libc implementations can then serve as references for one another, and every case where they do not all agree can be investigated.

One immediate task is to identify which standard library functions should be compared in this way, and which libcs and testsuites are already available. Tests need not be limited to valid inputs. For instance, the following printf invocation is nonsense:

printf("%.9*s", 3, "abcd");

There should be either a number or an asterisk for the precision (the part after the period), but not both.
Clang and GCC each warn that something is wrong in this format string:

$ clang -Wall t.c
t.c:4:14: warning: invalid conversion specifier '*' [-Wformat-invalid-specifier]
…
$ gcc -Wall t.c
t.c: In function 'main':
t.c:4:3: warning: unknown conversion type character '*' in format [-Wformat=]
…

Congratulations on passing the test, Open-Source C compilers, even though you are not libcs.

Apart from libc implementation projects and, as the example above shows, C compilers, another related project is the Open POSIX Test Suite.

tis-interpreter contains a partial libc (which I implemented in OCaml, with contributions from Arduino Cascella, Eunice Martins and Laureline Patoz) that could be tested with all these techniques. Separately, the other implementations of standard functions that are written in pure C could be run on these tests inside tis-interpreter, in order to detect undefined behaviors. That gives me two reasons to be interested. If you have your own reasons to use or contribute to this project, please drop me a note at cuoq at trust-in-soft.com.

Bugs/pitfalls discovered so far

Left shift of a negative number in Musl
A bug in OCaml’s printf function
The C99 and C11 standards say that if the number being converted by the strto*l functions is out of range for the destination type, this should be reported as an error, but at least the Glibc and BSD versions of strtoul and strtoull accept some negative numbers. In fact, the standard is ambiguous here, as was already discussed on comp.std.c in 1992.
Analyzing a C program that reads each character of a string literal, in Frama-C’s value analysis or in tis-interpreter, led to performance quadratic in the length of the string literal, where roughly linear performance could be expected.
At the time of writing, the Linux documentation for strtoull says “Since strtoul() can legitimately return 0 or ULONG_MAX (ULLONG_MAX for strtoull()) on both success and failure, the calling program should set errno to 0 before the call, and then determine if an error occurred by checking whether errno has a nonzero value after the call”, but errno is not set for the kind of failure characterized by *endptr == nptr (no digits could be converted). To match the implementation, the documentation should recommend both setting and checking errno, and checking whether *endptr == nptr.
Brian Mastenbrook used a combination of KLEE, UBSan and tis-interpreter to identify and report two signed overflows in Musl’s implementation of its internal function __secs_to_tm, used in API functions such as gmtime.