tis-interpreter

tis-interpreter detects subtle bugs in C programs that may have no visible effect when the same programs are compiled and executed in the traditional way. Some of the bugs it finds lead to security vulnerabilities. Fortunately, most don’t.

tis-interpreter works by interpreting C programs statement by statement from beginning to end, verifying at each statement whether the program can invoke undefined behavior. This makes it comparable to Valgrind and C compiler sanitizers (UBSan, ASan, …). The recommended use is to apply tis-interpreter to existing tests for security-sensitive C code in which a bug could have dramatic consequences. tis-interpreter can detect violations of the C standard even when applied to regression tests that have never revealed any problem.

Comparison with other tools

Detecting run-time undefined behavior is a matter of trade-offs. tis-interpreter emphasizes the detection of a large number of families of undefined behavior (including arithmetic, incorrect uses of pointers, and use of uninitialized memory), in an exhaustive fashion, for the significant subset of C that it handles. It also warns when the program’s execution goes out of the supported subset.

In order to be able to monitor as many programs as possible, other tools (such as Valgrind) work at the binary level. The obvious drawback of this approach is that it cannot detect source-level bugs that have been made unrecognizable in the translation from source to binary. “Unrecognizable” does not mean that such a bug is harmless: another C compiler, or the next version of the same C compiler, might choose to compile it differently. Just because the bug does not have immediate consequences doesn’t mean that it doesn’t need to be fixed; it only means that the combination of your current compiler and Valgrind leaves you unequipped for finding it.
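As an illustration of a source-level bug that can vanish during compilation, consider an overflow check that itself relies on signed overflow. The sketch below is not taken from tis-interpreter or from any library mentioned here; the function names and the SHOW_UNDEFINED_VERSION macro are hypothetical. Optimizing compilers are allowed to fold the undefined comparison to a constant, in which case the binary contains no overflowing addition for a binary-level tool to observe.

```c
#include <limits.h>

#ifdef SHOW_UNDEFINED_VERSION   /* hypothetical macro; leave undefined */
/* Undefined when x == INT_MAX: x + 1 overflows a signed int. A compiler
   may assume this never happens and reduce the whole body to "return 0",
   so the bug leaves no trace in the generated code. */
int increment_would_overflow_ub(int x) {
    return x + 1 < x;
}
#endif

/* Well-defined equivalent: test against the limit instead of adding. */
int increment_would_overflow(int x) {
    return x == INT_MAX;
}
```

Calling increment_would_overflow(INT_MAX) returns 1 without ever performing the overflowing addition, which is the property the undefined version was trying, and failing, to check.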

ASan and UBSan are two detectors of undefined behavior that work at the source level. Their underlying idea is to instrument the code during compilation. However, the instrumentation is limited to comparatively inexpensive checks, to keep the overhead acceptable for large projects. Also, as needed for large projects, the instrumentation is limited to checks that still work in the presence of external libraries that weren’t compiled with instrumentation.

tis-interpreter finds bugs in code that has already been executed under ASan, UBSan or Valgrind because it explores a different choice of trade-offs: it does not limit itself to easily detected undefined behaviors, and it only works on projects where the entire source code is provided. Incidentally, since the C program is interpreted, it is possible to observe the behavior the program would have if compiled and executed on a different architecture than the host on which tis-interpreter is running. A C program intended for an IL32P64 64-bit platform can be tested on an ILP32 32-bit platform, and the behavior the code would have on a big-endian platform can be emulated on a little-endian platform.
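To make the architecture point concrete, here is a minimal sketch of the kind of code whose result depends on the target's byte order, next to a portable alternative. Both function names are hypothetical and chosen only for this illustration; nothing here is specific to tis-interpreter.

```c
#include <stdint.h>
#include <string.h>

/* Target-dependent: returns the byte stored at the lowest address of v.
   On a little-endian target this is the least significant byte; on a
   big-endian target it is the most significant one. */
unsigned char first_byte_in_memory(uint32_t v) {
    unsigned char b;
    memcpy(&b, &v, 1);
    return b;
}

/* Portable: writes v most significant byte first using shifts, so the
   output is identical whatever the byte order of the target. */
void store_be32(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```

A test that only ever runs on a little-endian host exercises one of the two possible results of first_byte_in_memory; observing the other requires a big-endian target or an emulation of one.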

Three examples of undefined behavior found by tis-interpreter and typically not found by other tools are: using a dangling pointer in pointer arithmetic or in a comparison (for performance reasons, other tools usually only check that a dangling pointer is not dereferenced, but any use of a dangling pointer is undefined behavior); comparing addresses from different memory blocks with <=; and a mismatch between the format and the arguments of printf when the format is dynamic (compilers only check that the format matches the arguments when the format is static).
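The three patterns can be sketched in C as follows. This is an illustration written for this page, not code from any of the libraries discussed; every function name and the SHOW_UNDEFINED_BEHAVIOR macro are hypothetical. The undefined versions are kept behind the macro so that the file compiles and runs cleanly by default, and two well-defined alternatives are given for comparison.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef SHOW_UNDEFINED_BEHAVIOR   /* hypothetical macro; leave undefined */

/* 1. After a successful realloc, the old value of p is dangling, and
      merely comparing it is undefined, even without a dereference. */
int moved_ub(char *p, size_t n) {
    char *q = realloc(p, n);
    return q != p;
}

/* 2. Relational comparison is only defined for pointers into the same
      object; here p may point into an unrelated block. */
int in_buffer_ub(const char *p, const char *buf, size_t len) {
    return buf <= p && p < buf + len;
}

/* 3. The format is chosen at run time, so the compiler cannot check it;
      "%s" with an int argument is undefined. */
void print_ub(int use_string_format) {
    const char *fmt = use_string_format ? "%s\n" : "%d\n";
    printf(fmt, 42);
}

#endif

/* Alternative to 1: record the address as an integer while p is still
   valid, then compare integers. Assumes uintptr_t is available. */
char *grow(char *p, size_t n, int *moved) {
    uintptr_t before = (uintptr_t)p;
    char *q = realloc(p, n);
    if (q != NULL)
        *moved = ((uintptr_t)q != before);
    return q;
}

/* Alternative to 2: equality between unrelated pointers is defined, so
   a linear scan answers the question without relational comparison. */
int points_into(const char *p, const char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (p == buf + i)
            return 1;
    return 0;
}
```

The linear scan in points_into is deliberately naive; it trades speed for staying within what the C standard defines.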

Libraries tis-interpreter has been applied to

Since the selling point of tis-interpreter is that it finds more issues than already-used alternatives, at the cost of speed and generality, the question is whether tis-interpreter is general enough to be useful. This section gathers examples of security-critical C libraries that can be executed in tis-interpreter.

Most of OpenSSL's test directory can be executed inside tis-interpreter, with small adjustments. Additional tests of the openssl command can also be executed. Some bugs that have been found are listed here, here, here and here.

Since tis-interpreter is mostly automatic after the set-up phase, we have run it on LibreSSL and found minor issues in either new code imported after the fork or in code that had become hidden under #ifdef PURIFY-type guards in OpenSSL since the fork. This “fiddly buffer overrun” had been hidden behind a #if defined(PEDANTIC) in OpenSSL and was actually found in LibreSSL before being recognized and reported in OpenSSL.

Most of the tests of Amazon's s2n TLS implementation can be executed inside tis-interpreter too. The minor issues found are described here, here, here and here.

On the basis of previous experiments, tests exercising iconv implementations, cryptographic primitives, HTML parsers (1, 2, 3), and text, image, sound and video compression or decompression routines implemented in C should all be within the scope of tis-interpreter. Thanks to the ready-made test suites provided via American Fuzzy Lop's webpage, we have tested libjpeg-6b and libjpeg-9a, libpng, webp, and SQLite.

Will it work on my code?

The following functions are handled natively by tis-interpreter: memset bzero memcpy memmove memcmp memchr strchr strlen strnlen strcmp malloc calloc realloc free printf sprintf asprintf …
This means that subtle bugs arising from misuses of these functions are detected even if they do not cause a crash in Valgrind.

Support for more standard functions is added as calls to them are found inside the sort of C code that it is valuable to analyze. In a pinch, missing functions can be written in C, although this can prevent some errors from being detected (providing a typical C implementation for memcpy does not tell tis-interpreter that the arguments are supposed to be pointers to objects and that the source and destination memory zones must not overlap).
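The memcpy remark can be illustrated with a short sketch. The replacement below is a hypothetical stand-in of the kind one might write for a missing function, not the implementation used by tis-interpreter or by any C library. Because its byte-by-byte loop is well defined even when the two regions overlap, an interpreter executing it has nothing to complain about, whereas the real memcpy contract forbids the overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C replacement for memcpy. Every individual read and write
   is valid, so a caller passing overlapping regions goes unnoticed. */
void *naive_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Shifts the first n bytes of buf one position to the right; buf must
   hold at least n + 1 bytes. Source and destination overlap, so memmove
   is required: memcpy(buf + 1, buf, n) would be undefined behavior. */
void shift_right(char *buf, size_t n) {
    memmove(buf + 1, buf, n);
}
```

With an 8-byte buffer holding "abcdef", shift_right(buf, 6) yields "aabcdef", while naive_memcpy(buf + 1, buf, 6) silently produces "aaaaaaa" with no error reported.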

tis-interpreter comes with its own set of headers. Some common C compiler extensions are supported.

Availability

tis-interpreter is available as open source on GitHub. Current binary snapshots are available:

2016-05 x86-64 Linux binary snapshot of commit 275f0a4

Older snapshots

2016-04 x86-64 Linux binary snapshot of commit 43713db

See also

in the press