tis-interpreter detects subtle bugs in C programs that may not have eye-visible effects when executing the same programs compiled in the traditional way. Some of the bugs that are discovered lead to security vulnerabilities. Fortunately, most don’t.
tis-interpreter works by interpreting C programs statement by statement from beginning to end, verifying at each statement whether the program can invoke undefined behavior. This makes it comparable to Valgrind and C compiler sanitizers (UBSan, ASan, …). The recommended use is to apply tis-interpreter to existing tests for security-sensitive C code in which a bug could have dramatic consequences. tis-interpreter can detect violations of the C standard even when applied to regression tests that have never revealed any problem.
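As a hypothetical illustration of what a passing regression test can hide (the function names below are invented for this sketch and come from no library discussed here), consider building a bit mask. Assuming a platform where int is 32 bits wide, the first function invokes undefined behavior for bit 31, yet compiled code typically returns the expected mask, so a test comparing the result with 0x80000000 would pass. The second function is a well-defined rewrite.

```c
#include <stdint.h>

/* Undefined for bit == 31 when int is 32 bits wide: the shift is done in
   (signed) int, and 1 << 31 is not representable in that type. */
uint32_t bit_mask_ub(unsigned bit) {
    return (uint32_t)(1 << bit);
}

/* Well-defined for bit in 0..31: the shift is done in uint32_t. */
uint32_t bit_mask(unsigned bit) {
    return (uint32_t)1 << bit;
}
```

This is the sort of arithmetic issue that checking each statement against the C standard is meant to surface, independently of whether the test's own assertions hold.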
Comparison with other tools
Detecting run-time undefined behavior is a matter of trade-offs. tis-interpreter emphasizes the detection of a large number of families of undefined behavior (including arithmetic, incorrect uses of pointers, and use of uninitialized memory), in an exhaustive fashion, for the significant subset of C that it handles. It also warns when the program’s execution goes out of the supported subset.
In order to be able to monitor as many programs as possible, other tools (such as Valgrind) work at the binary level. The obvious drawback of this approach is that source-level bugs that have been made unrecognizable in the translation from source to binary cannot be detected by this approach. “Unrecognizable” does not mean that such a bug is harmless: another C compiler, or the next version of the same C compiler, might choose to compile it differently. Just because the bug does not have immediate consequences doesn’t mean that it doesn’t need to be fixed; that only means that the combination of your current compiler + Valgrind leaves you unequipped for finding it.
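A small sketch of how a source-level bug can vanish from the binary (function names are illustrative only). The first function overflows when x is INT_MAX. Because signed overflow is undefined, an optimizing compiler is allowed to simplify the comparison to a constant, in which case the compiled code contains no addition at all for a binary-level tool to observe.

```c
#include <limits.h>

/* Undefined when x == INT_MAX (signed overflow). A compiler may legitimately
   compile this as "return 1", erasing the overflow from the binary. */
int next_is_larger_ub(int x) {
    return x + 1 > x;
}

/* Well-defined rewrite: no arithmetic that can overflow. */
int next_is_larger(int x) {
    return x < INT_MAX;
}
```

Whether a given compiler performs this simplification depends on its version and optimization level, which is exactly why the bug may stay invisible today and surface later.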
ASan and UBSan are two detectors of undefined behavior that work at the source level. Their underlying idea is to instrument the code during compilation. However, the instrumentation is limited to comparatively inexpensive checks, to keep the overhead acceptable for large projects. Also, as needed for large projects, the instrumentation is limited to what can work in the presence of external libraries that weren’t compiled with instrumentation.
tis-interpreter finds bugs in code that has already been executed under ASan, UBSan or Valgrind because it explores a different choice of trade-offs: it does not limit itself to easily detected undefined behaviors, and it only works on projects where the entire source code is provided. Incidentally, since the C program is interpreted, it is possible to observe the behavior the program would have if compiled and executed on a different architecture than the host on which tis-interpreter is running. A C program intended for an IL32P64 64-bit platform can be tested on an ILP32 32-bit platform, and the behavior the code would have on a big-endian platform can be emulated on a little-endian platform.
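To make the architecture remark concrete, here is a sketch of code whose results depend on the target (both function names are hypothetical). Compiled natively, each function can only show the host's answer; an interpreter configured for another target could show the other one.

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the lowest address of a 32-bit value:
   0x44 on a little-endian target, 0x11 on a big-endian one. */
unsigned lowest_address_byte(void) {
    uint32_t word = 0x11223344u;
    unsigned char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);
    return bytes[0];
}

/* Returns 1 if a pointer is no wider than long. This holds on ILP32 and
   LP64 targets but not on IL32P64, where long is 32 bits and pointers
   are 64 bits. */
int pointer_fits_in_long(void) {
    return sizeof(void *) <= sizeof(long);
}
```

Code that stores pointers in long, or that reads the first byte of a multi-byte integer, is a typical place where a port to another platform goes wrong.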
Three examples of undefined behaviors found by tis-interpreter and typically not found by other tools are: using a dangling pointer in pointer arithmetic or in a comparison (for performance reasons, other tools usually only check that a dangling pointer is not dereferenced, but any use of a dangling pointer is undefined behavior); comparing addresses from different memory blocks with <=; and a mismatch between the format and the arguments of printf when the format is dynamic (compilers only check that the format matches the arguments when the format is static).
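The three patterns can be sketched as follows. All function names are invented for illustration, and snprintf stands in for printf so that the result can be captured in a buffer (the same format rules apply). Each function suffixed _ub shows the problematic pattern, followed by a rewrite that stays within defined behavior.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* 1. Dangling pointer: p is never dereferenced after free, but its value
      is still used in a comparison, which is undefined behavior. */
int dangling_compare_ub(void) {
    char *p = malloc(16);
    char *q = p;
    free(p);
    return p == q;
}

/* Defined rewrite: compare while the block is still alive. */
int live_compare(void) {
    char *p = malloc(16);
    char *q = p;
    int same = (p == q);
    free(p);
    return same;
}

/* 2. Relational comparison: undefined unless p points into buf
      (or one past its end). */
int in_buffer_ub(const char *p, const char *buf, size_t len) {
    return buf <= p && p < buf + len;
}

/* Defined rewrite: == may be applied to pointers into different objects. */
int in_buffer(const char *p, const char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (p == buf + i)
            return 1;
    return 0;
}

/* 3. Dynamic format: the compiler cannot check fmt against the argument.
      Defined when fmt is "%ld", undefined when fmt is "%d" or "%s". */
int format_long(char *out, size_t size, const char *fmt, long value) {
    return snprintf(out, size, fmt, value);
}
```

The defined version of in_buffer trades speed for portability, which hints at why the relational form is so common in existing code.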
What's new in 2017: strict aliasing conformity
In 2017, tis-interpreter is getting an optional analysis for detecting strict aliasing violations in C programs.
The analysis will be optional: if you are already compiling with GCC/Clang's -fno-strict-aliasing, you do not need to worry about these violations. This option instructs the compiler not to apply any optimization that would only be valid if the program respected the strict aliasing rules. On the other hand, if tis-interpreter's strict aliasing analysis finds strict aliasing violations in the legacy software component that you maintain, and you are not already using -fno-strict-aliasing, the easiest fix is to add this compile-time option and/or document that the code contains these violations. This is what the respective maintainers of expat and zlib did.
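For readers unfamiliar with the rule, here is a generic illustration of a strict aliasing violation and a conforming alternative. It is a textbook example, not output from the analysis; the function names are hypothetical, and the sketch assumes float and uint32_t have the same size.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Violates the strict aliasing rules: an object of type float is read
   through an lvalue of type uint32_t. Only safe to compile with
   -fno-strict-aliasing. */
uint32_t float_bits_ub(float f) {
    return *(uint32_t *)&f;
}

/* Conforming rewrite: copy the object representation byte by byte. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

When rewriting every such access is not practical, as in a large legacy component, adding -fno-strict-aliasing is the pragmatic route described above.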
We wrote and presented an article in VMCAI to explain what is difficult about this problem.
The goal is to execute code from the musl C standard library, and from BusyBox, in order to detect issues that might be present in these components, including possible strict aliasing issues. Of course, all the other problems that tis-interpreter already detects will be detected on the way, too. In the case of musl, this is difficult because a C standard library tends to use all the features of the C language. The support for some of these features (long double) will take some time to add, and for now we are working around the problems by editing these aspects out of the musl source code.
A version of tis-interpreter containing this analysis is available on request for beta-testing.
Libraries tis-interpreter has been applied to
Since the selling point of tis-interpreter is that it finds more issues than already-used alternatives, at the cost of speed and generality, the question is whether tis-interpreter is general enough to be useful. This section gathers examples of security-critical C libraries that can be executed in tis-interpreter.
Most of OpenSSL's test directory can be executed inside tis-interpreter, with small adjustments. Additional tests of the openssl command can also be executed. Some bugs that have been found are listed here, here, here and here.
Since tis-interpreter is mostly automatic after the set-up phase, we have run it on LibreSSL and found minor issues either in new code imported after the fork or in code that had become hidden under #ifdef PURIFY-type guards in OpenSSL since the fork. This “fiddly buffer overrun” had been hidden behind a #if defined(PEDANTIC) in OpenSSL and was actually found in LibreSSL before being recognized and reported in OpenSSL.
On the basis of previous experiments, tests exercising iconv implementations, cryptographic primitives, HTML parsers (1, 2, 3), and text, image, sound and video compression or decompression routines implemented in C should all be within the scope of tis-interpreter. Thanks to the ready-made test suites provided via American Fuzzy Lop's webpage, we have tested libjpeg-6b and libjpeg-9a, libpng, webp, and SQLite.
Will it work on my code?
The following functions are handled natively by tis-interpreter: memset bzero memcpy memmove memcmp memchr strchr strlen strnlen strcmp malloc calloc realloc free printf sprintf asprintf …
This means that subtle bugs arising from misuses of these functions are detected even if they do not cause a crash in Valgrind.
Support for more standard functions is added as calls to them are found inside the sort of C code that it is valuable to analyze. In a pinch, missing functions can be written in C, although this can prevent some errors from being detected (providing a typical C implementation for memcpy does not tell tis-interpreter that the arguments are supposed to be pointers to objects and that the source and destination memory zones must not overlap).
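To make the memcpy remark concrete, here is a sketch of such a replacement written in plain C (naive_memcpy is a hypothetical name). Every individual byte access in the loop is valid, so running it on overlapping regions raises no error by itself, whereas the same call to the real memcpy would be undefined behavior.

```c
#include <stddef.h>

/* Byte-by-byte stand-in for memcpy. Nothing in this body expresses the
   requirement that [src, src + n) and [dst, dst + n) must not overlap. */
void *naive_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

For instance, with char buf[] = "abcdef", the call naive_memcpy(buf + 1, buf, 3) copies between overlapping regions and simply leaves "aaaaef" in buf. A natively handled memcpy can reject such a call; a C substitute like this one cannot.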
tis-interpreter comes with its own set of headers. Some common C compiler extensions are supported.
tis-interpreter is available as open-source on GitHub. Current binary snapshots are available: