Printing a null pointer with %s is undefined behavior
July 9, 2019
Avoiding undefined behavior in the C language
Introduction
The C standard makes it undefined to pass anything other than a pointer to a null-terminated string as the second argument to printf("%s", ...). However, most libcs kindly print the string (null) if a null pointer is passed as argument, and some developers have made it a habit, when generating debug messages, to pass pointers that may be either null or pointing to a valid string directly to printf("%s", ...). These developers are relying on the kindness of the underlying libc implementation. “It’s undefined because it could behave differently on another platform”, these developers think, “but I know how printf works on my platform.”
Digging into stdio implementations
Most calls to printf worldwide end up interpreted by one of four independent implementations:
- Glibc provides its own printf implementation. This is the printf that ends up being called, say, if you compile a C program on a mainstream GNU/Linux distribution such as Ubuntu.
- On many Unices, the implementation of printf comes from “UCB stdio” and was originally written by Steve Summit, but later heavily modified by Keith Bostic who gave it its current form.
- On other Unices such as Solaris, the implementation is descended from the AT&T codebase.
- Musl is a libc implementation from scratch, initiated and maintained by Rich Felker.
In their most recent incarnations, all four of these implementations are kind with respect to null pointers for %s. Consider the program below:
#include <stdio.h>

int main(void) {
  printf("%s|%.3s|\n", (char*)0, (char*)0);
}
This program, when linked with Glibc, kindly prints (null)||, whereas the other three aforementioned printf implementations kindly print (null)|(nu|. The reason for the disparity is that Glibc’s printf tries to be sophisticated about the substitution, whereas musl, UCB stdio, and Solaris are straightforward about it. You may have noticed in the previously linked commit from 1988 by Keith Bostic that printf was already handling null pointers before the commit, but that it used to print them as (NULL) or (NU.
What does this C program do?
#include <stdio.h>

int main(int argc, char*argv[]) {
  switch (argc) {
    case 0:
      return 0;
    case 1:
      printf("argc is 1, expect to see '(null)'\n");
      /* no break: execution falls through to the default case */
    default:
      printf("%s\n", argv[1]);
  }
}
Given the reassuring results of all the digging in the first half of this post, one may expect, when compiling and executing the above program without command-line arguments, that it prints:
argc is 1, expect to see '(null)'
(null)
This is not what happens for me when compiling and linking with Glibc on my faithful Ubuntu GNU/Linux distribution:
$ cat > printarg.c
#include <stdio.h>

int main(int argc, char*argv[]) {
  switch (argc) {
    case 0:
      return 0;
    case 1:
      printf("argc is 1, expect to see '(null)'\n");
    default:
      printf("%s\n", argv[1]);
  }
}
$ gcc printarg.c && ./a.out
argc is 1, expect to see '(null)'
Segmentation fault
You should be able to reproduce this at home, and this Compiler Explorer link and snippet contain some vital clues as to how this happens:
main:                                   # @main
        testl   %edi, %edi
        je      .LBB0_4
        pushq   %rbx
        movq    %rsi, %rbx
        cmpl    $1, %edi
        jne     .LBB0_3
        movl    $.Lstr, %edi
        callq   puts
.LBB0_3:
        movq    8(%rbx), %rdi
        callq   puts
        popq    %rbx
.LBB0_4:
        xorl    %eax, %eax
        retq
.Lstr:
        .asciz  "argc is 1, expect to see '(null)'"
Because, in the small example in this section, the printf calls match a pattern that can be realized with the leaner, faster puts function, GCC and Clang substituted the latter for the former. The first substitution is harmless. By applying the second one, though, the compilers assumed we did not intend to print a null pointer, because puts does not support this at all. But as the title of this post says, we shouldn’t be surprised: passing a null pointer to printf for %s was undefined behavior all along.
TSnippet, which, in a sense, contains a fifth printf implementation largely written from scratch, warns that this program invokes undefined behavior.
Conceivable consequences and conclusion
If a developer does not expect a pointer to be null, but thinks that it might still happen and decides to write robust code, they might write:
printf("[parsing] got string:\n");
printf("%s\n", s);
if (s == NULL) {
  // handle the situation gracefully
  ...
}
(The developer wants to log that the null pointer happened, and expects printf to handle it.) In this case, the null pointer, which the developer does not know how to cause and thus cannot test, will lead to a segmentation fault despite the developer’s attempt to handle it.
The moral of this note is once again that undefined behavior should be avoided as much as possible, because even if a libc has been kind to you once, the C compiler may not be.
Annex
Acknowledgments: Miod Vallat largely contributed to this post.
Post-scriptum: if you pass a pointer to printf("%s\n", ...), then after transforming the call into a call to puts, GCC will assume that the pointer is not null, since “you” passed it to puts. Generally this should not make the problem worse, since the program will crash at the puts call, exceptions notwithstanding.