28.05.2020

The Wireshark Challenge: how to ensure the security of open-source projects with formal methods?

Meeting the security challenges arising from open-source projects with powerful analysis

Open-source software is a foundational element of our technological world today. WordPress, Linux, OpenSSL, or Nginx are just some examples of widely-used open-source programs. The large community of developers regularly contributes to the functionalities and security of these programs, libraries, or frameworks, making them more complete and robust. Thanks to this level of robustness and high security, the industrial world also benefits from open-source tools, even in mission-critical environments.

The Challenge: providing the strongest possible bug-free guarantee using only existing tests

Here at TrustInSoft, we are also big fans of open-source software. Our product, TrustInSoft Analyzer, is a recognized expert[1] for software code verification through formal methods analysis and is used in a wide range of critical industries to ensure the security of C and C++ code. Its kernel is open-sourced: using it TrustInSoft has long contributed to open-source by analyzing the security of the source code of various open-source projects such as OpenSSL[2], SQLite[3], and ntpd[4], to name a few.

As you can see, TrustInSoft often considers how to leverage its expertise to make open-source programs even more secure. So we jumped at the chance when coincidently, Naval Group, one of the most significant industrial companies of the naval defense sector worldwide, asked us to evaluate one of the most popular open-source software programs, Wireshark. Quality and security standards are immensely important to Naval Group because of their very critical and sensitive activities, and although Wireshark is not used in production, it can be used in a testing environment. To see TrustInSoft Analyzer in action, they asked us to run the code of Wireshark through our code analyzer to demonstrate its ability to provide the strongest possible guarantee that some precise parts of Wireshark do not contain any bug that can lead to severe security issues.

Wireshark is the most widely used network protocol analyzer and IP packet sniffer. It analyzes network protocols by looking at packet captures, which function like pictures taken of the network activity at particular moments in time. The tool allows deep inspection of hundreds of protocols and, as such, is the de-facto standard across many commercial and non-profit enterprises, government agencies, and educational institutions.

Setting up the source code analyzer

Fabien, one of TrustInSoft’s software engineers, was put in charge of this analysis. Fabien, who has comprehensive experience working with code analysis, chose to apply TrustInSoft’s usual approach and proceeded step by step.

(1) Access the code on GitHub, download and compile

(2) Identify and access dependency codes, download and compile

(3) Use a fuzzer to generate thousands of scenarios

(4) Run scenarios through the analyzer

(5) Identify risks and flaws and use the Graphical User Interface to find the cause precisely

(6) Create a report with a list of risks and share with codebase maintainers

(7) Once flaws corrected, a formal proof can be delivered that the codebase is more robust than before

With this in mind, Fabien started by accessing the Wireshark public code repository on Github and downloading the codebase. Fabien was thrilled to see that the repository already had software tests, some of which were intended to be verified with tools (Valgrind, which is a toolset that allows to monitor code performance, sanitizers, etc…). The repository also contained the possibility of doing fuzz testing. In the Wireshark context, fuzzing is the process of creating random or semi-random capture files, feeding them into Wireshark, and looking for crashes or other error conditions. This reassured Fabien and showed him Wireshark already implemented excellent code quality checking and security. However, Fabien knew that these existing mechanisms could be reinforced by the use of TrustInSoft Analyzer, with its use of formal methods to guarantee the absence of bugs. This is because although tests, sanitizers, and style checkers can strongly increase confidence in code, none of these can truly deliver an absolute guarantee and full confidence, as with the use of formal methods.

Software developers are truly superheroes

Wireshark is written in C language, which is one of the most popular low-level programming languages. C allows programmers to write code that directly interacts with memory and enables them to produce the most optimized, efficient, and fastest code. But with great power (over the machine) comes great responsibility. If mishandled, it can lead to unintended consequences and “illegal” actions in the software. These flaws can be very well hidden, and only appear if some conditions are met, thus making them very challenging to find and correct.

But this is where the TrustInSoft Analyzer excels the most. TrustInSoft specializes in mathematical guarantees on critical code by using formal methods to deliver the strongest possible proof of secured code. Formal methods are using very sophisticated mathematical modeling of the code and memory management. They function like creating a mathematical twin or a representation of the code. Because forbidden operations and behaviors are easier to spot in the mathematical world, TrustInSoft Analyzer can point out the bugs. TrustInSoft Analyzer uses formal method approaches such as abstract interpretation to model code execution paths and to identify illegal behaviors. Actually, the analyzer does not execute a compiled program natively on the machine where it runs. Instead, it works directly on the program’s source code level, interpreting it line by line and simulating the program’s behavior in a virtual environment. Based on a complete formal understanding of the C and C++ semantics and a complex model of the computer’s memory, the simulation is mathematically rigorous and comprehensive.

Correcting those bugs on real-life code and having an issue-free mathematical construct is the first step to full formal verification, the highest confidence level in code.

If you would like to explore TrustInSoft Analyzer’s functionalities a little bit more, have a look at this article written by one of our software engineers: “Upgrade your existing test suites with TrustInSoft.”

Back to Fabien, who was looking at the Wireshark source code and test suite. Fabien included dependencies of Wireshark in his analysis, to ensure that the investigation would provide the most rigorous results. He also saw that the test code is supplied with 44 preexisting packet captures used for testing. Since this does not offer enough diversity to test all possibilities, Fabien used the AFL Fuzzer to generate 10,000 more packet captures. Analyzing the outcome of the Wireshark packet processing code on those 10,000 packets is equivalent to doing 10,000 unique tests. This would allow Fabien to simulate a real-time situation with realistic traffic volumes.

We know… Fabien has the nicest view from his office!

Once Fabien had generated enough tests, he launched TrustInSoft Analyzer and made a cup of tea. As understanding the Wireshark source code was not necessary and did not prevent the analyzer from identifying the issues, Fabien was able to really enjoy his cup of Earl Grey.

Understanding the results and next steps

A couple of hours later, the analyzer displayed the results. Several undefined behaviors were signaled. Curious, Fabien used the integrated Graphical User Interface to pinpoint the location of those errors. As the analyzer memorizes throughout the execution of the program all values of variables along the execution path, it was straightforward for Fabien to identify where the errors occurred. Some of those undefined behaviors included uninitialized variables and signed overflows.

With full confidence in the results, Fabien wrote a report listing identified issues and their precise locations and shared it with the Wireshark code maintainers. Since then, discussions are well advanced in solving those bugs, as seen here and here. Because each bug reported in TrustInSoft Analyzer is a true bug, Wireshark should release a patch that corrects each bug Fabien reported.

Fabien was thrilled to help out such an impressive and large-scale open-source project. He did not need to write a single new line of code, nor did he need to understand how the Wireshark code worked. In having at his disposal the existing tests of Wireshark, Fabien just needed to use TrustInSoft Analyzer to identify potential issues and pinpoint them along with their locations in the source code. Now it’s up to the Wireshark team to correct the remaining errors.

The bottom line: writing tests is the simple key to obtaining a formal methods verification

Correcting errors found at this first level is necessary in order to eventually provide a full guarantee (an interesting next step would be to test the code for any input including beyond test-defined inputs, but this is another story for another time). Once those errors identified at this level are fixed, and the analysis by TrustInSoft is replayed, we can deliver an absolute guarantee that for whatever the test-defined inputs, the output will be always determined and defined, which provides a high confidence level in the code. The good news is that anyone can benefit from the power of formal methods-verified results. The only prerequisite is simple: to write tests, as did Wireshark.

Although there are a few steps left to reach full formal validation, correcting errors found at this first level is mandatory. With TrustInSoft Analyzer, Fabien was able to provide a unique way to offer strong confidence in code security and paved the way towards a bug and vulnerability-free program.

If you would like to know more about the different bugs identified by TrustInSoft Analyzer, you can consult the full report, available here.

[1] https://trust-in-soft.com/nist-assessment/

[2] https://www.mail-archive.com/openssl-dev@openssl.org/msg39140.html

[3] https://blog.regehr.org/archives/1292

[4] https://trust-in-soft.com/demo-1/