SuperTest and Frama-C: a Clash of Titans

A. Maroneze (Frama-C), V. Yaglamunov & M. Beemster (Solid Sands) - 16th Nov 2022

This post has been co-written with Solid Sands; it is also available at the Solid Sands blog.

Some time ago, the Frama-C team entered into a partner agreement with Solid Sands to make use of SuperTest, a very thorough test suite for C compilers and libraries. From basic syntactic tests to comprehensive libc functions, this suite helps to ensure the quality of compilers and libraries as well as their conformance to different standards.

Naturally, we asked ourselves: what happens if we use SuperTest and Frama-C to verify each other? This blog post summarizes the results of this exchange.

Testing a verifier, verifying a test suite

Frama-C is not a compiler, but it has many similarities: it parses programs, types them, and produces an AST. More importantly, it can analyze tests semantically.

SuperTest is not a normal program but a collection of sometimes-contrived tests that touch every aspect of C. It will stress any tool that processes C. Its positive tests should be strictly conforming to the C standard.

Given enough C programs, cracks will appear

Solid Sands has analyzed many implementations of C and many C analysis tools, and has never seen an implementation that is free of errors. That is not to say that there are no good implementations; it is just that, in the corners of the C language, there are things you can easily trip over. Another issue that is difficult to get right is the precision of diagnostics. Thus, some errors in Frama-C were to be expected as well.

From Frama-C’s previous experience with test suites and benchmarks, at least some semantic bugs in SuperTest were to be expected: given a sufficiently large set of C programs, and given the large number of subtle semantic aspects of the language, it is statistically unlikely for such a suite to be completely free of undefined behaviors, be they portability issues, non-deterministic cases, or simply bugs. This was already the case with NIST’s Juliet C/C++ test suite, [J. Moerman’s 2018 bachelor’s thesis test suite (JM2018TS)](https://github.com/JMoerman/JM2018TS), TrailOfBits’ version of the DARPA Challenge Binaries, and other test suites and benchmarks we ran Frama-C on.

Eva to the rescue

The Eva plug-in of Frama-C is particularly efficient for this kind of scenario: mostly automatic execution, no (or few) specifications needed, great precision due to small test sizes.

After an initial setup to configure parsing and adjust a few stubs (e.g. so that Eva will consider the most general case), it is often easy to run Frama-C and obtain precise results in a few minutes. In SuperTest, interpreting the results is made easier by the fact that test cases are named and structured according to the corresponding C standard features they test.

Given enough C programs, Frama-C will fail to parse some

Before starting the test campaign, the Frama-C team did not expect Frama-C to pass it completely. Frama-C’s parser is not 100% compliant with C99/C11/etc.: it tends to be too lenient. The main reasons are that:

  1. Frama-C does not compile the code, so the user will have to write syntactically valid C code in the end; historically, Frama-C delegates some checks to the compiler;
  2. Frama-C focuses on semantic analyses, leaving the thorny bits of syntactical issues to other, better-suited tools. For instance, Frama-C does not even have its own preprocessor.

Nowadays, we strive to make Frama-C as syntactically compliant as possible, but a few remaining spots still require some polish. This test campaign certainly helps with that effort.

Results: unsurprising, but interesting nonetheless

Unsurprisingly, we found both kinds of issues: some semantic issues in SuperTest, and some syntactic shortcomings in the case of Frama-C. Actually, we found more of the latter than the former.

Fortunately, we found no major problems. The issues found in SuperTest are of little relevance for critical and embedded systems - for instance, issues related to wide characters (wchar.h). We believe there are two reasons for that:

  1. Many tools and frameworks decide not to support these functions, since they are rarely used in the domain of embedded systems;
  2. The issues were related to portability, and thus undetectable in some architectural configurations.

Very wide coverage

SuperTest is exhaustive with respect to the C standard library, while Frama-C’s ACSL specifications are added iteratively, on a best-effort basis. SuperTest contains tests for features not covered by Frama-C, such as complex numbers, some C11 constructs, and several library headers and functions for which Frama-C does not yet provide an ACSL specification. Thus SuperTest is likely to remain relevant for future Frama-C evolutions; reciprocally, improving Frama-C and keeping SuperTest up to date can help with future findings.

Final score: win-win

Overall, this exchange between CEA LIST and Solid Sands has proven beneficial to both parties: setting up a recurring run of Frama-C on SuperTest prevents regressions and lets SuperTest benefit from Frama-C’s own evolutions; on the other hand, Frama-C can improve its syntactic support thanks to SuperTest.
