Frama-C/Eva in SATE VI with Juliet 1.3

André Maroneze - 15th Nov 2018

Frama-C chose to participate in the SATE VI Ockham Sound Analysis Criteria. With the upcoming Frama-C 18 (Argon) release, results can be reproduced using the open source version of the Eva plug-in. This post briefly presents SATE, the Juliet test suite, and the results obtained with Frama-C 18. We then describe the Frama-C SATE VI repository on GitHub, created to allow anyone to reproduce the results obtained with Frama-C/Eva. We also explain some patches that were applied to the test suites, and our choice of parametrization options for Eva.

SATE VI and Juliet 1.3

SATE (Static Analysis Tool Exposition), organized by NIST, is a comparative study of static analysis tools: toolmakers volunteer to participate by running their tools on the sets provided by NIST, and then by submitting their analysis scripts and results. The NIST SAMATE team then compiles a dense report, comparing and evaluating all participating tools.

For the analysis of C programs, there are two tracks: one in which all kinds of tools participate, called the Classic track, and another dedicated to sound analyzers, in the sense of the Ockham Sound Analysis Criteria.

For the Ockham track in SATE VI, we chose the Juliet 1.3 test suite for C/C++, which contains over 64,000 test cases. Some are written in C++, but most are written in C. Test suites are organized in directories related to CWEs (Common Weakness Enumerations): each directory has code samples containing violations of the CWE. For each test case, there are two variants: one with good code (which does not trigger the CWE), and another with bad code (which triggers the CWE in question). In our tests using Juliet, Frama-C is supposed to find all occurrences of bad code; it may, however, report some good code as bad.

CWEs handled by Frama-C/Eva

For the analysis with Frama-C/Eva, we chose the subset of CWEs in Juliet 1.3 for C/C++ which are handled by the analyzer.

CWEs in Juliet 1.3 handled by Frama-C/Eva (18 - Argon)
  • CWE121 - Stack Based Buffer Overflow
  • CWE122 - Heap Based Buffer Overflow
  • CWE123 - Write What Where Condition
  • CWE124 - Buffer Underwrite
  • CWE126 - Buffer Overread
  • CWE127 - Buffer Underread
  • CWE190 - Integer Overflow
  • CWE191 - Integer Underflow
  • CWE194 - Unexpected Sign Extension
  • CWE195 - Signed to Unsigned Conversion Error
  • CWE196 - Unsigned to Signed Conversion Error
  • CWE197 - Numeric Truncation Error
  • CWE369 - Divide by Zero
  • CWE415 - Double Free
  • CWE416 - Use After Free
  • CWE457 - Use of Uninitialized Variable
  • CWE464 - Addition of Data Structure Sentinel
  • CWE469 - Use of Pointer Subtraction to Determine Size
  • CWE475 - Undefined Behavior for Input to API
  • CWE562 - Return of Stack Variable Address
  • CWE587 - Assignment of Fixed Address to Pointer
  • CWE588 - Attempt to Access Child of Non Structure Pointer
  • CWE590 - Free Memory Not on Heap
  • CWE617 - Reachable Assertion
  • CWE665 - Improper Initialization
  • CWE690 - NULL Deref From Return
  • CWE758 - Undefined Behavior
  • CWE835 - Infinite Loop
  • CWE843 - Type Confusion

Note that some CWEs are very broad, such as CWE758 - Undefined Behavior. We do not claim Frama-C/Eva handles all cases which might be categorized under such CWEs; however, those considered in the Juliet 1.3 test suite have all their bad code samples correctly labeled.

Some examples of CWEs which are not considered by Frama-C/Eva include CWE426 - Untrusted Search Path, CWE90 - LDAP Injection and CWE780 - Use of RSA Algorithm Without OAEP. All three are functional-level weaknesses which do not manifest as runtime errors.

Some of the CWEs not currently considered by Frama-C/Eva could be handled via specific function annotations, but this would constitute a very ad hoc solution. Other plug-ins in the Frama-C platform might be better suited for these CWEs; e.g., using the WP plug-in with function annotations, or the Secure Flow plug-in with trusted and untrusted sinks and sources, might allow identifying the cases with API violations leading to weaknesses.

Results

The following table summarizes the results obtained with Frama-C 18 (Argon). The script that produces this table is presented in a later section.

total imprecise      :    751
total non-terminating:      3
total ok             :  44668
total timeout        :      0
total unsound        :      0
precision (excluding non-terminating): 0.983465

Each test case is labeled as one of the following cases:

  • ok: correctly labeled: either a good test without alarms, or a bad test with alarms;
  • imprecise: a good test with a false alarm;
  • unsound: a bad test without an alarm;
  • non-terminating: the program does not terminate; this should never happen, except in the case of CWE835 - Infinite Loop;
  • timeout: a test which did not finish within the allotted time (by default, 30 seconds). This never happens in our benchmarking rig, but slower or overloaded machines may see this happen. If so, then a larger timeout should be given and the test should be re-run.

The precision (98.3%) displayed in the last line is the number of ok tests divided by the total number of tests (ok + imprecise + unsound).

The improvements between Frama-C 17 (Chlorine) and 18 (Argon) are noteworthy, especially the fact that the memcpy and memset builtins are now released in the open source version. Other improvements in precision also helped remove some spurious warnings and simplify the analysis script.

The remaining imprecisions have several distinct causes, but most of them occur repeatedly, because the same patterns are reused across test cases. One source of imprecision is the snprintf function, whose specification is not precise enough to ensure that the output string is null-terminated. This leads to false alarms when that string is used later in the code. Another case is related to CWE369 - Divide by Zero, in which some tests use socket-related functions whose specifications do not allow for precise results in every case.

The Frama-C SATE VI Repository

The Frama-C SATE VI repository on Github allows anyone to run Frama-C on the Juliet 1.3 test suite for C/C++, to reproduce the results or to try different approaches, e.g. parametrizing the analysis differently, or trying a new abstract domain.

Software requirements are: Frama-C 18.0 (Argon), GNU Make (>= 4.0), the GNU parallel tool, Bash (>= 4.0) and wget (to download the Juliet archive from NIST’s website).

The repository is structured in the following way:

  • A GNUmakefile allows bootstrapping the repository, by downloading the Juliet 1.3 for C/C++ archive from NIST’s website and then decompressing it (Juliet 1.3 is not distributed in our repository). It also applies some patches to Juliet’s sources, described in a later section;
  • the fc directory contains Frama-C scripts used to run the analysis and automatically evaluate its results. The main scripts are:
    • analyze.sh: sets the default analysis parameters; these differ from Frama-C’s defaults in several ways, being stricter in some cases but less strict in others;
    • evaluate_case.sh: performs automatic evaluation of good and bad cases into the categories presented before;
  • the fc/patches directory contains some patches applied to Juliet 1.3 to fix known issues in a few tests. It also changes the way non-deterministic random numbers are used in some tests (details are presented later);
  • the C directory contains the extracted Juliet 1.3 archive, with its scripts and test cases;
  • finally, the oracles directory contains a versioned copy of the oracles produced by the analysis with Frama-C/Eva; this allows comparing different parametrizations, and following the evolution of the platform.

Reproducing the results

The overall process to use the repository is the following:

git clone https://github.com/Frama-C/SATE-VI.git
cd SATE-VI
make
cd C/testcases
make

The last make step takes a very long time (2+ hours, depending on your CPU and number of cores).

Note that, if Frama-C is not installed in the PATH, you can use the fc/frama-c-path.sh script to specify the directory where Frama-C is installed. This is especially useful to compare different versions of Frama-C.

After running make inside the C/testcases directory, a small summary with the number of tests performed in each subdirectory will be presented in textual form:

CWE121_Stack_Based_Buffer_Overflow/s01: 876 tests
    876  ok
CWE121_Stack_Based_Buffer_Overflow/s02: 1020 tests
   1020  ok
CWE121_Stack_Based_Buffer_Overflow/s03: 1020 tests
     34  imprecise
    986  ok
...

Running ./summarize_results.sh will also output this result. The stat_results.py script will produce an overall summary across all subdirectories, as presented in the Results section.

Re-running individual tests

Inside each directory in C/testcases/, it is possible to run a single test: take the name of a source file, drop its extension, and append _good.res or _bad.res. For instance, if there is a file file.c inside directory CWE000/s01, then doing cd CWE000/s01 and make file_good.res will run the test and produce three files:

  • file_good: output of the analysis;
  • file_good.csv: CSV report of warnings produced during the analysis;
  • file_good.res: result of evaluate_case.sh applied to the test case report;

Note that for tests consisting of multiple files (each ending with a, b, c, etc.), the test case name is their prefix without the letter, e.g. for the test comprised of files file_42a.c and file_42b.c, the command is make file_42_good.res (or bad).

Finally, by replacing .res with .gui in the make command, it is possible to open the Frama-C GUI on the test case, to inspect it.

Patches applied to Juliet 1.3

In the Frama-C SATE VI repository, there are two sets of patches applied to Juliet 1.3 for C/C++:

fix-tests.patch

These are fixes in a small number of tests, due to two patterns which were found to lead to undefined behavior.

The first fix consists in replacing an overflow check which accidentally introduces a signed overflow:

 /* FIX: Add a check to prevent an overflow from occurring */
-if (abs((long)data) <= (long)sqrt((double)CHAR_MAX))
+if (data <= LONG_MAX && labs(data) <= (long)sqrt((double)CHAR_MAX))
 {

data has type unsigned int. In the original code, the call to abs((long)data) is equivalent to abs((int)(long)data), since abs takes an int as argument. If int and long have the same width (e.g. as in x86 32-bit), then data can be larger than LONG_MAX (and also INT_MAX, since they have the same width), leading to a signed overflow in the call, and thus undefined behavior. The patched version first checks that data is small enough, and then uses labs to avoid downcasting data.

The second fix concerns a set of files which had a local variable declared in a scope smaller than intended:

 static void goodG2B()
 {
     void * data;
+    int intBuffer = 8;
     /* Initialize data */
     data = NULL;
     {
     /* FIX: Point data to an int */
-        int intBuffer = 8;
         data = &intBuffer;
     }
     CWE843_Type_Confusion__short_68_goodG2BData = data;

In this set of test cases, intBuffer (and similar variables) was declared inside a block, and its address was later used outside of the variable's scope, leading to undefined behavior. The fix consists in declaring the variable in the main scope of the function.

improve-testcasesupport.patch

This patch mainly concerns the RAND32 macro, used in Juliet to produce non-deterministic numbers. The macro is used in some test cases to assign values to variables of different types, with the result downcast to char or short whenever necessary. These downcasts trigger warnings because of option -warn-signed-downcast, which is enabled in the analyze.sh script since some CWEs require it.

The patch replaces calls to (char)RAND32() and (short)RAND32() with RAND8() and RAND16(), two new macros similar to the existing RAND32() and RAND64() macros, but defined so that their values already fit in the destination type, which avoids the downcast warnings. To preserve existing behavior, these macros expand to RAND32() when not running Frama-C.

Finally, minor patches were applied to pretty-printing functions in io.c, such as:

void printHexCharLine (char charHex)
 {
-    printf("%02x\n", charHex);
+    printf("%02hhx\n", charHex);
 }

The printf length modifier for an argument of type char is hh, although it is very rarely used in practice. Still, the Variadic plug-in warns about it, so fixing the test case support file prevents spurious warnings.

Note that all of these patches are intended to improve the precision of the analysis, but do not affect its correctness: without them, Frama-C/Eva produces several spurious warnings (and, in a few cases, the analysis does not terminate, when the only possible behavior is undefined), but its results remain sound; bad cases are still reported as such.

Rationale for analysis options

The analyze.sh script contains a series of non-default options used in the analysis. We present here the rationale for choosing these options during the analysis of Juliet 1.3. Similar options are grouped together for presentation purposes.

-no-autoload-plugins -load-module from,inout,report,scope,eva,variadic

These options are present in most of Frama-C's test suites. The -no-autoload-plugins option disables the automatic loading of all plug-ins, which speeds up the initialization of Frama-C. We then load only the plug-ins needed for Juliet: eva and its derived plug-ins (from, inout, scope), report for the CSV report, and variadic to handle the variadic functions present in almost every test (e.g. printf). Overall, this is an optimization for running thousands of tests; when running only a few dozen tests, the difference is negligible.

-kernel-warn-key parser:decimal-float=inactive \
-kernel-warn-key typing:no-proto=inactive \
-kernel-warn-key typing:implicit-conv-void-ptr=inactive \
-eva-warn-key locals-escaping=inactive

These options disable warnings which are not directly related to runtime errors, but are enabled by default since they often indicate code quality issues in real code bases. The first three warnings could be avoided by patching the code (e.g. adding prototypes, adding explicit casts, and avoiding inexact floating-point constants). The last one is related to addresses of variables that remain accessible outside their scope (in Eva, their value becomes ESCAPINGADDR); it is emitted because it is often easier to identify the moment when a variable goes out of scope than the later point at which its address may be used. This is not undefined behavior unless the variables are actually used (which does not happen in the good test cases); when such values are accessed (as in the bad test cases), a different warning is emitted.

-eva-msg-key=-initial-state \
-eva-no-show-progress \
-eva-print-callstacks \

These are all options related to minimizing the default verbosity of the analysis, except for the last, which adds callstacks to the output: when there are warnings, it is better to know where they come from.

-slevel 300 \

Some amount of slevel is useful to improve the precision of the analysis. The chosen value is arbitrary and varies with the characteristics of the code under analysis; for Juliet 1.3, this value is a good compromise between precision and analysis speed.

-eva-builtin alloca:Frama_C_vla_alloc_by_stack \

The alloca function (which is not in the C standard) is used in several test cases. In all of them, it behaves just like the allocation of a VLA (variable length array), that is, the allocated blocks are not used outside of the scope in which alloca was called¹. It is therefore safe to associate the VLA allocation builtin with calls to the alloca function in Juliet 1.3.

¹ The main difference between alloca and the allocation of a VLA is that the latter is deallocated at the end of the scope in which it was created, while alloca memory blocks are deallocated at the end of the function in which they are called.

-warn-special-float none \

By default, Frama-C considers non-finite floating-point values as forbidden, emitting warnings when they may arise, even though this does not constitute an undefined behavior. This option allows non-finite floating-point values, thus avoiding spurious warnings.

-warn-signed-downcast \
-warn-unsigned-overflow \

Signed downcasts (possible loss of information when converting a larger type to a smaller one) and unsigned overflows, by default, do not constitute undefined behavior; however, it is a good practice to avoid them, and some specific CWEs do require treating them as undesired. Another option that could have been activated is -eva-symbolic-locations-domain, but in Juliet 1.3 it does not make any difference, neither in the oracles nor in the overall analysis time.

-eva-warn-copy-indeterminate=-@all \

By default, copying a possibly indeterminate value (e.g. uninitialized) in Eva triggers a warning, even if the value is copied in a standards-compliant way (e.g. via unsigned char pointers). This helps detect problems earlier in the code, rather than at the point of use. However, the actual use of such values triggers warnings anyway, so we can afford to be less strict, avoiding spurious warnings in good test cases where the copied values are never used.

-eva-equality-domain \
-eva-sign-domain \

These two abstract domains are efficient enough to be used in most analyses, and improve the precision of the results at a minimal cost. The equality domain in particular is useful in some tests which require relational information, while the sign domain is well suited for testing divide-by-zero errors. The equality domain eliminates false alarms in 184 tests, and the sign domain removes 308 false alarms.

-add-symbolic-path TESTCASESUPPORT_DIR:$TESTCASESUPPORT_DIR

This is a technicality, but useful for scripts and versioning: by default, Frama-C uses relative paths when they are “sufficiently close” to the current working directory, otherwise absolute paths are used (e.g. when there is a .. in the path). To avoid versioning such paths (and also for creating test oracles), Frama-C has a mechanism to replace specific paths with user-defined strings. The most common use case is Frama-C’s share directory, written as FRAMAC_SHARE in reports and in the GUI. In this case, the actual path to the C/testcasesupport directory is replaced with the string TESTCASESUPPORT_DIR. This ensures that every user obtains the same oracles for the CSV files mentioned below.

-report-no-proven -report-csv <file>.csv

These options are not present in analyze.sh but in evaluate_case.sh, because the output filename is available there. They invoke the Report plug-in to generate a CSV file containing the list of properties in the program, along with their statuses. Option -report-no-proven keeps only the properties whose status is Unknown or Invalid, which are the ones of interest when looking for undefined behaviors.

Conclusion

The NIST SAMATE Juliet 1.3 test suite for C/C++ provides an excellent set of well-defined programs, as well as programs with isolated bugs, which are very useful for automated testing and comparative evaluation.

Frama-C/Eva can handle several of the CWEs present in the test suite, and recent improvements in the analyzer allow it to obtain very high precision. The test cases are small, which does not make them representative of large case studies, but they allowed the identification of limitations in the analyzer and offered opportunities for improvement.

The analysis with Frama-C/Eva identified very few issues in the code itself, which, given the size of the test suite, is an indicator of its high quality.

Overall, participation in the Ockham track enabled Frama-C/Eva to improve its analysis, and offered a very extensive set of test cases for regression testing and quantitative experimentation with different parametrizations. For instance, the sign domain, developed originally as part of a tutorial for Eva, proved itself very useful to improve the precision of several tests related to CWE369 - Divide by Zero, with a minimal runtime cost. The large-scale testing afforded by Juliet enables such kinds of evaluation.

We would like to thank NIST for preparing Juliet and organizing SATE, and for their commitment to the exposition of innovative tools for program analysis and verification.

André Maroneze
15th Nov 2018