A mini-tutorial of ACSL specifications for Value

André Maroneze - 23rd Sep 2016

(with the collaboration of F. Kirchner, V. Prevosto and B. Yakobowski)

Users of the Value plugin often need to use functions for which there is no available code, or whose code could be abstracted away. In such cases, ACSL specifications often come in handy. Our colleagues at Fraunhofer prepared the excellent ACSL by example report, but it is mostly directed at WP-style proofs.

In this post we explain how to specify in ACSL a simple function, in a way that is optimal for Value.

Prerequisites:

  • Basic knowledge of Value;
  • Basic knowledge of ACSL.

The messages and behaviors presented here are those of Frama-C Aluminium. Using another version of Frama-C might lead to different results.

A simple function

In this tutorial, we proceed in a gradual manner to isolate problems and make sure that each step works as intended. Let us consider the following informal specification:

// returns a random character between buf[0] and buf[n-1]
char get_random_char(char const *buf, unsigned n);

A minimal ACSL specification for this function must check two things:

  1. that n is strictly positive: otherwise, we’d have to select a character among an empty set, thereby killing any logician that would wander around at that time (buf is a byte array that may contain \0, so that there is no obvious way to return a default value in case n == 0);
  2. that we are allowed (legally, as per the C standard) to read characters between buf[0] and buf[n-1];

The following specification ensures both:

/*@
  requires n > 0;
  requires \valid_read(buf+(0 .. n-1));
*/
char get_random_char(char const *buf, unsigned n);

Note the spaces between the bounds in 0 .. n-1. A common issue, when beginning to write ACSL specifications, is to write ranges such as 0..n-1 without spaces around the dots. It works in most cases, but as 0..n is a floating-point preprocessing token (yes, this looks strange, but you can see it yourself in section 6.4.8 of C11 standard if you’re not convinced) funny things might happen when pre-processing the annotation. For instance, if we had a macro MAX instead of n, (e.g. #define MAX 255), then writing [0..MAX] would result in the following error message: [kernel] user error: unbound logic variable MAX, since the pre-processor would dutifully consider 0..MAX as a single token, thus would not perform the expansion of MAX. For that reason, we recommend always writing ranges with spaces around the dots.

To check our specification, we devise a main function that simply calls get_random_char with the appropriate arguments:

void main() {
  char *buf = "abc";
  char c = get_random_char(buf, 3);
}

Then, if we run this with Value (frama-c -val), it will produce the expected result, but with a warning:

[kernel] warning: No code nor implicit assigns clause for function get_random_char, generating default assigns from the prototype

By printing the output of Frama-C’s normalisation (frama-c -print), we can see what its generated assigns clause looks like:

assigns \result;
assigns \result \from *(buf+(0 ..)), n;

Despite the warning, Frama-C kernel generated a clause that is correct and not too imprecise in this case. But we should not rely on that and avoid this warning whenever possible, by writing our own assigns clauses:

/*@
  requires n > 0;
  requires \valid_read(buf+(0 .. n-1));
  assigns \result \from buf[0 .. n-1], n;
*/
char get_random_char(char const *buf, unsigned n);

Running Value again will not emit any warnings, plus it will indicate that both preconditions were validated:

[kernel] Parsing FRAMAC_SHARE/libc/__fc_builtin_for_normalization.i (no preprocessing)
[kernel] Parsing file.c (with preprocessing)
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
[value] Values of globals at initialization

[value] computing for function get_random_char <- main.
        Called from file.c:10.
[value] using specification for function get_random_char
file.c:2:[value] function get_random_char: precondition got status valid.
file.c:3:[value] function get_random_char: precondition got status valid.
[value] Done for function get_random_char
[value] Recording results for main
[value] done for function main
[value] ====== VALUES COMPUTED ======
[value] Values at end of function main:
  buf ∈ abc
  c ∈ [--..--]

However, the result is still imprecise: c ∈ [--..--], that is, it may contain any character, not only ‘a’, ‘b’ or ‘c’. This is expected: assigns \from ... clauses are used by Value to compute dependencies information, but they are not sufficient to constrain the exact contents of the assigned variables. We need to use postconditions for that, that is, ensures clauses.

The following ensures clause states that the result value must be one of the characters in buf[0 .. n-1]:

ensures \subset(\result, buf[0 .. n-1]);

The ACSL \subset predicate is interpreted by Value, which isolates the characters in buf, and thus returns a precise result: c ∈ {97; 98; 99}. Note that Value prints the result as integers, not as ASCII characters.

This concludes the specification of a very simple partial function. But what if we want to take into account more possibilities, e.g. a NULL pointer in buf, or n == 0?

First try: a robust and concise specification

Our get_random_char function is simple, but not robust. It is vulnerable to accidents and attacks. Let us apply some Postel-style recommendations and be liberal in what we accept from others: instead of expecting that the user will not pass a NULL pointer or n == 0, we want to accept these cases, but return an error code if they happen.

We have two choices: add an extra argument to get_random_char that represents the result status (OK/error), or use the return value as the status itself, and return the character via a pointer. Here, we will adopt the latter approach:

typedef enum { OK, NULL_PTR, INVALID_LEN } status;

// if buf != NULL and n > 0, copies a random character from buf[0 .. n-1]
// into *out and returns OK.
// Otherwise, returns NULL_PTR if buf == NULL, or INVALID_LEN if n == 0.
status safe_get_random_char(char *out, char const *buf, unsigned n);

We could also have used errno here, but global variables are inelegant and, most importantly, it would defeat the purpose of this tutorial, wouldn’t it?

This new function is more robust to user input, but it has a price: its specification will require more sophistication.

Before revealing the full specification, let us try a first, somewhat naive approach:

typedef enum { OK, NULL_PTR, INVALID_LEN } status;

/*@
  assigns \result, *out \from buf[0 .. n-1], n;
  ensures \subset(*out, buf[0 .. n-1]);
*/
status safe_get_random_char(char *out, char const *buf, unsigned n);

void main() {
  char *buf = "abc";
  char c;
  status res = safe_get_random_char(&c, buf, 3);
  //@ assert res == OK;
}

This specification has several errors, but we will try it anyway. We will reveal the errors as we progress.

Once again, we use a small C main function to check our specification, starting with the non-erroneous case. This reveals that Value does not return the expected result:

...
[value] Values at end of function main:
  buf ∈ abc
  c ∈ [--..--] or UNINITIALIZED
  res ∈ {0}

Variable c is imprecise, despite our \ensures clause. This is due to the fact that Value will not reduce the contents of a memory location that may be uninitialized. Thus, when specifying the ensures clause of a pointed variable, it is first necessary to state that the value of that variable is properly initialized.

This behavior may evolve in future Frama-C releases. In particular, our specification could have resulted in a more precise result, such as c ∈ {97; 98; 99} or UNINITIALIZED.

Adding ensures \initialized(out); before the \ensures \subset(...) clause will allow c to be precisely reduced to {97; 98; 99}. This solves our immediate problem, but creates another: is the specification too strong? Could we have an implementation of get_random_char in which \initialized(out) does not hold?

The answer here is definitely yes, especially in the case where buf is NULL. It is arguably also the case when n == 0, although it depends on the implementation.

The most precise and correct result that we expect here is c ∈ {97; 98; 99} or UNINITIALIZED. To obtain it, we will use behaviors.

Another reason to consider using behaviors is the fact that our \assigns clause is too generic, leading to avoidable warnings by Value: because we have written that *out may be assigned, Value will try to evaluate if out is a valid pointer. If our main function includes a test such as res = safe_get_random_char(NULL, buf, 3), Value will output the following:

[value] warning: Completely invalid destination for assigns clause *out. Ignoring.

This warning is not needed here, but its presence is justified by the fact that, in many cases, it helps detect incorrect specifications, such as an extra *. Value will still accept our specification, but if we want a precise analysis, the best solution is to use behaviors to specify each case.

Second try: using behaviors

The behaviors of our function correspond to each of the three cases of enum status:

  1. NULL_PTR when either out or buf are NULL;
  2. INVALID_LEN when out and buf are not NULL but n == 0;
  3. OK otherwise.

ACSL behaviors can be named, improving readability and generating more precise error messages. We will define a set of complete and disjoint behaviors; this is not required in ACSL, but for simple functions it often matches more closely what the code would actually do, and is simpler to reason about.

One important remark is that *disjoint* and *complete* behaviors are not checked by Value. The value analysis does take them into account to improve its precision when possible, but if they are incorrectly specified, Value may not be able to warn the user about it.

Here is one way to specify three disjoint behaviors for this function, using mutually exclusive assumes clauses.

/*@
  behavior ok:
    assumes out != null && buf != \null;
    assumes n > 0;
    requires \valid_read(buf+(0 .. n-1));
    requires \valid(out);
    // TODO: assigns and ensures clauses
  behavior null_ptr:
    assumes out == \null || buf == \null;
    // TODO: assigns and ensures clauses
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    // TODO: assigns and ensures clauses
*/
status safe_get_random_char(char *out, char const *buf, unsigned n);

Our choice of behaviors is not unique: we could have defined, for instance, two behaviors for null pointers, e.g. null_out and null_buf; but the return value is the same for both, so this would not improve precision.

Note that assumes and requires play different roles in ACSL: assumes are used to determine which behavior(s) are active, while requires impose constraints on the pre-state of the function or current behavior. For instance, one should not specify assumes \valid(out) in one behavior and assumes !\valid(out) in another: what this specification would actually mean is that the corresponding C function should somehow be able to distinguish between locations where it can write and locations where it cannot, as if one could write if (valid_pointer(buf)) in the C code. In practical terms, the main consequence is that such a specification would prevent Value from detecting errors related to memory validity.

For a more concrete example of the difference between them, you may consult the small appendix at the end of this post.

Assigns clauses in behaviors

Our specification for safe_get_random_char is incomplete: it has no assigns or ensures clauses.

Writing assigns clauses in behaviors is not always trivial, so here is a small, simplified summary of the main rules concerning the specification of such clauses for an analysis with Value:

  1. Global (default) assigns must always be present (even for complete behaviors), and must be at least as general as the assigns clauses in each behavior;
  2. Behaviors only need assigns clauses when they are more specific than the global one.
  3. Having a complete behaviors clause allows the global behavior to be ignored during the evaluation of the post-state, which may lead to a more precise result; but the global assigns must still be present.

Note: behavior inclusion is *not* currently checked by Value. Therefore, if a behavior’s assigns clause is not included in the default one, the result is undefined.

Also, the reason why assigns specifications are verbose and partially redundant is in part because not every Frama-C plugin is able to precisely handle behaviors. They use the global assigns in this case.

For ensures clauses, the situation is simpler: global and local ensures clauses are simply merged with an implicit logical AND between them.

The following specification is a complete example of the usage of behaviors, with precise ensures clauses for both outputs (\result and out). The main function below tests each use case, and running Value results in valid statuses for all preconditions and assertions. The \from terms in the assigns clauses are detailed further below.

#include <stdlib.h>
typedef enum {OK, NULL_PTR, INVALID_LEN} status;

/*@
  assigns \result \from out, buf, n;
  assigns *out \from out, buf, buf[0 .. n-1], n;
  behavior null_ptr:
    assumes out == \null || buf == \null;
    assigns \result \from out, buf, n;
    ensures \result == NULL_PTR;
  behavior invalid_len:
    assumes out != \null && buf != \null;
    assumes n == 0;
    assigns \result \from out, buf, n;
    ensures \result == INVALID_LEN;
  behavior ok:
    assumes out != \null && buf != \null;
    assumes n > 0;
    requires \valid(out);
    requires \valid_read(&buf[0 .. n-1]);
    ensures \result == OK;
    ensures \initialized(out);
    ensures \subset(*out, buf[0 .. n-1]);
  complete behaviors;
  disjoint behaviors;
 */
status safe_get_random_char(char *out, char const *buf, unsigned n);

void main() {
  char *msg = "abc";
  int len_arr = 4;
  status res;
  char c;
  res = safe_get_random_char(&c, msg, len_arr);
  //@ assert res == OK;
  res = safe_get_random_char(&c, NULL, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(NULL, msg, len_arr);
  //@ assert res == NULL_PTR;
  res = safe_get_random_char(&c, msg, 0);
  //@ assert res == INVALID_LEN;
}

Our specification includes several functional dependencies (\from), but there is still one missing. Can you guess which one? The answer, as well as more details about why and how to write functional dependencies, will appear in the next post. Stay tuned!

Appendix: example of the difference between requires and assumes clauses

This appendix presents a small concrete example of what happens when the user mistakingly uses requires clauses instead of assumes. It is directed towards beginners in ACSL.

Consider the (incorrect) example below, where bzero_char simply writes 0 to the byte pointed by the argument c:

/*@
  assigns *c \from c;
  behavior ok:
    assumes \valid(c);
    ensures *c == 0;
  behavior invalid:
    assumes !\valid(c);
    assigns \nothing;
  complete behaviors;
  disjoint behaviors;
*/
void bzero_char(char *c);

void main() {
  char *c = "abc";
  bzero_char(c);
}

In this example, Value will evaluate the validity of the pointer c, and conclude that, because it comes from a string literal, it may not be written to (therefore \valid(c) is false). Value then does nothing and returns, without any warnings or error messages.

However, had we written it the recommended way, the result would be more useful:

/*@
  assigns *c \from c;
  behavior ok:
    assumes c != \null;
    requires \valid(c);
    ensures *c == 0;
  behavior invalid:
    assumes c == \null;
    assigns \nothing;
  complete behaviors;
  disjoint behaviors;
*/
void bzero_char(char *c);

void main() {
  char *c = "abc";
  bzero_char(c);
}

In this case, Value will evaluate c as non-null, and will therefore activate behavior ok. This behavior has a requires clause, therefore Value will check that memory location c can be written to. Because this is not the case, Value will emit an alarm:

[value] warning: function bzero_char, behavior ok: precondition got status invalid.

Note that checking whether c is null or non-null is something that can be done in the C code, while checking whether a given pointer p is a valid memory location is not. As a rule of thumb, conditions in the code correspond to assumes clauses in behaviors, while requires clauses correspond to semantic properties, function prerequisites that cannot necessarily be tested by the implementation.

André Maroneze
23rd Sep 2016