Value analysis tutorial, part 5: jumping to conclusions

Pascal Cuoq - 22nd Nov 2010

This post is in two parts, both of them independently good fits for the title, and still not completely without relation to each other, but that's probably a coincidence.

Methodology

In this thread, we aim at the verification of low-level properties for the functions in Skein. In the last post, I closed one line of reasoning as if the objective had been reached. But someone might object. \How do you know you can trust the verifier?" ey might say. "What have you really proved?". And "are you asking me to believe that all cases are covered with separate analyses for 0 1 even numbers greater than 2 and odd numbers greater than 3?".

Answers in reverse order:

  1. yes. In C you cannot call a function a negative number of times so I am asking you to believe that this partition covers the "calling Update(... 80) an arbitrary number of times" part of the informal specification we settled on from the beginning.
  2. The middle question refers to the fact that there is a translation of sorts from the informal requirement or specification to the formal one. This is not unlike the translation in traditional verification from the informal requirement to executable test (or code review) plans. Yes there are dangers here but the dangers are not new. Both with testing or with more formal verification as we have been outlining you can make unjustified assumptions or forget a critical case. That weakness does not prevent replacing some tests and code reviews with formal methods since all these techniques have the weakness. If anything formal specifications that are slightly more expressive than individual test cases or review objectives are slightly closer to informal speech and could be argued to have a slightly lower and therefore slightly less risky translation step.
  3. The first question is even less original than the middle one. We gain confidence in the verifier with tests and code reviews. They didn't suddenly stop working you know.

Jumping ahead to the conclusion of the Skein-256 verification

All the techniques to deploy have not been exposed yet in this blog but now that the thread gives a good idea of how absence of run-time errors can be verified in very general circumstances using the value analysis I thought I would hint at the final results. One of the techniques to deploy is (as of November 2010) the next Frama-C release which contains many optimizations and performance improvements so there is no hurry for me to go into the details (yet).

Following roughly the same method we have verified in-house that there wasn't any alarm when calling Skein_256_Update an arbitrary number of times between Skein_256_Init and Skein_256_Final using for each call to Skein_256_Update a value n for the length and an input buffer of length exactly n with for each call an arbitrary independent n between 1 and 128.

The question of what has been proved is even more acute than previously. The above informal specification could have several interpretations. Let us just say that one part of the main function used for the verification looks like this:

... 
while(Frama_C_interval(0 1)) 
  { 
    { 
      int c = Frama_C_interval(0 1); 
      if (c) 
        { 
          u08b_t msg[1]; 
          arbitrarize_msg(msg  1); 
          Skein_256_Update( &skein_context  msg  1); 
          continue; 
        } 
    } 
    { 
      int c = Frama_C_interval(0 1); 
      if (c) 
        { 
          u08b_t msg[2]; 
          arbitrarize_msg(msg  2); 
          Skein_256_Update( &skein_context  msg  2); 
          continue; 
        } 
    } 
    ... 

In the verification described in detail until now we re-use the same buffer for each call to Skein_256_Update. This is a small weakness in the verification plan: there could be a bug that happens only with two calls with two different buffers. For instance Skein_256_Update could memorize the address of the buffer passed the first time and access it on the second call when the buffer may no longer exist. It goes to show that there is some validity to the question "what have you really proved?" but in this example as often the problem is with the ambiguous informal specification and not with the formal method itself.

Using local variables as in the above C code snippet the possibility of such a breach can be eliminated: we cannot prevent function Update to keep trace of a previous input buffer in its state but if it then accessed it or even just compared it to another address this would be detected as an operation on a dangling pointer and an alarm would be emitted.

Digression: actually we can also prevent function Update to keep trace of a previous input buffer thanks to secondary analyses derived from the value analysis. One of them computes the set of locations written to by a C function. With this analysis we can list the entire set of memory locations that Update uses as state and check that only information that should be kept is. Here is a future blog post on this topic.

With the future Frama-C release the verification takes about 5h and 150MiB of memory on a computer that was near top-of-the-line in 2006 (Intel Xeon 5150). Since we are speaking of performance this analysis is single-threaded. With the memory footprint that it now has there is hope that it will get a bit faster in time as a larger and larger proportion of the working set fits in lower and lower levels of memory cache. A parallel value analysis applicable to some verification plans (but untried for this one) is also at the proof-of-concept stage. Where applicable this parallel analysis will allow to take advantage of multicore processors and computation farms.

Pascal Cuoq
22nd Nov 2010