Short, difficult programs

Pascal Cuoq - 2nd Nov 2012

When researchers start claiming that they have a sound and complete analyzer for predicting whether a program statement is reachable, it is time to build a database of interesting programs.

Goldbach's conjecture

My long-time favorite is a C program that verifies Goldbach's conjecture (actual program left as an exercise to the reader).

If the conjecture is true the program never terminates (ergo a statement placed after it is unreachable). If the conjecture is false the program terminates and moves on to the following statement. The program can be implemented using unsigned long long integers that should be good for counter-examples up to 18446744073709551615. It can alternately use dynamic allocation and multi-precision integers in which case depending whether your precise definition of “analyzing a C program” includes out-of-memory behaviors you could claim that the reachability of the statement after the counter-example-finding loop is equivalent to the resolution of “one of the oldest and best-known unsolved problems in number theory and in all of mathematics”.

Easier programs than that

No-one expects the resolution of Goldbach's conjecture to come from a program analyzer. This example is too good because it is out of reach for everyone—it has eluded our best mathematicians for centuries. What I am looking for in this post is easier programs where the solution is just in reach. Examples that it would genuinely be informative to run these sound complete analyzers on. If they were made available.

With integers

For 32-bit ints and 64-bit long longs I know that the label L is unreachable in the program below but does your average sound and complete program analyzer do?

/*@ requires 2 <= x ; 
    requires 2 <= y ; */ 
void f(unsigned int x  unsigned int y) 
{ 
  if (x * (unsigned long long)y == 17) 
    L: return; 
} 

The key is that with the aforementioned platform hypotheses the multiplication does not overflow. Label L being reached would mean that the program has identified divisors of the prime number 17 which we don't expect it to.

In the program below the multiplication can overflow and not having tried it I have truly no idea whether the label L is reachable. I expect it is statistically but if you told me that it is unreachable because of a deep mathematical property of computations modulo a power of two I would not be shocked.

/*@ requires 2 <= x ; 
    requires 2 <= y ; */ 
void f(unsigned long long x  unsigned long long y) 
{ 
  if (x * y == 17) 
    L: return; 
} 

With floating-point

Floating-point is fun. Label L in the following program is unreachable:

/*@ requires 10000000. <= x <= 200000000. ; 
    requires 10000000. <= y <= 200000000. ; */ 
void sub(float x  float y) 
{ 
  if (x - y == 0.09375f) 
    L: return; 
} 

Frama-C's value analysis and most other analyzers will not tell you that label L is unreachable. It definitely looks reachable The difference between x and y can be zero and it can be 1.0. It looks like it could be 0.09375 but it cannot: the subtracted numbers are too large for the difference if non-zero to be smaller than 1.0.

So the subtlety in the example above is the magnitude of the arguments. What about smaller arguments then?

/*@ requires 1.0 <= x <= 200000000. ; 
    requires 1.0 <= y <= 200000000. ; */ 
void sub(float x  float y) 
{ 
  if (x - y == 0.09375f) 
    L: return; 
} 

This time the label L is easily reachable for instance with the inputs 2.09375 for x and 2.0 for y.

What about this third program?

/*@ requires 1.0 <= x <= 200000000. ; 
    requires 1.0 <= y <= 200000000. ; */ 
void sub(float x  float y) 
{ 
  if (x - y == 0.1f) 
    L: return; 
} 

The target difference 0.1f is larger than 0.09375f so it should be easier to obtain as the difference of two floats from the prescribed range right? In fact it isn't. The numbers 0.099999904632568359375 and 0.10000002384185791015625 can be obtained as values of x-y in the above program for the former as the result of picking 1.099999904632568359375 for x and 1.0 for y. The value 0.1f on the other hand cannot be obtained as the subtraction of two floats above 1.0 because some bits are set in its significand that cannot be set by subtracting floats that large.

Conclusion

I expect it will be some time before automatic analyzers can soundly and completely decide in all two-line programs whether the second line is reachable. Frama-C's value analysis does not detect unreachability in any of the programs above for instance (the value analysis is sound so it soundly detects that L may be reachable when it is). Please leave your own difficult two-line programs in the comments.

The floating-point examples in this post owe quite a bit to my colleague Bruno Marre explaining his article “Improving the Floating Point Addition and Subtraction Constraints” to me over lunch.

Pascal Cuoq
2nd Nov 2012