Unspecified behaviors and derived analyses, part 2

Pascal Cuoq - 4th Dec 2010

Context

This post is a sequel and conclusion to this remark.

Example of derived analysis: slicing

When writing a Frama-C plug-in to assist in reverse-engineering source code it does not really make sense to expect the user to check the alarms that are emitted by the value analysis. Consider for instance Frama-C's slicing plug-in. This plug-in produces a simplified version of a program. It is often applied to large unfamiliar codebases; if the user is at the point where he needs a slicer to make sense of the codebase it's probably a bad time to require him to check alarms on the original unsliced version.

The slicer and other code comprehension plug-ins work around this problem by defining the results they provide as ``valid for well-defined executions''. In the case of the slicer this is really the only definition that makes sense. Consider the following code snippet:

  x = a; 
  y = *p; 
  x = x+1; 
  // slice for the value of x here. 

This piece of program is begging for its second line to be removed but if p can be the NULL pointer the sliced program behaves differently from the original: the original program exits abruptly on most architectures whereas the sliced version computes the value of x.

It is fine to ignore alarms in this context but the user of a code comprehension plug-in based on the value analysis should study the categorization of alarms in the value analysis' reference manual with particular care. Because the value analysis is more aggressive in trying to extract precise information from the program than other analyses the user is more likely to observe incorrect results if there is a misunderstanding between him and the tool about what assumptions should be made.

Example of benign alarm: testing p<=q

Everybody agrees that accessing an invalid pointer is an unwanted behavior but what about comparing two pointers with <= in an undefined manner or assuming that a signed overflow wraps around in 2's complement representation? Function memmove for instance typically does the former when applied to two addresses with different bases.

Currently if there appears to be an undefined pointer comparison the value analysis propagates a state that contains all possible results for the comparison and for the pointers. This blog post was just trying to scare you. It is possible to take advantage of the value analysis for program comprehension and all existing program comprehension tools need to make assumptions about undefined behaviors. Most tools do not tell you if they had to make assumptions or not. The value analysis does: each alarm in general is also an assumption. Still as implementation progresses and the value analysis becomes able to extract more information from the alarms it emits one or several options to configure it either not to emit some alarms or not to make the corresponding assumptions will certainly become necessary.

Pascal Cuoq
4th Dec 2010