Iterating over the AST

Virgile Prevosto - 21st May 2012

Context

A facetious colleague who claims that he should be better writing his thesis but keeps committing Coq and OCaml files on Frama-C's repository, asked me the following question: Is there a function in Frama-C's kernel that can fold[1] a function over all types that appear in a C program?

As it turns out nothing is readily available for this exact purpose but another facetious colleague could have come up with a terse answer: There's a visitor for that! Now we can be a bit more helpful and actually write that visitor. Our end goal is to come up with a function

 
fold_typ: (Cil_types.typ -> 'a -> 'a) -> 'a -> 'a 

so that fold_typ f init will be f t_1 (f t_2 (... (f t_n init)...)) where the t_i are all the C types appearing in a given C program (in no particular order).

Frama-C's visitors

The visitor pattern is a well-known object-oriented design pattern that can be used to perform a given action over all nodes of a complex data structure (in our case the Abstract Syntax Tree -AST for short- of a C program). Frama-C provides a generic visitor mechanism built upon CIL's visitor whose entry points can be found in the aptly named src/kernel/visitor.mli file. It is also documented in section 5.14 of the developer manual. In summary you first define a class that inherits from the generic visitor and overrides the methods corresponding to the nodes you're interested in (in our case this will be vtype for visiting uses of Cil_types.typ). Then you apply an object of this class to the function from the Visitor module that correspond to the subpart of the AST that you want to inspect (in our case this will be the whole AST).

Basic version

The standard visitor does not provide anything to return a value outside of the AST. In fact all the entry points in Visitor return a node of the same type that the node in which the visit starts (for visitors that don't perform code transformation this is in fact physically the same node). But if you accept to sneak in a little bit of imperative code into your development -and by using Frama-C you've already accepted that- there is an easy way out: pass to the visitor a reference that it can update and you just have to read the final value that reference is holding after the visit. The visitor then looks like the following:

 
class fold_typ_basic f acc = 
object 
  inherit Visitor.frama_c_inplace 
  method vtype ty = acc:= f ty !acc; Cil.DoChildren 
end 

And that's it. Each time the visitor sees a type it will apply f to ty and the result of the previous computations stored in acc. fold_typ then just needs to call the visitor over the whole AST and give it a reference initialized with init:

 
let fold_typ f init = 
  let racc = ref init in 
  let vis = new fold_typ_basic f racc in 
  Visitor.visitFramacFileSameGlobals vis (Ast.get()); 
  !racc 

Don't do the same work twice

This first version that is barely more than 10 LoC works but we can do a little better. Indeed f will be called each time a type is encountered in the AST. In most cases we want to call f once for any given type. This can be done quite simply by memoizing in an instance variable of our visitor the set of types encountered thus far. Frama-C's Cil_datatype module (cil/src/cil_datatype.mli) provides all the needed functions for that:

 
class fold_typ f acc = 
object 
  inherit Visitor.frama_c_inplace 
  val mutable known_types = Cil_datatype.Typ.Set.empty 
  method vtype ty = 
    if Cil_datatype.Typ.Set.mem ty known_types then Cil.DoChildren 
    else begin 
      known_types <- Cil_datatype.Typ.Set.add ty known_types; 
      acc:= f ty !acc; 
      Cil.DoChildren 
    end 
end 

Testing the infrastructure

It is now time to test if everything works smoothly. The following function will print the name and size of the type who has the biggest size in the analyzed program.

 
let test () = 
  let f ty (maxty maxsize as acc) = 
    try 
      let size = Cil.sizeOf_int ty in 
      if size > maxsize then (ty size) else acc 
    with Cil.SizeOfError _ -> acc 
  in 
  let (ty size) = fold_typ f (Cil.voidType 0) in 
  Format.printf "Biggest type is %a @ with size %d@."  
    !Ast_printer.d_type ty size 

Since it is only a quick test we don't do anything special if Cil complains that it cannot compute the size of a given type: we just stick to the maximal value computed so far.

File fold_typ.ml provides the code for the visitor and the test function. frama-c -load-script fold_typ.ml file.c should output something like

 
[kernel] preprocessing with "gcc -C -E -I.  file.c" 
Biggest type is struct Ts  
with size 44 

Exercises

  1. How can you use the fold_typ class to define an iter_typ function that apply a function f returning unit to each type of the AST (val iter_typ: (Cil_types.typ -> unit) -> unit)?
  2. Writing fold_typ is a bit overkill if you're going to apply it once in your plug-in. Write a specialized visitor that will do the same thing as the test function above.

Notes

[1] For those who are not familiar with functional programming: http://en.wikipedia.org/wiki/Fold_%...

"
Virgile Prevosto
21st May 2012