The OCaml compiler does have some nice optimizations

Pascal Cuoq - 26th Aug 2011

Many OCaml programmers use it because it offers a reasonable (to them) compromise between expressivity and control over resources use. Serious OCaml users are often heard complaining about relatively simple optimizations that the compiler does not have. But this reveals as much of the kind of programmers that end up using OCaml as it does of OCaml itself.

Case in point: the value analysis in Boron used to have a type similar to:

 type t = bool * bool * v 

If this was C, the definition above would be a struct. Here, bool is the predefined type that you expect, v is the type of values a C expression can have in the value analysis, and the type t being defined is the type of values as found in memory. A value in memory can be a normal value (the v part) or it may have reason to be uninitialized (the first bool) or it may be a dangling pointer (the second bool). During analysis, a value in memory can be all of the above at once, as is variable y after the following snippet:

 void f(void) 
 { 
   int* y; 
   if (unknown condition 1) 
     y = NULL; 
   else if (unknown condition 2) 
   { 
     int l; 
     y = &l; 
   } 
   ... 
 } 

After the first instructions of function f, variable y is either uninitialized or a dangling pointer or the normal value NULL. The value analysis \uninitialized" flag can also be seen in action in this post.

An OCaml value of the above type t occupies four words in memory: one header word for automatic memory management and one for each field. Each field occupies one word to make memory management genericity etc. easier.

In the period between Boron and Carbon I decided to make the representation of type t a bit more compact (this is one of several memory use improvements I boasted about in this blog). The two booleans can be represented as bits inside a same integer; this would save one word. I decided to go further and in effect to borrow two unused bits from the header word by using the definition below:

 type t = 
     C_uninit_escaping of v 
   | C_uninit_noescaping of v 
   | C_init_escaping of v 
   | C_init_noescaping of v 

This type definition is no longer like a C struct. It is more like an union an union of v or v or v or v but a discriminated union where it's possible to recognize what kind of v one is talking about. C forces the programmer to manage eir own tag. In OCaml some bits are reserved in the bloc header for the tag. A value of the new type t always occupies two words one word for the header (including some bits for the tag) and one word for the contents of type v.

During analysis when the C program gets a value from memory and uses it in a computation it is necessary to get the part of type v from a value of type t. With the old definition the function was:

 let get_v (_  _  v) = v 

With the new definition the same function get_v would be:

 let get_v =  
   function  
      C_uninit_escaping  v 
    | C_uninit_noescaping v 
    | C_init_escaping v 
    | C_init_noescaping v -> v 

I was afraid that this would be compiled into an ugly 4-case test so I used a different implementation based on low-level OCaml primitives that are not supposed to be used:

 let get_v : t -> v = fun x -> Obj.obj (Obj.field (Obj.repr x) 0) 

It is ugly too but my hope was that the compiled code would be better. It literally means "forget the type of the argument x (something you are never supposed to do in OCaml) and get the first field of that whatever that is". I was so used to the compiler not optimizing to my liking that I did not even compare the code for the two versions. I should have known better. Julien Signoles discovered that the compiler generates the following code for the first pure OCaml version:

        movq    (%rax)  %rax 
        ret 

The above is "AT&T" assembly syntax and corresponds to return *arg; in C. It's even better than the generated code for the second low-level version because in the first version the compiler has more type information to rely on. Plus since I am reading the assembly for the whole module I can confirm that the function gets inlined where it is called directly (another optimization that the OCaml compiler does very well).

Pascal Cuoq
26th Aug 2011