zlib progress: one comma misused

Pascal Cuoq - 18th Dec 2012

A few days ago I announced that the world had been using an unverified zlib library for too long and that we were going to fix this. This post is the first progress report. I have found a harmless undefined behavior in zlib and I have learnt something about the comma operator. This post only covers the latter aspect.

The comma operator

The comma operator can be used to build an expression e1 e2. When e1 e2 is evaluated first e1 is evaluated (for its result and side-effects) then the result is discarded and e2 is evaluated.

The comma counts as a sequence point. Sequence points are the points in a C program in-between which it is illegal to both read and write the same memory location; for instance x++ + x is undefined behavior because there is no sequence point between the post-incrementation x++ and the use of the value of x. Coming back to the comma operator since it counts as a sequence point I thought this might allow writing (x++ y) + (y++ x) but I am not so sure about that now.

Easing the value analysis' job on zlib

Formally verifying such a library as zlib is difficult. Indeed otherwise considering its usefulness and fame if it was easy someone would already have advertized that they had formally verified it.

One verification plan that I wanted to try out on this library is the model-checking approach that we first experimented with last summer.

The words “model-checking” may evoke different things to different people. In this context they mean the propagation of only unabstracted known-to-be-feasible states so that if a division by zero is reached in the program we know for sure that this dangerous division can be reached for some inputs. In other words we are talking here about a technique to avoid false positives in addition to avoiding false negatives as we always do.

There is a catch: the propagation of unabstracted states does not really scale to a codebase the size and complexity of zlib. My plan was to ease the job of Frama-C's value analysis by carefully re-introducing a little bit of abstraction. I intended to modify the zlib library so that the modified version has all the same opportunities to do something wrong with pointers as the original version but the modified version always computes zero as output sequence. This would make many memory states that would otherwise have been different become identical something that the value analysis efficiently recognizes and takes advantage of.

A beginner's mistake

And this is how I found myself modifying *output_pointer++ = expression_with_side_effects; so that it would only write 0 to the output buffer but otherwise have all the same side-effects as the original. I chose to replace it with the statement below. And I was pretty pleased with myself.

  *output_pointer++ = expression_with_side_effects  0; 

Unfortunately after one additional cycle of verification it turned out that in the C syntax assignment has higher precedence than comma. My modified statement was parsed as:

  (*output_pointer++ = expression_with_side_effects)  0; 

It behaved exactly as the original statement in assigning non-zero values to the output buffer. I had instead intended:

 *output_pointer++ = (expression_with_side_effects 0); 
Pascal Cuoq
18th Dec 2012