The problem with differential testing is that at least one of the compilers must get it right

Pascal Cuoq - 25th Sep 2013

A long time ago, John Regehr wrote a blog post about a 3-3 split vote that occurred while he was finding bugs in C compilers through differential testing. John could have included Frama-C's value analysis in his set of C implementations and then the vote would have been 4-3 for the correct interpretation (Frama-C's value analysis predicts the correct value on the particular C program that was the subject of the post). But self-congratulatory remarks are not the subject of today's post. Non-split votes in differential testing where all compilers get it wrong are.

A simple program to find double-rounding examples

The program below looks for examples of harmful double-rounding in floating-point multiplication. Harmful double-rounding occurs when the product of two double operands differs between double-precision multiplication, where the result is rounded directly to the double format, and extended-double multiplication followed by a conversion to double. In the latter case, the mathematical product of two doubles may not be exactly representable even in extended-double precision, so it is rounded first to extended-double and then again to double, and the second rounding can change the result.

$ cat dr.c 
#include <stdio.h> 
#include <stdlib.h> 
#include <math.h> 
#include <float.h> 
#include <limits.h> 
int main(){ 
  printf("%d %a %La\n", FLT_EVAL_METHOD, DBL_MAX, LDBL_MAX);
  while(1){ 
    double d1 = ((unsigned long)rand()<<32) + 
                           ((unsigned long)rand()<<16) + rand() ; 
    double d2 = ((unsigned long)rand()<<32) + 
                           ((unsigned long)rand()<<16) + rand() ; 
    long double ld1 = d1; 
    long double ld2 = d2; 
    if (d1 * d2 != (double)(ld1 * ld2)) 
      printf("%a*%a=%a but (double)((long double) %a * %a))=%a\n",
	     d1, d2, d1*d2,
	     d1, d2, (double)(ld1 * ld2));
  } 
} 

The program is platform-dependent, but if it starts by printing something like the line below, then a long list of double-rounding examples should immediately follow:

0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380 

Results

In my case what happened was:

$ gcc -v 
Using built-in specs. 
Target: i686-apple-darwin11 
... 
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00) 
$ gcc -std=c99 -O2 -Wall dr.c && ./a.out  
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380 
^C 

I immediately blamed myself for miscalculating the probability of easily finding such examples, getting a conversion wrong, or following while (1) with a semicolon. But it turned out I had not done any of those things. I turned to Clang for a second opinion:

$ clang -v 
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn) 
Target: x86_64-apple-darwin12.4.0 
Thread model: posix 
$ clang -std=c99 -O2 -Wall dr.c && ./a.out  
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380 
^C 

Conclusion

What had happened became clear when I looked at the assembly code:

$ clang -std=c99 -O2 -Wall -S dr.c && cat dr.s 
... 
	mulsd	%xmm4, %xmm5
	ucomisd	%xmm5, %xmm5
	jnp	LBB0_1 
... 

Clang had compiled the test for deciding whether to call printf() into if (xmm5 != xmm5) for some register xmm5, a comparison that can only succeed when the register holds a NaN.

$ gcc -std=c99 -O2 -Wall -S dr.c && cat dr.s 
... 
	mulsd	%xmm1, %xmm2
	ucomisd	%xmm2, %xmm2
	jnp	LBB1_1 
... 

And GCC had done the same. To be fair, the two compilers both appear to be using LLVM as a back-end, so this could be the result of a single bug. But that would remove all the salt of the anecdote, so let us hope it isn't.

It is high time that someone used fuzz-testing to debug floating-point arithmetic in compilers. Hopefully one compiler will get it right sometimes and we can work from there.
