Compiler-driven language development

Pascal Cuoq - 17th Nov 2012

A quiz

What is pressing “return” next below going to reveal GCC has done wrong?

$ cat t.c 
#include <limits.h> 
int x = -1; 
int main(int c, char **v) { 
  x = INT_MIN % x; 
  return x; 
} 
~ $ gcc -std=c99 t.c 
~ $ ./a.out 

Answer:

$ ./a.out  
Floating point exception 

The programmer was using an x86_64 processor

GCC has compiled a perfectly well-defined C99 program for returning 0 into binary code that makes an error. The error is misdiagnosed as a floating point exception but is actually a division overflow. It happens because computing the C99 remainder operation % on this computer invokes the x86_64 instruction idivl, which can raise this error when invoked with the arguments passed to it in this program. The idivl instruction computes a quotient and a remainder at the same time; the overflow error relates to the computation of the quotient, which my program was going to throw away anyway.

My program was well-defined in C90, too. In both these versions of the C standard, the semantics of %, like other arithmetic operations, are described informally as first computing the result as unbounded integer and then, since the result 0 fits the signed destination type int, the operation is defined and the result is 0.

Over the period of time it has been generating code for x86 computers, GCC could have become standard-compliant by adding the couple of instructions necessary to compute the standard-mandated result. The most efficient sequence I see would be to test the divisor for equality with -1 and to conditionally move if equal the divisor into the dividend. This would compute -1 % -1 instead of dividend % -1, always producing the same result 0 without the need for an expensive branch.

GCC would not even have needed to generate this code all the time. Most of the times the divisor is statically known not to be -1 or the dividend is statically known not to be INT_MIN. In either case the guarding test is unnecessary.

To be fair, GCC can generate sophisticated assembly code when it exerts itself. If a program uses both dividend % divisor and dividend / divisor nearby in the same function, the generated assembly code is likely to call the division instruction idivl only once. My proposed conditional move would interfere with this optimization when it applies.

What it must be like to standardize a language like C

Good news! GCC held out, and now (in the C11 standard), INT_MIN % -1 is officially undefined behavior, so GCC is allowed to do what it wants. This goes to show two things:

  1. Revisions of the C standard are carefully crafted to allow as many compliant programs as possible to continue to be considered compliant, but this is only one of several conflicting objectives. My program is C99-compliant and is undefined in C11.
  2. Sometimes the standard defines the future direction of the language (e.g. the memory model part of C11) but sometimes it only ratifies whatever non-compliant semantics compilers have been implementing for years.
Pascal Cuoq
17th Nov 2012