When is it valid to compare two pointers in C?

Pascal Cuoq - 14th Apr 2011

This post is about the circumstances in which the value analysis considers comparing pointers is safe, and those in which it considers the comparison is dangerous and emits an alarm. The alarm, an enigmatic assert \pointer_comparable(…, …);, uses an unaxiomatized ACSL predicate. If you use the value analysis regularly, you probably already encountered it (but perhaps only as a false alarm caused by earlier approximations). This posts informally describes the rules applied by the value analysis. It is by these rules false alarms should be sorted from true ones, although a specific programmer may wish the rules were either stricter or relaxed a bit.

If it were axiomatized, the predicate \pointer_comparable should capture all the conditions expressed in this post.

Note: this discussion takes place in the usual working conditions of the value analysis: we know and assume a bit about the architecture: size of integer and pointer types, NULL is zero, …

Pointers

The expression &a < &b is dangerous. It clearly can evaluate to 0 or 1 depending on compiler choices the programmer has no control over. I would have guessed that this constituted unspecified behavior (i.e. the expression evaluates to either 0 or 1). In fact, according to paragraph 6.5.8.5 in the C99 standard, it is undefined behavior (anything can happen, including a segmentation fault or the program displaying aberrant behavior from that point onwards).

Expression &a + 1 == &b is similarly dangerous: variable b may have been placed just after a in memory, or it may not. The comparison may just, in this case, be unspecified: this is my reading of paragraph 6.5.9.6.

Pointers converted to integers

The value analysis employs the same rule for comparisons between expressions that have pointer types and for comparison between expressions that have unsigned integer types but happen to contain addresses. Signed integer inequalities between values that happen to be addresses should be considered dangerous more often, since variables can be placed at will in memory by the compiler. Let me clarify what I mean with an example:

int t[10]; 
main(){ 
  return (unsigned int)t <= (unsigned int)(t+8); 
} 

In Frama-C's default target architecture, an `unsigned int` can contain a pointer to int. The comparison above is deemed safe, and always evaluates to 1 (true), same as the pointer comparison t<=(t+8).

Compare to this slightly different example:

int t[10]; 
main(){ 
  return (int)t <= (int)(t+8); 
} 

The comparison above is dangerous and should be flagged as such by the value analysis, because the array t could have been placed by the compiler at address 0x7ffffffa. That t could, by sheer bad luck, be placed at this address means that (int)(t+8) could be negative while (int)t is positive in the value analysis' permissive model of memory and values. The result of this comparison depending in this way on compiler choices is undesirable, hence the warning.

You can force the value analysis to take into account the possibility that the dangerous comparison (int)t <= (int)(t+8) returns 0 (false) with option -undefined-pointer-comparison-propagate-all. This is exactly what this option is for.

Let's move on from signed integer inequalities: you can really never do anything safe except (int)&a <= (int)&a. Here is another example:

int t[10]; 
main(){ 
  return 0 < (unsigned int)t; 
} 

In the value analysis modelization, this is a contrived way to test whether t is different from NULL, and is therefore always true. This comparison is considered safe.

int t[10]; 
main(){ 
  return t-1 <= t; 
} 

This comparison is dangerous: the address 0 is reserved for NULL, but nothing prevents the compiler from placing t at address 2. Doing so means that the comparison, assumed to be implemented with the unsigned comparison assembly instruction, can return false. It returns true for most other allocations of t, and is therefore unsafe.

Technically, computing t - 1 is undefined behavior in C99. The value analysis waits until this value is used in a comparison before warning about it. This is only one of the various places where it is more lenient than the standard. The rationale is that the value analysis is intended for verification of existing programs. Existing programs do this (you wouldn't believe how often!). It would be very easy to warn at the slightest sign that the program is computing t-1, and this would in fact make a lof of things simpler (including this blog post). We are waiting for an industrial customer to ask for this specific feature, and we anticipate their asking, once it's done, how they can disable it locally.

int t[10]; 
main(){ 
  return 2 <= (unsigned int)t; 
} 

The comparison above, and 2 == (unsigned int)t, are both unsafe because, while they usually respectively return 1 and 0, they might return 0 and 1 at run-time for some rare compilations.

String literals

The comparison \tutu" == "tutu" (or if you prefer the comparison s == t when s and t have been defined as char *s = "tutu"; char *t = "tutu";) is dangerous: the compiler may place the anonymous string constants anywhere in memory and factor them at its option. Again in this case my reading of the standard is that this comparison is only unspecified but the development version of the value analysis emits an alarm for it nevertheless.

char *s = "tutu"; char *t = s; s == t is not dangerous: it should always evaluate to 1.

"hello" == "world" is not dangerous: the compiler has no opportunity to factor the chains and thus the comparison predictably evaluates to 0. "hello" == "hello world" is not dangerous either: the compiler cannot factor the two strings because the first one must be zero-terminated. The terminating zero clashes with the space in the second string.

"hello" == "hello\000world" is dangerous: the compiler can factor the strings making the comparison return 1. It may also not factor them making the comparison return 0.

Finaly consider the snippet below:

  char *s = "oh  hello"; 
  char *s4 = s+4; 
  char *t = "hello\000hello"; 

In this context s4 == t + 6 is harmless because although s4 points to the string "hello" the compiler was contrained by the requirement to place "oh " before it and could not place it at t + 6. This comparison always return 0. The comparison s4 == t on the other hand is dangerous because the compiler could have factored the tail of s with the head of t.

State of value analysis implementation

The rules described in this post are only implemented approximately in Carbon. They will work exactly as described here in the next Frama-C release.

Pascal Cuoq
14th Apr 2011