Balban <(E-Mail Removed)> writes:

> On my compiler (gcc), if I add an integer value to a void pointer the
> integer is interpreted as signed instead of unsigned. Is this expected
> behavior?

I don't think that's what's happening.

As has already been mentioned, arithmetic on void* is a gcc-specific

extension; in standard C, it's a constraint violation, requiring a

diagnostic.

But the same reasoning applies to arithmetic on char*, which is well
defined by the standard (gcc's extension treats void* arithmetic as
if the pointer were a char*).

Adding a pointer and an integer (p + i) yields a new pointer value

that points i elements away from where p points. For example, if p

points to the element 0 of an array, then (p + 3) points to element 3

of the same array. If p points to element 7 of an array, then (p - 2)

points to element 5 of the same array.
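
As a minimal sketch of the two cases just described (the helper name
index_after_step and the local array are made up for illustration),
pointer arithmetic moves by whole elements, and subtracting the array's
base pointer recovers the element index:

```c
#include <stddef.h>

/* Hypothetical helper: start at element `start` of a 10-element array,
   step `step` elements, and report the index of the element reached.
   The caller must keep start and start + step within 0..10, otherwise
   the pointer arithmetic itself is undefined. */
ptrdiff_t index_after_step(ptrdiff_t start, ptrdiff_t step)
{
    static int arr[10];
    const int *p = arr + start;  /* p points to arr[start] */
    const int *q = p + step;     /* moves by elements, not by bytes */
    return q - arr;              /* index of the element q points to */
}
```

With these definitions, index_after_step(0, 3) corresponds to (p + 3)
above and index_after_step(7, -2) to (p - 2).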

It would have been helpful if you had shown us an example of what

you're talking about. But suppose we have:

    char arr[10];
    char *p = arr + 5;
    int i = -1;
    unsigned int u = -1;

Let's assume a typical system where int and pointers are 32 bits.

So p points to arr[5]. The expression (p + i) points to arr[4].

But consider (p + u).

Since u is unsigned, it can't actually hold the value -1. During

initialization, that value is implicitly converted from signed

int to unsigned int, and the value stored in u is 4294967295.
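
A small sketch of that conversion, for anyone who wants to check it
(the function name is hypothetical). The standard defines the result as
-1 reduced modulo UINT_MAX + 1, which is UINT_MAX; that equals
4294967295 only under the 32-bit assumption made above:

```c
#include <limits.h>

/* Hypothetical helper: returns what ends up stored in u after
   `unsigned int u = -1;`. The conversion is well defined: the value
   is reduced modulo UINT_MAX + 1. */
unsigned int minus_one_as_unsigned(void)
{
    unsigned int u = -1;
    return u;
}
```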

In theory, then, (p + u) would point to arr[4294967300], which
obviously doesn't exist. So the behavior is undefined: if you try
to evaluate (p + u), anything can happen.

What probably will happen on typical modern systems is that the

addition will quietly wrap around. Let's assume that pointer values

are represented as 32-bit addresses that look like unsigned integers

(nothing like this is required by the standard, but it's a typical

implementation), and let's say that arr is at address 0x12345678.

Then p points to address 0x1234567d, and (p + 4294967295) would

theoretically point to address 0x11234567c. But this would require 33

bits, and we only have 32-bit addresses. Typically, an overflowing

addition like this will quietly drop the high-order bit(s) yielding an

address of 0x1234567c -- which just happens to be the address of

arr[4].
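
The arithmetic above can be reproduced without touching any real
pointers. This is only a model, assuming addresses behave like 32-bit
unsigned integers; the function names are invented for the example.
Unsigned integer arithmetic is well defined, so nothing here is
undefined behavior:

```c
#include <stdint.h>

/* The full mathematical sum, held in 64 bits so bit 32 stays visible. */
uint64_t full_sum(uint32_t addr, uint32_t offset)
{
    return (uint64_t)addr + offset;
}

/* What a 32-bit adder would keep: only the low-order 32 bits. */
uint32_t wrapped_sum(uint32_t addr, uint32_t offset)
{
    return (uint32_t)((uint64_t)addr + offset);
}
```

Here full_sum(0x1234567d, 0xFFFFFFFF) is 0x11234567c, and
wrapped_sum with the same arguments is 0x1234567c, one less than the
starting address.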

So you initialized u with the value -1, computed (p + u), and

got the same result you would have gotten for (p + (-1)). But in

the process, you generated an intermediate result that was out of

range, resulting in undefined behavior. (This is really the worst

possible consequence of undefined behavior: having your program

behave exactly as you expected it to. It means your code is buggy,

but it's going to be very difficult to find and correct the problem.)

This kind of thing is very common with 2's-complement systems. The

2's-complement representation is designed in such a way that addition

and subtraction don't have to care whether the operands are signed or

unsigned. But you shouldn't depend on this. The behavior of addition

and subtraction operations, either on integers or on pointers, is well

defined only when the mathematical result is within the required

range. Adding 0xFFFFFFFF to a pointer can appear to work "correctly",

as if you had really added -1, but it's better to just add a signed

value -1 in the first place.
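
One way to stay inside the defined range is to keep the offset signed
and check it before doing any pointer arithmetic. The following is a
sketch, not a standard facility; step_checked and its parameters are
names made up for this example:

```c
#include <stddef.h>

/* Hypothetical helper: starting at base[pos] in an array of `len`
   elements, step `delta` elements (which may be negative). Returns the
   new pointer, or NULL if the result would fall outside base[0] ..
   base[len] (one past the end is allowed, as in standard C). */
char *step_checked(char *base, size_t len, size_t pos, ptrdiff_t delta)
{
    if (pos > len)
        return NULL;
    if (delta < 0) {
        /* Magnitude of delta, computed so that even the most negative
           ptrdiff_t value does not overflow. */
        size_t back = (size_t)(-(delta + 1)) + 1;
        if (back > pos)
            return NULL;
        return base + (pos - back);
    }
    if ((size_t)delta > len - pos)
        return NULL;
    return base + (pos + (size_t)delta);
}
```

With the arr and p from the example, step_checked(arr, 10, 5, -1)
yields a pointer to arr[4], while an offset of 6 or -6 is rejected
instead of being evaluated.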

Even if your code never runs on anything other than the system you
wrote it for, an optimizing compiler may assume that no undefined
behavior occurs. For example, if you write (p + u), it can assume
that u is in the range 0 to 5 (since p points to arr[5] and arr has
10 elements), and perform optimizations that depend on that
assumption.

--

Keith Thompson (The_Other_Keith)

(E-Mail Removed) <http://www.ghoti.net/~kst>

Nokia

"We must do something. This is something. Therefore, we must do this."

-- Antony Jay and Jonathan Lynn, "Yes Minister"