
# Re: Pointer Arithmetic & UB

James Kuyper
Guest
Posts: n/a

 12-10-2012
On 12/10/2012 12:14 PM, Edward Rutherford wrote:
> Hello
>
> Would the following code invoke an undefined behavior?

"invoke" is a bad term to use for this purpose; it implies that there's
some particular kind of behavior which is called "undefined behavior".
You should ask "Would the following code have undefined behavior?"

> char a[10];
> size_t i=20,j=15;
> *(a+i-j)=42;
>
> It potentially constructs the invalid pointer a+i as an intermediate
> value. But overall the access is inbounds.

Yes, it does have undefined behavior.
To make this seem more reasonable, consider a platform with the
following real-world characteristics: there are registers specialized
for storing addresses, and when an invalid address is stored in one of
those registers, the current process aborts immediately, as a safety
measure - it doesn't wait for the invalid address to be used. On such a
platform, a conforming implementation could translate your code so
that 'a' is allocated near the end of a block of valid memory addresses,
so that adding 20 to a gives an invalid address. It could generate
instructions that load a+i into an address register and then subtract j
from it. Execution of those instructions would result in the register
being loaded with an invalid address, aborting the process.
Eric Sosman
Guest
Posts: n/a

 12-10-2012
On 12/10/2012 12:55 PM, James Kuyper wrote:
> On 12/10/2012 12:14 PM, Edward Rutherford wrote:
>> Hello
>>
>> Would the following code invoke an undefined behavior?

>
> "invoke" is a bad term to use for this purpose; it implies that there's
> some particular kind of behavior which is called "undefined behavior".
> You should ask "Would the following code have undefined behavior?"
>
>> char a[10];
>> size_t i=20,j=15;
>> *(a+i-j)=42;
>>
>> It potentially constructs the invalid pointer a+i as an intermediate
>> value. But overall the access is inbounds.

>
> Yes, it does have undefined behavior.
> To make this seem more reasonable, consider a platform with the
> following real-world characteristics: [...]

A colleague who did some work on IBM's AS/400 (they've
changed the name; I forget the new one) told me that simply
trying to calculate an out-of-range pointer yielded a null
pointer as a result. In the O.P.'s case, the intermediate
steps would go something like

a // OK so far
a + i // too big: result = NULL
NULL - j // not sure, but surely not good
*(NULL - j) // really Really REALLY not good

--
Eric Sosman
(E-Mail Removed)

Edward A. Falk
Guest
Posts: n/a

 12-11-2012
In article <ka59e6$mq7$(E-Mail Removed)>,
Eric Sosman <(E-Mail Removed)> wrote:
>
> A colleague who did some work on IBM's AS/400 (they've
>changed the name; I forget the new one) told me that simply
>trying to calculate an out-of-range pointer yielded a null
>pointer as a result.

Heh; learn something new every day. I never would have guessed
that there was an actual architecture that would blow up with
this construct.

I assume that *(a+(i-j)) would be ok?

--
-Ed Falk, (E-Mail Removed)
http://thespamdiaries.blogspot.com/

James Kuyper
Guest
Posts: n/a

 12-11-2012
Context:
char a[10];
size_t i=20,j=15;
*(a+i-j)=42;

On 12/10/2012 07:46 PM, Edward A. Falk wrote:
...
> Heh; learn something new every day. I never would have guessed
> that there was an actual architecture that would blow up with
> this construct.
>
> I assume that *(a+(i-j)) would be ok?

That should be safe for all conforming implementations of C.
--
James Kuyper

Noob
Guest
Posts: n/a

 12-11-2012
Edward A. Falk wrote:

> I assume that *(a+(i-j)) would be ok?

Please correct me if I am wrong,

*(a+(i-j)) is strictly equivalent to a[i-j]

(I find the latter clearer.)
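
The equivalence follows from how C defines the subscript operator: `E1[E2]` means `(*((E1)+(E2)))`. The sketch below illustrates this under stated assumptions; the function names `store_then_read` and `same_element` and the `len` parameter are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stores `value` through explicit pointer arithmetic, then reads it back
   through subscripting. Both expressions designate the same element.
   Preconditions (checked with assert): i >= j and i - j < len. */
char store_then_read(char *a, size_t len, size_t i, size_t j, char value)
{
    assert(a != NULL);
    assert(i >= j && i - j < len);

    *(a + (i - j)) = value;   /* the offset i - j is computed first */
    return a[i - j];          /* same element, subscript notation */
}

/* Returns 1 if both notations yield the same address for index i - j. */
int same_element(char *a, size_t len, size_t i, size_t j)
{
    assert(a != NULL);
    assert(i >= j && i - j < len);

    return (a + (i - j)) == &a[i - j];
}
```

With `char a[10]`, `i = 20` and `j = 15`, both functions touch only `a[5]`, so no out-of-range pointer is ever formed.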

Eric Sosman
Guest
Posts: n/a

 12-11-2012
On 12/10/2012 7:46 PM, Edward A. Falk wrote:
> In article <ka59e6$mq7$(E-Mail Removed)>,
> Eric Sosman <(E-Mail Removed)> wrote:
>>
>> A colleague who did some work on IBM's AS/400 (they've
>> changed the name; I forget the new one) told me that simply
>> trying to calculate an out-of-range pointer yielded a null
>> pointer as a result.

>
> Heh; learn something new every day. I never would have guessed
> that there was an actual architecture that would blow up with
> this construct.
>
> I assume that *(a+(i-j)) would be ok?

Assuming `i-j' in range, yes.
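
The "in range" condition can be checked before the store. Because `size_t` is unsigned, `i - j` wraps around to a very large value when `i < j`, so that case needs its own test. A minimal sketch, assuming a hypothetical helper named `store_at_difference` that takes the array length as a parameter:

```c
#include <assert.h>
#include <stddef.h>

/* Writes `value` to a[i - j] only when the index is valid.
   Returns 1 on success, 0 if the index would be out of range. */
int store_at_difference(char *a, size_t len, size_t i, size_t j, char value)
{
    assert(a != NULL);

    if (i < j)               /* i - j would wrap around in size_t */
        return 0;

    size_t index = i - j;    /* plain integer arithmetic, no pointers yet */
    if (index >= len)        /* past the last element */
        return 0;

    a[index] = value;
    return 1;
}
```

All the range checking happens on integers, so the only pointer arithmetic performed is the final, valid `a[index]`.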

More on my colleague's tale: The code maintained a buffer
in which items of various sizes accumulated, and which drained
to disk when it got too full or too old. To decide whether a
newly-offered item would fit, the code did something like

itemEndPtr = nextBufferSpacePtr + itemSize;
if (itemEndPtr < bufferStart + bufferSize) ...

This worked as intended on all the other target systems, but
failed on AS/400. I suspect the failure had something to do
with the fact that the buffer was in a shared memory area, so
stepping off the end also meant stepping outside of mapped
address space; the problem might not have shown up with the
`auto' array in your example.

Still, perhaps a salutary lesson for the folks who still
believe "All the world's a VAX^H^H^Hx86^H^H^Hx64^H^H^H..."
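
One way to write such a fit check without ever forming a pointer beyond the end of the buffer is to compare sizes instead of pointers. This is only a sketch: `item_fits` and its parameter names are hypothetical, and it assumes `next` points into the buffer or one past its end.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if an item of `item_size` bytes fits in the space between
   `next` and the end of the buffer, 0 otherwise. Only sizes are compared;
   next + item_size is never computed. */
int item_fits(const char *buffer_start, size_t buffer_size,
              const char *next, size_t item_size)
{
    ptrdiff_t used = next - buffer_start;   /* valid: same array object */
    assert(used >= 0 && (size_t)used <= buffer_size);

    size_t remaining = buffer_size - (size_t)used;
    return item_size <= remaining;
}
```

The comparison uses `<=`, so an item may fill the buffer exactly; use `<` instead to match the strict test in the snippet above.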

--
Eric Sosman
(E-Mail Removed)

glen herrmannsfeldt
Guest
Posts: n/a

 12-11-2012
Eric Sosman <(E-Mail Removed)> wrote:

(previous snip on pointer offsets)

>>> A colleague who did some work on IBM's AS/400 (they've
>>> changed the name; I forget the new one) told me that simply
>>> trying to calculate an out-of-range pointer yielded a null
>>> pointer as a result.

>> Heh; learn something new every day. I never would have guessed
>> that there was an actual architecture that would blow up with
>> this construct.

>> I assume that *(a+(i-j)) would be ok?

> Assuming `i-j' in range, yes.

> More on my colleague's tale: The code maintained a buffer
> in which items of various sizes accumulated, and which drained
> to disk when it got too full or too old. To decide whether a
> newly-offered item would fit, the code did something like

> itemEndPtr = nextBufferSpacePtr + itemSize;
> if (itemEndPtr < bufferStart + bufferSize) ...

Might fail in x86 (especially the 80286) in huge model.

You can't load arbitrary data into segment selector registers
in protected mode x86. In large model, though, the offset isn't
tested until an actual access is attempted. (The offset is in
an ordinary register, such as AX.)

In huge model, the system allocates a series of segments,
such that one can address through them in order.

Still, I believe that the compilers are careful not to load
a segment selector until needed to actually access something,
maybe partly to allow such faulty C code.

> This worked as intended on all the other target systems, but
> failed on AS/400. I suspect the failure had something to do
> with the fact that the buffer was in a shared memory area, so
> stepping off the end also meant stepping outside of mapped
> address space; the problem might not have shown up with the
> `auto' array in your example.

I believe that could happen with protected mode x86, too.

> Still, perhaps a salutary lesson for the folks who still
> believe "All the world's a VAX^H^H^Hx86^H^H^Hx64^H^H^H..."

In the 80286 days, I had OS/2 1.0 and then 1.2 running. Instead of
calling malloc(), I would directly allocate segments from OS/2 of exactly
the needed length. The hardware would then interrupt on any access,
even a read, either before or just after the end of the allocated
space. (Unless the register wraps, and it is back into the
allocated space again.)

As usual in C, a 2D array was allocated as an array of pointers,
each pointing to its own OS/2 allocated segment.

Fortunately, the C compilers were always good at not using segment
selector registers when copying pointers that might not point to
anything.

I don't know AS/400 that well, but there have been systems that relied
on the compiler to generate the appropriate code, instead of run-time
memory protection. I believe some Burroughs ALGOL systems worked that
way. (Maybe still do.)

As far as I know, they never had a C compiler, but if one did it might
also have problems with out of range pointers.

-- glen

Ken Brody
Guest
Posts: n/a

 12-12-2012
On 12/10/2012 7:46 PM, Edward A. Falk wrote:
> In article <ka59e6$mq7$(E-Mail Removed)>,
> Eric Sosman <(E-Mail Removed)> wrote:
>>
>> A colleague who did some work on IBM's AS/400 (they've
>> changed the name; I forget the new one) told me that simply
>> trying to calculate an out-of-range pointer yielded a null
>> pointer as a result.

>
> Heh; learn something new every day. I never would have guessed
> that there was an actual architecture that would blow up with
> this construct.
>
> I assume that *(a+(i-j)) would be ok?

No. There is no requirement that the value of "i-j" be calculated prior to
adding it to "a". (Check the numerous threads here involving using
parentheses to "fix" UB in things involving such constructs as "i + (i++)".)
Operator precedence only guarantees how the expression is to be
interpreted, not the actual order of evaluation.

Ken Brody
Guest
Posts: n/a

 12-12-2012
On 12/10/2012 9:28 PM, James Kuyper wrote:
> Context:
> char a[10];
> size_t i=20,j=15;
> *(a+i-j)=42;
>
> On 12/10/2012 07:46 PM, Edward A. Falk wrote:
> ...
>> Heh; learn something new every day. I never would have guessed
>> that there was an actual architecture that would blow up with
>> this construct.
>>
>> I assume that *(a+(i-j)) would be ok?

>
> That should be safe for all conforming implementations of C.

Are you sure? Does anything in the Standard *require* that "i-j" be
evaluated prior to adding it to "a"?

Haven't we had this discussion earlier, related to other forms of UB, with

Keith Thompson
Guest
Posts: n/a

 12-12-2012
Ken Brody <(E-Mail Removed)> writes:
> On 12/10/2012 7:46 PM, Edward A. Falk wrote:
>> In article <ka59e6$mq7$(E-Mail Removed)>,
>> Eric Sosman <(E-Mail Removed)> wrote:
>>>
>>> A colleague who did some work on IBM's AS/400 (they've
>>> changed the name; I forget the new one) told me that simply
>>> trying to calculate an out-of-range pointer yielded a null
>>> pointer as a result.

>>
>> Heh; learn something new every day. I never would have guessed
>> that there was an actual architecture that would blow up with
>> this construct.
>>
>> I assume that *(a+(i-j)) would be ok?

>
> No. There is no requirement that the value of "i-j" be calculated prior to
> adding it to "a". (Check the numerous threads here involving using
> parentheses to "fix" UB in things involving such constructs as "i + (i++)".)
> Operator precedence only guarantees how the expression is to be
> interpreted, not the actual order of evaluation.

True, but the expression `a+(i-j)` is evaluated *in the abstract
machine* by subtracting j from i and then adding the result to a.
A compiler is free to evaluate it by computing a+i and then
subtracting j from the result *only* if it can guarantee that the
result is the same, or if the canonical order has undefined behavior.

`INT_MAX + (1 - 1)` has well defined behavior.
`INT_MAX + 1 - 1` does not.
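
The difference can be illustrated with a small sketch. It evaluates only the well-defined grouping; the unparenthesized form is deliberately not executed, since it overflows at `INT_MAX + 1`. The helper names `grouped_sum`, `add_would_overflow` and `demo` are hypothetical, and the overflow pre-check is one common idiom rather than anything prescribed in this thread.

```c
#include <assert.h>
#include <limits.h>

/* (1 - 1) is evaluated to 0 first, so the addition is INT_MAX + 0. */
int grouped_sum(void)
{
    return INT_MAX + (1 - 1);
}

/* Returns 1 if x + y would overflow int, without performing the addition. */
int add_would_overflow(int x, int y)
{
    if (y > 0)
        return x > INT_MAX - y;
    if (y < 0)
        return x < INT_MIN - y;
    return 0;
}

/* Example use: the grouped form is fine, the first step of the
   ungrouped form (INT_MAX + 1) is flagged before it is attempted. */
void demo(void)
{
    assert(grouped_sum() == INT_MAX);
    assert(add_would_overflow(INT_MAX, 1));
}
```

The same reasoning applies to `a+(i-j)` versus `a+i-j`: the parentheses change which intermediate value the abstract machine computes, and that intermediate value is what must be valid.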

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"