Velocity Reviews


James Kuyper 12-10-2012 05:55 PM

Re: Pointer Arithmetic & UB
 
On 12/10/2012 12:14 PM, Edward Rutherford wrote:
> Hello
>
> Would the following code invoke an undefined behavior?


"invoke" is a bad term to use for this purpose; it implies that there's
some particular kind of behavior which is called "undefined behavior".
You should ask "Would the following code have undefined behavior?"

> char a[10];
> size_t i=20,j=15;
> *(a+i-j)=42;
>
> It potentially constructs the invalid pointer a+i as an intermediate
> value. But overall the access is inbounds.


Yes, it does have undefined behavior.
To make this seem more reasonable, consider a platform with the
following real-world characteristics: there are registers specialized
for storing addresses, and when an invalid address is stored in one of
those registers, the current process aborts immediately, as a safety
measure - it doesn't wait for the invalid address to be used. On such a
platform, a conforming implementation could translate your code so
that 'a' is allocated near the end of a block of valid memory addresses,
so that adding 20 to 'a' gives an invalid address. It could generate
instructions that load 'a' into an address register, then add 'i' to
it. Execution of those instructions would result in the register
containing an invalid address, thus causing your program to be aborted.

Eric Sosman 12-10-2012 06:25 PM

Re: Pointer Arithmetic & UB
 
On 12/10/2012 12:55 PM, James Kuyper wrote:
> On 12/10/2012 12:14 PM, Edward Rutherford wrote:
>> Hello
>>
>> Would the following code invoke an undefined behavior?

>
> "invoke" is a bad term to use for this purpose; it implies that there's
> some particular kind of behavior which is called "undefined behavior".
> You should ask "Would the following code have undefined behavior?"
>
>> char a[10];
>> size_t i=20,j=15;
>> *(a+i-j)=42;
>>
>> It potentially constructs the invalid pointer a+i as an intermediate
>> value. But overall the access is inbounds.

>
> Yes, it does have undefined behavior.
> To make this seem more reasonable, consider a platform with the
> following real-world characteristics: [...]


A colleague who did some work on IBM's AS/400 (they've
changed the name; I forget the new one) told me that simply
trying to calculate an out-of-range pointer yielded a null
pointer as a result. In the O.P.'s case, the intermediate
steps would go something like

a // OK so far
a + i // too big: result = NULL
NULL - j // not sure, but surely not good
*(NULL - j) // really Really REALLY not good

--
Eric Sosman
esosman@comcast-dot-net.invalid

Edward A. Falk 12-11-2012 12:46 AM

Re: Pointer Arithmetic & UB
 
In article <ka59e6$mq7$1@dont-email.me>,
Eric Sosman <esosman@comcast-dot-net.invalid> wrote:
>
> A colleague who did some work on IBM's AS/400 (they've
>changed the name; I forget the new one) told me that simply
>trying to calculate an out-of-range pointer yielded a null
>pointer as a result.


Heh; learn something new every day. I never would have guessed
that there was an actual architecture that would blow up with
this construct.

I assume that *(a+(i-j)) would be ok?

--
-Ed Falk, falk@despams.r.us.com
http://thespamdiaries.blogspot.com/

James Kuyper 12-11-2012 02:28 AM

Re: Pointer Arithmetic & UB
 
Context:
char a[10];
size_t i=20,j=15;
*(a+i-j)=42;

On 12/10/2012 07:46 PM, Edward A. Falk wrote:
...
> Heh; learn something new every day. I never would have guessed
> that there was an actual architecture that would blow up with
> this construct.
>
> I assume that *(a+(i-j)) would be ok?


That should be safe for all conforming implementations of C.
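
A minimal sketch of why the grouping matters, using the values from the original question. The function name store_via_offset is hypothetical, introduced only for illustration. The subtraction is done on the integers first, so the only pointer ever formed is base + 5, which lies inside the array.

```c
#include <stddef.h>

/* Hypothetical helper: stores value at base[i - j].
 * The caller must guarantee that j <= i and that i - j is a valid index. */
void store_via_offset(char *base, size_t i, size_t j, char value)
{
    size_t offset = i - j;      /* pure integer arithmetic, no pointer involved */
    *(base + offset) = value;   /* the only pointer formed is base + offset */
}
```

With char a[10], calling store_via_offset(a, 20, 15, 42) writes to a[5] and never forms a + 20.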
--
James Kuyper

Noob 12-11-2012 10:36 AM

Re: Pointer Arithmetic & UB
 
Edward A. Falk wrote:

> I assume that *(a+(i-j)) would be ok?


Please correct me if I am wrong,

*(a+(i-j)) is strictly equivalent to a[i-j]

(I find the latter clearer.)
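
As far as I can tell this follows from the definition of the subscript operator: E1[E2] is defined as (*((E1)+(E2))). A small sketch that compares the two spellings by address; the function name same_element is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if both spellings designate the same array element, 0 otherwise.
 * The caller must guarantee that i - j is a valid index into a. */
int same_element(char *a, size_t i, size_t j)
{
    char *via_pointer   = &*(a + (i - j));  /* pointer-arithmetic spelling */
    char *via_subscript = &a[i - j];        /* subscript spelling */
    return via_pointer == via_subscript;
}
```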


Eric Sosman 12-11-2012 08:27 PM

Re: Pointer Arithmetic & UB
 
On 12/10/2012 7:46 PM, Edward A. Falk wrote:
> In article <ka59e6$mq7$1@dont-email.me>,
> Eric Sosman <esosman@comcast-dot-net.invalid> wrote:
>>
>> A colleague who did some work on IBM's AS/400 (they've
>> changed the name; I forget the new one) told me that simply
>> trying to calculate an out-of-range pointer yielded a null
>> pointer as a result.

>
> Heh; learn something new every day. I never would have guessed
> that there was an actual architecture that would blow up with
> this construct.
>
> I assume that *(a+(i-j)) would be ok?


Assuming `i-j' in range, yes.
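
One way to make the "in range" condition explicit is sketched below. The function name checked_store and its return convention are assumptions for illustration only. Since i and j are size_t, i - j never goes negative; if j > i it wraps around to a very large value, so that case is tested separately.

```c
#include <stddef.h>

/* Hypothetical helper: stores value at buf[i - j] only if the index is valid.
 * Returns 1 on success, 0 if i - j is not a valid index into buf[0..len-1]. */
int checked_store(char *buf, size_t len, size_t i, size_t j, char value)
{
    if (j > i)          /* i - j would wrap around to a huge unsigned value */
        return 0;
    if (i - j >= len)   /* offset is past the last element */
        return 0;
    buf[i - j] = value; /* only an in-range pointer is ever formed */
    return 1;
}
```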

More on my colleague's tale: The code maintained a buffer
in which items of various sizes accumulated, and which drained
to disk when it got too full or too old. To decide whether a
newly-offered item would fit, the code did something like

itemEndPtr = nextBufferSpacePtr + itemSize;
if (itemEndPtr < bufferStart + bufferSize) ...

This worked as intended on all the other target systems, but
failed on AS/400. I suspect the failure had something to do
with the fact that the buffer was in a shared memory area, so
stepping off the end also meant stepping outside of mapped
address space; the problem might not have shown up with the
`auto' array in your example.
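
A sketch of how such a test could be written without ever forming a pointer past the end of the buffer: compare sizes instead of pointers. The function and parameter names are hypothetical, not the ones from my colleague's code. Note that this version accepts an item that exactly fills the remaining space, whereas the `<' comparison above would reject it; use item_size < remaining to match that behavior.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if item_size bytes fit in the space left
 * in the buffer, 0 otherwise.
 * Precondition: next points into the buffer or one past its end, i.e.
 * buffer_start <= next <= buffer_start + buffer_size. */
int item_fits(const char *buffer_start, size_t buffer_size,
              const char *next, size_t item_size)
{
    size_t used = (size_t)(next - buffer_start); /* both pointers are valid */
    size_t remaining = buffer_size - used;       /* cannot wrap, used <= size */
    return item_size <= remaining;               /* integer comparison only */
}
```

No matter how large item_size is, the only arithmetic done on it is an integer comparison.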

Still, perhaps a salutary lesson for the folks who still
believe "All the world's a VAX^H^H^Hx86^H^H^Hx64^H^H^H..."

--
Eric Sosman
esosman@comcast-dot-net.invalid

glen herrmannsfeldt 12-11-2012 09:09 PM

Re: Pointer Arithmetic & UB
 
Eric Sosman <esosman@comcast-dot-net.invalid> wrote:

(previous snip on pointer offsets)

>>> A colleague who did some work on IBM's AS/400 (they've
>>> changed the name; I forget the new one) told me that simply
>>> trying to calculate an out-of-range pointer yielded a null
>>> pointer as a result.


>> Heh; learn something new every day. I never would have guessed
>> that there was an actual architecture that would blow up with
>> this construct.


>> I assume that *(a+(i-j)) would be ok?


> Assuming `i-j' in range, yes.


> More on my colleague's tale: The code maintained a buffer
> in which items of various sizes accumulated, and which drained
> to disk when it got too full or too old. To decide whether a
> newly-offered item would fit, the code did something like


> itemEndPtr = nextBufferSpacePtr + itemSize;
> if (itemEndPtr < bufferStart + bufferSize) ...


Might fail in x86 (especially the 80286) in huge model.

You can't load arbitrary data into segment selector registers
in protected mode x86. In large model, though, any offset isn't
tested until an actual access is attempted. (The offset is in
an ordinary register, such as AX.)

In huge model, the system allocates a series of segments,
such that one can address through them in order.

Still, I believe that the compilers are careful not to load
a segment selector until needed to actually access something,
maybe partly to allow such faulty C code.

> This worked as intended on all the other target systems, but
> failed on AS/400. I suspect the failure had something to do
> with the fact that the buffer was in a shared memory area, so
> stepping off the end also meant stepping outside of mapped
> address space; the problem might not have shown up with the
> `auto' array in your example.


I believe that could happen with protected mode x86, too.

> Still, perhaps a salutary lesson for the folks who still
> believe "All the world's a VAX^H^H^Hx86^H^H^Hx64^H^H^H..."


In the 80286 days, I had OS/2 1.0 and then 1.2 running, when
just about everyone else was running MS-DOS. Instead of using
malloc(), I would directly allocate segments from OS/2 of exactly
the needed length. The hardware will then interrupt on an access,
even a read, either before or just after the end of the allocated
space. (Unless the register wraps, and it is back into the
allocated space again.)

As usual in C, a 2D array was allocated as an array of pointers,
each pointing to its own OS/2 allocated segment.

Fortunately, the C compilers were always good at not using segment
selector registers when copying pointers that might not point to
anything.

I don't know AS/400 that well, but there have been systems that relied
on the compiler to generate the appropriate code, instead of run-time
memory protection. I believe some Burroughs ALGOL systems worked that
way. (Maybe still do.)

As far as I know, they never had a C compiler, but if one did it might
also have problems with out of range pointers.

-- glen

Ken Brody 12-12-2012 07:30 PM

Re: Pointer Arithmetic & UB
 
On 12/10/2012 7:46 PM, Edward A. Falk wrote:
> In article <ka59e6$mq7$1@dont-email.me>,
> Eric Sosman <esosman@comcast-dot-net.invalid> wrote:
>>
>> A colleague who did some work on IBM's AS/400 (they've
>> changed the name; I forget the new one) told me that simply
>> trying to calculate an out-of-range pointer yielded a null
>> pointer as a result.

>
> Heh; learn something new every day. I never would have guessed
> that there was an actual architecture that would blow up with
> this construct.
>
> I assume that *(a+(i-j)) would be ok?


No. There is no requirement that the value of "i-j" be calculated prior to
adding it to "a". (Check the numerous threads here involving using
parentheses to "fix" UB in things involving such constructs as "i + (i++)".)
Operator precedence only guarantees how the expression is to be
interpreted, not the actual order of evaluation.



Ken Brody 12-12-2012 07:36 PM

Re: Pointer Arithmetic & UB
 
On 12/10/2012 9:28 PM, James Kuyper wrote:
> Context:
> char a[10];
> size_t i=20,j=15;
> *(a+i-j)=42;
>
> On 12/10/2012 07:46 PM, Edward A. Falk wrote:
> ...
>> Heh; learn something new every day. I never would have guessed
>> that there was an actual architecture that would blow up with
>> this construct.
>>
>> I assume that *(a+(i-j)) would be ok?

>
> That should be safe for all conforming implementations of C.


Are you sure? Does anything in the Standard *require* that "i-j" be
evaluated prior to adding it to "a"?

Haven't we had this discussion earlier, related to other forms of UB, with
the questioner asking if adding parentheses would "fix" the problem?



Keith Thompson 12-12-2012 07:42 PM

Re: Pointer Arithmetic & UB
 
Ken Brody <kenbrody@spamcop.net> writes:
> On 12/10/2012 7:46 PM, Edward A. Falk wrote:
>> In article <ka59e6$mq7$1@dont-email.me>,
>> Eric Sosman <esosman@comcast-dot-net.invalid> wrote:
>>>
>>> A colleague who did some work on IBM's AS/400 (they've
>>> changed the name; I forget the new one) told me that simply
>>> trying to calculate an out-of-range pointer yielded a null
>>> pointer as a result.

>>
>> Heh; learn something new every day. I never would have guessed
>> that there was an actual architecture that would blow up with
>> this construct.
>>
>> I assume that *(a+(i-j)) would be ok?

>
> No. There is no requirement that the value of "i-j" be calculated prior to
> adding it to "a". (Check the numerous threads here involving using
> parentheses to "fix" UB in things involving such constructs as "i + (i++)".)
> Operator precedence only guarantees how the expression is to be
> interpreted, not the actual order of evaluation.


True, but the expression `a+(i-j)` is evaluated *in the abstract
machine* by subtracting j from i and then adding the result to a.
A compiler is free to evaluate it by computing a+i and then
subtracting j from the result *only* if it can guarantee that the
result is the same, or if the canonical order has undefined behavior.

`INT_MAX + (1 - 1)` has well defined behavior.
`INT_MAX + 1 - 1` does not.
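
A small sketch of the same pair of expressions. The function name grouped_sum is hypothetical; the undefined variant appears only inside a comment so that the sketch itself stays well defined.

```c
#include <limits.h>

/* Well defined: (1 - 1) evaluates to 0 in the abstract machine,
 * so the addition is INT_MAX + 0 and cannot overflow. */
int grouped_sum(void)
{
    return INT_MAX + (1 - 1);
}

/* Not defined: the expression below parses as (INT_MAX + 1) - 1,
 * and the intermediate INT_MAX + 1 overflows a signed int.
 *
 *     return INT_MAX + 1 - 1;
 */
```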

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

