# Re: Pointer Arithmetic & UB

glen herrmannsfeldt

 12-12-2012
Ken Brody <(E-Mail Removed)> wrote:
> On 12/10/2012 9:28 PM, James Kuyper wrote:
>> Context:
>> char a[10];
>> size_t i=20,j=15;
>> *(a+i-j)=42;

(snip)
>>> I assume that *(a+(i-j)) would be ok?

>> That should be safe for all conforming implementations of C.

> Are you sure? Does anything in the Standard *require* that "i-j" be
> evaluated prior to adding it to "a"?

On many systems, the result is the same until you try to dereference
the result.

Seems to me that on any system where the result isn't the same, that
the compiler better do it in the appropriate order.

With the appropriate wrap on overflow characteristic, fixed point
arithmetic is associative. If the compiler knows that, it can compute
i+(j-k) as (i+j)-k, knowing the result is the same.

If something else happens on overflow, the compiler shouldn't do that.
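
To make the wraparound case concrete, here is a minimal sketch in C. It assumes unsigned operands, for which the language defines arithmetic modulo UINT_MAX + 1; the helper names are made up for illustration.

```c
#include <limits.h>

/* Hypothetical helpers: the same three operands grouped two ways. */
unsigned int group_left(unsigned int i, unsigned int j, unsigned int k)
{
    return (i + j) - k;   /* i + j may wrap; the subtraction wraps it back */
}

unsigned int group_right(unsigned int i, unsigned int j, unsigned int k)
{
    return i + (j - k);   /* j - k may wrap when k > j; the addition wraps it back */
}

/* Unsigned arithmetic wraps: UINT_MAX + 1 is 0 by definition. */
unsigned int wrap_demo(void)
{
    return UINT_MAX + 1u;
}
```

Both groupings return the same value for every input because each step is reduced modulo the same power of two. The same regrouping is not safe for signed integers or pointers, where overflow or an out-of-range intermediate result is undefined.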

It gets more interesting with floating point.

> Haven't we had this discussion earlier, related to other forms of UB,
> with the questioner asking if adding parentheses would "fix" the problem?

-- glen

James Kuyper

 12-12-2012
On 12/12/2012 02:36 PM, Ken Brody wrote:
> On 12/10/2012 9:28 PM, James Kuyper wrote:
>> Context:
>> char a[10];
>> size_t i=20,j=15;
>> *(a+i-j)=42;
>>
>> On 12/10/2012 07:46 PM, Edward A. Falk wrote:
>> ...
>>> Heh; learn something new every day. I never would have guessed
>>> that there was an actual architecture that would blow up with
>>> this construct.
>>>
>>> I assume that *(a+(i-j)) would be ok?

>>
>> That should be safe for all conforming implementations of C.

>
> Are you sure? Does anything in the Standard *require* that "i-j" be
> evaluated prior to adding it to "a"?

Yes, I'm sure; if you aren't, perhaps there's been a miscommunication of
some kind?

Check the grammar rules. The right operand of a binary '+' expression
must be a multiplicative-expression (6.5.6p1). '(' doesn't qualify;
neither does '(i', or '(i-' or '(i-j'; the only thing that can be parsed
as the right operand of the '+' operator in that expression is (i-j),
which parses as a primary-expression (6.5.1p1), and therefore as a
postfix-expression (6.5.2p1), a unary-expression (6.5.3p1), a
cast-expression (6.5.4p1), and a multiplicative-expression (6.5.5p1), in
that order.

For C99, I would have stopped the explanation at that point, considering
of two events is specified, and when it isn't, so there's a couple of
additional citations that are relevant. I believe that what they say was
inherently true even in C99, where it was not explicitly said:

"The value computations of the operands of an operator
are sequenced before the value computation of the result of the
operator." 6.5p1.

"An evaluation A happens before an evaluation B if A is sequenced before
B." 5.1.2.4p9

> Haven't we had this discussion earlier, related to other forms of UB,
> with the questioner asking if adding parentheses would "fix" the problem?

The problem with *(a+i-j) is that the standard mandates that 'i' be
added to 'a' before 'j' is subtracted from the result. Putting
parentheses around 'i - j' converts those three tokens into a single
primary-expression. That's why *(a+(i-j)) fixes the problem. It forces
the value computations for the subtraction expression to happen before
the value computations of the binary addition expression.
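
A minimal sketch of the difference, using the values from the thread (a 10-element array, i = 20, j = 15). The function name `store_at` and its bounds assertion are assumptions added for illustration, not anything the Standard or the thread prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store value at element (i - j) of an n-element array. */
void store_at(char *a, size_t n, size_t i, size_t j, char value)
{
    /* Guard the index before forming any pointer from it. */
    assert(j <= i && i - j < n);

    /*
     * *(a + i - j) would parse as *((a + i) - j). With n == 10 and i == 20
     * the intermediate pointer a + i lies beyond one-past-the-end, which is
     * undefined even though the final address would be in range.
     */
    *(a + (i - j)) = value;   /* same as a[i - j]; no out-of-range pointer is formed */
}
```

Calling `store_at(a, sizeof a, 20, 15, 42)` on `char a[10]` writes 42 to `a[5]`.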

Kenneth gives i+(i++) as an example of a case where parentheses do
nothing to resolve the underlying problem. That is because the problem
is the absence of a sequence point separating 'i' from 'i++'.
Parentheses do not insert a sequence point, and therefore do NOT solve
that problem.
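
For contrast, a sketch of one well-defined rewrite. It assumes the intent was "add the old value of i to itself, then increment i"; since the original expression is undefined, that reading is only a guess, and the helper name is hypothetical.

```c
/* Hypothetical helper: one well-defined reading of "i + (i++)". */
int add_then_increment(int *i)
{
    int old = *i;          /* read i exactly once */
    int sum = old + old;   /* the value the expression was presumably after */
    *i = old + 1;          /* the side effect, in its own statement */
    return sum;
}
```

Separate statements supply the sequence points that parentheses cannot.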

Eric Sosman

 12-12-2012
On 12/12/2012 2:36 PM, Ken Brody wrote:
> On 12/10/2012 9:28 PM, James Kuyper wrote:
>> Context:
>> char a[10];
>> size_t i=20,j=15;
>> *(a+i-j)=42;
>>
>> On 12/10/2012 07:46 PM, Edward A. Falk wrote:
>> ...
>>> Heh; learn something new every day. I never would have guessed
>>> that there was an actual architecture that would blow up with
>>> this construct.
>>>
>>> I assume that *(a+(i-j)) would be ok?

>>
>> That should be safe for all conforming implementations of C.

>
> Are you sure? Does anything in the Standard *require* that "i-j" be
> evaluated prior to adding it to "a"?

No, but the Standard requires that the thing added to `a'
be the value of `i-j'. The "as if" rule still applies, so an
actual implementation might calculate something that might be
written as `a-j+i' or `i+a-j' or `a-(j-i)' or a host of other
possibilities. Still, the result -- including the definedness
of the result -- must be as for "`a' plus `i-j'".

> Haven't we had this discussion earlier, related to other forms of UB,
> with the questioner asking if adding parentheses would "fix" the problem?

Nitpick: Since this isn't UB, "other" is out of place.

The usual misunderstanding is that the association of
operators with their operands -- "expression tree order" --
dictates evaluation order, which it doesn't. (Except for
certain special operators like ||, and even then only in
part.)
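
A small sketch of that distinction. The call log and the function names are hypothetical, added only so the evaluation of each operand becomes observable.

```c
/* Hypothetical call log, used only to observe which operands ran. */
static int call_log[2];
static int call_count;

static int note(int id, int value)
{
    call_log[call_count++] = id;   /* record that operand `id` was evaluated */
    return value;
}

/* The two calls may run in either order; only the sum is guaranteed. */
int sum_of_two(void)
{
    call_count = 0;
    return note(1, 10) + note(2, 20);
}

/* || evaluates its left operand first and skips the right one if the left is nonzero. */
int or_short_circuit(void)
{
    call_count = 0;
    return note(1, 1) || note(2, 1);
}

int calls_made(void) { return call_count; }
int first_call(void) { return call_log[0]; }
```

After `sum_of_two()` the log holds both ids, but a portable program cannot rely on which came first. After `or_short_circuit()` it holds only id 1.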

--
Eric Sosman
(E-Mail Removed)

James Kuyper

 12-12-2012
On 12/12/2012 04:02 PM, Eric Sosman wrote:
...
> The usual misunderstanding is that the association of
> operators with their operands -- "expression tree order" --
> dictates evaluation order, which it doesn't. (Except for
> certain special operators like ||, and even then only in
> part.)

The expression tree does not impose an evaluation order on its branches
at the same level (with the exceptions that you noted), but it does
impose a requirement that the operands be evaluated before the
expression itself. I believe that this requirement has always been
implied by the semantics of each expression, but C2011 has made this
requirement explicit for all expressions in 6.5p1 and 5.1.2.4p18 (which I
just mis-cited in my response to Kenneth as 5.1.2.4p9).

Ken Brody

 12-12-2012
On 12/12/2012 2:30 PM, Ken Brody wrote:
> On 12/10/2012 7:46 PM, Edward A. Falk wrote:
>> In article <ka59e6$mq7$(E-Mail Removed)>,
>> Eric Sosman <(E-Mail Removed)> wrote:
>>>
>>> A colleague who did some work on IBM's AS/400 (they've
>>> changed the name; I forget the new one) told me that simply
>>> trying to calculate an out-of-range pointer yielded a null
>>> pointer as a result.

>>
>> Heh; learn something new every day. I never would have guessed
>> that there was an actual architecture that would blow up with
>> this construct.
>>
>> I assume that *(a+(i-j)) would be ok?

>
> No. There is no requirement that the value of "i-j" be calculated prior to
> adding it to "a". (Check the numerous threads here involving using
> parentheses to "fix" UB in things involving such constructs as "i + (i++)".)
> Operator precedence only guarantees how the expression is to be
> interpreted, not the actual order of evaluation.

As noted in the replies to my post, I stand corrected. Because of the
"as-if" rule, if evaluating "i-j" first would not cause an overflow in
"a+(i-j)", then the compiler must guarantee that any rearrangement of the
code gives an identical result, even if an intermediate step of the
rearranged computation would overflow.

Eric Sosman

 12-13-2012
On 12/12/2012 4:13 PM, James Kuyper wrote:
> On 12/12/2012 04:02 PM, Eric Sosman wrote:
> ...
>> The usual misunderstanding is that the association of
>> operators with their operands -- "expression tree order" --
>> dictates evaluation order, which it doesn't. (Except for
>> certain special operators like ||, and even then only in
>> part.)

>
> The expression tree does not impose an evaluation order on its branches
> at the same level (with the exceptions that you noted), but it does
> impose a requirement that the operands be evaluated before the
> expression itself. I believe that this requirement has always been
> implied by the semantics of each expression, but C2011 has made this
> requirement explicit for all expressions in 6.5p1 and 5.1.2.4p18 (which I
> just mis-cited in my response to Kenneth as 5.1.2.4p9).

Although I haven't studied the C11 stuff in detail, I'd
be surprised (and disappointed!) if in

#define WHICH 1
...
int r = WHICH * (x + y) + (1 - WHICH) * (z - x);

... the Standard required that `z - x' be evaluated at all,
much less "before" the entire expression.

However, neither surprise nor disappointment is entirely
strange to me. Embarrassment is an old pal, too ...

--
Eric Sosman
(E-Mail Removed)

James Kuyper

 12-13-2012
On 12/12/2012 09:18 PM, Eric Sosman wrote:
...
> Although I haven't studied the C11 stuff in detail, I'd
> be surprised (and disappointed!) if in
>
> #define WHICH 1
> ...
> int r = WHICH * (x + y) + (1 - WHICH) * (z - x);
>
> ... the Standard required that `z - x' be evaluated at all,
> much less "before" the entire expression.

Well, the as-if rule always trumps any other requirements, when it
applies - if a strictly conforming program can't determine whether or
not sub-expressions were evaluated in the required order, evaluating
them in that order isn't really required. If it can't even determine
whether they were evaluated, they don't even have to be evaluated.

> However, neither surprise nor disappointment is entirely
> strange to me. Embarrassment is an old pal, too ...

Yep, I know him well myself.
--
James Kuyper

Phil Carmody

 12-17-2012
Eric Sosman <(E-Mail Removed)> writes:
> On 12/12/2012 4:13 PM, James Kuyper wrote:
> > On 12/12/2012 04:02 PM, Eric Sosman wrote:
> > ...
> >> The usual misunderstanding is that the association of
> >> operators with their operands -- "expression tree order" --
> >> dictates evaluation order, which it doesn't. (Except for
> >> certain special operators like ||, and even then only in
> >> part.)

> >
> > The expression tree does not impose an evaluation order on its branches
> > at the same level (with the exceptions that you noted), but it does
> > impose a requirement that the operands be evaluated before the
> > expression itself. I believe that this requirement has always been
> > implied by the semantics of each expression, but C2011 has made this
> > requirement explicit for all expressions in 6.5p1 and 5.1.2.4p18 (which I
> > just mis-cited in my response to Kenneth as 5.1.2.4p9).

>
> Although I haven't studied the C11 stuff in detail, I'd
> be surprised (and disappointed!) if in
>
> #define WHICH 1
> ...
> int r = WHICH * (x + y) + (1 - WHICH) * (z - x);
>
> ... the Standard required that `z - x' be evaluated at all,
> much less "before" the entire expression.

I am deliriously happy that the Standard requires that (the implementation
behave as if) `z - x' is evaluated. That would be, and is, consistent behaviour.

Pulling out the big cannon - if z and x are volatile, of course you

If you meant to say

int r = WHICH ? (x+y) : (z-x);

then write that, not some other silly expression which does arithmetic rather
than conditional evaluation.
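
The two forms side by side, as a rough sketch; the function names are hypothetical. In the abstract machine the arithmetic form includes `z - x` in its value computation, so a signed overflow there would be undefined even though the product is multiplied by zero. The conditional form never evaluates the operand that is not selected.

```c
#define WHICH 1

/* Arithmetic selection: z - x is part of the expression's value computation. */
int pick_arith(int x, int y, int z)
{
    return WHICH * (x + y) + (1 - WHICH) * (z - x);
}

/* Conditional selection: only the chosen operand is evaluated. */
int pick_cond(int x, int y, int z)
{
    return WHICH ? (x + y) : (z - x);
}
```

For inputs where neither subexpression overflows, both functions return the same value, and under the as-if rule a compiler is free to emit identical code for them.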

Phil
--
I'm not saying that google groups censors my posts, but there's a strong link
between me saying "google groups sucks" in articles, and them disappearing.

Oh - I guess I might be saying that google groups censors my posts.

Phil Carmody

 12-17-2012
"christian.bau" <(E-Mail Removed)> writes:
> On Dec 11, 10:36 am, Noob <r...@127.0.0.1> wrote:
> > Edward A. Falk wrote:
> > > I assume that *(a+(i-j)) would be ok?

> >
> > Please correct me if I am wrong,
> >
> > *(a+(i-j)) is strictly equivalent to a[i-j]
> >
> > (I find the latter clearer.)

>
> Yes, it's the same. But there are also cases where * (a + i - j) would
> be fine and * (a + (i - j)) or a [i - j] wouldn't: If you have 64 bit
> pointers and 32 bit ints, then i - j might overflow, while a + i - j
> could be correct.

i and j are not int but size_t. What do you mean by "overflow" in that context?
Can you come up with a concrete example of failure which doesn't have UB
in the "correct" version?
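
For reference, a sketch of what subtraction does when both operands are size_t: the result is reduced modulo SIZE_MAX + 1, so there is no overflow in the undefined-behaviour sense, only a very large value when j > i. The helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* size_t subtraction never overflows; it wraps modulo SIZE_MAX + 1. */
size_t wrapped_difference(size_t i, size_t j)
{
    return i - j;
}

/* Hypothetical check: is a[i - j] a valid element of an n-element array? */
int difference_in_range(size_t i, size_t j, size_t n)
{
    return j <= i && i - j < n;
}
```

With i = 20 and j = 15 the difference is 5; with the operands swapped it is SIZE_MAX - 4, which the range check rejects.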

Phil

Phil Carmody

 12-17-2012
Ken Brody <(E-Mail Removed)> writes:
> On 12/10/2012 7:46 PM, Edward A. Falk wrote:
> > In article <ka59e6$mq7$(E-Mail Removed)>,
> > Eric Sosman <(E-Mail Removed)> wrote:
> >>
> >> A colleague who did some work on IBM's AS/400 (they've
> >> changed the name; I forget the new one) told me that simply
> >> trying to calculate an out-of-range pointer yielded a null
> >> pointer as a result.

> >
> > Heh; learn something new every day. I never would have guessed
> > that there was an actual architecture that would blow up with
> > this construct.
> >
> > I assume that *(a+(i-j)) would be ok?

>
> No. There is no requirement that the value of "i-j" be calculated
> prior to adding it to "a". (Check the numerous threads here involving
> using parentheses to "fix" UB in things involving such constructs as
> "i + (i++)".) Operator precedence only guarantees how the expression
> is to be interpreted, not the actual order of evaluation.

The "*(a+(i-j))" expression has *nothing* in common with the part of the
"i+(i++)" that pertains to UB. That brackets fail to do something in an
unrelated situation is basically irrelevant.

Phil