
# sequence points in subexpressions

Beej Jorgensen

 12-13-2009
On 12/13/2009 12:49 PM, Chad wrote:
> On Dec 13, 11:20 am, Flash Gordon <(E-Mail Removed)> wrote:
>> Seebs wrote:
>> > a[i] = (1, i++, 1);

>>
>> > It seems clear to me that there's a real-world risk that the evaluation of
>> > i on the left is at risk of occurring during the evaluation of the RHS.
>> > So I don't think there's a sequence point between the sides.

>>
>> That example is different because i is used on the left to determine the
>> object to be stored, whereas in the original it is merely the object in
>> which the result will be stored.
>>

> Why wouldn't the result get stored in a[i] = (1, i++, 1); ?

Just because the right side of the assignment is in parentheses and uses
the comma operator doesn't mean the subexpression on the left side can't be
evaluated at the same time. (C99 6.5p3)

(1, i++, 1) definitely has to be evaluated before the assignment, but
the a+i subexpression of *(a+i) (same as a[i]) can be evaluated before
or after (1, i++, 1).

I don't think this example runs afoul of 6.5p2, which forbids things
like a[i++]=i, because of the sequence point after i++. ...?
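
A minimal sketch of how to sidestep the question entirely: write each possible intent behind a[i] = (1, i++, 1) as separate statements. The helper names store_then_bump and bump_then_store are made up for illustration and do not come from the thread.

```c
#include <stddef.h>

/* Hypothetical helper: select a[i] with the old i, then increment i. */
void store_then_bump(int *a, size_t *i)
{
    a[*i] = 1;   /* element chosen before i changes */
    (*i)++;
}

/* Hypothetical helper: increment i first, then select a[i] with the new i. */
void bump_then_store(int *a, size_t *i)
{
    (*i)++;
    a[*i] = 1;   /* element chosen after i changes */
}
```

Either version makes the order of the index computation and the increment explicit, so no reader has to consult 6.5 to know which element is written.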

-Beej

Flash Gordon

 12-13-2009
James Dow Allen wrote:
> On Dec 14, 2:20 am, Flash Gordon <(E-Mail Removed)> wrote:
>> Seebs wrote:
>>> I think it is UB.

>> I think it isn't UB.

>
> I'm not sure. In the simple case:
> i = (1, i++, i) + 1;
> It may be hard to imagine how the C system
> could go wrong, but one might be able to imagine
> some cache-speeding trick that assumes it
> won't encounter this code (or can do what it wants
> with it, if marked UB in The Standard).

Certainly if it is UB such assumptions can be made, but is it?
There is a sequence point between the evaluation of i++ and the
evaluation of i to its right, and it is the result of that i which is
yielded by the comma operator and then has 1 added to it before being
assigned to i. So, the sequence point of the comma operator is before
the assignment side effect of the equals operator.

> For those who think commas are permitted, what about:
> *(p += i, ++i, p += i) = j++, ++j, j;
> No problem right?
> The commas at left separate left-side sequence points,
> and commas at right separate (order) a different
> set of sequence points. We end up, in effect with
> i += 1, *(p += i+i-1) = j += 2;
>
> What do we know about *which* sequence points are
> reached first, right-side vs left-side, or can they
> be interleaved?

They can.
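
One way to look at the interleaving question is to pick a single permitted ordering and spell it out as statements. The sketch below is only an illustration: struct state and explicit_order are hypothetical names, and it assumes p points into an array with enough room. Keep in mind that = binds tighter than the comma operator, so the value stored through the pointer is that of j++ (the old j), and the remaining ++j, j operands follow afterwards.

```c
/* Hypothetical container for the three variables in the expression. */
struct state {
    int *p;
    int i;
    int j;
};

/* One permitted ordering of:
 *     *(p += i, ++i, p += i) = j++, ++j, j;
 * written out as separate statements. */
void explicit_order(struct state *s)
{
    s->p += s->i;     /* left side: p += i          */
    ++s->i;           /* left side: ++i             */
    s->p += s->i;     /* left side: p += i (new i)  */
    *s->p = s->j;     /* store the value of j++, i.e. the old j */
    s->j++;           /* side effect of j++         */
    ++s->j;           /* second comma operand: ++j  */
    /* third comma operand: j, evaluated and discarded */
}
```

The left-side and right-side steps touch different objects here, which is why their relative order does not change the outcome.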

> *(p += i, ++i, p += i) = i++, ++i, i;
> Definitely UB-lookingish.

Yes, because here i is not simply the object to which the right
hand side of the equals operator is being assigned.

>>> In most of the "interesting" edge cases, the right answer is not to go there.

>> .
>> That I definitely agree with. I would reject any code like this I came
>> across in a code review.

>
> While certainly this code would be rejected,
> it *is* good to look at border cases.

Well, it doesn't particularly bother me.

> On Dec 13, 9:30 pm, Nick <(E-Mail Removed)> wrote:
>> doesn't look the sort of thing that might be produced by computer
>> generated code, and it doesn't look the thing you'd actually want to
>> write into a program. If it's that borderline and you'd never need it,
>> why does it actually matter, other than as a sort of C language sudoku.

>
> I argued much like this 5 weeks back in a somewhat similar thread and
> was rebuked. And the old thread was the same old silly expression
> designed to provoke UB, while OP's query *does* represent a defining
> corner-case.

Ah well, you can't expect to always get the same answer!
--
Flash Gordon

Kaz Kylheku

 12-14-2009
On 2009-12-13, pete <(E-Mail Removed)> wrote:
> Richard wrote:
>> pete <(E-Mail Removed)> writes:
>>
>>
>>>(E-Mail Removed) wrote:
>>>
>>>>Does the statement given below invoke undefined behavior?
>>>>i = (i, i++, i) + 1;
>>>>
>>>>I am almost convinced that it does not

>
> It does.
>
>> because of the following
>>>>reasons
>>>>
>>>>1> the RHS must be evaluated before a value can be stored in i

>
> That's wrong.
>
>>>
>>>You would think so but,
>>>the evaluation of an expression also includes side effects,
>>>and the side effects of the evaluation of the right operand
>>>do not have to occur
>>>before the assignment operation on the left operand.

>>
>>
>> What side effects would you expect from the right hand side, keeping in
>> mind the sequence points?
>>

>
> The side effect from the increment operator.

The value being stored in the assignment is that of the rightmost
operand of the comma expression. The computation of the rightmost
operand follows a sequence point. So the modification of i in the
assignment is well-ordered with regard to the prior side effects
in the comma expression.

> The sequence points from the comma operator are not relevant
> because there is no sequence point between the evaluation
> of the right and left operands of the assignment operator.

There is a data flow dependency, however. The value cannot be
stored before it is computed.

In the expression

i = (i, i++, i) + 1;
             ^

the value to be stored is derived from the value of the expression
denoted by the caret, by adding 1.

The denoted expression is the right operand of a comma, so its
evaluation is delayed until prior side effects have settled.

The sequencing in the comma operator, plus the dataflow dependency
in the assignment, add up to well-defined behavior.

Keith Thompson

 12-14-2009
pete <(E-Mail Removed)> writes:
> Richard wrote:
>> pete <(E-Mail Removed)> writes:
>>
>>
>>>(E-Mail Removed) wrote:
>>>
>>>>Does the statement given below invoke undefined behavior?
>>>>i = (i, i++, i) + 1;
>>>>
>>>>I am almost convinced that it does not

>
> It does.
>
>> because of the following
>>>>reasons
>>>>
>>>>1> the RHS must be evaluated before a value can be stored in i

>
> That's wrong.
>
>>>
>>>You would think so but,
>>>the evaluation of an expression also includes side effects,
>>>and the side effects of the evaluation of the right operand
>>>do not have to occur
>>>before the assignment operation on the left operand.

>>
>>
>> What side effects would you expect from the right hand side, keeping in
>> mind the sequence points?
>>

>
> The side effect from the increment operator.
>
> The sequence points from the comma operator are not relevant
> because there is no sequence point between the evaluation
> of the right and left operands of the assignment operator.
>
>
> If the right operand of the assignment operator is evaluated first,
> then there shouldn't be any problem with
> i = i++;
> but there is a problem.
> Assignment is not a sequence point.

The problem with

i = i++;

is that the side effect of the "++" can happen any time before
the end of the statement.

In

i = (i, i++, i) + 1;

let's consider the subexpression

(i, i++, i)

Here, the side effect of the "++" must happen before the next sequence
point, which occurs, not at the end of the statement, but at the comma
operator. The entire subexpression yields the value of i after it's
been incremented, and the value of i is updated before the
subexpression completes.

Looking at the full expression:

i = (i, i++, i) + 1;

I argue that adding 1 to the result of the subexpression and
assigning that to i doesn't introduce any undefined behavior.
The assignment cannot modify i until the RHS has been evaluated.
The RHS cannot yield a result until after the side effect of the
increment has occurred.

So you could at least have a reasonable and consistent set of rules
that makes "i = i++" undefined but makes "i = (i, i++, i) + 1" well
defined.

Whether C99 actually has such rules is another question. N1256 says:

Between the previous and next sequence point an object shall
have its stored value modified at most once by the evaluation
of an expression. Furthermore, the prior value shall be read
only to determine the value to be stored.

I *think* this makes the expression in question well defined, but
I'm not certain.
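
To make the quoted rule a bit more concrete, here is a small sketch with two well-defined patterns. The function names are hypothetical and chosen only for this illustration. The second function is one plausible rewrite of a[i++] = i, which the rule forbids because i is read on the right for a purpose other than computing the value stored into i.

```c
/* Allowed by the quoted rule: i is modified once, and its prior value
 * is read only to determine the value being stored. */
int add_one(int i)
{
    i = i + 1;
    return i;
}

/* Hypothetical rewrite of one plausible intent behind  a[i++] = i;
 * the store and the increment are separated by a sequence point. */
void store_index_then_advance(int *a, int *i)
{
    a[*i] = *i;   /* store the current index into its own slot */
    *i += 1;      /* then advance the index */
}
```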

The C201X drafts (the latest is
<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1425.pdf>)
use different wording in this area, referring to operations being
"sequenced before" or "sequenced after" other operations. The new
wording might make this case clearer (I'm too lazy to check at
the moment).

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Beej Jorgensen

 12-14-2009
On 12/12/2009 09:09 PM, (E-Mail Removed) wrote:
> Does the statement given below invoke undefined behavior?
> i = (i, i++, i) + 1;
>
> I am almost convinced that it does not because of the following
> reasons

Ok, I've been poring over the latest draft, which takes a better stab at
all of this. I still don't really know the answer, but here's more
stuff according to that draft. (I stared at C99 today trying to coax
the real answer out of it, but I was just getting unhappy with the model
which didn't seem to want to spit it out. C09 is more complex with some
abstractions that I think help clarify these issues.)

This is my understanding of a document that I just looked at for the
first time today and is in no way necessarily correct or definitive.

> 1> the RHS must be evaluated before a value can be stored in i

It's a little bit nuanced, because "evaluation" is two things:

# Evaluation of an expression in general includes both value
# computations and initiation of side effects. [5.1.2.3p2]

Note that it's not "resolution" of side effects (which don't necessarily
occur until a sequence point.)

With respect to expressions:

# The value computations of the operands of an operator are sequenced
# before the value computation of the result of the operator. [6.5p1]

So, yes, the value computation must be done before the assignment, but
not necessarily the resolution of side effects.

In terms of sequencing of operations:

# Given any two evaluations A and B, if A is sequenced before B, then
# the execution of A shall precede the execution of B. [...] If A is
# not sequenced before or after B, then A and B are unsequenced.
# [5.1.2.3p3]

Remember, we're talking about "evaluations", which does not necessarily
include resolution of side effects.

And how this relates to expressions (this is *the* paragraph that lays
down the law):

# If a side effect on a scalar object is unsequenced relative to either
# a different side effect on the same scalar object or a value
# computation using the value of the same scalar object, the behavior is
# undefined. [6.5p2]

So back to the example:

i = (i, i++, i) + 1;

We have two side effects: one in the assignment and one in the ++. The question is,
are they sequenced?

Well, we know that the value computations of the operands to + are
sequenced before the value computation of the result of +. So the value
of 1 and the value of (i,i++,i) are computed before the result of + is.

What of the comma operator?

# The left operand of a comma operator is evaluated as a void
# expression; there is a sequence point between its evaluation and that
# of the right operand. Then the right operand is evaluated; the result
# has its type and value. [6.5.17p2]

What is a sequence point?

# The presence of a sequence point between the evaluation of expressions
# A and B implies that every value computation and side effect
# associated with A is sequenced before every value computation and side
# effect associated with B. [5.1.2.3p3]

So now we get our forced sequencing of side effects, as well. With the
expression (i,i++,i), the side effect of i++ must be complete before the
value of the expression (namely i) can be computed. And the value of
the expression must be computed before it can subsequently be used by +.

And +'s value must be computed before the assignment can occur:

# The side effect of updating the stored value of the left operand is
# sequenced after the value computations of the left and right operands.
# [6.5.16p3]

Working backward:

o For the assignment side effect to occur, the value computations of
both operands of the assignment must be complete.

o For the value computations on the right side of the assignment to be
complete, the value computations of the + operator's operands have
to be complete.

o For the value computation of (i,i++,i) to be complete, i++'s side
effects must be complete.

And so, I think, the side effect of i++ is sequenced before the side
effect of i=, and so in this case is not undefined behavior.
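
If that chain of reasoning holds, the statement behaves like the following sequence of separate statements. This is only a sketch of the argument, not a claim about what any compiler emits; the function name chain and the temporaries comma and sum are hypothetical.

```c
/* Hypothetical helper mirroring the chain above for
 *     i = (i, i++, i) + 1;
 */
int chain(int i)
{
    (void)i;              /* leftmost operand, evaluated as a void expression */
    i++;                  /* side effect complete at the comma's sequence point */
    int comma = i;        /* value of (i, i++, i): the rightmost i */
    int sum = comma + 1;  /* value computation of + */
    i = sum;              /* assignment side effect comes last */
    return i;
}
```

Under this reading, an i that starts at 0 ends up as 2.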

Some counter cases:

i = i++;

While the sequence of value computations is defined for i=i++, the
side effects are unsequenced, and so it is undefined behavior.
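
Since i = i++ has no defined meaning, whoever writes it has to choose what was intended. A minimal sketch of the two obvious candidates, with hypothetical function names:

```c
/* If the intent was simply to increment i. */
int just_increment(int i)
{
    i++;
    return i;
}

/* If the intent was to assign the pre-increment value back,
 * which leaves i unchanged in the end. */
int keep_old_value(int i)
{
    int old = i;   /* value that i++ would yield */
    i++;           /* increment happens and completes here */
    i = old;       /* assignment is sequenced after it */
    return i;
}
```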

    |----- A ----|   |----- B ----|
k = (i, i /= 3, i) + (i, i *= 5, i); // "please...kill me..."

In this case, the value computations of both subexpressions A and B
must be complete before +, and therefore, by the previous pages of
arguments, the side effects of i/=3 and i*=5 must also be complete
before the +.

And, therefore, the side effects of i/=3 and i*=5 must also be
complete before the result of the value computation of + is finally
assigned into k.

However, the two subexpressions A and B are unsequenced relative to
one another and both modify the same object, and so the behavior is
undefined.
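
To get a defined result, the order of A and B has to be chosen explicitly. The sketch below writes out both choices; a_then_b and b_then_a are hypothetical names, and the point is only that the two orders give different answers.

```c
/* Evaluate subexpression A (i /= 3) before B (i *= 5). */
int a_then_b(int *i)
{
    *i /= 3;
    int a = *i;   /* value of (i, i /= 3, i) */
    *i *= 5;
    int b = *i;   /* value of (i, i *= 5, i) */
    return a + b;
}

/* Evaluate subexpression B (i *= 5) before A (i /= 3). */
int b_then_a(int *i)
{
    *i *= 5;
    int b = *i;   /* value of (i, i *= 5, i) */
    *i /= 3;
    int a = *i;   /* value of (i, i /= 3, i) */
    return a + b;
}
```

Starting from i == 9, the first order yields k == 18 and the second yields k == 60, with i ending at 15 both times.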

Do I believe it myself? I don't even know anymore.

What do you think, folks?

-Beej

(Remember: this analysis is based on the draft, not the Standard. I'm
just presuming they're going to try to keep it basically compatible.)

Ben Bacarisse

 12-14-2009
Beej Jorgensen <(E-Mail Removed)> writes:
<snip>
> Ok, I've been poring over the latest draft, which takes a better stab at
> all of this.

<snip>
> So back to the example:
>
> i = (i, i++, i) + 1;
>
> We have two side effects: one in the assignment and one in the ++. The question is,
> are they sequenced?
>
> Well, we know that the value computations of the operands to + are
> sequenced before the value computation of the result of +. So the value
> of 1 and the value of (i,i++,i) are computed before the result of + is.
>
> What of the comma operator?
>
> # The left operand of a comma operator is evaluated as a void
> # expression; there is a sequence point between its evaluation and that
> # of the right operand. Then the right operand is evaluated; the result
> # has its type and value. [6.5.17p2]
>
> What is a sequence point?
>
> # The presence of a sequence point between the evaluation of expressions
> # A and B implies that every value computation and side effect
> # associated with A is sequenced before every value computation and side
> # effect associated with B. [5.1.2.3p3]
>
> So now we get our forced sequencing of side effects, as well. With the
> expression (i,i++,i), the side effect of i++ must be complete before the
> value of the expression (namely i) can be computed. And the value of
> the expression must be computed before it can subsequently be used by +.
>
> And +'s value must be computed before the assignment can occur:
>
> # The side effect of updating the stored value of the left operand is
> # sequenced after the value computations of the left and right operands.
> # [6.5.16p3]

I find all this wording much clearer than the old description. The
trouble I always had with the old wording is that sequence points are
points in the program text, but the restriction on what is permitted
is worded in terms of temporal ordering of actual events. When the
C99 standard says, in effect, that the order of execution is
unspecified, you are left trying to relate possible execution paths
through the text so as to get all the event orderings that are possible
to see if any violate the constraint.

This new wording greatly simplifies the task of ascertaining the
permitted orderings. I like it much better.

> Working backward:
>
> o For the assignment side effect to occur, the value computations of
> both operands of the assignment must be complete.
>
> o For the value computations on the right side of the assignment to be
> complete, the value computations of the + operator's operands have
> to be complete.
>
> o For the value computation of (i,i++,i) to be complete, i++'s side
> effects must be complete.
>
> And so, I think, the side effect of i++ is sequenced before the side
> effect of i=, and so in this case is not undefined behavior.

I agree. I also agree (for what it is worth) that it is not undefined
even using the current text of the standard.

<snip>
--
Ben.

Michael Foukarakis

 12-14-2009
On Dec 14, 10:19 am, pete <(E-Mail Removed)> wrote:
> Beej Jorgensen wrote:
> > o For the value computation of (i,i++,i) to be complete, i++'s side
> > effects must be complete.

>
> I disagree.
> I know that the value of (i,i++,i) is one greater
> than the original value of (i).
> I computed that without accomplishing any side effects.

The postfix increment operator's side effect is incrementing its
operand by 1. That's what you "computed". Get it now? Basic
comprehension OK?

Beej's post is great and very informative. The OP's construct doesn't
invoke UB.

Beej Jorgensen

 12-14-2009
On 12/14/2009 12:19 AM, pete wrote:
> Beej Jorgensen wrote:
>
>> o For the value computation of (i,i++,i) to be complete, i++'s side
>> effects must be complete.

>
> I disagree.
> I know that the value of (i,i++,i) is one greater
> than the original value of (i).
> I computed that without accomplishing any side effects.

Then I think you skipped over a sequence point without performing every
value computation and side effect associated with subexpression i++, in
violation of 5.1.2.3p3:

# The presence of a sequence point between the evaluation of expressions
# A and B implies that every value computation and side effect
# associated with A is sequenced before every value computation and side
# effect associated with B.

But you're arguing the side effects don't necessarily take place at the
sequence point, is that right?

-Beej

Eric Sosman

 12-14-2009
On 12/13/2009 6:20 PM, Flash Gordon wrote:
> James Dow Allen wrote:
>> On Dec 14, 2:20 am, Flash Gordon <(E-Mail Removed)> wrote:
>>> Seebs wrote:
>>>> I think it is UB.
>>> I think it isn't UB.

>>
>> I'm not sure. In the simple case:
>> i = (1, i++, i) + 1;
>> It may be hard to imagine how the C system
>> could go wrong, but one might be able to imagine
>> some cache-speeding trick that assumes it
>> won't encounter this code (or can do what it wants
>> with it, if marked UB in The Standard).

>
> Certainly if it is UB such assumptions can be made, but is it?
> There is a sequence point between the evaluation of i++ and the
> evaluation of i to its right, and it is the result of that i which is
> yielded by the comma operator and then has 1 added to it before being
> assigned to i. So, the sequence point of the comma operator is before
> the assignment side effect of the equals operator.

The value of the parenthesized sub-expression is the
value of `i' after incrementation, yes. But where is it
written that the sub-expression's value must be determined
by actually reading it from `i'? If an optimizing compiler
knew that `i' was 42 before the line in question, could it
not replace the assignment with `i=44', with the `i++'
happening at some undetermined moment?
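
The concern can be pictured as two possible orders for the stores into `i`. This sketch takes no position on whether the second order is actually permitted, since that is exactly what is in dispute; both function names are hypothetical, and the constants stand in for what a compiler that knows `i` is 42 might fold the expression to.

```c
/* The increment's store lands before the assignment's store. */
int increment_before_store(void)
{
    int i = 42;
    i = 43;   /* store from i++ */
    i = 44;   /* store from the assignment */
    return i;
}

/* Hypothetical: the increment's store lands after the assignment's store. */
int increment_after_store(void)
{
    int i = 42;
    i = 44;   /* store from the assignment, value folded by the compiler */
    i = 43;   /* delayed store from i++ */
    return i;
}
```

If the second order were allowed, `i` would end up as 43 rather than 44, which is why the timing of the increment matters here.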

--
Eric Sosman
(E-Mail Removed)

Michael Tsang

 12-14-2009
(E-Mail Removed) wrote:

> Does the statement given below invoke undefined behavior?
> i = (i, i++, i) + 1;
>
> I am almost convinced that it does not because of the following
> reasons
>
> 1> the RHS must be evaluated before a value can be stored in i
> 2> evaluation of RHS does not invoke UB due to the sequence points
> introduced by comma operator
>
> Correct me if I am wrong!
>
> Thanks

I don't think it is UB. Let SQ 0 be the last sequence point before the full
expression, SQ 1 be the sequence point between i and i++, SQ 2 be the
sequence point between i++ and i, SQ 3 be the sequence point at the end of
the full expression. Because the right hand side must be read in order to
determine the value stored, the = operator must be evaluated between SQ 2
and SQ 3 but in the right hand side, i++ is done between SQ 1 and SQ 2 so
there is no UB.