# Is the aliasing rule symmetric?

Johannes Schaub (litb)

 02-07-2011
Joshua Maurice wrote:

> On Feb 6, 3:34 pm, "Johannes Schaub (litb)"
> <(E-Mail Removed)> wrote:
>> [snipped]

> To be clear, you think that there's a difference between
> a->x = 2;
> and
> int* x = & a->x;
> *x = 2;
> ?
>
> It would take me a long time to buy that.

Yes, I think there is a difference between the two. The first uses the struct
for the access. The second does not.

Joshua Maurice

 02-07-2011
On Feb 7, 8:09 am, "Johannes Schaub (litb)"
<(E-Mail Removed)> wrote:
> Joshua Maurice wrote:
> > On Feb 6, 3:34 pm, "Johannes Schaub (litb)"
> > <(E-Mail Removed)> wrote:
> >> [snipped]

> > To be clear, you think that there's a difference between
> >   a->x = 2;
> > and
> >   int* x = & a->x;
> >   *x = 2;
> > ?

>
> > It would take me a long time to buy that.

>
> Yes I think there is a difference between the two. The first uses the struct
> for the access. The second does not.

I never really considered this beyond a first glance.

Again, to be crystal clear, consider:
/* 1 */
a -> x = 2;
and
/* 2 */
* ( & ( a -> x )) = 2;
and
/* 3 */
int* x = & a->x;
*x = 2;

You really think there's a difference? Really? Where's the difference?
Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
be entirely equivalent, or I'm really losing it.

As a naive understanding for the difference between 1 and 2: The
addressof operator (&) simply returns the address of the object
referred to by the lvalue, and then the dereference operator (*) simply
takes that pointer value and returns back the same lvalue (which
refers to the same object). This isn't operator overloading in C++. I
would think that it ought to be a noop. If there is any difference at
all between any of 1, 2, and 3 above in this post, then I have a
fundamental misunderstanding of the language.

Wojtek Lerch

 02-07-2011
On 07/02/2011 3:11 PM, Joshua Maurice wrote:
> Again, to be crystal clear, consider:
> /* 1 */
> a -> x = 2;
> and
> /* 2 */
> * (& ( a -> x )) = 2;
> and
> /* 3 */
> int* x =& a->x;
> *x = 2;
>
> You really think there's a difference? Really? Where's the difference?
> Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
> be entirely equivalent, or I'm really losing it.
>
> As a naive understanding for the difference between 1 and 2: The
> addressof operator (&) simply returns the address of the object
> referred to by the lvalue, and then the dereference operator (*) simply
> takes that pointer value and returns back the same lvalue (which
> refers to the same object). This isn't operator overloading in C++. I
> would think that it ought to be a noop. If there is any difference at
> all between any of 1, 2, and 3 above in this post, then I have a
> fundamental misunderstanding of the language.

I don't think a C pointer is simply just the address of an object. If
you consider the rules of pointer arithmetic and DR260, a pointer value
carries some extra properties that decide what operations on it are
defined. Two pointers may compare equal and be represented by identical
bit patterns, but depending on their "provenance", one of them may be
safe to dereference or increment but not decrement, while the other may
be safe to decrement but not increment or dereference. The standard
tells us that every object can be considered an array element, and every
pointer to an object has a range of integers that can be legitimately
added to it, based on the object's "arrayness"; but the standard rarely
bothers explaining how to determine what that array is, and all we can
do is rely on obvious guesses where they're obvious, and in less-obvious
cases we can hope that the guess is even harder for a compiler, forcing
it to generate code that does the "naive" thing regardless of what the
limit would be if the standard didn't neglect to specify it.

Since this "arrayness" is not exactly on topic here, I don't want to go
too deep into it now; but maybe pointers are also supposed to remember
their "structness", and your &a->x (and also your x) are not just
pointers to an int that is known not to be an array element, but
pointers to an int that is known not to be an array element but is also
known to be the "x" member of a struct T1? If that were the case, then
maybe a simple assignment to *x could still impose an effective type of
struct T1 on the object surrounding the int that x points to. But of
course none of that is actually discussed in the standard, just like the
transformations of "arrayness" are not discussed for most of the
operations where they apparently happen.

Joshua Maurice

 02-07-2011
On Feb 7, 1:31 pm, Wojtek Lerch <(E-Mail Removed)> wrote:
> On 07/02/2011 3:11 PM, Joshua Maurice wrote:
>
>
>
> > Again, to be crystal clear, consider:
> >   /* 1 */
> >   a -> x = 2;
> > and
> >   /* 2 */
> >   * (& ( a -> x )) = 2;
> > and
> >   /* 3 */
> >   int* x =& a->x;
> >   *x = 2;

>
> > You really think there's a difference? Really? Where's the difference?
> > Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
> > be entirely equivalent, or I'm really losing it.

>
> > As a naive understanding for the difference between 1 and 2: The
> > addressof operator (&) simply returns the address of the object
> > referred to the lvalue, and then the dereference operator (*) simply
> > takes that pointer value and returns back the same lvalue (which
> > refers to the same object). This isn't operator overloading in C++. I
> > would think that it ought to be a noop. If there is any difference at
> > all between any of 1, 2, and 3 above in this post, then I have a
> > fundamental misunderstanding of the language.

>
> I don't think a C pointer is simply just the address of an object. If
> you consider the rules of pointer arithmetic and DR260, a pointer value
> carries some extra properties that decide what operations on it are
> defined. Two pointers may compare equal and be represented by identical
> bit patterns, but depending on their "provenance", one of them may be
> safe to dereference or increment but not decrement, while the other may
> be safe to decrement but not increment or dereference. The standard
> tells us that every object can be considered an array element, and every
> pointer to an object has a range of integers that can be legitimately
> added to it, based on the object's "arrayness"; but the standard rarely
> bothers explaining how to determine what that array is, and all we can
> do is rely on obvious guesses where they're obvious, and in less-obvious
> cases we can hope that the guess is even harder for a compiler, forcing
> it to generate code that does the "naive" thing regardless of what the
> limit would be if the standard didn't neglect to specify it.
>
> Since this "arrayness" is not exactly on topic here, I don't want to go
> too deep into it now; but maybe pointers are also supposed to remember
> their "structness", and your &a->x (and also your x) are not just
> pointers to an int that is known not to be an array element, but
> pointers to an int that is known not to be an array element but is also
> known to be the "x" member of a struct T1? If that were the case, then
> maybe a simple assignment to *x could still impose an effective type of
> struct T1 on the object surrounding the int that x points to. But of
> course none of that is actually discussed in the standard, just like the
> transformations of "arrayness" are not discussed for most of the
> operations where they apparently happen.

Indeed. This is exactly what I meant when I was saying "data
dependency analysis". I think your way is clearer. (It could be that)
pointer values carry with them some semantic information, in this case
it remembers that it came from a memberof expression on a T1 lvalue.
I'll have to check out that DR. Are there any other spots in the C
standard which you suggest that I look at regarding this "arrayness"?

Johannes Schaub (litb)

 02-08-2011
On 07.02.2011 12:10, Tim Rentsch wrote:
> "Johannes Schaub (litb)"<(E-Mail Removed)> writes:
>
>> [snip]
>>
>> In particular, I think the committee intends the spec to say that a struct
>> or union access expression involves an access with the struct or union
>> lvalue.
>>
>> T1 *p = malloc(sizeof *p);
>> p->x = 0;
>>
>> In this case, I think the committee's intent is that the object pointed to
>> by "p" is accessed by an lvalue of type T1, and so the effective type of the
>> object containing the int changes to T1. So a later cast and access by an
>> lvalue of T2 will be undefined behavior.

>
> I'm not aware of any evidence that supports this theory (ie,
> that using '.' or '->' is also an access for the left operand).
> Furthermore it seems to be in conflict with the definitions the
> Standard gives for access, value, etc.
>
> Do you have any such evidence to offer? Or are you simply
> stating an unsupported opinion?

The committee argues that way in the union DR. See
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm .

Johannes Schaub (litb)

 02-08-2011
On 07.02.2011 21:11, Joshua Maurice wrote:
> On Feb 7, 8:09 am, "Johannes Schaub (litb)"
> <(E-Mail Removed)> wrote:
>> Joshua Maurice wrote:
>>> On Feb 6, 3:34 pm, "Johannes Schaub (litb)"
>>> <(E-Mail Removed)> wrote:
>>>> [snipped]
>>> To be clear, you think that there's a difference between
>>> a->x = 2;
>>> and
>>> int* x =& a->x;
>>> *x = 2;
>>> ?

>>
>>> It would take me a long time to buy that.

>>
>> Yes I think there is a difference between the two. The first uses the struct
>> for the access. The second does not.

>
> I never really considered this beyond a first glance.
>
> Again, to be crystal clear, consider:
> /* 1 */
> a -> x = 2;
> and
> /* 2 */
> * (& ( a -> x )) = 2;
> and
> /* 3 */
> int* x =& a->x;
> *x = 2;
>
> You really think there's a difference? Really? Where's the difference?
> Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
> be entirely equivalent, or I'm really losing it.
>

Yes, I think /* 1 */ is different from /* 2 */ in that /* 1 */ involves
the type of a's struct in the access. /* 2 */ is equivalent to /* 3 */, I
think.

Anyway, the committee says in the union-DR that this is UB:

union A { int a; float b; } u;
u.a = 0;
float *b = &u.b;
*b = 0.f;
// *&u.b = 0.f; // I think this is equivalent

The only way I can use the aliasing rule to get to UB is: The object at "u"
has effective type "union A" (with size sizeof(union A)) and effective type
int (with size sizeof(int)). If you access it with merely "int", you access
the object whose effective type is A with an lvalue of type int, and
have undefined behavior.

But I don't think that this makes sense. It would mean the following is
UB too:

struct A { int a; } b;
b.a = 0;
*&b.a = 0;

Same situation. We access an object whose effective type is struct A by
an lvalue of type int. So I can't follow the committee's intent here
anyway. I.e., whatever you might think about /* 1 */ and /* 2 */ having
apparently different semantics, I can neither explain nor understand the
extent of it.

> As a naive understanding for the difference between 1 and 2: The
> addressof operator (&) simply returns the address of the object
> referred to by the lvalue, and then the dereference operator (*) simply
> takes that pointer value and returns back the same lvalue (which
> refers to the same object). This isn't operator overloading in C++. I
> would think that it ought to be a noop. If there is any difference at
> all between any of 1, 2, and 3 above in this post, then I have a
> fundamental misunderstanding of the language.

I thought we agreed that "a.b = ..." and "*x = ..." are different in
that the type of "a" has some influence on the access, in order to deem
the following UB.

typedef struct A { int a; } A;
typedef struct B { int a; } B;
A *x = malloc(sizeof *x);
x->a = 0; // access with effective type A and int
((B*)x)->a = 0; // I thought we agreed this is UB
// and committee intent.

I think *I* am misunderstanding the matter rather than you.

Tim Rentsch

 02-08-2011
"Johannes Schaub (litb)" <(E-Mail Removed)> writes:

> On 07.02.2011 12:10, Tim Rentsch wrote:
>> "Johannes Schaub (litb)"<(E-Mail Removed)> writes:
>>
>>> [snip]
>>>
>>> In particular, I think the committee intends the spec to say that a struct
>>> or union access expression involves an access with the struct or union
>>> lvalue.
>>>
>>> T1 *p = malloc(sizeof *p);
>>> p->x = 0;
>>>
>>> In this case, I think the committee's intent is that the object pointed to
>>> by "p" is accessed by an lvalue of type T1, and so the effective type of the
>>> object containing the int changes to T1. So a later cast and access by an
>>> lvalue of T2 will be undefined behavior.

>>
>> I'm not aware of any evidence that supports this theory (ie,
>> that using '.' or '->' is also an access for the left operand).
>> Furthermore it seems to be in conflict with the definitions the
>> Standard gives for access, value, etc.
>>
>> Do you have any such evidence to offer? Or are you simply
>> stating an unsupported opinion?

>
> The committee argues that way in the union DR. See
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm .

Actually they don't. You might infer that's what they are thinking,
but no such position is stated, nor is it necessary to reach the
conclusions they reach.

Johannes Schaub (litb)

 02-08-2011
On 08.02.2011 04:17, Tim Rentsch wrote:
> "Johannes Schaub (litb)"<(E-Mail Removed)> writes:
>
>> On 07.02.2011 21:11, Joshua Maurice wrote:
>>> On Feb 7, 8:09 am, "Johannes Schaub (litb)"
>>> <(E-Mail Removed)> wrote:
>>>> Joshua Maurice wrote:
>>>>> On Feb 6, 3:34 pm, "Johannes Schaub (litb)"
>>>>> <(E-Mail Removed)> wrote:
>>>>>> [snipped]
>>>>> To be clear, you think that there's a difference between
>>>>> a->x = 2;
>>>>> and
>>>>> int* x =& a->x;
>>>>> *x = 2;
>>>>> ?
>>>>
>>>>> It would take me a long time to buy that.
>>>>
>>>> Yes I think there is a difference between the two. The first uses the struct
>>>> for the access. The second does not.
>>>
>>> I never really considered this beyond a first glance.
>>>
>>> Again, to be crystal clear, consider:
>>> /* 1 */
>>> a -> x = 2;
>>> and
>>> /* 2 */
>>> * (& ( a -> x )) = 2;
>>> and
>>> /* 3 */
>>> int* x =& a->x;
>>> *x = 2;
>>>
>>> You really think there's a difference? Really? Where's the difference?
>>> Between 1 and 2, or 2 and 3? I /hope/ between 1 and 2. 2 and 3 better
>>> be entirely equivalent, or I'm really losing it.
>>>

>>
>> Yes, I think /* 1 */ is different from /* 2 */ in that /* 1 */
>> involves the type of a's struct in the access. /* 2 */ is equivalent to
>> /* 3 */ I think.
>>
>> Anyway, the committee says in the union-DR that this is UB:
>>
>> union A { int a; float b; } u;
>> u.a = 0;
>> float *b =&u.b;
>> *b = 0.f;
>> // *&u.b = 0.f; // i think this is equivalent

>
> Assuming you're talking about DR 236, they say no such thing.

Then I encourage you to tell us what else they say by:

> Committee believes that Example 2 violates the aliasing rules in 6.5 paragraph 7:
>
> "an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)."
> In order to not violate the rules, function f in example should be written as:
> union tag {
> int mi;
> double md;
> } u;
> void f(int *qi, double *qd) {
> int i = *qi + 2;
> u.md = 3.1; // union type must be used when changing effective type
> *qd *= i;
> return;
> }

Johannes Schaub (litb)

 02-08-2011
On 08.02.2011 04:30, Johannes Schaub (litb) wrote:
>> Assuming you're talking about DR 236, they say no such thing.

>
> Then I encourage you to tell us what else they say by:
>
>> Committee believes that Example 2 violates the aliasing rules in 6.5
>> paragraph 7:
>>
>> "an aggregate or union type that includes one of the aforementioned
>> types among its members (including, recursively, a member of a
>> subaggregate or contained union)."
>> In order to not violate the rules, function f in example should be
>> written as:
>> union tag {
>> int mi;
>> double md;
>> } u;
>> void f(int *qi, double *qd) {
>> int i = *qi + 2;
>> u.md = 3.1; // union type must be used when changing effective type
>> *qd *= i;
>> return;
>> }

>

Hm, it seems I may have misunderstood what they say. They actually seem
to say that a write like "*qd = 0" does *not* change the effective type
of the accessed object.

But that seems wrong, because the aliasing rule says that a write
changes the effective type *for that access* and for all further read
accesses. So WTF does the committee say!?