Is the aliasing rule symmetric?

 
 
ptyxs
Guest
Posts: n/a
 
      01-29-2011
On Jan 29, 12:18 am, Joshua Maurice <(E-Mail Removed)> wrote:
> On Jan 26, 8:38 am, Keith Thompson <(E-Mail Removed)> wrote:
>
> > Joshua Maurice <(E-Mail Removed)> writes:
> > > Is one good and the other not? If so, what's the important difference,
> > > and most importantly what part of the standard, if any, can be read to
> > > describe that difference?

>
> > As a matter of style, it's a much more verbose way of saying essentially
> > the same thing.

>
> So let me ask again, to you and anyone else. Is there any difference
> between the two programs:
>
>   #include <stddef.h>
>   #include <stdlib.h>
>   typedef struct T1 { int x; int y; } T1;
>   typedef struct T2 { int x; int y; } T2;
>   int main(void)
>   { T1 *p = malloc(sizeof *p);
>     p->x = 1;
>     p->y = 2;
>     return p->y;
>   }
>
> and
>
>   #include <stddef.h>
>   #include <stdlib.h>
>   typedef struct T1 { int x; int y; } T1;
>   typedef struct T2 { int x; int y; } T2;
>   int main()
>   {
>     void* p = malloc(sizeof(T1));
>     * (int*) (((char*)p) + offsetof(T1, x)) = 1;
>     * (int*) (((char*)p) + offsetof(T1, y)) = 2;
>     return ((T1*)p)->y;
>   }
>
> Specifically, I presume that everyone agrees C and C++ need to
> support the first program with no UB. The interesting questions I have
> concern the second. Does the "return ((T1*)p)->y;" result in UB? Why?
> What's the important difference between these two programs, and
> specifically which parts of the standards explain that
> difference?
>
> Also, if the second program has no UB, can we instead return "return
> ((T2*)p)->y;" on implementations where we've verified that T1 and T2
> have equivalent layout? That is, it might not be a portable program,
> but on those systems where there is no difference in layout, would
> the access through T2 have UB? Why?


Perhaps I missed something, but your first program doesn't compile on
my system:
error: invalid conversion from void* to T1*
 
 
 
 
 
Joshua Maurice
Guest
Posts: n/a
 
      01-29-2011
On Jan 29, 1:15 am, ptyxs <(E-Mail Removed)> wrote:
> On Jan 29, 12:18 am, Joshua Maurice <(E-Mail Removed)> wrote:
> > So let me ask again, to you and anyone else. Is there any difference
> > between the two programs:

>
> >   #include <stddef.h>
> >   #include <stdlib.h>
> >   typedef struct T1 { int x; int y; } T1;
> >   typedef struct T2 { int x; int y; } T2;
> >   int main(void)
> >   { T1 *p = malloc(sizeof *p);
> >     p->x = 1;
> >     p->y = 2;
> >     return p->y;
> >   }

>
> > and

>
> >   #include <stddef.h>
> >   #include <stdlib.h>
> >   typedef struct T1 { int x; int y; } T1;
> >   typedef struct T2 { int x; int y; } T2;
> >   int main()
> >   {
> >     void* p = malloc(sizeof(T1));
> >     * (int*) (((char*)p) + offsetof(T1, x)) = 1;
> >     * (int*) (((char*)p) + offsetof(T1, y)) = 2;
> >     return ((T1*)p)->y;
> >   }

>
> > Specifically, I presume that everyone agrees C and C++ need to
> > support the first program with no UB. The interesting questions I have
> > concern the second. Does the "return ((T1*)p)->y;" result in UB? Why?
> > What's the important difference between these two programs, and
> > specifically which parts of the standards explain that
> > difference?

>
> > Also, if the second program has no UB, can we instead return "return
> > ((T2*)p)->y;" on implementations where we've verified that T1 and T2
> > have equivalent layout? That is, it might not be a portable program,
> > but on those systems where there is no difference in layout, would
> > the access through T2 have UB? Why?

>
> Perhaps I missed something, but your first program doesn't compile on
> my system:
> error: invalid conversion from void* to T1*


Sorry. I was thinking in C, where void pointers implicitly convert
to any other object pointer type. Add an explicit cast to make it legal
in both C and C++ (though very unidiomatic C), as follows:
{ T1 *p = (T1*) malloc(sizeof *p);
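
For reference, a minimal sketch of the corrected first program in a form that compiles as C++. The function name `program1_result` is made up for illustration, and `static_cast` stands in for the C-style cast. Whether the member writes are formally blessed before C++20's implicit object creation is part of what this thread is arguing about, so take it as an illustration of the cast and nothing more.

```cpp
#include <cassert>
#include <cstdlib>

struct T1 { int x; int y; };

// Hypothetical wrapper around the body of program 1.
int program1_result() {
    // C++ has no implicit void* -> T1* conversion, so the cast is required.
    T1* p = static_cast<T1*>(std::malloc(sizeof *p));
    if (p == nullptr) return -1;   // allocation failed
    p->x = 1;
    p->y = 2;
    int y = p->y;                  // read back through the same T1 lvalue
    std::free(p);
    return y;
}
```

Calling `program1_result()` should give 2 unless the allocation fails.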
 
 
 
 
 
dSpam@arcor.de
Guest
Posts: n/a
 
      02-03-2011
On 29 Jan., 00:18, Joshua Maurice <(E-Mail Removed)> wrote:
> ...
> Also, if the second program has no UB, can we instead return "return
> ((T2*)p)->y;" on implementations where we've verified that T1 and T2
> have equivalent layout? That is, it might not be a portable program,
> but on those systems where there is no difference in layout, would
> the access through T2 have UB? Why?


Behavior, upon use of a nonportable program construct, for which the C
Standard imposes no requirements is, by definition 3.4.3, so-called
"undefined behavior".
(That doesn't imply that an existing implementation would behave
unpredictably. If you don't care about portability and have found by
inspection of code, data, machine architecture, whatever, that your
program behaves predictably, then don't worry.)
 
 
Joshua Maurice
Guest
Posts: n/a
 
      02-03-2011
On Feb 2, 11:26 pm, "(E-Mail Removed)" <(E-Mail Removed)> wrote:
> On 29 Jan., 00:18, Joshua Maurice <(E-Mail Removed)> wrote:
>
> > ...
> > Also, if the second program has no UB, can we instead return "return
> > ((T2*)p)->y;" on implementations where we've verified that T1 and T2
> > have equivalent layout? That is, it might not be a portable program,
> > but on those systems where there is no difference in layout, would
> > the access through T2 have UB? Why?

>
> Behavior, upon use of a nonportable program construct, for which the C
> Standard imposes no requirements is, by definition 3.4.3, so-called
> "undefined behavior".
> (That doesn't imply that an existing implementation would behave
> unpredictably. If you don't care about portability and have found by
> inspection of code, data, machine architecture, whatever, that your
> program behaves predictably, then don't worry.)


I would kindly ask you to read the rest of this thread, and realize
that I am quite well versed on the issues at hand, and I understand
that the intent of the C standards committee was to make the following
program have undefined behavior.

#include <stdlib.h>
typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;
int main(void)
{ T1 *p = malloc(sizeof *p);
  p->x = 1;
  p->y = 2;
  return ((T2*)p)->y;
}

I don't mean to dispute that's how people understand the standard. I
don't plan on writing any code any time soon that violates the well
understood intent of the standard.

However, that's not the rules as written. What I do want to discuss is
if there's any sensible reading of the standard as written which can
give the desired conclusion, while preserving idiomatic usages of C,
such as casting the return of malloc to a struct type pointer, and
assigning to members of that struct.

I note how you did not answer any of my questions, and instead read
the standard line given to those new to the issues. I understand that
this is a generally acceptable method of imparting information, but it
does not apply in this case.

Again: Which of the following programs has UB as written: 1, 2, both, or
neither? Why? Please quote exact parts of the standard (C or C++) with
thorough reasoning. Which of the following would have UB if the return
were replaced with "return ((T2*)p)->y;": 1, 2, both, or neither? Why?
Please quote exact parts of the standard (C or C++) with thorough
reasoning.

//program 1
#include <stddef.h>
#include <stdlib.h>
typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;
int main(void)
{ T1 *p = (T1*) malloc(sizeof *p);
  p->x = 1;
  p->y = 2;
  return p->y;
}

//program 2
#include <stddef.h>
#include <stdlib.h>
typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;
int main()
{
  void* p = malloc(sizeof(T1));
  * (int*) (((char*)p) + offsetof(T1, x)) = 1;
  * (int*) (((char*)p) + offsetof(T1, y)) = 2;
  return ((T1*)p)->y;
}

PS: I hope the intended answer is both do not have UB as written. I
understand the answer that the first would have UB if the return was
changed to "return ((T2*)p)->y;", but even I cannot grasp at the
straws to come to the conclusion that program 2 would have UB if the
return was changed to "return ((T2*)p)->y;".

Note that the above is all assuming a particular implementation where
sizeof(T1) == sizeof(T2), and offsetof(T1, y) == offsetof(T2, y). I
know it's definitely not portable, but I don't see any /rules as
written/ which demand UB on all platforms. (offsetof(T1, x) == 0 and
offsetof(T2, x) == 0 by an already existing guarantee in the C and C++
standards.)
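
The layout assumption can at least be stated so that the compiler checks it. Below is a minimal C++ sketch. The helpers `copy_as_t2` and `demo_copy_y` are hypothetical names, and the `memcpy` route is shown only as the conventional way to sidestep the lvalue-access question; it is not something proposed earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct T1 { int x; int y; };
struct T2 { int x; int y; };

// Compile-time statement of the layout assumption; the build fails
// on any implementation where it does not hold.
static_assert(sizeof(T1) == sizeof(T2), "T1 and T2 differ in size");
static_assert(offsetof(T1, x) == 0 && offsetof(T2, x) == 0, "x is not first");
static_assert(offsetof(T1, y) == offsetof(T2, y), "y sits at different offsets");

// Hypothetical helper: copies the bytes of a T1 into a distinct T2 object,
// so no T1 object is ever read through a T2 lvalue.
T2 copy_as_t2(const T1& src) {
    T2 dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}

// Hypothetical demo: build a T1, copy it, read y from the T2 copy.
int demo_copy_y() {
    T1 a{1, 2};
    return copy_as_t2(a).y;
}
```

On an implementation where the assertions pass, `demo_copy_y()` should return 2.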

offsetof is just a macro which evaluates to an integer. So what if I
passed T1 to it? It shouldn't matter. I should be able to hardcode 4
in place of offsetof(T1, y) and offsetof(T2, y) and expect it to work
on some platforms, like the common x86 win32. I see nothing in program
2 that says we have an object of type or effective type T1 or T2. I
see only writes through int lvalues. Moreover, I see little to no
difference between program 1 and program 2 in this regard: I see
little reason to talk about an object with type or effective type T1
or T2 in program 1.
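
To make the "only int lvalues" point concrete, here is a minimal C++ sketch that stores and loads exclusively through `int` lvalues at computed offsets, never naming a `T1` or `T2` lvalue. The function name `write_and_read_by_offset` is made up, and whether every revision of the standards fully blesses this pattern is the very question under discussion, so read it as an illustration and not a ruling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct T1 { int x; int y; };

// Hypothetical demo: T1 is used only inside sizeof and offsetof,
// i.e. only to compute integers.
int write_and_read_by_offset() {
    void* p = std::malloc(sizeof(T1));
    if (p == nullptr) return -1;   // allocation failed
    char* bytes = static_cast<char*>(p);
    int* px = reinterpret_cast<int*>(bytes + offsetof(T1, x));
    int* py = reinterpret_cast<int*>(bytes + offsetof(T1, y));
    *px = 1;                       // write through an int lvalue
    *py = 2;                       // write through an int lvalue
    int sum = *px + *py;           // reads through int lvalues
    std::free(p);
    return sum;
}
```

On mainstream implementations `write_and_read_by_offset()` should return 3.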

This is especially true in light of the rules for volatile, and the
rules of POSIX, win32 (maybe?) and C++0x threading, which heavily
interact with the definition of "access". What does "access" mean? I
would argue that if that word is to have any meaning, it means exactly
a read, a write, or both. What does it mean to access an object of
struct type with a member expression, e.g. "x.y"? It definitely doesn't
read the full struct x, nor write the full struct x. Hell, it doesn't
even imply a read or a write, e.g. "int * a = & x.y;". From our well
understood knowledge of threading, that is neither a read nor a write
of "x" or "x.y". So, what can the strict aliasing rules say about
this? In the above programs, there is no single expression of which we
can say that it "accesses" an object through a T1 or T2 lvalue, unless we
want to start using two different contradictory definitions of the
word "access". The same conclusion can be reached through a discussion
of the observable behavior requirements of volatile objects.
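
A small C++ sketch of the distinction drawn above between forming a member lvalue and actually reading or writing through it. The struct `S` and the function `address_then_write` are hypothetical names used only for illustration.

```cpp
#include <cassert>

struct S { int x; int y; };

// Hypothetical demo separating "naming a member" from "accessing" it.
int address_then_write() {
    S s{1, 2};
    int* a = &s.y;   // forms a pointer; no read or write of s or s.y happens here
    *a = 5;          // the write, performed through an int lvalue
    return s.y;      // the read, of the int member only
}
```

`address_then_write()` returns 5; at no point is the whole struct read or written as a unit.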

Let me again emphasize that I don't plan to write production code like
this ever, but these simple examples elucidate the actual scope and
effect of the standards, such as the proper and correct way to write a
pooling memory allocator on top of malloc or the new operator.
Specifically, at least in the C++ case, we need to know when the
lifetimes of objects begin and end, and what it means to access an
object through an incorrectly typed lvalue.
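
As one possible reading of the pooling-allocator case in C++, the sketch below uses placement new to mark where each object's lifetime begins, so every access goes through an lvalue of the type most recently constructed in the slot. The `Pool` class and `reuse_slot_as_t2` are hypothetical and deliberately minimal (no freeing of individual slots, no thread safety).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct T1 { int x; int y; };
struct T2 { int x; int y; };

// Hypothetical fixed-size pool: one malloc'ed slab handed out slot by slot.
class Pool {
public:
    Pool(std::size_t slot_size, std::size_t slots)
        : slot_size_(round_up(slot_size)), slots_(slots),
          base_(static_cast<unsigned char*>(std::malloc(slot_size_ * slots))) {}
    ~Pool() { std::free(base_); }
    Pool(const Pool&) = delete;
    Pool& operator=(const Pool&) = delete;

    // Returns raw storage for one slot, or nullptr when exhausted.
    void* take() {
        if (base_ == nullptr || next_ == slots_) return nullptr;
        return base_ + slot_size_ * next_++;
    }

private:
    // Keep every slot aligned for any fundamental type.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }
    std::size_t slot_size_;
    std::size_t slots_;
    std::size_t next_ = 0;
    unsigned char* base_;
};

// Hypothetical demo: the same slot holds a T1, then a T2.
int reuse_slot_as_t2() {
    Pool pool(sizeof(T1) > sizeof(T2) ? sizeof(T1) : sizeof(T2), 1);
    void* raw = pool.take();
    if (raw == nullptr) return -1;
    T1* a = new (raw) T1{1, 2};    // lifetime of a T1 begins here
    int y1 = a->y;
    T2* b = new (raw) T2{3, 4};    // storage reused: the T1 ends, a T2 begins
    return y1 + b->y;
}
```

`reuse_slot_as_t2()` should return 6 (2 from the T1, 4 from the T2) unless the allocation fails.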

Again, finally, as far as I can see, the only way to make program 1
have UB with the T2 cast return is to invent some rules which
explicitly mention data dependency analysis in the effective type
rules of C, in the object lifetime rules of C++ (or just copy the
effective type rules of C for POD types into C++), and in the allowed
lvalue access rules, aka the strict aliasing rules of C and C++. When
you consider the implications raised with volatile and threading which
heavily interact with the definition of "access", this seems like the
only way out.

Or, if you can prove me wrong, and help me understand an error of
mine, please do so.
 
 
Johannes Schaub (litb)
Guest
Posts: n/a
 
      02-06-2011
Joshua Maurice wrote:

> On Jan 21, 9:20 am, "Johannes Schaub (litb)" <(E-Mail Removed)>
> wrote:
>> Ben Bacarisse wrote:
>> > "Johannes Schaub (litb)" <(E-Mail Removed)> writes:

>>
>> >> Would we be allowed to do this in the opposite direction, if we know
>> >> that the alignment is fine?

>>
>> > What's the opposite direction? Are you asking if changing the int will
>> > change the value of *b? If so, yes (provided the new int value's bits
>> > do indeed affect the byte in question).

>>
>> I mean to ask: If aliasing of an A object by an lvalue of type B is OK,
>> is aliasing of a B object by an lvalue of type A OK?

>
> No. Don't think about it as an aliasing rule. Think about it as a rule
> which restricts the types of lvalues with which you can legally access
> objects.
>
> You can always access an object through a char or unsigned char
> lvalue. (Or maybe it's only for POD types - there's no consensus. I
> would only use char and unsigned char to access POD objects.)
>
> You can always access an object through a base class lvalue, but you
> can never do the reverse: you can never take a complete object of type
> T and access it through a derived type of type T.


You cannot take an object of derived class type and access it as a base
class lvalue either. You always need to point to the proper base class
subobject. If you try to directly access the complete object by a base class
lvalue, you will be lucky if it crashes.

In this sense it's the same for base/derived relationship in both
directions. If the base-class subobject and the complete object have the
same address, you can reinterpret_cast and if you aren't lucky you can
read/write with the resulting lvalue. If you do the proper thing and use an
implicit conversion or an explicit conversion (for the downcast), you have
defined behavior. But that has nothing to do with the aliasing rule. IMO the
respective bullet in 3.10p15 is flawed.

 
 
Johannes Schaub (litb)
Guest
Posts: n/a
 
      02-06-2011
Johannes Schaub (litb) wrote:

> Joshua Maurice wrote:
>
>> On Jan 21, 9:20 am, "Johannes Schaub (litb)" <(E-Mail Removed)>
>> wrote:
>>> Ben Bacarisse wrote:
>>> > "Johannes Schaub (litb)" <(E-Mail Removed)> writes:
>>>
>>> >> Would we be allowed to do this in the opposite direction, if we know
>>> >> that the alignment is fine?
>>>
>>> > What's the opposite direction? Are you asking if changing the int
>>> > will
>>> > change the value of *b? If so, yes (provided the new int value's bits
>>> > do indeed affect the byte in question).
>>>
>>> I mean to ask: If aliasing of an A object by an lvalue of type B is OK,
>>> is aliasing of a B object by an lvalue of type A OK?

>>
>> No. Don't think about it as an aliasing rule. Think about it as a rule
>> which restricts the types of lvalues with which you can legally access
>> objects.
>>
>> You can always access an object through a char or unsigned char
>> lvalue. (Or maybe it's only for POD types - there's no consensus. I
>> would only use char and unsigned char to access POD objects.)
>>
>> You can always access an object through a base class lvalue, but you
>> can never do the reverse: you can never take a complete object of type
>> T and access it through a derived type of type T.

>
> You cannot access an object of derived class type and access it as a base
> class lvalue either. You always need to point to the proper base class
> subobject. If you try to directly access the complete object by a base
> class lvalue, you will be lucky if it crashes.
>
> In this sense it's the same for base/derived relationship in both
> directions. If the base-class subobject and the complete object have the
> same address, you can reinterpret_cast and if you aren't lucky you can
> read/write with the resulting lvalue. If you do the proper thing and use
> an implicit conversion or an explicit conversion (for the downcast), you
> have defined behavior. But that has nothing to do with the aliasing rule.
> IMO the respective bullet in 3.10p15 is flawed.


Having thought about this again, I think the respective bullet is NOT
flawed. The bullet implies that you already have made a successful
conversion and have a proper lvalue.

We do actually have the reverse (accessing a base class object by the derived
class type), by means of "the dynamic type of the object" (first bullet). It
is caught by that, and to my surprise, if you turn the bullet about
the base-class subobject around according to the symmetry rule, you get nearly
the same wording:

- a type that is the (possibly cv-qualified) dynamic class type of
the type of the object

So I think we again see that the following rule seems to be true:

If aliasing of an A object by an lvalue of type B is OK,
then aliasing of a B object by an lvalue of type A is OK.

Please correct me if I'm misunderstanding anything.
 
 
Johannes Schaub (litb)
Guest
Posts: n/a
 
      02-06-2011
Johannes Schaub (litb) wrote:

> Joshua Maurice wrote:
>
>> On Jan 21, 9:20 am, "Johannes Schaub (litb)" <(E-Mail Removed)>
>> wrote:
>>> Ben Bacarisse wrote:
>>> > "Johannes Schaub (litb)" <(E-Mail Removed)> writes:
>>>
>>> >> Would we be allowed to do this in the opposite direction, if we know
>>> >> that the alignment is fine?
>>>
>>> > What's the opposite direction? Are you asking if changing the int
>>> > will
>>> > change the value of *b? If so, yes (provided the new int value's bits
>>> > do indeed affect the byte in question).
>>>
>>> I mean to ask: If aliasing of an A object by an lvalue of type B is OK,
>>> is aliasing of a B object by an lvalue of type A OK?

>>
>> No. Don't think about it as an aliasing rule. Think about it as a rule
>> which restricts the types of lvalues with which you can legally access
>> objects.
>>
>> You can always access an object through a char or unsigned char
>> lvalue. (Or maybe it's only for POD types - there's no consensus. I
>> would only use char and unsigned char to access POD objects.)
>>
>> You can always access an object through a base class lvalue, but you
>> can never do the reverse: you can never take a complete object of type
>> T and access it through a derived type of type T.

>
> You cannot access an object of derived class type and access it as a base
> class lvalue either. You always need to point to the proper base class
> subobject. If you try to directly access the complete object by a base
> class lvalue, you will be lucky if it crashes.
>
> In this sense it's the same for base/derived relationship in both
> directions. If the base-class subobject and the complete object have the
> same address, you can reinterpret_cast and if you aren't lucky you can
> read/write with the resulting lvalue. If you do the proper thing and use
> an implicit conversion or an explicit conversion (for the downcast), you
> have defined behavior. But that has nothing to do with the aliasing rule.
> IMO the respective bullet in 3.10p15 is flawed.


Having thought about this again, I think the respective bullet is NOT
flawed. The bullet implies that you already have made a successful
conversion and have a proper lvalue.

I think we do actually have the reverse (access a base class object by the
derived class type),

- a type that is a (possibly cv-qualified) derived class type of the
dynamic type of the object

Converting from a "Base&" to a "Derived&" is already UB if the complete
object of the object referred to is neither of type "Derived" nor of
a type derived from Derived. So this rule too assumes that we have a proper
lvalue, and thus the symmetric equivalent to that bullet as above is true
too.
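
A minimal sketch of a checked downcast, for the case where it is not known up front that the complete object really is a Derived. The types `Base` and `Derived` and the helper names are hypothetical, and `Base` is given a virtual destructor only because `dynamic_cast` needs a polymorphic type.

```cpp
#include <cassert>

struct Base { virtual ~Base() = default; int v = 1; };
struct Derived : Base { int w = 2; };

// Returns d.w when b refers to a Derived (or something derived from it), else -1.
int read_w_if_derived(Base& b) {
    if (Derived* d = dynamic_cast<Derived*>(&b)) return d->w;
    return -1;   // a static_cast<Derived&>(b) here would have been UB
}

// Hypothetical demos covering both outcomes.
int demo_on_derived() { Derived d; return read_w_if_derived(d); }
int demo_on_base()    { Base b;    return read_w_if_derived(b); }
```

`demo_on_derived()` returns 2 and `demo_on_base()` returns -1.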

So I think we again see that the following rule seems to be true:

If aliasing of an A object by an lvalue of type B is OK,
then aliasing of a B object by an lvalue of type A is OK.

Please correct me if I'm misunderstanding anything.
 
 
Joshua Maurice
Guest
Posts: n/a
 
      02-06-2011
On Feb 6, 7:46 am, "Johannes Schaub (litb)"
<(E-Mail Removed)> wrote:
> Joshua Maurice wrote:
> > No. Don't think about it as an aliasing rule. Think about it as a rule
> > which restricts the types of lvalues with which you can legally access
> > objects.

>
> > You can always access an object through a char or unsigned char
> > lvalue. (Or maybe it's only for POD types - there's no consensus. I
> > would only use char and unsigned char to access POD objects.)

>
> > You can always access an object through a base class lvalue, but you
> > can never do the reverse: you can never take a complete object of type
> > T and access it through a derived type of type T.

>
> You cannot access an object of derived class type and access it as a base
> class lvalue either. You always need to point to the proper base class
> subobject. If you try to directly access the complete object by a base class
> lvalue, you will be lucky if it crashes.
>
> In this sense it's the same for base/derived relationship in both
> directions. If the base-class subobject and the complete object have the
> same address, you can reinterpret_cast and if you aren't lucky you can
> read/write with the resulting lvalue. If you do the proper thing and use an
> implicit conversion or an explicit conversion (for the downcast), you have
> defined behavior. But that has nothing to do with the aliasing rule. IMO the
> respective bullet in 3.10p15 is flawed.


Indeed and agreed. Pedantic, but still important. This becomes evident
in multiple inheritance and virtual inheritance cases.
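
A minimal multiple-inheritance sketch of why the proper conversion matters. The types `A`, `B`, `D` and the function `conversions_round_trip` are hypothetical. The implicit conversion and the `static_cast` are allowed to adjust the pointer to the right subobject, whereas a `reinterpret_cast<B*>(&obj)` would apply no adjustment and is deliberately not exercised here.

```cpp
#include <cassert>

struct A { int a; };
struct B { int b; };
struct D : A, B { int d; };

// Hypothetical demo: upcast to the second base and back again.
bool conversions_round_trip() {
    D obj;
    obj.a = 1;
    obj.b = 2;
    obj.d = 3;
    B* pb = &obj;                    // implicit conversion selects the B subobject
    D* back = static_cast<D*>(pb);   // explicit downcast undoes the adjustment
    return pb->b == 2 && back == &obj && back->d == 3;
}
```

`conversions_round_trip()` returns true; on typical ABIs `pb` does not hold the same address as `&obj`, which is exactly what a bare `reinterpret_cast` would get wrong.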
 
 
Joshua Maurice
Guest
Posts: n/a
 
      02-06-2011
On Feb 6, 7:55 am, "Johannes Schaub (litb)"
<(E-Mail Removed)> wrote:
> Johannes Schaub (litb) wrote:
> > Joshua Maurice wrote:

>
> >> On Jan 21, 9:20 am, "Johannes Schaub (litb)" <(E-Mail Removed)>
> >> wrote:
> >>> Ben Bacarisse wrote:
> >>> > "Johannes Schaub (litb)" <(E-Mail Removed)> writes:

>
> >>> >> Would we be allowed to do this in the opposite direction, if we know
> >>> >> that the alignment is fine?

>
> >>> > What's the opposite direction? Are you asking if changing the int
> >>> > will
> >>> > change the value of *b? If so, yes (provided the new int value's bits
> >>> > do indeed affect the byte in question).

>
> >>> I mean to ask: If aliasing of an A object by an lvalue of type B is OK,
> >>> is aliasing of a B object by an lvalue of type A OK?

>
> >> No. Don't think about it as an aliasing rule. Think about it as a rule
> >> which restricts the types of lvalues with which you can legally access
> >> objects.

>
> >> You can always access an object through a char or unsigned char
> >> lvalue. (Or maybe it's only for POD types - there's no consensus. I
> >> would only use char and unsigned char to access POD objects.)

>
> >> You can always access an object through a base class lvalue, but you
> >> can never do the reverse: you can never take a complete object of type
> >> T and access it through a derived type of type T.

>
> > You cannot access an object of derived class type and access it as a base
> > class lvalue either. You always need to point to the proper base class
> > subobject. If you try to directly access the complete object by a base
> > class lvalue, you will be lucky if it crashes.

>
> > In this sense it's the same for base/derived relationship in both
> > directions. If the base-class subobject and the complete object have the
> > same address, you can reinterpret_cast and if you aren't lucky you can
> > read/write with the resulting lvalue. If you do the proper thing and use
> > an implicit conversion or an explicit conversion (for the downcast), you
> > have defined behavior. But that has nothing to do with the aliasing rule.
> > IMO the respective bullet in 3.10p15 is flawed.

>
> Having thought about this again, I think the respective bullet is NOT
> flawed. The bullet implies that you already have made a successful
> conversion and have a proper lvalue.
>
> We do actually have the reverse (accessing a base class object by the derived
> class type), by means of "the dynamic type of the object" (first bullet). It
> is caught by that, and to my surprise, if you turn the bullet about
> the base-class subobject around according to the symmetry rule, you get nearly
> the same wording:
>
>   - a type that is the (possibly cv-qualified) dynamic class type of
>     the type of the object


Implicit in that entire piece of standard is that you obtained that
lvalue through a "proper" explicit or implicit conversion or cast. If
you start throwing around reinterpret_casts, then it's quite easy to
break it. Consider:
struct A { int x; };
struct B : A {};
int main()
{
  B b;
  b.x = 1;
  A* a = & b;
  return a->x;
}
Now, what's left is quite pedantic, and I'm not sure of the exact
nomenclature. When I access the base class subobject a la "return
a->x;", is that considered "accessing the stored value of the [derived
class] object" according to the wording of C++03 "3.10 Lvalues and
rvalues / 15"? I presume yes. Those bullets are there just as an
allowance that you /can/ access base class subobjects through base
class type lvalues and through lvalues of the member types, and you
can access the object through the dynamic type of the object. It
doesn't mention that the lvalues must have been properly obtained;
the following is an example of improperly obtaining the lvalue:
int main()
{
  B b;
  b.x = 1;
  A* a = reinterpret_cast<A*>(&b);
  return a->x;
}
Is the above UB? I don't know. Maybe? Either way you should never do
it. It definitely is UB if we have virtual or multiple inheritance.
Where is this fundamental distinction mentioned in the standard?
Nowhere that I can see.

> So I think we again see that the following rule seems to be true:
>
>     If aliasing of an A object by an lvalue of type B is OK,
>     then aliasing of a B object by an lvalue of type A is OK.
>
> Please correct me if I'm misunderstanding anything.


Well, yes. If you have an A object, and you can access a sub-object of
that, or a containing object of that, through a B lvalue, then you can
definitely take that same B object, and access the corresponding A
object through an A lvalue. Are you trying to say something more?
 