# Is the aliasing rule symmetric?

James Kanze

 01-24-2011
On Jan 22, 10:29 pm, Joshua Maurice <(E-Mail Removed)> wrote:
> On Jan 22, 11:36 am, Seebs <(E-Mail Removed)> wrote:
> > On 2011-01-22, James Kuyper <(E-Mail Removed)> wrote:

> > > On 01/21/2011 06:36 PM, Joshua Maurice wrote:
> > > ...
> > >> No. Don't think about it as an aliasing rule. Think about it as a rule
> > >> which restricts the types of lvalues with which you can legally access
> > >> objects.
> > > What distinction is there between an aliasing rule and that description?
> > > That description seems, to me, to be a fairly good definition of what
> > > "aliasing rule" means, at least in the context of C or C++.

> > int foo(int *i1, int *i2) {
> > *i1 = 1;
> > *i2 = 2;
> > return *i1;
> > }

> > The reason this might return either 1 or 2 is aliasing, but has nothing
> > to do with the types with which you can legally access objects.

> Apparently it was cross-posted, and I didn't notice, and someone
> replied without the cross posting, and I definitely didn't notice.

> Seebs is right that
> int foo(int *i1, int *i2) {
> *i1 = 1;
> *i2 = 2;
> return *i1;
> }
> If i1 and i2 alias, then it returns 2. If they don't alias, then it
> returns 1. This function doesn't have a violation of the "strict
> aliasing rules", which would be better called "effective type access
> rules".
>
> Let's consider this function though:
> int foo(int* x, short* y)
> {
> *x = 1;
> *y = 2;
> return 1;
> }
> int bar(int* x, short* y)
> {
> *x = 1;
> *y = 2;
> return *x;
> }

> Let's consider functions foo and bar. Let's suppose that x and
> y alias in both. For function foo, there is no undefined
> behavior even though both alias (at least according to what
> appears to be the prominent interpretation of these rules).

I'm not sure about C here, but in C++, there is definitely
undefined behavior in foo if x and y alias. In fact, there
would be undefined behavior even if foo were simply:

void foo(const int* x, const short* y)
{
printf("%d, %d\n", *x, *y);
}

If the two pointers point to the same physical address, there is
no way that the memory they point to can be both an int and
a short. And the C++ standard clearly says:
If a program attempts to access the stored value of an
object through an lvalue of other than one of the following
types the behavior is undefined:
[...]
and short for int or vice versa isn't in the list. And I'm
certain that the intent in C is the same: C definitely allows
trapping representations for integer values, and reading part of
an int as a short could conceivably result in a trapping
representation for a short. (Think of a one's complement
machine which traps on -0.)

The problem becomes more interesting if we replace short with
unsigned char. In that case, my version is legal and defined
behavior: accessing a stored value through an lvalue of char or
unsigned char type is in the list after the cited paragraph.
(IIRC, in C, this exception only applies to unsigned char; for
some reason, C++ added plain char to the list.) But what about
the original version, which modifies? Is modifying an int
through an unsigned char* undefined behavior? What if it
results in a trapping representation in the int? Or is it just
undefined behavior if you access the int? And of course,
modifying all of the bytes in the int, from a bytewise copy of
another int, has to be fully defined behavior.
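
A minimal sketch of that last case, copying every byte of one int into another through unsigned char lvalues. The helper names copy_int_bytewise and copied_value are made up for illustration; the sketch relies only on unsigned char being on the list of permitted access types.

```cpp
#include <cassert>
#include <cstddef>

// Copy every byte of *src into *dst through unsigned char lvalues.
// Because all sizeof(int) bytes come from a valid int, *dst ends up
// holding the same valid int value.
inline void copy_int_bytewise(int* dst, const int* src)
{
    unsigned char* d = reinterpret_cast<unsigned char*>(dst);
    const unsigned char* s = reinterpret_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < sizeof(int); ++i)
        d[i] = s[i];
}

// Hypothetical wrapper: round-trip a value through the bytewise copy.
inline int copied_value(int v)
{
    int dst = 0;
    copy_int_bytewise(&dst, &v);
    return dst;
}
```

The open question in the paragraph above is the partial case: overwriting only some bytes of the int and then reading it as an int. This sketch does not address that.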

--
James Kanze

Joshua Maurice

 01-24-2011
On Jan 24, 2:49 am, James Kanze <(E-Mail Removed)> wrote:
> On Jan 22, 10:29 pm, Joshua Maurice <(E-Mail Removed)> wrote:
> > Let's consider this function though:
> >   int foo(int* x, short* y)
> >   {
> >     *x = 1;
> >     *y = 2;
> >     return 1;
> >   }
> >   int bar(int* x, short* y)
> >   {
> >     *x = 1;
> >     *y = 2;
> >     return *x;
> >   }
> > Let's consider functions foo and bar. Let's suppose that x and
> > y alias in both. For function foo, there is no undefined
> > behavior even though both alias (at least according to what
> > appears to be the prominent interpretation of these rules).
>
> I'm not sure about C here, but in C++, there is definitely
> undefined behavior in foo if x and y alias. In fact, there
> would be undefined behavior even if foo were simply:
>
>     void foo(const int* x, const short* y)
>     {
>         printf("%d, %d\n", *x, *y);
>     }
>
> If the two pointers point to the same physical address, there is
> no way that the memory they point to can be both an int and
> a short. And the C++ standard clearly says:
>     If a program attempts to access the stored value of an
>     object through an lvalue of other than one of the following
>     types the behavior is undefined:
>     [...]
> and short for int or vice versa isn't in the list. And I'm
> certain that the intent in C is the same: C definitely allows
> trapping representations for integer values, and reading part of
> an int as a short could conceivably result in a trapping
> representation for a short. (Think of a one's complement
> machine which traps on -0.)
>
> The problem becomes more interesting if we replace short with
> unsigned char. In that case, my version is legal and defined
> behavior: accessing a stored value through an lvalue of char or
> unsigned char type is in the list after the cited paragraph.
> (IIRC, in C, this exception only applies to unsigned char; for
> some reason, C++ added plain char to the list.) But what about
> the original version, which modifies? Is modifying an int
> through an unsigned char* undefined behavior? What if it
> results in a trapping representation in the int? Or is it just
> undefined behavior if you access the int? And of course,
> modifying all of the bytes in the int, from a bytewise copy of
> another int, has to be fully defined behavior.

Is the following a well-formed C++ program without UB?

#include <cstdlib>
using namespace std;
int main()
{
void* p = malloc(sizeof(int) + sizeof(float));
int*x = (int*) p;
*x = 1;
}

What about the following?

#include <cstdlib>
using namespace std;
int main()
{
void* p = malloc(sizeof(int) + sizeof(float));
int*x = (int*) p;
*x = 1;
float* y = (float*) p;
*y = 1;
}

issues, and I've gotten 0 replies. It's quite frustrating.

In short, I would argue that both of the above programs have no UB in
C++, nor their equivalent program in C. You need both programs above to
have no UB in order to have user-space memory allocators in standard
conforming C++. I think that the standard's intent is not to forbid
user-space C++ standard conforming pooling memory allocators.

Let's look at "3.8 Object Lifetime / 1, 2, 4, 5, 6, and 7". Each of
those sections makes reference to "reusing storage", something which is
distinct from "releasing the storage". "Reusing the storage" of an
object ends that object's lifetime. What else can this mean besides
the following?

void* p = malloc(sizeof(int) + sizeof(float));
int*x = (int*) p;
*x = 1;
float* y = (float*) p;
*y = 1; /* reuse of storage, the int object's lifetime ends, and
the float object's lifetime begins */
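
For comparison, here is a sketch of the same storage reuse written with placement new, which is the form the standard spells out for starting an object's lifetime in existing storage. It sidesteps the question above (whether a plain write is enough) and does not answer it. The function name reuse_storage is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical helper: use one malloc'ed block first as an int, then as a float.
inline float reuse_storage()
{
    void* p = std::malloc(sizeof(int) + sizeof(float));
    if (!p)
        return -1.0f;                 // allocation failed

    int* x = new (p) int(1);          // an int object's lifetime begins here
    int seen = *x;                    // read it while it is alive

    float* y = new (p) float(2.5f);   // storage reused: the int ends, a float begins
    float result = *y + static_cast<float>(seen);

    std::free(p);                     // int and float are trivially destructible
    return result;
}
```

A pooling allocator written this way hands out raw storage and leaves object creation to placement new at the call site, so it never depends on a bare assignment creating an object.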

Furthermore, let's look at the rules in "3.8 Object Lifetime". "3.8
Object Lifetime / 1" is actually nonsensical as written. Consider:
void* p = malloc(sizeof(char));
Well, we've allocated storage with proper alignment and type for an
arbitrarily large number of types, and if those types have a trivial
constructor, such as:
struct T1 {};
struct T2 {};
struct T3 {};
//etc.
then an object of each of those types exists at that location. So, an
arbitrarily large number of distinct complete objects coexist in "*p"
according to that reading of the rules, which is entirely
nonsensical.

Unfortunately, as I've expounded at length in the thread on
comp.std.c++, the sensible way forward isn't clear. However, some of the proposed
changes to C++0x in 3.10 / 15 are taking the language in quite the
wrong direction IMO.

We need to solve a couple of basic problems. The most important and
basic is: when does the lifetime of a POD class even begin? Consider:

#include <cstdlib>
using namespace std;

struct T1 { int x; int y; };
struct T2 { int x; int y; };

int main()
{
void* p = 0;
T1 * t1 = 0;
T2 * t2 = 0;
int * x = 0;

if (sizeof(T1) != sizeof(T2))
return 1;
if ( (char*)(& t1->y) - (char*) (& t1) != (char*)(& t2->y) -
(char*) (& t2) )
return 1;

p = malloc(sizeof(T1));
/* Do we have a T1 object here? Presumably no. Otherwise we also
have a T2 object here, and we definitely don't want to start talking
about two distinct complete objects occupying the same storage at the
same time. */

t1 = (T1*) p;
/* T1 object yet? Presumably the answer hasn't changed since the
above comment. */

x = & t1->x;
/* T1 object yet? */

*x = 1;
/* Do we have a T1 object here? Maybe. I just see a write through
an int lvalue. I see no writes nor reads through a T1 lvalue. I see
nothing that favors T1 over T2, besides some sort of data dependency
analysis through the member-of operator. However, there isn't even a
hint of data dependency analysis in the standard with regards to [...] */

x = & t1->y;
*x = 2;
/* Do we have a T1 object here? The answer must be yes, or we'll
never have a T1 object. However, again, I see nothing to favor having
a T1 object over a T2 object besides data dependency analysis through
the member-of operator. */

t2 = (T2*) p;
return t2->y; /* UB? Why? Why is reading "t1->y" not UB, but
reading "t2->y" is UB? In other words, why do we have a T1 object, but
not a T2 object? */
}

Also, what if we used offsetof hackery to initialize both int members
of the T1 object without using a member-of operator on a T1 lvalue?

As far as I can tell, gcc doesn't even bother doing aliasing analysis
on anything besides primitive types, for exactly the reasons outlined
above. They must not have seen a sensible way to differentiate between
T1 and T2, just as I cannot.

Joshua Maurice

 01-24-2011
On Jan 24, 3:44 pm, Joshua Maurice <(E-Mail Removed)> wrote:
> On Jan 24, 2:49 am, James Kanze <(E-Mail Removed)> wrote:
>     if ( (char*)(& t1->y) - (char*) (& t1) != (char*)(& t2->y) -
> (char*) (& t2) )
>       return 1;

Err, let me fix that. That should be:

if ( (char*)(& t1->y) - (char*) (t1) != (char*)(& t2->y) -
(char*) (t2) )
return 1;

The goal is to test if they have the same layout, and only run the
rest of the program if they do.
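
The same layout test can also be sketched with offsetof from <cstddef>, which avoids forming member addresses from t1 and t2 while they are still null. T1 and T2 are repeated from the program above so the block stands alone; same_layout is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

struct T1 { int x; int y; };
struct T2 { int x; int y; };

// True when both structs have the same size and the same member offsets.
inline bool same_layout()
{
    return sizeof(T1) == sizeof(T2)
        && offsetof(T1, x) == offsetof(T2, x)
        && offsetof(T1, y) == offsetof(T2, y);
}
```

Like the pointer-arithmetic version, this only compares layout. It says nothing about which of the two types actually has a live object in the malloc'ed storage, which is the question the program is really asking.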

Ben Bacarisse

 01-25-2011

This is cross posted and I don't think the C side of the questions have

Joshua Maurice <(E-Mail Removed)> writes:

> On Jan 24, 2:49 am, James Kanze <(E-Mail Removed)> wrote:
>> On Jan 22, 10:29 pm, Joshua Maurice <(E-Mail Removed)> wrote:
>> > Let's consider this function though:
>> >   int foo(int* x, short* y)
>> >   {
>> >     *x = 1;
>> >     *y = 2;
>> >     return 1;
>> >   }
>> >   int bar(int* x, short* y)
>> >   {
>> >     *x = 1;
>> >     *y = 2;
>> >     return *x;
>> >   }
>> > Let's consider functions foo and bar. Let's suppose that x and
>> > y alias in both. For function foo, there is no undefined
>> > behavior even though both alias (at least according to what
>> > appears to be the prominent interpretation of these rules).

>>
>> I'm not sure about C here, but in C++, there is definitely
>> undefined behavior in foo if x and y alias.

If x and y both point to the same allocated object, then neither
function is undefined. The assignments set the "effective type" of the
allocated object.

>> In fact, there
>> would be undefined behavior even if foo were simply:
>>
>>     void foo(const int* x, const short* y)
>>     {
>>         printf("%d, %d\n", *x, *y);
>>     }

Again, this is not always UB when the object being aliased is allocated
rather than declared. When the aliased object is allocated, whether the
accesses are defined or not depends on the effective type of the aliased
allocated object. To be certain of UB when the pointers point to the
same allocated object you need something like this:

void foo(int *x, short *y)
{
*y = 1;
printf("%d\n", *x);
}

The assignment ensures that the effective type of the allocated object
is short, so the second access (the read through x) is undefined.

>> If the two pointers point to the same physical address, there is
>> no way that the memory they point to can be both an int and
>> a short.

In C it can be if the storage is allocated and only stores are done (as
in the first foo and bar above).

>> And the C++ standard clearly says:
>>     If a program attempts to access the stored value of an
>>     object through an lvalue of other than one of the following
>>     types the behavior is undefined:
>>     [...]
>> and short for int or vice versa isn't in the list. And I'm
>> certain that the intent in C is the same: C definitely allows
>> trapping representations for integer values, and reading part of
>> an int as a short could conceivably result in a trapping
>> representation for a short. (Think of a one's complement
>> machine which traps on -0.)
>>
>> The problem becomes more interesting if we replace short with
>> unsigned char. In that case, my version is legal and defined
>> behavior: accessing a stored value through an lvalue of char or
>> unsigned char type is in the list after the cited paragraph.
>> (IIRC, in C, this exception only applies to unsigned char; for
>> some reason, C++ added plain char to the list.)

In C, the wording is "a character type", which covers char and both
signed and unsigned char. As you say, it is odd (at least at first
glance -- I am not a C++ expert) that C++ added char but not signed
char to the list.

>> the original version, which modifies? Is modifying an int
>> through an unsigned char* undefined behavior? What if it
>> results in a trapping representation in the int? Or is it just
>> undefined behavior if you access the int? And of course,
>> modifying all of the bytes in the int, from a bytewise copy of
>> another int, has to be fully defined behavior.

>
> Is the following a well-formed C++ program without UB?
>
> #include <cstdlib>
> using namespace std;
> int main()
> {
> void* p = malloc(sizeof(int) + sizeof(float));
> int*x = (int*) p;
> *x = 1;
> }

You ask below about the C equivalents of these. This one would be defined
in C given the obvious alterations required.

>
> #include <cstdlib>
> using namespace std;
> int main()
> {
> void* p = malloc(sizeof(int) + sizeof(float));
> int*x = (int*) p;
> *x = 1;
> float* y = (float*) p;
> *y = 1;
> }

Again, in C, this is well-defined due to the definition of effective type.

> issues, and I've gotten 0 replies. It's quite frustrating.
>
> In short, I would argue that both of the above programs have no UB in C
> ++, nor their equivalent program in C.

<snip much more C++ specific questions>
--
Ben.

Joshua Maurice

 01-25-2011
On Jan 24, 7:15 pm, Ben Bacarisse <(E-Mail Removed)> wrote:
> > What about the following?
>
> >   #include <cstdlib>
> >   using namespace std;
> >   int main()
> >   {
> >     void* p = malloc(sizeof(int) + sizeof(float));
> >     int*x = (int*) p;
> >     *x = 1;
> >     float* y = (float*) p;
> >     *y = 1;
> >   }
>
> Again, in C, this is well-defined due to the definition of effective type.
>
> > issues, and I've gotten 0 replies. It's quite frustrating.
>
> > In short, I would argue that both of the above programs have no UB in
> > C++, nor their equivalent program in C.
>
> <snip much more C++ specific questions>

I'd like to think that the rest of the questions applied to C as well
as C++. Surely simple things like:
#include <stdlib.h>
int main()
{
void* p = malloc(sizeof(int) + sizeof(float));
* ( (float*) p ) = 1;
* ( (int*) p ) = 1;
return * ( (int*) p );
}
either work in C and C++, or work in neither. The above is a well
formed C program and a well formed C++ program. I would be slightly
surprised if it had UB in one and not UB in the other.

I would also like to hear some of the actual people on the committees
weigh in on these particular questions. I've been asking questions
like these for a while now, and I have yet to hear compelling answers
from the people on the committees or those who would know, and/or the
actual compiler writers.

Ben Bacarisse

 01-25-2011
Joshua Maurice <(E-Mail Removed)> writes:

> On Jan 24, 7:15 pm, Ben Bacarisse <(E-Mail Removed)> wrote:
>> > What about the following?
>>
>> >   #include <cstdlib>
>> >   using namespace std;
>> >   int main()
>> >   {
>> >     void* p = malloc(sizeof(int) + sizeof(float));
>> >     int*x = (int*) p;
>> >     *x = 1;
>> >     float* y = (float*) p;
>> >     *y = 1;
>> >   }
>>
>> Again, in C, this is well-defined due to the definition of effective type.
>>
>> > issues, and I've gotten 0 replies. It's quite frustrating.
>>
>> > In short, I would argue that both of the above programs have no UB in
>> > C++, nor their equivalent program in C.
>>
>> <snip much more C++ specific questions>

>
> I'd like to think that the rest of the questions applied to C as well
> as C++.

At that point you quoted some passages from the C++ standard and started
using phrases like "reused" which seems to be key to the C++ behaviour
but does not crop up in C. I worried that C-specific answers beyond
that point might just confuse matters. I got the feeling your real
worries were about whether C++ defined the code you were posting about.

> Surely simple things like:
> #include <stdlib.h>
> int main()
> {
> void* p = malloc(sizeof(int) + sizeof(float));
> * ( (float*) p ) = 1;
> * ( (int*) p ) = 1;
> return * ( (int*) p );
> }

That's fine in C. It is not really different from the example I did
comment on -- switching to a cast expression from an initialised
variable does not alter the meaning.

> either work in C and C++, or work in neither. The above is a well
> formed C program and a well formed C++ program. I would be slightly
> surprised if it had UB in one and not UB in the other.

So would I, but the C++ standard uses different language from the C
standard about the validity of such accesses so a difference (even an
unintended one) is possible.

> I would also like to hear some of the actual people on the committees
> weigh in on these particular questions. I've been asking questions
> like these for a while now, and I have yet to hear compelling answers
> from the people on the committees or those who would know, and/or the
> actual compiler writers.

Have you got some reason to suspect that there is a problem with any of
these programs in C? The C standard seems quite clear on these specific
questions.

--
Ben.

Johannes Schaub (litb)

 01-25-2011
Joshua Maurice wrote:

> On Jan 24, 7:15 pm, Ben Bacarisse <(E-Mail Removed)> wrote:
>> > What about the following?

>>
>> > #include <cstdlib>
>> > using namespace std;
>> > int main()
>> > {
>> > void* p = malloc(sizeof(int) + sizeof(float));
>> > int*x = (int*) p;
>> > *x = 1;
>> > float* y = (float*) p;
>> > *y = 1;
>> > }

>>
>> Again, in C, this is well-defined due the definition of effective type.
>>
>> > issues, and I've gotten 0 replies. It's quite frustrating.

>>
>> > In short, I would argue that both of the above programs have no UB in C
>> > ++, nor their equivalent program in C.

>>
>> <snip much more C++ specific questions>

>
> I'd like to think that the rest of the questions applied to C as well
> as C++. Surely simple things like:
> #include <stdlib.h>
> int main()
> {
> void* p = malloc(sizeof(int) + sizeof(float));
> * ( (float*) p ) = 1;
> * ( (int*) p ) = 1;
> return * ( (int*) p );
> }
> either work in C and C++, or work in neither. The above is a well
> formed C program and a well formed C++ program. I would be slightly
> surprised if it had UB in one and not UB in the other.
>

The C spec is clear that the above code is well-defined. The object "p"
points to has no declared type, so its effective type is the only measure of
type you have. The effective type by the float lvalue write is changed to
float for that write, and for all subsequent read-only accesses. The later
write by an int lvalue changes the effective type to an int for that write
and for all subsequent reads. Therefore, the last return is valid.

C++ doesn't have the concept of effective types, and the following is my
personal perception of the issue. Rather, in C++ the objects have type
themselves, while in C types are merely an attribute of the access to
objects. So an object cannot exist without a type in C++; it wouldn't make
sense with the current model. The behavior of the above code is not clearly
defined in C++. Not even that other DR we talked about in comp.std.c++ fixes
it, because we are not really copying an object representation over in this
case.

Lots of people have different perceptions of the aliasing rules and of the
object model. In the GCC PR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286 they did not find
consensus on whether a plain write would start the lifetime of a new object
or not.

So what does it do? It probably does the same as in C. In the language
theory, the above is exactly the reason why unions work in C++. Writes to
different union members are exactly what start and stop the lifetime of
respective member objects and what dictates the "active member" of the
union. Different people point to bullet 5 of 3.10/15, explaining that
lifetime stopping or starting would not be needed to make unions work, but
that's not quite correct. If you access a union member even through a class
member access expression, the final lvalue doesn't have the type of the
union. And even if you squint and assume the union lvalue is involved, you
have two lvalues that are involved, with the member lvalue causing an alias
conflict.
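
A small sketch of the union case described above, under the reading that a write to a member is what makes it the active one. The union name IntOrFloat and the function switch_active_member are made up for illustration; the code only ever reads the member it wrote last, so it does not depend on how the disputed cases are resolved.

```cpp
#include <cassert>

union IntOrFloat { int i; float f; };   // hypothetical name

// Write one member, then the other, and read back only the last one written.
inline float switch_active_member()
{
    IntOrFloat u;
    u.i = 1;       // i is now the active member
    u.f = 2.0f;    // under the reading above, this write ends i and makes f active
    return u.f;    // reads the currently active member
}
```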

In the end here's another interesting difference in C and C++. In C and C++
the following have different meanings, if you go with the above explanation:

int a;
*(float*)&a = 0;

In C, this violates the aliasing rules and is undefined behavior, but in C++
it won't because as my assumption above states, the C++ model is solely
based on reads and writes to start and stop the lifetime of objects. In the
C++ case this is not necessarily undefined behavior. The size of float may
be larger than sizeof int or alignment requirements may be incompatible, in
which case it is undefined behavior. But sizeof(T) and alignment
requirements are implementation-defined, so this need not necessarily result
in undefined behavior *for a specific implementation* ("corresponding
instance" of the abstract machine). So the C++ standard requires the above
to work for certain implementations.

int a = 0;
float f = *(float*)&a;

This is undefined behavior in both C (same reason as above) and C++ (the
read won't stop the lifetime of the 'a' object of type int, and then violate
aliasing rules).
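
For contrast, a sketch of the commonly used way to get the same bits into a float without reading the int through a float lvalue: copy the object representation with std::memcpy. The function name int_bits_as_float is hypothetical, and the sketch assumes sizeof(float) == sizeof(int), which the static_assert checks at compile time.

```cpp
#include <cassert>
#include <cstring>

// Copy the bytes of an int into a float object instead of reading the int
// through a float lvalue.
inline float int_bits_as_float(int a)
{
    static_assert(sizeof(float) == sizeof(int), "sketch assumes equal sizes");
    float f;
    std::memcpy(&f, &a, sizeof f);   // byte copy; no float lvalue ever refers to 'a'
    return f;
}
```

With a == 0 the result is 0.0f on implementations where an all-zero bit pattern represents zero, as it does for IEEE 754 floats. For other inputs the value depends on the floating-point representation in use.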

Personally I have given up aliasing and C++, because I feel I lack the
knowledge of platform details and compiler theory to make up a good
understanding. I will wait until this is fixed and then read a consistent
Standard, if that is ever going to happen.

Joshua Maurice

 01-25-2011
On Jan 24, 8:10 pm, Ben Bacarisse <(E-Mail Removed)> wrote:
> Joshua Maurice <(E-Mail Removed)> writes:
> > On Jan 24, 7:15 pm, Ben Bacarisse <(E-Mail Removed)> wrote:
> >> > What about the following?
>
> >> >   #include <cstdlib>
> >> >   using namespace std;
> >> >   int main()
> >> >   {
> >> >     void* p = malloc(sizeof(int) + sizeof(float));
> >> >     int*x = (int*) p;
> >> >     *x = 1;
> >> >     float* y = (float*) p;
> >> >     *y = 1;
> >> >   }
>
> >> Again, in C, this is well-defined due to the definition of effective type.
>
> >> > issues, and I've gotten 0 replies. It's quite frustrating.
>
> >> > In short, I would argue that both of the above programs have no UB in
> >> > C++, nor their equivalent program in C.
>
> >> <snip much more C++ specific questions>
>
> > I'd like to think that the rest of the questions applied to C as well
> > as C++.
>
> At that point you quoted some passages from the C++ standard and started
> using phrases like "reused" which seems to be key to the C++ behaviour
> but does not crop up in C. I worried that C-specific answers beyond
> that point might just confuse matters. I got the feeling your real
> worries were about whether C++ defined the code you were posting about.
>
> > Surely simple things like:
> >   #include <stdlib.h>
> >   int main()
> >   {
> >     void* p = malloc(sizeof(int) + sizeof(float));
> >     * ( (float*) p ) = 1;
> >     * ( (int*) p ) = 1;
> >     return * ( (int*) p );
> >   }
>
> That's fine in C. It is not really different from the example I did
> comment on -- switching to a cast expression from an initialised
> variable does not alter the meaning.
>
> > either work in C and C++, or work in neither. The above is a well
> > formed C program and a well formed C++ program. I would be slightly
> > surprised if it had UB in one and not UB in the other.
>
> So would I, but the C++ standard uses different language from the C
> standard about the validity of such accesses so a difference (even an
> unintended one) is possible.
>
> > I would also like to hear some of the actual people on the committees
> > weigh in on these particular questions. I've been asking questions
> > like these for a while now, and I have yet to hear compelling answers
> > from the people on the committees or those who would know, and/or the
> > actual compiler writers.
>
> Have you got some reason to suspect that there is a problem with any of
> these programs in C? The C standard seems quite clear on these specific
> questions.

Yes. I've been getting various replies when I tweak the above program
just slightly.

#include <stdlib.h>
void foo(int* a, float* b)
{
*a = 1;
*b = 1;
}
int main()
{
void* p = malloc(sizeof(int) + sizeof(float));
foo((int*)p, (float*)p);
}

replies, with little follow up discussion.

One reply was that a piece of memory may have at most one effective
type between calls to malloc and free.

Another reply was that this is a DR in the C and C++ language specs,
known colloquially as the union DR.

Another reply was that the above program has perfectly well defined
behavior, but the following has undefined behavior:
#include <stdlib.h>
int foo(int* a, float* b)
{
*a = 1;
*b = 1;
return *a;
}
int main()
{
void* p = malloc(sizeof(int) + sizeof(float));
foo((int*)p, (float*)p);
}
Specifically, this example explains how the compiler might use
aliasing analysis for optimization purposes. A conforming compiler may
not simply assume that an int* and a float* do not alias. However, if
analysis shows that aliasing would result in UB (as it would in the
above program when "return *a;" reads a float object through an int
lvalue) then the compiler is free to do whatever it wants in the face
of the UB, including assume that they don't alias.

I think I like the third option best, but my personal preferences
don't dictate what compilers actually do.

Joshua Maurice

 01-25-2011
On Jan 24, 8:33 pm, "Johannes Schaub (litb)" <(E-Mail Removed)>
wrote:
> Joshua Maurice wrote:
> > I'd like to think that the rest of the questions applied to C as well
> > as C++. Surely simple things like:
> >   #include <stdlib.h>
> >   int main()
> >   {
> >     void* p = malloc(sizeof(int) + sizeof(float));
> >     * ( (float*) p ) = 1;
> >     * ( (int*) p ) = 1;
> >     return * ( (int*) p );
> >   }
> > either work in C and C++, or work in neither. The above is a well
> > formed C program and a well formed C++ program. I would be slightly
> > surprised if it had UB in one and not UB in the other.
>
> The C spec is clear that the above code is well-defined. The object "p"
> points to has no declared type, so its effective type is the only measure of
> type you have. The effective type by the float lvalue write is changed to
> float for that write, and for all subsequent read-only accesses. The later
> write by an int lvalue changes the effective type to an int for that write
> and for all subsequent reads. Therefore, the last return is valid.

Sure. I get this. This makes sense. A write starts the lifetime of an
object if one didn't already exist, and ends the lifetime of any
object previously existing in that memory.
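
As a rough illustration of that reading for C only, the sketch below writes and reads the same allocated block first as a float and then as an int. The function name reuse_allocated_storage is hypothetical. It relies on the C effective type rule for objects with no declared type, where a store through a non-character lvalue sets the effective type for that store and for later reads.

```c
#include <stdlib.h>

/* Returns 1 if both typed round trips behave as expected,
   0 if they do not, and -1 if the allocation fails. */
int reuse_allocated_storage(void)
{
    void *p = malloc(sizeof(int) + sizeof(float));
    if (p == NULL)
        return -1;

    *(float *)p = 1.0f;            /* effective type is now float */
    float as_float = *(float *)p;  /* read through a float lvalue */

    *(int *)p = 1;                 /* effective type is now int */
    int as_int = *(int *)p;        /* read through an int lvalue */

    free(p);
    return (as_float == 1.0f && as_int == 1) ? 1 : 0;
}
```

Each read uses the same type as the most recent write, so no read goes through a wrongly typed lvalue. Whether the equivalent C++ program is well defined is the open question in this thread, and the sketch does not settle it.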

> C++ doesn't have the concept of effective types, and the following is my
> personal perception of the issue. Rather, in C++ the objects have type
> themselves, while in C types are merely an attribute of the access to
> objects. So an object cannot exist without a type in C++; it wouldn't make
> sense with the current model. The behavior of the above code is not clearly
> defined in C++. Not even that other DR we talked about in comp.std.c++ fixes
> it, because we are not really copying an object representation over in this
> case.

So, can you write a memory allocator which reuses memory in pure
conforming C++ on top of new and delete, or on top of malloc and free?
I'd like to think that you can, and I don't think that you can unless
the following program has no UB.

#include <stdlib.h>
int main()
{
    void* p = malloc(sizeof(int) + sizeof(float));
    * ( (float*) p ) = 1;
    * ( (int*) p ) = 1;
    return * ( (int*) p );
}

> Lots of people have different perception of the aliasing rules and of the
> object model. In the GCC PR http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286 they did not find
> consensus on whether a plain write would start the lifetime of a new object
> or not.
>
> So what does it do? It probably does the same as in C. In the language
> theory, the above is exactly the reason why unions work in C++. Writes to
> different union members is exactly what starts and stops the lifetime of
> respective member objects and what dictates the "active member" of the
> union. Different people point to bullet 5 of 3.10/15, explaining that
> lifetime stopping or starting would not be needed to make unions work, but
> that's not quite correct. If you access a union member even through a class
> member access expression, the final lvalue doesn't have the type of the
> union. And even if you squint and assume the union lvalue is involved, you
> have two lvalues that are involved, with the member lvalue causing an alias
> conflict.

Yeah. I've never understood what it means to do a read or a write
through an lvalue of a POD class type, in C or in C++. (I say POD because
C++ also has lvalues of non-POD class types.) All you can do with an lvalue of POD
class type is to use a member-of expression to get pointers or lvalues
to its members. All of the actual reads and writes are done through
lvalues of primitive type only.
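
One candidate for an access through an lvalue of struct type is whole-struct assignment. The sketch below is only an illustration in C, and the names struct pair and copy_and_read_first are made up. The assignment *to = *from reads one struct and writes another through lvalues whose type is struct pair, without naming any member.

```c
struct pair {
    int first;
    float second;
};

/* Copies a whole struct through struct-typed lvalues, then reads a member. */
int copy_and_read_first(void)
{
    struct pair src = { 7, 2.5f };
    struct pair dst = { 0, 0.0f };
    struct pair *from = &src;
    struct pair *to = &dst;

    *to = *from;       /* read of *from and write of *to, both as struct pair */
    return dst.first;  /* member access through an int lvalue */
}
```

In C this appears to be the case covered by the aliasing rule's allowance for an aggregate or union type that includes a compatible member type. How it maps onto the C++ wording is a separate question.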

> In the end here's another interesting difference in C and C++. In C and C++
> the following have different meanings, if you go with the above explanation:
>
>     int a;
>     *(float*)&a = 0;
>
> In C, this violates the aliasing rules and is undefined behavior,

It does? I must be confused then. Ignoring alignment and size issues,
the write ends the lifetime of the int object, and starts the lifetime
of a float object - the effective type rules.

Any further read of 'a' would be a read of a float object through an
int lvalue, which would be UB.

> but in C++
> it won't because as my assumption above states, the C++ model is solely
> based on reads and writes to start and stop the lifetime of objects.

Now I'm really confused. I thought we just agreed that the read and
write rules, aka the effective type rules, are a C thing and not a C++
thing.

> In the
> C++ case this is not necessarily undefined behavior. The size of float may
> be larger than sizeof int or alignment requirements may be incompatible, in
> which case it is undefined behavior. But sizeof(T) and alignment
> requirements are implementation-defined, so this need not necessarily result
> in undefined behavior *for a specific implementation* ("corresponding
> instance" of the abstract machine). So the C++ standard requires the above
> to work for certain implementations.
>
>     int a = 0;
>     float f = *(float*)&a;
>
> This is undefined behavior in both C (same reason as above) and C++ (the
> read won't stop the lifetime of the 'a' object of type int, and then violate
> aliasing rules).

I've always thought about it in the following way. My reading of the
so-called "strict aliasing rules" includes an obvious requirement that
you cannot read an object through a wrongly-typed lvalue. In the above
example, we're reading an int object through a float lvalue, and that
is broken in C and in C++.
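
For contrast, here is a sketch of the usual way to get the same bits without reading an int object through a float lvalue: copy the object representation with memcpy. The function name int_bits_as_float is hypothetical, and the code assumes that int and float have the same size, which the static assertion checks at compile time.

```c
#include <string.h>

/* Reinterprets the bytes of an int as a float by copying them,
   rather than by dereferencing a cast pointer. */
float int_bits_as_float(int a)
{
    float f;
    /* C11 static assertion: the byte copy only makes sense for equal sizes. */
    _Static_assert(sizeof(float) == sizeof(int),
                   "this sketch assumes int and float have the same size");
    memcpy(&f, &a, sizeof f);  /* bytes are accessed as characters, which is allowed */
    return f;
}
```

On an implementation that uses IEEE 754 floats, int_bits_as_float(0) yields 0.0f, because all bits zero represents positive zero in that format. The value obtained for other inputs depends on the representation of both types.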

> Personally I have given up aliasing and C++, because I feel I lack the
> knowledge of platform details and compiler theory to make up a good
> understanding. I will wait until this is fixed and then read a consistent
> Standard, if that is ever going to happen.

I hope we get such a standard too, but it's not looking likely. Thus
far, I think that the C++ standard is going even further in the wrong
direction with that one suggested fix to the DR. "Allocation types"?
Ugg.