Is the aliasing rule symmetric?

 
 
Joshua Maurice
02-08-2011
On Feb 7, 7:37 pm, "Johannes Schaub (litb)" wrote:
> On 08.02.2011 04:30, Johannes Schaub (litb) wrote:
>
>
>
> >> Assuming you're talking about DR 236, they say no such thing.

>
> > Then I encourage you to tell us what else they say by:

>
> >> Committee believes that Example 2 violates the aliasing rules in 6.5
> >> paragraph 7:

>
> >> "an aggregate or union type that includes one of the aforementioned
> >> types among its members (including, recursively, a member of a
> >> subaggregate or contained union)."
> >> In order to not violate the rules, function f in example should be
> >> written as:
> >>     union tag {
> >>         int mi;
> >>         double md;
> >>     } u;
> >>     void f(int *qi, double *qd) {
> >>         int i = *qi + 2;
> >>         u.md = 3.1;   // union type must be used when changing effective type
> >>         *qd *= i;
> >>         return;
> >>     }

>
> Hm, it seems I may have misunderstood what they say. They actually seem
> to say that a write like "*qd = 0" does *not* change the effective type
> of the accessed object.
>
> But that seems wrong, because the aliasing rule says that a write
> changes the effective type *for that access* and for all further read
> accesses. So WTF does the committee say!?


I wish I knew. The entire set of rules is inconsistent as written, and
judging from that one resolution to the DR in the C++ draft standard
(in that thread on comp.lang.c++ or whatever long ago [months?]), they
don't have a good idea of where to go. Apparently the C committee
isn't doing much better. Both committees can leave this unresolved, or
they can sit down together (or the C committee can dictate), and
figure out some equitable solution. There appear to be solutions, just
none of them are currently Rules As Written.

The first thing the standard(s) need to do is clear up any possible
differences for the following. The context is:
  typedef struct T { int v; int x; } T;
The four examples are:
  a -> x = 1;
and
  * ( & (a -> x)) = 1;
and
  int* x = & (a -> x);
  *x = 1;
and
  int* x = (int*) (((char*)a) + offsetof(T, x));
  *x = 1;
Which of those allow the following read to be defined?
  return a -> x;

Those 4 examples should appear basically verbatim in the standards,
not as non-binding notes but as actual (binding) examples which
show how these rules are supposed to work.

My initial naive take is that they should all be entirely equivalent.
Apparently this is not a widely agreed upon conclusion.
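
For reference, a minimal sketch of the four forms as compilable C. It applies them to a declared object of type T, where all four are generally taken to be well defined; it does not settle the question raised here, which concerns storage without a declared type. The function names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct T { int v; int x; } T;

/* Form 1: direct member access. */
static void write_direct(T *a) { a->x = 1; }

/* Form 2: address-of immediately undone by dereference. */
static void write_addr_deref(T *a) { *(&(a->x)) = 2; }

/* Form 3: the same pointer, held in a named variable first. */
static void write_via_pointer(T *a)
{
    int *x = &(a->x);
    *x = 3;
}

/* Form 4: pointer rebuilt from the byte offset of the member. */
static void write_via_offsetof(T *a)
{
    int *x = (int *)((char *)a + offsetof(T, x));
    *x = 4;
}

/* Hypothetical helper: applies form 1..4 to a declared T and reads x back.
   Returns -1 for an unknown form. */
int read_back_after(int form)
{
    T obj = { 0, -1 };
    switch (form) {
    case 1: write_direct(&obj); break;
    case 2: write_addr_deref(&obj); break;
    case 3: write_via_pointer(&obj); break;
    case 4: write_via_offsetof(&obj); break;
    default: return -1;
    }
    return obj.x;   /* the read in question: return a -> x */
}
```

Calling read_back_after with 1 through 4 returns the value each form stored, so on a declared object the forms are indistinguishable to the caller.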

Furthermore, if they are all indeed equivalent, then you can't get
"reading a T2 object" through a "T2 lvalue" is UB.

My hope is that the first three are all equivalent, and the last is
where it's different. (Or we just abandon large parts of strict
aliasing, which doesn't seem too likely.) With that, you can have the
resulting pointer value and/or lvalue carry with it semantic
information that it came from a memberof expression on a T lvalue, and
thus the write through that pointer value and/or lvalue can be said to
change the effective type of the object to T. (Then, we just need to
get the C++ standard to come along, and we're all happy.)

PS: Is my rambling too much? I'm trying not to repeat myself. I think
I'm just recapping where I think the conversation is now with those
four examples.
 
Joshua Maurice
02-08-2011
On Feb 7, 7:59 pm, Joshua Maurice wrote:
> Furthermore, if they are all indeed equivalent, then you can't get
> "reading a T2 object" through a "T2 lvalue" is UB.


Typo. Should read:
> Furthermore, if they are all indeed equivalent, then you can't get "reading a *T1* object" through a "T2 lvalue" is *necessarily* UB.

 
Joshua Maurice
02-08-2011
On Feb 7, 6:29 pm, "Johannes Schaub (litb)" wrote:
> On 07.02.2011 21:11, Joshua Maurice wrote:
> > As a naive understanding for the difference between 1 and 2: The
> > addressof operator (&) simply returns the address of the object
> > referred to by the lvalue, and then the dereference operator (*) simply
> > takes that pointer value and returns back the same lvalue (which
> > refers to the same object). This isn't operator overloading in C++. I
> > would think that it ought to be a noop. If there is any difference at
> > all between any of 1, 2, and 3 above in this post, then I have a
> > fundamental misunderstanding of the language.

>
> I thought we agreed that "a.b = ..." and "*x = ..." are different in
> that the type of "a" has some influence on the access, in order to deem
> the following UB.
>
>    typedef struct A { int a; } A;
>    typedef struct B { int a; } B;
>    A *x = malloc(sizeof *x);
>    x->a = 0; // access with effective type A and int
>    ((B*)x)->a = 0; // I thought we agreed this is UB
>                    // and committee intent.
>
> I think *I* am misunderstanding the matter rather than you


Well, we agreed that's one possible resolution. I don't think we
agreed that's the Rules As Written. I definitely feel that it's
unintuitive. Sorry for that source of confusion.
 
Wojtek Lerch
02-08-2011
On 07/02/2011 10:37 PM, Johannes Schaub (litb) wrote:
> But that seems wrong, because the aliasing rule says that a write
> changes the effective type *for that access* and for all further read
> accesses.


Only if the object doesn't have a declared type.

> So WTF does the committee say!?


That's a good question indeed. If you don't mind hearing my opinion
instead of the committee's, I find the whole concept of effective types
hopelessly underspecified. In particular:

* WTF is the declared type of an object? Does a subobject of a declared
object, such as a structure member or an array element, have a declared
type, or does it not? If not, then why is it OK to read the value of
the member without giving it an effective type first, such as in this
example:

struct S { int x, y; } a;
a = (struct S) { 1, 2 };
printf( "%d\n", a.x ); // Oops?

Also, why does the footnote mention allocated objects but not members
and elements as an example of objects without a declared type?

* But if the subobjects do have a declared type, then what happens when
the structure or array is the same object as its member or element
(which happens when the array has only one element, or the structure has
one member and no padding)? Does the object have two declared types at
the same time? Or possibly more, for instance when it's a union? (Or
do the rules for union members differ from those for structure members
and array elements?)

* Similarly, if I assign a structure to an allocated object, the
effective type of the object becomes the structure type; but do all the
subobjects that correspond to structure members acquire the
corresponding effective type? If not, then why is it OK to read a
member back?

* And, in general, when an assignment to an object gives it an effective
type, does it also erase the effective type of any overlapping objects
that had an effective type from previous assignments? If yes, does that
really mean that after I gave an allocated object a structure type by a
structure assignment, assigning a new value to one of the members erases
the effective type of the big object?

* WTF, exactly, does it mean to copy an object "as an array of a
character type"? Does it only cover algorithms that provably copy each
byte value from one object to the same byte position in the other, by a
simple assignment or a chain of such assignments, or does any algorithm
count as long as it can be proven to reconstruct the sequence of bytes,
even if it involves transforming the byte values, perhaps by some
complicated formula such as compression and decompression, encryption
and decryption, writing to a file and reading back (as either binary or
formatted text), or maybe even dictating to a person and having them
enter the values back, under oath?
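
For orientation, a sketch of the three cases that are usually treated as uncontroversial: a declared object, an allocated object given a value by a structure assignment, and an allocated object filled by memcpy. It does not answer the questions above about which effective type the subobjects end up with; it only shows the code shapes being asked about. The function names are made up for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct S { int x, y; };

/* Declared object: the declaration fixes its type. */
int read_declared(void)
{
    struct S a;
    a = (struct S){ 1, 2 };
    return a.x;
}

/* Allocated object: no declared type; the store goes through an
   lvalue of type struct S, then a member is read back. */
int read_allocated(void)
{
    struct S *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;              /* allocation failure */
    *p = (struct S){ 1, 2 };
    int x = p->x;
    free(p);
    return x;
}

/* Allocated object filled by memcpy from a declared struct S. */
int read_copied(void)
{
    struct S src = { 3, 4 };
    struct S *dst = malloc(sizeof *dst);
    if (dst == NULL)
        return -1;              /* allocation failure */
    memcpy(dst, &src, sizeof src);
    int y = dst->y;
    free(dst);
    return y;
}
```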
 
dSpam@arcor.de
02-08-2011
On 5 Feb., 23:22, Joshua Maurice wrote:
> On Feb 4, 7:32 am, "(E-Mail Removed)" wrote:
> > On 3 Feb., 09:02, Joshua Maurice wrote:
> > > ...
> > > I would kindly ask you to read the rest of this thread, and realize
> > > that I am quite well versed on the issues at hand, and I understand
> > > that the intent of the C standards committee was to make the following
> > > program have undefined behavior.

>
> > >   #include <stdlib.h>
> > >   typedef struct T1 { int x; int y; } T1;
> > >   typedef struct T2 { int x; int y; } T2;
> > >   int main(void)
> > >   { T1 *p = malloc(sizeof *p);
> > >     p->x = 1;
> > >     p->y = 2;
> > >     return ((T2*)p)->y;
> > >   }

>
> > > I don't mean to dispute that's how people understand the standard. I
> > > don't plan on writing any code any time soon that violates the well
> > > understood intent of the standard.

>
> > > However, that's not the rules as written. What I do want to discuss is
> > > if there's any sensible reading of the standard as written which can
> > > give the desired conclusion, while preserving idiomatic usages of C,
> > > such as casting the return of malloc to a struct type pointer, and
> > > assigning to members of that struct.

>
> > For preserving idiomatic usages of C, such as casting the return of
> > malloc to a struct type pointer, and assigning to members of that
> > struct, it is not necessary that the above program's behavior were
> > defined, because the program construct which makes the standard impose
> > no requirements for the above program's behavior is not one of the
> > aforementioned.

>
> I'm sorry, I don't quite follow your English. It is idiomatic usage in
> C to implicitly cast the return of malloc to a struct pointer, then,
> assign to members of the struct through that pointer, then read or
> write those members through the pointer. My program 1 does exactly
> that. (Well, it has an explicit cast instead of implicit to make the
> code valid C++ as well, but minor point.) Thus the C (and C++)
> standard ought to say that program 1 as written has no UB.


Well, the phrase "preserving idiomatic usages of C, such as casting
the return of malloc to a struct type pointer, and assigning to
members of that struct" wasn't my English, it's yours; I just repeated
what you wrote in the cited paragraph above mine. (If something else I
wrote can't be followed, please indicate the clause.)
The actual cause for any misunderstanding here is perhaps that I
didn't comment on your "program 1" that might show up somewhere else
in this discussion, but on the program above exactly as you wrote it
in the message to which I replied.
Anyway, I do think that to cast the return of malloc to a struct
pointer, then, assign to members of the struct through that pointer,
then read or write those members through the pointer has defined
behavior because these basic operations are defined by the standard;
there is no need for the standard to state explicitly that behavior is
not undefined.

> > Unfortunately, at the moment I haven't the time to address your further
> > concerns - maybe next week.

>
> Thank you for your time.


Never mind - the topic is of interest also to me.

> In short, I think my "interesting" questions are: For a platform where
> T1 and T2 have the same layout (aka this is not a question about
> portable semantics):
>
> - Do programs 1 and 2 have any UB as written? I presume that the
> answer is no UB in either as written.
>
> - Would program 2 have UB if the return was changed to "return
> ((T2*)p)->y;" ? Given no UB before, this change cannot introduce UB.
>
> - Would program 1 have UB if the return was changed to "return
> ((T2*)p)->y;" ? Now, here's my problem.
> ...


I'm sorry, among the many messages in this thread I didn't find the one
where you specified these programs 1 and 2. If you perhaps restated
them, I'd willingly delve into the questions.
 
Joshua Maurice
02-09-2011
On Feb 8, 5:13 am, "(E-Mail Removed)" wrote:
> I'm sorry, among the many messages in this thread I didn't find the one
> where you specified these programs 1 and 2. If you perhaps restated
> them, I'd willingly delve into the questions.


Sure. I think my current questions are as follows:

To save typing, consider the context:

#include <stddef.h>
#include <stdlib.h>
typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;
int main(void)
{
  void* p = 0;
  T1* a = 0;
  T2* b = 0;
  int* c = 0;
  p = malloc(sizeof(T1));
  /* ... */
}

And consider the subsequent modifications. Where do we go from no UB,
to UB? And why? And is that "why" documented in the C standard and
where?

/*1*/
a = p;
a->x = 1;
a->y = 2;
return a->y;

/*2*/
a = p;
* (int*) (((char*)p) + offsetof(T1, x)) = 1;
* (int*) (((char*)p) + offsetof(T1, y)) = 2;
return a->y;

/*3*/
* (int*) (((char*)p) + offsetof(T1, x)) = 1;
* (int*) (((char*)p) + offsetof(T1, y)) = 2;
return ((T1*)p)->y;

/*4*/
* (int*) (((char*)p) + 0) = 1;
* (int*) (((char*)p) + 4) = 2;
return ((T1*)p)->y;

/*5*/
* (int*) (((char*)p) + offsetof(T2, x)) = 1;
* (int*) (((char*)p) + offsetof(T2, y)) = 2;
return ((T1*)p)->y;

/*6*/
b = p;
* (int*) (((char*)p) + offsetof(T2, x)) = 1;
* (int*) (((char*)p) + offsetof(T2, y)) = 2;
return ((T1*)p)->y;

/*7*/
b = p;
b->x = 1;
b->y = 1;
return ((T1*)p)->y;

Now, I fully understand that program 1 is very idiomatically well
defined, and program 7 is well known to have UB /even if/ T1 and T2
have the same size ala sizeof and same memory layout ala offsetof.

I fully agree that one should not write code like this because it's
not portable. It's not portable precisely because T1 and T2 may not
have the same size and memory layout. (I would think that for such a
simple case like this, a compiler would be perverse to give them
different sizes or memory layouts, but I don't need that for my point
here.)

However, I want to know what reasoning can be thrown at this problem
besides "It has UB on some platforms because T1 and T2 may have
different sizes or memory layouts." Presumably there is some other
reason which gives program 7 UB on all platforms irrespective of the
size and layout of T1 and T2. I think the answer is important to
figuring out if and how one can write a general purpose pooling memory
allocator on top of malloc.

Step 3 to 4, and 4 to 5, exist only to emphasize that offsetof is
simply a macro which evaluates to an integer. I hope that the integer
result of that macro is all that matters, and if two types give the
same result, then it's entirely equivalent. (Yes, bad style, and yes
not portable, but if the offsetof macro invocations expand to the same
integer, there ought to be no difference to the C compiler and with
regards to the language spec.)
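
A small sketch of that point: offsetof yields a plain size_t value, so the offsets for T1 and T2 can be compared like any other integers. Whether the two layouts actually match is implementation-specific; the helper names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;

/* offsetof expands to an integer constant expression of type size_t. */
size_t offset_y_t1(void) { return offsetof(T1, y); }
size_t offset_y_t2(void) { return offsetof(T2, y); }

/* Returns 1 when T1 and T2 agree on size and on both member offsets
   for this implementation, 0 otherwise. */
int layouts_match(void)
{
    return sizeof(T1) == sizeof(T2)
        && offsetof(T1, x) == offsetof(T2, x)
        && offsetof(T1, y) == offsetof(T2, y);
}
```

A program that wants to rely on matching layouts can check layouts_match() up front, which is the same idea as the sizeof/offsetof guards used later in the thread.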

--

Now, I also started a second thread of discussion in this thread, which
attacks my problem from a different angle. What difference is there
between the following, if any?
  a->x = 1;
and
  * ( & (a->x)) = 1;
and
  int* x = & (a->x);
  *x = 1;
and
  /* where *a has type T */
  int* x = (int*) (((char*)a) + offsetof(T, x));
  *x = 1;
Specifically, what difference is there, if any, among those 4 code
fragments above with regard to the effective type rules?

My naive estimation is that the first three ought to be entirely
equivalent in every way, whereas the last one with offsetof may have
different behavior. I would still be a little surprised if the offsetof
had different behavior, but I could accept that much more easily than
if there was a distinction drawn between any of the first three.
 
Joshua Maurice
02-09-2011
http://www.open-std.org/jtc1/sc22/wg...ocs/dr_236.htm

This defect report, and its resolution, is what triggered most of
this discussion for me. Let me phrase another reinterpretation of the
same problem.

Consider the following program. myMalloc and myFree are intended to be
drop-in replacements for malloc and free. They are written entirely in
portable C code, in userspace, on top of malloc and free. I hope, and
would argue, that such functions ought to be writable.

#include <stdio.h>
void* myMalloc(size_t );
void myFree(void* );
int main()
{
  int* a = 0;
  float* b = 0;

  a = myMalloc(sizeof *a);
  *a = 1;
  printf("%d\n", *a);
  myFree(a);

  b = myMalloc(sizeof *b);
  *b = 1;
  printf("%f\n", *b);
  myFree(b);
}

It is possible that myMalloc will return the same pointer value for
both requests. In that case, the program behaves basically exactly as
if:

#include <stdio.h>
#include <stdlib.h>
int main()
{
  void* myMemoryPool = 0;
  int* a = 0;
  float* b = 0;

  myMemoryPool = malloc(sizeof(int) + sizeof(float));

  a = myMemoryPool;
  *a = 1;
  printf("%d\n", *a);

  b = myMemoryPool;
  *b = 1;
  printf("%f\n", *b);
}

Which one may (?) be able to write as:

#include <stdio.h>
#include <stdlib.h>
int main()
{
  void* myMemoryPool = 0;
  int* a = 0;
  float* b = 0;

  myMemoryPool = malloc(sizeof(int) + sizeof(float));

  a = myMemoryPool;
  b = myMemoryPool;

  *a = 1;
  printf("%d\n", *a);

  *b = 1;
  printf("%f\n", *b);
}

Which one may (?) be able to rewrite as:

#include <stdio.h>
#include <stdlib.h>
void foo(int* a, float* b)
{
  *a = 1;
  printf("%d\n", *a);

  *b = 1;
  printf("%f\n", *b);
}
int main()
{
  void* myMemoryPool = 0;
  int* a = 0;
  float* b = 0;

  myMemoryPool = malloc(sizeof(int) + sizeof(float));

  a = myMemoryPool;
  b = myMemoryPool;
  foo(a, b);
}

Which is the union DR, aka
http://www.open-std.org/jtc1/sc22/wg...ocs/dr_236.htm

Which step broke it? When we introduced the function foo, or an
earlier step? I await a more formalized answer to that DR.
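
To make the myMalloc/myFree scenario concrete, here is a deliberately tiny sketch of a pooling allocator layered on malloc. It is an assumption-laden illustration, not an allocator anyone in the thread proposed: it caches a single slot of SLOT_SIZE bytes (a made-up constant) and hands the same address back on the next suitable request, which is exactly the situation where an int store is followed by a float store to the same storage. Whether that reuse is blessed by the effective type rules is the open question here; the sketch takes no position on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOT_SIZE 64            /* hypothetical size of the single cached slot */

static void *slot = NULL;       /* cached block obtained from malloc, never freed */
static int slot_in_use = 0;     /* nonzero while the slot is handed out */

/* Hands out the cached slot when it is free and large enough;
   otherwise falls back to plain malloc. */
void *myMalloc(size_t n)
{
    if (n <= SLOT_SIZE && !slot_in_use) {
        if (slot == NULL)
            slot = malloc(SLOT_SIZE);
        if (slot != NULL) {
            slot_in_use = 1;
            return slot;
        }
    }
    return malloc(n);
}

/* Returns the slot to the pool, or forwards any other pointer to free. */
void myFree(void *p)
{
    if (p != NULL && p == slot)
        slot_in_use = 0;
    else
        free(p);
}
```

With this sketch, two back-to-back myMalloc/myFree pairs for an int and then a float receive the same address, matching the "same pointer value for both requests" case described above.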

The other questions (of my previous post) came up in discussions
on comp.lang.c++ and comp.lang.c++.moderated while discussing the
problems described in this post. So, I'm curious what the C people
think of it, as that's what C++ ought to do.
 
dSpam@arcor.de
02-10-2011
On 9 Feb., 02:07, Joshua Maurice wrote:
> To save typing, consider the context:
>
>   #include <stddef.h>
>   #include <stdlib.h>
>   typedef struct T1 { int x; int y; } T1;
>   typedef struct T2 { int x; int y; } T2;
>   int main(void)
>   {
>     void* p = 0;
>     T1* a = 0;
>     T2* b = 0;
>     int* c = 0;
>     p = malloc(sizeof(T1));
>     /* ... */
>   }
>
> And consider the subsequent modifications. Where do we go from no UB,
> to UB? And why? And is that "why" documented in the C standard and
> where?
>
> ...
>
>   /*4*/

For defined behavior, let's
    assert(offsetof(T1, y) == 4);
here (inclusion of assert.h assumed).

>   * (int*) (((char*)p) + 0) = 1;
>   * (int*) (((char*)p) + 4) = 2;
>   return ((T1*)p)->y;
>
>   /*5*/

For defined behavior, let's
    assert(offsetof(T1, y) == offsetof(T2, y));
here (inclusion of assert.h assumed).

>   * (int*) (((char*)p) + offsetof(T2, x)) = 1;
>   * (int*) (((char*)p) + offsetof(T2, y)) = 2;
>   return ((T1*)p)->y;
>
> ...
>
>   /*7*/
>   b = p;
>   b->x = 1;
>   b->y = 1;

For defined behavior, let's
    assert(offsetof(T1, y) == offsetof(T2, y));
here (inclusion of assert.h assumed).

>   return ((T1*)p)->y;
>
> Now, I fully understand that program 1 is very idiomatically well
> defined, and program 7 is well known to have UB /even if/ T1 and T2
> have the same size ala sizeof and same memory layout ala offsetof.


I hold that program 7 has, given the above assertion, defined
behavior.
I follow the reasoning of Ben Bacarisse in his message of 26 Jan.,
03:15 and Johannes Schaub in his message of 6 Feb., 17:27 (if I
understood them right).

> ...
>
> Now, I also started a second thread of discussion in this thread, which
> attacks my problem from a different angle. What difference is there
> between the following, if any?
>   a->x = 1;
> and
>   * ( & (a->x)) = 1;
> and
>   int* x = & (a->x);
>   *x = 1;
> and
>   /* where *a has type T */
>   int* x = (int*) (((char*)a) + offsetof(T, x));
>   *x = 1;
> Specifically, what difference is there if any of those 4 above code
> fragments with regards to the effective type rules?


I hold that there is no difference; I see no issues with regard to
N1256 section 6.5, paragraphs 6 and 7.
 
Joshua Maurice
02-11-2011
To be clear, you argue that the following program has no UB on all
conforming implementations?

#include <stdlib.h>
#include <stddef.h>
int main()
{
  typedef struct T1 { int x; int y; } T1;
  typedef struct T2 { int x; int y; } T2;

  void* p = 0;
  T1* a = 0;
  T2* b = 0;

  if (sizeof(T1) != sizeof(T2))
    return 1;
  if (offsetof(T1, y) != offsetof(T2, y))
    return 1;

  p = malloc(sizeof(T1));
  a = p;
  b = p;
  a->y = 1;
  return b->y;
}

That greatly surprises me. It was my understanding that the above
program clearly violates the intent of the C standard committee w.r.t.
the strict aliasing rules and the well understood meaning of the strict
aliasing rules. As best as I can follow the current thought processes
of the C standard committee, from perusing the link
http://www.open-std.org/jtc1/sc22/wg...ocs/dr_236.htm
and linked meeting minutes, it seems to be the case they really want a
naive aliasing rule, "two sufficiently differently typed pointers can
be assumed by the compiler to not alias unless there's 'something' in
scope to make them alias, like a union type definition /
declaration".

Following that simple reasoning of the naive aliasing rule, the above
program has to have UB. It has UB because the compiler may legally
assume that a T1* and a T2* do not alias, and so it can transform
  a->y = 1;
  return b->y;
to
  int tmp = b->y;
  a->y = 1;
  return tmp;
which leads to a read of an uninitialized / indeterminate value. At
least, that's how I'm currently understanding the direction of the C
standard committee. Arguably, what they have in those linked to notes
are rather incomplete, and that is because even they are unsure what
they want to do with this.
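
A sketch of the code shape under dispute, next to one way to sidestep the question. The first function is the store-then-load pattern; it is uncontroversial only when the two pointers refer to distinct objects. The second function is an alternative that is not from this thread: it copies the bytes into a declared T2 with memcpy, guarded by the same sizeof/offsetof checks as the program above. The function names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct T1 { int x; int y; } T1;
typedef struct T2 { int x; int y; } T2;

/* The shape under discussion. Under the "naive" rule a compiler may
   assume a and b do not overlap and move the load above the store. */
int store_t1_load_t2(T1 *a, T2 *b)
{
    a->y = 1;
    return b->y;
}

/* Alternative (not from the thread): copy the representation into a
   declared T2 and read the member there. Returns -1 if the layouts
   differ on this implementation. */
int load_y_as_t2(const T1 *src)
{
    T2 tmp;
    if (sizeof(T1) != sizeof(T2) || offsetof(T1, y) != offsetof(T2, y))
        return -1;
    memcpy(&tmp, src, sizeof tmp);
    return tmp.y;
}
```

The memcpy variant trades the direct access for a copy; it says nothing about what the committee intends for the direct form.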

One proposal was to treat:
  u.x = 1;
as different than
  int* x = & u.x;
  *x = 1;
for union types, which really irks me, but so do unions + strict
aliasing rules in general.
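
For comparison, here is the rewritten function f from the DR 236 committee response quoted earlier in the thread, wrapped in a small driver so it can be compiled and run. The driver name is made up, and passing the addresses of u.mi and u.md is an assumption about how the example is meant to be called. In it, the store that switches the active member goes through the union itself (u.md = 3.1), and only afterwards is the member accessed through the pointer.

```c
#include <assert.h>

union tag {
    int mi;
    double md;
} u;

/* As given in the committee response: the union type is used for the
   store that changes which member is active. */
void f(int *qi, double *qd)
{
    int i = *qi + 2;
    u.md = 3.1;     /* union type must be used when changing effective type */
    *qd *= i;
}

/* Hypothetical driver: both pointers refer to members of the same union. */
double call_f_on_union(void)
{
    u.mi = 1;
    f(&u.mi, &u.md);
    return u.md;    /* 3.1 multiplied by (1 + 2) */
}
```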

Another was to require that arguments to functions of sufficiently
different type may not alias. That may be a good start, but I think
it's woefully underspecified. I think you'd be better off requiring
some sort of "Sufficiently differently typed pointers may alias only
if there's something in scope which legally lets them alias, like a
union type declaration / definition."

Also, is this the current Rules As Written? Arguable, which is
precisely why it's a Defect Report under review. I guess I should stop
asking these questions and delay until the C standard committee
resolves these issues.

I would much prefer if they kept in mind how a general purpose
userspace pooling memory allocator is supposed to work, specifically
my myMalloc and myFree example else-thread. Either you make malloc and
pals special and outlaw general purpose userspace pooling memory
allocators, or you need some rules which allow the "temporary" aliasing
of sufficiently differently typed pointers that exists during
the usage of any general purpose userspace pooling memory allocator.

How would I forward my questions and examples on userspace memory
allocators to the committee for consideration? I think this is a very
important and not esoteric part of this DR which does need to be
resolved.
 
dSpam@arcor.de
02-15-2011
On 11 Feb., 01:17, Joshua Maurice wrote:
> To be clear, you argue that the following program has no UB on all
> conforming implementations?
>
>   #include <stdlib.h>
>   #include <stddef.h>
>   int main()
>   {
>     typedef struct T1 { int x; int y; } T1;
>     typedef struct T2 { int x; int y; } T2;
>
>     void* p = 0;
>     T1* a = 0;
>     T2* b = 0;
>
>     if (sizeof(T1) != sizeof(T2))
>       return 1;
>     if (offsetof(T1, y) != offsetof(T2, y))
>       return 1;
>
>     p = malloc(sizeof(T1));
>     a = p;
>     b = p;
>     a->y = 1;
>     return b->y;
>   }


Yes, I do.

> That greatly surprises me. It was my understanding that the following
> program clearly violates the intent of the C standard committee w.r.t
> the strict aliasing rules and well understood meaning of the strict
> aliasing rules. As best as I can follow the current thought processes
> of the C standard's committee, from perusing the link
> http://www.open-std.org/jtc1/sc22/wg...ocs/dr_236.htm
> and linked meeting minutes, it seems to be the case they really want a
> naive aliasing rule, "two sufficiently differently typed pointer can
> be assumed by the compiler to not alias unless there's 'something' in
> scope to make them alias, like a union type definition /
> declaration".


To me it seems that in the above program there is very well
'something' in scope to show that a->y and b->y occupy the same region
of storage. A translator which can't see that had better refrain
from rearranging accesses.

> Following that simple reasoning of the naive aliasing rule, the above
> program has to have UB. It has UB because the compiler may legally
> assume that a T1* and a T2* do not alias, and so it can transform
>     a->y = 1;
>     return b->y;
> to
>     int tmp = b->y;
>     a->y = 1;
>     return tmp;
> which leads to a read of an uninitialized / indeterminate value. At
> least, that's how I'm currently understanding the direction of the C
> standard committee. Arguably, what they have in those linked to notes
> are rather incomplete, and that is because even they are unsure what
> they want to do with this.


As far as I can see, DR 236 focuses on arguments to functions, as in
http://www.open-std.org/JTC1/SC22/WG.../docs/n973.txt
  Keaton
    no objection to invalidating some programs (with new rules), those
    programs were written in a problematic coding style (passing
    function arguments that are aliased to one another)

> One proposal was ...
>
> Another was to require that arguments to functions of sufficiently
> different type may not alias. That may be a good start, but I think
> it's woefully underspecified. I think you'd be better off requiring
> some sort of "Sufficiently differently typed pointers may alias only
> if there's something in scope which legally lets them alias, like a
> union type declaration / definition."


I'd stop after the good start. In my opinion, more complicated or
vague new rules bear too high a risk of invalidating existing programs
or tempting implementors to make unjust optimisations.

> Also, is this the current Rules As Written? Arguable, which is
> precisely why it's a Defect Report under review. I guess I should stop
> asking these questions and delay until the C standard committee
> resolves these issues.


I'm not sure what you mean by the term "this".
Delaying might indeed be a good idea.
 