strict aliasing rules in ISO C, someone understands them ?

 
 
nicolas.riesch@genevoise.ch
10-13-2005

I am trying to understand the strict aliasing rules in the C Standard.
Since gcc applies these rules by default when optimizing (-O2 and
above), I want to be sure I fully understand this issue.

For questions (1), (2) and (3), I think that the answers are all "yes",
but I would be glad to have strong confirmation.

About questions (4), (5) and (6), I really don't know. Please help!!!

--------

The Standard says (http://www.open-std.org/jtc1/sc22/wg...docs/n1124.pdf,
chapter 6.5):

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of
  the object,
- a type that is the signed or unsigned type corresponding to the
  effective type of the object,
- a type that is the signed or unsigned type corresponding to a
  qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned
  types among its members (including, recursively, a member of a
  subaggregate or contained union), or
- a character type.


***** Question (1) *****

Let's take two structs with different tag names:

struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

The compiler is free to assume that p1 and p2 point to different memory
locations and don't alias, because two structs with different tags are
different (incompatible) types.

In the standard, we read the wording "effective type of the object"
many times.

This "effective type of the object" may be an "int", a "double", etc., but
it may also be a "struct" type, right ???

And I suppose it may also be an "array" type or a "union" type as
well, is that correct ???


***** Question (2) *****

In the little program that follows, the line "printf("%d\n", *x);"
normally prints 123, but an optimizing compiler could print garbage
instead of 123. Is my reasoning correct ???

On the other hand, the line "printf("%d\n", p1->i);" always prints 999
as expected, right ???

----

#include <stdio.h>
#include <stdlib.h>

struct s1 { int i; double f; };


int main(void)
{
    struct s1 *p1;
    int *x;

    p1 = malloc(sizeof(*p1));
    if (p1 == NULL)
        return EXIT_FAILURE;
    p1->i = 123;            // object of type 'struct s1' contains 123

    x = &(p1->i);

    printf("%d\n", *x);     // I try to access a value stored in an object
                            // of type 'struct s1' through *x, which is of
                            // type 'int'. I think this is not allowed by
                            // the standard!

    *x = 999;               // I store 999 in *x, which is of type 'int'

    printf("%d\n", p1->i);  // I access a value stored in *x, which is of
                            // type 'int', through *p1 (p1->i is shorthand
                            // for (*p1).i), which is of type 'struct s1'
                            // but contains a member of type 'int'.
                            // I think this is allowed by the standard.

    free(p1);
    return 0;
}


***** Question (3) *****

The Standard forbids (if I am not mistaken) a pointer of type "struct A *"
from accessing data written through a pointer of type "struct B *", as they
are different types.

This means that the common practice of faking inheritance in C, as in
this code snippet, is now utterly wrong. Is that correct ???


--- myfile.c ---

#include <stdio.h>
#include <stdlib.h>

typedef enum { RED, BLUE, GREEN } Color;

struct Point {
    int x;
    int y;
};

struct Color_Point {
    int x;
    int y;
    Color color;
};

struct Color_Point2 {
    struct Point point;
    Color color;
};

int main(int argc, char *argv[])
{
    struct Point *p;

    struct Color_Point *my_color_point = malloc(sizeof(struct Color_Point));
    if (my_color_point == NULL)
        return EXIT_FAILURE;
    my_color_point->x = 10;
    my_color_point->y = 20;
    my_color_point->color = GREEN;

    p = (struct Point *)my_color_point;

    // Trying to access data stored in a "struct Color_Point" object using
    // a "struct Point *" pointer: is this forbidden by the Standard ???
    printf("x:%d, y:%d\n", p->x, p->y);

    struct Color_Point2 *my_color_point2 = malloc(sizeof(struct Color_Point2));
    if (my_color_point2 == NULL) {
        free(my_color_point);
        return EXIT_FAILURE;
    }
    my_color_point2->point.x = 100;
    my_color_point2->point.y = 200;
    my_color_point2->color = RED;

    p = (struct Point *)my_color_point2;

    // Trying to access data stored in a "struct Color_Point2" object using
    // a "struct Point *" pointer: is this forbidden by the Standard ???
    printf("x:%d, y:%d\n", p->x, p->y);

    p = &my_color_point2->point;

    printf("x:%d, y:%d\n", p->x, p->y); // but this is correct, right ???

    free(my_color_point2);
    free(my_color_point);
    return 0;
}


Is the line "p = (struct Point*)my_color_point" also a case of what is
called "type-punning" ???


***** Question (4) *****

In the Standard, chapter 6.5.2.3, it is written:

One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial
part of any of them anywhere that a declaration of the complete type
of the union is visible. Two structures share a common initial
sequence if corresponding members have compatible types (and, for
bit-fields, the same widths) for a sequence of one or more initial
members.

I find this statement completely obscure.

Let's have:

struct s1 {int i;};
struct s2 {int i;};

struct s1 *p1;
struct s2 *p2;

A compiler is free to assume that *p1 and *p2 don't alias.

If we just put a union declaration like this before this code, then it
acts like a flag to the compiler, indicating that pointers to "struct
s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
to the same location.

union p1_p2_alias_flag { struct s1 st1;
struct s2 st2;
};

There is no need to use "union p1_p2_alias_flag" for accessing data,
and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
anywhere else.
I mean, it is possible to access data using directly p1 and p2.

Do you agree, everybody ???


***** Question (5) *****

This question is really hard.

Let's take this code snippet:

---------
#include <stdio.h>

int main(void)
{
    struct s1 { int i; };

    struct s1 s = {77};

    unsigned char *x = (unsigned char *)&s;
    printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
    // The Standard says data stored in a "struct s1" object can be read
    // through a pointer to "char"

    x[0] = 100;   // here, I write data into "char" objects !!!
    x[1] = 101;
    x[2] = 102;
    x[3] = 103;

    printf("%d\n", s.i);   // but data stored in "char" objects cannot be
                           // read by pointer to "struct s1" ???

    return 0;
}
-----------

For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);",
I can rewrite the Standard clause like this:

An object [ here, s of type "struct s1" ] shall have its stored value
accessed only by an lvalue expression that has one of the following types:
[ blah blah blah ]
- a character type [ in our example, x[0], x[1], x[2], x[3] ]. // this
  is our case, so everything is OK so far!


But what about the line "printf("%d\n", s.i);" ??????
I have read the Standard again and again, but I cannot explain how it
can work. If I rewrite the Standard clause, it gives:

An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its
stored value accessed only by an lvalue expression that has one of the
following types:
- a type compatible with the effective type of the object, [ this is
  not our case ]
- a qualified version of a type compatible with the effective type of
  the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to the
  effective type of the object, [ still not our case ]
- a type that is the signed or unsigned type corresponding to a
  qualified version of the effective type of the object, [ still not
  our case ]
- an aggregate or union type that includes one of the aforementioned
  types among its members [ we read through "s", which is of type
  "struct s1", but it does not contain a member of type "char" ]
  (including, recursively, a member of a subaggregate or contained
  union), or
- a character type. [ definitely not our case ]

We see that none of these conditions applies in our case.

Where is the flaw in my reasoning ???
Does the last "printf" line of this code snippet work or not ??? And
why ???


***** Question (6) *****

I often see this code used with socket programming:

struct sockaddr_in my_addr;
...
bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));

The function bind(...) needs a pointer to "struct sockaddr", but
my_addr is a "struct sockaddr_in".
So, in my opinion, the function bind is not guaranteed to safely access
the contents of the object my_addr.

Does anyone know why this code is not broken (or whether it is) ???

 
Christian Bau
10-13-2005
In article <(E-Mail Removed)>,
(E-Mail Removed) wrote:

> ***** Question (2) *****
>
> In the little program that follows, the line "printf("%d\n", *x);"
> normally returns 123,
> but an optimizing compiler can return garbage instead of 123.
> Is my reasoning correct ???
>
> On the other side, the line "printf("%d\n", p1->i);" always returns 999
> as expected, right ???
>
> ----
>
> #include <stdio.h>
> #include <stdlib.h>
>
> struct s1 { int i; double f; };
>
>
> int main(void)
> {
> struct s1* p1;
> int* x;
>
> p1 = malloc(sizeof(*p1));
> p1->i = 123; // object of type 'struct s1' contains 123
>
> x = &(p1->i);
>
> printf("%d\n", *x); // I try to access a value stored in an
> // object of type 'struct s1'
> // through *x which is of type 'int'.
> // I think this is not allowed by the
> // standard !
>
> *x = 999; // I store 999 in *x, which is of type 'int'
>
> printf("%d\n", p1->i); // I access a value stored in *x which is of
> // type 'int'
> // by *p1 ( as p1->i is a shortcut for
> // (*p1).i )
> // which is of type 'struct s1',
> // but contains a member of type 'int'.
> // I think this is allowed by the standard.
>
>
> return 0;
> }


This is all ok. The only unusual thing with structs is that there can be
padding, and that storing into any struct member could modify any
padding in the struct. If there is padding between int i and double f,
then p1->i = 123 could modify the padding, while *x = 999 couldn't.


> ***** Question (3) *****
>
> The Standard forbids ( if I am not mistaken ) pointer of type "struct A
> *" to access data written by a pointer of type "struct B *", as they are
> different types.
>
> This means that the common usage of faking inheritance in C like in
> this code snippet is now utterly wrong, is it correct ???
>
>
> --- myfile.c ---
>
> #include <stdio.h>
> #include <stdlib.h>
>
> typedef enum { RED, BLUE, GREEN } Color;
>
> struct Point { int x;
> int y;
> };
>
> struct Color_Point { int x;
> int y;
> Color color;
> };
>
> struct Color_Point2{ struct Point point;
> Color color;
> };
>
> int main(int argc, char* argv[])
> {
>
> struct Point* p;
>
> struct Color_Point* my_color_point = malloc(sizeof(struct
> Color_Point));
> my_color_point->x = 10;
> my_color_point->y = 20;
> my_color_point->color = GREEN;
>
> p = (struct Point*)my_color_point;


This is undefined behavior. There is no guarantee that my_color_point is
correctly aligned for a pointer of type (struct Point *).

> printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
> // a "struct Color_Point" object using a "struct Point*" pointer is
> // forbidden by the Standard ???


Yes. There is an exception: if the compiler has seen a declaration of a
union with members of type "struct Point" and "struct Color_Point", then
accessing the common initial members of both structs is legal, even
writing to a member of one struct and reading it as a member of the
other struct.

> struct Color_Point2* my_color_point2 = malloc(sizeof(struct
> Color_Point2));
> my_color_point2->point.x = 100;
> my_color_point2->point.y = 200;
> my_color_point2->color = RED;
>
> p = (struct Point*)my_color_point2;


Yes, you can always cast a pointer to struct to a pointer of the first
member.

> printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in
> // a "struct Color_Point2" object using a "struct Point*" pointer is
> // forbidden by the Standard ???


That's fine.

> p = &my_color_point2->point;
>
> printf("x:%d, y:%d\n", p->x, p->y); // but this is correct, right ???
>
>
> return 0;
> }



> Is the line "p = (struct Point*)my_color_point" also a case of what is
> called "type-punning" ???
>
>
> ***** Question (4) *****
>
> In the Standard, chapter 6.5.2.3, it is written:
>
> One special guarantee is made in order to simplify the use of unions:
> if a union contains
> several structures that share a common initial sequence (see below),
> and if the union
> object currently contains one of these structures, it is permitted to
> inspect the common
> initial part of any of them anywhere that a declaration of the complete
> type of the union is
> visible. Two structures share a common initial sequence if
> corresponding members have
> compatible types (and, for bit-fields, the same widths) for a sequence
> of one or more
> initial members.
>
> I find this statement completely obscure.
>
> Let's have:
>
> struct s1 {int i;};
> struct s2 {int i;};
>
> struct s1 *p1;
> struct s2 *p2;
>
> A compiler is free to assume that *p1 and *p2 don't alias.


Exactly.

> If we just put a union declaration like this before this code, then it
> acts like a flag to the compiler, indicating that pointers to "struct
> s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
> to the same location.
>
> union p1_p2_alias_flag { struct s1 st1;
> struct s2 st2;
> };
>
> There is no need to use "union p1_p2_alias_flag" for accessing data,
> and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
> anywhere else.
> I mean, it is possible to access data using directly p1 and p2.


Yes, that is right.


> ***** Question (5) *****
>
> This question is really hard.
>
> Let's have this code snippet:
>
> ---------
> #include <stdio.h>
>
> int main (void)
> {
>
> struct s1 {int i;
> };
>
> struct s1 s = {77};
>
> unsigned char* x = (unsigned char*)&s;
> printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
> // Standard says data stored in "struct s1" type can be read by pointer
> // to "char"


That is if sizeof (int) >= 4, which is nowhere guaranteed.


> x[0] = 100; // here, I write data in "char" objects !!!
> x[1] = 101;
> x[2] = 102;
> x[3] = 103;
>
> printf("%d\n", s.i); // but data stored in "char" objects cannot be
> // read by pointer to "struct s1" ???


Assuming that sizeof (int) == 4, you have changed every bit in
the representation of s.i. If the representation is not a trap
representation, you are fine. And it is even ok if for example the
result after storing three bytes, combined with the last remaining byte
of the number 77 were a trap representation, because you never access
that value.



> return 0;
> }





> For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
> (int)x[3]);", I can rewrite the Standard clause like this:
>
> An object [ here, s of type "struct s1" ] shall have its stored value
> accessed only by an lvalue expression that has one of
> the following types:
> [ blah blah blah ]
> - a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
> it is our case, so everything is OK so far !
>
>
> But what about the line "printf("%d\n", s.i);" ??????
> I read the Standard again and again, but I cannot express how it can
> work.


If the bytes stored are a valid representation of an int, then that is
what it prints. If not, it is undefined behavior. A specific compiler
might guarantee that int's have no trap representations.

> ***** Question (6) *****
>
> I often see this code used with socket programming:
>
> struct sockaddr_in my_addr;
> ...
> bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
>
> The function bind(...) needs a pointer to "struct sockaddr", but
> my_addr is a "struct sockaddr_in".
> So, in my opinion, the function bind is not guaranteed to access safely
> the content of object my_addr.
>
> Someone knows why this code is not broken ( or if it is ) ???


It depends on the declarations of the types involved. And remember that
the C Standard is not the only standard. For example, the C Standard
doesn't guarantee that 'a' + 1 == 'b', but if your C implementation uses ASCII
or Unicode for its character set, then the ASCII standard or the Unicode
standard would give you that guarantee.

In your case, it could be that POSIX guarantees that the code is
correct. So it will work on any implementation that conforms to the
POSIX standard (no matter whether it conforms to the C Standard or not),
even though it might not work on an implementation that conforms to the
C Standard but not to POSIX.
 
Jack Klein
10-14-2005
On 13 Oct 2005 07:39:48 -0700, (E-Mail Removed) wrote in
comp.lang.c:

>
> I try to understand strict aliasing rules that are in the C Standard.
> As gcc applies these rules by default, I just want to be sure to
> understand fully this issue.
>
> For questions (1), (2) and (3), I think that the answers are all "yes",
> but I would be glad to have strong confirmation.
>
> About questions (4), (5) and (6), I really don't know. Please help!!!
>
> --------
>
> The Standard says (
> http://www.open-std.org/jtc1/sc22/wg...docs/n1124.pdf chapter 6.5
> ):
>
> An object shall have its stored value accessed only by an lvalue
> expression that has one of
> the following types:
> - a type compatible with the effective type of the object,
> - a qualified version of a type compatible with the effective type
> of the object,
> - a type that is the signed or unsigned type corresponding to the
> effective type of the object,
> - a type that is the signed or unsigned type corresponding to a
> qualified version of the effective type of the object,
> - an aggregate or union type that includes one of the aforementioned
> types among its members
> (including, recursively, a member of a subaggregate or contained
> union), or
> - a character type.
>
>
> ***** Question (1) *****
>
> Let's have two struct having different tag names, like:
>
> struct s1 {int i;};
> struct s2 {int i;};
>
> struct s1 *p1;
> struct s2 *p2;
>
> The compiler is free to assume that p1 and p2 point to different memory
> locations and don't alias.
> Two struct having different names are considered to be different types.
>
> In the standard, we read the wording "effective type of the object"
> many times.
>
> This "effective type of the object" may be an "int", "double", etc, but
> may also be a "struct" type, right ???
>
> And I suppose it may also be an "array" type or a "union" type as
> well, is it correct ???


Yes.

> ***** Question (2) *****
>
> In the little program that follows, the line "printf("%d\n", *x);"
> normally returns 123,
> but an optimizing compiler can return garbage instead of 123.


No, an optimizing compiler must still output "123" for this line.

> Is my reasoning correct ???
>
> On the other side, the line "printf("%d\n", p1->i);" always returns 999
> as expected, right ???
>
> ----
>
> #include <stdio.h>
> #include <stdlib.h>
>
> struct s1 { int i; double f; };
>
>
> int main(void)
> {
> struct s1* p1;
> int* x;
>
> p1 = malloc(sizeof(*p1));
> p1->i = 123; // object of type 'struct s1' contains 123
>
> x = &(p1->i);
>
> printf("%d\n", *x); // I try to access a value stored in an
> // object of type 'struct s1'
> // through *x which is of type 'int'.
> // I think this is not allowed by the
> // standard !


The effective type of *p1 is 'struct s1'. The effective type of p1->i
is 'int'. 'x' is a pointer to int, and you have initialized it with a
pointer to an int. This is perfectly legal.

Since the int contains the value 123, and 'x' quite properly points to
that int, *x must retrieve the int value 123. It can't do anything
else.

> *x = 999; // I store 999 in *x, which is of type 'int'
>
> printf("%d\n", p1->i); // I access a value stored in *x which is of
> // type 'int'
> // by *p1 ( as p1->i is a shortcut for
> // (*p1).i )
> // which is of type 'struct s1',
> // but contains a member of type 'int'.
> // I think this is allowed by the standard.
>
>
> return 0;
> }
>
>
> ***** Question (3) *****
>
> The Standard forbids ( if I am not mistaken ) pointer of type "struct A
> *" to access data written by a pointer of type "struct B *", as they are
> different types.
>
> This means that the common usage of faking inheritance in C like in
> this code snippet is now utterly wrong, is it correct ???
>
>
> --- myfile.c ---
>
> #include <stdio.h>
> #include <stdlib.h>
>
> typedef enum { RED, BLUE, GREEN } Color;
>
> struct Point { int x;
> int y;
> };
>
> struct Color_Point { int x;
> int y;
> Color color;
> };
>
> struct Color_Point2{ struct Point point;
> Color color;
> };
>
> int main(int argc, char* argv[])
> {
>
> struct Point* p;
>
> struct Color_Point* my_color_point = malloc(sizeof(struct
> Color_Point));
> my_color_point->x = 10;
> my_color_point->y = 20;
> my_color_point->color = GREEN;
>
> p = (struct Point*)my_color_point;
>
> printf("x:%d, y:%d\n", p->x, p->y); // trying to access data stored in


This is undefined behavior, pure and simple. It works on many
implementations, but is not guaranteed at all.

[snip]

> Is the line "p = (struct Point*)my_color_point" also a case of what is
> called "type-punning" ???


Type punning is not a term defined by the standard, but I would say
that the act of assigning the pointer via a cast is not type punning.
Accessing a member of the foreign structure type through the pointer
is.

> ***** Question (4) *****
>
> In the Standard, chapter 6.5.2.3, it is written:
>
> One special guarantee is made in order to simplify the use of unions:
> if a union contains
> several structures that share a common initial sequence (see below),
> and if the union
> object currently contains one of these structures, it is permitted to
> inspect the common
> initial part of any of them anywhere that a declaration of the complete
> type of the union is
> visible. Two structures share a common initial sequence if
> corresponding members have
> compatible types (and, for bit-fields, the same widths) for a sequence
> of one or more
> initial members.
>
> I find this statement completely obscure.
>
> Let's have:
>
> struct s1 {int i;};
> struct s2 {int i;};
>
> struct s1 *p1;
> struct s2 *p2;
>
> A compiler is free to assume that *p1 and *p2 don't alias.
>
> If we just put a union declaration like this before this code, then it
> acts like a flag to the compiler, indicating that pointers to "struct
> s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
> to the same location.
>
> union p1_p2_alias_flag { struct s1 st1;
> struct s2 st2;
> };
>
> There is no need to use "union p1_p2_alias_flag" for accessing data,
> and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
> anywhere else.
> I mean, it is possible to access data using directly p1 and p2.


It seems unlikely that a compiler could find a way to prevent it from
working in general, even if the implementer tried, but such behavior
would not render the compiler non-conforming.

On the other hand, since your structure only contains a single member,
and the first member always begins at the same address as the
structure itself, this particular usage can't fail.

Still, the behavior is undefined. Which means the language standard
places no requirements on it at all.
>
> Do you agree, everybody ???
>
>
> ***** Question (5) *****
>
> This question is really hard.
>
> Let's have this code snippet:
>
> ---------
> #include <stdio.h>
>
> int main (void)
> {
>
> struct s1 {int i;
> };
>
> struct s1 s = {77};
>
> unsigned char* x = (unsigned char*)&s;
> printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
> // Standard says data stored in "struct s1" type can be read by pointer
> // to "char"
>
> x[0] = 100; // here, I write data in "char" objects !!!
> x[1] = 101;
> x[2] = 102;
> x[3] = 103;


The standard does not say that you can do this. You are assuming that
sizeof(int) is at least 4, and there are implementations where that is
not true. Accessing, let alone writing to, x[1], x[2], or x[3] might
be outside the bounds of the int and the struct, producing undefined
behavior.

> printf("%d\n", s.i); // but data stored in "char" objects cannot be
> read by pointer to "struct s1" ???
>
> return 0;
> }


No, the point is that accessing s.i, an int, after storing data into
that memory using a different object type, is undefined. You might
have created a bit pattern that does not represent a valid value for
the int, called a trap representation.

> -----------
>
> For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
> (int)x[3]);", I can rewrite the Standard clause like this:
>
> An object [ here, s of type "struct s1" ] shall have its stored value
> accessed only by an lvalue expression that has one of
> the following types:
> [ blah blah blah ]
> - a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
> it is our case, so everything is OK so far !


I have worked on a platform where sizeof(int) is 1, and several where
sizeof(int) is 2. I have never worked on a platform where sizeof(int)
is 3, but C allows it. On any of these platforms you would be
invoking undefined behavior.

> But what about the line "printf("%d\n", s.i);" ??????


Even assuming that sizeof(int) >= 4 on your implementation, you have
to understand that all types, other than unsigned char, can have trap
representations, that is bit patterns that do not represent a valid
value for the type. By writing arbitrary bit patterns into an int,
you may have created an invalid bit pattern in that int. When you
access that invalid bit pattern as an int, the behavior is undefined.

> I read the Standard again and again, but I cannot express how it can
> work.
> If I rewrite the Standard clause, it gives:
>
> An object [ in our example, x[0], x[1], x[2], and x[3] ] shall have its
> stored value accessed only by an lvalue expression that has one of
> the following types:
> - a type compatible with the effective type of the object, [ this is
> not our case ]
> - a qualified version of a type compatible with the effective type
> of the object, [ still not our case ]
> - a type that is the signed or unsigned type corresponding to the
> effective type of the object, [ still not our case ]
> - a type that is the signed or unsigned type corresponding to a
> qualified version of the effective type of the object, [ still not our
> case ]
> - an aggregate or union type that includes one of the aforementioned
> types among its members [ we read through "s" which is of type "struct
> s1", but it does not contain a member of type "char" ]
> (including, recursively, a member of a subaggregate or contained
> union), or
> - a character type. [ definitely not our case ]
>
> We see that none of these conditions applies in our case.


The standard provides a specific list of what is allowed. Lists like
this are always exhaustive. That means anything not on the list is
undefined.

> Where is the flaw in my reasoning ???


There is no flaw in your reasoning, the code produces undefined
behavior.

> Does the last "printf" line of this code sniplet work or not ??? and
> why ???


There is no question of "work". Whatever it does is just as right or
wrong as anything else that might happen as far as the language is
concerned. That's what undefined behavior means. The C standard does
not know or care what happens.

> ***** Question (6) *****
>
> I often see this code used with socket programming:
>
> struct sockaddr_in my_addr;
> ...
> bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
>
> The function bind(...) needs a pointer to "struct sockaddr", but
> my_addr is a "struct sockaddr_in".
> So, in my opinion, the function bind is not guaranteed to access safely
> the content of object my_addr.
>
> Someone knows why this code is not broken ( or if it is ) ???


That depends on the definition of 'struct sockaddr_in'. If its first
member is a 'struct sockaddr', the code is legal and well defined
because a pointer to a structure can always be converted to a pointer
to its first member. If not, then the code produces undefined
behavior if the called function actually uses the pointer to access
members of a 'struct sockaddr'.

You use terms like "broken" and "work", which do not really apply as
far as undefined behavior in C is concerned. They are subjective
terms at best. Code is "broken" if it does not do what you want, you
consider it to "work" if it does. If it produces undefined behavior,
it may "work" on one compiler but be "broken" on another, and both
compilers can be standard conforming.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
 
S.Tobias
10-14-2005
Christian Bau <(E-Mail Removed)> wrote:
> In article <(E-Mail Removed) .com>,
> (E-Mail Removed) wrote:
>

[snip]
>> ***** Question (5) *****
>>
>> This question is really hard.
>>
>> Let's have this code snippet:
>>
>> ---------
>> #include <stdio.h>
>>
>> int main (void)
>> {
>>
>> struct s1 {int i;
>> };
>>
>> struct s1 s = {77};
>>
>> unsigned char* x = (unsigned char*)&s;
>> printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2], (int)x[3]);
>> // Standard says data stored in "struct s1" type can be read by pointer
>> // to "char"

>
> That is if sizeof (int) >= 4, which is nowhere guaranteed.
>
>
>> x[0] = 100; // here, I write data in "char" objects !!!
>> x[1] = 101;
>> x[2] = 102;
>> x[3] = 103;
>>

Let's suppose instead that we copy the value from another int:
int i = 42;
unsigned char *y = (void*)&i;
assert(sizeof(int) == 4);
x[0] = y[0];
//...etc.
>> printf("%d\n", s.i); // but data stored in "char" objects cannot be
>> read by pointer to "struct s1" ???


Storing values through character lvalues did not change the effective
type of the struct or its member, therefore it's okay (the compiler must
reread the value from memory).

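The effective type of a declared object is always its declared type.
The effective type of an allocated object is the one last imprinted on
it by storing a value through a non-character lvalue, or by copying
(memcpy, memmove, or copying as a character array) from an object that
has an effective type; if neither has happened, it is the type of the
lvalue it is accessed with.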

> Assuming that sizeof (int) == 4, you have changed every bit in
> the representation of s.i. If the representation is not a trap
> representation, you are fine. And it is even ok if for example the
> result after storing three bytes, combined with the last remaining byte
> of the number 77 were a trap representation, because you never access
> that value.


(all agreed)

[snip]
>> For the line "printf("%d %d %d %d\n", (int)x[0], (int)x[1], (int)x[2],
>> (int)x[3]);", I can rewrite the Standard clause like this:
>>
>> An object [ here, s of type "struct s1" ] shall have its stored value
>> accessed only by an lvalue expression that has one of
>> the following types:
>> [ blah blah blah ]
>> - a character type [ in our example, x[0], x[1], x[2], x[3] ]. //
>> it is our case, so everything is OK so far !
>>
>>
>> But what about the line "printf("%d\n", s.i);" ??????
>> I read the Standard again and again, but I cannot express how it can
>> work.


It means this: struct s1 object can be legally accessed with a character
lvalue (including writing data to the struct). Since it's legal,
the compiler must take it into consideration when later accessing
struct s1. Either it can prove that character lvalues did not refer
to the struct object, or it must re-read the struct value from memory.

This is not the case with other types:
assert(sizeof(int) == sizeof(short));
int i = 42;
short *ps = (short *)&i; //assume that alignment is the same
*ps = 54; //this access is UB; since it is not legal to access int object
//with short lvalue, compiler need not assume that object `i'
//was actually changed
printf("%d\n", i); //may print cached value 42
//(the Std says it can do or not do virtually anything)

For another example: when a value is stored through `short' lvalue,
the compiler need not assume that `struct s1' object was changed,
because `struct s1' does not contain a `short' member.

--
Stan Tobias
mailx `echo (E-Mail Removed)LID | sed s/[[:upper:]]//g`
 
S.Tobias
      10-14-2005
Christian Bau <(E-Mail Removed)> wrote:
> In article <(E-Mail Removed) .com>,
> (E-Mail Removed) wrote:


>> ***** Question (4) *****
>>
>> In the Standard, chapter 6.5.2.3, it is written:
>>
>> One special guarantee is made in order to simplify the use of unions:
>> if a union contains several structures that share a common initial
>> sequence (see below), and if the union object currently contains one
>> of these structures, it is permitted to inspect the common initial
>> part of any of them anywhere that a declaration of the complete type
>> of the union is visible. Two structures share a common initial
>> sequence if corresponding members have compatible types (and, for
>> bit-fields, the same widths) for a sequence of one or more initial
>> members.
>>
>> I find this statement completely obscure.
>>
>> Let's have:
>>
>> struct s1 {int i;};
>> struct s2 {int i;};
>>
>> struct s1 *p1;
>> struct s2 *p2;
>>
>> A compiler is free to assume that *p1 and *p2 don't alias.

>
> Exactly.
>

What's more important: `p1->i' and `p2->i' don't alias, even though they
have the same type!

However, p1 and p2 _may_ point at the same object:
((char*)p1)[0] = 0;
At this point the compiler cannot blindly assume that `*p2' wasn't modified.

>> If we just put a union declaration like this before this code, then it
>> acts like a flag to the compiler, indicating that pointers to "struct
>> s1" and pointers to "struct s2" ( here, p1 and p2 ) may alias and point
>> to the same location.


(As I said above, they may point to the same location.)

>>
>> union p1_p2_alias_flag { struct s1 st1;
>> struct s2 st2;
>> };
>>
>> There is no need to use "union p1_p2_alias_flag" for accessing data,
>> and "p1_p2_alias_flag", "st1" and "st2" are just dummy names, not used
>> anywhere else.

(I don't quite understand what you mean here.)
>> I mean, it is possible to access data using directly p1 and p2.


After the compiler sees the union declaration, it is obliged to assume
that `p1->i' and `p2->i' may refer to (alias) the same object.
(However, it still need not assume that expressions `*p1' and `*p2' alias
the same object, since they are incompatible types).

--
Stan Tobias
 
Dik T. Winter
      10-14-2005
In article <(E-Mail Removed)> "S.Tobias" <(E-Mail Removed)> writes:
....
> >> struct s1 {int i;};
> >> struct s2 {int i;};
> >>
> >> struct s1 *p1;
> >> struct s2 *p2;
> >>
> >> A compiler is free to assume that *p1 and *p2 don't alias.

> >
> > Exactly.


With a caveat. It is free to assume that as long as nothing is assigned
to either p1 or p2.

> However p1 and p2 _may_ point at the same object.


In that case the compiler cannot assume that *p1 and *p2 don't alias.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
 
S.Tobias
      10-14-2005
Dik T. Winter <(E-Mail Removed)> wrote:
> In article <(E-Mail Removed)> "S.Tobias" <(E-Mail Removed)> writes:
> ...
> > >> struct s1 {int i;};
> > >> struct s2 {int i;};
> > >>
> > >> struct s1 *p1;
> > >> struct s2 *p2;
> > >>
> > >> A compiler is free to assume that *p1 and *p2 don't alias.

[snip]
> > However p1 and p2 _may_ point at the same object.

>
> In that case the compiler can not assume that *p1 and *p2 don't alias.


I don't agree; otherwise the aliasing rules would have no purpose.
Since `*p1' and `*p2' have incompatible types, the compiler may assume
(act as if) they don't refer to the same object; it doesn't have to prove
that the two pointers don't point at the same location.
I believe the compiler need not even assume that these two alias
the same object:
*p1
*(struct s2 *)p1
The decision whether to alias or not to alias can be based on
the type of lvalue (mainly).

Can you give an example where `*p1' and `*p2' alias the same object
while the behaviour is defined? (...And where the aliasing is actually
relevant; e.g., `&*p1' and `&*p2' don't count.)
Perhaps reading from an allocated and separately initialized object, but
this is not a situation where the aliasing rules are very important.

--
Stan Tobias
 
Dik T. Winter
      10-15-2005
In article <(E-Mail Removed)> "S.Tobias" <(E-Mail Removed)> writes:
> Dik T. Winter <(E-Mail Removed)> wrote:
> > In article <(E-Mail Removed)> "S.Tobias" <(E-Mail Removed)> writes:
> > ...
> > > >> struct s1 {int i;};
> > > >> struct s2 {int i;};
> > > >>
> > > >> struct s1 *p1;
> > > >> struct s2 *p2;
> > > >>
> > > >> A compiler is free to assume that *p1 and *p2 don't alias.

> [snip]
> > > However p1 and p2 _may_ point at the same object.

> >
> > In that case the compiler can not assume that *p1 and *p2 don't alias.

>
> I don't agree,


Sorry, I missed that p1 and p2 have different types. Indeed, p1 and p2
_may_ point at the same object, but the only way to make that happen is
through either undefined or implementation-defined behaviour. So you were
right.
 
Thad Smith
      10-16-2005
Christian Bau wrote:
> In article <(E-Mail Removed) .com>,
> (E-Mail Removed) wrote:


>>--- myfile.c ---
>>
>>#include <stdio.h>
>>#include <stdlib.h>
>>
>>typedef enum { RED, BLUE, GREEN } Color;
>>
>>struct Point { int x;
>> int y;
>> };
>>
>>struct Color_Point { int x;
>> int y;
>> Color color;
>> };
>>
>>struct Color_Point2{ struct Point point;
>> Color color;
>> };
>>
>>int main(int argc, char* argv[])
>>{
>>
>>struct Point* p;
>>
>>struct Color_Point* my_color_point = malloc(sizeof(struct
>>Color_Point));
>>my_color_point->x = 10;
>>my_color_point->y = 20;
>>my_color_point->color = GREEN;
>>
>>p = (struct Point*)my_color_point;

>
>
> This is undefined behavior. There is no guarantee that my_color_point is
> correctly aligned for a pointer of type (struct Point *).


Doesn't the fact that the value of my_color_point was returned by malloc
guarantee correct alignment?

Thad

 
Christian Bau
      10-16-2005
In article <4351bc52$0$27308$(E-Mail Removed) s.com>,
Thad Smith <(E-Mail Removed)> wrote:

> Christian Bau wrote:
> > In article <(E-Mail Removed) .com>,
> > (E-Mail Removed) wrote:

>
> >>--- myfile.c ---
> >>
> >>#include <stdio.h>
> >>#include <stdlib.h>
> >>
> >>typedef enum { RED, BLUE, GREEN } Color;
> >>
> >>struct Point { int x;
> >> int y;
> >> };
> >>
> >>struct Color_Point { int x;
> >> int y;
> >> Color color;
> >> };
> >>
> >>struct Color_Point2{ struct Point point;
> >> Color color;
> >> };
> >>
> >>int main(int argc, char* argv[])
> >>{
> >>
> >>struct Point* p;
> >>
> >>struct Color_Point* my_color_point = malloc(sizeof(struct
> >>Color_Point));
> >>my_color_point->x = 10;
> >>my_color_point->y = 20;
> >>my_color_point->color = GREEN;
> >>
> >>p = (struct Point*)my_color_point;

> >
> >
> > This is undefined behavior. There is no guarantee that my_color_point is
> > correctly aligned for a pointer of type (struct Point *).

>
> Doesn't the fact that the value of my_color_point was returned by malloc
> guarantee correct alignment?


In this case, yes.

If you use

struct Color_Point* my_color_point = malloc(sizeof(struct Color_Point) * 2);
++my_color_point;
my_color_point->x = 10;
my_color_point->y = 20;
my_color_point->color = GREEN;

p = (struct Point*)my_color_point;

you get undefined behavior.
 