Repeated types in union

 
 
Edward Rutherford
      12-10-2011
Hello :

Is the following code undefined behavior?


union {
int a;
int b;
} u;
u.a = 3;
printf("%d\n", u.b);


Cheers

Edward
 
Jens Gustedt
      12-10-2011
On 12/10/2011 11:06 PM, Edward Rutherford wrote:
> Is the following code undefined behavior?
>
>
> union {
> int a;
> int b;
> } u;
> u.a = 3;
> printf("%d\n", u.b);


Not that I can see. Accessing a member other than the one last stored
is undefined behavior only if the bit pattern is a trap
representation for the new type.

If the first member has padding bytes that the other type uses for its
data representation, the value of these bytes is *unspecified* which is
not the same thing as UB.

In any case, none of these things can happen for your example.

Jens


 
Eric Sosman
      12-10-2011
On 12/10/2011 5:06 PM, Edward Rutherford wrote:
> Hello :
>
> Is the following code undefined behavior?
>
>
> union {
> int a;
> int b;
> } u;
> u.a = 3;
> printf("%d\n", u.b);


(I rush in where angels fear to tread...)

First, there's no problem with the issue mentioned in your
subject line: It's perfectly all right to have several union members
with distinct names but the same type. If that were not so, even
something as simple as `union { int i; time_t t; } u;' could be in
trouble. See also 6.2.5p20, which says that union members have
"possibly distinct" types.

The "write one member, read another" question has been discussed
more than once, and my impression of the debates is that there have
been two camps: Not "It's legal" and "It's illegal," but "It's legal"
and "You'll probably get away with it, but it might not be squeaky-
clean, and my head hurts can we talk about something else, please?"
(I'm in the latter camp.)

It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
represent `3', and that `u.b' thereby receives the same bytes. No
argument there: The storage allocated to `u.b' holds a representation
of `3'.

The part that makes my head ache is figuring out whether the
compiler is required to "notice" that storing to `u.a' affects the
value of `u.b'. If the compiler has already loaded `u.b' into a
register, say, is it required to re-fetch because `u.a' was changed?
Is the compiler allowed to consider `u.b' uninitialized because it
has never been stored to, despite the store to `u.a'?

To those in the "It's legal" camp, I offer a few puzzling and
possibly disturbing points:

- The footnote to 6.2.5p21 points out that "an object with union
type can only contain one member at a time" -- meaning that if
`u' contains `u.a', it does not contain `u.b'. Footnotes, of
course, are suggestive but non-normative.

- The footnote to 6.5.2.3p3 supports the "It's legal" camp by
describing the mechanism of type punning. Footnotes, of course,
are suggestive but non-normative.

- 6.5.2.3p5 gives a "special guarantee" for union members that
are structs, but does not extend a similar guarantee for other
member types.

- 6.7.2.1p14 has the normative language for the first footnote
mentioned above: "The value of at most one of the members can be
stored in a union object at any time." Your `u' can hold `u.a'
or `u.b', but not both at once.

Those are the citations I can find (if I've missed any I'm sure
others will point them out). Their cumulative impression on me is
that the matter is not settled beyond doubt, but the aforementioned
angels may see things differently.

As a practical matter, it's not all that important what I think
or what the angels think, but what the providers of your compilers
think. If a compiler does something unfortunate with your code you
will find yourself retracing this same argument with implementors
who are trying to stamp NOT A BUG on your complaint. If the angels
weigh in on your side, the implementors of the offending compiler
may eventually accede and agree to ship a fix -- "In a forthcoming
release," oh joy, oh joy. I think you might choose better battles:
Fight over things you Really Really Need and are Really Solid Bugs,
and don't waste troops trying to subjugate the unpopulated hinterland.

--
Eric Sosman
 
Barry Schwarz
      12-11-2011
On Sat, 10 Dec 2011 22:06:08 +0000 (UTC), Edward Rutherford wrote:

>Hello :
>
>Is the following code undefined behavior?
>
>
> union {
> int a;
> int b;
> } u;
> u.a = 3;
> printf("%d\n", u.b);


In C89, paragraph 3.3.2.3 states "With one exception, if a member of a
union object is accessed after a value has been stored in a different
member of the object, the behavior is implementation-defined." The
exception referred to is not related to your example. So the answer
to your question is: yes if the implementation says it is and no if
the implementation says something else.

In C99, the reference to implementation-defined behavior is removed.
Furthermore, paragraph 6.2.6.1p7 states "When a value is stored in a
member of an object of union type, the bytes of the object
representation that do not correspond to that member but do correspond
to other members take unspecified values." Since a and b occupy the
same bytes, none of those bytes becomes unspecified. And footnote 82
indicates the intended behavior is for the stored bits to be
"reinterpreted" as the type of b. Since both a and b have the same
type, it seems to me the intention is to retrieve the same value.
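
For reference, the snippet from the original question can be wrapped into a complete function. This is only a sketch under the C99 reading described above; the names `two_ints`, `read_b_after_storing_a` and `print_example` are made up for illustration.

```c
#include <stdio.h>

/* Two members with distinct names but the same type. */
union two_ints {
    int a;
    int b;
};

/* Store through member a, then read back through member b. */
int read_b_after_storing_a(int value)
{
    union two_ints u;

    u.a = value;  /* a and b occupy exactly the same bytes */
    return u.b;   /* int bytes reinterpreted as int: the same value */
}

/* Mirrors the printf call from the original question. */
void print_example(void)
{
    printf("%d\n", read_b_after_storing_a(3));
}
```

Because both members are `int`, no byte of the object is left unspecified by the store, and no trap representation can arise from reading a valid `int` as an `int`.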

--
Remove del for email
 
Jens Gustedt
      12-12-2011
On 12/12/2011 06:49 PM, christian.bau wrote:
> On Dec 11, 6:04 am, Barry Schwarz <schwa...@dqel.com> wrote:
> You are right, but that seems to have some awful consequences. Take
> this code:
>
> union {
> int a;
> long b;
> } u;
> u.a = 3;
> printf("%ld\n", u.b);
>
> So on an implementation where int and long have the same size and
> representation, this code would be well-defined and print "3"?
>
> Now take this code:
>
> void f (int* a, long* b) { *a = 3; *b = 4; *a = *a + 2; }
>
> If I call f (&u.a, &u.b) is this required to set both to 6?
> And since the compiler doesn't know that I'm going to make this call,
> lots of optimization goes out of the window?


If I remember correctly, the aliasing rules state that the compiler is
allowed to assume that a and b (inside the function) point to different
objects because they are of different types. Thus in the second
assignment to *a the compiler can assume that *a is still 3 and store 5
in place.
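
One way to keep the punning visible to the compiler is to pass the union itself rather than two unrelated pointers. The sketch below is an illustration, not something taken from the posts above: `int_long`, `read_a_after_storing_b` and `leading_int_of_long` are hypothetical names, and it assumes sizeof(int) <= sizeof(long) and that `int` has no trap representations.

```c
#include <string.h>

union int_long {
    int a;
    long b;
};

/* Both accesses go through the union type, so the compiler can see
   that a and b share storage. */
int read_a_after_storing_b(union int_long *u, long value)
{
    u->b = value;
    return u->a;
}

/* The same reinterpretation spelled with memcpy: copy the leading
   sizeof(int) bytes of a long into an int. */
int leading_int_of_long(long value)
{
    int result;

    memcpy(&result, &value, sizeof result);
    return result;
}
```

Which value comes back depends on the sizes and byte order of `int` and `long` on the implementation at hand, so the only thing the sketch relies on is that both functions look at the same bytes.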

Jens


 
Edward Rutherford
      12-12-2011
Eric Sosman wrote:

> On 12/10/2011 5:06 PM, Edward Rutherford wrote:
>> Hello :
>>
>> Is the following code undefined behavior?
>>
>>
>> union {
>> int a;
>> int b;
>> } u;
>> u.a = 3;
>> printf("%d\n", u.b);

>
> (I rush in where angels fear to tread...)
>
> First, there's no problem with the issue mentioned in your
> subject line: It's perfectly all right to have several union members
> with distinct names but the same type. If that were not so, even
> something as simple as `union { int i; time_t t; } u;' could be in
> trouble. See also 6.2.5p20, which says that union members have
> "possibly distinct" types.
>
> The "write one member, read another" question has been discussed
> more than once, and my impression of the debates is that there have been
> two camps: Not "It's legal" and "It's illegal," but "It's legal" and
> "You'll probably get away with it, but it might not be squeaky- clean,
> and my head hurts can we talk about something else, please?" (I'm in the
> latter camp.)
>
> It's clear (from 6.2.6.1) that writing `u.a' deposits bytes that
> represent `3', and that `u.b' thereby receives the same bytes. No
> argument there: The storage allocated to `u.b' holds a representation of
> `3'.
>
> The part that makes my head ache is figuring out whether the
> compiler is required to "notice" that storing to `u.a' affects the value
> of `u.b'. If the compiler has already loaded `u.b' into a register,
> say, is it required to re-fetch because `u.a' was changed? Is the
> compiler allowed to consider `u.b' uninitialized because it has never
> been stored to, despite the store to `u.a'?
>
> To those in the "It's legal" camp, I offer a few puzzling and
> possibly disturbing points:
>
> - The footnote to 6.2.5p21 points out that "an object with union
> type can only contain one member at a time" -- meaning that if
> `u' contains `u.a', it does not contain `u.b'. Footnotes, of
> course, are suggestive but non-normative.
>
> - The footnote to 6.5.2.3p3 supports the "It's legal" camp by
> describing the mechanism of type punning. Footnotes, of course,
> are suggestive but non-normative.
>
> - 6.5.2.3p5 gives a "special guarantee" for union members that
> are structs, but does not extend a similar guarantee for other
> member types.
>
> - 6.7.2.1p14 has the normative language for the first footnote
> mentioned above: "The value of at most one of the members can be
> stored in a union object at any time." Your `u' can hold `u.a'
> or `u.b', but not both at once.
>
> Those are the citations I can find (if I've missed any I'm sure
> others will point them out). Their cumulative impression on me is that
> the matter is not settled beyond doubt, but the aforementioned angels
> may see things differently.
>
> As a practical matter, it's not all that important what I think
> or what the angels think, but what the providers of your compilers
> think. If a compiler does something unfortunate with your code you will
> find yourself retracing this same argument with implementors who are
> trying to stamp NOT A BUG on your complaint. If the angels weigh in on
> your side, the implementors of the offending compiler may eventually
> accede and agree to ship a fix -- "In a forthcoming release," oh joy, oh
> joy. I think you might choose better battles: Fight over things you
> Really Really Need and are Really Solid Bugs, and don't waste troops
> trying to subjugate the unpopulated hinterland.


Thanks for the explanation, Eric.

Does that mean the "It's Legal" brigade would say it's always legal to
read an unsigned char from a union, whatever was previously stored in
it, on the grounds that an unsigned char cannot contain a trap
representation?
 
ralph
      12-12-2011
On Sat, 10 Dec 2011 18:14:08 -0500, Eric Sosman wrote:

>
> As a practical matter, it's not all that important what I think
>or what the angels think, but what the providers of your compilers
>think. If a compiler does something unfortunate with your code you
>will find yourself retracing this same argument with implementors
>who are trying to stamp NOT A BUG on your complaint. If the angels
>weigh in on your side, the implementors of the offending compiler
>may eventually accede and agree to ship a fix -- "In a forthcoming
>release," oh joy, oh joy. I think you might choose better battles:
>Fight over things you Really Really Need and are Really Solid Bugs,
>and don't waste troops trying to subjugate the unpopulated hinterland.


Hear! Hear!

Consider that quoted and stolen. <bg>

-ralph
 
Jens Gustedt
      12-12-2011
On 12/12/2011 09:09 PM, Edward Rutherford wrote:

> Thanks for the explanation, Eric.
>
> Does that mean the "It's Legal" brigade would say it's always legal to
> read an unsigned char from a union, whatever was previously stored in
> it, on the grounds that an unsigned char cannot contain a trap
> representation?


One thing is sure: the standard explicitly allows copying any object
(e.g. with memcpy) into an array of `unsigned char`. This is even how the
term "object representation" is introduced.

So first of all, this means that we are allowed to read all the bytes of
a union. Second, it means that all bytes of the object representation
can be interpreted as unsigned char.
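
A minimal sketch of that copy, assuming a made-up union `int_or_double` and a hypothetical helper `roundtrip_preserves_bytes`; it copies the object representation out into an `unsigned char` array and back again:

```c
#include <string.h>

union int_or_double {
    int i;
    double d;
};

/* Copy the object representation into bytes and back.
   Returns 1 if the restored union matches the source byte for byte. */
int roundtrip_preserves_bytes(const union int_or_double *src)
{
    unsigned char bytes[sizeof *src];
    union int_or_double copy;

    memcpy(bytes, src, sizeof bytes);   /* object -> unsigned char array */
    memcpy(&copy, bytes, sizeof copy);  /* unsigned char array -> object */
    return memcmp(&copy, src, sizeof copy) == 0;
}
```

The helper never needs to know which member was stored last; it only handles bytes.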

Jens
 
Eric Sosman
      12-13-2011
On 12/12/2011 3:09 PM, Edward Rutherford wrote:
> [...]
> Does that mean the "It's Legal" brigade would say it's always legal to
> read an unsigned char from a union, whatever was previously stored in
> it, on the grounds that an unsigned char cannot contain a trap
> representation?


The varieties of `char' are something of a special case, because
C has always had the notion that it's possible to inspect and maybe
fiddle with the individual bytes of a multi-byte object. At your
peril, of course, since you might invalidate the multi-byte thing.
But still: Things like memcpy() are defined in terms of copying the
individual bytes, and the copy of a valid object must itself be
valid.

The Standard tightens this just a trifle, by allowing the `char'
flavors other than `unsigned' to have trap representations. Still,
`unsigned char' remains as the "atom" of C memory: Its mapping between
representations and values is one-to-one, which guarantees fidelity
both in value and in representation when copying or comparing, and
also guarantees that there are no trap representations.
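
To make the `unsigned char` case concrete, here is a small sketch with a union that pairs a `float` with a byte array of the same size. The names `float_bytes` and `union_bytes_match_memcpy` are invented for this example. It checks that reading the byte member after storing the `float` yields the same bytes that memcpy extracts.

```c
#include <string.h>

union float_bytes {
    float f;
    unsigned char bytes[sizeof(float)];
};

/* Returns 1 if the bytes seen through the union after storing f are
   the same bytes that memcpy copies out of a plain float. */
int union_bytes_match_memcpy(float value)
{
    union float_bytes u;
    unsigned char expected[sizeof(float)];

    u.f = value;
    memcpy(expected, &value, sizeof expected);
    return memcmp(u.bytes, expected, sizeof expected) == 0;
}
```

Since `unsigned char` has no trap representations and no padding bits, every byte read this way is a valid value, whatever was stored through the other member.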

But back to the `union' issue: I'm still not 100% comfortable
with the idea of writing to one member and reading another. It sort
of looks like it should work, but I've not heard a watertight argument
that it *must* work, even in the face of a ferociously aggressive
optimizer. I think the "It's legal" faction have found arguments they
deem satisfactory; perhaps they've looked more diligently than I have.

Down to nuts and bolts: Is this a theoretical question, or do you
have an actual use case in mind? If the latter, could you describe it?
Maybe someone will be able to say "Well, in *that* case it works" or
"If you did it *this other* way you wouldn't care."

--
Eric Sosman
 
Jens Gustedt
      12-13-2011
Hello,

On 12/14/2011 12:10 AM, christian.bau wrote:
> On Dec 12, 7:09 pm, Jens Gustedt <jens.gust...@loria.fr> wrote:
>
>> If I remember correctly the aliasing rules state that the compiler is
>> allowed to assume that a and b (inside the function) point to different
>> objects because they are of different types. Thus in the second
>> assignment to *a the compiler can assume that *a is still 3 and store 5
>> in place.

>
> You are right. On the other hand, footnote 82 says:
>
> "If the member used to access the contents of a union object is not
> the same as the member last used to store a value in the object, the
> appropriate part of the object representation of the value is
> reinterpreted as an object representation in the new type as described
> in 6.2.6 (a process sometimes called "type punning"). "
>
> Which is a direct contradiction. I am assuming that the rules for
> union members apply in the same way whether the compiler knows that it
> is accessing different members of the same union or not.


I think this assumption can't be made. Generally, inside a function the
compiler has no way to know that the pointers originate from the same
object. On the contrary, the aliasing rules were introduced so that the
compiler may assume that, under the given circumstances, they *must*
point to different objects.

And these things happen. gcc assumes (or at least some version of gcc
has assumed) that they are different, even if the function is inlined
and it could deduce that both point to the same address.

(Also, footnotes in the standard are not normative.)

Jens

 