Function casting - UB?

 
 
Jens Gustedt
 
      06-28-2012
On 28.06.2012 09:27, Johannes Bauer wrote:
> On 28.06.2012 09:23, Johannes Bauer wrote:
>> On 27.06.2012 19:37, Tim Rentsch wrote:
>>> The other example you give,
>>> assigning to one member of a union and reading from another, is
>>> actually defined behavior, _not_ undefined behavior.

>>
>> I also was under the impression that writing to member x and reading
>> member y of a union is UB. Wikipedia says "This is not, however, a safe
>> use of unions in general.", which is pretty vague (i.e. it's not clear
>> which cases are safe and which are not).
>>
>> Could you elaborate on why you think this is well-defined?

>
> Ah, I just read James' response further down. Interesting. Really
> thought this was undefined. Is this a recent change?


I think this was not considered a change in content but rather a more
precise statement of the intent. n1256.pdf has modification marks in this
region, so I suppose that these came with TC3. They state

> When a value is stored in a member of an object of union type, the
> bytes of the object representation that do not correspond to that
> member but do correspond to other members take unspecified values.


which in terms of the standard means that it is only UB if these
unspecified values are "forbidden" values for that type, in particular
trap representations.

This means that on most modern architectures manipulating integer
values (except for _Bool) through unions is completely ok. Floating
point values, _Bool, and pointer types must be treated with more care.

Jens
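
A minimal sketch of the case described above, not taken from the thread: a value is stored into one union member and read back through another, using '.' for both accesses. The helper name float_bits is hypothetical, and the sketch assumes float and uint32_t have the same size (checked at compile time). uint32_t has no padding bits and therefore no trap representations, which is why reading in this direction is the unproblematic one.

```c
#include <stdint.h>

/* Assumption: float and uint32_t occupy the same number of bytes. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float and uint32_t must have the same size");

/* Hypothetical helper: store into the float member, then read the
   uint32_t member of the same union object with '.' access. The
   object representation is reinterpreted in the new type. */
static uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

On an implementation where float is IEEE 754 binary32, float_bits(1.0f) yields 0x3F800000. Going the other way (storing an arbitrary integer and reading a float) is where more care is needed, since the resulting bit pattern may not be a valid value of the type being read.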
 
Tim Rentsch
 
      06-29-2012
Johannes Bauer <(E-Mail Removed)> writes:

> On 27.06.2012 19:37, Tim Rentsch wrote:
>> The other example you give,
>> assigning to one member of a union and reading from another, is
>> actually defined behavior, _not_ undefined behavior.

>
> I also was under the impression that writing to member x and reading
> member y of a union is UB. Wikipedia says "This is not, however, a safe
> use of unions in general.", which is pretty vague (i.e. it's not clear
> which cases are safe and which are not).
>
> Could you elaborate on why you think this is well-defined?


Besides the particular footnote (which you already mentioned in
your own followup), there is just the normative text pertaining to
types and storage access. If you read through the two main
sections on types (6.2.5 and 6.2.6), and also the description of
what happens on lvalue-to-value conversion, I think it's pretty
easy to see that the definition is there (although I freely admit
it isn't expressed as directly as one might like). Basically, the
same passages that explain how ordinary (ie, non-union-member)
access works also explain how access to union members works; the
only thing that's missing is knowing that the respective memories
overlap, which is stated in 6.2.5. There is another detail having
to do with effective type rules, but that doesn't contribute to
defining the semantics; it just needs to be checked to make sure
the effective type rules don't _un_define the semantics (and they
don't, but if you're interested look at 6.5 p6&7).
 
Tim Rentsch
 
      06-29-2012
Jens Gustedt <(E-Mail Removed)> writes:

> On 28.06.2012 09:27, Johannes Bauer wrote:
>> On 28.06.2012 09:23, Johannes Bauer wrote:
>>> On 27.06.2012 19:37, Tim Rentsch wrote:
>>>> The other example you give,
>>>> assigning to one member of a union and reading from another, is
>>>> actually defined behavior, _not_ undefined behavior.
>>>
>>> I also was under the impression that writing to member x and reading
>>> member y of a union is UB. Wikipedia says "This is not, however, a safe
>>> use of unions in general.", which is pretty vague (i.e. it's not clear
>>> which cases are safe and which are not).
>>>
>>> Could you elaborate on why you think this is well-defined?

>>
>> Ah, I just read James' response further down. Interesting. Really
>> thought this was undefined. Is this a recent change?

>
> I think this was not considered as a change in contents but given as
> more precision on the intent. n1256.pdf has modification marks in this
> region so I suppose that these came with TC3. [snip]


Yes, if you read the Defect Report that prompted the change I
think you'll find that the intention was that the behavior
required was supposed to be the same all along (ie, since C90 and
presumably also before that), but changes in wording in other
places raised a concern that this (unchanged) requirement was not
evident enough without the footnote.
 
 
Joshua Maurice
 
      07-07-2012
On Jun 27, 5:44 pm, Tim Rentsch <(E-Mail Removed)> wrote:
> "christian.bau" <(E-Mail Removed)> writes:
> > On Jun 27, 7:45 pm, James Kuyper <(E-Mail Removed)> wrote:

>
> >> A footnote in the current version of the standard says that the result
> >> of reading from a different member of a union than the one last written
> >> is that the bit pattern stored in that memory is reinterpreted according
> >> to the type of the member being read; it would therefore have defined
> >> behavior, so long as that bit pattern is a valid one for that type.
> >> Some have claimed that this conclusion can be derived from the normative
> >> text of the standard, but I find the argument supporting that claim
> >> weak. There's certainly no normative text that says so directly.
> >> However, that is how unions were always intended to work, whether or not
> >> the normative text of the standard has ever actually said so.

>
> > I had to check that, and you are right (footnote 95 in the N1570
> > draft). I think there is a problem. Say long and float have the same
> > size, I have a union containing a long and a float, I write to the
> > long and read the float, then I am supposed to get a float with
> > exactly those bits that I stored. That's perfectly fine.

>
> > But what if the compiler doesn't know that both are elements of the
> > same union? If I just have a long*, and a float*, which _might_ point
> > to members of the same union, but the compiler doesn't know. Does the
> > rule apply then as well? That would completely destroy what is said in
> > other places.

>
> This case is different, because it is addressed by different
> portions of the effective type rules. In particular, using
> the '.' or '->' form of access, the lvalue being accessed
> has a declared type, and so those accesses never violate effective
> type rules. When access is done using pointers, the rule for
> determining effective type is different, so the two accesses
> may very well run afoul of the effective type requirements.
>
> > I'd prefer if this was said in the standard explicitly, but with the
> > restriction that the value must be written, then read, using the . or
> > -> operators.

>
> Unfortunately the Standard often expresses itself rather obliquely,
> and this case certainly falls into that category. However, it should
> be easy to see that the two different cases you bring up are covered
> under different areas of the effective type rules. See 6.5 p6.
> Note especially the first sentence, which applies in the case of
> member access (ie, through '.' or '->'), but which does not apply
> in the case of pointer access.


Sorry. Some silly questions if I may, please? Consider the following
programs:

int main(void)
{
    union { int x; float y; } u;
    u.y = 2;
    u.x = 1;
    return u.x;
}
/* ---- */
int main(void)
{
    union { int x; float y; } u;
    float * y = &u.y;
    *y = 2;
    int * x = &u.x;
    *x = 1;
    return u.x;
}
/* ---- */
int main(void)
{
    union { int x; float y; } u;
    float * y = &u.y;
    int * x = &u.x;
    *y = 2;
    *x = 1;
    return u.x;
}
/* ---- */
void foo(int * x, float * y)
{
    *y = 2;
    *x = 1;
}
int main(void)
{
    union { int x; float y; } u;
    float * y = &u.y;
    int * x = &u.x;
    foo(x, y);
    return u.x;
}
/* ---- */
The last program above, except with foo in a different translation
unit.

Where exactly do you think we cross from defined behavior to undefined
behavior? I would argue that the first example is clearly not UB, and
the last example with foo() in a different translation unit is
probably UB. Specifically, the intent of the effective type rules is
to allow the compiler to do additional aliasing analysis and reorder
reads and writes that are sufficiently differently typed. With foo()
in a different translation unit, we want the compiler to be able to
reorder the writes to x and y in foo() from type aliasing analysis,
but if we do that then we'll change the semantics of the last program
and have it return garbage.

I don't have a strong opinion on this one. It seems that the intent of
the type access rules and the existence of unions is an inherent
contradiction - with several plausible ways out, of course.
 
 
Barry Schwarz
 
      07-08-2012
On Sat, 7 Jul 2012 16:46:42 -0700 (PDT), Joshua Maurice
<(E-Mail Removed)> wrote:

>On Jun 27, 5:44 pm, Tim Rentsch <(E-Mail Removed)> wrote:
>> "christian.bau" <(E-Mail Removed)> writes:
>> > On Jun 27, 7:45 pm, James Kuyper <(E-Mail Removed)> wrote:

>>
>> >> A footnote in the current version of the standard says that the result
>> >> of reading from a different member of a union than the one last written
>> >> is that the bit pattern stored in that memory is reinterpreted according
>> >> to the type of the member being read; it would therefore have defined
>> >> behavior, so long as that bit pattern is a valid one for that type.


<snip>

>Sorry. Some silly questions if I may, please? Consider the following
>programs:
>
> int main(void)
> {
> union { int x; float y; } u;
> u.y = 2;
> u.x = 1;
> return u.x;
> }


<snip three similar examples>

>Where exactly do you think we cross from defined behavior to undefined
>behavior? I would argue that the first example is clearly not UB, and


None of your examples perform the sequence of operations under
discussion. In every case, you store a value in one member of the
union, store a value in a different member of the union, and then
access the member which was last stored. Accessing the last stored
member never yields undefined behavior.

--
Remove del for email
 
 
Tim Rentsch
 
      07-08-2012
Joshua Maurice <(E-Mail Removed)> writes:

> On Jun 27, 5:44 pm, Tim Rentsch <(E-Mail Removed)> wrote:
>> "christian.bau" <(E-Mail Removed)> writes:
>> > On Jun 27, 7:45 pm, James Kuyper <(E-Mail Removed)> wrote:

>>
>> >> A footnote in the current version of the standard says that the result
>> >> of reading from a different member of a union than the one last written
>> >> is that the bit pattern stored in that memory is reinterpreted according
>> >> to the type of the member being read; it would therefore have defined
>> >> behavior, so long as that bit pattern is a valid one for that type.
>> >> Some have claimed that this conclusion can be derived from the normative
>> >> text of the standard, but I find the argument supporting that claim
>> >> weak. There's certainly no normative text that says so directly.
>> >> However, that is how unions were always intended to work, whether or not
>> >> the normative text of the standard has ever actually said so.

>>
>> > I had to check that, and you are right (footnote 95 in the N1570
>> > draft). I think there is a problem. Say long and float have the same
>> > size, I have a union containing a long and a float, I write to the
>> > long and read the float, then I am supposed to get a float with
>> > exactly those bits that I stored. That's perfectly fine.

>>
>> > But what if the compiler doesn't know that both are elements of the
>> > same union? If I just have a long*, and a float*, which _might_ point
>> > to members of the same union, but the compiler doesn't know. Does the
>> > rule apply then as well? That would completely destroy what is said in
>> > other places.

>>
>> This case is different, because it is addressed by different
>> portions of the effective type rules. In particular, using
>> the '.' or '->' form of access, the lvalue being accessed
>> has a declared type, and so those accesses never violate effective
>> type rules. When access is done using pointers, the rule for
>> determining effective type is different, so the two accesses
>> may very well run afoul of the effective type requirements.
>>
>> > I'd prefer if this was said in the standard explicitly, but with the
>> > restriction that the value must be written, then read, using the . or
>> > -> operators.

>>
>> Unfortunately the Standard often expresses itself rather obliquely,
>> and this case certainly falls into that category. However, it should
>> be easy to see that the two different cases you bring up are covered
>> under different areas of the effective type rules. See 6.5 p6.
>> Note especially the first sentence, which applies in the case of
>> member access (ie, through '.' or '->'), but which does not apply
>> in the case of pointer access.

>
> Sorry. Some silly questions if I may, please? Consider the following
> programs:
>
> int main(void)
> {
> union { int x; float y; } u;
> u.y = 2;
> u.x = 1;
> return u.x;
> }
> /* ---- */
> int main(void)
> {
> union { int x; float y; } u;
> float * y = &u.y;
> *y = 2;
> int * x = &u.x;
> *x = 1;
> return u.x;
> }
> /* ---- */
> int main(void)
> {
> union { int x; float y; } u;
> float * y = &u.y;
> int * x = &u.x;
> *y = 2;
> *x = 1;
> return u.x;
> }
> /* ---- */
> void foo(int * x, float * y)
> {
> *y = 2;
> *x = 1;
> }
> int main(void)
> {
> union { int x; float y; } u;
> float * y = &u.y;
> int * x = &u.x;
> foo(x, y);
> return u.x;
> }
> /* ---- */
> The last program above, except with foo in a different translation
> unit.
>
> Where exactly do you think we cross from defined behavior to undefined
> behavior? I would argue that the first example is clearly not UB, and
> the last example with foo() in a different translation unit is
> probably UB. Specifically, the intent of the effective type rules is
> to allow the compiler to do additional aliasing analysis and reorder
> reads and writes that are sufficiently differently typed. With foo()
> in a different translation unit, we want the compiler to be able to
> reorder the writes to x and y in foo() from type aliasing analysis,
> but if we do that then we'll change the semantics of the last program
> and have it return garbage.
>
> I don't have a strong opinion on this one. It seems that the intent of
> the type access rules and the existence of unions is an inherent
> contradiction - with several plausible ways out, of course.


I'm sorry, I didn't see any silly questions. Is it okay if I
just answer what you asked? (See, there's an example of a silly
question.)

If we take the effective type rules at face value, I don't think
any of these are undefined behavior. In each case the stores that
are done are consistent with the declared type of the member whose
object is being stored into. Going through the different sequences
(and I admit I haven't checked them as carefully as I might have)
and referring to the effective type rules in each case, I don't see
any violations. That includes the last case where the foo()
function is defined in a different TU, although AFAIK that doesn't
change whether effective type rules are violated.

Of course, this is upsetting, because intuitively we expect that
when it looks like reordering might muck things up then either the
reordering isn't allowed (presumably due to effective type rules
considerations) or the program has crossed over into undefined
behavior (probably because effective type rules have been
violated). None of the obvious alternatives seems appealing, eg,
"no reordering can be done in cases like this" (ick), or "stores
through the x and y pointers can be reordered, and the later access
of u.x just gets one or the other -- ie, unspecified behavior, but
not undefined behavior" (at odds with other parts of the Standard),
or "even though these cases follow the letter of the law, effective
type wise, they violate its spirit, and therefore are undefined
behavior" (lacks evidence to be convincing). Of course, any
sensible developer would instinctively shy away from writing such
code, but that doesn't resolve the question.

I have two principal takeaways to offer.

First, how the effective type rules are phrased is somewhat broken,
or at least incomplete. If these examples are defined behavior,
that has serious negative consequences for code reordering. If
they are supposed to have undefined behavior, the effective type
rules don't express that adequately. Neither of those consequences
is acceptable, I would say, and in either case the Standard needs
to clarify what is meant.

Second, as a practical matter, this kind of pattern (taking
addresses of several members of the same union object, storing
through the resultant pointers, then using . or -> to get the value
of one of those members) is likely to be unspecified behavior as
far as which store occurred last. That behavior is what I think
most seasoned developers would expect, how most actual compilers
will generate code, and (I opine) what the Standard would prescribe
if a suitable way of expressing that presented itself. My feeling
is that cases like this one _should_ be unspecified behavior, and not
undefined behavior, but I also know that finding suitable language
to delimit the boundaries -- clearly, correctly, and exactly --
is not at all an easy task.
 
 
Tim Rentsch
 
      07-08-2012
Barry Schwarz <(E-Mail Removed)> writes:

> On Sat, 7 Jul 2012 16:46:42 -0700 (PDT), Joshua Maurice
> <(E-Mail Removed)> wrote:
>
>>On Jun 27, 5:44 pm, Tim Rentsch <(E-Mail Removed)> wrote:
>>> "christian.bau" <(E-Mail Removed)> writes:
>>> > On Jun 27, 7:45 pm, James Kuyper <(E-Mail Removed)> wrote:
>>>
>>> >> A footnote in the current version of the standard says that the result
>>> >> of reading from a different member of a union than the one last written
>>> >> is that the bit pattern stored in that memory is reinterpreted according
>>> >> to the type of the member being read; it would therefore have defined
>>> >> behavior, so long as that bit pattern is a valid one for that type.

>
> <snip>
>
>>Sorry. Some silly questions if I may, please? Consider the following
>>programs:
>>
>> int main(void)
>> {
>> union { int x; float y; } u;
>> u.y = 2;
>> u.x = 1;
>> return u.x;
>> }

>
> <snip three similar examples>
>
>>Where exactly do you think we cross from defined behavior to undefined
>>behavior? I would argue that the first example is clearly not UB, and

>
> None of your examples perform the sequence of operations under
> discussion. In every case, you store a value in one member of the
> union, store a value in a different member of the union, and then
> access the member which was last stored. Accessing the last stored
> member never yields undefined behavior.


Only the first example (ie, the only one not snipped) stores into
members. The other examples store into objects that happen to
coincide with memory areas corresponding to members of u, but
that's not the same as storing into members. If nothing else,
which parts of the effective type rules govern the accesses
are different in the two cases.
 
 
Barry Schwarz
 
      07-08-2012
On Sun, 08 Jul 2012 11:13:53 -0700, Tim Rentsch
<(E-Mail Removed)> wrote:

>Barry Schwarz <(E-Mail Removed)> writes:

snip

>> None of your examples perform the sequence of operations under
>> discussion. In every case, you store a value in one member of the
>> union, store a value in a different member of the union, and then
>> access the member which was last stored. Accessing the last stored
>> member never yields undefined behavior.

>
>Only the first example (ie, the only one not snipped) stores into
>members. The other examples store into objects that happen to
>coincide with memory areas corresponding to members of u, but
>that's not the same as storing into members. If nothing else,
>which parts of the effective type rules govern the accesses
>are different in the two cases.


Do I understand correctly that storing into a member and storing into
the memory occupied by that member are somehow different?

--
Remove del for email
 
 
Joshua Maurice
 
      07-09-2012
On Jul 8, 3:24 pm, Barry Schwarz <(E-Mail Removed)> wrote:
> On Sun, 08 Jul 2012 11:13:53 -0700, Tim Rentsch
> >Only the first example (ie, the only one not snipped) stores into
> >members. The other examples store into objects that happen to
> >coincide with memory areas corresponding to members of u, but
> >that's not the same as storing into members. If nothing else,
> >which parts of the effective type rules govern the accesses
> >are different in the two cases.

>
> Do I understand correctly that storing into a member and storing into
> the memory occupied by that member are somehow different?


I would hope not! (But maybe.) I agree that the current rules are
unclear.

I think/hope that:
struct foo { int x; };
int main(void)
{
    struct foo f;
    f.x = 1;
}
is definitionally equivalent to:
struct foo { int x; };
int main(void)
{
    struct foo f;
    int * y;
    y = &f.x;
    *y = 1;
}
Any decision that makes "f.x = 1;" somehow different than "y = &f.x;
*y = 1;" is my least preferred alternative.

I'd much rather have rules that require the compiler to limit its type
aliasing optimizations when unions are in scope. Basically, a rule in
the standard somewhere which says something like the following. Please
note that I just whipped this up, and I have no clue if it's actually
"correct". It could very probably/definitely be fixed, improved, etc.
I'm just trying to get the ball rolling. There's closely related
alternative formulations that would also be appealing to me.

Quickie Definition: The "lifetime" of a pointer value is the
contiguous interval of time of the program execution, starting when
the pointer value is "created", and ending when the last "copy" or
"derivation" of the pointer value ceases to exist in an object.
Example:
#include <stdlib.h>
int main(void)
{
    {
        int a[2];
        int * x;
        int * y;
        {
            x = a; /* this statement "creates" a pointer value */
        }
        /* the pointer object "x" exists, and it contains the pointer
           value, so it's still "alive" */
        y = x + 1;
        x = 0;
        /* the pointer value is still "alive" because a "derivation" of
           it exists in the pointer object "y" */
    }
    /* the pointer value is now "dead", and the pointer value lifetime
       has ended */
}

New Rule: For two accesses to two sufficiently differently typed
members of a union, if:
- the accesses are a write and a read, or two writes, to the union
member objects or sub-objects thereof, and
- the pointer value lifetimes of the pointer values used to do the
accesses overlap, and
- both accesses are done in scopes where the union definition is not
visible, then
- the program has undefined behavior.

This approach, as formulated, disallows all aliasing optimization with the
types in a union when the union definition is in scope. Perhaps there
are "nicer" ways to do this without such a substantial penalty.
 
 
Tim Rentsch
 
      07-10-2012
Barry Schwarz <(E-Mail Removed)> writes:

> On Sun, 08 Jul 2012 11:13:53 -0700, Tim Rentsch
> <(E-Mail Removed)> wrote:
>
>>Barry Schwarz <(E-Mail Removed)> writes:

> snip
>
>>> None of your examples perform the sequence of operations under
>>> discussion. In every case, you store a value in one member of the
>>> union, store a value in a different member of the union, and then
>>> access the member which was last stored. Accessing the last stored
>>> member never yields undefined behavior.

>>
>>Only the first example (ie, the only one not snipped) stores into
>>members. The other examples store into objects that happen to
>>coincide with memory areas corresponding to members of u, but
>>that's not the same as storing into members. If nothing else,
>>which parts of the effective type rules govern the accesses
>>are different in the two cases.

>
> Do I understand correctly that storing into a member and storing into
> the memory occupied by that member are somehow different?


They are, if for no other reason than because effective type
rules are different for the two cases. Let's look at the
pointer case first:

int
f( int *pi, float *pf ){
    *pi = 1;
    *pf = 2;
    return *pi;
}

If pi and pf point to the same place -- for example, to two
members of the same union object -- this function violates
effective type rules, and therefore transgresses into
undefined behavior. So a call like

union { int i; float f; } u;
...
f( &u.i, &u.f );

would provoke undefined behavior. Now consider a similar
function that accesses the union object 'u' directly, eg,

int
g(){
    u.i = 1;
    u.f = 2;
    return u.i;
}

The function g does not violate effective type rules. Its
behavior is defined, subject to the implementation-defined
representations of the two types involved. That is, it
should obey all the regular access rules, and there are no
'shall' stipulations that it violates (at least, I'm not
aware of any, and I've looked fairly long and hard at
questions like this), and that is enough to define the
behavior (again, subject to how the types are represented).

It makes sense that these two cases would be different. If
they weren't, then everywhere there were pointers to two
different types, those pointers might potentially point to
members of the same union object, which would greatly inhibit
potential code movement. Also, the "special guarantee" of
6.5.2.3 p6 would not be needed, because the possibility of
the two struct types belonging to the same union would (under
the assumption that access through pointers to members and direct
member access are the same) be enough to guarantee correct
behavior. If that were so, there would be no reason to have
the guarantee of 6.5.2.3 p6.
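
For readers who don't have 6.5.2.3 p6 to hand, here is a minimal sketch of what that "special guarantee" permits. The type and function names (text_msg, number_msg, msg, msg_kind, make_number) are made up for illustration and do not come from the thread or the Standard. Two structs that share a common initial sequence are placed in one union, and the shared part is inspected through either struct, at a point where the complete union type is visible.

```c
/* Hypothetical message types sharing a common initial sequence (int kind). */
struct text_msg   { int kind; char   text[16]; };
struct number_msg { int kind; double value;    };

union msg {
    struct text_msg   t;
    struct number_msg n;
};

/* The union declaration is visible here and 'kind' is read by member
   access, so the common initial part may be inspected through m->t
   even when m->n was the member stored last. */
static int msg_kind(const union msg *m)
{
    return m->t.kind;
}

/* Build a message by storing into the number_msg member only. */
static union msg make_number(double v)
{
    union msg m;
    m.n.kind = 2;
    m.n.value = v;
    return m;
}
```

With these definitions, msg_kind applied to a value built by make_number reads the kind that was stored through the other struct. A function that only received a struct text_msg * and a struct number_msg *, with no union in sight, would not get that guarantee, which is the distinction being drawn above.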

In footnote 95 (footnote 82 in N1256), the Standard says in
plain English what happens when one member is read when
another has been stored into. But notice the way it says
that:

If the member used to read the contents of a union object
is not the same as the member last used to store a value
in the object, ...

Note: 'the member /used/ to read', and 'the member last /used/ to
store' (my emphasis). The explanation in the footnote applies only
to member access that is done directly, ie, using '.' or '->', and
not just dereferencing a pointer that happens to point to the
member in question. And that distinction is consistent with the
differences in how effective type rules treat the two situations.

Does this help explain my earlier statement?
 