Re: Aliasing in C99

 
 
Tim Rentsch
 
      05-31-2012
David Brown <(E-Mail Removed)> writes:

> I am trying to figure out how to get aliasing to work correctly according
> to the C99 rules. For example, converting between a float and its binary
> representation.
>
> float negPCast(float x) {
>     uint32_t u = *((uint32_t *) &x);
>     u ^= 0x80000000u;
>     return *((float *) &u);
> }
>
> In the absence of type-based aliasing, this will negate a float using
> just a simple xor operation (ignore any issues with endianness, int
> sizes, NaNs, etc., since this is just an example).
>
>
> The pointer typecasting here will break strict aliasing rules, and is
> therefore not valid C99. (I'm guessing that in this case, most compilers
> will generate code that works as desired - but I'm looking for strictly
> conforming methods.)


Point of terminology: the appropriate term is "effective type"
rules, ie, this terminology is what the C Standard uses. The
term "strict aliasing" is a gcc-ism; the rules for "strict
aliasing" are similar to, but not exactly the same as, effective
type rules as the C Standard defines them.


> It is possible to re-implement it using type-punning unions:
>
> float negUnion(float x) {
>     union { float f; uint32_t u; } uf;
>     uf.f = x;
>     uf.u ^= 0x80000000;
>     return uf.f;
> }
>
> This doesn't use pointer typecasting, but I believe type-punning unions
> are undefined in C but implemented "properly" in most compilers.


Defined, not undefined. The specific behavior depends on the
representations of the types in question, but presumably these
representations are suitable on the systems you want to run on.
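
To make the proviso about representations concrete, here is a
minimal sketch of the same union approach with the width
assumption checked. It assumes float is IEEE 754 binary32 with
the sign in the top bit; the assert is an addition for
illustration and covers only the width, not the bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* Negate a float by flipping its sign bit through a union.
   Assumption: float is 32 bits wide with the sign in the top bit
   (IEEE 754 binary32). The assert checks only the width. */
float negUnion(float x)
{
    union { float f; uint32_t u; } uf;

    assert(sizeof uf.f == sizeof uf.u);
    uf.f = x;              /* store the value as a float */
    uf.u ^= 0x80000000u;   /* read the same bytes as uint32_t, flip bit 31 */
    return uf.f;           /* read the result back as a float */
}
```

On a system where the width differs, the assert fires rather
than letting the function return a meaningless value.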


> It is also possible to use pointers to unions in casts:
>
> float negUnionPCast(float x) {
>     typedef union { float f; uint32_t u; } UF;
>     uint32_t u = ((UF*) &x)->u;
>     u ^= 0x80000000u;
>     return ((UF*) &u)->f;
> }
>
> I /think/ pointer casts like this are not subject to strict aliasing
> rules, but I don't know if the union usage is valid.


This approach gives undefined behavior, for several different
reasons, as others have explained. There's a good chance it will
work, but certainly that's not guaranteed under the Standard.


> Does anyone know of other ways that are strictly valid and defined in
> C99, and that also are efficient in use (I'd like to avoid things like
> casting back and forth between char or char pointers, or volatile
> accesses, etc.)?


The union method is well-defined (modulo the proviso about type
representations) and should work just fine. Since you are using
C99, this approach can be coded more directly using compound
literals, viz., (I am using 'unsigned' rather than 'uint32_t'
but they are the same on my system):

float
negate_float( float x ){
    typedef union { float f; unsigned u; } UF;
    return (UF){ .u = (UF){ x }.u ^ 0x80000000 }.f;
}

Compiling this function with gcc (using -O2 or -O3), the
generated code looks pretty good, about what I'd expect and also
as good as I think you would hope for. If an 'inline' qualifier
is added to the function definition, then generated code for a
call is just three instructions (this is on an x86), ie, load,
xor, store, and no floating point instructions.
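
For reference, a sketch of how the 'inline' variant might be
written, using uint32_t so the width does not depend on the
platform's 'unsigned'. The caller negated_sum and the self-check
function are hypothetical names added only to show a call site;
the generated code will of course vary by compiler and target.

```c
#include <assert.h>
#include <stdint.h>

typedef union { float f; uint32_t u; } UF;

/* Same compound-literal expression as above, with uint32_t in place
   of unsigned and 'static inline' so calls can be expanded in place. */
static inline float negate_float(float x)
{
    return (UF){ .u = (UF){ .f = x }.u ^ 0x80000000u }.f;
}

/* Hypothetical caller, present only to show a call site. */
float negated_sum(float a, float b)
{
    return negate_float(a) + negate_float(b);
}

/* Hypothetical self-check; assumes IEEE 754 binary32 floats. */
void check_negated_sum(void)
{
    assert(negated_sum(1.0f, 2.0f) == -3.0f);
}
```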
 
 
Tim Rentsch
 
      06-01-2012
David Brown <(E-Mail Removed)> writes:

> On Thu, 31 May 2012 13:57:58 -0700, Tim Rentsch wrote:
>
>> David Brown <(E-Mail Removed)> writes:
>> [snip]

>
> OK - this is the main point I've learned with this thread (and the reason
> I asked in the first place). I know how to use such unions, I knew they
> worked in practice - now I also know they work in theory (assuming, as
> you say, the underlying representations are known).


Good deal.


>>> It is also possible to use pointers to unions in casts:
>>>
>>> float negUnionPCast(float x) {
>>>     typedef union { float f; uint32_t u; } UF;
>>>     uint32_t u = ((UF*) &x)->u;
>>>     u ^= 0x80000000u;
>>>     return ((UF*) &u)->f;
>>> }
>>>
>>> I /think/ pointer casts like this are not subject to strict aliasing
>>> rules, but I don't know if the union usage is valid.

>>
>> This approach gives undefined behavior, for several different reasons,
>> as others have explained. There's a good chance it will work, but
>> certainly that's not guaranteed under the Standard.
>>

>
> OK. I'm not entirely confident about /why/ this is not correct according
> to the standard, and it seems to be in conflict with other things I've
> read. But either way, it leads to the same conclusion - it can't be
> relied on to work, and so should not be used.


For the same reason (among others) you were concerned in the
first place, ie, aliasing rules. You have something that is a
float, and in particular a float not in a union, and you access
it through a pointer to a union type! It's possible -- and I'm
not sure about this -- that the Standard does indeed allow this
under effective type rules. However, it's a grey area, and
because of that compilers may take liberties with what kinds of
optimizations they allow in such cases. In almost all cases
accessing something through a pointer converted to a type
not the same as that of the target is best avoided. There
also are alignment and padding byte issues, as others have
mentioned; no reason to start down that path when there
is another one that is easier and safer.
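
As a sketch of that easier path: instead of converting a pointer
to the float, copy the value into a genuine union object and read
the other member. The helper names float_bits and bits_float are
hypothetical, and the code assumes float and uint32_t have the
same width.

```c
#include <assert.h>
#include <stdint.h>

typedef union { float f; uint32_t u; } UF;

/* Copy the value into a real union object, then read the other
   member. No pointer to 'x' is converted, so no standalone float
   is ever accessed through a union lvalue. */
uint32_t float_bits(float x)
{
    const UF tmp = { .f = x };

    assert(sizeof tmp.f == sizeof tmp.u);
    return tmp.u;
}

/* The reverse direction, again through a genuine union object. */
float bits_float(uint32_t bits)
{
    const UF tmp = { .u = bits };

    return tmp.f;
}
```

Because the union object really is a union, alignment and padding
are whatever the compiler chose for that type, and the questions
raised by the pointer cast do not come up.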


>>> Does anyone know of other ways that are strictly valid and defined in
>>> C99, and that also are efficient in use (I'd like to avoid things like
>>> casting back and forth between char or char pointers, or volatile
>>> accesses, etc.)?

>>
>> The union method is well-defined (modulo the proviso about type
>> representations) and should work just fine. Since you are using C99,
>> this approach can be coded more directly using compound literals, viz.,
>> (I am using 'unsigned' rather than 'uint32_t' but they are the same on
>> my system):
>>
>> float
>> negate_float( float x ){
>>     typedef union { float f; unsigned u; } UF;
>>     return (UF){ .u = (UF){ x }.u ^ 0x80000000 }.f;
>> }
>>
>> Compiling this function with gcc (using -O2 or -O3), the generated code
>> looks pretty good, about what I'd expect and also as good as I think you
>> would hope for. If an 'inline' qualifier is added to the function
>> definition, then generated code for a call is just three instructions
>> (this is on an x86), ie, load, xor, store, and no floating point
>> instructions.


[Note that the quoted function body had line-wrapping issues not
present in the original, which I have repaired above.]

> I like the idea, as it is an elegant and efficient solution. However,
> I'm a big fan of clear and explicit code, and my code has to be easily
> understood by others (even those not yet well versed in C99) - I think
> this looks a bit convoluted for common use. But in some circumstances,
> it could be the best solution.


I am also a fan of clear code. I suspect the issue here is
not lack of clarity but lack of familiarity; compound literals
were introduced in C99 and few people use them. So there is
something of a chicken and egg problem. However, if you
aren't comfortable using compound literals, we can still
write a simple function using a direct, functional style
(disclaimer: not compiled):

float
negate_float( float x ){
    typedef union { unsigned u; float f; } UF;
    const UF f = { .f = x }, u = { .u = f.u ^ 0x80000000 };
    return u.f;
}

The type punning change from float to unsigned happens at 'f.u',
and from unsigned to float happens at 'u.f'. Taking a functional
approach allows the two union variables to be 'const'. Personally
I think this functional style is easier to understand than an
imperative one where a single union object is serving two different
purposes.
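
Since the function above carries a "not compiled" disclaimer,
here is a sketch of a small check one could build around it. The
check function and its sample values are hypothetical additions;
the comparison against unary minus assumes 'unsigned' and 'float'
are both 32 bits with the sign in the top bit.

```c
#include <assert.h>

typedef union { unsigned u; float f; } UF;

/* The functional-style version: two const union objects, one for
   each direction of the type punning. */
float negate_float(float x)
{
    const UF f = { .f = x }, u = { .u = f.u ^ 0x80000000u };
    return u.f;
}

/* Hypothetical self-check: compare against the built-in unary minus
   for a few ordinary values. */
void check_negate_float(void)
{
    static const float samples[] = { 1.0f, -2.5f, 1e-3f, 3.0e10f };
    unsigned i;

    assert(sizeof(unsigned) == sizeof(float));
    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(negate_float(samples[i]) == -samples[i]);
}
```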


> With my brief testing (with gcc on amd64), all versions of the functions
> gave the same code, including the movement of data onto the stack
> mentioned elsewhere in the thread. But that's not a concern for me, as
> that is not one of my targets.


Another reason for writing this function using a functional style
rather than an updating assignment is that it's often easier for a
compiler to optimize such code, mapping as it does very
straightforwardly onto a single-assignment canonical form. Gcc
is pretty clever at optimizing, but for another compiler of
unknown abilities I think there is a better chance of it
optimizing nicely if this kind of functional approach is taken.
 
 
Tim Rentsch
 
      06-01-2012
David Brown <(E-Mail Removed)> writes:

> On 01/06/2012 03:15, Tim Rentsch wrote:
>> David Brown<(E-Mail Removed)> writes:
>>
>>> On Thu, 31 May 2012 13:57:58 -0700, Tim Rentsch wrote:
>>>
>>>> David Brown<(E-Mail Removed)> writes:

[several snips done in the following, for compactness]

>>> [on the matter of using a casted pointer]

>>
>> For the same reason (among others) you were concerned in the
>> first place, ie, aliasing rules. You have something that is a
>> float, and in particular a float not in a union, and you access
>> it through a pointer to a union type!


I should have been more specific about my reaction. The wording
of the effective type rules is rather clumsy. I think it's hard
to make a convincing argument either way, based just on that
wording and nothing else. Despite that, I think it's reasonable
to make an educated guess as to the intention behind what was
actually written, and that would go like this: on the one hand
we have a simple variable, ie, not in a union (or struct), and on
the other hand we have an access to a member of a union (struct
member access would be equivalent); we know that the standalone
variable cannot possibly be in a struct or union, whereas member
access _must_ refer to an actual struct or union object, and
therefore the two objects in question must be distinct, ie, no
aliasing can definedly occur between them.

To repeat myself, I wouldn't call this an ironclad argument,
reasoning as it does based somewhat on speculating about the
underlying intention. However, I think the basic reasoning
is convincing enough so that some implementations might take
the same view, and that's why I think it's a grey area, and
consequently best avoided.

> Casts to unions are discussed in the gcc documentation as a gcc
> extension, and are safe to use (carefully) with gcc - but while I
> often use various gcc ports, I also use other compilers, so gcc
> extensions are not a solution.


Even if casts to unions were universally available, casting
a pointer to a non-compound type (ie, not a struct or union)
to a pointer to a compound type is a totally different beast,
because casting to a union (or struct) operates on values,
whereas pointer casting implicitly operates on object
representations and may have resulting aliasing issues.
Casting that works on values can never, in and of itself,
have even potential aliasing issues; casting that works
on pointers always can.


>> However, if you
>> aren't comfortable using compound literals, we can still
>> write a simple function using a direct, functional style
>> (disclaimer: not compiled):
>>
>> float
>> negate_float( float x ){
>>     typedef union { unsigned u; float f; } UF;
>>     const UF f = { .f = x }, u = { .u = f.u ^ 0x80000000 };
>>     return u.f;
>> }
>>

>
> I'd split the line in two:
> const UF f = (UF) { .f = x };
> const UF u = (UF) { .u = f.u ^ 0x80000000 };


I'm fine with either one line or two. Any developer worth his
salt should be able to read either form with no difficulty, and
I think it's wrong to be overly dogmatic about a "one declaration
per line" rule. That said, the specific case here is (for me at
least) below the threshold of arguing one way or another.

Incidentally, it's usually a good rule in newsgroup postings
to use multiple spaces rather than tabs for indentation (and
for that matter all other uses too).


> Then I think it is clear even to people unfamiliar with
> compound literals.


Note that the function body I wrote did not use compound
literals, but just regular initialization. Your two lines
above could (and perhaps should) have been written thusly:

const UF f = { .f = x };
const UF u = { .u = f.u ^ 0x80000000 };

I expect most developers would prefer this writing to the
earlier alternative.

If compound literals are okay for your audience, it seems
natural to avoid the temporary variable 'f', which is used
in only one place; that would allow one declaration instead
of two:

const UF u = { .u = 0x80000000 ^ (UF){ .f = x }.u };

But then I think you know where this is going.
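
For comparison, a sketch placing the two spellings side by side
as complete functions. The function names and the check are
hypothetical; as before the code assumes 'unsigned' and 'float'
are both 32 bits with the sign in the top bit.

```c
#include <assert.h>

typedef union { unsigned u; float f; } UF;

/* Two declarations, plain designated initializers only. */
float negate_two_decls(float x)
{
    const UF f = { .f = x };
    const UF u = { .u = f.u ^ 0x80000000u };
    return u.f;
}

/* One declaration; the temporary 'f' becomes a compound literal. */
float negate_one_decl(float x)
{
    const UF u = { .u = 0x80000000u ^ (UF){ .f = x }.u };
    return u.f;
}

/* Hypothetical check that both spellings give the same result. */
void check_variants_agree(void)
{
    const float v = 6.75f;

    assert(negate_two_decls(v) == negate_one_decl(v));
    assert(negate_one_decl(v) == -6.75f);
}
```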


> Thanks for your suggestions and comments.


You're welcome, it's good to know they have been helpful.
 