Velocity Reviews - Computer Hardware Reviews


A basic (?) problem with addresses (gcc)

 
 
Joshua Maurice
12-17-2010
On Dec 17, 7:54 am, Nick Bowler <(E-Mail Removed)> wrote:
> On Thu, 16 Dec 2010 20:27:56 -0800, Joshua Maurice wrote:
> > suppose the malloc implementation ran such that a points to the same
> > memory address as b? How is this not a violation of the strict
> > aliasing rules? When does the lifetime of one object begin and the
> > lifetime of one object end? Is malloc and free special in this regard?
> > What if I used my own memory allocator, such as follows?

>
> The key difference is that allocated storage has no declared type. In
> this case, the following applies (C99 6.5#6, emphasis mine):
>
>   If a value is stored into an object having no declared type
>   through an lvalue having a type that is not a character type,
>   then the type of the lvalue becomes the effective type of the
>   object **for that access** and for subsequent accesses that do
>   not modify the stored value.
>
> So both your first and second examples are perfectly fine w.r.t. strict
> aliasing rules, annotations inline:
>
> >   /* This is a very bad memory allocator. It tracks only a single block
> >      of a fixed size. It's not threadsafe.
> >      It requires an explicit init and deinit. It's global.
> >      Let's ignore all that for now. */
> >   void* my_malloc_ptr;
> >   int my_malloc_ptr_is_allocated_to_user;
> >   void my_malloc_init()
> >   {
> >     assert( ! my_malloc_ptr);
> >     my_malloc_ptr = malloc(1024);

>
> Here, the object pointed to by my_malloc_ptr (assuming that
> allocation was successful) has no declared type, so its effective
> type is determined by subsequent accesses as per 6.5#6.
>
>
>
> >     my_malloc_ptr_is_allocated_to_user = 0;
> >   }

> [...]
> >   int main()
> >   {
> >     my_malloc_init();
> >     int* a = my_malloc();
> >     *a = 1;

>
> For this assignment, the effective type of the allocated object is int,
> so the access is ok.
>
> >     printf("%d\n", *a);

>
> The last assignment to the allocated object set its effective type to
> int, so this access is also ok.
>
> >     my_free(a);
> >     float* b = my_malloc();
> >     *b = 2;

>
> Despite the earlier access as int, the effective type of the allocated
> object is float for this assignment, so this access is ok.
>
> >     printf("%f\n", *b);

>
> The last assignment to the allocated object set its effective type to
> float, so this access is also ok.
>
> >     my_free(b);
> >     my_malloc_deinit();
> >   }

>
> You would violate the strict aliasing rules if your program assigned to
> the allocated object via b and then subsequently read from the allocated
> object via a (or vice versa), but your program does not do this.


That seems like the sensible interpretation. However, you didn't address the
interesting followup to this interpretation. Let me post it again:

/* Program 1 */
#include <stdlib.h>
int main()
{
    void* p;
    int* x;
    float* y;
    p = malloc(sizeof(int) + sizeof(float));
    x = p;
    y = p;
    *x = 1;
    *y = 2;
    return *y;
}

/* Program 2 */
#include <stdlib.h>
void foo(int* x, float* y)
{
    *x = 1;
    *y = 2;
}
int main()
{
    void* p;
    int* x;
    float* y;
    p = malloc(sizeof(int) + sizeof(float));
    x = p;
    y = p;
    foo(x, y);
    return *y;
}

This problem is otherwise known as the union DR. Is the compiler
allowed to change foo to:

void foo(int* x, float* y)
{
    *y = 2;
    *x = 1;
}

This happens in gcc all the time, for more complex code. gcc
frequently assumes that int* and float* do not alias - unless there is
something in scope that could make them alias. It uses some mostly (?)
undocumented and entirely non-standardized rules to determine when
sufficiently differently typed pointers may alias and when they may not.

As written, both of my examples follow your rule, the rule that
there shall be no read on a piece of storage through type T which
reads the result of a write of a sufficiently different type U.
However, if the compiler is able to assume that sufficiently
differently typed pointers don't alias, then it can take a conforming
program and break it - specifically, the program above, by reordering
the stores in foo.
 
Joshua Maurice
12-17-2010
On Dec 17, 5:06 am, pete <(E-Mail Removed)> wrote:
> Joshua Maurice wrote:
>
> > On Dec 16, 9:26 pm, pete <(E-Mail Removed)> wrote:
> > > Joshua Maurice wrote:
> > > > When does the lifetime of one object begin and
> > > > the lifetime of one object end? Is malloc and free special in this
> > > > regard?

>
> > > ISO/IEC 9899:1999 (E)

>
> > > 6.2.4 Storage durations of objects
> > > 1 An object has a storage duration that determines its lifetime.
> > > There are three storage durations: static, automatic, and
> > > allocated. Allocated storage is described in 7.20.3.

>
> > > 7.20.3 Memory management functions

>
> > > The lifetime of an allocated object extends
> > > from the allocation until the deallocation.

>
> > So, malloc and free are special, and a user cannot write his own
> > memory allocator in purely conforming C code?

>
> Yes.
>
> K&R2 8.7 describes how to make a storage allocator,
> but it's for UNIX systems.
>
> C89 was before they realised that "allocated"
> was a kind of duration.
>
> ISO/IEC 9899: 1990
> 6.1.2.4 Storage durations of objects
>     An object has a storage duration that determines its lifetime.
>     There are two storage durations: static and automatic.


This is an entirely consistent approach, I agree. You treat malloc and
free as special, despite the fact that common implementations IIRC do
nothing special with malloc and free - they exist purely as userland
libraries with little to no special compiler support. It's consistent,
but wholly unsatisfactory, and possibly (?) inconsistent with
practice at large.

I assume that plenty of people have written their own memory
allocators, and they expect that they should work. At least, I guess.
Am I right? Do most C programmers agree that they can't write a
conforming memory allocator in pure C code on top of malloc? I'm not
very knowledgeable in this area, but it just seems so contrary to the
practice which I've seen.
 
Nick Bowler
12-17-2010
On Fri, 17 Dec 2010 21:55:28 +0000, Jens Thoms Toerring wrote:
> The other aspect is the question what kind of casts are required to
> "work" according to the standard. And what I found (in 3.3.4 in the C89
> standard) is the following:
>
> A pointer to an object or incomplete type may be converted to a
> pointer to a different object type or a different incomplete type.
> The resulting pointer might not be valid if it is improperly aligned
> for the type pointed to. It is guaranteed, however, that a pointer
> to an object of a given alignment may be converted to a pointer to an
> object of the same alignment or a less strict alignment and back
> again; the result shall compare equal to the original pointer.
>
> So a cast from a pointer to an object of type T to a pointer to an
> object of a different type U might result in an invalid pointer under
> the stated conditions, i.e. if the alignment requirements of U are more
> strict than those of T. In that case only the union-trick will do (see
> below).


Note that the effective type rules (i.e., -fstrict-aliasing) do not
prevent you from converting pointers between two types with the same
alignment. The rules only apply when you dereference such a pointer.
So -fstrict-aliasing doesn't violate the above C89 requirements.

> In that sense I would consider the behavior of gcc 4.3.2 with
> '-fstrict-aliasing' as not being standard compatible had the
> documentation not at the same time warned about this fact - the fix
> for the problem is thus not to use '-fstrict-aliasing' under these
> conditions when one wants a fully standard compliant compiler.


The OP's posted code has undefined behaviour in C99 because it violates
the "shall" requirement in 6.5#7. This is a requirement not in C89, but
as I don't have a copy handy, I can't say whether or not gcc's
-fstrict-aliasing option renders the compiler non-conforming to C89.

It would be interesting to know if there is a strictly conforming C89
program that both (a) violates no constraints of C99, and (b) has
undefined behaviour in C99 due to the new rules about effective types.

> Now concerning unions. I don't think that unions were meant specifically
> for that kind of stuff.


Yes and no. C99 adopted new wording which comes just shy of officially
blessing this use: in C89, the results of reading a union member other
than the last stored member were undefined. C99 drops this text and
instead says that assigning to a union member causes the bytes of the
object representation of other members to become unspecified.

Furthermore, unions are explicitly exempt from the aliasing problems
which the OP experienced.

So while no strictly conforming C99 program can use unions in this
fashion, and while the DS9K might ensure that trap representations occur
whenever possible, it's clear that the authors of the C99 standard
intended for this use of unions to be available. It is even explicitly
mentioned in a (non-normative) footnote.

 
BartC
12-18-2010

"Seebs" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> On 2010-12-17, BartC <(E-Mail Removed)> wrote:


>> If it has been written to (initialising y to 0 would have been better),
>> then how did that rogue value get there? What were they trying to optimise?

>
> The general case is this:
>
> int foo(int *a, float *b) {
> float x = *b;
> *a = 3;
> x = *b;
> }
>
> Can the second read from *b be optimized away? In C, yes. In your
> hypothesized language, no.


OK, thanks, this makes it clearer (although it's still not clear why the
assignment to y was optimised out; it looks like the compiler didn't even
attempt to perform one assignment, let alone two, for reasons of its own).

Does that also mean the x=*b assignment *cannot* be optimised out here:

int foo(int *a, int *b) {
int x = *b;
*a = 3;
x = *b;
}

--
Bartc

 
Nick Bowler
12-18-2010
On Fri, 17 Dec 2010 14:11:59 -0800, Joshua Maurice wrote:
> On Dec 17, 7:54 am, Nick Bowler <(E-Mail Removed)> wrote:
>> You would violate the strict aliasing rules if your program assigned to
>> the allocated object via b and then subsequently read from the
>> allocated object via a (or vice versa), but your program does not do
>> this.

>
> That seems like the sensible interpretation. However, you didn't address the
> interesting followup to this interpretation. Let me post it again:


I admit that I replied hastily and only skimmed the last half of your
post. Let's look at the first program:

> /* Program 1 */
> #include <stdlib.h>
> int main()
> {
> void* p;
> int* x;
> float* y;
> p = malloc(sizeof(int) + sizeof(float));
> x = p;
> y = p;


No access to the allocated storage has occurred yet, so all the above is ok
(modulo allocation failures).

> *x = 1;


Effective type is int for the above assignment.

> *y = 2;


That didn't last long; effective type is float for this assignment.

> return *y;


Access of the allocated object with effective type float via lvalue of type
float is OK.

> }


So there are no aliasing problems in the above. Let's look at the other
program:

> /* Program 2 */
> #include <stdlib.h>
> void foo(int* x, float* y)
> {
> *x = 1;
> *y = 2;
> }
> int main()
> {
> void* p;
> int* x;
> float* y;
> p = malloc(sizeof(int) + sizeof(float));
> x = p;
> y = p;
> foo(x, y);
> return *y;
> }


This is exactly the same as program #1; you've simply moved the accesses
into a function. The wording in the standard does not seem to care
about *where* in the program the various accesses to an object occur,
only *when*.

> This problem is otherwise known as the union DR. Is the compiler allowed
> to change foo to:
>
> void foo(int* x, float* y)
> {
> *y = 2;
> *x = 1;
> }


I believe that the above reordering, as written, is not allowed, for
reasons I have already stated. However, if we can change foo ever so
slightly:

int bar(int *x, float *y)
{
*x = 1;
*y = 2;

return *x;
}

Now the implementation _can_ assume that *x and *y do not alias the
same object, because if they did, the second reference to *x would
violate the effective type rules and thus program behaviour is
undefined. So the compiler is well within its rights to replace the
second *x with a constant 1 or even to re-order the stores to *x and *y!

> This happens in gcc all the time, for more complex code. gcc frequently
> assumes that int* and float* do not alias - unless there is something in
> scope that could make them alias. It uses some mostly (?) undocumented
> and entirely non-standardized rules to determine when sufficiently
> differently typed pointers may alias and may not alias.


Be sure that "more complex code" does not involve further references
like my above example. I agree that GCC's documentation is not very
specific in this regard.

> As written, both of my examples follow your rule, the rule that
> there shall be no read on a piece of storage through type T which reads
> the result of a write of a sufficiently different type U. However, if
> the compiler is able to assume that sufficiently differently typed
> pointers don't alias, then it can take a conforming program and break it


Right, I think that _only_ taking the types into account (without also
considering the contexts of the references) is insufficient for a
conforming implementation.

 
Nick Bowler
12-18-2010
On Sat, 18 Dec 2010 00:17:29 +0000, BartC wrote:
> OK, thanks, this makes it clearer (although it's still not clear why the
> assignment to y was optimised out; it looks like the compiler didn't
> even attempt to perform one assignment, let alone two, for reasons of
> its own).


Analyzing why programs do what they do in the presence of undefined
behaviour is generally not productive.

Nevertheless, most likely what the compiler has optimized away is not
the assignments themselves, but rather it has optimized away stores of
some CPU registers to memory that would be required for the subsequent
accesses to make sense. Instead of storing the object representation
of an int to memory and then re-interpreting those bytes as a float, the
store has been optimized away and we just end up with garbage bytes
being re-interpreted as a float.

These are exactly the kind of optimizations that the aliasing rules in
C99 are designed to permit: by allowing the compiler to make assumptions
that certain references cannot alias the same object, the compiler can
emit code which keeps more things in CPU registers for longer, avoiding
(generally very expensive in comparison) memory accesses.
 
Seebs
12-18-2010
On 2010-12-18, BartC <(E-Mail Removed)> wrote:
> Does that also mean the x=*b assignment *cannot* be optimised out here:
>
> int foo(int *a, int *b) {
> int x = *b;
> *a = 3;
> x = *b;
> }


Right!

Which is why "restrict" was useful enough to justify specifying.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / http://www.velocityreviews.com/forums/(E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
 
Joshua Maurice
12-18-2010
On Dec 17, 6:36 pm, Nick Bowler <(E-Mail Removed)> wrote:
> On Fri, 17 Dec 2010 14:11:59 -0800, Joshua Maurice wrote:
> > This problem is otherwise known as the union DR. Is the compiler allowed
> > to change foo to:

>
> >   void foo(int* x, float* y)
> >   {
> >     *y = 2;
> >     *x = 1;
> >   }

>
> I believe that the above reordering, as written, is not allowed, for
> reasons I have already stated. However, if we can change foo ever so
> slightly:
>
>   int bar(int *x, float *y)
>   {
>     *x = 1;
>     *y = 2;
>
>     return *x;
>   }
>
> Now the implementation _can_ assume that *x and *y do not alias the
> same object, because if they did, the second reference to *x would
> violate the effective type rules and thus program behaviour is
> undefined. So the compiler is well within its rights to replace the
> second *x with a constant 1 or even to re-order the stores to *x and *y!


Interesting. I didn't consider this case at all. I was about to ask
what non-trivial optimizations the compiler can actually perform under
your aliasing rules if it can't reorder my foo. I read your reply
again, and I realized that this is a perfect example. *x and
*y may refer to the same memory location, but if they do, then your
bar has undefined behavior even before the compiler attempts any
reordering (quote unquote - I'll gloss over register allocation et
al.). Because bar has undefined behavior before any reordering (quote
unquote) if they alias, the compiler is free to assume that they
don't, and thus it's free to reorder them (quote unquote).

Thank you.

Now, I just have to figure out if this is what most compiler writers,
most of the standard writers, and most competent practitioners
interpret the rules to mean. Your rules are sensible, but I don't know
if that's the actual common interpretation.

For example, I've already had a reply else-thread from pete
<(E-Mail Removed)> which states that:
1- All of the programs we've been discussing have undefined behavior
because we've used a single piece of storage as two or more
sufficiently different object types.
2- malloc and free are special with regards to the language. Once a
piece of memory has been free'd and (re)malloc'ed, you can treat that
piece of memory as a different object type, but only once per
allocation. A corollary is that you cannot write a conforming memory
allocator in pure C on top of malloc, mmap, etc.

I'm concerned that there is no consensus, because no one has yet
replied to my comp.std.c++ thread, and because the proposed resolution
on the C++ standards committee website directly contradicts your
(Nick Bowler's) interpretation - it's a third interpretation, though
it's closer to pete's than Nick's.
 
Tim Rentsch
01-02-2011
Seebs <(E-Mail Removed)> writes:

> On 2010-12-15, BartC <(E-Mail Removed)> wrote:
>> What was wrong with it? Assuming int and float are the same sizes and are
>> aligned in a compatible way.

>
> It tried to read something through an lvalue of the wrong type. Ultimately,
> this violates the strict aliasing rules; the compiler is allowed to ignore
> the reference or do anything it wants with it.


You mean effective type rules. The term 'strict aliasing' is a gcc-ism
and independent of the C Standard.
 
Tim Rentsch
01-02-2011
(E-Mail Removed) writes:

> BartC <(E-Mail Removed)> wrote [re. unions]:
>>
>> I must have got the idea somewhere that you could only read out the same
>> member that was last written.

>
> C89. The rules were changed in C99 to bless what everyone expected and
> all known implementations did anyway.


Looks more like a clarification of what was intended than an actual rule
change.
 