Strict aliasing and Q2.6 in the FAQ

 
 
Conor F
      09-19-2011
(Trying this again as the velocityreviews site doesn't seem to forward
to NNTP - hope this doesn't appear twice!)

Back to this old topic again. Sorry about this but I'm just not sure
if aliasing applies in this case. Question 2.6 in the FAQ describes
the case of using one malloc and piggybacking a char * onto it. It's
a pretty common idiom I would have thought, but I'm now having my
doubts:

struct name {
    int namelen;
    char *namep;
};

struct name *makename(char *newname)
{
    char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);

    struct name *ret = (struct name *)buf;
    ret->namelen = strlen(newname);
    ret->namep = buf + sizeof(struct name);
    strcpy(ret->namep, newname);

    return ret;
}

I don't really believe there are aliasing issues here due to a char *
being reassigned to a struct name *.

But. If you did this instead:

struct name *ret = malloc(sizeof(struct name) + strlen(newname) + 1);
char *buf = (char *)(&ret[1]);

which also seems a perfectly reasonable way of going about it, and
avoids the sizeof(struct name) addition, which can be a little tricky
in cases where you have several leading structs. OK, it's not
terrible, but I always thought the above was clearer.

Anyway - you've now taken an object of type struct name and converted
to a different type pointing to the same memory.

Isn't that an aliasing issue?

And if char * is a special case then what if I had used a wchar_t
instead? Or another type?

It's all a bit subtle for me.

Conor.

(Hum. Google won't let me post with my hotmail address any more. How
annoying...)
 
Harald van Dijk
      09-19-2011
On Sep 19, 11:53 pm, Conor F wrote:
> struct name {
>     int namelen;
>     char *namep;
> };
>
> struct name *makename(char *newname)
> {
>     char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);
>
>     struct name *ret = (struct name *)buf;
>     ret->namelen = strlen(newname);
>     ret->namep = buf + sizeof(struct name);
>     strcpy(ret->namep, newname);
>
>     return ret;
> }
>
> I don't really believe there are aliasing issues here due to a char *
> being reassigned to a struct name *.


That's fine, but for a different reason. You're accessing
sizeof(struct name) bytes as struct name, and you're accessing the
following strlen(newname)+1 bytes as char. You never access any data
as a type it isn't.

Because of that, your suggested alternative has no aliasing issues
either.

My preference would be to use neither, and instead use the C99
alternative

struct name {
    int namelen;
    char name[];
};
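
For illustration, here is a minimal sketch of how makename might look with the flexible array member. It is not from the FAQ; the switch from int to size_t for namelen, the const qualifier and the NULL check are assumptions added for the example.

```c
#include <stdlib.h>
#include <string.h>

struct name {
    size_t namelen;   /* assumption: size_t instead of int, to match strlen() */
    char   name[];    /* C99 flexible array member */
};

/* Allocate the header and the string in one block; release it with free(). */
struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    struct name *ret = malloc(sizeof *ret + len + 1);

    if (ret == NULL)
        return NULL;
    ret->namelen = len;
    memcpy(ret->name, newname, len + 1);   /* also copies the terminating '\0' */
    return ret;
}
```

With this layout there is no pointer arithmetic and no cast at all: the string bytes are simply the trailing member of the struct.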
 
Eric Sosman
      09-20-2011
On 9/19/2011 5:53 PM, Conor F wrote:
> (Trying this again as the velocityreviews site doesn't seem to forward
> to NNTP - hope this doesn't appear twice!
>
> Back to this old topic again. Sorry about this but I'm just not sure
> if aliasing applies in this case. Question 2.6 in the FAQ describes
> the case of using one malloc and piggy backing a char * onto it. It's
> a pretty common idiom I would have thought, but I'm now having my
> doubts:
>
> struct name {
> int namelen;
> char *namep;
> };
>
> struct name *makename(char *newname)
> {
> char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);
>
> struct name *ret = (struct name *)buf;
> ret->namelen = strlen(newname);
> ret->namep = buf + sizeof(struct name);
> strcpy(ret->namep, newname);
>
> return ret;
> }
>
> I don't really believe there are aliasing issues here due to a char *
> being reassigned to a struct name *.


Nor do I.

> But. If you did this instead:
>
> struct name *ret = malloc(sizeof(struct name) + strlen(newname) +
> 1);
> char *buf = (char *)(&ret[1]);
>
> which also seems a perfectly reasonable way of going about it, and
> avoids the sizeof(struct name) addition which can be a little tricky
> in the cases where you have several leading structs. Ok, it's not
> terrible but I always though the above was clearer.
>
> Anyway - you've now taken an object of type struct name and converted
> to a different type pointing to the same memory.
>
> Isn't that an aliasing issue?


I don't see why. Personally, I prefer the latter form (although
I usually write `ret + 1' for `&ret[1]'). Both examples convert a
pointer value from one type to another.

> And if char * is a special case then what if I had used a wchar_t
> instead? Or another type?


Alignment problems could arise. They can be put to bed again,
but the code gets uglier.
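
As a rough sketch of what "putting alignment to bed" can look like for the wchar_t case: round the offset of the trailing data up to a multiple of the element's alignment. The names struct wname, round_up and make_wname are made up for this example, and it assumes a C11 compiler for alignof.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

struct wname {
    size_t   namelen;
    wchar_t *namep;
};

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* One allocation: header first, then the wide string at an aligned offset. */
struct wname *make_wname(const wchar_t *newname)
{
    size_t len    = wcslen(newname);
    size_t offset = round_up(sizeof(struct wname), alignof(wchar_t));
    unsigned char *raw = malloc(offset + (len + 1) * sizeof(wchar_t));
    struct wname *ret;

    if (raw == NULL)
        return NULL;
    ret = (struct wname *)raw;              /* malloc memory suits any type */
    ret->namelen = len;
    ret->namep   = (wchar_t *)(raw + offset);
    wmemcpy(ret->namep, newname, len + 1);  /* includes the terminating L'\0' */
    return ret;
}
```

On most implementations the rounding is a no-op for this particular struct, because the struct already contains a pointer, but the same helper covers trailing types with stricter alignment.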

--
Eric Sosman
 
Conor F
      09-20-2011
> > I don't really believe there are aliasing issues here due to a char *
> > being reassigned to a struct name *.

>
> That's fine, but for a different reason. You're accessing
> sizeof(struct name) bytes as struct name, and you're accessing the
> following strlen(newname)+1 bytes as char. You never access any data
> as a type it isn't.
>
> Because of that, your suggested alternative has no aliasing issues
> either.


Ah! But I thought that it wouldn't matter where the data was. In other
words, gcc could decide that all that memory over there is now of type
"struct name" and, if you pretend part of it isn't, gcc will play games
with you, like optimising out all the references to buf because you
didn't return it.

I can't always tell what gcc might mess with. I've seen posts by Linus
Torvalds where he advocates using the -fno-strict-aliasing flag to avoid
any subtle issues, like this thread from a few years back:
https://lkml.org/lkml/2003/2/25/270

In that case, reordering the code made a difference. But if properly
assigned pointers point to different blocks of memory, I can't see how
anything would fail. Which is why I asked here.

> My preference would be to use neither, and instead use the C99
> alternative
>
> struct name {
>     int namelen;
>     char name[];


Oh absolutely. The case I mentioned was a simple one where the above
notation would suit much better. The Windows header files have those
notations all over the place (except using the pre-C99 form of char
name[1]).

char *buf = (char *)(ret + 1);

As Eric says, I also prefer the above form, especially when things get
a little hairier, like the classic array of pointers to char followed
by the char data:

char **strarray -> [ptrc0][ptrc1][ptrc2][NULL][string0][string1][string2]

dataptr = (char *)(strarray + nstrings + 1);

And then if it's an array of pointers to structures, then we hit
alignment issues. But at least those are easy to deal with (just round
up to the next even multiple of the structure size). And then compile
on a Sparc just to see if you are right!
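
A minimal sketch of the pointer-table-plus-strings layout described above, all in one malloc. The function name pack_strings and its signature are invented for this example. Since the trailing data is char, no alignment rounding is needed here.

```c
#include <stdlib.h>
#include <string.h>

/* Copy n strings into a single allocation laid out as
   [ptr0]...[ptr(n-1)][NULL][string0]...[string(n-1)].
   The whole thing is released with one free(). */
char **pack_strings(const char *const *src, size_t n)
{
    size_t total = (n + 1) * sizeof(char *);
    size_t i;
    char **strarray;
    char *dataptr;

    for (i = 0; i < n; i++)
        total += strlen(src[i]) + 1;

    strarray = malloc(total);
    if (strarray == NULL)
        return NULL;

    dataptr = (char *)(strarray + n + 1);   /* first byte after the table */
    for (i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        memcpy(dataptr, src[i], len);
        strarray[i] = dataptr;
        dataptr += len;
    }
    strarray[n] = NULL;
    return strarray;
}
```

The pointer table is only ever accessed as char *, and the string area only as char, so nothing is read as a type it isn't.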

Conor.
 
Harald van Dijk
      09-20-2011
On Sep 20, 12:11 pm, Conor F wrote:
> > > I don't really believe there are aliasing issues here due to a char *
> > > being reassigned to a struct name *.

>
> > That's fine, but for a different reason. You're accessing
> > sizeof(struct name) bytes as struct name, and you're accessing the
> > following strlen(newname)+1 bytes as char. You never access any data
> > as a type it isn't.

>
> > Because of that, your suggested alternative has no aliasing issues
> > either.

>
> Ah! But I thought that it wouldn't matter where the data was. In other
> words, gcc could decide that all that memory over there is now of type
> "struct name" and if you pretend part of it isn't, gcc will play games
> with you, like optimise all the references to buf out because you
> didn't return it
>
> I can't always tell what gcc might mess with. I've seen posts by Linus
> Torvalds where advocates using the no-strict-aliasing flag to avoid
> any subtle issues like this thread from a few years back:
>    https://lkml.org/lkml/2003/2/25/270
>
> In that case, reordering the code make a difference. But if properly
> assigned pointers point to different blocks of memory, I can't see how
> anything would fail. Which is why I asked here


Looking further in that thread, the problem comes from a subtle bug/
misfeature in the implementation of the kernel's own memcpy macro/
function. Generally speaking, when you're not writing a kernel, you
can assume memcpy behaves as required by the standard.
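
As an aside, in hosted code the standard memcpy is also the usual way to reinterpret an object's bytes without casting pointers between unrelated types. A small sketch, where float_bits is a made-up name and the example assumes float is 32 bits wide (checked at compile time):

```c
#include <stdint.h>
#include <string.h>

/* Return the object representation of a float as a 32-bit integer,
   copying bytes instead of dereferencing a cast pointer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "float is assumed to be 32 bits");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers commonly turn this memcpy into a plain register move, so the well-defined version usually costs nothing extra.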
 
Conor F
      09-20-2011
So, to summarise:

Strict aliasing would only apply if a type punned pointer pointed to
the same place in memory - which in my opinion is wild west code
anyway...

So, to be awkward and use a wchar_t instead simply to avoid the char *
case:

struct name { int namelen; wchar_t *namep; };

struct name *ret = malloc(sizeof(struct name) +
wcslen(newname) + 1);

wchar_t *buf = (wchar_t *)(ret + 1);

... copy to buf here ...


Would be fine simply because the type punning is to a different memory
location; but:

wchar_t *buf = (wchar_t *)(ret + 0);

Isn't fine. Ok, other than the fact that I made a mess of the example
I mean. Um, a better example would be the one in the Wikipedia article
on type punning:

struct sockaddr_in sa = {0};
....
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

which is obviously bad. But if bind took a char * and they did this:

bind(sockfd, (char *)&sa, sizeof sa);

that would be ok I guess.

Plus any other kind of inheritance scheme like COM, where two
structures share initial sequences (IUnknown and all that) but are not
in a union, is also out. But then I'd be aware that I'm messing around in
those circumstances and I'd use -fno-strict-aliasing...
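
For comparison, a sketch of the one "inheritance" layout that standard C does bless without a union: embedding the base struct as the first member. A pointer to a struct, suitably converted, points to its first member. The types struct base and struct derived and both functions are hypothetical, made up for this example.

```c
struct base    { int kind; };
struct derived { struct base base; int extra; };   /* base must come first */

/* Works on anything that starts with a struct base. */
int kind_of(const struct base *b)
{
    return b->kind;
}

/* No cast needed: take the address of the embedded member. */
int derived_kind(const struct derived *d)
{
    return kind_of(&d->base);
}
```

Calling kind_of((const struct base *)&d) on a struct derived d is also fine, because the object at that address really is a struct base. Two unrelated structs that merely start with the same members do not get that guarantee.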

Thanks,

Conor.
 
Harald van Dijk
      09-20-2011
On Sep 20, 8:26 pm, Conor F wrote:
> So, to summarise :
>
> Strict aliasing would only apply if a type punned pointer pointed to
> the same place in memory - which in my opinion is wild west code
> anyway...


Pretty much, yes.

> So, to be awkward and use a wchar_t instead simply to avoid the char *
> case:
>
>   struct name { int namelen; wchar_t *namep; };
>
>   struct name *ret = malloc(sizeof(struct name) +
>                             wcslen(newname) + 1);


(wcslen(newname) + 1) * sizeof(wchar_t)

>   wchar_t *buf = (wchar_t *)(ret + 1);
>
>   ... copy to buf here ...
>
> Would be fine simply because the type punning is to a different memory
> location; but:


Right.

>   wchar_t *buf = (wchar_t *)(ret + 0);
>
> Isn't fine. Ok, other than the fact that I made a mess of the example
> I mean.


Yes, that example doesn't work. Accessing the data as

struct name *ret = malloc(sizeof(wchar_t));
wchar_t *buf = (wchar_t *) (ret + 0);
*buf = L'x';

is no violation of the aliasing rules, because you're still only
accessing the data as wchar_t, even though you have a suspicious cast
now.

> Um, a better example would be the one in the wikipedia article
> on type punning:
>
>   struct sockaddr_in sa = {0};
>   ....
>   bind(sockfd, (struct sockaddr *)&sa, sizeof sa);
>
> which is obviously bad.


By C's aliasing rules, yes, you're right. Remember, though, that bind
is a non-standard function, and POSIX makes additional guarantees
about what compilers must permit, beyond what standard C does. It may
say that the above use must be given the "obvious" interpretation by a
conforming POSIX compiler. I don't know if it does so.

> But if bind took a char * and they did this:
>
>   bind(sockfd, (char *)&sa, sizeof sa);
>
> that would be ok I guess.


If bind is declared as taking a struct sockaddr *, and if bind
dereferences its parameter to get a struct sockaddr, then C's aliasing
rules don't allow you to pass a pointer to what is really a struct
sockaddr_in, not even via an intermediate char * cast. If bind takes a
char *, and accesses the memory byte by byte, then yes, that is the
special exception in the aliasing rules.
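
To make the character-type exception concrete, a small sketch of a function that inspects any object byte by byte. The name byte_sum is invented for the example.

```c
#include <stddef.h>

/* Sum the bytes of any object by reading it through unsigned char *,
   which the aliasing rules permit for every object type. */
unsigned long byte_sum(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < size; i++)
        sum += p[i];
    return sum;
}
```

A call such as byte_sum(&sa, sizeof sa) is fine for any sa. The exception only runs one way, though: it does not make it valid to read a char buffer through a pointer to some other type.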

> Plus any other type of inheritance creation like COM - where two
> structures share initial sequences (IUnknown and all that) but are not
> unioned are also out. But then I'd be aware that I'm messing around in
> those circumstances and I'd use -fno-strict-aliasing...


Another case where COM pretty much ignores the aliasing rules is in
IUnknown's QueryInterface method, where its last argument's type is
void **, but will almost never really be a pointer to void *. Which is
okay if MS decides that COM compilers must allow this, even if C's
aliasing rules don't.
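
For what it's worth, a void ** out-parameter can be used in a way standard C is happy with, by going through a real void * temporary. The sketch below is not the COM API: struct widget, query_widget and get_widget are hypothetical stand-ins for a QueryInterface-style call.

```c
#include <stddef.h>

struct widget { int id; };                  /* hypothetical object type */

static struct widget the_widget = { 42 };

/* Hypothetical lookup with a QueryInterface-style void ** out-parameter. */
int query_widget(void **out)
{
    *out = &the_widget;
    return 0;
}

/* Conforming call: pass the address of a real void *, then convert. */
struct widget *get_widget(void)
{
    void *tmp = NULL;

    if (query_widget(&tmp) != 0)
        return NULL;
    return tmp;   /* void * converts to struct widget * implicitly in C */
}
```

The common shortcut of passing (void **)&widget_ptr instead stores through a void * lvalue into an object declared as struct widget *, which is the part the aliasing rules object to.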
 
Conor F
      09-20-2011
> >   struct name *ret = malloc(sizeof(struct name) +
> >                             wcslen(newname) + 1);

>
> (wcslen(newname) + 1) * sizeof(wchar_t)


Ooops. Erm, sorry. That's what I get for coding in a rush and then
changing my mind. That example was a total mess <grin>.

> > Would be fine simply because the type punning is to a different memory
> > location; but:

>
> Right.


Grand. That clarifies a lot. I used to do QA so I'm a tad pedantic
about these things (except the example I typed in). I just wanted to
be sure on that point.

> Yes, that example doesn't work. Accessing the data as
>
>   struct name *ret = malloc(sizeof(wchar_t));
>   wchar_t *buf = (wchar_t *) (ret + 0);
>   *buf = L'x';
>
> is no violation of the aliasing rules, because you're still only
> accessing the data as wchar_t, even though you have a suspicious cast
> now.


That's somewhat of a surprise. I guess you might get hosed as soon as
you access "ret", because the compiler would decide to optimise given
the assumption that ret and buf couldn't possibly point to the same
location.

> By C's aliasing rules, yes, you're right. Remember, though, that bind
> is a non-standard function, and POSIX makes additional guarantees
> about what compilers must permit, beyond what standard C does. It may
> say that the above use must be given the "obvious" interpretation by a
> conforming POSIX compiler. I don't know if it does so.


Hmmm. I see - I believe I've seen that before with threading - POSIX
makes assurances above what ISO C makes so that calls like
pthread_mutex_lock() don't get messed with. I'd guess the gcc
documentation might shed some light.

> > But if bind took a char * and they did this:

>
> >   bind(sockfd, (char *)&sa, sizeof sa);

>
> > that would be ok I guess.

>
> If bind is declared as taking a struct sockaddr *, and if bind
> dereferences its parameter to get a struct sockaddr, then C's aliasing
> rules don't allow you to pass a pointer to what is really a struct
> sockaddr_in, not even via an intermediate char * cast. If bind takes a
> char *, and accesses the memory byte by byte, then yes, that is the
> special exception in the aliasing rules.


Yes, thank you. I had that feeling when I typed that bit that maybe it
would be bad to recast it back once cast to a char *. Doing some
googling shows that some coders (e.g. PuTTY) have made changes to put
these in unions to avoid the problem.

> Another case where COM pretty much ignores the aliasing rules is in
> IUnknown's QueryInterface method, where its last argument's type is
> void **, but will almost never really be a pointer to void *. Which is
> okay if MS decides that COM compilers must allow this, even if C's
> aliasing rules don't.


Although if I was using a compiler like mingw it would probably be
good to be aware of the possible issues and use the appropriate flags
if necessary. The Windows compilers would do the right thing
automagically of course.

Conor.
 