Confirm reinterpret_cast if is safe?

tni
Guest
      03-01-2011
Snipped example that does something like this:

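#include <iostream>
#include <stdint.h>

int main() {
    uint32_t i = 7;
    // Accesses a uint32_t object through a uint16_t lvalue. This breaks
    // the strict aliasing rule, so the behavior is undefined.
    uint16_t& i_ref16 = *reinterpret_cast<uint16_t*>(&i);
    i_ref16 = 1;
    std::cout << i << " " << i_ref16 << std::endl;
    return 0;
}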

On 2011-03-01 10:29, James Kanze wrote:
> In practice, however, reinterpret_cast becomes useless if this doesn't
> work.


That's exactly the case.

> The intent of the standard, here, is rather clear: the behavior
> is undefined in the standard (since there's no way it could be defined
> portably), but it is expected that the implementation define it when
> reasonable. But there is a lot of gray areas around this. There is
> also a very definite intent that the compiler can assume no aliasing
> between two pointers to different types, provided that neither type is
> char or unsigned char. From a QoI point of view, on "normal" machines,
> I would expect to be able to access, and even modify the bit patterns,
> through a pointer to any integral type, provided alignment restrictions
> are respected, and all of the reinterpret_cast are in the same function,
> so the compiler can see them, and take the additional aliasing into
> account (or alternatively, a compiler option is used to turn off all
> optimization based on aliasing analysis). As to what happens to the
> object whose bit pattern was actually accessed... that's very
> architecture dependent, but if you know the architecture, and the actual
> types are all more or less basic types, you can play games.


The above example with GCC 4.4 -O2 produces (unless you use
-fno-strict-aliasing):
7 1

So don't do it, even if it seems to work. It's simply nasty bugs waiting
to happen in corner cases/future compiler versions.
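For comparison, here is a minimal sketch of two well-defined ways to get at the same bits in C++, assuming a platform that provides std::uint32_t and std::uint16_t. The helper names are hypothetical. The first works on the numeric value with masks; the second copies bytes with std::memcpy, which is not an aliasing access, so the optimizer cannot assume `i` is unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Numeric approach: replace the low-order 16 bits of a 32-bit value using
// masks. No aliasing is involved, and the result is independent of byte order.
std::uint32_t set_low16(std::uint32_t value, std::uint16_t low)
{
    return (value & 0xFFFF0000u) | low;
}

// Byte approach: overwrite the first two bytes of the object representation,
// roughly what the reinterpret_cast version tried to do. std::memcpy copies
// bytes instead of creating a uint16_t lvalue that aliases a uint32_t.
// Which half of the number changes depends on the platform's byte order.
void overwrite_first_two_bytes(std::uint32_t& value, std::uint16_t bits)
{
    std::memcpy(&value, &bits, sizeof bits);
}

// Read the first two bytes back as a uint16_t, again via memcpy.
std::uint16_t read_first_two_bytes(const std::uint32_t& value)
{
    std::uint16_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On a little-endian machine such as x86, calling overwrite_first_two_bytes(i, 1) with i == 7 leaves i == 1 regardless of optimization level. Compilers commonly turn such small fixed-size memcpy calls into plain loads and stores, so the well-defined version need not cost anything.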
Paul
Guest
      03-01-2011

"Leigh Johnston" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ...
> On 01/03/2011 15:18, James Kanze wrote:
>> As soon as you need reinterpret_cast, portability goes out the
>> window. It's strictly for experts, and only for very machine
>> dependent code, at the very lowest level.
>>

<snip>
>Also, I am not sure that agree with you that a particular language feature
>should only be used by "experts".
>

No big surprise there; you are always looking for flaws in James' posts.
I think Jamseyboy meant specialists specialising in low level C++ coding,
also suggesting this was advanced C++ programming.
You must remember English is probably not James' first language, so he does
quite well. Pity he's unclear about some C++ basics, though.

Paul
Guest
      03-01-2011

"Leigh Johnston" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ...
> On 01/03/2011 19:34, Paul wrote:
>>
>> "Leigh Johnston" <(E-Mail Removed)> wrote in message
>> news:(E-Mail Removed) ...
>>> On 01/03/2011 15:18, James Kanze wrote:
>>>> As soon as you need reinterpret_cast, portability goes out the
>>>> window. It's strictly for experts, and only for very machine
>>>> dependent code, at the very lowest level.
>>>>

>> <snip>
>>> Also, I am not sure that agree with you that a particular language
>>> feature should only be used by "experts".
>>>

>> No big surprise there you are always looking for flaws in James' posts.
>> I think Jamseyboy meant specialists specialising in low level C++
>> coding, also suggesting this was advanced C++ programming.
>> You must remember English is probably not James' first language so he
>> does quite well .Pity he's unclear about some C++ basics though.
>>

>
> The part of my post that you snipped clearly indicated that
> reinterpret_cast is not solely the domain of low level C++ coding.
> Pointer/integer type conversion is a very common idiom in Microsoft
> Windows C/C++ development mainly stemming from the fact that a Windows
> message passes information via the "WPARAM" and "LPARAM" integer
> arguments.
>
> /Leigh

Who cares?
You're an idiot.

Paul
Guest
      03-01-2011

"Leigh Johnston" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ...
> On 01/03/2011 19:43, Paul wrote:
>>
>> "Leigh Johnston" <(E-Mail Removed)> wrote in message
>> news:(E-Mail Removed) ...
>>> On 01/03/2011 19:34, Paul wrote:
>>>>
>>>> "Leigh Johnston" <(E-Mail Removed)> wrote in message
>>>> news:(E-Mail Removed) ...
>>>>> On 01/03/2011 15:18, James Kanze wrote:
>>>>>> As soon as you need reinterpret_cast, portability goes out the
>>>>>> window. It's strictly for experts, and only for very machine
>>>>>> dependent code, at the very lowest level.
>>>>>>
>>>> <snip>
>>>>> Also, I am not sure that agree with you that a particular language
>>>>> feature should only be used by "experts".
>>>>>
>>>> No big surprise there you are always looking for flaws in James' posts.
>>>> I think Jamseyboy meant specialists specialising in low level C++
>>>> coding, also suggesting this was advanced C++ programming.
>>>> You must remember English is probably not James' first language so he
>>>> does quite well .Pity he's unclear about some C++ basics though.
>>>>
>>>
>>> The part of my post that you snipped clearly indicated that
>>> reinterpret_cast is not solely the domain of low level C++ coding.
>>> Pointer/integer type conversion is a very common idiom in Microsoft
>>> Windows C/C++ development mainly stemming from the fact that a Windows
>>> message passes information via the "WPARAM" and "LPARAM" integer
>>> arguments.
>>>
>>> /Leigh

>> Who cares?
>> You're an idiot.

>
> Again rather than accepting your mistake you go into denial and throw
> insults about; can you not see how pathetic this behaviour is?
>
> /Leigh
>

I can see that you are an idiot.

Joshua Maurice
Guest
      03-01-2011
On Mar 1, 7:18 am, James Kanze <(E-Mail Removed)> wrote:
> On Mar 1, 10:25 am, Joshua Maurice <(E-Mail Removed)> wrote:
> > Or, of course, you
> > could go ask them for us as you actually know them in person (maybe?),
> > and perhaps you could get them to answer the few other pesky issues I
> > have about how a general purpose portable conforming pooling memory
> > allocator on top of malloc is supposed to work, or not work. It would
> > be nice.

>
> If you ask on comp.std.c++, you'll get some feedback.


I have had a thread up now for months, many posts by myself that
include musing and questions, and not a single reply. It was in a
thread talking about a DR in the current draft that would disallow
general purpose portable conforming pooling memory allocators on top
of malloc, maybe.

> Or
> perhaps comp.std.c---I haven't looked there in ages,


I tried that as well. Got a large thread going, ~137 replies atm.
Unfortunately, nothing was really resolved. Instead we started talking
about the effective type rules and whether
(*a).x
has distinct behavior from
a->x
which seemed quite silly given my C++ background where they are
defined to be equivalent. The argument was that "*a" counts as an
access of the struct of type a no matter its context, and "a->x" does
not count as an access of the (whole?) struct. I think near the end of
the discussion the silly side backed down and said that they don't
know, and are waiting for the C committee to clear this up.

> but IMHO,
> this is a problem that C should resolve, and C++ should simply
> accept the decision. There's absolutely no reason for the
> languages to have different rules here.


Agreed. First C needs to figure out its own rules though. There does
not appear to be a clear consensus on if and why the following program
has undefined behavior in C and/or C++.

#include <stddef.h>
#include <stdlib.h>
int main()
{
    typedef struct T1 { int x; int y; } T1;
    typedef struct T2 { int x; int y; } T2;

    void* p = 0;
    T1* a = 0;
    T2* b = 0;
    int* y = 0;

    if (offsetof(T1, y) != offsetof(T2, y))
        return 1;
    if (sizeof(T1) != sizeof(T2))
        return 2;

    p = malloc(sizeof(T1));
    a = (T1*) p;
    b = (T2*) p;
    y = & a->y;
    *y = 1;
    y = & b->y;
    return *y;
}

The interesting part is:
y = & b->y;
return *y;
My naive understanding is that
y = & b->y;
is not meant to be UB, nor is it UB under the Rules As Written. Moreover, for
the read
return *y;
we know that y has the same bit-value as y did when we made a write
through it. That is, under a naive understanding, it is the same
pointer value, and it points to the same memory location. In the C
parlance, it is reading a (sub-)object whose effective type is int
through an int lvalue. No UB there.

However, when you combine these two things, the intent and consensus
seem to be that it is UB, but I cannot find that anywhere in the Rules As
Written. That's the first interesting problem.

In a related DR, the C standards committee is talking about the
"provenance" of the pointer, that is, a sort of data dependency
analysis of its origins. That might be a way to resolve that
particular mess.
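C++ does have one explicit guarantee for layout-identical pairs like T1 and T2, though it applies to unions rather than to malloc'ed memory: when two standard-layout structs share a common initial sequence and are members of the same union, you may write through one and read the shared members through the other (C has a similar special guarantee for unions). The sketch below shows that rule, with compile-time versions of the program's runtime offsetof/sizeof checks. The union and function names are hypothetical, and the sketch sidesteps, rather than answers, the question of what type malloc'ed memory has.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Two distinct but layout-identical structs, as in the program above.
struct T1 { int x; int y; };
struct T2 { int x; int y; };

// Compile-time versions of the runtime offsetof/sizeof checks.
static_assert(std::is_standard_layout<T1>::value, "T1 must be standard-layout");
static_assert(std::is_standard_layout<T2>::value, "T2 must be standard-layout");
static_assert(offsetof(T1, y) == offsetof(T2, y), "layouts differ");
static_assert(sizeof(T1) == sizeof(T2), "sizes differ");

// T1 and T2 share a common initial sequence (both members), so reading the
// y member of T2 while T1 is the active member is permitted.
union Both {
    T1 a;
    T2 b;
};

int write_as_T1_read_as_T2(int value)
{
    Both u{};        // value-initializes the first member, 'a', which becomes active
    u.a.y = value;   // write through T1
    return u.b.y;    // read the common initial sequence through T2
}
```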

The second interesting problem is the following program:

/* Program 2, version 1 */
#include <stdlib.h>
#include <stdio.h>
int main()
{
    void* p = 0;
    float* f = 0;
    int* i = 0;

    p = malloc(sizeof(float) + sizeof(int));
    f = (float*) p;
    *f = 1;
    printf("%f\n", *f);
    i = (int*) p;
    *i = 1;
    printf("%d\n", *i);
}

Which naively can be rewritten as:

#include <stdlib.h>
#include <stdio.h>
int main()
{
    void* p = 0;
    float* f = 0;
    int* i = 0;

    p = malloc(sizeof(float) + sizeof(int));
    f = (float*) p;
    i = (int*) p;
    *f = 1;
    printf("%f\n", *f);
    *i = 1;
    printf("%d\n", *i);
}

Which naively can be rewritten as:

#include <stdlib.h>
#include <stdio.h>
void foo(float* f, int* i)
{
    *f = 1;
    printf("%f\n", *f);
    *i = 1;
    printf("%d\n", *i);
}
int main()
{
    void* p = 0;
    float* f = 0;
    int* i = 0;

    p = malloc(sizeof(float) + sizeof(int));
    f = (float*) p;
    i = (int*) p;
    foo(f, i);
}

Which naively can be optimized as the following, which of course
breaks things:

#include <stdlib.h>
#include <stdio.h>
void foo(float* f, int* i)
{
    *f = 1;
    *i = 1;
    printf("%f\n", *f);
    printf("%d\n", *i);
}
int main()
{
    void* p = 0;
    float* f = 0;
    int* i = 0;

    p = malloc(sizeof(float) + sizeof(int));
    f = (float*) p;
    i = (int*) p;
    foo(f, i);
}

"Program 2, version 1" needs to have defined behavior if you want
userspace portable conforming general purpose pooling memory
allocators written on top of malloc or new. I argue that we definitely
need to allow that. The last program needs to have UB. The question of
course is where exactly it breaks. The C standards committee,
judging from its DR notes and meeting minutes, wants to say it breaks
when we introduce the function foo. That is of course sensible, but
not Rules As Written, and definitely not anything official yet.
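For illustration, a C++ sketch of the kind of pooling allocator in question, under stated assumptions: the class TinyPool and its members are hypothetical, every slot is assumed big enough for the objects placed in it, and thread safety is ignored. Instead of relying on a cast to give the memory a type, it uses placement new, which in C++ is the unambiguous way to start an object's lifetime in raw storage; the usage mirrors "Program 2, version 1". Even so, it leans on pointer arithmetic over malloc'ed memory, one of the gray areas discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical minimal pool on top of malloc: one block carved into
// fixed-size slots, with free slots threaded onto a singly linked list.
class TinyPool {
    // A free slot holds a link; max_align_t makes each slot large and
    // aligned enough for any scalar type.
    union Slot {
        Slot* next;
        std::max_align_t align;
    };
    static_assert(sizeof(Slot) >= sizeof(int) && sizeof(Slot) >= sizeof(float),
                  "slot too small");

public:
    explicit TinyPool(std::size_t slots)
        : storage_(static_cast<Slot*>(std::malloc(slots * sizeof(Slot)))),
          free_(nullptr)
    {
        if (storage_ == nullptr)
            throw std::bad_alloc();
        for (std::size_t n = 0; n < slots; ++n)
            push(&storage_[n]);
    }

    ~TinyPool() { std::free(storage_); }

    TinyPool(const TinyPool&) = delete;
    TinyPool& operator=(const TinyPool&) = delete;

    // Returns raw storage; the caller gives it a type with placement new.
    void* allocate()
    {
        if (free_ == nullptr)
            return nullptr;
        Slot* s = free_;
        free_ = s->next;
        return s;
    }

    // The caller must already have destroyed whatever object lived here.
    void deallocate(void* p) { push(p); }

private:
    void push(void* raw)
    {
        Slot* s = ::new (raw) Slot;   // re-create a Slot to hold the link
        s->next = free_;
        free_ = s;
    }

    Slot* storage_;
    Slot* free_;
};
```

Used like Program 2, a slot first holds a float and is then reused for an int: `float* f = ::new (p) float(1.0f);` followed later by `int* i = ::new (p) int(1);`. Constructing the int ends the float's lifetime, so the storage has exactly one type at a time. Both types are trivially destructible, so there is nothing to run before deallocate.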
James Kanze
Guest
      03-02-2011
On Mar 1, 3:51 pm, Leigh Johnston <(E-Mail Removed)> wrote:
> On 01/03/2011 15:18, James Kanze wrote:


> > As soon as you need reinterpret_cast, portability goes out the
> > window. It's strictly for experts, and only for very machine
> > dependent code, at the very lowest level.


> Quite often I have to use reinterpret_cast in GUI code (Microsoft) for
> converting to/from GUI control item "LPARAM" values which I use for
> associating a GUI control item with an object; I wouldn't call this
> "very lowest level". Also, I am not sure that agree with you that a
> particular language feature should only be used by "experts".


I'll admit that it can be necessary to work around a poorly
designed interface. The obvious answer is to fix the interface,
but we don't always have that liberty.

And the term "expert" is not meant to be precise, but just to
suggest that the decision to use it should not be made lightly.
I can easily imagine a case where the decision to use it
regularly when interfacing with some external API was made
by one of the project's "experts", but the actual use (following
the guidelines laid down by the expert) was by very
run-of-the-mill programmers. I'd generally recommend having the experts
write a wrapper to the poorly designed interface, and letting
the others use that. But if the poorly designed interface is
a standard (Posix dlsym, for example, which can't be used
without a reinterpret_cast) or a pseudo-standard (Windows GUI?),
there's a strong argument for letting the programmers use the
interface they already know, rather than having to learn a new
one.
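A minimal sketch of the kind of wrapper described above, assuming the external interface stores user data in a pointer-sized integer, as LPARAM does on Windows. UserData, to_user_data and from_user_data are hypothetical names, and std::intptr_t is an optional type, although mainstream platforms provide it. The standard guarantees that converting an object pointer to a large-enough integer and back to the same pointer type yields the original pointer, so confining the casts to two small functions keeps the rest of the code cast-free.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an integer "user data" slot such as LPARAM.
using UserData = std::intptr_t;

// Store an object pointer in the integer slot.
template <typename T>
UserData to_user_data(T* object)
{
    return reinterpret_cast<UserData>(object);
}

// Recover the pointer; T must be the same type that was stored.
template <typename T>
T* from_user_data(UserData data)
{
    return reinterpret_cast<T*>(data);
}
```

The caller writes `from_user_data<Widget>(data)` and never sees a reinterpret_cast; the one rule to remember, that the type must match what was stored, lives in a single place.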

--
James Kanze
James Kanze
Guest
      03-02-2011
On Mar 1, 10:47 pm, Joshua Maurice <(E-Mail Removed)> wrote:
> On Mar 1, 7:18 am, James Kanze <(E-Mail Removed)> wrote:


> > On Mar 1, 10:25 am, Joshua Maurice <(E-Mail Removed)> wrote:
> > > Or, of course, you could go ask them for us as you
> > > actually know them in person (maybe?), and perhaps you
> > > could get them to answer the few other pesky issues I have
> > > about how a general purpose portable conforming pooling
> > > memory allocator on top of malloc is supposed to work, or
> > > not work. It would be nice.


> > If you ask on comp.std.c++, you'll get some feedback.


> I have had a thread up now for months, many posts by myself
> that include musing and questions, and not a single reply. It
> was in a thread talking about a DR in the current draft that
> would disallow general purpose portable conforming pooling
> memory allocators on top of malloc, maybe.


Hmmm. In the past, you'd typically have gotten an answer. (Or
several contradictory answers.)

> > Or
> > perhaps comp.std.c---I haven't looked there in ages,


> I tried that as well. Got a large thread going, ~137 replies atm.
> Unfortunately, nothing was really resolved.


I think part of the problem is that the standard (C99) doesn't
actually say what was intended. And that different people don't
even agree with what was intended (nor with what it actually
says, for that matter, although in most cases, I find what it
actually says rather clear, albeit almost certainly not what was
intended).

> Instead we started talking
> about the effective type rules and whether
> (*a).x
> has distinct behavior from
> a->x
> which seemed quite silly given my C++ background where they are
> defined to be equivalent.


They're defined to be equivalent in C as well, although there's
special language in C (and maybe in C++0x) which says that the
equivalence has some limits: in C99, for example, a[i] is the same
as *(a+i), except when it is preceded by a & operator.
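A small check of the equivalences mentioned above, written in C++, where the rules are the same for these cases. Point and the two helper functions are hypothetical. The sketch stays inside the array, so the one-past-the-end case, where C99's special wording for & matters, does not arise.

```cpp
#include <cassert>

struct Point { int x; int y; };

// p->x is defined as (*p).x: both expressions designate the same int object.
bool arrow_and_star_dot_agree(Point* p)
{
    return &p->x == &(*p).x;
}

// a[i] is defined as *(a + i), so &a[i] and a + i are the same pointer
// and both expressions read the same element.
bool subscript_and_offset_agree(int* a, int i)
{
    return &a[i] == a + i && a[i] == *(a + i);
}
```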

> The argument was that "*a" counts as an
> access of the struct of type a no matter its context, and "a->x" does
> not count as an access of the (whole?) struct. I think near the end of
> the discussion the silly side backed down and said that they don't
> know, and are waiting for the C committee to clear this up.


> > but IMHO,
> > this is a problem that C should resolve, and C++ should simply
> > accept the decision. There's absolutely no reason for the
> > languages to have different rules here.


> Agreed. First C needs to figure out its own rules though. There does
> not appear to be a clear consensus on if and why the following program
> has undefined behavior in C and/or C++.


> #include <stddef.h>
> #include <stdlib.h>
> int main()
> {
> typedef struct T1 { int x; int y; } T1;
> typedef struct T2 { int x; int y; } T2;


> void* p = 0;
> T1* a = 0;
> T2* b = 0;
> int* y = 0;


> if (offsetof(T1, y) != offsetof(T2, y))
> return 1;
> if (sizeof(T1) != sizeof(T2))
> return 2;


> p = malloc(sizeof(T1));
> a = (T1*) p;
> b = (T2*) p;
> y = & a->y;
> *y = 1;
> y = & b->y;
> return *y;
> }


There is one place here where C and C++ are different. In
C (IIRC), if you leave out the first T1/T2 in the typedefs, T1
and T2 would be "compatible types". C++ doesn't have the notion
of "compatible type", and would continue to treat them as two
distinct types.

> The interesting part is:
> y = & b->y;
> return *y;
> My naive understanding is that
> y = & b->y;
> is not meant to be UB, nor is UB as Rules As Written. Moreover, for
> the read
> return *y;
> we know that y has the same bit-value as y did when we made a write
> through it. That is, under a naive understanding, it is the same
> pointer value, and it points to the same memory location. In the C
> parlance, it is reading a (sub-)object whose effective type is int
> through an int lvalue. No UB there.


I'm not sure. The real question is what is the type of the
object in the malloc'ed memory. In the above, we're dealing
with a corner case where I don't think the committee really
cares if it is defined or not---they'll take whatever falls out
of the definitions made to handle the more general cases.

Of course, in practice, something like the above will always
work, even if T1 and T2 were full C++ classes, with
constructors, virtual functions, and the works, as long as T1
and T2 had the same layout. And they will have the same layout
as long as their data members are the same and they are
"similar enough" on other grounds (e.g. either both have virtual
functions, or neither; both have the same base classes, if any;
etc.). In general, C++ wants this to be undefined behavior, at
least when non-PODs are involved; the committee almost certainly
doesn't want to get into the issues of defining how similar is
"similar enough". And as I said, they don't really care whether
the case in the example is UB, so they don't make a special case
of it.

> However, when you combine these two things, the intent and consensus
> seems to be UB, but this is not Rules As Written anywhere where I can
> find. That's the first interesting problem.


I think that the intent is clearly UB, at least in C++. With
regards to the rules, the real question is when malloc'ed memory
becomes an object with a specific type. Once it has become an
object with a specific type, accessing it as another object with
a specific type is clearly undefined behavior. And although *y
has type int, the "complete object" being accessed does depend
on how the pointer was initialized. And whether the expression
*y = 1; has "created" an object of type T1 (since y was
initialized with a pointer into a T1) or not. And while the
standard definitely does forbid accessing an object of one type
through an lvalue expression of another type, it doesn't say
anything about when memory acquired by calling malloc or the
operator new function becomes an object of a specific type
(unless the object is a non-POD class, in which case, it's when
the constructor runs).

There is also an interesting variant on the above example.
Suppose, instead of y, I have:
int* y1 = &a->y;
int* y2 = &b->y;
assert(y1 == y2);
If y1 == y2, then they point to the same object. So *y1 = 1;
return *y2 is definitely legal. Unless, of course, something
previous triggered undefined behavior. (But at this point,
we've not yet accessed the actual memory returned by malloc, so
its type is indeterminate. Or?)

Until we know when (if ever) the memory returned by malloc
becomes a T1 or a T2, we can't really answer these questions.
And both the C and the C++ standards are completely silent about
this.

> In a related DR, the C standards committee is talking about the
> "provenance" of the pointer - that is a sort of data dependency
> analysis of its origins. That might be a way to resolve that
> particular mess.


> The second interesting problem is the following program:


> /* Program 2, version 1 */
> #include <stdlib.h>
> #include <stdio.h>
> int main()
> {
> void* p = 0;
> float* f = 0;
> int* i = 0;


> p = malloc(sizeof(float) + sizeof(int));
> f = (float*) p;
> *f = 1;
> printf("%f\n", *f);
> i = (int*) p;
> *i = 1;
> printf("%d\n", *i);
> }


> Which naively can be rewritten as:


> #include <stdlib.h>
> #include <stdio.h>
> int main()
> {
> void* p = 0;
> float* f = 0;
> int* i = 0;


> p = malloc(sizeof(float) + sizeof(int));
> f = (float*) p;
> i = (int*) p;
> *f = 1;
> printf("%f\n", *f);
> *i = 1;
> printf("%d\n", *i);
> }


> Which naively can be rewritten as:


> #include <stdlib.h>
> #include <stdio.h>
> void foo(float* f, int* i)
> {
> *f = 1;
> printf("%f\n", *f);
> *i = 1;
> printf("%d\n", *i);
> }
> int main()
> {
> void* p = 0;
> float* f = 0;
> int* i = 0;


> p = malloc(sizeof(float) + sizeof(int));
> f = (float*) p;
> i = (int*) p;
> foo(f, i);
> }


> Which naively can be optimized as the following, which of course
> breaks things:


> #include <stdlib.h>
> #include <stdio.h>
> void foo(float* f, int* i)
> {
> *f = 1;
> *i = 1;
> printf("%f\n", *f);
> printf("%d\n", *i);
> }
> int main()
> {
> void* p = 0;
> float* f = 0;
> int* i = 0;


> p = malloc(sizeof(float) + sizeof(int));
> f = (float*) p;
> i = (int*) p;
> foo(f, i);
> }


> "Program 2, version 1" needs to have defined behavior if you want
> userspace portable conforming general purpose pooling memory
> allocators written on top of malloc or new. I argue that we definitely
> need to allow that. The last program needs to have UB.


I agree wholeheartedly. I think that that is the intent, or at
least it should be.

> The question of
> course is where exactly is it broken. The C standards committee,
> judging from its DR notes and meeting minutes, wants to say it breaks
> when we introduce the function foo. That is of course sensible, but
> not Rules As Written, and definitely not anything official yet.


Yes. There are at least two problems: one (at least for C++) is
when the raw memory allocated with malloc or the operator new
function starts being an object with a definite type (or in
other words, when does it start being illegal to access it as
a different type); the other is defining the exact aliasing
rules, so that they do depend on the compiler being able to see
aliases: as the standard is currently written, I think your
Program 2, version 1 has undefined behavior (although the intent
is clearly that it should work), and the following program:

#include <stdio.h>

int foo(int* p1, float* p2)
{
    int result = *p1;
    *p2 = 3.14159f;
    return result;
}

int main()
{
    /* p1 and p2 will point into the same union object */
    union { int i; float f; } u;
    u.i = 42;
    printf("%d\n", foo(&u.i, &u.f));
    return 0;
}

is clearly legal as written. It doesn't work with g++, however, and I
don't think it was the intent that it be well defined; there are
very good reasons for making it undefined. But I don't know
quite how to formulate this in standardese: intuitively, I'd say
that any time there is a reinterpret_cast or a union in
a function, the compiler must assume that pointers and
references to any of the involved types may be aliases, and in
all other cases, it may assume that pointers and references to
different types are not aliases, unless one of the types is char
or unsigned char. But that doesn't conform with the way the
standard is written.
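For contrast, a sketch of the two access patterns that stay within the current aliasing rules: copying an object representation with std::memcpy, and inspecting bytes through unsigned char, the explicit exception mentioned above. It assumes a 32-bit IEEE-754 float (checked at compile time), and float_bits and is_little_endian are hypothetical names. This is the usual workaround, not the rule change proposed above; GCC's documentation for -fstrict-aliasing describes punning through the union object itself as supported, but not through pointers to its members, which is what foo does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 float");

// Copy the object representation of a float into an integer. memcpy is not
// an access through an int lvalue, so no aliasing assumption is violated.
std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Reading bytes through unsigned char is allowed for objects of any type.
bool is_little_endian()
{
    const std::uint32_t one = 1;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&one);
    return bytes[0] == 1;
}
```

For example, float_bits(1.0f) yields 0x3F800000 on any IEEE-754 platform, whatever the optimization level.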

--
James Kanze

Similar Threads
Thread Thread Starter Forum Replies Last Post
reinterpret_cast<std::size_t>(p) and reinterpret_cast<std::size_t&>() Alex Vinokur C++ 1 02-06-2011 07:48 AM
const_cast, reinterpret_cast johny smith C++ 18 06-24-2004 09:53 PM
reinterpret_cast<> Aman C++ 15 02-25-2004 03:03 PM
reinterpret_cast<>() v. static_cast<>() Scott Brady Drummonds C++ 11 01-20-2004 09:12 PM
reinterpret_cast - to interpret double as long Suzanne Vogel C++ 17 07-07-2003 02:50 PM