Velocity Reviews


Nephi Immortal 02-28-2011 11:39 PM

Confirm reinterpret_cast if is safe?
 
I use reinterpret_cast to access the bytes of a 32-bit integer as 8-bit
integers. I use references instead of pointers to modify the value.
Please confirm whether reinterpret_cast is safe on Intel and AMD
machines.
If another machine has 9-bit bytes instead of 8-bit bytes, then I will
use an "#if" conditional macro to select bit shifts and bit masks instead.


typedef unsigned __int8 size_8;
typedef unsigned __int16 size_16;
typedef unsigned __int32 size_32;

int main () {
    size_32 dword = 0x123456U;

    size_8 &L = *reinterpret_cast< size_8* >( &dword );
    size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
    size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
    size_16 &W = *reinterpret_cast< size_16* >( &dword );

    ++L;
    ++H;
    ++B;

    L += 2;

    return 0;
}

Joshua Maurice 03-01-2011 12:33 AM

Re: Confirm reinterpret_cast if is safe?
 

This is broken according to the C++ standard. You are accessing an
__int32 object through an __int8 lvalue, and that is undefined behavior.
At least, it is UB if __int8 is not a typedef of char or unsigned char.

I don't know what various implementations will actually do with that code.

To fix it, at least the following is allowed by the standard: you can
use "char" or "unsigned char" lvalues to read any POD object.
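
A minimal sketch of that allowance, assuming a C++11 compiler that provides <cstdint> and a platform with 8-bit bytes. The helper names byte_at and is_little_endian are hypothetical and not from the original code. It inspects the bytes of a 32-bit value through unsigned char, which the aliasing rules permit, although the byte found at each offset still depends on the machine's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the byte stored at offset `index` inside `value`.
// Reading an object's storage through unsigned char is allowed by the
// aliasing rules; which byte sits at which offset depends on endianness.
inline unsigned char byte_at(const std::uint32_t& value, std::size_t index)
{
    assert(index < sizeof value);
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    return bytes[index];
}

// Hypothetical helper: true when the lowest-addressed byte holds the least
// significant bits of the value.
inline bool is_little_endian()
{
    const std::uint32_t probe = 1U;
    return byte_at(probe, 0) == 1;
}
```

On an x86 or x64 machine, byte_at(dword, 0) would be expected to return the low byte of dword; on a big-endian machine the same call returns the high byte.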

Nephi Immortal 03-01-2011 02:54 AM

Re: Confirm reinterpret_cast if is safe?
 

I think you meant an unrecognized keyword. Any C++ compiler other than
the Microsoft C++ compiler or the Intel C++ compiler will generate an
error message stating that __int8, __int16, and __int32 are undeclared.
I assume that you suggest:

typedef unsigned char size_8;
typedef unsigned short size_16;
typedef unsigned long size_32;

instead of

typedef unsigned __int8 size_8;
typedef unsigned __int16 size_16;
typedef unsigned __int32 size_32;

Can you guarantee that my source code works without any undefined
behavior on IA-32, IA-64, and x64 machines? Other machines
require different definitions.

For example

#if defined( __INTEL__ ) || defined( __AMD__ )
size_32 dword = 0x123456U;

size_8 &L = *reinterpret_cast< size_8* >( &dword );
size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
size_16 &W = *reinterpret_cast< size_16* >( &dword );
#else
size_32 dword = 0x123456U;

size_8 L = dword & 0xFFU;
size_8 H = ( dword >> 8 ) & 0xFFU;
size_8 B = ( dword >> 16 ) & 0xFFU;
size_16 W = dword & 0xFFFFU;
#end if

Joshua Maurice 03-01-2011 06:14 AM

Re: Confirm reinterpret_cast if is safe?
 

Why wouldn't you just use the second form? That would be a much
preferred way. Check the assembly yourself - but I would expect/hope
that it should be compiled down to the same thing with optimization.
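
A sketch of how the second form could also cover the modifications made in the original main(), assuming <cstdint> is available. The names get_byte, set_byte and increment_byte are hypothetical. Because the shifts act on the value rather than on memory, index 0 is always the least significant byte, whatever the byte order of the machine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; `index` 0 selects the least significant byte.
inline std::uint8_t get_byte(std::uint32_t value, unsigned index)
{
    assert(index < 4U);
    return static_cast<std::uint8_t>((value >> (8U * index)) & 0xFFU);
}

// Returns `value` with the selected byte replaced by `byte`.
inline std::uint32_t set_byte(std::uint32_t value, unsigned index, std::uint8_t byte)
{
    assert(index < 4U);
    const unsigned shift = 8U * index;
    const std::uint32_t mask = std::uint32_t(0xFFU) << shift;
    return (value & ~mask) | (std::uint32_t(byte) << shift);
}

// Increments one byte in place with wrap-around and no carry into the
// neighbouring byte, which is what ++L does on a size_8 reference.
inline void increment_byte(std::uint32_t& value, unsigned index)
{
    const std::uint8_t next = static_cast<std::uint8_t>(get_byte(value, index) + 1U);
    value = set_byte(value, index, next);
}
```

With these, the three increments from the original post become increment_byte(dword, 0), increment_byte(dword, 1) and increment_byte(dword, 2).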

m0shbear 03-01-2011 07:04 AM

Re: Confirm reinterpret_cast if is safe?
 
On Mar 1, 1:14 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> On Feb 28, 6:54 pm, Nephi Immortal <immortalne...@gmail.com> wrote:
> > On Feb 28, 6:33 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> > > On Feb 28, 3:39 pm, Nephi Immortal <immortalne...@gmail.com> wrote:
> > >
> > > This is broken by C++ standard. You are reading an __int32 object
> > > through a __int8 lvalue, and that is undefined behavior. At least, it
> > > is UB if __int8 is not a typedef of char nor unsigned char.
> > >
> > > I don't know what various implementations will actually do with that code.
> > >
> > > To fix it, at least the following is allowed by the standard: you can
> > > use "char" or "unsigned char" lvalues to read any POD object.
> >
> > I think you meant unrecognized keyword. Another C++ Compiler than
> > Microsoft C++ Compiler or Intel C++ Compiler will generate an error
> > message to state undeclared __int8, __int16, and __int32.
> > I assume that you suggest:
> >
> > typedef unsigned char size_8;
> > typedef unsigned short size_16;
> > typedef unsigned long size_32;
> >
> > instead of
> >
> > typedef unsigned __int8 size_8;
> > typedef unsigned __int16 size_16;
> > typedef unsigned __int32 size_32;
> >
> > Can you guarantee to be sure if my source code works without any
> > undefined behavior on IA-32, IA-64, and x64 machine? Other machines
> > require different definitions.
> >
> > For example
> >
> > #if defined( __INTEL__ ) || defined( __AMD__ )
> >     size_32 dword = 0x123456U;
> >
> >     size_8 &L = *reinterpret_cast< size_8* >( &dword );
> >     size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
> >     size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
> >     size_16 &W = *reinterpret_cast< size_16* >( &dword );
> > #else
> >     size_32 dword = 0x123456U;
> >
> >     size_8 L = dword & 0xFFU;
> >     size_8 H = ( dword >> 8 ) & 0xFFU;
> >     size_8 B = ( dword >> 16 ) & 0xFFU;
> >     size_16 W = dword & 0xFFFFU;

Possible typo: L, H, B, W are not lvalue references.

> > #end if

#endif. If your compiler doesn't spit out an error, something is very
wrong.
>
> Why wouldn't you just use the second form? That would be a much
> preferred way. Check the assembly yourself - but I would expect/hope
> that it should be compiled down to the same thing with optimization.


It's only the same on little-endian machines.
LE: |56|34|12|00|
(1) |L |H |B |  |
    |  W  |
(2) |L |H |B |  |
    |  W  |

On big-endian:
BE: |00|12|34|56|
(1) |L |H |B |  |
    |  W  |
(2) |  |B |H |L |
          |  W  |

Hence, (1) is, strictly speaking, undefined. Should you cast it to
char* and then do the pointer arithmetic, you should get better
results. I've only had to use reinterpret_cast<uintN_t*, for
N=16,...> when calling htobeN* with a pointer to _char_.
Hence, (2) is implementation-defined, specifically with respect to
byte ordering.

I suggest unions with packed structs if you want to be more explicit:
union u32_u8 {
    uint32_t DW;
    struct {
        union {
            struct {
                uint8_t L;
                uint8_t H;
            };
            uint16_t W;
        };
        uint8_t B;
    };
};

Then
u32_u8& example = *reinterpret_cast<u32_u8*>(&dword);
Or, better yet,
u32_u8 example2; example2.DW = dword;
// proceed as usual, using members of example2

Also,
extern "C" {
#include <stdint.h>
}
, if supported.
For microsoft, look up ms-inttypes and save the headers to your system
include directory.
Then you can use (u)intN_t instead of __intN.

C++0x should have <cstdint>

* BSD. See <endian.h>.

Remember, assumptions + reinterpret_cast = UB.
I've only had to use reinterpret_cast when serializing/deserializing
multibyte integers, e.g. for disk and cross-thread/process exception
passing via pipes, and when doing casts _which violate conversion
rules_, e.g. from void * to function pointer (this was an experiment
in using a std::map<std::string, const void*> to implement
runtime-based named parameters).
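
For the serialization case, here is a sketch that avoids reinterpret_cast entirely by building the bytes with shifts. The names store_be32 and load_be32 are hypothetical, and 8-bit bytes plus a C++11 <cstdint> are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Writes `value` into out[0..3] in big-endian (network) order. Only the
// value is used, so the host's own byte order does not matter.
inline void store_be32(std::uint32_t value, unsigned char* out)
{
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFU);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFU);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFU);
    out[3] = static_cast<unsigned char>(value & 0xFFU);
}

// Reassembles a 32-bit value from four big-endian bytes.
inline std::uint32_t load_be32(const unsigned char* in)
{
    assert(in != nullptr);
    return (std::uint32_t(in[0]) << 24) |
           (std::uint32_t(in[1]) << 16) |
           (std::uint32_t(in[2]) << 8) |
            std::uint32_t(in[3]);
}
```

The buffer can then be written to disk or a pipe as plain bytes, and the reader gets the same value back regardless of its own endianness.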

Goran 03-01-2011 07:49 AM

Re: Confirm reinterpret_cast if is safe?
 

Looking at your variable names ("L", "H"), you seem to presume a
little-endian machine, which is the case for x86 and x64 (I don't think
that the Intel/AMD distinction matters). That means that your code isn't
doing what you think it does on a big-endian machine.

On the other hand, I don't think that 9-bit bytes are a practical
consideration.

I agree with Joshua: from what you've shown, masks and shifts seem to
be a better approach (endianness is handled for you). If, on the other
hand, you actually know the binary layout inside your "dword", I would
say that the best approach is to write a compiler-specific POD union
and use that, e.g.

compiler-specific-pack-to-1-directive-here
struct as_chars { char c[4]; };
union data
{
    as_chars chars;
    size_32 dword;
};
end_compiler-specific-pack-to-1-directive

Goran.

SG 03-01-2011 08:10 AM

Re: Confirm reinterpret_cast if is safe?
 
On 1 Mar., 00:39, Nephi Immortal wrote:
> I use reinterpret_cast to convert from 32 bits integer into 8 bits
> integer. I use reference instead of pointer to modify value. Please
> confirm if reinterpret_cast is safe on either Intel machine or AMD
> machine.
> If another machine has 9 bits instead of 8 bits, then I use “if
> condition” macro to use bit shift and bit mask instead.


I guess that means you *do* want the program to compile on possibly
obscure hardware/compilers.

> typedef unsigned __int8 size_8;
> typedef unsigned __int16 size_16;
> typedef unsigned __int32 size_32;


But then why would you use non-standard types like __int16 (etc)?
What is your goal exactly?

> int main () {
>     size_32 dword = 0x123456U;
>
>     size_8 &L = *reinterpret_cast< size_8* >( &dword );
>     size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
>     size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
>     size_16 &W = *reinterpret_cast< size_16* >( &dword );
>
>     ++L;
>     ++H;
>     ++B;
>
>     L += 2;
>
>     return 0;
> }


Regardless of what meaning you attach to "safety", this is rather
unsafe. Apart from the non-standard types __int8, __int16 etc. and a
possibly differing "endianness" that you have to account for if you are
interested in portability, you also violate §3.10/15, which results
in undefined behaviour.

Here's a thought: try to write it as portably as you can (using the
bit operations on unsigned ints). It's totally acceptable IMHO to
assume CHAR_BIT==8 nowadays. So,...

#include <climits>
#if CHAR_BIT != 8
#error "sorry, this is all too weird for me"
#endif

Then you can use constant expressions for shifts (8,16,24) and masks
(0xFF, 0xFFFF) which gives a good compiler enough opportunities to
optimize the code.

reinterpret_casts have their use. But using them is hardly portable.
Also violating §3.10/15 ("strict aliasing") comes with risks.
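
One way to stay within §3.10/15 when a narrower value really must be taken from an object's storage is to copy the bytes with std::memcpy instead of dereferencing a cast pointer. A minimal sketch, assuming a C++11 <cstdint>; read_u16 is a hypothetical helper, and the value it returns still depends on the byte order of the machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a 16-bit value from the first bytes of any
// object or buffer. `available` is the number of readable bytes at `source`.
// memcpy copies the object representation, so no differently typed lvalue
// is ever used to access the original object.
inline std::uint16_t read_u16(const void* source, std::size_t available)
{
    assert(source != nullptr && available >= sizeof(std::uint16_t));
    std::uint16_t word;
    std::memcpy(&word, source, sizeof word);
    return word;
}
```

For dword = 0x00123456, read_u16(&dword, sizeof dword) would be expected to give 0x3456 on a little-endian machine and 0x0012 on a big-endian one. Unlike the W reference in the original code, the result is a copy, so writing to it does not change dword.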

Cheers!
SG

James Kanze 03-01-2011 09:29 AM

Re: Confirm reinterpret_cast if is safe?
 
On Mar 1, 12:33 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> On Feb 28, 3:39 pm, Nephi Immortal <immortalne...@gmail.com> wrote:


> > I use reinterpret_cast to convert from 32 bits integer into 8 bits
> > integer. I use reference instead of pointer to modify value.
> > Please confirm if reinterpret_cast is safe on either Intel machine
> > or AMD machine.


> > If another machine has 9 bits instead of 8 bits, then I use “if
> > condition” macro to use bit shift and bit mask instead.


> > typedef unsigned __int8 size_8;
> > typedef unsigned __int16 size_16;
> > typedef unsigned __int32 size_32;


Not sure what __int8, etc. are, although I can guess. Why not use the
standard uint8_t, etc.?

> > int main () {
> > size_32 dword = 0x123456U;


> > size_8 &L = *reinterpret_cast< size_8* >( &dword );
> > size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
> > size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
> > size_16 &W = *reinterpret_cast< size_16* >( &dword );


> > ++L;
> > ++H;
> > ++B;


> > L += 2;


> > return 0;
> > }


> This is broken by C++ standard. You are reading an __in32 object
> through a __int8 lvalue, and that is undefined behavior.


> At least, it
> is UB if __int8 is not a typedef of char nor unsigned char.


> I don't know what various implements will actually do with that code.


> To fix it, at least the following is allowed by the standard: you can
> use "char" or "unsigned char" lvalues to read any POD object.


In practice, however, reinterpret_cast becomes useless if this doesn't
work. The intent of the standard, here, is rather clear: the behavior
is undefined in the standard (since there's no way it could be defined
portably), but it is expected that the implementation define it when
reasonable. But there are a lot of gray areas around this. There is
also a very definite intent that the compiler can assume no aliasing
between two pointers to different types, provided that neither type is
char or unsigned char.

From a QoI point of view, on "normal" machines, I would expect to be
able to access, and even modify the bit patterns, through a pointer to
any integral type, provided alignment restrictions are respected, and
all of the reinterpret_cast are in the same function, so the compiler
can see them, and take the additional aliasing into account (or
alternatively, a compiler option is used to turn off all optimization
based on aliasing analysis).

As to what happens to the object whose bit pattern was actually
accessed... that's very architecture dependent, but if you know the
architecture, and the actual types are all more or less basic types,
you can play games. When implementing the C library, I accessed a
double through an unsigned short* several times, including modifying it
(e.g. in ldexp). It's certainly not portable, and it's not the sort of
thing to be used everywhere, but there are a few specific cases in very
low level code where it is necessary. (FWIW: his code will give
different results---supposing he output dword---on an Intel/AMD and on
most other platforms.)

--
James Kanze

Joshua Maurice 03-01-2011 10:25 AM

Re: Confirm reinterpret_cast if is safe?
 

Perhaps, but the gcc team and the gcc compiler clearly disagree with
your interpretation of the strict aliasing rules, and the C standards
committee /seems/ to be leaning towards disagreeing with you.
(Disagrees for the C programming language mind you, though I would
argue that C++ ought to adopt whatever reasonable resolutions that the
C committee comes to on the union DR and related issues.) See
http://www.open-std.org/jtc1/sc22/wg...ocs/dr_236.htm
and the links for associated meeting minutes for the discussion of the
union DR. The C standards committee /seems/ to be leaning towards a
very naive aliasing rule, though of course I guess we'll have to wait
and see until they publish something definitive. Or, of course, you
could go ask them for us as you actually know them in person (maybe?),
and perhaps you could get them to answer the few other pesky issues I
have about how a general purpose portable conforming pooling memory
allocator on top of malloc is supposed to work, or not work. It would
be nice.

To quote you: "In practice, however, reinterpret_cast becomes useless
if this doesn't work." I think that reinterpret_cast is largely
useless (except for converting between pointer types and integer
types), especially when compiled under the default gcc options. AFAIK,
reinterpret_cast exists for platform dependent hackery (which I've
had the pleasure of never needing), for converting between pointer
types and integer types, for converting pointer types to char pointer
and unsigned char pointer, and not much else.

I stand by my original point that you really ought not read objects
through incorrectly typed lvalues, unless that incorrectly typed
lvalue is char or unsigned char, if you can at all help it, for
maximum portability and conformance.

James Kanze 03-01-2011 03:18 PM

Re: Confirm reinterpret_cast if is safe?
 
On Mar 1, 10:25 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> On Mar 1, 1:29 am, James Kanze <james.ka...@gmail.com> wrote:
> > On Mar 1, 12:33 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:


> > > On Feb 28, 3:39 pm, Nephi Immortal <immortalne...@gmail.com> wrote:
> > > > I use reinterpret_cast to convert from 32 bits integer into 8 bits
> > > > integer. I use reference instead of pointer to modify value.
> > > > Please confirm if reinterpret_cast is safe on either Intel machine
> > > > or AMD machine.
> > > > If another machine has 9 bits instead of 8 bits, then I use “if
> > > > condition” macro to use bit shift and bit mask instead.
> > > > typedef unsigned __int8 size_8;
> > > > typedef unsigned __int16 size_16;
> > > > typedef unsigned __int32 size_32;


> > Not sure what __int8, etc. are, although I can guess. Why not use the
> > standard uint8_t, etc.?


> > > > int main () {
> > > > size_32 dword = 0x123456U;
> > > > size_8 &L = *reinterpret_cast< size_8* >( &dword );
> > > > size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
> > > > size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
> > > > size_16 &W = *reinterpret_cast< size_16* >( &dword );
> > > > ++L;
> > > > ++H;
> > > > ++B;
> > > > L += 2;
> > > > return 0;
> > > > }
> > > This is broken by C++ standard. You are reading an __in32 object
> > > through a __int8 lvalue, and that is undefined behavior.
> > > At least, it
> > > is UB if __int8 is not a typedef of char nor unsigned char.
> > > I don't know what various implements will actually do with that code.
> > > To fix it, at least the following is allowed by the standard: you can
> > > use "char" or "unsigned char" lvalues to read any POD object.


> > In practice, however, reinterpret_cast becomes useless if
> > this doesn't work. The intent of the standard, here, is
> > rather clear: the behavior is undefined in the standard
> > (since there's no way it could be defined portably), but it
> > is expected that the implementation define it when
> > reasonable. But there is a lot of gray areas around this.
> > There is also a very definite intent that the compiler can
> > assume no aliasing between two pointers to different types,
> > provided that neither type is char or unsigned char. From
> > a QoI point of view, on "normal" machines, I would expect to
> > be able to access, and even modify the bit patterns, through
> > a pointer to any integral type, provided alignment
> > restrictions are respected, and all of the reinterpret_cast
> > are in the same function, so the compiler can see them, and
> > take the additional aliasing into account (or alternatively,
> > a compiler option is used to turn off all optimization based
> > on aliasing analysis). As to what happens to the object
> > whose bit pattern was actually accessed... that's very
> > architecture dependent, but if you know the architecture,
> > and the actual types are all more or less basic types, you
> > can play games. When implementing the C library, I accessed
> > a double through an unsigned short* several types, including
> > modifying it (e.g. in ldexp). It's certainly not portable,
> > and it's not the sort of thing to be used everywhere, but
> > there are a few specific cases in very low level code where
> > it is necessary. (FWIW: his code will give different
> > results---supposing he output dword---on an Intel/AMD and on
> > most other platforms.)


> Perhaps, but the gcc team and the gcc compiler clearly disagree with
> your interpretation of the strict aliasing rules, and the C standards
> committee /seems/ to be leaning towards disagreeing with you.
> (Disagrees for the C programming language mind you, though I would
> argue that C++ ought to adopt whatever reasonable resolutions that the
> C committee comes to on the union DR and related issues.) See
> http://www.open-std.org/jtc1/sc22/wg...ocs/dr_236.htm


The problem is not simple, and the current wording in both the
C and the C++ standard definitely guarantees some behavior that
I don't think the committee wanted to guarantee (and which
doesn't work in g++, and probably in other compilers as well).
Fundamentally, there are two issues which have to be addressed:

1. Some sort of type punning is necessary in low level code.
   Historically, K&R favored unions for this (rather than
   pointer casting). For various reasons, the C committee,
   when formulating C90, moved in the direction of favoring
   pointer casts. All strictly as "intent", since there's
   nothing the standard can define with regards to modifying
   some bits in a double through a pointer to an integral type
   (for example). It's been a long time since I've been
   involved in C standardization, so I don't know if the
   committee has moved again. Both pointer casting and unions
   are widespread in C code, and from a QoI point of view,
   I would expect both to work, with the caveats discussed
   below. Anything else shows disdain for the user community.

2. Possible aliasing kills optimization. This is the
   motivation behind restrict, and before that noalias; the
   programmer declares that there will be no aliasing, and pays
   the price if he makes a mistake. The rules also allow the
   compiler to assume that pointers to different types (with
   a number of exceptions) do not alias. Except when they do.
   My own opinion here is that type punning using pointer casts
   discussed in 1. should only hold when the aliasing mechanism
   is visible in the function where the aliasing occurs. (I'm
   not sure how to formulate this in standardese, but I think
   this is what the C committee is trying to do.)

> and the links for associated meeting minutes for the discussion of the
> union DR. The C standards committee /seems/ to be leaning towards a
> very naive aliasing rule, though of course I guess we'll have to wait
> and see until they publish something definitive.


I haven't followed it at all closely, but the last time
I looked, the tendency was to favor a rule which made the
aliasing clear; accessing members of a union was fine (as long
as the member read was the last member written), but only
provided all of the accesses were through the union members.
With regards to pointer casts, it's harder to specify, but in
the end, they don't have to; they only have to make the intent
sort of clear. Formally, accessing any union member except the
last written, or accessing an object except as an unsigned char
or its actual type is undefined behavior. And will remain so,
since it is impossible to define anything which could be valid
on all platforms. This means that formally, all type punning is
undefined behavior.

> Or, of course, you
> could go ask them for us as you actually know them in person (maybe?),
> and perhaps you could get them to answer the few other pesky issues I
> have about how a general purpose portable conforming pooling memory
> allocator on top of malloc is supposed to work, or not work. It would
> be nice.


If you ask on comp.std.c++, you'll get some feedback. Or
perhaps comp.std.c---I haven't looked there in ages, but IMHO,
this is a problem that C should resolve, and C++ should simply
accept the decision. There's absolutely no reason for the
languages to have different rules here.

> To quote you: "In practice, however, reinterpret_cast becomes useless
> if this doesn't work." I think that reinterpret_cast is largely
> useless, (except for converting between pointer types and integer
> types), especially when compiled under the default gcc options. AFAIK,
> reinterpret_cast exists for platform dependent hackery (of which I've
> had the pleasure to never need to hack), to convert between pointers
> types and integer types, and convert pointer types to char pointer and
> unsigned char pointer, and not much else.


The purpose of reinterpret_cast is to support very low level,
platform dependent hackery, where necessary. I've used the equivalent
(in C) when implementing the standard C library, on one hand in
the implementation of malloc and free, and on the other in some
of the low level math functions like ldexp. I tend to avoid it
otherwise.
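
As an illustration of the kind of low level math code mentioned above, here is a sketch that extracts the binary exponent of a double by copying its bits with std::memcpy rather than reading them through a cast pointer. It is not the library code described in the post. The name binary_exponent is hypothetical, C++11 is assumed, and the IEEE-754 binary64 layout (with the same byte order for double and 64-bit integers) is an assumption that the static_asserts only partly check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Returns the unbiased binary exponent of a normal, finite double,
// assuming IEEE-754 binary64: 1 sign bit, 11 exponent bits, 52 fraction bits.
inline int binary_exponent(double x)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    static_assert(std::numeric_limits<double>::is_iec559, "expects IEEE-754 doubles");

    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy the representation, no aliasing

    const int biased = static_cast<int>((bits >> 52) & 0x7FFU);
    // Zero, subnormals, infinities and NaNs are outside the scope of this sketch.
    assert(biased != 0 && biased != 0x7FF);
    return biased - 1023;
}
```

This is exactly the sort of code that belongs in one small, clearly machine-dependent corner of a program.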

> I stand by my original point that you really ought not read objects
> through incorrectly typed lvalues, unless that incorrectly typed
> lvalue is char or unsigned char, if you can at all help it, for
> maximum portability and conformance.


As soon as you need reinterpret_cast, portability goes out the
window. It's strictly for experts, and only for very machine
dependent code, at the very lowest level.

--
James Kanze

