Velocity Reviews


ma740988 01-10-2007 06:48 PM

endian conversion - composite type
 


Data stored on a storage device is byte swapped: the data is big
endian and my PC is little endian. At issue: there's a composite type (a
header) at the front of the files that I'm trying to read in. I'm
trying to _simulate_ the endian conversion in the code below, but I'm
wondering whether there's a better way to do this than what's shown.
Padding produces some interesting results: notice how the parameter d
is different in the printouts. Serializing the data is, at
present, not an option.
An aside: Matlab is my primary analysis tool. With Matlab I can pass
a byte-order parameter to the fopen call and all's well. I'm trying to write
code that does something similar. Thanks in advance.

#include <cstddef>
#include <cstdio>
#include <iostream>
#include <utility>   // std::swap

// Reverse the order of the n bytes starting at b, in place.
void ByteSwap( unsigned char * b, std::size_t n )
{
    if ( n < 2 )
        return;
    std::size_t i = 0;
    std::size_t j = n - 1;
    while ( i < j )
    {
        std::swap( b[ i ], b[ j ] );
        ++i, --j;
    }
}


struct foo { // let's try a simple struct
    short a; // works
    short b; // works
    unsigned d ; // introduced padding
    //char test [ 5 ] ; // swap these
    //double dd ;
    //float ar ;
};


// Dump the raw bytes of a foo as hex.
void showBytes( const foo *barp )
{
    const unsigned char *cp = reinterpret_cast<const unsigned char *>( barp );

    for ( std::size_t i = 0; i < sizeof( *barp ); ++i ) {
        std::printf( "0x%02X ", static_cast<unsigned int>( cp[ i ] ) );
    }
    std::printf( "\n" );
}

// Print the field values of a foo.
void showFields( const foo& bar )
{
    std::cout << bar.a << '\n'
              << bar.b << '\n'
              << bar.d << std::endl;
}

int main()
{
    foo bar = { 0x0102, 0x0304, 0x2030 };

    showBytes( &bar );
    ByteSwap( reinterpret_cast<unsigned char *>( &bar.a ), sizeof( bar.a ) );
    ByteSwap( reinterpret_cast<unsigned char *>( &bar.b ), sizeof( bar.b ) );
    ByteSwap( reinterpret_cast<unsigned char *>( &bar.d ), sizeof( bar.d ) );

    //showFields( bar );
    showBytes( &bar );

    return 0;
}
/* Output (little-endian PC, 2-byte short, 4-byte unsigned):
0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00
0x01 0x02 0x03 0x04 0x00 0x00 0x20 0x30
*/


Julián Albo 01-10-2007 07:25 PM

Re: endian conversion - composite type
 
ma740988 wrote:

> Data stored on a storage device is byte swapped. The data is big
> endian and my PC is little. At issue: There's a composite type ( a
> header ) at the front of the files that I'm trying to read in. I'm
> trying to _simulate_ the endian conversion in code below but I'm just
> wondering if there's an ideal way to do this besides what's shown?


The best way to read binary files is to use an unsigned char buffer and
convert from that buffer to the structure you use in the program for that
data. The conversion can be as complex as your portability goals require,
taking into account endianness, the signed-integer encoding used, and so on.

It's a bit more code to write at first, but it avoids the need to worry about
padding and many other issues.
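A minimal sketch of the buffer-then-convert approach, assuming the file header holds the three fields from the original post (a, b and d) as big-endian 2-, 2- and 4-byte integers at offsets 0, 2 and 4, with no padding. `Header`, `kHeaderSize`, `load_be16`, `load_be32`, `decode_header` and `read_header` are hypothetical names. Because the values are assembled with shifts, the same code should work on big- and little-endian hosts, and the in-memory struct's own padding never matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>

// Hypothetical in-memory form of the header (fields as in the original post).
struct Header {
    std::int16_t  a;
    std::int16_t  b;
    std::uint32_t d;
};

// Assumed file layout: a@0, b@2, d@4, big endian, no padding.
const std::size_t kHeaderSize = 8;

// Assemble big-endian bytes with shifts; independent of the host's byte order.
std::uint16_t load_be16( const unsigned char *p )
{
    return static_cast<std::uint16_t>( ( p[ 0 ] << 8 ) | p[ 1 ] );
}

std::uint32_t load_be32( const unsigned char *p )
{
    return ( static_cast<std::uint32_t>( p[ 0 ] ) << 24 )
         | ( static_cast<std::uint32_t>( p[ 1 ] ) << 16 )
         | ( static_cast<std::uint32_t>( p[ 2 ] ) << 8 )
         |   static_cast<std::uint32_t>( p[ 3 ] );
}

// Convert the raw file bytes into the program's own structure.
Header decode_header( const unsigned char *buf, std::size_t size )
{
    assert( size >= kHeaderSize );  // caller must supply a whole header
    Header h;
    // uint16_t -> int16_t for values above 0x7FFF is implementation-defined
    // before C++20; common compilers wrap (two's complement).
    h.a = static_cast<std::int16_t>( load_be16( buf + 0 ) );
    h.b = static_cast<std::int16_t>( load_be16( buf + 2 ) );
    h.d = load_be32( buf + 4 );
    return h;
}

// Read the raw bytes first, then convert; returns false on a short read.
bool read_header( std::istream &in, Header &out )
{
    unsigned char buf[ kHeaderSize ];
    if ( !in.read( reinterpret_cast<char *>( buf ),
                   static_cast<std::streamsize>( kHeaderSize ) ) )
        return false;
    out = decode_header( buf, kHeaderSize );
    return true;
}
```

Fed the bytes a big-endian writer would produce for the original initializer (`0x01 0x02 0x03 0x04 0x00 0x00 0x20 0x30`), `decode_header` yields a = 0x0102, b = 0x0304 and d = 0x2030 on either kind of host.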

--
Salu2

Robert Mabee 01-10-2007 09:32 PM

Re: endian conversion - composite type
 
Julián Albo wrote:
> The best way to read binary files is to use an unsigned char buffer and
> convert from this buffer to the structure you use in the program for that
> data. You make the conversion as complex as your goal of portability are,
> considering endianess, type of sign enconding used...
>
> A bit more code to write at first, but avoids the need to worry about
> padding and many other issues.


To clarify, the converting code needs to worry about padding inserted in
the byte stream because the source wrote entire structs.

I suggest making it look like a stream filter reading chars from an
underlying stream so you won't ever deal with the buffer and boundary
conditions. Each function to read a particular type needs to a) skip
padding bytes that the source would have inserted to align that type;
b) read and assemble the bytes of the object; c) perhaps do something
really hard for floating-point data using a different representation,
or for bitfield data; d) pick up the value as the correct type and
return it. Sometimes you'll find shortcuts, as when 32 bit data only
needs 16 bit alignment so can be fetched by two calls to the 16 bit
fetcher.
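A minimal sketch of the stream-filter idea, assuming a big-endian writer whose compiler aligned every 2- and 4-byte integer to its own size. `BigEndianReader` and its methods are hypothetical names; floating-point and bit-field members (step c above) are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>

// Wraps an istream and tracks the byte offset, so each typed read can first
// skip the padding the writer's compiler would have inserted.
class BigEndianReader {
public:
    explicit BigEndianReader( std::istream &in ) : in_( in ), offset_( 0 ) {}

    // Skip pad bytes until the offset is a multiple of 'alignment'.
    void align( std::size_t alignment )
    {
        assert( alignment > 0 );
        while ( offset_ % alignment != 0 )
            byte();
    }

    std::uint8_t read_u8() { return byte(); }

    std::uint16_t read_u16()
    {
        align( 2 );                       // a) skip padding
        const std::uint16_t hi = byte();  // b) assemble, high byte first
        const std::uint16_t lo = byte();
        return static_cast<std::uint16_t>( ( hi << 8 ) | lo );
    }

    std::uint32_t read_u32()
    {
        align( 4 );
        std::uint32_t v = 0;
        for ( int i = 0; i < 4; ++i )
            v = ( v << 8 ) | byte();
        return v;
    }

    std::size_t offset() const { return offset_; }

private:
    // Fetch one byte and count it; throws at end of input.
    std::uint8_t byte()
    {
        const std::istream::int_type c = in_.get();
        if ( c == std::istream::traits_type::eof() )
            throw std::runtime_error( "unexpected end of input" );
        ++offset_;
        return static_cast<std::uint8_t>( c );
    }

    std::istream &in_;
    std::size_t offset_;
};
```

Under that assumed alignment, a record written as `{ unsigned char flag; unsigned int count; }` would be read with `read_u8()` followed by `read_u32()`; the three pad bytes in between are skipped by `align( 4 )`. A `std::istringstream` can stand in for the file when testing.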

I would add separate functions to mark the beginning and end of each
struct as there is additional padding there not related to the type of
the next member. This will require you to analyze the struct so you
can pass in the alignment the source machine will have assumed for the
struct as a whole. At least you won't have to make every single pad
explicit.

Once, when faced with too much foreign data, I wrote functions to take
a dense character string description of a struct like "ssslccl" and
convert to and from the foreign form, knowing the padding requirements
of both forms.
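A sketch of a descriptor-driven converter in the spirit of the "ssslccl" approach, also applying the end-of-struct padding described above. It assumes a hypothetical big-endian source machine where `c`, `s` and `l` are 1, 2 and 4 bytes, each aligned to its own size, and the struct is aligned to its largest member. `round_up`, `source_size` and `decode_record` are hypothetical names; values come back as raw unsigned integers, so sign handling is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Round x up to the next multiple of a.
std::size_t round_up( std::size_t x, std::size_t a )
{
    assert( a > 0 );
    return ( x + a - 1 ) / a * a;
}

// Size of each type code on the assumed source machine; alignment == size.
std::size_t source_size( char code )
{
    switch ( code ) {
    case 'c': return 1;   // char
    case 's': return 2;   // short
    case 'l': return 4;   // long (4 bytes on the assumed source)
    default:
        throw std::invalid_argument( std::string( "unknown type code: " ) + code );
    }
}

// Decode one big-endian record described by desc (e.g. "ssslccl").
// Skips the padding the source compiler would have put before each member
// and after the last one; *consumed receives the record's total size so the
// caller can advance to the next record.
std::vector<std::uint32_t> decode_record( const std::string &desc,
                                          const unsigned char *rec,
                                          std::size_t rec_size,
                                          std::size_t *consumed = nullptr )
{
    std::vector<std::uint32_t> values;
    std::size_t offset = 0;
    std::size_t struct_align = 1;
    for ( char code : desc ) {
        const std::size_t n = source_size( code );
        offset = round_up( offset, n );            // padding before member
        if ( n > struct_align )
            struct_align = n;
        if ( offset + n > rec_size )
            throw std::out_of_range( "record too short" );
        std::uint32_t v = 0;
        for ( std::size_t i = 0; i < n; ++i )
            v = ( v << 8 ) | rec[ offset + i ];    // most significant byte first
        values.push_back( v );
        offset += n;
    }
    // Trailing padding: the struct's size is a multiple of its alignment.
    const std::size_t total = round_up( offset, struct_align );
    if ( consumed )
        *consumed = total;
    return values;
}
```

With these rules, "ssslccl" places the members at offsets 0, 2, 4, 8, 12, 13 and 16, for a 20-byte record; "lc" occupies 5 bytes of data but 8 bytes in total because of the trailing padding.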

I consider this a defect in the language. I should be able to declare
the interface properties of the struct (padding, byte order, FP format)
in a standard way and let the compiler choose to implement it or reject
it or maybe half-implement it so special functions could be applied to
the members that can't be accessed normally. We do it anyway for device
drivers with memory-mapped I/O and for MMU structures, but fighting the
compiler every step of the way.

bjeremy 01-10-2007 09:49 PM

Re: endian conversion - composite type
 

ma740988 wrote:
> Data stored on a storage device is byte swapped. The data is big
> endian and my PC is little. At issue: There's a composite type ( a
> header ) at the front of the files that I'm trying to read in. I'm
> trying to _simulate_ the endian conversion in code below but I'm just
> wondering if there's an ideal way to do this besides what's shown?


Why can't you just call ntohs/ntohl once you've read the data off your
storage device? Since your PC is little endian, ntohs/ntohl won't be
no-ops; they will swap the bytes for you. The only problem you may
encounter is if your composite header packs data into nibbles; each
nibble field would need to be handled manually before you recompose
your header.
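A sketch of the ntohs/ntohl route, assuming the same 8-byte big-endian layout as the original header (a@0, b@2, d@4). Copying each field out with `std::memcpy` avoids overlaying a struct on the raw bytes, so the native struct's padding doesn't matter. `Header` and `decode_with_ntoh` are hypothetical names. `<arpa/inet.h>` is the POSIX header for these functions (on Windows they come from `<winsock2.h>`), and they only cover 16- and 32-bit integers.

```cpp
#include <arpa/inet.h>   // POSIX: ntohs, ntohl
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical in-memory header, fields as in the original post.
struct Header {
    std::uint16_t a;
    std::uint16_t b;
    std::uint32_t d;
};

// Copy each big-endian field out of the raw bytes, then let ntohs/ntohl
// convert from network (big-endian) order to host order. On a little-endian
// PC they swap the bytes; on a big-endian host they do nothing.
Header decode_with_ntoh( const unsigned char *raw, std::size_t size )
{
    assert( size >= 8 );  // assumed layout: a@0, b@2, d@4
    std::uint16_t a, b;
    std::uint32_t d;
    std::memcpy( &a, raw + 0, sizeof a );
    std::memcpy( &b, raw + 2, sizeof b );
    std::memcpy( &d, raw + 4, sizeof d );

    Header h;
    h.a = ntohs( a );
    h.b = ntohs( b );
    h.d = ntohl( d );
    return h;
}
```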


Julián Albo 01-10-2007 10:37 PM

Re: endian conversion - composite type
 
Robert Mabee wrote:

>> The best way to read binary files is to use an unsigned char buffer and
>> convert from this buffer to the structure you use in the program for that
>> data. You make the conversion as complex as your goal of portability are,
>> considering endianess, type of sign enconding used...
>>
>> A bit more code to write at first, but avoids the need to worry about
>> padding and many other issues.

>
> To clarify, the converting code needs to worry about padding inserted in
> the byte stream because the source wrote entire structs.


From the reader's point of view this is unimportant. The padding from the
writer's compiler can be treated the same as a FILLER in COBOL: just part of
the organization of the file.

> I suggest making it look like a stream filter reading chars from an
> underlying stream so you won't ever deal with the buffer and boundary
> conditions. Each function to read a particular type needs to a) skip
> padding bytes that the source would have inserted to align that type;


It's doable, but the padding conditions may be difficult to evaluate.

> c) perhaps do something really hard for floating-point data using a
> different representation, or for bitfield data;


Yes, that's why I said that more or less effort will be needed
depending on the portability goal.

> Once, when faced with too much foreign data, I wrote functions to take
> a dense character string description of a struct like "ssslccl" and
> convert to and from the foreign form, knowing the padding requirements
> of both forms.


Some time ago I wrote a program that takes a description of a record and
displays the contents of a file according to it. The same can be done
inside a program, or by a program that generates code to be used in the
program that deals with the data.

> I consider this a defect in the language. I should be able to declare
> the interface properties of the struct (padding, byte order, FP format)
> in a standard way and let the compiler choose to implement it or reject
> it or maybe half-implement it so special functions could be applied to
> the members that can't be accessed normally.


There is no need to make something that is perfectly doable without direct
language support part of the language. This is a general design principle of C++.

--
Salu2

ma740988 01-11-2007 01:43 AM

Re: endian conversion - composite type
 

Julián Albo wrote:
> ma740988 wrote:
>
> > Data stored on a storage device is byte swapped. The data is big
> > endian and my PC is little. At issue: There's a composite type ( a
> > header ) at the front of the files that I'm trying to read in. I'm
> > trying to _simulate_ the endian conversion in code below but I'm just
> > wondering if there's an ideal way to do this besides what's shown?

>
> The best way to read binary files is to use an unsigned char buffer and
> convert from this buffer to the structure you use in the program for that
> data. You make the conversion as complex as your goal of portability are,
> considering endianess, type of sign enconding used...


Do you know of/have an example of this anywhere I could peruse?


Julián Albo 01-11-2007 06:51 AM

Re: endian conversion - composite type
 
ma740988 wrote:

>> The best way to read binary files is to use an unsigned char buffer and
>> convert from this buffer to the structure you use in the program for that
>> data. You make the conversion as complex as your goal of portability are,
>> considering endianess, type of sign enconding used...

> Do you know of/have an example of this anywhere I could peruse?


I posted some sample code in this group a while ago; you can try to find it
in Google Groups.

--
Salu2

Grizlyk 01-14-2007 06:02 AM

Re: endian conversion - composite type
 
ma740988 wrote:

> Data stored on a storage device is byte swapped. The data is big
> endian and my PC is little.


> foo bar = {0x0102, 0x0304, 0x2030 };
>
> 0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00


Is that a memory dump? Are you sure "0x30 0x20 0x00 0x00" is little
endian?

0x2030 == 0x00002030, which is not the same as 0x20300000.

"0x30 0x20", the low 16-bit big-endian word, was placed before "0x00 0x00",
the high 16-bit big-endian word.
It looks like mixed endian (Google says: middle-endian, or PDP-endian). In
that case you cannot swap the bytes the same way as the words.

for 0x50607080

big endian is:
word: low byte , high byte
dword: low word, high word

" 0x80, 0x70, 0x60, 0x50 "

little endian must have been:
word: high byte, low byte
dword: high word, low word

" 0x00, 0x00, 0x20, 0x30 "

Use:
#include <netinet/in.h>
htons(), htonl(), ntohs(), ntohl() - POSIX functions.


Grizlyk 01-14-2007 06:32 AM

Re: endian conversion - composite type
 
Grizlyk wrote:

Oops, sorry, I see I've mixed everything up in my poor head with the huge
number of "endians" flying around.

I swapped which "endian" was your PC's and which was your data's, and at
the same time swapped the names "little-endian" and "big-endian" for the
byte order.

> ma740988 wrote:
>
> > Data stored on a storage device is byte swapped. The data is big
> > endian and my PC is little.

>
> > foo bar = {0x0102, 0x0304, 0x2030 };
> >
> > 0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00

>
> Is it memory dump? Are you shure "0x30 0x20 0x00 0x00 " is little
> endian?


Yes, it is correct little endian data on little endian PC.

> "0x30 0x20" - low 16 bit big-endian word was placed befor "0x00 0x00" -
> high 16 bit big-endian word


No, "0x30 0x20", the low 16-bit little-endian word, was placed before
"0x00 0x00", the high 16-bit little-endian word; that is the correct
placement for a little-endian 32-bit dword.

> It looks like mixed endian


No, this is wrong

> for 0x50607080
>
> big endian is:
> word: low byte , high byte
> dword: low word, high word
>
> " 0x80, 0x70, 0x60, 0x50 "


No, this is little endian

> little endian must have been:
> word: high byte, low byte
> dword: high word, low word
>
> " 0x00, 0x00, 0x20, 0x30 "
>


" 0x50, 0x60, 0x70, 0x80 "
No, this is big endian

It seems to me that this assignment of the "endians" is more correct. Or not?
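To make the byte orders argued over above concrete, here is a small sketch that builds both layouts of 0x50607080 with shifts (so the result doesn't depend on the host) and checks which one the host itself uses. `store_be32`, `store_le32` and `host_is_little_endian` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Store v most significant byte first (big-endian order).
void store_be32( std::uint32_t v, unsigned char out[ 4 ] )
{
    for ( int i = 3; i >= 0; --i ) {
        out[ i ] = static_cast<unsigned char>( v & 0xFFu );
        v >>= 8;
    }
}

// Store v least significant byte first (little-endian order).
void store_le32( std::uint32_t v, unsigned char out[ 4 ] )
{
    for ( int i = 0; i < 4; ++i ) {
        out[ i ] = static_cast<unsigned char>( v & 0xFFu );
        v >>= 8;
    }
}

// True if this machine keeps the least significant byte of an integer at
// the lowest address, as x86 PCs do.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy( &first, &probe, 1 );
    return first == 1;
}
```

`store_be32( 0x50607080, ... )` gives `0x50 0x60 0x70 0x80` and `store_le32` gives `0x80 0x70 0x60 0x50`. Likewise, `store_le32( 0x2030, ... )` gives `0x30 0x20 0x00 0x00`, the same bytes as the first dump in the original post, which is consistent with the correction above.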



All times are GMT.