endian conversion - composite type
Data stored on a storage device is byte swapped: the data is big endian and my PC is little endian. At issue: there's a composite type (a header) at the front of the files that I'm trying to read in. I'm trying to _simulate_ the endian conversion in the code below, but I'm wondering whether there's a better way to do this than what's shown. Padding produces some interesting results: notice how the member d differs between the two printouts. Serializing the data is not an option at present.

An aside: Matlab is my primary analysis tool. With Matlab I could pass a parameter to the fopen call and all's well. I'm trying to write code to do something similar. Thanks in advance.

#include <algorithm> // std::swap
#include <cstdio>
#include <iostream>

typedef unsigned char uc_type ;

#define c( x ) ByteSwap( (unsigned char *) &x, sizeof( x ) )

// Reverse the n bytes starting at b, in place.
void ByteSwap( unsigned char * b, int n)
{
  int i = 0;
  int j = n - 1;
  while ( i < j )
  {
    std::swap( b[ i ], b[ j ] );
    i++, j--;
  }
}

struct foo { // lets try a simple struct
  short a; // works
  short b; // works
  unsigned d ; // introduced padding
  //char test [ 5 ] ; // swap these
  //double dd ;
  //float ar ;
};

// Dump the raw bytes of the struct in memory order.
void showBytes( foo *barp )
{
  size_t i;
  unsigned char *cp = (unsigned char *)barp;

  for (i = 0 ; i < sizeof(*barp) ; ++i ) {
    printf("0x%02X ", (unsigned int)cp[i]);
  }
  std::cout << std::endl;
}

// Print the member values.
void showBytes( foo& barp )
{
  std::cout << barp.a << std::endl;
  std::cout << barp.b << std::endl;
  std::cout << barp.d << std::endl;
}

int main()
{
  foo bar = {0x0102, 0x0304, 0x2030 };

  showBytes( &bar );
  ByteSwap ( ( unsigned char*) &bar.a, sizeof ( bar.a ) ) ;
  ByteSwap ( ( unsigned char*) &bar.b, sizeof ( bar.b ) ) ;
  ByteSwap ( ( unsigned char*) &bar.d, sizeof ( bar.d ) ) ;

  //showBytes( bar ) ;
  showBytes( &bar );

  return 0;
}
/*
0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00
0x01 0x02 0x03 0x04 0x00 0x00 0x20 0x30
Press any key to continue
*/
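For comparison, here is a minimal sketch of a type-generic version of the per-member swap in the post. The names byte_swapped and swap_members are hypothetical and not part of the original code. The sketch assumes the members are plain data types, and it swaps each member separately so that any padding between members is left alone.

```cpp
#include <algorithm>   // std::reverse
#include <cstring>     // std::memcpy
#include <type_traits> // std::is_trivially_copyable

// Hypothetical helper (not from the original post): returns a copy of
// `value` with its bytes in reverse order.
template <typename T>
T byte_swapped(T value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte_swapped only makes sense for plain data types");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}

// Same members as the struct in the post.
struct foo {
    short a;
    short b;
    unsigned d;
};

// Swap each member on its own; padding bytes, if any, are never touched.
inline void swap_members(foo& f)
{
    f.a = byte_swapped(f.a);
    f.b = byte_swapped(f.b);
    f.d = byte_swapped(f.d);
}
```

Calling swap_members twice restores the original values, which makes for an easy sanity check.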
Re: endian conversion - composite type
ma740988 wrote:
> Data stored on a storage device is byte swapped. The data is big
> endian and my PC is little. At issue: There's a composite type ( a
> header ) at the front of the files that I'm trying to read in. I'm
> trying to _simulate_ the endian conversion in code below but I'm just
> wondering if there's an ideal way to do this besides what's shown?

The best way to read binary files is to read into an unsigned char buffer and convert from that buffer to the structure your program uses for the data. You make the conversion as complex as your portability goals require, taking into account endianness, the sign encoding used, and so on.

It is a bit more code to write at first, but it avoids the need to worry about padding and many other issues.

--
Salu2
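A minimal sketch of the buffer-based approach described in this reply. The Header type, the helper names, and the 8-byte file layout are assumptions made for illustration (fixed-width fields matching the struct in the original post, with no padding in the file). This is not the sample code the poster refers to elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of the header; the field names follow the
// struct in the original post, the fixed widths are an assumption.
struct Header {
    std::uint16_t a;
    std::uint16_t b;
    std::uint32_t d;
};

// Assemble big-endian values from raw bytes using shifts, which gives
// the same result on any host byte order.
inline std::uint16_t read_be16(const unsigned char* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t read_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Size of the header as laid out in the file (assumed: no padding).
const std::size_t kHeaderBytes = 8;

// Decode a header from a buffer holding at least kHeaderBytes bytes.
inline Header decode_header(const unsigned char* buf)
{
    Header h;
    h.a = read_be16(buf + 0);
    h.b = read_be16(buf + 2);
    h.d = read_be32(buf + 4);
    return h;
}
```

The caller would read kHeaderBytes bytes from the file into an unsigned char array and pass that array to decode_header. Because the values are built with shifts, the struct's own padding and the host's byte order never enter the picture.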
Re: endian conversion - composite type
Julián Albo wrote:
> The best way to read binary files is to use an unsigned char buffer and
> convert from this buffer to the structure you use in the program for that
> data. You make the conversion as complex as your goal of portability are,
> considering endianness, type of sign encoding used...
>
> A bit more code to write at first, but avoids the need to worry about
> padding and many other issues.

To clarify, the converting code does need to worry about padding inserted in the byte stream, because the source wrote entire structs.

I suggest making it look like a stream filter that reads chars from an underlying stream, so you never deal with the buffer and its boundary conditions. Each function that reads a particular type needs to:

a) skip the padding bytes that the source would have inserted to align that type;
b) read and assemble the bytes of the object;
c) perhaps do something really hard for floating-point data that uses a different representation, or for bitfield data;
d) pick up the value as the correct type and return it.

Sometimes you'll find shortcuts, as when 32-bit data only needs 16-bit alignment and so can be fetched by two calls to the 16-bit fetcher.

I would add separate functions to mark the beginning and end of each struct, as there is additional padding there that is not related to the type of the next member. This will require you to analyze the struct so you can pass in the alignment the source machine will have assumed for the struct as a whole. At least you won't have to make every single pad explicit.

Once, when faced with too much foreign data, I wrote functions that took a dense character-string description of a struct, like "ssslccl", and converted to and from the foreign form, knowing the padding requirements of both forms.

I consider this a defect in the language. I should be able to declare the interface properties of the struct (padding, byte order, FP format) in a standard way and let the compiler choose to implement it, or reject it, or maybe half-implement it so that special functions could be applied to the members that can't be accessed normally. We do it anyway for device drivers with memory-mapped I/O and for MMU structures, but fighting the compiler every step of the way.
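Steps a), b) and d) above, plus the end-of-struct marker, might be sketched as follows. BigEndianReader and its member names are hypothetical. The sketch assumes a big-endian source that aligned every member to its own size, unsigned integer fields only, and a buffer that starts on a struct boundary. Floating-point and bitfield data (step c) are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical reader over a byte buffer holding big-endian structs that
// the source machine wrote with natural alignment (both are assumptions).
class BigEndianReader {
public:
    BigEndianReader(const unsigned char* data, std::size_t size)
        : data_(data), size_(size), pos_(0) {}

    // Advance to the next multiple of `alignment`, skipping pad bytes.
    void align(std::size_t alignment)
    {
        const std::size_t rem = pos_ % alignment;
        if (rem != 0) pos_ += alignment - rem;
    }

    std::uint8_t  read_u8()  { return static_cast<std::uint8_t>(read(1)); }
    std::uint16_t read_u16() { return static_cast<std::uint16_t>(read(2)); }
    std::uint32_t read_u32() { return read(4); }

    // Call after the last member: skips the tail padding of a struct
    // whose overall alignment on the source machine was `alignment`.
    void end_struct(std::size_t alignment) { align(alignment); }

    std::size_t position() const { return pos_; }

private:
    // Skip padding for an n-byte object, then assemble it MSB first.
    std::uint32_t read(std::size_t n)
    {
        align(n);
        if (n > size_ || pos_ > size_ - n)
            throw std::out_of_range("buffer too short");
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < n; ++i)
            value = (value << 8) | data_[pos_++];
        return value;
    }

    const unsigned char* data_;
    std::size_t size_;
    std::size_t pos_;
};
```

For a source struct laid out as a char, a 32-bit value and a 16-bit value, the caller would invoke read_u8, read_u32, read_u16 and then end_struct(4). The three pad bytes after the char and the two tail bytes are skipped without being named explicitly.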
Re: endian conversion - composite type
ma740988 wrote:
> Data stored on a storage device is byte swapped. The data is big
> endian and my PC is little. At issue: There's a composite type ( a
> header ) at the front of the files that I'm trying to read in. I'm
> trying to _simulate_ the endian conversion in code below but I'm just
> wondering if there's an ideal way to do this besides what's shown?
> Padding produces some interesting results. Notice how the parameter d
> is different in the print outs . Serializing the data - at the
> present time - is not an option.
> An aside: Matlab is my prime analysis tool. With matlab I could pass
> a parameter to the fopen call and all's well. I'm trying to write
> code to do something similar. Thanks in advance
>
> #include <cstdio>
> #include <iostream>
>
> typedef unsigned char uc_type ;
>
> #define c( x ) ByteSwap( (unsigned char *) &x, sizeof( x ) )
> void ByteSwap( unsigned char * b, int n)
> {
> register int i = 0;
> register int j = n - 1;
> while ( i < j )
> {
> std::swap( b[ i ], b[ j ] );
> i++, j--;
> }
> }
>
>
> struct foo { // lets try a simple struct
> short a; // works
> short b; // works
> unsigned d ; // introduced padding
> //char test [ 5 ] ; // swap these
> //double dd ;
> //float ar ;
> };
>
>
> void showBytes( foo *barp )
> {
> size_t i;
> unsigned char *cp = (unsigned char *)barp;
>
> for (i = 0 ; i < sizeof(*barp) ; ++i ) {
> printf("0x%02X ", (unsigned int)cp[i]);
> }
> std::cout << std::endl;
> }
>
> void showBytes( foo& barp )
> {
> std::cout << barp.a << std::endl;
> std::cout << barp.b << std::endl;
> std::cout << barp.d << std::endl;
> }
>
> int main()
> {
> foo bar = {0x0102, 0x0304, 0x2030 };
>
> showBytes( &bar );
> ByteSwap ( ( unsigned char*) &bar.a, sizeof ( bar.a ) ) ;
> ByteSwap ( ( unsigned char*) &bar.b, sizeof ( bar.b ) ) ;
> ByteSwap ( ( unsigned char*) &bar.d, sizeof ( bar.d ) ) ;
>
> //showBytes( bar ) ;
> showBytes( &bar );
>
> return 0;
> }
> /*
> 0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00
> 0x01 0x02 0x03 0x04 0x00 0x00 0x20 0x30
> Press any key to continue
> */

Why can't you just call ntohs/ntohl once you have read the data off your storage device? If your PC is little endian, ntohs/ntohl won't be no-ops, and they will swap the bytes for you. The only problem you may encounter is if your composite header packs data into nibbles: each nibble would need to be swapped manually before you recompose your header.
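A sketch of what the ntohs/ntohl suggestion could look like for the header in the original post. It assumes a POSIX system, where these functions are declared in <arpa/inet.h>; on Windows they come from the Winsock headers instead. HostHeader and header_from_big_endian are hypothetical names, and the 16/16/32-bit layout without padding is assumed.

```cpp
#include <arpa/inet.h> // ntohs, ntohl (POSIX)
#include <cstdint>
#include <cstring>     // std::memcpy

// Hypothetical host-order form of the header from the original post.
struct HostHeader {
    std::uint16_t a;
    std::uint16_t b;
    std::uint32_t d;
};

// Copy each big-endian field out of the raw bytes, then let ntohs/ntohl
// convert it to host order. On a big-endian host the calls do nothing.
inline HostHeader header_from_big_endian(const unsigned char* buf)
{
    std::uint16_t a;
    std::uint16_t b;
    std::uint32_t d;
    std::memcpy(&a, buf + 0, sizeof a);
    std::memcpy(&b, buf + 2, sizeof b);
    std::memcpy(&d, buf + 4, sizeof d);

    HostHeader h;
    h.a = ntohs(a);
    h.b = ntohs(b);
    h.d = ntohl(d);
    return h;
}
```

Note that ntohs and ntohl only cover 16-bit and 32-bit integers, so doubles and 64-bit fields would still need separate handling.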
Re: endian conversion - composite type
Robert Mabee wrote:
>> The best way to read binary files is to use an unsigned char buffer and
>> convert from this buffer to the structure you use in the program for that
>> data. You make the conversion as complex as your goal of portability are,
>> considering endianness, type of sign encoding used...
>>
>> A bit more code to write at first, but avoids the need to worry about
>> padding and many other issues.
>
> To clarify, the converting code needs to worry about padding inserted in
> the byte stream because the source wrote entire structs.

From the reader's point of view this is unimportant. The padding from the writer's compiler can be seen the same way as a FILLER in Cobol: a part of the organization of the file.

> I suggest making it look like a stream filter reading chars from an
> underlying stream so you won't ever deal with the buffer and boundary
> conditions. Each function to read a particular type needs to a) skip
> padding bytes that the source would have inserted to align that type;

It is doable, but it may be difficult to work out the padding conditions.

> c) perhaps do something really hard for floating-point data using a
> different representation, or for bitfield data;

Yes; that is why I said that more or less effort will be needed depending on the portability goal.

> Once, when faced with too much foreign data, I wrote functions to take
> a dense character string description of a struct like "ssslccl" and
> convert to and from the foreign form, knowing the padding requirements
> of both forms.

Some time ago I wrote a program that took a description of the record and displayed the content of a file according to it. The same can be done inside a program, or in a program that generates code to be used in the program that deals with the data.

> I consider this a defect in the language. I should be able to declare
> the interface properties of the struct (padding, byte order, FP format)
> in a standard way and let the compiler choose to implement it or reject
> it or maybe half-implement it so special functions could be applied to
> the members that can't be accessed normally.

There is no need to make part of the language something that is perfectly doable without direct language support. This is a general design principle of C++.

--
Salu2
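Neither poster shows their description-driven code, so the following is only a guess at the general idea: a decoder steered by a layout string such as "ssslccl". The meaning of the letters ('c' for 1 byte, 's' for 2 bytes, 'l' for 4 bytes), the big-endian byte order, and the natural alignment of the source are all assumptions, and decode_record is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical decoder driven by a layout string such as "ssslccl".
// Assumed meaning of the letters: 'c' = 1 byte, 's' = 2 bytes,
// 'l' = 4 bytes, all unsigned, big-endian, naturally aligned.
inline std::vector<std::uint32_t>
decode_record(const std::string& layout,
              const unsigned char* data, std::size_t size)
{
    std::vector<std::uint32_t> values;
    std::size_t pos = 0;
    for (std::size_t k = 0; k < layout.size(); ++k) {
        std::size_t width = 0;
        switch (layout[k]) {
            case 'c': width = 1; break;
            case 's': width = 2; break;
            case 'l': width = 4; break;
            default: throw std::invalid_argument("unknown field letter");
        }
        // Skip the padding the writer's compiler would have inserted.
        pos = (pos + width - 1) / width * width;
        if (width > size || pos > size - width)
            throw std::out_of_range("record shorter than layout");
        // Assemble the field most significant byte first.
        std::uint32_t v = 0;
        for (std::size_t i = 0; i < width; ++i)
            v = (v << 8) | data[pos++];
        values.push_back(v);
    }
    return values;
}
```

The same layout string could just as well drive a small code generator, as the reply suggests, instead of being interpreted at run time.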
Re: endian conversion - composite type
Julián Albo wrote:
> ma740988 wrote:
>
> > Data stored on a storage device is byte swapped. The data is big
> > endian and my PC is little. At issue: There's a composite type ( a
> > header ) at the front of the files that I'm trying to read in. I'm
> > trying to _simulate_ the endian conversion in code below but I'm just
> > wondering if there's an ideal way to do this besides what's shown?
>
> The best way to read binary files is to use an unsigned char buffer and
> convert from this buffer to the structure you use in the program for that
> data. You make the conversion as complex as your goal of portability are,
> considering endianness, type of sign encoding used...

Do you know of, or have, an example of this anywhere that I could peruse?
Re: endian conversion - composite type
ma740988 wrote:
>> The best way to read binary files is to use an unsigned char buffer and
>> convert from this buffer to the structure you use in the program for that
>> data. You make the conversion as complex as your goal of portability are,
>> considering endianness, type of sign encoding used...
>
> Do you know of/have an example of this anywhere I could peruse?

I posted some sample code in this group some time ago; you can try to find it in Google Groups.

--
Salu2
Re: endian conversion - composite type
ma740988 wrote:
> Data stored on a storage device is byte swapped. The data is big
> endian and my PC is little.

> foo bar = {0x0102, 0x0304, 0x2030 };
>
> 0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00

Is it a memory dump? Are you sure "0x30 0x20 0x00 0x00" is little endian?

0x2030 == 0x00002030 is not the same as 0x20300000.

"0x30 0x20", the low 16-bit big-endian word, was placed before "0x00 0x00", the high 16-bit big-endian word.

It looks like mixed endian (Google says: middle-endian, or PDP-endian). In that case you cannot swap bytes in the same manner as words.

For 0x50607080, big endian is:
word: low byte, high byte
dword: low word, high word
"0x80, 0x70, 0x60, 0x50"

Little endian must have been:
word: high byte, low byte
dword: high word, low word
"0x00, 0x00, 0x20, 0x30"

Use:
#include <netinet/in.h>
htons(), htonl(), ntohs(), ntohl() - POSIX functions.
Re: endian conversion - composite type
Grizlyk wrote:
Ugh, sorry, I see: I mixed everything up in my poor head with the huge number of "endians" being applied everywhere. I swapped your PC's endianness with your data's endianness, and at the same time swapped the names "little-endian" and "big-endian" for the byte orders.

> ma740988 wrote:
>
> > Data stored on a storage device is byte swapped. The data is big
> > endian and my PC is little.
>
> > foo bar = {0x0102, 0x0304, 0x2030 };
> >
> > 0x02 0x01 0x04 0x03 0x30 0x20 0x00 0x00
>
> Is it memory dump? Are you sure "0x30 0x20 0x00 0x00 " is little
> endian?

Yes, it is correct little-endian data on a little-endian PC.

> "0x30 0x20" - low 16 bit big-endian word was placed before "0x00 0x00" -
> high 16 bit big-endian word

No: "0x30 0x20", the low 16-bit little-endian word, was placed before "0x00 0x00", the high 16-bit little-endian word, which is the correct placement for a little-endian 32-bit dword.

> It looks like mixed endian

No, this is wrong.

> for 0x50607080
>
> big endian is:
> word: low byte , high byte
> dword: low word, high word
>
> " 0x80, 0x70, 0x60, 0x50 "

No, this is little endian.

> little endian must have been:
> word: high byte, low byte
> dword: high word, low word
>
> " 0x00, 0x00, 0x20, 0x30 "
> " 0x50, 0x60, 0x70, 0x80 "

No, this is big endian.

It seems to me this assignment of the "endians" is more correct. Or not?
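To make the corrected byte orders concrete, here is a small sketch that builds both layouts of a 32-bit value with shifts, so the result does not depend on the machine it runs on. The function names are hypothetical; element 0 of each array stands for the lowest memory address.

```cpp
#include <array>
#include <cstdint>

// Byte sequence of a 32-bit value as stored by a little-endian machine:
// least significant byte at the lowest address.
inline std::array<unsigned char, 4> to_little_endian(std::uint32_t v)
{
    return {{ static_cast<unsigned char>(v & 0xFF),
              static_cast<unsigned char>((v >> 8) & 0xFF),
              static_cast<unsigned char>((v >> 16) & 0xFF),
              static_cast<unsigned char>((v >> 24) & 0xFF) }};
}

// Byte sequence as stored by a big-endian machine:
// most significant byte at the lowest address.
inline std::array<unsigned char, 4> to_big_endian(std::uint32_t v)
{
    return {{ static_cast<unsigned char>((v >> 24) & 0xFF),
              static_cast<unsigned char>((v >> 16) & 0xFF),
              static_cast<unsigned char>((v >> 8) & 0xFF),
              static_cast<unsigned char>(v & 0xFF) }};
}
```

With these, 0x50607080 comes out as 0x80 0x70 0x60 0x50 in little-endian order and 0x50 0x60 0x70 0x80 in big-endian order, and 0x2030 comes out as 0x30 0x20 0x00 0x00 and 0x00 0x00 0x20 0x30 respectively, matching the dump in the original post and the correction above.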