C++ - Binary file IO: Converting imported sequences of chars to desired type

#1

Hi all.

I have used the method from this page,

http://www.cplusplus.com/reference/i.../istream/read/

to read some binary data from a file to a char[] buffer.

The 4 first characters constitute the binary encoding of a float type number. What is the better way to transfer the chars to a float variable?

The naive C way would be to use memcopy. Is there a better C++ way?

Rune

Rune Allnor

#2

On 17/10/09 18:39, Rune Allnor wrote:
> I have used the method from this page,
> http://www.cplusplus.com/reference/i.../istream/read/
> to read some binary data from a file to a char[] buffer.
>
> The 4 first characters constitute the binary encoding of
> a float type number. What is the better way to transfer
> the chars to a float variable?
>
> The naive C way would be to use memcopy. Is there a
> better C++ way?

This is the correct way, since memcpy() allows you to copy unaligned data into an aligned object.

Another way is to read data directly into the aligned object:

    float f;
    stream.read(reinterpret_cast<char*>(&f), sizeof f);

--
Max

Maxim Yegorushkin
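A minimal sketch of the two approaches above, assuming the bytes were written by a program using exactly the same float representation (size, format and byte order) as the reader. The helper names `float_from_bytes` and `read_native_float` are hypothetical, made up for this illustration. (In C++20, `std::bit_cast` from `<bit>` can take the place of memcpy when copying between two objects of the same size.)

```cpp
#include <cassert>   // <cassert>, <sstream>, <string>: only needed to try the
#include <cstring>   // helpers out on in-memory data
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: copies the first sizeof(float) bytes of a raw buffer
// into a float. memcpy is fine even if buf is not aligned for float.
// Assumes the bytes use this machine's own float layout.
inline float float_from_bytes(const char* buf) {
    float f;
    std::memcpy(&f, buf, sizeof f);
    return f;
}

// Hypothetical helper: reads the bytes straight into the (properly aligned)
// float object. Returns false if fewer than sizeof(float) bytes were read.
inline bool read_native_float(std::istream& in, float& f) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&f), sizeof f));
}
```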

#3

On Oct 17, 7:47 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On 17/10/09 18:39, Rune Allnor wrote:
> > The 4 first characters constitute the binary encoding of
> > a float type number. What is the better way to transfer
> > the chars to a float variable?
> > The naive C way would be to use memcopy. Is there a
> > better C++ way?
> This is the correct way since memcpy() allows you to copy
> unaligned data into an aligned object.
> Another way is to read data directly into the aligned object:
>     float f;
>     stream.read(reinterpret_cast<char*>(&f), sizeof f);

Neither, of course, works, except in very limited cases.

To convert bytes written in a binary byte stream to any internal format, you have to know the format in the file. If you also know the internal format, and have only limited portability concerns, you can generally do the conversion much faster. A truly portable read requires the use of ldexp, etc., but if you are willing to limit your portability to machines using IEEE (Windows and mainstream Unix, but not mainframes), and the file format is IEEE, you can simply read the data as a 32-bit unsigned int, then use reinterpret_cast (or memcpy).

FWIW: the fully portable solution is something like:

    #include <cmath>      // std::ldexp
    #include <cstdio>     // EOF
    #include <ios>
    #include <stdint.h>   // uint8_t, uint32_t
    #include <streambuf>

    // ixdrstream: a user-defined XDR input stream class (not shown).

    class ByteGetter
    {
    public:
        explicit ByteGetter( ixdrstream& stream )
            : mySentry( stream )
            , myStream( stream )
            , mySB( stream.rdbuf() )
            , myIsFirst( true )
        {
            if ( ! mySentry ) {
                mySB = NULL ;
            }
        }

        uint8_t get()
        {
            int result = 0 ;
            if ( mySB != NULL ) {
                result = mySB->sbumpc() ;   // extract the byte and advance
                if ( result == EOF ) {
                    result = 0 ;
                    // EOF on the first byte is a plain end of file;
                    // EOF in mid-value means the value was truncated.
                    myStream.setstate( myIsFirst
                        ? std::ios::failbit | std::ios::eofbit
                        : std::ios::failbit | std::ios::eofbit | std::ios::badbit ) ;
                    mySB = NULL ;           // don't try to read past EOF again
                }
            }
            myIsFirst = false ;
            return result ;
        }

    private:
        ixdrstream::sentry mySentry ;
        ixdrstream&        myStream ;
        std::streambuf*    mySB ;
        bool               myIsFirst ;
    } ;

    // Reads a 32-bit unsigned int, most significant byte first.
    ixdrstream&
    ixdrstream::operator>>( uint32_t& dest )
    {
        ByteGetter source( *this ) ;
        uint32_t tmp = uint32_t( source.get() ) << 24 ;
        tmp |= uint32_t( source.get() ) << 16 ;
        tmp |= uint32_t( source.get() ) << 8 ;
        tmp |= source.get() ;
        if ( *this ) {
            dest = tmp ;
        }
        return *this ;
    }

    // Rebuilds an IEEE single-precision value from its 32-bit pattern.
    ixdrstream&
    ixdrstream::operator>>( float& dest )
    {
        uint32_t tmp ;
        operator>>( tmp ) ;
        if ( *this ) {
            float f = 0.0 ;
            if ( (tmp & 0x7FFFFFFF) != 0 ) {
                f = std::ldexp( double( (tmp & 0x007FFFFF) | 0x00800000 ),
                                (int)((tmp & 0x7F800000) >> 23) - 126 - 24 ) ;
            }
            if ( (tmp & 0x80000000) != 0 ) {
                f = -f ;
            }
            dest = f ;
        }
        return *this ;
    }

The above code still needs work to handle NaNs and Infinity correctly, but it should give a good idea of what is necessary. If you aren't concerned about machines which aren't IEEE, of course, you can just memcpy the tmp after having read it in the last function above, or use a reinterpret_cast to force the types.

--
James Kanze

James Kanze
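A self-contained sketch of the same decoding idea without the stream machinery, assuming the file stores IEEE 754 single-precision values in big-endian byte order (as XDR does) and that the host `double` can hold every such value exactly. Unlike the code above, it also treats subnormals, infinities and NaNs (NaN payloads are not preserved). All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: assembles 4 bytes stored most significant byte first.
inline std::uint32_t be_u32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Hypothetical helper: portable decode that rebuilds the value with ldexp,
// so it does not depend on how the host lays out its own floats.
inline float decode_binary32_portable(std::uint32_t bits) {
    const int exponent = static_cast<int>((bits >> 23) & 0xFF);
    const std::uint32_t fraction = bits & 0x007FFFFF;
    double magnitude;
    if (exponent == 0xFF) {                  // infinity or NaN
        magnitude = fraction == 0 ? std::numeric_limits<double>::infinity()
                                  : std::numeric_limits<double>::quiet_NaN();
    } else if (exponent == 0) {              // zero or subnormal: fraction * 2^-149
        magnitude = std::ldexp(static_cast<double>(fraction), -149);
    } else {                                 // normal: (2^23 + fraction) * 2^(exponent-150)
        magnitude = std::ldexp(static_cast<double>(fraction | 0x00800000), exponent - 150);
    }
    return static_cast<float>((bits & 0x80000000u) ? -magnitude : magnitude);
}

// Hypothetical helper: IEEE-only shortcut (reuse the bit pattern via memcpy),
// falling back to the portable version when float is not 32-bit IEEE.
inline float decode_binary32_fast(std::uint32_t bits) {
    if (!(std::numeric_limits<float>::is_iec559 && sizeof(float) == sizeof bits)) {
        return decode_binary32_portable(bits);
    }
    float f;
    std::memcpy(&f, &bits, sizeof bits);
    return f;
}
```

The exponent offset of 150 in the normal case is the same quantity as `126 + 24` in the code above: 127 for the IEEE bias plus 23 for the fraction bits.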

#4

On 17 Oct, 19:47, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On 17/10/09 18:39, Rune Allnor wrote:
> > The naive C way would be to use memcopy. Is there a
> > better C++ way?
>
> This is the correct way since memcpy() allows you to copy unaligned data
> into an aligned object.
>
> Another way is to read data directly into the aligned object:
>
>     float f;
>     stream.read(reinterpret_cast<char*>(&f), sizeof f);

The naive

    std::vector<float> v;
    float f;
    for (int n = 0; n < N; ++n)
    {
        file.read(reinterpret_cast<char*>(&f), sizeof f);
        v.push_back(f);
    }

doesn't work as expected. Do I need to call 'seekg' in between?

Rune

Rune Allnor
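A sketch of that loop in working form (pushing `f`, not `v`), assuming the file holds nothing but consecutive floats in the reader's native format and was opened with `std::ios::binary`, so that no newline translation happens on systems such as Windows. No `seekg` is needed: each successful `read` advances the get position by `sizeof f`. The function name `read_native_floats` is hypothetical.

```cpp
#include <cassert>   // <cassert>, <sstream>, <string>: only for trying it out
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads up to n native-format floats, e.g. from
// std::ifstream file(name, std::ios::binary). Each read() moves the get
// position forward, so consecutive calls see consecutive values.
std::vector<float> read_native_floats(std::istream& file, std::size_t n) {
    std::vector<float> v;
    v.reserve(n);
    float f;
    for (std::size_t i = 0; i < n; ++i) {
        if (!file.read(reinterpret_cast<char*>(&f), sizeof f)) {
            break;  // short read or error: stop instead of storing garbage
        }
        v.push_back(f);
    }
    return v;
}
```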

#5

* Rune Allnor:
> ...
> doesn't work as expected. Do I need to call 'seekg'
> in between?

post complete code

cheers & hth

- alf

Alf P. Steinbach

#6

On 18 Oct, 12:26, "Alf P. Steinbach" <al...@start.no> wrote:
> * Rune Allnor:
> > ...
> > doesn't work as expected. Do I need to call 'seekg'
> > in between?
>
> post complete code

Never mind. The project was compiled in 'release mode' with every optimization flag I could find set to 11. No reason to expect the source code to have anything whatsoever to do with what actually goes on.

Once I switched back to debug mode, I was able to track the progress.

Rune

Rune Allnor

#7

On 18/10/09 10:10, James Kanze wrote:
> On Oct 17, 7:47 pm, Maxim Yegorushkin<maxim.yegorush...@gmail.com>
> wrote:
>> On 17/10/09 18:39, Rune Allnor wrote:
>>> The naive C way would be to use memcopy. Is there a
>>> better C++ way?
>>
>> This is the correct way since memcpy() allows you to copy
>> unaligned data into an aligned object.
>>
>> Another way is to read data directly into the aligned object:
>>
>>     float f;
>>     stream.read(reinterpret_cast<char*>(&f), sizeof f);
>
> Neither, of course, works, except in very limited cases.

The assumption was that the float was written by the same program or a program with a compatible binary API. Is that the case you meant in "except in very limited cases"?

--
Max

Maxim Yegorushkin

#8

On Oct 18, 12:13 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On 18/10/09 10:10, James Kanze wrote:
> > On Oct 17, 7:47 pm, Maxim Yegorushkin<maxim.yegorush...@gmail.com>
> > wrote:
> >> Another way is to read data directly into the aligned object:
> >>     float f;
> >>     stream.read(reinterpret_cast<char*>(&f), sizeof f);
> > Neither, of course, works, except in very limited cases.
> The assumption was that the float was written by the same
> program or a program with a compatible binary API. Is that the
> case you meant in "except in very limited cases"?

More or less. Formally, there's no guarantee that the compatible binary API works, but in practice, it almost certainly will.

Note, however, that most systems today support several incompatible binary APIs; which one the compiler uses depends on the version and the options used for compiling. In practice, it's not something you can count on except for very short-lived data: I wouldn't hesitate to use it for spilling temporary data to disk, to be reread later by the same process. I can imagine that it's quite acceptable as well if you have one program collecting data over, e.g., a week, and another processing all of the data in batch over the weekend, provided that both programs were compiled with the same compiler, using the same options. Beyond that, I'd have my doubts (having been bitten by the problem more than once in the past).

As a general rule, it's better to define a format, and match it. (Even if I were using a memory dump, I'd first "define" the format, just ensuring that the definition was compatible with the in-memory image. That way, if worst comes to worst, at least a maintenance programmer will know what to expect, and will have a chance at making it work.)

--
James Kanze

James Kanze
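One way to follow that advice, as a sketch: first write the format down (here, hypothetically, "each value is an IEEE 754 binary32, stored as 4 bytes, most significant byte first"), then write code that produces and consumes exactly that. The `static_assert` records the assumption that the in-memory float matches the definition apart from byte order, which the code handles explicitly. The function names are hypothetical.

```cpp
#include <cassert>   // <cassert>, <sstream>, <string>: only for trying it out
#include <cstdint>
#include <cstring>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical format definition, written down before the code:
//   value = IEEE 754 binary32, 4 bytes, big-endian.
// Compilation fails where float's in-memory image doesn't match it.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "float must be IEEE 754 binary32");

// Hypothetical helper: writes one value in the defined format.
void write_be_float(std::ostream& out, float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const char bytes[4] = {
        static_cast<char>((bits >> 24) & 0xFF), static_cast<char>((bits >> 16) & 0xFF),
        static_cast<char>((bits >> 8) & 0xFF), static_cast<char>(bits & 0xFF)};
    out.write(bytes, sizeof bytes);
}

// Hypothetical helper: reads one value in the defined format.
// Returns false (and leaves value unchanged) on a short read.
bool read_be_float(std::istream& in, float& value) {
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), sizeof bytes)) return false;
    const std::uint32_t bits = (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
                               (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
    std::memcpy(&value, &bits, sizeof value);
    return true;
}
```

Because the byte order is fixed by the definition rather than by the machine, a file written this way on a little-endian PC reads back correctly on a big-endian IEEE machine, and a maintenance programmer can check it against the written definition.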

#9

On Mon, 2009-10-19, James Kanze wrote:
> On Oct 18, 12:13 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> wrote:
> ...
>> The assumption was that the float was written by the same
>> program or a program with a compatible binary API. Is that the
>> case you meant in "except in very limited cases"?
>
> More or less. Formally, there's no guarantee that the
> compatible binary API works, but in practice, it almost
> certainly will.
> ...
> As a general rule, it's better to define a format, and match it.

But if you have a choice, it's IMO almost always better to write the data as text, compressing it first using something like gzip if I/O or disk space is an issue. (Loss of precision when printing decimal floats could be a problem in this case, though ...)

/Jorgen

--
// Jorgen Grahn

Jorgen Grahn
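A minimal sketch of the text approach, assuming one value per line is an acceptable format; compression (e.g. piping through gzip) would be a separate step and is not shown. Setting the stream precision to `std::numeric_limits<float>::max_digits10` (9 for IEEE single precision) addresses the precision concern as long as the reader also parses into `float`. The function names are hypothetical.

```cpp
#include <cassert>   // <cassert>, <sstream>: only for trying it out
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: writes one float per line as decimal text, with enough
// significant digits for a float reader to recover each exact value.
void write_floats_as_text(std::ostream& out, const std::vector<float>& values) {
    const auto old_precision = out.precision(std::numeric_limits<float>::max_digits10);
    for (float v : values) {
        out << v << '\n';
    }
    out.precision(old_precision);  // restore the caller's setting
}

// Hypothetical helper: reads whitespace-separated floats until end of input
// or the first token that doesn't parse as a float.
std::vector<float> read_floats_as_text(std::istream& in) {
    std::vector<float> values;
    float v;
    while (in >> v) {
        values.push_back(v);
    }
    return values;
}
```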

#10

On Oct 23, 9:07 am, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> On Mon, 2009-10-19, James Kanze wrote:
> > On Oct 18, 12:13 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> > wrote:
> ...
> > As a general rule, it's better to define a format, and match it.
> ...
> But if you have a choice, it's IMO almost always better to
> write the data as text, compressing it first using something
> like gzip if I/O or disk space is an issue.

Totally agreed. Especially for the maintenance programmer, who can see at a glance what is being written.

> (Loss of precision when printing decimal floats could be a
> problem in this case though ...)

It's a hard problem in general. If writing from and reading into internal formats with the same precision, it's sufficient to output enough digits. If you don't know the precision of the reader, however, you don't really know how many digits to output when writing.

--
James Kanze

James Kanze
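The "enough digits" rule can be checked with a small experiment (a sketch; `round_trips` is a hypothetical helper): print a value with a given number of significant digits, parse the text back into the same type, and compare. The number that suffices is `std::numeric_limits<T>::max_digits10`, which differs by type (9 for IEEE `float`, 17 for IEEE `double`), so a writer that doesn't know the reader's type can't pick the right count.

```cpp
#include <cassert>   // only for trying it out
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: writes x as decimal text with `digits` significant
// digits, parses the text back into T, and reports whether the exact value
// survived the round trip.
template <typename T>
bool round_trips(T x, int digits) {
    std::ostringstream out;
    out.precision(digits);
    out << x;
    std::istringstream in(out.str());
    T y{};
    in >> y;
    return !in.fail() && y == x;
}
```

For example, with IEEE types, `1.0f + std::numeric_limits<float>::epsilon()` does not survive the default 6 digits (it prints as `1`) but does survive 9, while `1.0 + std::numeric_limits<double>::epsilon()` still fails at float's 9 digits and needs double's 17.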