C++ - Binary file IO: Converting imported sequences of chars to desired type

 
Old 10-17-2009, 06:39 PM   #1


Hi all.

I have used the method from this page,

http://www.cplusplus.com/reference/i.../istream/read/

to read some binary data from a file to a char[] buffer.

The first 4 characters constitute the binary encoding of
a float. What is the best way to transfer
the chars to a float variable?

The naive C way would be to use memcpy. Is there a
better C++ way?

Rune


Rune Allnor
Old 10-17-2009, 06:47 PM   #2
Maxim Yegorushkin
 
On 17/10/09 18:39, Rune Allnor wrote:
> Hi all.
>
> I have used the method from this page,
>
> http://www.cplusplus.com/reference/i.../istream/read/
>
> to read some binary data from a file to a char[] buffer.
>
> The 4 first characters constitute the binary encoding of
> a float type number. What is the better way to transfer
> the chars to a float variable?
>
> The naive C way would be to use memcopy. Is there a
> better C++ way?


This is the correct way since memcpy() allows you to copy unaligned data
into an aligned object.

Another way is to read data directly into the aligned object:

float f;
stream.read(reinterpret_cast<char*>(&f), sizeof f);
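
A minimal sketch of the memcpy approach, assuming the buffer already holds the float in the host's own representation and byte order (i.e. it was written by a compatible program). The name float_from_bytes is hypothetical, not something from the thread:

```cpp
#include <cstring>   // std::memcpy

// Hypothetical helper: copies sizeof(float) bytes starting at `bytes`
// into a float. The source pointer may be unaligned; memcpy does not care.
// Assumption: the bytes use the host's float representation and byte order.
inline float float_from_bytes(const char* bytes)
{
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

With a char buffer filled by stream.read(), the first float would then be float_from_bytes(buffer), the second float_from_bytes(buffer + sizeof(float)), and so on.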

--
Max


Old 10-18-2009, 10:10 AM   #3
James Kanze
 
On Oct 17, 7:47 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On 17/10/09 18:39, Rune Allnor wrote:


> > I have used the method from this page,


> >http://www.cplusplus.com/reference/i.../istream/read/


> > to read some binary data from a file to a char[] buffer.


> > The 4 first characters constitute the binary encoding of
> > a float type number. What is the better way to transfer
> > the chars to a float variable?


> > The naive C way would be to use memcopy. Is there a
> > better C++ way?


> This is the correct way since memcpy() allows you to copy
> unaligned data into an aligned object.


> Another way is to read data directly into the aligned object:


> float f;
> stream.read(reinterpret_cast<char*>(&f), sizeof f);


Neither, of course, works, except in very limited cases.

To convert bytes written in a binary byte stream to any internal
format, you have to know the format in the file. If you also
know the internal format, and have only limited portability
concerns, you can generally do the conversion much faster. A
truly portable read requires use of ldexp, etc., but if you are
willing to limit your portability to machines using IEEE
(Windows and mainstream Unix, but not mainframes), and the file
format is IEEE, you can simply read the data as a 32 bit
unsigned int, then use reinterpret_cast (or memcpy).

FWIW: the fully portable solution is something like:

class ByteGetter
{
public:
    explicit ByteGetter( ixdrstream& stream )
        : mySentry( stream )
        , myStream( stream )
        , mySB( stream.rdbuf() )    // stream is a reference: '.', not '->'
        , myIsFirst( true )
    {
        if ( ! mySentry ) {
            mySB = NULL ;
        }
    }

    // Returns the next byte, or 0 (with the error state set) at end of file.
    uint8_t get()
    {
        int result = 0 ;
        if ( mySB != NULL ) {
            // sbumpc() consumes the byte; sgetc() would only peek at it.
            result = mySB->sbumpc() ;
            if ( result == EOF ) {
                result = 0 ;
                // EOF before the first byte is an ordinary end of file;
                // EOF in the middle of a value means truncated data.
                myStream.setstate( myIsFirst
                    ? std::ios::failbit | std::ios::eofbit
                    : std::ios::failbit | std::ios::eofbit |
                      std::ios::badbit ) ;
            }
        }
        myIsFirst = false ;
        return result ;
    }

private:
    ixdrstream::sentry mySentry ;
    ixdrstream& myStream ;
    std::streambuf* mySB ;
    bool myIsFirst ;
} ;

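ixdrstream&
ixdrstream::operator>>(
    uint32_t& dest )
{
    ByteGetter source( *this ) ;
    // Widen each byte before shifting: a uint8_t promotes to int, and
    // shifting a value >= 0x80 left by 24 would overflow a 32 bit int.
    uint32_t tmp = static_cast< uint32_t >( source.get() ) << 24 ;
    tmp |= static_cast< uint32_t >( source.get() ) << 16 ;
    tmp |= static_cast< uint32_t >( source.get() ) << 8 ;
    tmp |= static_cast< uint32_t >( source.get() ) ;
    if ( *this ) {
        dest = tmp ;
    }
    return *this ;
}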

ixdrstream&
ixdrstream::operator>>(
    float& dest )
{
    uint32_t tmp ;
    operator>>( tmp ) ;
    if ( *this ) {
        float f = 0.0 ;
        if ( (tmp & 0x7FFFFFFF) != 0 ) {
            // value = (mantissa with implicit leading 1) * 2^(exponent - 127 - 23)
            // The cast to double avoids an ambiguous ldexp overload.
            f = ldexp( static_cast< double >(
                           (tmp & 0x007FFFFF) | 0x00800000 ),
                       (int)((tmp & 0x7F800000) >> 23) - 126 - 24 ) ;
        }
        if ( (tmp & 0x80000000) != 0 ) {
            f = -f ;
        }
        dest = f ;
    }
    return *this ;
}

The above code still needs work to handle NaNs and Infinity
correctly, but it should give a good idea of what is necessary.
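
One way the missing special cases might be filled in is sketched below. It is an illustration, not the code from the post: decode_ieee_single is a hypothetical name, and the sketch assumes the host float can represent infinity and a quiet NaN (a non-IEEE host may not) and that the NaN payload does not need to be preserved. It also treats an exponent field of zero as zero or a denormal, which has no implicit leading 1.

```cpp
#include <cmath>     // std::ldexp
#include <cstdint>   // std::uint32_t
#include <limits>    // std::numeric_limits

// Hypothetical helper: decodes an IEEE 754 single precision bit pattern
// arithmetically, without assuming the host's float layout.
inline float decode_ieee_single(std::uint32_t bits)
{
    const std::uint32_t mantissa = bits & 0x007FFFFFu;
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu);

    float result;
    if (exponent == 0xFF) {
        // All exponent bits set: infinity if the mantissa is zero, else NaN.
        result = (mantissa == 0)
            ? std::numeric_limits<float>::infinity()
            : std::numeric_limits<float>::quiet_NaN();
    } else if (exponent == 0) {
        // Zero or denormal: no implicit leading 1, fixed scale of 2^-149.
        result = static_cast<float>(
            std::ldexp(static_cast<double>(mantissa), -149));
    } else {
        // Normal number: implicit leading 1, scale 2^(exponent - 127 - 23).
        result = static_cast<float>(
            std::ldexp(static_cast<double>(mantissa | 0x00800000u),
                       exponent - 150));
    }
    return (bits & 0x80000000u) != 0 ? -result : result;
}
```

The exponent offset of 150 is the same quantity as the "- 126 - 24" in the code above.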

If you aren't concerned about machines which aren't IEEE, of
course, you can just memcpy the tmp after having read it in the
last function above, or use a reinterpret_cast to force the
types.
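
A sketch of that IEEE-only shortcut, under stated assumptions: the file stores the float as 4 bytes in big-endian order (as the shifts in operator>> above imply), the host float is a 32 bit IEEE type, and the host uses the same byte order for float and for 32 bit integers. The name float_from_big_endian is hypothetical.

```cpp
#include <cstdint>   // std::uint32_t
#include <cstring>   // std::memcpy
#include <limits>    // std::numeric_limits

// Refuse to compile on hosts where the shortcut cannot work.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this shortcut assumes a 32 bit IEEE float");

// Hypothetical helper: assembles a big-endian 32 bit value from four bytes,
// then copies the bit pattern into a float.
inline float float_from_big_endian(const unsigned char* bytes)
{
    const std::uint32_t bits =
        (static_cast<std::uint32_t>(bytes[0]) << 24) |
        (static_cast<std::uint32_t>(bytes[1]) << 16) |
        (static_cast<std::uint32_t>(bytes[2]) << 8)  |
         static_cast<std::uint32_t>(bytes[3]);

    float value;
    std::memcpy(&value, &bits, sizeof value);   // well defined, unlike a cast
    return value;
}
```

Using memcpy rather than reinterpret_cast sidesteps aliasing problems, and compilers typically reduce it to a single register move.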

--
James Kanze


Old 10-18-2009, 11:07 AM   #4
Rune Allnor
 
On 17 Oct, 19:47, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On 17/10/09 18:39, Rune Allnor wrote:
>
> > Hi all.

>
> > I have used the method from this page,

>
> >http://www.cplusplus.com/reference/i.../istream/read/

>
> > to read some binary data from a file to a char[] buffer.

>
> > The 4 first characters constitute the binary encoding of
> > a float type number. What is the better way to transfer
> > the chars to a float variable?

>
> > The naive C way would be to use memcopy. Is there a
> > better C++ way?

>
> This is the correct way since memcpy() allows you to copy unaligned data
> into an aligned object.
>
> Another way is to read data directly into the aligned object:
>
>     float f;
>     stream.read(reinterpret_cast<char*>(&f), sizeof f);


The naive

std::vector<float> v;
float f;
for (int n=0;n<N;++n)
{
    file.read(reinterpret_cast<char*>(&f), sizeof f);
    v.push_back(f);
}

doesn't work as expected. Do I need to call 'seekg'
in between?

Rune


Old 10-18-2009, 11:26 AM   #5
Alf P. Steinbach
 
* Rune Allnor:
> On 17 Okt, 19:47, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> wrote:
>> On 17/10/09 18:39, Rune Allnor wrote:
>>
>>> Hi all.
>>> I have used the method from this page,
>>> http://www.cplusplus.com/reference/i.../istream/read/
>>> to read some binary data from a file to a char[] buffer.
>>> The 4 first characters constitute the binary encoding of
>>> a float type number. What is the better way to transfer
>>> the chars to a float variable?
>>> The naive C way would be to use memcopy. Is there a
>>> better C++ way?

>> This is the correct way since memcpy() allows you to copy unaligned data
>> into an aligned object.
>>
>> Another way is to read data directly into the aligned object:
>>
>> float f;
>> stream.read(reinterpret_cast<char*>(&f), sizeof f);

>
> The naive
>
> std::vector<float> v;
> for (n=0;n<N;++n)
> {
> file.read(reinterpret_cast<char*>(&f), sizeof f);
> v.push_back(f);
> }
>
> doesn't work as expected. Do I need to call 'seekg'
> inbetween?


post complete code

cheers & hth

- alf


Old 10-18-2009, 11:42 AM   #6
Rune Allnor
 
On 18 Oct, 12:26, "Alf P. Steinbach" <al...@start.no> wrote:
> * Rune Allnor:
>
>
>
>
>
> > On 17 Okt, 19:47, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> > wrote:
> >> On 17/10/09 18:39, Rune Allnor wrote:

>
> >>> Hi all.
> >>> I have used the method from this page,
> >>>http://www.cplusplus.com/reference/i.../istream/read/
> >>> to read some binary data from a file to a char[] buffer.
> >>> The 4 first characters constitute the binary encoding of
> >>> a float type number. What is the better way to transfer
> >>> the chars to a float variable?
> >>> The naive C way would be to use memcopy. Is there a
> >>> better C++ way?
> >> This is the correct way since memcpy() allows you to copy unaligned data
> >> into an aligned object.

>
> >> Another way is to read data directly into the aligned object:

>
> >>     float f;
> >>     stream.read(reinterpret_cast<char*>(&f), sizeof f);

>
> > The naive

>
> > std::vector<float> v;
> > for (n=0;n<N;++n)
> > {
> >     file.read(reinterpret_cast<char*>(&f), sizeof f);
> >     v.push_back(f);
> > }

>
> > doesn't work as expected. Do I need to call 'seekg'
> > inbetween?

>
> post complete code


Never mind. The project was compiled in 'release mode'
with every optimization flag I could find set to 11.
No reason to expect the source code to have anything
whatsoever to do with what actually goes on.

Once I switched back to debug mode, I was able to
track the progress.

Rune


Old 10-18-2009, 12:13 PM   #7
Maxim Yegorushkin
 
On 18/10/09 10:10, James Kanze wrote:
> On Oct 17, 7:47 pm, Maxim Yegorushkin<maxim.yegorush...@gmail.com>
> wrote:
>> On 17/10/09 18:39, Rune Allnor wrote:

>
>>> I have used the method from this page,

>
>>> http://www.cplusplus.com/reference/i.../istream/read/

>
>>> to read some binary data from a file to a char[] buffer.

>
>>> The 4 first characters constitute the binary encoding of
>>> a float type number. What is the better way to transfer
>>> the chars to a float variable?

>
>>> The naive C way would be to use memcopy. Is there a
>>> better C++ way?

>
>> This is the correct way since memcpy() allows you to copy
>> unaligned data into an aligned object.

>
>> Another way is to read data directly into the aligned object:

>
>> float f;
>> stream.read(reinterpret_cast<char*>(&f), sizeof f);

>
> Neither, of course, work, except in very limited cases.


The assumption was that the float was written by the same program or a
program with a compatible binary API. Is that the case you meant in
"except in very limited cases"?

--
Max


Old 10-19-2009, 10:58 AM   #8
James Kanze
 
On Oct 18, 12:13 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On 18/10/09 10:10, James Kanze wrote:
> > On Oct 17, 7:47 pm, Maxim Yegorushkin<maxim.yegorush...@gmail.com>
> > wrote:
> >> On 17/10/09 18:39, Rune Allnor wrote:


> >>> I have used the method from this page,


> >>>http://www.cplusplus.com/reference/i.../istream/read/


> >>> to read some binary data from a file to a char[] buffer.


> >>> The 4 first characters constitute the binary encoding of
> >>> a float type number. What is the better way to transfer
> >>> the chars to a float variable?


> >>> The naive C way would be to use memcopy. Is there a
> >>> better C++ way?


> >> This is the correct way since memcpy() allows you to copy
> >> unaligned data into an aligned object.


> >> Another way is to read data directly into the aligned object:


> >> float f;
> >> stream.read(reinterpret_cast<char*>(&f), sizeof f);


> > Neither, of course, work, except in very limited cases.


> The assumption was that the float was written by the same
> program or a program with a compatible binary API. Is that the
> case you meant in "except in very limited cases"?


More or less. Formally, there's no guarantee that the
compatible binary API works, but in practice, it almost
certainly will.

Note, however, that most systems today support several
incompatible binary APIs; which one the compiler uses depends
on the version and the options used for compiling. In practice,
it's not something you can count on except for very short-lived
data: I wouldn't hesitate to use it for spilling temporary
data to disk, to be reread later by the same process. I can
imagine that it's quite acceptable as well if you have one
program collecting data during e.g. a week, and another
processing all of the data in batch over the weekend, provided
that both programs were compiled with the same compiler, using
the same options. Beyond that, I'd have my doubts (having been
bitten by the problem more than once in the past). As a general
rule, it's better to define a format, and match it. (Even if I
were using a memory dump, I'd first "define" the format, just
ensuring that the definition was compatible with the in-memory
image. That way, if worst comes to worst, at least a
maintenance programmer will know what to expect, and will have a
chance at making it work.)

--
James Kanze


Old 10-23-2009, 09:07 AM   #9
Jorgen Grahn
 
On Mon, 2009-10-19, James Kanze wrote:
> On Oct 18, 12:13 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> wrote:

....
>> The assumption was that the float was written by the same
>> program or a program with a compatible binary API. Is that the
>> case you meant in "except in very limited cases"?

>
> More or less. Formally, there's no guarantee that the
> compatible binary API works, but in practice, it almost
> certainly will.
>
> Note, however, that most systems today support several
> incompatible binary API's; which one the compiler uses depends
> on the version and the options used for compiling. In practice,
> it's not something you can count on except for very short lived
> data: I wouldn't hesitate about using it for spilling temporary
> data to disk, to be reread later by the same process. I can
> imagine that it's quite acceptable as well if you have one
> program collecting data during e.g. a week, and another
> processing all of the data in batch over the week-end, provided
> that both programs were compiled with the same compiler, using
> the same options. Beyond that, I'd have my doubts (having been
> bit with the problem more than once in the past). As a general
> rule, it's better to define a format, and match it. (Even if I
> were using a memory dump, I'd first "define" the format, just
> ensuring that the definition was compatible to the in memory
> image. That way, if worse comes to worse, at least a
> maintenance programmer will know what to expect, and will have a
> chance at making it work.)


But if you have a choice, it's IMO almost always better to write the
data as text, compressing it first using something like gzip if I/O or
disk space is an issue.

(Loss of precision when printing decimal floats could be a problem in
this case though ...)

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .


Old 10-23-2009, 10:27 AM   #10
James Kanze
 
On Oct 23, 9:07 am, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> On Mon, 2009-10-19, James Kanze wrote:
> > On Oct 18, 12:13 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> > wrote:

> ...
> >> The assumption was that the float was written by the same
> >> program or a program with a compatible binary API. Is that
> >> the case you meant in "except in very limited cases"?


> > More or less. Formally, there's no guarantee that the
> > compatible binary API works, but in practice, it almost
> > certainly will.


> > Note, however, that most systems today support several
> > incompatible binary API's; which one the compiler uses
> > depends on the version and the options used for compiling.
> > In practice, it's not something you can count on except for
> > very short lived data: I wouldn't hesitate about using it
> > for spilling temporary data to disk, to be reread later by
> > the same process. I can imagine that it's quite acceptable
> > as well if you have one program collecting data during e.g.
> > a week, and another processing all of the data in batch over
> > the week-end, provided that both programs were compiled with
> > the same compiler, using the same options. Beyond that, I'd
> > have my doubts (having been bit with the problem more than
> > once in the past). As a general rule, it's better to define
> > a format, and match it. (Even if I were using a memory
> > dump, I'd first "define" the format, just ensuring that the
> > definition was compatible to the in memory image. That way,
> > if worse comes to worse, at least a maintenance programmer
> > will know what to expect, and will have a chance at making
> > it work.)


> But if you have a choice, it's IMO almost always better to
> write the data as text, compressing it first using something
> like gzip if I/O or disk space is an issue.


Totally agreed. Especially for the maintenance programmer, who
can see at a glance what is being written.

> (Loss of precision when printing decimal floats could be a
> problem in this case though ...)


It's a hard problem in general. If the writer and the reader use
internal formats with the same precision, it's sufficient to
output enough digits (9 significant decimal digits for an IEEE
single, 17 for an IEEE double). If you don't know the precision
of the reader, however, you don't really know how many digits to
output when writing.
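
For the same-precision case, a minimal sketch using std::numeric_limits<float>::max_digits10 (C++11), which is the number of decimal digits needed for a float to survive a round trip through text. The helper names are hypothetical, and the sketch assumes finite values and a standard library whose decimal conversions are correctly rounded:

```cpp
#include <limits>    // std::numeric_limits
#include <sstream>   // std::ostringstream, std::istringstream
#include <string>    // std::string

// Hypothetical helper: formats a finite float with enough digits that a
// reader using the same float type recovers the identical value.
inline std::string float_to_text(float value)
{
    std::ostringstream out;
    out.precision(std::numeric_limits<float>::max_digits10);
    out << value;
    return out.str();
}

// Hypothetical helper: parses text produced by float_to_text.
inline float float_from_text(const std::string& text)
{
    std::istringstream in(text);
    float value = 0.0f;
    in >> value;
    return value;
}
```

This does not address the harder case raised above, where the reader's precision is unknown.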

--
James Kanze

