Velocity Reviews

J. Campbell 07-24-2003 11:56 PM

Binary file I/O
 
OK...I'm in the process of learning C++. In my old (non-portable)
programming days, I made use of binary files a lot...not worrying
about endian issues. I'm starting to understand why C++ makes it
difficult to read/write an integer directly as a bit-stream to a file.
However, I'm at a bit of a loss for how to do the following. So as
not to obfuscate the issue, I won't show what I've been attempting ;-)

What I want to do is the following, using the standard IO streams.

1) open an arbitrary file (file1).
2) starting with the first byte in (file1), read a chunk of data into
an array of integers.
3) manipulate the array, as integer data, and then output the contents
of the array to another file (file2).
4) read the next data-chunk from file1 into the array.
5) goto 3 until end of file.

If anyone knows of a tutorial that contains concrete examples of this,
I'd appreciate a pointer to the info. Thanks

Jonathan Mcdougall 07-25-2003 12:39 AM

Re: Binary file I/O
 
On 24 Jul 2003 16:56:53 -0700, mango_maniac@yahoo.com (J. Campbell)
wrote:

>OK...I'm in the process of learning C++. In my old (non-portable)
>programming days, I made use of binary files a lot...not worrying
>about endian issues. I'm starting to understand why C++ makes it
>difficult to read/write an integer directly as a bit-stream to a file.
> However, I'm at a bit of a loss for how to do the following. So as
>not to obfuscate the issue, I won't show what I've been attempting ;-)
>
>What I want to do is the following, using the standard IO streams.


# include <fstream>
# include <iostream>
# include <vector>
# include <sstream>
# include <string>
# include <algorithm>


>1) open an arbitrary file (file1).


std::ifstream file1("f.txt");

>2) starting with the first byte in (file1), read a chunk of data into
>an array of integers.


const int CHUNK = 128;

char buffer[CHUNK];
file1.read(buffer, CHUNK);

std::vector<int> data;
std::copy(buffer, buffer + 128, std::back_inserter(data));

>3) manipulate the array, as integer data,


void manipulate(std::vector<int> &v);


manipulate(data);

>and then output the contents
>of the array to another file (file2).


std::ofstream file2("g.txt");
std::copy(data.begin(), data.end(),
std::ostream_iterator<int>(std::cout, "\n"));

>4) read the next data-chunk from file1 into the array.
>5) goto 3 until end of file.


goto 3; :)

>If anyone knows of a tutorial that contains concrete examples of this,
>I'd appreciate a pointer to the info. Thanks


The C++ Standard Library by Josuttis.

Jonathan


Jonathan Mcdougall 07-25-2003 07:19 AM

Re: Binary file I/O
 
On Thu, 24 Jul 2003 20:39:23 -0400, Jonathan Mcdougall
<DELjonathanmcdougall@yahoo.ca> wrote:

>On 24 Jul 2003 16:56:53 -0700, mango_maniac@yahoo.com (J. Campbell)
>wrote:
>
>>OK...I'm in the process of learning C++. In my old (non-portable)
>>programming days, I made use of binary files a lot...not worrying
>>about endian issues. I'm starting to understand why C++ makes it
>>difficult to read/write an integer directly as a bit-stream to a file.
>> However, I'm at a bit of a loss for how to do the following. So as
>>not to obfuscate the issue, I won't show what I've been attempting ;-)
>>
>>What I want to do is the following, using the standard IO streams.

>
># include <fstream>
># include <vector>
># include <algorithm>


Forget these ones :

># include <sstream>
># include <iostream>
># include <string>
>
>
>>1) open an arbitrary file (file1).

>
>std::ifstream file1("f.txt");
>
>>2) starting with the first byte in (file1), read a chunk of data into
>>an array of integers.

>
>const int CHUNK = 128;
>
>char buffer[CHUNK];
>file1.read(buffer, CHUNK);
>
>std::vector<int> data;
>std::copy(buffer, buffer + 128, std::back_inserter(data));


std::copy(buffer, buffer + CHUNK, std::back_inserter(data));

>
>>3) manipulate the array, as integer data,

>
>void manipulate(std::vector<int> &v);
>
>
>manipulate(data);
>
>>and then output the contents
>>of the array to another file (file2).

>
>std::ofstream file2("g.txt");;
>std::copy(data.begin(), data.end(),
> std::ostream_iterator<int>(std::cout, "\n"));


std::copy(data.begin(), data.end(),
std::ostream_iterator<int>(file2, "\n"));


Sorry about that,

Jonathan
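
Putting the corrected fragments together, steps 1 to 5 could be sketched as a loop like the one below. This is only a sketch under stated assumptions: the name copy_chunks and the body of manipulate are hypothetical, the streams are passed by reference so that either file streams or in-memory streams can be used, and gcount() is consulted so that a short final chunk is handled. Like the fragments above, it stores one byte per int and writes the values as text; unlike them, it converts each byte through unsigned char so that bytes above 127 do not come out negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>   // only needed to try the function on in-memory streams
#include <vector>

const int CHUNK = 128;

// Hypothetical manipulation: add 1 to every value.
void manipulate(std::vector<int> &v)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        ++v[i];
}

// Reads 'in' chunk by chunk, manipulates each chunk, and writes the
// values to 'out' as decimal text, one value per line.
void copy_chunks(std::istream &in, std::ostream &out)
{
    char buffer[CHUNK];

    for (;;)
    {
        in.read(buffer, CHUNK);
        const std::streamsize got = in.gcount();  // bytes actually read
        if (got <= 0)
            break;                                // nothing left to process

        // One int per byte, as in the fragments above.
        std::vector<int> data;
        for (std::streamsize i = 0; i < got; ++i)
            data.push_back(static_cast<unsigned char>(buffer[i]));

        manipulate(data);

        std::copy(data.begin(), data.end(),
                  std::ostream_iterator<int>(out, "\n"));
    }
}
```

With files, this might be called with a std::ifstream opened with std::ios::binary as the first argument and a std::ofstream as the second.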


J. Campbell 07-25-2003 03:38 PM

Re: Binary file I/O
 
Thanks Jonathan.

Your response is most helpful. Now, I need to digest why it works,
and why it's necessary.

I want to clarify a few things. Assuming int is 32-bits, then,
after:
-----
const int CHUNK = 128;

char buffer[CHUNK];
file1.read(buffer, CHUNK);
------
at this point the char array, "buffer" contains 128 elements of 1-byte
each, right?

-----
std::vector<int> data;
std::copy(buffer, buffer + 128, std::back_inserter(data));
-----
now, the vector named "data" contains 32 elements, each of which is a
4-byte integer, right?

How do I know if the bytes that went into the vector integers went in
head-first or feet-first? in other words, if the first 4 bytes of the
file were (HEX):
FF 00 00 00
will the first int in the vector "data" be FF000000 (dec 4278190080)
or will it be 000000FF (dec 255)? Or is it machine dependent?

can I avoid all the "std::" by using "using namespace std;" or is it
necessary to scope-resolve all the keywords?

Another thing... Do you think it's better to read chunks of a file as
I've indicated, or is it better to load the whole file into memory?

Also, your method leaves 2-duplicates of the data in memory...one as
the char array, and once as the vector. is this a problem?

One more thing...I asked a question here recently:

http://groups.google.com/groups?hl=e...gle.com&rnum=1

about accessing a char array as an array of int. How is the vector
method different/safer than the (unsafe & non-portable) method I
demonstrated in the earlier post.

thanks again for the help.

I don't seem to be able to quit typing;-) Sorry to inundate you with
so many questions...I realize that you may not choose to address them
all..

Rolf Magnus 07-25-2003 04:19 PM

Re: Binary file I/O
 
Thomas Matthews wrote:

> To nitpick, the constant should be "unsigned" since a quantity can't
> be negative. i.e.
> const unsigned int CHUNK_SIZE = 128;


I'd disagree. It should be signed, since you might have negative offsets
when accessing the array elements, and mixing signed and unsigned
arithmetic can be problematic, and some compilers warn if you do.
Besides, what would you really gain from making it unsigned?

>> std::vector<int> data;
>> std::copy(buffer, buffer + 128, std::back_inserter(data));
>> -----
>> now, the vector named "data" contains 32 elements, each of which is a
>> 4-byte integer, right?

> A 4-byte _signed_ integer.


Yes, as int is by default signed.

>> How do I know if the bytes that went into the vector integers went in
>> head-first or feet-first? in other words, if the first 4 bytes of
>> the file were (HEX):
>> FF 00 00 00
>> will the first int in the vector "data" be FF000000 (dec 4278190080)
>> or will it be 000000FF (dec 255)? Or is it machine dependent?

> It is machine dependent. The topic is called Endianism.


I've only seen it called endianness.

> Try this experiment:
> const unsigned int endian_test = 0x01020304;
> unsigned char byte0;
> unsigned char byte1;
> unsigned char byte2;
> unsigned char byte3;
> unsigned char * ptr = (unsigned char *) &endian_test;
> byte0 = *ptr++;
> byte1 = *ptr++;
> byte2 = *ptr++;
> byte3 = *ptr++;
> cout << hex << (unsigned short) byte0 << endl;
> cout << hex << (unsigned short) byte1 << endl;
> cout << hex << (unsigned short) byte2 << endl;
> cout << hex << (unsigned short) byte3 << endl;
>
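
The experiment quoted above can be wrapped in a small function. This is a minimal sketch, assuming unsigned int is at least 4 bytes wide; the names first_byte_in_memory and is_little_endian are made up for illustration. It copies the bytes out with memcpy instead of walking a cast pointer, which gives the same information.

```cpp
#include <cstring>

// Returns the byte stored at the lowest address of the value 0x01020304.
// A little-endian machine gives 0x04, a big-endian machine gives 0x01.
unsigned int first_byte_in_memory()
{
    const unsigned int endian_test = 0x01020304;
    unsigned char bytes[sizeof endian_test];
    std::memcpy(bytes, &endian_test, sizeof endian_test);
    return bytes[0];
}

// True when the least significant byte is stored first.
bool is_little_endian()
{
    return first_byte_in_memory() == 0x04;
}
```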
>>
>> can I avoid all the "std::" by using "using namespace std;" or is it
>> necessary to scope-resolve all the keywords?

> This is a personal, style, issue. Here are some popular styles:
> 1. Declare each function and class with a separate "using" statement:
> using std::cout;
> using std::vector;
> 2. Use the global "using" statement:
> using namespace std;
> 3. Prefix each function and class with its namespace:
> std::cout << "hello" << std::endl;
> There are different opinions on which to use. Use a search engine
> and search this newsgroup for "namespace" and "using".


At least, most people seem to agree that it's a bad idea to put
something like this in a header.
Btw: you can also put using into functions.
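
As a small illustration of using-declarations placed inside a function, here is a sketch with a hypothetical function named greet. The declarations are visible only inside the function body, so nothing leaks into other code.

```cpp
#include <sstream>
#include <string>

// The using-declarations below apply only within this function.
std::string greet()
{
    using std::endl;
    using std::ostringstream;

    ostringstream out;          // no std:: prefix needed here
    out << "hello" << endl;
    return out.str();
}
```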

>> Another thing... Do you think it's better to read chunks of a file
>> as I've indicated, or is it better to load the whole file into
>> memory?

> If you have the space, read in the whole file; otherwise read it
> in as chunks. The fewer reads, the faster the execution.


Not necessarily. If you need maximum speed, you should test it for
different block sizes.



Jonathan Mcdougall 07-25-2003 06:02 PM

Re: Binary file I/O
 
On 25 Jul 2003 08:38:23 -0700, mango_maniac@yahoo.com (J. Campbell)
wrote:

>Thanks Jonathan.
>
>Your response is most helpful. Now, I need to digest why it works,
>and why it's necessary.
>
>I want to clarify a few things. Assuming int is 32-bits, then,
>after:


You can't "assume" this, it depends on the platform. Anyways it does
not matter in this case.

>-----
>const int CHUNK = 128;
>
>char buffer[CHUNK];
>file1.read(buffer, CHUNK);
>------
>at this point the char array, "buffer" contains 128 elements of 1-byte
>each, right?


Yes.

>-----
>std::vector<int> data;
>std::copy(buffer, buffer + 128, std::back_inserter(data));
>-----
>now, the vector named "data" contains 32 elements, each of which is a
>4-byte integer, right?


No, 'data' contains 128 elements of type int. Each element has a size
of sizeof(int), which *could* be 4 bytes.

data[0]

contains the value which was in

buffer[0]

For example, if the first byte in the file was 65, then buffer[0]
contains char(65) (which is 'A') and data[0] simply contains 65.
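
This can be checked with a short sketch. The helper name widen is hypothetical, and the value 65 for 'A' assumes an ASCII-compatible character set.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Copies n chars into a vector<int>, producing one element per char.
// No bytes are packed together: each char is converted to an int.
std::vector<int> widen(const char *buffer, std::size_t n)
{
    std::vector<int> data;
    std::copy(buffer, buffer + n, std::back_inserter(data));
    return data;
}
```

Passing a 4-byte buffer holding 'A', 'B', 'C', 'D' yields a vector of 4 ints whose first element is 65, not a single packed int.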

>can I avoid all the "std::" by using "using namespace std;" or is it
>necessary to scope-resolve all the keywords?


Yes, but I personally do not recommend it. I prefer to qualify
everything, but it is a matter of style (and carefulness).

>Another thing... Do you think it's better to read chunks of a file as
>I've indicated, or is it better to load the whole file into memory?


Depends on the file size and the memory available.

>Also, your method leaves 2-duplicates of the data in memory...one as
>the char array, and once as the vector. is this a problem?


Well, you explicitly wanted an array of integers, and since read()
takes a char* rather than an int[], I needed to do a conversion.

>One more thing...I asked a question here recently:
>
>http://groups.google.com/groups?hl=e...gle.com&rnum=1
>
>about accessing a char array as an array of int. How is the vector
>method different/safer than the (unsafe & non-portable) method I
>demonstrated in the earlier post.


Variable-length arrays are, afaik, illegal in C++ anyways. Take a
look at that :

http://www.btinternet.com/~chrisnewton/pp/contarray.xml


Jonathan


Jonathan Mcdougall 07-25-2003 06:05 PM

Re: Binary file I/O
 
>> -----
>> std::vector<int> data;
>> std::copy(buffer, buffer + 128, std::back_inserter(data));
>> -----
>> now, the vector named "data" contains 32 elements, each of which is a
>> 4-byte integer, right?

>A 4-byte _signed_ integer.


I just want to remind you that 'data' contains *128* elements, not 32
and that the endianness discussion does not apply.

<snip>

Jonathan


J. Campbell 07-25-2003 08:22 PM

Re: Binary file I/O
 
Jonathan,

I just tried out your method, and it leaves me scratching my head.
After stumbling briefly for lack of the header to define
back_inserter() and ostream_iterator() (thanks Google and SGI), the
code compiles fine:
__________code__________________

#include <fstream>
#include <vector>
#include <iterator>

using namespace std;

int main(){
const int DATACHUNK = 20;
char buffer[DATACHUNK];

ifstream filein("shifttest.cpp");
filein.read(buffer, DATACHUNK);

vector<int> filedata;
copy(buffer, buffer + DATACHUNK, back_inserter(filedata));

ofstream fileout("shifttest.joe");
copy(filedata.begin(), filedata.end(),
ostream_iterator<int>(fileout, "\n" ));
}

_____end code_________________

However, when I look at the file out, it contains:

35
105
110
99
108
117
100
101
32
60
105
111
115
116
114
101
97
109
62
10

which is the ASCII representation of the integer representation of the
ASCII sequence "#include <iostream>"

which, strangely enough, happens to be the first line of
"shifttest.cpp" ;-)

This is really not at all what I am wanting to do. Now my 20 bytes is
represented by 93 bytes of a rather odd data-type...neither characters
nor integers, but rather some strange beast that combines the worst of
both worlds.

I'm left wondering, in this strange new world of C++ do I need to get
used to dealing with ASCII representations of numbers for file I/O?
Or do I need to always break my 4-byte integers into individual bytes
prior to I/O if I don't want to waste storage space? I suppose this
would be pretty easy...something like:

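//not tested
unsigned int bytetowrite = 0; //the value to split; unsigned keeps the shifts well defined
unsigned char holdword[4];

for(int i = 0; i < 4; i++)
holdword[i] = (bytetowrite >> (i * 8)) & 255; //shift down first, then mask off one byte
//holdword now contains, small-byte first, the data from bytetowrite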
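
The same idea can be sketched as a pair of helpers that split a 32-bit value into bytes and rebuild it. The names to_bytes and from_bytes are hypothetical. Because the byte order is fixed by the code (least significant byte first) and not by the machine, a file written this way would have the same layout on any platform.

```cpp
// Stores the low 32 bits of 'value' in 'out', least significant byte first.
void to_bytes(unsigned long value, unsigned char out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((value >> (i * 8)) & 0xFFu);
}

// Rebuilds the value from four bytes stored least significant byte first.
unsigned long from_bytes(const unsigned char in[4])
{
    unsigned long value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<unsigned long>(in[i]) << (i * 8);
    return value;
}
```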

However, this seems a bit tedious, considering that this rigamarole
doesn't really do anything to the internal data. I feel like there's
something really basic that I don't *get* about streams... All I
really want to do is "get at" the data in a file and treat that data
as numbers typed to the native processor word size...then, manipulate
the data and write the data out to a second file. Consider, for
example, that the file consists of a binary bitmap and I want to
invert it, or rotate it or something.

Anyway...It's apparent that I have a lot to learn. This C++ is
tantalizing me...the code is about 10 to 20 x faster than my old
16-bit compiler...but jeez...what would seem to be a simple
manipulation can become so frustrating!!! It feels a little like
typing with my toes.

Thanks for the help people. It is beginning to make some sense.

Joe


J. Campbell 07-25-2003 09:45 PM

Re: Binary file I/O
 
Jonathan Mcdougall <DELjonathanmcdougall@yahoo.ca> wrote in message news:<k7s2ivcqlegcm0eg21n9ho9hchlukc1su5@4ax.com>. ..
> >> -----
> >> std::vector<int> data;
> >> std::copy(buffer, buffer + 128, std::back_inserter(data));
> >> -----
> >> now, the vector named "data" contains 32 elements, each of which is a
> >> 4-byte integer, right?

> >A 4-byte _signed_ integer.

>
> I just want to remind you that 'data' contains *128* elements, not 32
> and that the endianness discussion does not apply.
>
> <snip>
>
> Jonathan


Jonathan...I now understand what's going on and the endianness
discussion. My news reader has serious lag, so I may not be current
with the discussion. However...I understand more after this post.
When I said I wanted the file bytes represented by integers, I meant
that I wanted the first sizeof(int)/sizeof(char) (e.g. 4) bytes of data
to be put into integerarray[0], the next into integerarray[1]...etc.
Anyway...thanks for clarifying this.
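
What is described here, sizeof(int) raw bytes per integer, could be sketched as follows. This is an illustration under assumptions and not anyone's posted method: the names read_ints and write_ints are hypothetical, the bytes are interpreted in the machine's native byte order (so the resulting file is not portable between machines of different endianness), and any file streams passed in should be opened with std::ios::binary.

```cpp
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>   // only needed to try the functions on in-memory streams
#include <vector>

// Reads up to max_ints integers from 'in', taking sizeof(int) raw bytes
// per integer in native byte order. Trailing bytes that do not fill a
// whole int are dropped.
std::vector<int> read_ints(std::istream &in, std::size_t max_ints)
{
    std::vector<char> raw(max_ints * sizeof(int));
    if (raw.empty())
        return std::vector<int>();

    in.read(&raw[0], static_cast<std::streamsize>(raw.size()));
    const std::size_t whole =
        static_cast<std::size_t>(in.gcount()) / sizeof(int);

    std::vector<int> data(whole);
    if (whole > 0)
        std::memcpy(&data[0], &raw[0], whole * sizeof(int));
    return data;
}

// Writes the integers back out as raw bytes, again in native byte order.
void write_ints(std::ostream &out, const std::vector<int> &data)
{
    if (!data.empty())
        out.write(reinterpret_cast<const char *>(&data[0]),
                  static_cast<std::streamsize>(data.size() * sizeof(int)));
}
```

Calling read_ints in a loop until it returns an empty vector, manipulating each result, and passing it to write_ints gives the chunked read/modify/write cycle from the original question, with no text conversion in between.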

Jonathan Mcdougall 07-25-2003 09:46 PM

Re: Binary file I/O
 
>I just tried out your method, and it leaves me scratching my head.
>After stumbling briefly for lack of the header to define
>back_inserter() and ostream_iterator() (thanks Google and SGI), the
>code compiles fine:


This depends on the implementation. The standard does not specify
which header must be included by which; <iterator> probably got
included by <algorithm>, sorry about that.

>__________code__________________
>
>#include <fstream>
>#include <vector>
>#include <iterator>
>
>using namespace std;
>
>int main(){
> const int DATACHUNK = 20;
> char buffer[DATACHUNK];
>
> ifstream filein("shifttest.cpp");
> filein.read(buffer, DATACHUNK);
>
> vector<int> filedata;
> copy(buffer, buffer + DATACHUNK, back_inserter(filedata));
>
> ofstream fileout("shifttest.joe");
> copy(filedata.begin(), filedata.end(),
> ostream_iterator<int>(fileout, "\n" ));
>}
>
>_____end code_________________
>
>However, when I look at the file out, it contains:
>
>35
>105
>110
>99
>108
>117
>100
>101
>32
>60
>105
>111
>115
>116
>114
>101
>97
>109
>62
>10
>
>which is the ASCII representation of the integer representation of the
>ASCII sequence "#include <iostream>"
>which, strangely enough, happens to be the first line of
>"shifttest.cpp" ;-)


You asked for binary, that is what I gave you. If you want the ASCII
representation, just make the ostream_iterator <char>, that's it.

>This is really not at all what I am wanting to do. Now my 20 bytes is
>represented by 93 bytes


93 ?? Why do you say that?

> of a rather odd data-type...neither characters
>nor integers, but rather some strange beast that combines the worst of
>both worlds.


These numbers you saw are the ASCII value of the characters in the
file. The thing is, characters and integers are actually the very
same thing, it's just the output which makes the difference : ints are
displayed as numbers and chars are displayed as characters, which
depend on your implementation (but you are probably using ASCII).

Remember your subject is "Binary file I/O", not "Text file I/O".

>I'm left wondering, in this strange new world of C++ do I need to get
>used to dealing with ASCII representations of numbers for file I/O?


It depends on what you want. In the case of a simple text file
(remember, *text* is an ambiguous term in programming, everything boils
down to zeros and ones), values would be ASCII numbers and text would
be the representation on the screen (65 would be 'A').

In the case of a binary file (such as an image), values would be
simple numbers formatted according to the image's type (jpg, bmp..)
and text would be... garbage, since these numbers would be printed
according to the ASCII table (remember when you first started and
tried to display binary files on screen? Loads of smileys and beeps
and ascii graphics..).

>However, this seems a bit tedious, considering that this rigamarole
>doesn't really do anything to the internal data. I feel like there's
>something really basic that I don't *get* about streams... All I
>really want to do is "get at" the data in a file and treat that data
>as numbers typed to the native processor word size...then, manipulate
>the data and write the data out to a second file. Consider, for
>example, that the file consists of a binary bitmap and I want to
>invert it, or rotate it or something.


In that case, you would store every byte in a vector of whatever
(unsigned char would be the best, I think), you skip the header until
the data, you invert it and store the whole thing in a new file.

The actual type of the vector (or array, as you wish) does not matter
except for the memory wasted.
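
The inversion step might look like the sketch below. The function name invert_after_header is hypothetical, and header_size is an assumption supplied by the caller, since each real image format defines its own header layout.

```cpp
#include <cstddef>
#include <vector>

// Inverts every byte after the first header_size bytes and leaves the
// header untouched.
void invert_after_header(std::vector<unsigned char> &bytes,
                         std::size_t header_size)
{
    for (std::size_t i = header_size; i < bytes.size(); ++i)
        bytes[i] = static_cast<unsigned char>(~bytes[i]);  // 0x0F becomes 0xF0
}
```

The vector would be filled from a stream opened with std::ios::binary and written back with ostream::write, so that no text formatting is applied to the bytes.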

>Anyway...It's apparent that I have a lot to learn. This C++ is
>tantalizing me...the code is about 10 to 20 x faster than my old
>16-bit compiler...but jeez...what would seem to be a simple
>manipulation can become so frustrating!!! It feels a little like
>typing with my toes.


Hehe.. and you're still only playing with i/o.


Jonathan


All times are GMT.