Storing/processing binary file input help needed

 
 
Arnold
 
      01-06-2004
I need to read a binary file and store it into a buffer in memory (system
has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the
whole file has been processed. I haven't worked with binary files before so
I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since
that's what the function is taking as input? Or do I need to know the format
of the original data that binary file is encoding and store it in that?
That's the part that is really confusing me.

I believe I'll need to used fread to copy the file to that array. I plan on
getting the size of file, then determining how many DWORD are present in it
(for example 9000) and use that my number of object parameter in fread. So
in this case:

fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
file

Is that right?

Once I get the file into the buffer, I can then do a loop where I pass 512
elements of the array to a function until all 9000 elements are processed. I
hope that's right. Any other tips on improving speed and efficiency would be
appreciated. Thanks.



 
Kevin Goodsell
 
      01-06-2004
[Cross-post to comp.lang.c++ removed. If you want a C answer, ask here.
If you want a C++ answer, ask there. Don't ask in both places. C and C++
are two very different languages. The best solution in one may not even
be valid in the other.]

Arnold wrote:

> I need to read a binary file and store it into a buffer in memory (system
> has large amount of RAM, 2GB+) then pass it to a function. The function
> accepts input as 32 bit unsigned longs (DWORD).


You should leave the Microsoftisms at the door when you ask a question
here. We discuss standard, portable C in this group. We know what
unsigned long is. We don't know or care about DWORD.

> I can pass a max of 512
> words to it at a time. So I would pass them in chunks of 512 words until the
> whole file has been processed. I haven't worked with binary files before so
> I'm confused with how to store the binary file into memory. What sort of
> array do I use? Does C allow char only? Can I declare a DWORD buffer since
> that's what the function is taking as input?


You can declare your buffer basically any way you want, but the
functions for reading will always read a sequence of chars. The problem
with declaring the buffer as something other than char[] is that it
results in basically reinterpreting the raw bits, and the result may be
incorrect or even illegal (resulting in undefined behavior - possibly a
program crash) if the format of the file doesn't match the exact layout
that the C implementation uses for the type (unsigned long in this case).

Basically, you are talking about allowing the C implementation to
dictate the file format. Not only is this a bad idea, but it sounds like
it's backward in your case - the file format is already defined.

The correct, portable way to read a binary file is almost always to read
it as raw bytes, then convert the raw bytes according to the format of
the file. So if your file is made up of 4-byte unsigned values, stored
most-significant-byte first, you could do something like this:

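#include <limits.h> /* CHAR_BIT */
#include <stdio.h>  /* fread */

#define FIELD_BYTES 4

unsigned char buf[FIELD_BYTES];
unsigned long value = 0;
size_t i;

/* Read one field as raw bytes; fread returns the number of items read. */
if (fread(buf, FIELD_BYTES, 1, fp) == 1)
{
    /* Assemble the value most-significant byte first. */
    for (i=0; i<FIELD_BYTES; ++i)
    {
        value = (value << CHAR_BIT) | buf[i];
    }
}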

You could also handle more than one value at a time, with a little more
work.
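
Handling more than one value at a time could look roughly like the sketch below. It is only an illustration: the name decode_be32 is made up here, and it assumes the file stores 4-byte values most-significant-byte first and that each file byte is an 8-bit octet.

```c
#include <assert.h>
#include <stddef.h>

#define FIELD_BYTES 4

/* Hypothetical helper: decode `count` big-endian 4-byte fields from the
   raw byte buffer `raw` into `out`. Assumes 8-bit file bytes. */
void decode_be32(const unsigned char *raw, unsigned long *out, size_t count)
{
    size_t i, j;

    assert(raw != NULL && out != NULL);
    for (i = 0; i < count; ++i) {
        unsigned long value = 0;
        /* Same shift-and-or as the single-value case, once per field. */
        for (j = 0; j < FIELD_BYTES; ++j) {
            value = (value << 8) | raw[i * FIELD_BYTES + j];
        }
        out[i] = value;
    }
}
```

The raw buffer would be filled with a single fread of up to 512 * FIELD_BYTES bytes, and the number of complete fields is the byte count returned by fread divided by FIELD_BYTES.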

> Or do I need to know the format
> of the original data that binary file is encoding and store it in that?


Not sure what you mean by that. Of course you need to know the format of
the file, and write the code accordingly. You can't wave a magic wand
and make your code handle files in an unknown format.

> That's the part that is really confusing me.
>
> I believe I'll need to used fread to copy the file to that array. I plan on
> getting the size of file, then determining how many DWORD are present in it
> (for example 9000) and use that my number of object parameter in fread. So
> in this case:
>
> fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
> file
>
> Is that right?


It's a possible starting point. It's certainly not a complete, portable
solution.

>
> Once I get the file into the buffer, I can then do a loop where I pass 512
> elements of the array to a function until all 9000 elements are processed. I
> hope that's right. Any other tips on improving speed and efficiency would be
> appreciated. Thanks.


My main tip for improving speed and efficiency is don't even try to.
Write simple, correct code first. Only worry about making it faster if
it's determined to be too slow, and then profile to determine where the
time is being lost so you can target optimizing effort appropriately.

In particular, if you are only able to handle 512 elements at a time, I
wouldn't bother reading more than that from the file each iteration.
There's probably no need to read the entire file into memory, and it
would probably be more complicated. On the other hand, reading larger
blocks (and thus minimizing I/O function calls) /might/ improve
execution speed, but don't worry about that until it's time (as
described above).
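
To make the "read 512 elements per iteration" idea concrete, here is a minimal sketch. The names process_file_in_chunks, chunk_fn and sum_words are invented for this example (sum_words merely stands in for the real function that accepts up to 512 values), and the file is again assumed to hold big-endian 4-byte values made of 8-bit bytes.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_WORDS 512
#define FIELD_BYTES 4

/* Hypothetical callback type for the function that consumes each chunk. */
typedef void (*chunk_fn)(const unsigned long *words, size_t count, void *ctx);

/* Hypothetical stand-in for the real consumer: adds up all the values. */
void sum_words(const unsigned long *words, size_t count, void *ctx)
{
    unsigned long *sum = ctx;
    size_t i;
    for (i = 0; i < count; ++i) {
        *sum += words[i];
    }
}

/* Read fp in blocks of at most CHUNK_WORDS values, decode each block
   (big-endian assumed) and hand it to `handle`. Returns the number of
   values processed; a trailing partial value is ignored. */
size_t process_file_in_chunks(FILE *fp, chunk_fn handle, void *ctx)
{
    unsigned char raw[CHUNK_WORDS * FIELD_BYTES];
    unsigned long words[CHUNK_WORDS];
    size_t total = 0, got;

    assert(fp != NULL && handle != NULL);
    while ((got = fread(raw, 1, sizeof raw, fp)) >= FIELD_BYTES) {
        size_t count = got / FIELD_BYTES, i;
        for (i = 0; i < count; ++i) {
            const unsigned char *p = raw + i * FIELD_BYTES;
            words[i] = ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
                     | ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
        }
        handle(words, count, ctx); /* the last call may get fewer than 512 */
        total += count;
    }
    return total;
}
```

The file must be opened in binary mode ("rb"), and a caller would normally check ferror(fp) after the loop to tell a read error from end-of-file.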

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.
 
Martijn Lievaart
 
      01-06-2004
On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:

> Once I get the file into the buffer, I can then do a loop where I pass 512
> elements of the array to a function until all 9000 elements are processed. I
> hope that's right. Any other tips on improving speed and efficiency would be
> appreciated. Thanks.


As an alternative to the mmap solution from Gianni, the easiest way to do
this would be to read 512 words, process them, write back the result, and
repeat until end-of-file. No need to read the whole file into memory.

You can write back results in place, if they should occupy the same
storage, or to some other file. If the data has to be replaced, it is
often best to write the output to a new file, then move the new file over
the old file. That way you will not corrupt the original file if your
program crashes half way through.

HTH,
M4

 
sathyashrayan
 
      01-06-2004
"Arnold" <(E-Mail Removed)> wrote in message news:<g4uKb.395$(E-Mail Removed) >...

I am not a C wizard but I have some suggestions.

> I need to read a binary file and store it into a buffer in memory (system
> has large amount of RAM, 2GB+) then pass it to a function. The function
> accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> words to it at a time. So I would pass them in chunks of 512 words until the
> whole file has been processed.


By "words", do you mean chunks of chars delimited by ASCII spaces? Or is
each "word" 512 bytes in size?

> I'm confused with how to store the binary file into memory. What sort of
> array do I use? Does C allow char only? Can I declare a DWORD buffer since
> that's what the function is taking as input? Or do I need to know the format
> of the original data that binary file is encoding and store it in that?
> That's the part that is really confusing me.


By "binary file" and "file format", are you talking about the first two
letters in a file, as under DOS (for example, MZ in an .exe file), or about
the format of the data present in the file (fields and records with some
kind of delimiter)? If it is the second, then it has more to do with the
file's record design.

>
> I believe I'll need to used fread to copy the file to that array. I plan on
> getting the size of file, then determining how many DWORD are present in it
> (for example 9000) and use that my number of object parameter in fread. So
> in this case:
>
> fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
> file
>
> Is that right?



Is it always just 512 elements, or is the count unknown until run time?
Isn't it time to consider a linked list rather than an array?



> Once I get the file into the buffer, I can then do a loop where I pass 512
> elements of the array to a function until all 9000 elements are processed. I
> hope that's right. Any other tips on improving speed and efficiency would be
> appreciated. Thanks.


Optimizing in C is not a kind of "instructions management" like in
asm.
 
Arnold
 
      01-06-2004

"Martijn Lievaart" <(E-Mail Removed)> wrote in message
news(E-Mail Removed) rt.rtij.nl...
> On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:
>
> > Once I get the file into the buffer, I can then do a loop where I pass 512
> > elements of the array to a function until all 9000 elements are processed. I
> > hope that's right. Any other tips on improving speed and efficiency would be
> > appreciated. Thanks.
>
> As an alternative to the mmap solution from Gianni, the easiest way to do
> this would be to read 512 words, process them, write back result, repeat
> until end-of-file. No need to read the whole file in memory.


I thought of that but speed is a concern so I want to keep the number of
disk accesses at a minimum.

>
> You can write back results in place, if they should occupy the same
> storage, or to some other file. If the data has to be replaced, it is
> often best to write the output to a new file, then move the new file over
> the old file. That way you will not corrupt the original file if your
> program crashes half way through.
>


In my case, I don't have to write any data back to the original file. Thanks
for the suggestions.
> HTH,
> M4
>



 
Arnold
 
      01-06-2004

"sathyashrayan" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) om...
> "Arnold" <(E-Mail Removed)> wrote in message news:<g4uKb.395$(E-Mail Removed) >...
>
> I am not a C wizard but I have some suggestions.
>
> > I need to read a binary file and store it into a buffer in memory (system
> > has large amount of RAM, 2GB+) then pass it to a function. The function
> > accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> > words to it at a time. So I would pass them in chunks of 512 words until the
> > whole file has been processed.

>
> By the term "words" means to say that it is a chunk of chars and a
> delimiters with an ASCII space? Or each "words" size is 512 bytes?


Each word is a DWORD, so each one is 32 bits. I can pass a maximum of 512
DWORDs at a time to the function.


> > I'm confused with how to store the binary file into memory. What sort of
> > array do I use? Does C allow char only? Can I declare a DWORD buffer since
> > that's what the function is taking as input? Or do I need to know the format
> > of the original data that binary file is encoding and store it in that?
> > That's the part that is really confusing me.

>
> By the term binary file and file format are you talking about the
> first two letters in a file according to the DOS assembly language
> (example MZ in .exe file) or the format of data present in a file
> (fields and record with a kind of delimiter). If it is the second then
> it is more related with the file's record design concept.


It is the second.

>
> >
> > I believe I'll need to used fread to copy the file to that array. I plan on
> > getting the size of file, then determining how many DWORD are present in it
> > (for example 9000) and use that my number of object parameter in fread. So
> > in this case:
> >
> > fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
> > file
> >
> > Is that right?

>
>
> Just 512 elements or unknown during the run time? Is not the time to
> take up with linked list rather than using array data type?
>


512 is the maximum the function can handle at a time, so that is fixed,
except for the last iteration, as the file won't necessarily contain a
multiple of 512 DWORDs.

>
>
> > Once I get the file into the buffer, I can then do a loop where I pass 512
> > elements of the array to a function until all 9000 elements are processed. I
> > hope that's right. Any other tips on improving speed and efficiency would be
> > appreciated. Thanks.

>
> Optimizing in C is not a kind of "instructions management" like in
> asm.



 
Sean Kenwrick
 
      01-06-2004

"Arnold" <(E-Mail Removed)> wrote in message
news:g4uKb.395$(E-Mail Removed). ..
> I need to read a binary file and store it into a buffer in memory (system
> has large amount of RAM, 2GB+) then pass it to a function. The function
> accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> words to it at a time. So I would pass them in chunks of 512 words until the
> whole file has been processed. I haven't worked with binary files before so
> I'm confused with how to store the binary file into memory. What sort of
> array do I use? Does C allow char only? Can I declare a DWORD buffer since
> that's what the function is taking as input? Or do I need to know the format
> of the original data that binary file is encoding and store it in that?
> That's the part that is really confusing me.
>
> I believe I'll need to used fread to copy the file to that array. I plan on
> getting the size of file, then determining how many DWORD are present in it
> (for example 9000) and use that my number of object parameter in fread. So
> in this case:
>
> fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
> file
>
> Is that right?
>


You don't need to read the whole file, you can read 512 bytes at a time into
a buffer of appropriate size:

char buffer[512];
size_t x = fread(buffer, 1, sizeof buffer, fp); // don't forget to check the value of x
                                                // (the number of bytes actually read)
...

You can then pass a pointer to this buffer to your function, which has been
prototyped to accept an array of DWORD and the number of elements to process
(which will be x/4 from the fread above), e.g.

int process_buf(DWORD *my_array, int number_of_elements);

Then your function can iterate across this array as follows:

int process_buf(DWORD *my_array, int number_of_elements)
{
    int i;
    DWORD next_val;
    for (i = 0; i < number_of_elements; i++) {
        /* You might need to convert from big-endian to
           little-endian here (see below) */
        next_val = my_array[i];
    }
    return 0;
}


Of course this makes an assumption that the data in the file is stored in
the same byte order as the processor you are running your program on (most
likely you are using an Intel Pentium so Little-Endian is the byte order you
are assuming). If the file uses another byte order then you can write
(or google for) a macro that will do the conversion for you..
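
As a rough sketch of such a conversion, written as functions rather than a macro: the names swap32 and swap32_array are made up for this example, and the values are assumed to be 32-bit quantities held in unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value.
   Any bits above the low 32 are discarded by the masks. */
unsigned long swap32(unsigned long v)
{
    return ((v & 0x000000FFUL) << 24)
         | ((v & 0x0000FF00UL) << 8)
         | ((v & 0x00FF0000UL) >> 8)
         | ((v & 0xFF000000UL) >> 24);
}

/* Hypothetical helper: byte-swap `count` values in place. */
void swap32_array(unsigned long *words, size_t count)
{
    size_t i;

    assert(words != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        words[i] = swap32(words[i]);
    }
}
```

This only helps once you know the file's byte order differs from the machine's; swapping unconditionally would break the program on a machine whose byte order already matches the file.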

Hope this helps
Sean





 
Barry Schwarz
 
      01-06-2004
On Tue, 06 Jan 2004 08:10:52 GMT, "Arnold" <(E-Mail Removed)>
wrote:

>I need to read a binary file and store it into a buffer in memory (system
>has large amount of RAM, 2GB+) then pass it to a function. The function
>accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
>words to it at a time. So I would pass them in chunks of 512 words until the
>whole file has been processed. I haven't worked with binary files before so
>I'm confused with how to store the binary file into memory. What sort of
>array do I use? Does C allow char only? Can I declare a DWORD buffer since
>that's what the function is taking as input? Or do I need to know the format
>of the original data that binary file is encoding and store it in that?
>That's the part that is really confusing me.


The I/O function (fread as you suggest below) does not care how you
define the buffer. However, how you use the buffer may make a
difference. If you define the buffer as unsigned char, then you are
guaranteed that all possible 256 values are acceptable (unsigned char
cannot have trap values) and the buffer will be portable (at least for
systems which have CHAR_BIT defined as 8). If you define the buffer
as DWORD, are you sure that all 4 billion plus possible values that
could come from a binary file are acceptable and your program will
never execute on a machine with a different sizeof(unsigned long)?

>
>I believe I'll need to used fread to copy the file to that array. I plan on
>getting the size of file, then determining how many DWORD are present in it
>(for example 9000) and use that my number of object parameter in fread. So
>in this case:


There is no portable way to get the file size (unless you read the
entire file) so you probably need to use a system specific extension
or function for this.
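
For illustration, the "read the entire file" route could be sketched as below. The name count_bytes is invented for this example. The common fseek(fp, 0, SEEK_END) plus ftell idiom is left out on purpose, since the standard does not require binary streams to support SEEK_END meaningfully.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes from the current position to
   end-of-file by reading them, then rewind the stream to its start.
   Returns (size_t)-1 if a read error occurs. */
size_t count_bytes(FILE *fp)
{
    unsigned char buf[4096];
    size_t total = 0, got;

    assert(fp != NULL);
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        total += got;
    }
    if (ferror(fp)) {
        return (size_t)-1;
    }
    rewind(fp); /* also clears the end-of-file indicator */
    return total;
}
```

Dividing the result by 4 gives the number of complete 4-byte values. The cost is a second pass over the file, which is one more reason to process the data chunk by chunk instead of sizing a buffer up front.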

>
>fread(buffer, 4,9000,fp); //each DWORD is 4 bytes, 900 DWORDs in my binary
>file


You meant 9000.

>
>Is that right?
>
>Once I get the file into the buffer, I can then do a loop where I pass 512
>elements of the array to a function until all 9000 elements are processed. I
>hope that's right. Any other tips on improving speed and efficiency would be
>appreciated. Thanks.


How you pass a quantity of array elements will determine the
suitability of your design. (Actually, the method of passing the
argument(s) should drive the design.) What is the prototype for the
receiving function?

The odds of the file containing an exact multiple of 512 DWORDs are
about 1 in 500, so you may want to be able to handle the last set as a
smaller quantity.
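
One way to handle that short last set, sketched with a made-up helper name (next_chunk_len) and assuming the data already sits in an array of `total` elements:

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_WORDS 512

/* Hypothetical helper: how many elements to pass in the chunk that
   starts at `offset`, given `total` elements overall. */
size_t next_chunk_len(size_t total, size_t offset)
{
    size_t left;

    assert(offset <= total);
    left = total - offset;
    return left < CHUNK_WORDS ? left : CHUNK_WORDS;
}
```

With 9000 elements this gives 17 full chunks of 512 (8704 elements) followed by one chunk of 296. A caller would loop with offset starting at 0, pass data + offset and the returned length to the receiving function, and advance offset by that length until it reaches total.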



<<Remove the del for email>>
 
Jack Klein
 
      01-07-2004
On Tue, 06 Jan 2004 03:41:14 -0500, Michael B Allen
<(E-Mail Removed)> wrote in comp.lang.c:

> On Tue, 06 Jan 2004 03:10:52 -0500, Arnold wrote:
>
> > I need to read a binary file and store it into a buffer in memory
> > (system has large amount of RAM, 2GB+) then pass it to a function. The
> > function accepts input as 32 bit unsigned longs (DWORD). I can pass a
> > max of 512 words to it at a time. So I would pass them in chunks of 512
> > words until the whole file has been processed. I haven't worked with
> > binary files before so I'm confused with how to store the binary file
> > into memory.

>
> The term "binary file" is a bit of a misnomer. It just means it's not
> text. Otherwise *everything* is "binary".
>
> > What sort of array do I use? Does C allow char only? Can I
> > declare a DWORD buffer since that's what the function is taking as
> > input? Or do I need to know the format of the original data that binary
> > file is encoding and store it in that? That's the part that is really
> > confusing me.

>
> Pretend for a minute that you have a really big array in memory:
>
> struct mystruct {
> int foo;
> char bar[10];
> float zap;
> };
> ...
> struct mystruct *s = malloc(100000 * sizeof(struct mystruct));


Are you new in comp.lang.c? Everybody here by now should know the clc
preferred idiom:

struct mystruct *s = malloc(100000 * sizeof *s);

...and the magic number is anathema, of course, so:

#define NUM_STRUCTS 100000

struct mystruct *s = malloc(NUM_STRUCTS * sizeof *s);

> populate(s);
>
> If you write this array to a file you have a "binary file". Now you could
> do the reverse and read in your array from the file. At least you can
> on the same machine. If you write the file on an a litte-endian i386 and
> read it in on a big-endian Sparc you're going to have endianness problems.
>
> Mike
>
> PS: This question didn't warrant cross-posting to two different news
> groups. Please refrain from doing that. Some people will simply not
> answer your question when they see that.


Why not? The fread() function is part of the standard C++ library as
well, so the post is topical there, and two is certainly not an
excessive number of groups for a cross-post.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
 
Martijn Lievaart
 
      01-07-2004
On Tue, 06 Jan 2004 18:01:29 +0000, Arnold wrote:

>
> "Martijn Lievaart" <(E-Mail Removed)> wrote in message
> news(E-Mail Removed) rt.rtij.nl...
>> On Tue, 06 Jan 2004 08:10:52 +0000, Arnold wrote:
>>
>> > Once I get the file into the buffer, I can then do a loop where I pass 512
>> > elements of the array to a function until all 9000 elements are processed. I
>> > hope that's right. Any other tips on improving speed and efficiency would be
>> > appreciated. Thanks.
>>
>> As an alternative to the mmap solution from Gianni, the easiest way to do
>> this would be to read 512 words, process them, write back result, repeat
>> until end-of-file. No need to read the whole file in memory.

>
> I thought of that but speed is a concern so I want to keep the number of
> disk accesses at a minimum.


Memory mapping the file is probably still the best way, but suffers from a
size limit. To get around this, you can also read in large chunks of the
file. Instead of 512 words, read a few hundred KB at a time and operate on
that. Experiment with buffer sizes to see what gives the best result.

I'm not sure what will be faster. Large buffers reduce the number of
system calls slightly (good), but decrease locality of reference (bad).
The mmap solution does not suffer from either of these disadvantages, I think.

Note that the number of disk accesses will be the same whatever solution
you choose. You have to read the whole file, period. I guess the main speed
factors are the number of system calls and how effectively you use your
memory. Also, you should try to do some useful work while waiting for the
disk; maybe asynchronous I/O or multithreading can be of help?

(If you look into multithreading, be sure you know which synchronisation
mechanisms are lightweight and which are heavyweight; it makes a huge difference.)

I would just try a simple solution. If it isn't fast enough, try others.
Profile to see where your program spends its time. If most of the time is
spent on calculations, all of the above will give only very marginal
speedups. If run on a fast machine, maybe a naive implementation will be
fast enough for your needs. Remember the old truism about optimizing:
Don't (until you have proven you need it).

HTH,
M4

 