Reading whole text files

 
 
Michael Mair
 
      02-10-2005
Cheerio,


I would appreciate opinions on the following:

Given the task to read a _complete_ text file into a string:
What is the "best" way to do it?
Handling the buffer is not the problem -- the character
input is a different matter, at least if I want to remain within
the bounds of the standard library.

Essentially, I can think of three variants:
- Low: Use fgetc(). Simple, straightforward, probably inefficient.
- Default: Use fgets(); ugly, if we are not interested in lines
and have many newline characters to read.
- Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
XSTR(BUFLEN) gives me BUFLEN in a string literal.

From the labels, it is pretty obvious that I would favour the
last one, so there is the question about possible pitfalls
(yes, I will use the return value and "read") and whether there
are environmental limits for BUFLEN.
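
For reference, a minimal sketch of the third variant. The two-level STR/XSTR macro pair, the BUFLEN value of 16 and the helper name read_chunk are illustrative assumptions, not code from this thread. Note that fscanf() needs the stream as its first argument, and that %Nc asks for exactly N characters: what is reported for a short final chunk (whether the directive counts as matched and whether %n is reached) should be checked against the implementation in use.

```c
#include <stdio.h>

#define STR(x)  #x
#define XSTR(x) STR(x)      /* expands BUFLEN first, then stringifies it */
#define BUFLEN  16          /* must be a plain decimal literal for this trick */

/* Hypothetical helper: read one chunk of up to BUFLEN characters into curr
   (which must have room for BUFLEN chars; no '\0' is appended).
   Returns the count reported by %n when the %c directive succeeded, else 0.
   The result for a short final chunk depends on the implementation. */
int read_chunk(FILE *fp, char *curr)
{
    int read = 0;
    /* after macro expansion the format string is "%16c%n" */
    int rc = fscanf(fp, "%" XSTR(BUFLEN) "c%n", curr, &read);

    return (rc == 1) ? read : 0;
}
```

A caller would invoke read_chunk() in a loop, advancing curr by the returned count, and stop on 0.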


If I missed some obvious source (looking for the wrong sort of
stuff in the FAQ and google archives), then please point me
toward it


Regards,
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
 
 
 
 
 
infobahn
 
      02-10-2005
Michael Mair wrote:
>
> Cheerio,
>
> I would appreciate opinions on the following:
>
> Given the task to read a _complete_ text file into a string:
> What is the "best" way to do it?
> Handling the buffer is not the problem -- the character
> input is a different matter, at least if I want to remain within
> the bounds of the standard library.
>
> Essentially, I can think of three variants:
> - Low: Use fgetc(). Simple, straightforward, probably inefficient.


Why inefficient? I'd prefer getc in case you're fortunate enough
to have it implemented as a macro, but it should be efficient
enough.

> - Default: Use fgets(); ugly, if we are not interested in lines
> and have many newline characters to read.


And you have to maintain /two/ buffers (quite apart from the buffer
maintained by your text stream handler) - your expanding buffer,
and the buffer you give to fgets (unless you use the expanding
buffer for that too, which is certainly doable but probably gives
you more headaches).

> - Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
> XSTR(BUFLEN) gives me BUFLEN in a string literal.
>
> From the labels, it is pretty obvious that I would favour the
> last one,


Fine, so use that. But it wouldn't be my choice.

Vive la difference!
 
 
 
 
 
Michael Mair
 
      02-10-2005
infobahn wrote:
> Michael Mair wrote:
>
>>Cheerio,
>>
>>I would appreciate opinions on the following:
>>
>>Given the task to read a _complete_ text file into a string:
>>What is the "best" way to do it?
>>Handling the buffer is not the problem -- the character
>>input is a different matter, at least if I want to remain within
>>the bounds of the standard library.
>>
>>Essentially, I can think of three variants:
>>- Low: Use fgetc(). Simple, straightforward, probably inefficient.

>
> Why inefficient? I'd prefer getc in case you're fortunate enough
> to have it implemented as a macro, but it should be efficient
> enough.


"Probably" inefficient in that I cannot rely on getc() being
implemented as a macro and that I do not want to make assumptions
about the underlying library. So, essentially, the question is
for me whether having a loop in my code is "better" than just
telling fscanf() to get, say 8K characters in one go.
The main beauty of this approach lies for me in the clarity of the
code. Thanks for reminding me of getc() vs. fgetc().

>>- Default: Use fgets(); ugly, if we are not interested in lines
>> and have many newline characters to read.

>
> And you have to maintain /two/ buffers (quite apart from the buffer
> maintained by your text stream handler) - your expanding buffer,
> and the buffer you give to fgets (unless you use the expanding
> buffer for that too, which is certainly doable but probably gives
> you more headaches).


Actually, I have implemented it first with fgets() and one extending
buffer but found, looking at the final code, that approach too unwieldy
and error prone, as you need more code and variables.
Usually, I would have gone for the "Low" approach due to the clarity
of the resulting code but -- as I was at it -- I just asked myself
which options I have.
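
For comparison, a minimal sketch of the single-buffer fgets() approach described above. The helper name slurp_with_fgets and the initial capacity of 128 are assumptions made for illustration; this is not the code referred to in the post. The bookkeeping it needs (tracking the free tail, growing before the tail drops below two bytes) gives an idea of why the approach can feel unwieldy.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the rest of fp into one malloc'ed string by
   letting fgets() write directly into the free tail of the buffer.
   Returns NULL on allocation failure or read error; the caller frees. */
char *slurp_with_fgets(FILE *fp)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    buf[0] = '\0';

    for (;;) {
        size_t room = cap - len;        /* always >= 2 at this point */
        int n = room > INT_MAX ? INT_MAX : (int)room;

        if (fgets(buf + len, n, fp) == NULL)
            break;                      /* end of file or read error */
        len += strlen(buf + len);       /* stops early at an embedded '\0' */

        if (cap - len < 2) {            /* tail too small for another call */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    return buf;
}
```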


>>- Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
>> XSTR(BUFLEN) gives me BUFLEN in a string literal.
>>
>> From the labels, it is pretty obvious that I would favour the
>>last one,

>
> Fine, so use that. But it wouldn't be my choice.


I _was_ asking for opinions.


> Vive la difference!



Thank you for your input!


Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

 
 
jacob navia
 
      02-10-2005
Michael Mair wrote:
> Cheerio,
>
>
> I would appreciate opinions on the following:
>
> Given the task to read a _complete_ text file into a string:
> What is the "best" way to do it?
> Handling the buffer is not the problem -- the character
> input is a different matter, at least if I want to remain within
> the bounds of the standard library.
>
> Essentially, I can think of three variants:
> - Low: Use fgetc(). Simple, straightforward, probably inefficient.
> - Default: Use fgets(); ugly, if we are not interested in lines
> and have many newline characters to read.
> - Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
> XSTR(BUFLEN) gives me BUFLEN in a string literal.
>
> From the labels, it is pretty obvious that I would favour the
> last one, so there is the question about possible pitfalls
> (yes, I will use the return value and "read") and whether there
> are environmental limits for BUFLEN.
>
>
> If I missed some obvious source (looking for the wrong sort of
> stuff in the FAQ and google archives), then please point me
> toward it
>
>
> Regards,
> Michael


What about this?

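#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the whole file into a zero-terminated, calloc'ed buffer.
   Returns NULL on failure; *plen receives the number of bytes read. */
char *ReadFileIntoRam(char *fname,int *plen)
{
    FILE *infile;
    char *contents;
    int actualBytesRead=0;
    long len;

    *plen = 0;
    infile = fopen(fname,"rb");
    if (infile == NULL) {
        fprintf(stderr,"impossible to open %s\n",fname);
        return NULL;
    }
    /* Find the size by seeking to the end; give up if any step fails. */
    if (fseek(infile,0,SEEK_END) != 0 || (len = ftell(infile)) < 0
        || fseek(infile,0,SEEK_SET) != 0) {
        fprintf(stderr,"Can't determine the size of %s\n",fname);
        fclose(infile);
        return NULL;
    }
    contents = calloc((size_t)len+1,1);
    if (contents) {
        actualBytesRead = (int)fread(contents,1,(size_t)len,infile);
    }
    else {
        fprintf(stderr,"Can't allocate memory to read the file\n");
    }
    fclose(infile);
    *plen = actualBytesRead;
    return contents;
}

int main(int argc,char *argv[])
{
    int len=0;
    char *contents;

    if (argc < 2) {
        printf("usage: readfile <filename>\n");
        exit(1);
    }
    contents=ReadFileIntoRam(argv[1],&len);
    if (contents == NULL)
        return EXIT_FAILURE;
    // work with the contents of the file
    free(contents);
    return 0;
}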
 
 
Michael Mair
 
      02-10-2005


jacob navia wrote:
> Michael Mair wrote:
>
>> Cheerio,
>>
>>
>> I would appreciate opinions on the following:
>>
>> Given the task to read a _complete_ text file into a string:
>> What is the "best" way to do it?
>> Handling the buffer is not the problem -- the character
>> input is a different matter, at least if I want to remain within
>> the bounds of the standard library.
>>
>> Essentially, I can think of three variants:
>> - Low: Use fgetc(). Simple, straightforward, probably inefficient.
>> - Default: Use fgets(); ugly, if we are not interested in lines
>> and have many newline characters to read.
>> - Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
>> XSTR(BUFLEN) gives me BUFLEN in a string literal.
>>
>> From the labels, it is pretty obvious that I would favour the
>> last one, so there is the question about possible pitfalls
>> (yes, I will use the return value and "read") and whether there
>> are environmental limits for BUFLEN.
>>
>>
>> If I missed some obvious source (looking for the wrong sort of
>> stuff in the FAQ and google archives), then please point me
>> toward it
>>
>>
>> Regards,
>> Michael

>
>
> What about this?
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> char *ReadFileIntoRam(char *fname,int *plen)
> {
> FILE *infile;
> char *contents;
> int actualBytesRead=0;
> unsigned int len;
>
> infile = fopen(fname,"rb");


Here is the crux: I want/have to work with a _text_ file.
Everything else may give me wrong results.

> if (infile == NULL) {
> fprintf(stderr,"impossible to open %s\n",fname);
> return NULL;
> }
> fseek(infile,0,SEEK_END);
> len = ftell(infile);
> fseek(infile,0,SEEK_SET);
> contents = calloc(len+1,1);
> if (contents) {
> actualBytesRead = fread(contents,1,len,infile);


This is what I would do for binary files.
Essentially, I am looking for the text file equivalent of fread().

> }
> else {
> fprintf(stderr,"Can't allocate memory to read the file\n");
> }
> fclose(infile);
> *plen = actualBytesRead;
> return contents;
> }
>
> int main(int argc,char *argv[])
> {
> if (argc < 2) {
> printf("usage: readfile <filename>\n");
> exit(1);
> }
> int len=0;
> char *contents=ReadFileIntoRam(argv[1],&len);
> // work with the contents of the file
> }


Thank you for trying


Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

 
 
S.Tobias
 
      02-10-2005
infobahn wrote:
> Michael Mair wrote:
> >
> > Cheerio,
> >
> > I would appreciate opinions on the following:
> >
> > Given the task to read a _complete_ text file into a string:
> > What is the "best" way to do it?
> > Handling the buffer is not the problem -- the character
> > input is a different matter, at least if I want to remain within
> > the bounds of the standard library.
> >
> > Essentially, I can think of three variants:
> > - Low: Use fgetc(). Simple, straightforward, probably inefficient.


> Why inefficient? I'd prefer getc in case you're fortunate enough
> to have it implemented as a macro, but it should be efficient
> enough.


In thread-safe libraries getc() family functions can actually
be quite inefficient, because they must lock the stream object,
which takes time. This is the reason why some systems provide
getc_unlocked() (thread-unsafe) family (I remember a noticeable
difference between them in my tests some time ago).

+++

Excuse my ignorance, I have no experience with text files in
the C Std context. Why wouldn't fread() be suitable for
reading text files? 7.19.8.1p2 says that fread() works as if by
calling fgetc() underneath.
I haven't spotted any mention where these functions would be
constrained to binary streams only.
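
To make the question concrete, here is a minimal sketch of reading a whole stream with fread() in chunks. It assumes the caller opened the file with "r", so that any text-mode translation is done by the library; the helper name read_stream and the starting capacity of 4096 are hypothetical. On systems that translate line endings, the returned length may be smaller than the size of the file on disk.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything left on fp (text or binary stream)
   into a malloc'ed, '\0'-terminated buffer using fread() in chunks.
   Stores the number of characters read in *plen if plen is not NULL.
   Returns NULL on allocation failure or read error; the caller frees. */
char *read_stream(FILE *fp, size_t *plen)
{
    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    /* one byte is always kept in reserve for the terminator */
    while ((got = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += got;
        if (cap - len - 1 == 0) {       /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (plen != NULL)
        *plen = len;
    return buf;
}
```

No fseek()/ftell() is needed, so the same function also works on streams that cannot be repositioned.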

--
Stan Tobias
mailx `echo LID | sed s/[[:upper:]]//g`
 
 
Michael Mair
 
      02-10-2005


S.Tobias wrote:
> infobahn wrote:
>
>>Michael Mair wrote:
>>
>>>Cheerio,
>>>
>>>I would appreciate opinions on the following:
>>>
>>>Given the task to read a _complete_ text file into a string:
>>>What is the "best" way to do it?
>>>Handling the buffer is not the problem -- the character
>>>input is a different matter, at least if I want to remain within
>>>the bounds of the standard library.
>>>
>>>Essentially, I can think of three variants:
>>>- Low: Use fgetc(). Simple, straightforward, probably inefficient.

>
>
>>Why inefficient? I'd prefer getc in case you're fortunate enough
>>to have it implemented as a macro, but it should be efficient
>>enough.

>
>
> In thread-safe libraries getc() family functions can actually
> be quite inefficient, because they must lock the stream object,
> which takes time. This is the reason why some systems provide
> getc_unlocked() (thread-unsafe) family (I remember a noticeable
> difference between them in my tests some time ago).


Interesting.

> +++
>
> Excuse my ignorance, I have no experience with text files in
> the C Std context. Why wouldn't fread() be suitable for
> reading text files? In 7.19.8p2 it says the fread() call is
> performed as if by use of fgetc() function in the bottom.
> I haven't spotted any mention where these functions would be
> constrained to binary streams only.


It seems I am plain stupid... Somewhere in my brain, there was
"fread()/fwrite() <-> binary I/O" hardwired :-/
So, if I open the stream as text stream, everything should be
fine. (If this is wrong, please correct me.)
Moreover, if I read the data into dynamically allocated
storage pointed to by an unsigned char *, I circumvent potential
problems with the is** functions from <ctype.h> (as I asked in
another thread).
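
The <ctype.h> point can be sketched as follows. The is* functions take an int whose value must be representable as an unsigned char (or be EOF), so a plain char holding a negative value is not a safe argument. The helper name count_spaces is hypothetical and only serves as an illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count whitespace characters in a buffer of len bytes.
   Viewing the data as unsigned char keeps every argument to isspace()
   in the range the is* functions accept, even for bytes above 127. */
size_t count_spaces(const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i, count = 0;

    for (i = 0; i < len; i++) {
        if (isspace(p[i]))
            count++;
    }
    return count;
}
```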

Thank you


Cheers
Michael
--
E-Mail: Mine is a gmx dot de address.

 
 
Michael Mair
 
      02-10-2005


Michael Mair wrote:
>
>
> jacob navia wrote:
>
>> Michael Mair wrote:
>>
>>> Cheerio,
>>>
>>>
>>> I would appreciate opinions on the following:
>>>
>>> Given the task to read a _complete_ text file into a string:
>>> What is the "best" way to do it?
>>> Handling the buffer is not the problem -- the character
>>> input is a different matter, at least if I want to remain within
>>> the bounds of the standard library.
>>>
>>> Essentially, I can think of three variants:
>>> - Low: Use fgetc(). Simple, straightforward, probably inefficient.
>>> - Default: Use fgets(); ugly, if we are not interested in lines
>>> and have many newline characters to read.
>>> - Interesting: fscanf("%"XSTR(BUFLEN)"c%n", curr, &read), where
>>> XSTR(BUFLEN) gives me BUFLEN in a string literal.
>>>
>>> From the labels, it is pretty obvious that I would favour the
>>> last one, so there is the question about possible pitfalls
>>> (yes, I will use the return value and "read") and whether there
>>> are environmental limits for BUFLEN.
>>>
>>>
>>> If I missed some obvious source (looking for the wrong sort of
>>> stuff in the FAQ and google archives), then please point me
>>> toward it
>>>
>>>
>>> Regards,
>>> Michael

>>
>>
>>
>> What about this?
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> char *ReadFileIntoRam(char *fname,int *plen)
>> {
>> FILE *infile;
>> char *contents;
>> int actualBytesRead=0;
>> unsigned int len;
>>
>> infile = fopen(fname,"rb");

>
>
> Here is the crux: I want/have to work with a _text_ file.
> Everything else may give me wrong results.


Sorry, the "b" brought me back onto the wrong track I already
was on. See the other subthread.


Cheers
Michael
>
>> if (infile == NULL) {
>> fprintf(stderr,"impossible to open %s\n",fname);
>> return NULL;
>> }
>> fseek(infile,0,SEEK_END);
>> len = ftell(infile);
>> fseek(infile,0,SEEK_SET);
>> contents = calloc(len+1,1);
>> if (contents) {
>> actualBytesRead = fread(contents,1,len,infile);

>
>
> This is what I would do for binary files.
> Essentially, I am looking for the text file equivalent of fread().
>
>> }
>> else {
>> fprintf(stderr,"Can't allocate memory to read the file\n");
>> }
>> fclose(infile);
>> *plen = actualBytesRead;
>> return contents;
>> }
>>
>> int main(int argc,char *argv[])
>> {
>> if (argc < 2) {
>> printf("usage: readfile <filename>\n");
>> exit(1);
>> }
>> int len=0;
>> char *contents=ReadFileIntoRam(argv[1],&len);
>> // work with the contents of the file
>> }

>
>
> Thank you for trying
>
>
> Cheers
> Michael



--
E-Mail: Mine is a gmx dot de address.

 
 
SM Ryan
 
      02-10-2005
Michael Mair wrote:
# Cheerio,
#
#
# I would appreciate opinions on the following:
#
# Given the task to read a _complete_ text file into a string:
# What is the "best" way to do it?
# Handling the buffer is not the problem -- the character
# input is a different matter, at least if I want to remain within
# the bounds of the standard library.
#
# Essentially, I can think of three variants:
# - Low: Use fgetc(). Simple, straightforward, probably inefficient.

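char *contents=0,*tmp; size_t m=0,n=0; int ch;
while ((ch=fgetc(file))!=EOF) {
    if (n+2>=m) {   /* keep room for ch and the terminating 0 */
        m = 2*n+2;
        if ((tmp = realloc(contents,m))==0) {free(contents); exit(EXIT_FAILURE);}
        contents = tmp;
    }
    contents[n++] = ch; contents[n] = 0;
}
/* shrink to fit; for an empty file this allocates the empty string */
if ((tmp = realloc(contents,n+1))!=0) {contents = tmp; contents[n] = 0;}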
You might also include #ifdef/#endif code to use memory mapping on systems
that support it.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
This is one wacky game show.
 
 
Al Bowers
 
      02-10-2005


Michael Mair wrote:

>>> Given the task to read a _complete_ text file into a string:
>>> What is the "best" way to do it?
>>> Handling the buffer is not the problem -- the character
>>> input is a different matter, at least if I want to remain within
>>> the bounds of the standard library.
>>>
>>> Essentially, I can think of three variants:
>>> - Low: Use fgetc(). Simple, straightforward, probably inefficient.

>>
>>
>> Why inefficient? I'd prefer getc in case you're fortunate enough
>> to have it implemented as a macro, but it should be efficient
>> enough.

>
>
> "Probably" inefficient in that I cannot rely on getc() being
> implemented as a macro and that I do not want to make assumptions
> about the underlying library. So, essentially, the question is
> for me whether having a loop in my code is "better" than just
> telling fscanf() to get, say 8K characters in one go.
> The main beauty of this approach lies for me in the clarity of the
> code. Thanks for reminding me of getc() vs. fgetc().
>
>>> - Default: Use fgets(); ugly, if we are not interested in lines
>>> and have many newline characters to read.

>>


My intuition is that the definition of a "_complete_" text file
would require the "ugly". Hence, I would use function fgets in
a loop.

>>
>> And you have to maintain /two/ buffers (quite apart from the buffer
>> maintained by your text stream handler) - your expanding buffer,
>> and the buffer you give to fgets (unless you use the expanding
>> buffer for that too, which is certainly doable but probably gives
>> you more headaches).

>
>
> Actually, I have implemented it first with fgets() and one extending
> buffer but found, looking at the final code, that approach too unwieldy
> and error prone, as you need more code and variables.


Use fgets to copy into a buffer, and then append to an
expanding dynamically allocated char array. This is not unwieldy.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    char buffer[128], *fstr, *tmp;
    size_t slen, blen;
    FILE *fp;

    if((fp = fopen("test.c","r")) == NULL) exit(EXIT_FAILURE);
    for(slen = 0, fstr = NULL;
        (fgets(buffer,sizeof buffer, fp)) ; slen+=blen)
    {
        blen = strlen(buffer);
        if((tmp = realloc(fstr,slen+blen+1)) == NULL)
        {
            free(fstr);
            fclose(fp);
            exit(EXIT_FAILURE);
        }
        fstr = tmp;
        /* append at the known end instead of rescanning with strcat */
        memcpy(fstr+slen,buffer,blen+1);
    }
    fclose(fp);
    if(fstr != NULL) puts(fstr);   /* fstr stays NULL for an empty file */
    free(fstr);
    return 0;
}


--
Al Bowers
Tampa, Fl USA
mailto: (remove the x to send email)
http://www.geocities.com/abowers822/

 