Velocity Reviews


dough 10-04-2005 06:39 PM

Reading Words from File
 
I want to read in lines from a file and then separate the words so I
can do a process on each of the words. Say the text file "readme.txt"
contains the following:

In the face of criticism from the left and right, President Bush
insisted Tuesday that Harriet Miers is the nation's best-qualified
candidate for the Supreme Court and assured skeptical conservatives
that his lawyer...

I could get an input to a char *s such that s = "In" and then I do
something with s, then s = "the" and then I do something with that,
etc., with no idea of the length of any string, line, or run of whitespace.

Here's what I have so far.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void process(char *s) /* what's here is not really important */
{
printf("%s", s);
}

int main() {

char buffer[80];
FILE *f = fopen("readme.txt", "r");
char *s;

while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
{
while( sscanf(buffer, "%s", s) ) /* scans for words in line */
{
process(s); /* do stuff to the words */
}
}

fclose(f);
return 0;

}

Also, is there any way to adjust the size of the buffer or reallocate
the memory so it doesn't overflow and cause a segmentation fault?


Alexei A. Frounze 10-04-2005 07:19 PM

Re: Reading Words from File
 
"dough" <vicluo@gmail.com> wrote in message
news:1128451179.310759.89890@f14g2000cwb.googlegroups.com...
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc., with no idea of the length of any string, line, or run of whitespace.


I don't want to be harsh, but it seems to me the 2nd paragraph is off topic
and unwise for a poster looking for help...

Alex



Walter Roberson 10-04-2005 07:30 PM

Re: Reading Words from File
 
In article <1128451179.310759.89890@f14g2000cwb.googlegroups.com>,
dough <vicluo@gmail.com> wrote:
:I want to read in lines from a file and then separate the words so I
:can do a process on each of the words.

There is often a non-trivial semantic problem in deciding what
a "word" is in such matters. For example, in

"Oh!," he yelled (into his Hello-Kitty phone.)

then if you go by whitespace you get "words" such as

"Oh!," and (into and phone.) and Hello-Kitty

which is usually not the breakdown you want.
--
These .signatures are sold by volume, and not by weight.

Eric Sosman 10-04-2005 07:45 PM

Re: Reading Words from File
 


dough wrote On 10/04/05 14:39,:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc., with no idea of the length of any string, line, or run of whitespace.
>
> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* what's here is not really important */
> {
> printf("%s", s);
> }
>
> int main() {
>
> char buffer[80];
> FILE *f = fopen("readme.txt", "r");
> char *s;


It would be a good idea to test `f == NULL' before
proceeding ...
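
A minimal sketch of such a test, wrapped in a helper so it can be reused. The name open_for_reading is made up for illustration and is not part of the original program:

```c
#include <stdio.h>

/* Hypothetical helper: open a file for reading and report a failure
 * on stderr.  Returns NULL if the file cannot be opened. */
FILE *open_for_reading(const char *path)
{
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        /* On most systems fopen() sets errno, so perror() can
         * print the reason next to the file name. */
        perror(path);
    }
    return f;
}
```

The caller would then bail out (or pick another file) when the result is NULL instead of passing it on to fgets().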

> while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
> {
> while( sscanf(buffer, "%s", s) ) /* scans for words in line */


Here's a problem: `s' doesn't point to anything, so
when sscanf() locates a word and tries to copy it to the
memory `s' points at, all manner of mischief can ensue.
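
One way to avoid this, sketched under the assumption that no word of interest is longer than 79 characters: give sscanf() a real array to write into and a field width so it cannot run past the end. WORD_MAX and first_word are illustrative names only:

```c
#include <stdio.h>

/* Assumed upper bound on word length.  The width in the format
 * string below must be kept in sync with this value by hand. */
#define WORD_MAX 79

/* Copies the first whitespace-delimited word of line into word,
 * which must have room for WORD_MAX + 1 chars.  Returns 1 if a
 * word was found, 0 if the line holds only whitespace or nothing. */
int first_word(const char *line, char word[WORD_MAX + 1])
{
    /* "%79s" stores at most 79 characters plus the terminating
     * '\0'.  sscanf() returns EOF, which is nonzero, when it finds
     * nothing to convert, so compare against 1 explicitly. */
    return sscanf(line, "%79s", word) == 1;
}
```

A word longer than the limit is split rather than overflowing the array, which is usually the lesser evil.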

> {
> process(s); /* do stuff to the words */
> }
> }
>
> fclose(f);
> return 0;
>
> }



> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and cause a segmentation fault?


If you used malloc() to create the space for `buffer', you
could use realloc() to enlarge it. But the immediate problem
is not the size of `buffer', but the uninitialized `s'.
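
For what it's worth, here is a rough sketch of the malloc()/realloc() idea: keep calling fgets() and doubling the buffer until a whole line has arrived. The function name read_line and the starting size of 80 are assumptions for illustration, and the code uses C99 declarations:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from f.
 * Returns a malloc()ed string that the caller must free(), or NULL
 * at end of file, on a read error, or if memory runs out. */
char *read_line(FILE *f)
{
    size_t cap = 80;   /* current buffer size */
    size_t len = 0;    /* characters stored so far */
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), f) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;            /* got the complete line */

        /* No newline yet: the buffer was too small, or this is
         * the last line and it lacks a newline.  Grow and retry. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len > 0)
        return buf;                /* final line without newline */

    free(buf);
    return NULL;
}
```

Each returned line still ends in '\n' when the file had one, just as with a plain fgets() call.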

Your overall task sounds like a job for the much-maligned
strtok() function. However, see Walter Roberson's post for
some of the pitfalls of using simple string-bashing to separate
"words" from their surroundings.

--
Eric.Sosman@sun.com


Christopher Benson-Manica 10-04-2005 07:50 PM

Re: Reading Words from File
 
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:

> There is often a non-trivial semantic problem in deciding what
> a "word" is in such matters. For example, in


> "Oh!," he yelled (into his Hello-Kitty phone.)


I must say that that is a truly bizarre example sentence :-) That
aside, it seems to me that assuming a "word" is a sequence of
consecutive alpha characters would yield better results, at least
depending on what OP wants to do with the "words" once he has them.
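
A small sketch of that definition, using isalpha() from <ctype.h>. The name next_alpha_word and its calling convention are invented here for illustration:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: finds the next run of alphabetic characters
 * at or after p and copies it into word (size must be at least 1;
 * longer runs are truncated to size - 1 characters).  Returns a
 * pointer just past the run, or NULL if no letters remain, in which
 * case word is left untouched. */
const char *next_alpha_word(const char *p, char *word, size_t size)
{
    size_t n = 0;

    /* Skip everything that is not a letter. */
    while (*p != '\0' && !isalpha((unsigned char)*p))
        p++;
    if (*p == '\0')
        return NULL;

    /* Copy the run of letters. */
    while (isalpha((unsigned char)*p)) {
        if (n + 1 < size)
            word[n++] = *p;
        p++;
    }
    word[n] = '\0';
    return p;
}
```

Calling it repeatedly on the example sentence, feeding each result back in as p, gives Oh, he, yelled, into, his, Hello, Kitty, phone. Note that Hello-Kitty comes out as two words under this rule.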

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Hemanth 10-04-2005 07:58 PM

Re: Reading Words from File
 
dough wrote:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words.



Use the strtok() function to split a string into words (use
whitespace or any other separator you want).
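
A minimal sketch of the strtok() approach with whitespace as the separator. The function name split_line and the fixed-size pointer array are assumptions for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place on whitespace and
 * stores up to max_words pointers to the pieces in words[].
 * Returns the number of words stored.  strtok() overwrites the
 * separators with '\0', so line must be writable. */
size_t split_line(char *line, char *words[], size_t max_words)
{
    static const char delims[] = " \t\r\n";
    size_t count = 0;
    char *tok = strtok(line, delims);

    while (tok != NULL && count < max_words) {
        words[count++] = tok;
        tok = strtok(NULL, delims);   /* continue in the same string */
    }
    return count;
}
```

Fed a line that was read with fgets(), this hands back each word in turn. Punctuation stays glued to the neighbouring word, so adding characters such as ",.()" to delims may be worth considering.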


> char buffer[80];
> FILE *f = fopen("readme.txt", "r");
> while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and cause a segmentation fault?



The fgets() call reads until num-1 characters have been read (in
this case 79) or a newline or EOF is reached (whichever happens first).
So I don't think you need a realloc in this case.
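
One consequence of that limit is that a line longer than 79 characters arrives in several pieces, and a word may be cut at the boundary. A tiny sketch of how a caller might notice this (has_full_line is a made-up name):

```c
#include <string.h>

/* Returns 1 if buf, as filled in by fgets(), holds a complete line,
 * i.e. it ends in '\n'.  A result of 0 means either the line was
 * longer than the buffer or it is the last line of a file that has
 * no trailing newline. */
int has_full_line(const char *buf)
{
    size_t len = strlen(buf);

    return len > 0 && buf[len - 1] == '\n';
}
```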


HTH,
Hemanth


Michael Mair 10-04-2005 08:27 PM

Re: Reading Words from File
 
dough wrote:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc., with no idea of the length of any string, line, or run of whitespace.


I am not sure what your problem is.
When you have a problem, please help us help you:
State what you want to achieve (this part seems clear) and
what about your solution did not work.
Otherwise, everyone tells you about A because you seemed to
ask for B while meaning C...

>
> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* what's here is not really important */
> {
> printf("%s", s);
> }
>
> int main() {
>
> char buffer[80];
> FILE *f = fopen("readme.txt", "r");
> char *s;


Check whether f is != NULL. If you omitted the check for
brevity, then write a comment.

> while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
> {
> while( sscanf(buffer, "%s", s) ) /* scans for words in line */
> {
> process(s); /* do stuff to the words */
> }
> }


Okay, so what is the problem here? About everything:
1) You may inadvertently split a word if your buffer is not
long enough (not critical)
2) You always scan from the same position (buffer is effectively &buffer[0])
3) You read your string into memory pointed to by an uninitialized pointer.

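Consider
    char s[sizeof buffer] = "", *tmp = NULL;
    int n = 0;
    while (....)
    {
        tmp = buffer;
        /* %n stores how many characters sscanf() consumed, leading
           whitespace included.  Test for == 1, because sscanf()
           returns EOF, which is nonzero, once the line is used up. */
        while ( sscanf(tmp, "%s%n", s, &n) == 1 )
        {
            process(s);
            tmp += n;
        }
        /* a */
    }
This solves 2) and 3).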
Another solution is the use of strtok() etc.

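If you check at point "a" whether buffer[strlen(buffer)-1]=='\n',
then you can also detect instances of 1): if the last character is
not '\n', the line did not fit into buffer (or it is the last line
of a file without a trailing newline).
However, this may not be what you are looking for (see below)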

>
> fclose(f);
> return 0;
>
> }
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and cause a segmentation fault?


realloc() helps you do that.
Have a look at the comp.lang.c archives to see how to use it.

If you do not need the words in context, you can also use getc(), which
may be clearer:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define START_BUFSIZE 20


void process(const char *s);
int resize_buffer (char **buf, size_t *len);


int main (void)
{
    FILE *f;
    char *s = NULL;
    size_t length = 0;
    int input;

    if (NULL == (f = fopen("readme.txt", "r")))
    {
        fprintf(stderr, "Cannot open file\n");
        exit(EXIT_FAILURE);
    }
    if (NULL == (s = malloc((START_BUFSIZE+1) * sizeof *s)))
    {
        fprintf(stderr, "Error on allocating memory for s\n");
        fclose(f);
        exit(EXIT_FAILURE);
    }
    length = START_BUFSIZE;

    do /* ... while (input != EOF) */
    {
        size_t curr = 0;

        /* Read up to the first whitespace */
        while (!isspace(input = getc(f)) && input != EOF)
        {
            s[curr++] = input;
            if (curr == length)
            {
                if (resize_buffer(&s, &length))
                {
                    /* perform error handling */
                    break;
                }
            }
        }
        /* Make s a string */
        s[curr] = '\0';

        if (curr)
            process(s);

        /* Read up to the first non-whitespace */
        while ((input = getc(f)) != EOF)
        {
            putchar('*');
            if (!isspace(input))
            {
                ungetc(input, f);
                break;
            }
        }
    } while (input != EOF);

    free(s);
    fclose(f);

    putchar('\n');

    return 0;
}

void process(const char *s) /* whats here is not really important */
{
    printf("%s", s); fflush (stdout);
}

int resize_buffer (char **buf, size_t *len)
{
    /* Using mybuf and mylen for readability */
    char *mybuf = *buf;
    size_t mylen = *len;

    char *tmp;
    size_t destlen = 2*mylen+1;

    /* A */
    if (NULL == (tmp = realloc(mybuf, destlen)))
    {
        return 1;
    }
    mybuf = tmp;
    mylen = destlen - 1;

    /* write back to parameters */
    *buf = mybuf;
    *len = mylen;

    return 0;
}


Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Walter Roberson 10-04-2005 08:49 PM

Re: Reading Words from File
 
In article <dhumdl$j2o$1@chessie.cirr.com>,
Christopher Benson-Manica <ataru@nospam.cyberspace.org> wrote:
>Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:


>> There is often a non-trivial semantic problem in deciding what
>> a "word" is in such matters.


>aside, it seems to me that assuming a "word" is a sequence of
>consecutive alpha characters would yield better results, at least
>depending on what OP wants to do with the "words" once he has them.


Using "alpha" as the boundary definition runs into difficulties
with possessives, contractions, joined-words, and words such as
re-enter in which the dash indicates separation of vowels that
would otherwise form a diphthong. It would likely also run
into problems with salutations such as Mr., and abbreviations such as etc.,
in which the period is really part of the word.
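
To make part of this concrete, here is one possible refinement, sketched under the assumption that an apostrophe or hyphen counts as part of a word only when it sits between two letters. The names is_joiner and next_joined_word are invented for illustration, and the sketch does nothing about Mr. or etc.:

```c
#include <ctype.h>
#include <stddef.h>

/* Characters allowed inside a word when letters surround them. */
static int is_joiner(char c)
{
    return c == '\'' || c == '-';
}

/* Hypothetical helper: like a plain letters-only scan, but keeps
 * words such as nation's or re-enter in one piece.  Copies the next
 * word at or after p into word (size must be at least 1) and
 * returns a pointer just past it, or NULL if no word remains. */
const char *next_joined_word(const char *p, char *word, size_t size)
{
    size_t n = 0;

    while (*p != '\0' && !isalpha((unsigned char)*p))
        p++;
    if (*p == '\0')
        return NULL;

    /* The scan starts on a letter, and a joiner is accepted only if
     * a letter follows it, so every joiner kept here has a letter
     * on both sides. */
    while (isalpha((unsigned char)*p) ||
           (is_joiner(*p) && isalpha((unsigned char)p[1]))) {
        if (n + 1 < size)
            word[n++] = *p;
        p++;
    }
    word[n] = '\0';
    return p;
}
```

Quotation marks made of apostrophes at the edges of a word are dropped by this rule, while the one in a possessive is kept.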
--
Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson

Eric Sosman 10-04-2005 08:54 PM

Re: Reading Words from File
 


Christopher Benson-Manica wrote On 10/04/05 15:50,:
> Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
>
>
>>There is often a non-trivial semantic problem in deciding what
>>a "word" is in such matters. For example, in

>
>
>> "Oh!," he yelled (into his Hello-Kitty phone.)

>
>
> I must say that that is a truly bizarre example sentence :-) That
> aside, it seems to me that assuming a "word" is a sequence of
> consecutive alpha characters would yield better results, at least
> depending on what OP wants to do with the "words" once he has them.


This is a reasonable 1st approximation, but its tendency
to generate non-words (e.g., "st") isn't desirable.

--
Eric.Sosman@sun.com



Barry 10-04-2005 08:56 PM

Re: Reading Words from File
 

"dough" <vicluo@gmail.com> wrote in message
news:1128451179.310759.89890@f14g2000cwb.googlegroups.com...
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc., with no idea of the length of any string, line, or run of whitespace.
>
> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* what's here is not really important */
> {
> printf("%s", s);
> }
>
> int main() {
>
> char buffer[80];
> FILE *f = fopen("readme.txt", "r");
> char *s;
>
> while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
> {
> while( sscanf(buffer, "%s", s) ) /* scans for words in line */
> {
> process(s); /* do stuff to the words */
> }
> }
>
> fclose(f);
> return 0;
>
> }
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and cause a segmentation fault?
>


"process" is a terrible name for a function in any context.

Barry




All times are GMT.