Reading Words from File
I want to read in lines from a file and then separate the words so I
can do a process on each of the words. Say the text file "readme.txt"
contains the following:

In the face of criticism from the left and right, President Bush
insisted Tuesday that Harriet Miers is the nation's best-qualified
candidate for the Supreme Court and assured skeptical conservatives
that his lawyer...

I could get an input to a char *s such that s = "In" and then I do
something with s, then s = "the" and then I do something with that,
etc. With no idea the length of any string or line or whitespace.

Here's what I have so far.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void process(char *s) /* whats here is not really important */
{
    printf("%s", s);
}

int main() {

    char buffer[80];
    FILE *f = fopen("readme.txt", "r");
    char *s;

    while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
    {
        while( sscanf(buffer, "%s", s) ) /* scans for words in line */
        {
            process(s); /* do stuff to the words */
        }
    }

    fclose(f);
    return 0;

}

Also, is there any way to adjust the size of the buffer or reallocate
the memory so it doesn't overflow and get a seg error? |
Re: Reading Words from File
"dough" <vicluo@gmail.com> wrote in message
news:1128451179.310759.89890@f14g2000cwb.googlegroups.com...
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc. With no idea the length of any string or line or whitespace.

I don't want to be harsh, but it seems to me the 2nd paragraph is off
topic and unwise for a poster looking for help...

Alex |
Re: Reading Words from File
In article <1128451179.310759.89890@f14g2000cwb.googlegroups.com>,
dough <vicluo@gmail.com> wrote:
:I want to read in lines from a file and then separate the words so I
:can do a process on each of the words.

There is often a non-trivial semantic problem in deciding what
a "word" is in such matters. For example, in

    "Oh!," he yelled (into his Hello-Kitty phone.)

then if you go by whitespace you get "words" such as

    "Oh!,"  and  (into  and  phone.)  and  Hello-Kitty

which is usually not the breakdown you want.
--
These .signatures are sold by volume, and not by weight. |
Re: Reading Words from File
dough wrote On 10/04/05 14:39,:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc. With no idea the length of any string or line or whitespace.
>
> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* whats here is not really important */
> {
>     printf("%s", s);
> }
>
> int main() {
>
>     char buffer[80];
>     FILE *f = fopen("readme.txt", "r");
>     char *s;

It would be a good idea to test `f == NULL' before
proceeding ...

>     while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>     {
>         while( sscanf(buffer, "%s", s) ) /* scans for words in line */

Here's a problem: `s' doesn't point to anything, so
when sscanf() locates a word and tries to copy it to the
memory `s' points at, all manner of mischief can ensue.

>         {
>             process(s); /* do stuff to the words */
>         }
>     }
>
>     fclose(f);
>     return 0;
>
> }
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?

If you used malloc() to create the space for `buffer',
you could use realloc() to enlarge it. But the immediate
problem is not the size of `buffer', but the uninitialized
`s'.

Your overall task sounds like a job for the much-maligned
strtok() function. However, see Walter Roberson's post for
some of the pitfalls of using simple string-bashing to separate
"words" from their surroundings.

--
Eric.Sosman@sun.com |
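A minimal sketch of the strtok() approach suggested above, assuming that whitespace is the only separator. The helper name `split_words` and its parameters are made up for illustration and do not come from the thread. Because strtok() returns pointers into the buffer it is given, no separate storage for `s` is needed, which sidesteps the uninitialized pointer entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on whitespace with strtok().
 * Stores up to `max` word pointers in `words` and returns how many were
 * stored. strtok() overwrites separators in `line` with '\0', so `line`
 * must be writable (an array filled by fgets(), not a string literal). */
size_t split_words(char *line, char **words, size_t max)
{
    size_t count = 0;
    char *tok;

    assert(line != NULL && words != NULL);

    for (tok = strtok(line, " \t\r\n");
         tok != NULL && count < max;
         tok = strtok(NULL, " \t\r\n")) {
        words[count++] = tok;   /* points into `line`, nothing is copied */
    }
    return count;
}
```

In the original program this would be called once per line returned by fgets(), with each stored pointer then handed to process(). The pointers are only valid until the buffer is overwritten by the next fgets() call.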
Re: Reading Words from File
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
> There is often a non-trivial semantic problem in deciding what
> a "word" is in such matters. For example, in

> "Oh!," he yelled (into his Hello-Kitty phone.)

I must say that that is a truly bizarre example sentence :-) That
aside, it seems to me that assuming a "word" is a sequence of
consecutive alpha characters would yield better results, at least
depending on what OP wants to do with the "words" once he has them.

--
Christopher Benson-Manica  | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org    | don't, I need to know. Flames welcome. |
Re: Reading Words from File
dough wrote:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words.

.......use strtok() function to split a string into words (use
whitespace or any other separator you want)

> char buffer[80];
> FILE *f = fopen("readme.txt", "r");
> while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?

........the fgets statement reads until num-1 characters are read (in
this case 79) or a newline or EOF is reached (whichever happens first).
So I don't think you need a realloc in this case.

HTH,
Hemanth |
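The fgets() behaviour described above can be observed with a small sketch. The helper `read_chunk` and its return codes are invented for this illustration. It shows that fgets() never overruns the buffer, but also that a line longer than the buffer arrives in several pieces, so a word can be cut in two at a chunk boundary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one chunk with fgets() and report what happened.
 * Returns -1 at end of file or on a read error,
 *          1 if the chunk ends with '\n' (the line is complete),
 *          0 if no newline was stored: either the line is longer than the
 *            buffer and the rest is still unread, or the file's last line
 *            has no trailing newline. */
int read_chunk(FILE *f, char *buf, size_t size)
{
    size_t len;

    assert(f != NULL && buf != NULL && size > 1);

    if (fgets(buf, (int)size, f) == NULL)
        return -1;

    len = strlen(buf);
    return (len > 0 && buf[len - 1] == '\n') ? 1 : 0;
}
```

With an 8-byte buffer, the line "President Bush" comes back as "Preside", then "nt Bush", then a lone newline. Whether that matters depends on whether the caller splits words per chunk or first reassembles the whole line.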
Re: Reading Words from File
dough wrote:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc. With no idea the length of any string or line or whitespace.

I am not sure what your problem is.
When you have a problem, please help us help you: State what you
want to achieve (this part seems clear) and what about your solution
did not work. Otherwise, everyone tells you about A because you seemed
to ask for B while meaning C...

>
> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* whats here is not really important */
> {
>     printf("%s", s);
> }
>
> int main() {
>
>     char buffer[80];
>     FILE *f = fopen("readme.txt", "r");
>     char *s;

Check whether f is != NULL. If you omitted the check for brevity,
then write a comment.

>     while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>     {
>         while( sscanf(buffer, "%s", s) ) /* scans for words in line */
>         {
>             process(s); /* do stuff to the words */
>         }
>     }

Okay, so what is the problem here? About everything:
1) You may inadvertently separate a word if your buffer is not
   long enough (uncritical)
2) You always scan from the same position (buffer is effectively
   &buffer[0])
3) You read your string into memory pointed to by an uninitialized
   pointer.

Consider

char s[sizeof buffer] = "", *tmp = NULL;
int n = 0;
while (....) {
    tmp = buffer;
    /* %n stores how many characters sscanf() consumed, including any
       leading whitespace; sscanf() returns EOF once the line is used up */
    while ( sscanf(tmp, "%s%n", s, &n) == 1 ) {
        process(s);
        tmp += n;
    }
    /* a */
}

This solves 2) and 3). Another solution is the use of strtok() etc.
If you check at point "a" whether buffer[strlen(buffer)-1]=='\n',
then you can also detect instances of 1). However, this may not be
what you are looking for (see below)

>
>     fclose(f);
>     return 0;
>
> }
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?

realloc() helps you do that. Have a look at the comp.lang.c archives
to see how to use it.

If you do not need the words in context, you can also use getc(),
which may be clearer:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define START_BUFSIZE 20

void process(const char *s);
int resize_buffer (char **buf, size_t *len);

int main (void)
{
    FILE *f;
    char *s = NULL;
    size_t length = 0;
    int input;

    if (NULL == (f = fopen("readme.txt", "r"))) {
        fprintf(stderr, "Cannot open file\n");
        exit(EXIT_FAILURE);
    }
    if (NULL == (s = malloc((START_BUFSIZE+1) * sizeof *s))) {
        fprintf(stderr, "Error on allocating memory for s\n");
        fclose(f);
        exit(EXIT_FAILURE);
    }
    length = START_BUFSIZE;

    do /* ... while (input != EOF) */
    {
        size_t curr = 0;

        /* Read up to the first whitespace */
        while (!isspace(input = getc(f)) && input != EOF) {
            s[curr++] = input;
            if (curr == length) {
                if (resize_buffer(&s, &length)) {
                    /* perform error handling */
                    break;
                }
            }
        }
        /* Make s a string */
        s[curr] = '\0';
        if (curr)
            process(s);

        /* Read up to the first non-whitespace */
        while ((input = getc(f)) != EOF) {
            putchar('*');
            if (!isspace(input)) {
                ungetc(input, f);
                break;
            }
        }
    } while (input != EOF);

    free(s);
    fclose(f);
    putchar('\n');

    return 0;
}

void process(const char *s) /* whats here is not really important */
{
    printf("%s", s);
    fflush (stdout);
}

int resize_buffer (char **buf, size_t *len)
{
    /* Using mybuf and mylen for readability */
    char *mybuf = *buf;
    size_t mylen = *len;
    char *tmp;
    size_t destlen = 2*mylen+1; /* A */

    if (NULL == (tmp = realloc(mybuf, destlen))) {
        return 1;
    }
    mybuf = tmp;
    mylen = destlen - 1;

    /* write back to parameters */
    *buf = mybuf;
    *len = mylen;

    return 0;
}

Cheers
 Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address. |
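For the line-oriented fgets() design in the original question, the realloc() idea can also be applied to the line buffer itself. The following is only a sketch under stated assumptions: the helper name `read_line`, the starting capacity of 80, and the doubling policy are illustrative choices, not something taken from the thread, and overflow of the capacity arithmetic is not checked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one whole line of any length from `f`.
 * Returns a malloc()ed string without the trailing newline (the caller
 * frees it), or NULL at end of file, on a read error, or when memory
 * runs out. */
char *read_line(FILE *f)
{
    size_t cap = 80, len = 0;
    char *line, *tmp;

    assert(f != NULL);

    line = malloc(cap);
    if (line == NULL)
        return NULL;

    /* Keep appending chunks until one of them ends in '\n'. */
    while (fgets(line + len, (int)(cap - len), f) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';
            return line;
        }
        /* No newline yet: double the buffer and read the next chunk. */
        tmp = realloc(line, cap * 2);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        cap *= 2;
    }

    if (len == 0 || ferror(f)) {   /* nothing read, or a read error */
        free(line);
        return NULL;
    }
    return line;                   /* last line had no trailing newline */
}
```

Each returned line can then be split into words with sscanf() or strtok() without any word being cut at a buffer boundary, at the cost of one allocation per line.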
Re: Reading Words from File
In article <dhumdl$j2o$1@chessie.cirr.com>,
Christopher Benson-Manica <ataru@nospam.cyberspace.org> wrote:
>Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
>> There is often a non-trivial semantic problem in deciding what
>> a "word" is in such matters.

>aside, it seems to me that assuming a "word" is a sequence of
>consecutive alpha characters would yield better results, at least
>depending on what OP wants to do with the "words" once he has them.

Using "alpha" as the boundary definition runs into difficulties
with possessives, contractions, joined-words, and words such as
re-enter in which the dash indicates separation of vowels that
would otherwise form a diphthong.

It would likely also run into problems with Mr. Salutation,
and abbreviations such as etc. in which the period is really part
of the word.
--
Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson |
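One possible way to soften the first group of problems (possessives, contractions, hyphenated words) is to accept an apostrophe or hyphen as part of a word only when it has a letter on both sides. This is a sketch of that single rule, not a general solution: the helper name `word_length` is made up here, and abbreviations such as "Mr." or "etc." still lose their period.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the word that starts at `s`, where a word
 * is a run of letters that may contain single apostrophes or hyphens, each
 * with a letter directly before and after it. Returns 0 if `s` does not
 * start with a letter. */
size_t word_length(const char *s)
{
    size_t i = 0;

    assert(s != NULL);

    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];

        if (isalpha(c)) {
            i++;
        } else if ((c == '\'' || c == '-') && i > 0 &&
                   isalpha((unsigned char)s[i + 1])) {
            i++;    /* joiner between two letters stays inside the word */
        } else {
            break;  /* anything else ends the word */
        }
    }
    return i;
}
```

On the sample text from the original post this keeps "nation's" and "best-qualified" together, while trailing punctuation as in "phone.)" is still cut off.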
Re: Reading Words from File
Christopher Benson-Manica wrote On 10/04/05 15:50,:
> Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
>
>
>>There is often a non-trivial semantic problem in deciding what
>>a "word" is in such matters. For example, in
>
>
>> "Oh!," he yelled (into his Hello-Kitty phone.)
>
>
> I must say that that is a truly bizarre example sentence :-) That
> aside, it seems to me that assuming a "word" is a sequence of
> consecutive alpha characters would yield better results, at least
> depending on what OP wants to do with the "words" once he has them.

This is a reasonable 1st approximation, but its tendency
to generate non-words (e.g., "st") isn't desirable.

--
Eric.Sosman@sun.com |
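The "sequence of consecutive alpha characters" definition, and the non-word effect mentioned above, can be tried out with a short sketch. The helper name `next_alpha_word` and its cursor-style interface are assumptions made for this example. It returns a pointer and a length rather than a copy, so the input is not modified.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: find the next run of alphabetic characters at or
 * after `*cursor`. On success it returns a pointer to the run, stores the
 * run's length in `*len`, and advances `*cursor` past the run. It returns
 * NULL when no letters remain. The run is NOT '\0'-terminated. */
const char *next_alpha_word(const char **cursor, size_t *len)
{
    const char *p, *start;

    assert(cursor != NULL && *cursor != NULL && len != NULL);

    p = *cursor;
    while (*p != '\0' && !isalpha((unsigned char)*p))
        p++;                        /* skip everything that is not a letter */
    if (*p == '\0') {
        *cursor = p;
        return NULL;
    }

    start = p;
    while (isalpha((unsigned char)*p))
        p++;                        /* consume the run of letters */

    *len = (size_t)(p - start);
    *cursor = p;
    return start;
}
```

Fed the text "a reasonable 1st approximation", this yields the runs "a", "reasonable", "st" and "approximation": the digit is skipped and the leftover "st" is reported as a word, which is the kind of non-word the reply warns about.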
Re: Reading Words from File
"dough" <vicluo@gmail.com> wrote in message
news:1128451179.310759.89890@f14g2000cwb.googlegroups.com...
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc. With no idea the length of any string or line or whitespace.
>
> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* whats here is not really important */
> {
>     printf("%s", s);
> }
>
> int main() {
>
>     char buffer[80];
>     FILE *f = fopen("readme.txt", "r");
>     char *s;
>
>     while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>     {
>         while( sscanf(buffer, "%s", s) ) /* scans for words in line */
>         {
>             process(s); /* do stuff to the words */
>         }
>     }
>
>     fclose(f);
>     return 0;
>
> }
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?
>

"process" is a terrible name for a function in any context.

Barry |