taking a "word" as input

 
 
Nick Keighley
 
      04-29-2008
On 29 Apr, 13:27, arnuld <NoS...@NoPain.com> wrote:

> C takes input character by character.


nope. It can read lines (fgets()) or arbitrary blocks (fread())

> I did not find any Standard Library
> function that can take a word as input.


correct, there aren't any.


> So I want to write one of my own
> to be used with "Self Referential Structures" of section 6.5 of K&R2. K&R2
> has their own version of <getword> which, I think, is quite different
> from what I need:
>
> <getword> will have following properties:
>
> 1.) If the word contains any number like "beauty1" or "win2e" it will
> discard it, K&R2's <getword> does not. My <getword> will only take
> pure-words like "beauty", "wine" etc.


take a look at isalpha()
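
A minimal sketch of the isalpha() idea. The helper name is_pure_word is hypothetical (it is not from the thread or from K&R2); it accepts a string only if every character is a letter. The cast to unsigned char is there because passing a negative char value to isalpha() is undefined.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if s is non-empty and consists only of
 * letters (as judged by isalpha() in the current locale), 0 otherwise. */
int is_pure_word(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!isalpha((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

With this, "beauty" would be accepted while "beauty1" and "win2e" would be rejected.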


> 2.) we can store each word by using <array of pointers> pointing to those
> words and since words themselves are strings, which in
> reality, are <arrays of chars>, so we will have <array of pointers> to
> those <arrays of chars>.


char *word_table [100];


> or you think using a 2D array is a better idea ?


are all your words the same size?
If you use the array of pointers you'll have to get the memory
for each word from somewhere (eg. malloc())
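
A rough sketch of what "getting the memory from somewhere" could look like with the array of pointers. MAX_WORDS, word_count, store_word and free_words are names made up for this illustration; only word_table comes from the declaration above.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 100            /* assumed capacity, as in word_table[100] */

char *word_table[MAX_WORDS];     /* one pointer per stored word */
int word_count = 0;              /* number of slots currently in use */

/* Hypothetical helper: copies word into freshly malloc'd storage and
 * records the pointer in word_table. Returns 1 on success, 0 if the
 * table is full or malloc() fails. */
int store_word(const char *word)
{
    char *copy;

    if (word_count >= MAX_WORDS)
        return 0;
    copy = malloc(strlen(word) + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL)
        return 0;
    strcpy(copy, word);
    word_table[word_count++] = copy;
    return 1;
}

/* Releases every stored word and empties the table. */
void free_words(void)
{
    while (word_count > 0)
        free(word_table[--word_count]);
}
```

Each word takes exactly as much memory as it needs, at the cost of one malloc() call per word.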


--
Nick Keighley

there may have been other things between sliced bread and Java
 
 
 
 
 
arnuld
 
      04-29-2008
C takes input character by character. I did not find any Standard Library
function that can take a word as input. So I want to write one of my own
to be used with "Self Referential Structures" of section 6.5 of K&R2. K&R2
has their own version of <getword> which, I think, is quite different
from what I need:

<getword> will have following properties:


1.) If the word contains any number like "beauty1" or "win2e" it will
discard it, K&R2's <getword> does not. My <getword> will only take
pure-words like "beauty", "wine" etc.


2.) we can store each word by using <array of pointers> pointing to those
words and since words themselves are strings, which in
reality, are <arrays of chars>, so we will have <array of pointers> to
those <arrays of chars>.


or you think using a 2D array is a better idea ?



--
http://lispmachine.wordpress.com/
my email ID is at the above address

 
 
 
 
 
santosh
 
      04-29-2008
arnuld wrote:

> C takes input character by character. I did not find any Standard
> Library function that can take a word as input. So I want to write one
> of my own to be used with "Self Referential Structures" of section 6.5
> of K&R2. K&R2
> has their own version of <getword> which, I think, is quite different
> from what I need:
>
> <getword> will have following properties:
>
>
> 1.) If the word contains any number like "beauty1" or "win2e" it will
> discard it, K&R2's <getword> does not. My <getword> will only take
> pure-words like "beauty", "wine" etc.


What about words with other characters like hyphen? What about
constructs like "get_name"? Will you discard them too? What about words
that end with a ; or ...? What about words that contain symbols like #@
etc? Or words that end with an exclamation mark? Or words within
parentheses or braces?

Just giving you some food for thought as to what exactly you are going
to consider a word and what you will reject. This can be far trickier
than one first imagines.

> 2.) we can store each word by using <array of pointers> pointing to
> those words and since words themselves are strings, which in
> reality, are <arrays of chars>, so we will have <array of pointers> to
> those <arrays of chars>.


That's one way yes, suitable when you don't know the lengths of words in
advance, or you don't want to possibly waste storage with statically
allocated arrays.

> or you think using a 2D array is a better idea ?


Depends on your requirements really, and the type and frequency of input
you expect. Will you put an upper limit on the length of words? It
hardly makes sense to accept words longer than about 64 characters if
you are dealing with normal English text. Static 2D arrays are
undoubtedly easier to work with but are less flexible than dynamically
allocated arrays. Since statically allocated arrays are of fixed size
it's possible for some elements to remain unused and hence wasted. OTOH
a large number of small allocations may lead to memory fragmentation
and also some wastage due to malloc bookkeeping and possibly also a
slowdown in speed if you'll be reading a very large number of words
from a file. For input from a human it will not matter.

One efficient method is to use a single dynamically allocated array in
which words are stored sequentially. The length of each word could be
specified by either one or two bytes prefixing the word itself. This
results in very efficient storage, but is grossly inefficient if you
want to insert and delete words at random. For this a hash table based
approach is probably the best. OTOH a tree is very convenient for quick
searching and sorting.
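
To make the length-prefixed layout concrete, here is a hedged sketch that uses a one-byte prefix, which caps words at 255 characters. The struct and function names (packed_words, packed_add, packed_count) are invented for this example, and the doubling growth policy is just one possible choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packed store: each entry is one length byte followed by
 * that many characters, with no terminating '\0'. */
struct packed_words {
    unsigned char *data;   /* growing byte buffer */
    size_t used;           /* bytes in use */
    size_t cap;            /* bytes allocated */
};

/* Appends word to the store. Returns 1 on success, 0 if the word is
 * longer than 255 characters or memory runs out. */
int packed_add(struct packed_words *p, const char *word)
{
    size_t len = strlen(word);

    if (len > 255)
        return 0;
    if (p->used + 1 + len > p->cap) {
        size_t ncap = p->cap ? p->cap * 2 : 64;
        unsigned char *nd;

        while (ncap < p->used + 1 + len)
            ncap *= 2;
        nd = realloc(p->data, ncap);
        if (nd == NULL)
            return 0;
        p->data = nd;
        p->cap = ncap;
    }
    p->data[p->used++] = (unsigned char)len;   /* length prefix */
    memcpy(p->data + p->used, word, len);      /* the letters themselves */
    p->used += len;
    return 1;
}

/* Counts the entries by hopping from one length byte to the next. */
size_t packed_count(const struct packed_words *p)
{
    size_t i = 0, n = 0;

    while (i < p->used) {
        i += 1 + p->data[i];
        n++;
    }
    return n;
}
```

Start from a zeroed struct, for example struct packed_words p = { NULL, 0, 0 };. Deleting or inserting in the middle would mean shifting everything after that point, which is the inefficiency mentioned above.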

If you tell us more details about the type and volume of input you
expect and the facilities (like searching, insertion, etc.) you plan to
implement, perhaps a tailored approach can be suggested.

 
 
Nick Keighley
 
      04-30-2008
On 30 Apr, 23:02, arnuld <NoS...@NoPain.com> wrote:
> > On Tue, 29 Apr 2008 02:47:18 -0700, Nick Keighley wrote:


> > are all your words the same size?

>
> It depends on the user, what he likes to input at run-time.


in other words, no.

santosh has pointed out some of the design drivers for this.
So decide: do you want a fixed size (limits word size and wastes space)
or a variable size (harder to program).


> > If you use the array of pointers you'll have to get the memory
> > for each word from somewhere (eg. malloc())


Note Well


> yes. I came up with this code and as you can see it does not do what I
> want. I want to take every word into the input but it only takes 1st for
> obvious reasons. I am not able to think of the way to take all the words
> of the input:


1. after you read a word you need to skip to the next word.

eg. read until you get a letter

2. you need somewhere to store the words. Either a 2D array or
use malloc().
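
The two steps can be sketched roughly as follows. next_word is a hypothetical name, and it takes a FILE * (pass stdin for keyboard input) rather than calling getchar() directly. Note that this version treats every non-letter as a separator, so "win2e" would come out as "win" and "e"; rejecting such input outright needs extra logic.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads the next run of letters from in into word
 * (at most max - 1 characters plus the '\0'). Returns 1 if a word was
 * read, 0 at end of input. Requires max >= 2. */
int next_word(FILE *in, char *word, int max)
{
    int c, i = 0;

    /* Step 1: skip everything that is not a letter. */
    while ((c = getc(in)) != EOF && !isalpha(c))
        ;
    if (c == EOF)
        return 0;

    /* Step 2: collect letters; extra letters of an over-long word are
     * read but not stored. */
    do {
        if (i < max - 1)
            word[i++] = (char)c;
    } while ((c = getc(in)) != EOF && isalpha(c));
    word[i] = '\0';
    return 1;
}
```

Calling it in a loop, such as while (next_word(stdin, buf, MAXWORD)) { ... }, visits every word of the input, and the body of the loop is where each word would be copied into the 2D array or into malloc'd storage.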


> #include <stdio.h>
> #include <ctype.h>
>
> enum MAXSIZE { MAXWORD = 100 };
>
> char *getword( char *, int );
>
> int main(void) {
>
>   char buffer[MAXWORD];


this only holds one word

char buffer[MAXNUMWORDS][MAXWORD];
OR char *buffer[MAXNUMWORDS];


>   getword( buffer, MAXWORD );


pass the appropriate argument

<snip>

--
Nick Keighley
 
 
Nick Keighley
 
      04-30-2008
On 30 Apr, 23:02, arnuld <NoS...@NoPain.com> wrote:
> > On Tue, 29 Apr 2008 20:53:50 +0530, santosh wrote:


> > What about words with other characters like hyphen? What about
> > constructs like "get_name"? Will you discard them too? What about words
> > that end with a ; or ...? What about words that contain symbols like #@
> > etc? Or words that end with an exclamation mark? Or words within
> > parentheses or braces?

>
> all of them will be discarded. Only words containing letters like
> "santosh" will be considered, nothing else.
>
> > That's one way yes, suitable when you don't know the lengths of words in
> > advance, or you don't want to possibly waste storage with statically
> > allocated arrays.

>
> yes, exactly, input will be at run-time only.


I don't understand what you mean here


> > Depends on your requirements really, and the type and frequency of input
> > you expect. Will you put an upper limit on the length of words? It
> > hardly makes sense to accept words longer than about 64 characters if
> > you are dealing with normal English text.

>
> ok, make the upper limit to 64 , I usually take it 100 as my style.
>
> > Static 2D arrays are
> > undoubtedly easier to work with but are less flexible than dynamically
> > allocated arrays. Since statically allocated arrays are of fixed size
> > it's possible for some elements to remain unused and hence wasted. OTOH
> > a large number of small allocations may lead to memory fragmentation and
> > also some wastage due to malloc bookkeeping and possibly also a slowdown
> > in speed if you'll be reading a very large number of words from a file.
> > For input from a human it will not matter.

>
> you want to say that there will be 2 types of implementations if
> efficiency is my concern:
>
>   1.) input from human
>   2.) input from a text-file
>
> ??
>
> couldn't there be a single implementation for both types of inputs ?


yes. But file or human might influence your design. People type
v e r y s l o w l y so a human input only program doesn't need to
be fast (for this problem). The file input one should work just fine
with people.


> > One efficient method is to use a single dynamically allocated array in
> > which words are stored sequentially. The length of each word could be
> > specified by either one or two bytes prefixing the word itself. This
> > results in very efficient storage, but is grossly inefficient if you
> > want to insert and delete words at random. For this a hash table based
> > approach is probably the best. OTOH a tree is very convenient for quick
> > searching and sorting.
> > If you tell us more details about the type and volume of input you
> > expect and the facilities (like searching, insertion, etc.) you plan to
> > implement, perhaps a tailored approach can be suggested.

>
> The basic problem is to sort, count and print the sorted words. We are
> not going to save a word in an array if it has already appeared, we will
> just increase the count for that word.


that didn't really answer the question...

> K&R2 seems to suggest that a doubly-linked list using binary search is
> the most efficient method to use, described in section 6.5 and is already
> solved. Actually I am not able to understand the <getword> function of the
> authors which actually is different from what I want, hence I need to
> create one of my own.



--
Nick Keighley
 
 
arnuld
 
      04-30-2008
> On Thu, 01 May 2008 03:02:23 +0500, arnuld wrote:

> .... SNIP...


> The basic problem is to sort, count and print the sorted words. We are
> not going to save a word in an array if it has already appeared, we will
> just increase the count for that word.


> .....SNIP....


by accident, it is actually exercise 6-4 of K&R2



--
http://lispmachine.wordpress.com/
my email ID is at the above address

 
 
arnuld
 
      04-30-2008
> On Wed, 30 Apr 2008 20:54:48 +0500, arnuld wrote:

> by accident, it is actually exercise 6-4 of K&R2



How about this code. It works fine:


/* A program that takes a single word as input. It will discard
* the whole input if it contains anything other than the 26 alphabets
* of English. If the input word contains more than 30 letters then only
* the extra letters will be discarded . For general purpose usage of
* English it does not make any sense to use a word larger than this size.
* Nearly every general purpose word can be expressed in a word with less
* than or equal to 30 letters.
*
* version 1.1
*
*/


#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>


enum MAXSIZE { WORDSIZE = 30 };

int getword( char *, int );


int main( void )
{
    char ac[WORDSIZE];

    if( getword( ac, WORDSIZE ) )
    {
        printf("%s\n", ac);
    }

    return EXIT_SUCCESS;
}


int getword( char *word, int max_length )
{
    int c;
    char *w = word;

    while( isspace( c = getchar() ) )
    {
        ;
    }

    while( --max_length )
    {
        if( isalpha( c ) )
        {
            *w++ = c;
        }
        else if( c == '\n' || c == EOF || isspace( c ) )
        {
            *w = '\0';
            break;
        }
        else
        {
            return 0;
        }

        c = getchar();
    }

    /* I can simply ignore the if condition and directly write the '\0'
       onto the last element because in worst case it will only rewrite
       the '\n' that is put in there by else if clause.

       or in else if clause, I could replace break with return word[0].

       I thought these 2 ideas will be either inefficient or
       a bad programming practice, so I did not do it.
    */
    if( *w != '\0' )
    {
        *w = '\0';
    }

    return word[0];
}


========== OUTPUT ============
Welcome to the Emacs shell

/home/arnuld/programs/C $ gcc -ansi -pedantic -Wall -Wextra getword.c
/home/arnuld/programs/C $ ./a.out
like this
like
/home/arnuld/programs/C $ ./a.out
like3
/home/arnuld/programs/C $ ./a.out
9like
/home/arnuld/programs/C $ ./a.out
like ll
like
/home/arnuld/programs/C $



--
http://lispmachine.wordpress.com/
my email ID is @ the above address

 
 
Ben Bacarisse
 
      04-30-2008
arnuld <> writes:

>> On Wed, 30 Apr 2008 20:54:48 +0500, arnuld wrote:

>
>> by accident, it is actually exercise 6-4 of K&R2


I don't have K&R2 so I don't know the end point of this exercise, so I
may have this wrong...

> How about this code. It works fine:
>
> /* A program that takes a single word as input. It will discard
> * the whole input if it contains anything other than the 26 alphabets
> * of English. If the input word contains more than 30 letters then only
> * the extra letters will be discarded . For general purpose usage of
> * English it does not make any sense to use a word larger than this size.
> * Nearly every general purpose word can be expressed in a word with less
> * than or equal to 30 letters.
> *
> * version 1.1
> *
> */
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <ctype.h>
>
>
> enum MAXSIZE { WORDSIZE = 30 };
>
> int getword( char *, int );
>
>
> int main( void )
> {
> char ac[WORDSIZE];
>
> if( getword( ac, WORDSIZE ) )
> {
> printf("%s\n", ac);
> }
>
> return EXIT_SUCCESS;
>
> }
>
>
> int getword( char *word, int max_length )
> {
> int c;
> char *w = word;
>
>
> while( isspace( c = getchar() ) )
> {
> ;
> }


I find { ; } a messy way of saying nothing, but that is a style
point. More important, if this will be used to read more than one
word (eventually) you need to skip anything that you don't count as a
word character, not just spaces.

> while( --max_length )
> {
> if( isalpha( c ) )
> {
> *w++ = c;
> }
> else if( c == '\n' || c == EOF || isspace( c ) )
> {
> *w = '\0';
> break;
> }
> else
> {
> return 0;


When the word ends because of this condition, why do you return 0
rather than the word you have read? You do have a word to return.

> }
>
> c = getchar();
> }
>
> /* I can simply ignore the if condition and directly write the '\0'
> onto the last element because in worst case it will only rewrite
> the '\n' that is put in there by else if clause.


I think the comment is confusing. Without the if below, you re-write
a 0 that is already there. A \n is never put into the buffer.

> or in else if clause, I could replace break with return word[0].
>
> I thought these 2 ideas will be either inefficient or
> a bad programming practice, so I did not do it.
> */
> if( *w != '\0' )
> {
> *w = '\0';
> }


I'd just write *w = '\0';

> return word[0];


That's a char. Given what you said about conversions and clarity, you
should really write return word[0] != '\0'; or maybe return !!word[0];

> }


--
Ben.
 
 
arnuld
 
      04-30-2008
> On Tue, 29 Apr 2008 02:47:18 -0700, Nick Keighley wrote:


> are all your words the same size?


It depends on the user, what he likes to input at run-time.


> If you use the array of pointers you'll have to get the memory
> for each word from somewhere (eg. malloc())


yes. I came up with this code and as you can see it does not do what I
want. I want to take every word into the input but it only takes 1st for
obvious reasons. I am not able to think of the way to take all the words
of the input:




#include <stdio.h>
#include <ctype.h>


enum MAXSIZE { MAXWORD = 100 };

char *getword( char *, int );


int main(void) {

    char buffer[MAXWORD];

    getword( buffer, MAXWORD );

    printf("--------------------\n");
    printf("%s\n", buffer);

    return 0;
}



char *getword( char *word, int max )
{
    int c, i;

    i = 0;

    while( isalpha(c = getchar()) && i < max - 1 )
    {
        word[i++] = c;
    }

    word[i] = '\0';

    return word;
}

============= OUTPUT =================
/home/arnuld/programs/C $ gcc -ansi -pedantic -Wall -Wextra test.c
/home/arnuld/programs/C $ ./a.out
like that
--------------------
like
/home/arnuld/programs/C $



--
http://lispmachine.wordpress.com/
my email ID is at the above address

 
 
arnuld
 
      04-30-2008
> On Tue, 29 Apr 2008 20:53:50 +0530, santosh wrote:


> What about words with other characters like hyphen? What about
> constructs like "get_name"? Will you discard them too? What about words
> that end with a ; or ...? What about words that contain symbols like #@
> etc? Or words that end with an exclamation mark? Or words within
> parentheses or braces?


all of them will be discarded. Only words containing letters like
"santosh" will be considered, nothing else.
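
One way that rule could be expressed, as a sketch only: read whitespace-delimited tokens and keep a token only if every character in it is a letter. The name next_pure_word and the FILE * parameter are assumptions made for this illustration (pass stdin for keyboard input). Over-long words are truncated rather than rejected.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: scans whitespace-delimited tokens from in and
 * stores the first one made up purely of letters in word (truncated to
 * max - 1 characters). Tokens containing any other character are
 * skipped whole. Returns 1 if a word was stored, 0 at end of input. */
int next_pure_word(FILE *in, char *word, int max)
{
    int c;

    for (;;) {
        int i = 0, pure = 1;

        /* Skip leading whitespace. */
        while ((c = getc(in)) != EOF && isspace(c))
            ;
        if (c == EOF)
            return 0;

        /* Consume the whole token, noting whether it stays letters-only. */
        do {
            if (!isalpha(c))
                pure = 0;
            else if (i < max - 1)
                word[i++] = (char)c;
        } while ((c = getc(in)) != EOF && !isspace(c));

        if (pure) {
            word[i] = '\0';
            return 1;
        }
        /* Otherwise fall through and try the next token. */
    }
}
```

Under this rule "like3", "9like", "get_name" and "wine!" are all dropped, which also means an ordinary word followed by a comma or full stop is dropped too.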




> That's one way yes, suitable when you don't know the lengths of words in
> advance, or you don't want to possibly waste storage with statically
> allocated arrays.


yes, exactly, input will be at run-time only.


> Depends on your requirements really, and the type and frequency of input
> you expect. Will you put an upper limit on the length of words? It
> hardly makes sense to accept words longer than about 64 characters if
> you are dealing with normal English text.


ok, make the upper limit to 64 , I usually take it 100 as my style.



> Static 2D arrays are
> undoubtedly easier to work with but are less flexible than dynamically
> allocated arrays. Since statically allocated arrays are of fixed size
> it's possible for some elements to remain unused and hence wasted. OTOH
> a large number of small allocations may lead to memory fragmentation and
> also some wastage due to malloc bookkeeping and possibly also a slowdown
> in speed if you'll be reading a very large number of words from a file.
> For input from a human it will not matter.



you want to say that there will be 2 types of implementations if
efficiency is my concern:

1.) input from human
2.) input from a text-file

??

couldn't there be a single implementation for both types of inputs ?


> One efficient method is to use a single dynamically allocated array in
> which words are stored sequentially. The length of each word could be
> specified by either one or two bytes prefixing the word itself. This
> results in very efficient storage, but is grossly inefficient if you
> want to insert and delete words at random. For this a hash table based
> approach is probably the best. OTOH a tree is very convenient for quick
> searching and sorting.


> If you tell us more details about the type and volume of input you
> expect and the facilities (like searching, insertion, etc.) you plan to
> implement, perhaps a tailored approach can be suggested.



The basic problem is to sort, count and print the sorted words. We are
not going to save a word in an array if it has already appeared, we will
just increase the count for that word.

K&R2 seems to suggest that a doubly-linked list using binary search is
the most efficient method to use, described in section 6.5 and is already
solved. Actually I am not able to understand the <getword> function of the
authors which actually is different from what I want, hence I need to
create one of my own.
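
For reference, the program in section 6.5 of K&R2 stores the words in a binary tree. Below is a minimal sketch along those lines, not the book's code: the names wnode, tree_add, tree_print and tree_free are hypothetical. A repeated word only bumps its count, and an in-order walk prints the words in sorted order.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one distinct word plus its occurrence count. */
struct wnode {
    char *word;
    int count;
    struct wnode *left, *right;
};

/* Adds word to the tree rooted at root, or increments its count if it is
 * already there. Returns the subtree root. If malloc() fails the word is
 * dropped and the tree is left unchanged. */
struct wnode *tree_add(struct wnode *root, const char *word)
{
    int cmp;

    if (root == NULL) {
        struct wnode *n = malloc(sizeof *n);

        if (n == NULL)
            return NULL;
        n->word = malloc(strlen(word) + 1);
        if (n->word == NULL) {
            free(n);
            return NULL;
        }
        strcpy(n->word, word);
        n->count = 1;
        n->left = n->right = NULL;
        return n;
    }
    cmp = strcmp(word, root->word);
    if (cmp == 0)
        root->count++;                 /* seen before: just count it */
    else if (cmp < 0)
        root->left = tree_add(root->left, word);
    else
        root->right = tree_add(root->right, word);
    return root;
}

/* In-order walk: prints each word with its count, in sorted order. */
void tree_print(const struct wnode *root, FILE *out)
{
    if (root != NULL) {
        tree_print(root->left, out);
        fprintf(out, "%4d %s\n", root->count, root->word);
        tree_print(root->right, out);
    }
}

/* Frees every node and its word. */
void tree_free(struct wnode *root)
{
    if (root != NULL) {
        tree_free(root->left);
        tree_free(root->right);
        free(root->word);
        free(root);
    }
}
```

A driver would start with struct wnode *root = NULL; and call root = tree_add(root, word); for each word read, then tree_print(root, stdout); at the end.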





--
http://lispmachine.wordpress.com/
my email ID is at the above address

 