counting words in input

 
 
arnuld
 
      12-16-2007
I have written about 90% of this program and it runs fine. In its
present implementation it reads from standard input. I cannot complete
it because the last part requires reading from a file. All I know about
file streams is that I need to use:
<int main(int argc, char**argv)>
and nothing more than that. I would appreciate it if someone could help me:

/* C++ Primer - 4/e
*
* chapter 11, exercise 11.9
* STATEMENT
* Write a program to count word size of greater than or equal to 4
including printing the list of unique words in the input. Test your
program by running it on program's source file.
*
*/


#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>


/* returns word unchanged when ctr is 1; otherwise returns word with
   ending appended */
std::string make_plural( size_t ctr,
                         const std::string &word,
                         const std::string &ending )
{
    return (ctr == 1) ? word : word + ending;
}


/* orders strings by length only */
bool isShorter( const std::string &s1, const std::string &s2 )
{
    return s1.size() < s2.size();
}


/* true for words of 4 or more characters */
bool GT4( const std::string &s )
{
    return s.size() >= 4;
}


int main( )
{
    std::vector<std::string> svec;
    /* input some words */
    std::copy( std::istream_iterator<std::string>( std::cin ),
               std::istream_iterator<std::string>(),
               std::back_inserter( svec ) );

    /* copy the vector, to be used later for printing */
    std::vector<std::string> svec_old( svec );
    std::sort( svec.begin(), svec.end() );

    /* to eliminate the duplicate words, std::unique first moves the unique
       words to the front of the vector; the vector operation erase then
       removes the leftover tail */
    std::vector<std::string>::iterator begin_duplicates =
        std::unique( svec.begin(), svec.end() );

    svec.erase( begin_duplicates, svec.end() );

    /* sort the words by size while maintaining the alphabetical order */
    std::stable_sort( svec.begin(), svec.end(), isShorter );

    std::vector<std::string>::size_type unique_count =
        std::count_if( svec.begin(), svec.end(), GT4 );

    std::cout << unique_count << " "
              << make_plural( unique_count, "word", "s" )
              << " 4 characters or longer"
              << std::endl;

    for( std::vector<std::string>::const_iterator iter = svec_old.begin();
         iter != svec_old.end();
         ++iter )
    {
        if( GT4( *iter ))
        {
            std::cout << *iter << std::endl;
        }
    }

    return 0;
}



 
Alf P. Steinbach
 
      12-16-2007
* arnuld:
> I am able to create the 90% of this program and it runs fine. In its
> present implementation, it reads from standard input. I am not able to
> complete this program as last part requires to read from a file. All I
> know about file-streams is that I need to use:
> <int main(int argc, char**argv)>
> and nothing more than that. I will appreciate if someone can help me:
>
> /* C++ Primer - 4/e
> *
> * chapter 11, exercise 11.9
> * STATEMENT
> * Write a program to count word size of greater than or equal to 4
> including printing the list of unique words in the input. Test your
> program by running it on program's source file.
> *
> */


Assuming your program's executable is 'myprogram', and the program's
source file is 'myprogram.cpp', on a *nix system try

$ ./myprogram <myprogram.cpp

or

$ cat myprogram.cpp | ./myprogram

or on a Windows system

C:\wherever> myprogram <myprogram.cpp

or

C:\wherever> type myprogram.cpp | myprogram

That said, you can do program arguments simply like (disclaimer:
off-the-cuff code, not touched by compiler's hands):

#include <fstream>
#include <iostream>
#include <ostream>
#include <cstdlib>      // EXIT_SUCCESS, EXIT_FAILURE

void doThings( std::istream& input )
{
    // ...
}

int main( int argc, char* argv[] )
{
    using namespace std;

    switch( argc )
    {
    case 1:
        {
            doThings( cin );
            return EXIT_SUCCESS;
        }

    case 2:
        {
            ifstream input( argv[1] );
            if( !input )
            {
                cerr << "Unable to open [" << argv[1] << "]." << endl;
                return EXIT_FAILURE;
            }
            else
            {
                doThings( input );
                return EXIT_SUCCESS;
            }
        }

    case default:
        {
            cerr << "Usage: plingplong [FILENAME]" << endl;
            return EXIT_FAILURE;
        }
    }
}

Cheers, & hth.,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
Juha Nieminen
 
      12-16-2007
Alf P. Steinbach wrote:
> $ cat myprogram.cpp | ./myprogram


You really shouldn't be teaching useless use of cat.

If you really want to express the input file before the program you
can do it like this:

$ < myprogram.cpp ./myprogram
 
Alf P. Steinbach
 
      12-16-2007
* Juha Nieminen:
> Alf P. Steinbach wrote:
>> $ cat myprogram.cpp | ./myprogram

>
> You really shouldn't be teaching useless use of cat.
>
> If you really want to express the input file before the program you
> can do it like this:
>
> $ < myprogram.cpp ./myprogram


Well, the cat is idiomatic and easy to read, whereas the command you
show isn't.

C++ related: the impossibility of writing a copy-input-to-output-exactly
program in a portable way for those systems where it's meaningful.

Cheers, & hth.,

- Alf

 
arnuld
 
      12-16-2007
> On Sun, 16 Dec 2007 09:25:43 +0100, Alf P. Steinbach wrote:


> int main( int argc, char* argv[] )
> {
> using namespace std;
>
> switch( argc )
> {
> case 1:
> {
> doThings( cin );
> return EXIT_SUCCESS;
> }


In C++ Primer 4/e, I read that <argv[0]> is always reserved. That means
argv[] always has at least one element, and if I expect one input file
there will be two elements, argv[0] and argv[1], with argv[1] being the
input file.

So if "case 1" holds, there is no input file. Then why EXIT_SUCCESS?



 
arnuld
 
      12-16-2007
> On Sun, 16 Dec 2007 09:25:43 +0100, Alf P. Steinbach wrote:

> case default:
> {
> cerr << "Usage: plingplong [FILENAME]" << endl;
> return EXIT_FAILURE;
> }
> }


I am sure you left that "case" just before "default" to teach me a lesson.
Well, it took me 20 minutes to figure out the error from GCC, and it
really did teach me a lesson.


 
arnuld
 
      12-16-2007
> On Sun, 16 Dec 2007 09:25:43 +0100, Alf P. Steinbach wrote:

> Assuming your program's executable is 'myprogram', and the program's
> source file is 'myprogram.cpp', on a *nix system try
>
> $ ./myprogram <myprogram.cpp


> ............ [SNIP].............



> case default:
> {
> cerr << "Usage: plingplong [FILENAME]" << endl;
> return EXIT_FAILURE;
> }
> }
> }



switch( argc )
{
case 1:
    {
        std::cerr << "No input file ?\n";
        return EXIT_FAILURE;
    }
case 2:
    {
        std::ifstream infile( argv[1] );
        if ( !infile )
        {
            std::cerr << "Can't open file \n" << std::endl;
            return EXIT_FAILURE;
        }
        else
        {
            save_to_vec(infile, svec);
            return EXIT_SUCCESS;
        }
    }
default:
    {
        std::cerr << "Usage Pling-Plong\n";
        return EXIT_FAILURE;
    }
}



it always outputs this:

[arnuld@arch programs]$ ls
10.01.cpp 11.09.cpp 11.09.cpp~ 11.09_using-std-input.cpp a.out post.txt
[arnuld@arch programs]$ ./a.out <10.01.cpp
No input file ?
[arnuld@arch programs]$ ./a.out < 10.01.cpp
No input file ?
[arnuld@arch programs]$ ./a.out
No input file ?
[arnuld@arch programs]$ cat 10.01.cpp | ./a.out
No input file ?
[arnuld@arch programs]$


 
Pete Becker
 
      12-16-2007
On 2007-12-16 11:41:34 -0500, arnuld said:

>
> it always outputs this:
>
> [arnuld@arch programs]$ ls
> 10.01.cpp 11.09.cpp 11.09.cpp~ 11.09_using-std-input.cpp a.out post.txt
> [arnuld@arch programs]$ ./a.out <10.01.cpp
> No input file ?
> [arnuld@arch programs]$ ./a.out < 10.01.cpp
> No input file ?
> [arnuld@arch programs]$ ./a.out
> No input file ?
> [arnuld@arch programs]$ cat 10.01.cpp | ./a.out
> No input file ?
> [arnuld@arch programs]$


As it should. <g> The third example should be obvious: there is no
input whatsoever. The rest all do the same thing: they put data on the
standard input stream. The program doesn't read standard input, though,
so it complains that there's no input file. (The earlier suggested
version, with case 1: would read the standard input stream in that
case).

./a.out 10.01.cpp
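
The distinction can be sketched in code. Shell redirection ("<" or a pipe) only changes what std::cin is connected to; it never appears in argv, so argc stays 1 in those runs. The function name input_source is hypothetical and exists only to make the mapping explicit.

```cpp
#include <string>

// Hypothetical helper: reports where a program following the thread's
// convention would read from, given the arguments passed to main.
std::string input_source( int argc, char* argv[] )
{
    if( argc == 1 ) { return "standard input"; }  // ./a.out   or   ./a.out < file
    if( argc == 2 ) { return argv[1]; }           // ./a.out file
    return "";                                    // anything else: usage error
}
```

Under this sketch, "./a.out < 10.01.cpp" and "cat 10.01.cpp | ./a.out" both arrive with argc == 1, while "./a.out 10.01.cpp" arrives with argc == 2 and argv[1] naming the file.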

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

 
Daniel T.
 
      12-16-2007
arnuld wrote:

> I am able to create the 90% of this program and it runs fine. In its
> present implementation, it reads from standard input. I am not able to
> complete this program as last part requires to read from a file. All I
> know about file-streams is that I need to use:
> <int main(int argc, char**argv)>
> and nothing more than that. I will appreciate if someone can help me:


To answer your specific question, you would need to do something like
this:

int main( int argc, char** argv ) {
    if ( argc < 2 ) {
        cout << "format: " << argv[0] << " filename\n";
        return -1;
    }
    ifstream file( argv[1] );

    // from here on out, use 'file' instead of 'cin'...

> /* C++ Primer - 4/e
> *
> * chapter 11, exercise 11.9
> * STATEMENT
> * Write a program to count word size of greater than or equal to 4
> including printing the list of unique words in the input. Test your
> program by running it on program's source file.
> *
> */


I take the above to mean, count the total number of words >= 4
(including duplicates) and list all unique words. So for example:

"this this this" would print 3 words >= 4, unique words: 'this'

I am currently dealing with a lot of text because I am involved in
converting and writing several programs to handle multiple languages at
work.

In the real world, such a problem statement would expect you to handle
upper- and lower-case letters properly and to deal with punctuation. On
top of that, several languages (even strictly Western European ones) use
letters that are not in the ASCII or Latin-1 character sets. You are
likely to get input in UTF-16BE, UTF-16LE, or UTF-8. To solve the
problem, you would need to know which input formats you must support.
In other words, that deceptively simple problem statement has the
potential to produce a very complicated program.
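
To give a feel for just the simplest part of that (case and punctuation, ASCII only), here is a minimal sketch. The function name normalize_word is hypothetical, and the code deliberately does nothing about encodings: bytes outside ASCII are passed through untouched, which is not adequate for real multi-language text.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, ASCII only: drops punctuation and lowercases the
// remaining characters. Bytes outside ASCII are copied unchanged.
std::string normalize_word( const std::string& word )
{
    std::string result;
    for( std::string::size_type i = 0; i < word.size(); ++i )
    {
        const unsigned char c = static_cast<unsigned char>( word[i] );
        if( c < 128 && std::ispunct( c ) ) { continue; }
        result += ( c < 128 ) ? static_cast<char>( std::tolower( c ) )
                              : word[i];
    }
    return result;
}
```

Applying this to each word before sorting would make "This", "this" and "this," count as the same word, at the cost of also merging things like "don't" and "dont".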

> #include <iostream>
> #include <vector>
> #include <string>
> #include <algorithm>
> #include <iterator>
>
>
> /* returns word unchanged when ctr is 1; otherwise returns word with
> ending appended */
> std::string make_plural( size_t ctr,
> const std::string &word,
> const std::string & ending )
> {
> return (ctr == 1) ? word : word + ending;
> }
>
>
>
> bool isShorter( const std::string &s1, const std::string &s2 ) {
> return s1.size() < s2.size();
> }
>
>
> bool GT4( const std::string &s )
> {
> return s.size() >= 4;
> }
>
>
> int main( )
> {
> std::vector<std::string> svec;
> /* input some words */
> std::copy( std::istream_iterator<std::string>( std::cin ),
> std::istream_iterator<std::string>(), std::back_inserter( svec ) );
>
>
>
> /* copy the vector, to be used later for printing */
> std::vector<std::string> svec_old( svec );


Rather than make a copy like above, put the stuff that uses svec in a
separate function. Pass the vector by value and a copy will
automatically be made. (Caveat: I don't think a copy is necessary to
solve the problem though.)
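
A minimal sketch of that suggestion, assuming a hypothetical helper named count_unique: the parameter is taken by value, so the sort and erase operate on a copy and the caller's vector keeps its original order and duplicates.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: 'words' is taken by value, so sorting and erasing
// here work on a copy and leave the caller's vector untouched.
std::size_t count_unique( std::vector<std::string> words )
{
    std::sort( words.begin(), words.end() );
    words.erase( std::unique( words.begin(), words.end() ), words.end() );
    return words.size();
}
```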

> std::sort( svec.begin(), svec.end() );
>
> /* to eliminate the duplicate words, std::unique first moves the unique
> words to the front of the vector; the vector operation erase then
> removes the leftover tail */
>
> std::vector<std::string>::iterator begin_duplicates =
> std::unique( svec.begin(), svec.end() );
>
> svec.erase( begin_duplicates, svec.end() );


The erase-remove idiom (here with unique in place of remove) is pretty
well known. No need to break it up.

svec.erase( unique( svec.begin(), svec.end() ), svec.end() );

> /* sort the words by size while maintaining the alphabetical order */
> std::stable_sort( svec.begin(), svec.end(), isShorter );


The above is wholly unnecessary.

> std::vector<std::string>::size_type unique_count =
> std::count_if( svec.begin(), svec.end(), GT4 );


You can also write &GT4 in the above if you prefer to be explicit. A
plain function name converts implicitly to a function pointer, so both
forms are valid.
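
For what it's worth, both spellings compile and behave identically, because a function name used as a value converts implicitly to a pointer to that function. A small sketch; the two wrapper function names are made up for the comparison.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

bool GT4( const std::string& s ) { return s.size() >= 4; }

// Predicate passed as a plain function name (implicit conversion).
std::size_t count_long_implicit( const std::vector<std::string>& words )
{
    return static_cast<std::size_t>(
        std::count_if( words.begin(), words.end(), GT4 ) );
}

// Predicate passed with an explicit address-of operator.
std::size_t count_long_explicit( const std::vector<std::string>& words )
{
    return static_cast<std::size_t>(
        std::count_if( words.begin(), words.end(), &GT4 ) );
}
```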

> std::cout << unique_count << " "
> << make_plural( unique_count, "word", "s" )
> << " 4 characters or longer"
> << std::endl;


If I understand the problem correctly, you are supposed to print a list
of all the unique words in the input. The below prints all the words
with a size of 4+ and prints duplicates...

> for( std::vector<std::string>::const_iterator iter = svec_old.begin();
> iter != svec_old.end();
> ++iter )
> {
> if( GT4( *iter ))
> {
> std::cout << *iter << std::endl;
> }
> }
>
> return 0;
>
> }



Here is one of the solutions I came up with. Note especially how easy
the "count_if" line is. "count_if... size_greater_than( 3 )" makes a lot
of sense grammatically. Also note that I didn't make a separate
'make_plural' function. It is not that easy to pluralize a word so I
tend to do it on a case-by-case basis.

#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

using namespace std;

// predicate: true for strings longer than x characters
struct size_greater_than
{
    size_t x;
    size_greater_than( size_t x ): x( x ) { }
    bool operator()( const string& s ) const {
        return s.size() > x;
    }
};

int main( int argc, char** argv ) {
    if ( argc < 2 ) {
        cout << "format: " << argv[0] << " filename\n";
        return -1;
    }
    ifstream file( argv[1] );
    if ( !file ) {
        cerr << "cannot open " << argv[1] << "\n";
        return -1;
    }

    vector< string > words;
    copy( istream_iterator< string >( file ), istream_iterator<string>(),
          back_inserter( words ) );

    // count and output the total number of words >= 4
    size_t count = count_if( words.begin(), words.end(),
                             size_greater_than( 3 ) );

    cout << count << " word" << ( count == 1 ? " " : "s " )
         << "4 characters or longer\n";

    // sort and remove duplicates
    sort( words.begin(), words.end() );
    words.erase( unique( words.begin(), words.end() ), words.end() );

    // output unique words
    cout << "Unique words: \n";
    copy( words.begin(), words.end(),
          ostream_iterator<string>( cout, "\n" ) );
}
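
With a C++11 or later compiler, the functor can be swapped for a lambda. This is only a sketch of that alternative, not part of the solution above; the wrapper name count_longer_than is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts words longer than 'limit' characters,
// using a lambda in place of the size_greater_than functor (C++11).
std::size_t count_longer_than( const std::vector<std::string>& words,
                               std::size_t limit )
{
    return static_cast<std::size_t>(
        std::count_if( words.begin(), words.end(),
                       [limit]( const std::string& s )
                       { return s.size() > limit; } ) );
}
```

The call count_longer_than( words, 3 ) would then stand in for count_if( ..., size_greater_than( 3 ) ).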
 
Rolf Magnus
 
      12-16-2007
arnuld wrote:

>> On Sun, 16 Dec 2007 09:25:43 +0100, Alf P. Steinbach wrote:

>
>> Assuming your program's executable is 'myprogram', and the program's
>> source file is 'myprogram.cpp', on a *nix system try
>>
>> $ ./myprogram <myprogram.cpp

>
>> ............ [SNIP].............

>
>
>> case default:
>> {
>> cerr << "Usage: plingplong [FILENAME]" << endl;
>> return EXIT_FAILURE;
>> }
>> }
>> }

>
>
> switch( argc )
> {
> case 1:
> {
> std::cerr << "No input file ?\n";
> return EXIT_FAILURE;
> }
> case 2:
> {
> std::ifstream infile( argv[1] );
> if ( !infile )
> {
> std::cerr << "Can't open file \n" << std::endl;
> return EXIT_FAILURE;
> }
> else
> {
> save_to_vec(infile, svec);
> return EXIT_SUCCESS;
> }
> }
> default:
> {
> std::cerr << "Usage Pling-Plong\n";
> return EXIT_FAILURE;
> }
> }
>
>
>
> it always outputs this:
>
> [arnuld@arch programs]$ ls
> 10.01.cpp 11.09.cpp 11.09.cpp~ 11.09_using-std-input.cpp a.out
> post.txt
> [arnuld@arch programs]$ ./a.out <10.01.cpp
> No input file ?
> [arnuld@arch programs]$ ./a.out < 10.01.cpp
> No input file ?
> [arnuld@arch programs]$ ./a.out
> No input file ?
> [arnuld@arch programs]$ cat 10.01.cpp | ./a.out
> No input file ?
> [arnuld@arch programs]$


In all those cases except the third, you are redirecting a file to your
program's standard input stream and giving it no command line argument.
Try: ./a.out 10.01.cpp

 