Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > counting word occurances

Reply
Thread Tools

counting word occurances

 
 
Rodrick Brown
Guest
Posts: n/a
 
      06-03-2005
Hello,

Just learning Perl so bare with me.

I have the following output file:

pear
apple
apple
orange
mango
mango
pear
cherry
apple

ill would like the count the ammount of occurances for each fruit.

I spent a few hours trying to do this and just gave up if someone can help
me out with an example or a better way to do this than the method i'm trying
to use

This is as far as I got

#!/usr/bin/perl -w

use strict;

my @keys;
my @fruits;
my %cnt;
my $types;
my $f = 1;
my $m;

open(LOG,"/tmp/fruits.txt") || die("Can't open file: $!\n");
while(<LOG>)
{
next if(/^\s+/);
push(@fruits,$_);
}

# Give all fruits a default value of 1
foreach my $types (@fruits)
{
$cnt{$types} = $f;
}

foreach $types (@fruits)
{
@keys = keys %cnt;
while(@keys)
{
my $fruitnames = pop(@keys);
if($types =~ m/$fruitnames/)
{
$cnt{$types}++;
print "$cnt{$types} $fruitnames";
}
}
}

The code doesnt work and i'm a bit fustrated that I couldnt get it working,
many times I thought I had it but I never did get the results I expected.

--
RB


 
Reply With Quote
 
 
 
 
John Bokma
Guest
Posts: n/a
 
      06-03-2005
Rodrick Brown wrote:

> Hello,
>
> Just learning Perl so bare with me.
>
> I have the following output file:
>
> pear
> apple
> apple
> orange
> mango
> mango
> pear
> cherry
> apple
>
> ill would like the count the ammount of occurances for each fruit.
>
> I spent a few hours trying to do this and just gave up if someone can
> help me out with an example or a better way to do this than the method
> i'm trying to use
>
> This is as far as I got
>
> #!/usr/bin/perl -w


don't use -w, use warnings; instead:

> use strict;


use warnings;

> my @keys;
> my @fruits;
> my %cnt;
> my $types;
> my $f = 1;
> my $m;


do this when you need it, not ahead of time. I replace them with:

my $filename = '/tmp/fruits.txt';
my %count;

> open(LOG,"/tmp/fruits.txt") || die("Can't open file: $!\n");


open my $fh, $filename or die "Can't open '$filename' for reading: $!";

while ( my $line = <$fh> ) {

$line =~ s/^\s+//; # remove leading whitespace
$line =~ s/\s+$//; # remove trailing whitespace (and \n)

next if $line eq ''; # skip empty lines

$count{ $line }++;
}

close $fh or die "Can't close '$filename' after reading: $!"

Note that some magic happens here and there, like incrementing an
undefined entry in a hash table (%count) assumes it had a value of zero.

Also note that I use an undefined variable in open, so it can be used as
a file handle.

> foreach $types (@fruits)
> {
> @keys = keys %cnt;
> while(@keys)
> {
> my $fruitnames = pop(@keys);
> if($types =~ m/$fruitnames/)
> {
> $cnt{$types}++;
> print "$cnt{$types} $fruitnames";
> }
> }
> }


I don't even want to guess what's going on here

print "$count{$_} $_\n"
for sort { $count{ $b } <=> $count{ $a } } keys %count;

Since I have $b to the left, it sorts the keys of %count descending
based on the count of each item.

I recommend reading a bit more on hash tables, the use of for(each), and
open.

(all code untested)

--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
Reply With Quote
 
 
 
 
Jürgen Exner
Guest
Posts: n/a
 
      06-03-2005
Rodrick Brown wrote:
> Just learning Perl so bare with me.


There isn't really much Perl involved here except for the hash.

> I have the following output file:


I guess you mean input file?

> pear
> apple
> apple
> orange
> mango
> mango
> pear
> cherry
> apple
>
> ill would like the count the ammount of occurances for each fruit.
>
> I spent a few hours trying to do this and just gave up if someone can
> help me out with an example or a better way to do this than the
> method i'm trying to use
>
> This is as far as I got

[code snipped]

Sorry, this is so convoluted, I'm not even trying to understand what you may
have been thinking when writing it.

The following code works:

use warnings; use strict;
my %cnt;
open(LOG,"/tmp/fruits.txt") or die("Can't open file: $!\n");
while(<LOG>){
s/^\s*//; #remove leading white space
s/\s*$//; #remove trailing white space
$cnt{$_}++; #count this fruit
}
delete $cnt{''}; #delete empty key in case we picked up an empty line

for (keys(%cnt)){#print the whole set
print "$cnt{$_} $_\n";
}

jue


 
Reply With Quote
 
John Bokma
Guest
Posts: n/a
 
      06-03-2005
Jürgen Exner wrote:

> delete $cnt{''}; #delete empty key in case we picked up an empty line


Must remember that one, more readable then next if $line eq '';

--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
Reply With Quote
 
A. Sinan Unur
Guest
Posts: n/a
 
      06-03-2005
"Jürgen Exner" <(E-Mail Removed)> wrote in
news:ngRne.38532$GN3.26737@trnddc04:

> Rodrick Brown wrote:
>> Just learning Perl so bare with me.


I'd rather not be naked with strangers

....

>> This is as far as I got

> [code snipped]
>
> Sorry, this is so convoluted,


Agreed.

> while(<LOG>){
> s/^\s*//; #remove leading white space
> s/\s*$//; #remove trailing white space
> $cnt{$_}++; #count this fruit
> }
> delete $cnt{''}; #delete empty key in case we picked up an empty line


Or:

while(<LOG>) {
next unless /^\s*(\w+)\s*$/;
$cnt{$1}++;
}

Sinan
--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
Reply With Quote
 
Jürgen Exner
Guest
Posts: n/a
 
      06-03-2005
A. Sinan Unur wrote:
> "Jürgen Exner" <(E-Mail Removed)> wrote in
>> s/^\s*//; #remove leading white space
>> s/\s*$//; #remove trailing white space

>
> Or:
> next unless /^\s*(\w+)\s*$/;


See
perldoc -q "strip blank"

Another difference between our solutions would be the handling of lines that
contain more than one single word, e.g. "green grapes" or "mini-tomatos".
Which behaviour the OP wants is everybody's guess.

jue


 
Reply With Quote
 
Tad McClellan
Guest
Posts: n/a
 
      06-03-2005
John Bokma <(E-Mail Removed)> wrote:
> Jürgen Exner wrote:
>
>> delete $cnt{''}; #delete empty key in case we picked up an empty line

>
> Must remember that one, more readable then next if $line eq '';



More readable than what I use too:

next unless length $line;


--
Tad McClellan SGML consulting
http://www.velocityreviews.com/forums/(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Reply With Quote
 
Jürgen Exner
Guest
Posts: n/a
 
      06-03-2005
Tad McClellan wrote:
> John Bokma <(E-Mail Removed)> wrote:
>> Jürgen Exner wrote:
>>
>>> delete $cnt{''}; #delete empty key in case we picked up an empty
>>> line

>>
>> Must remember that one, more readable then next if $line eq '';

>
> More readable than what I use too:
> next unless length $line;


I think my approach should be faster, too, because it eliminates the "if"
test for every single line.

jue


 
Reply With Quote
 
A. Sinan Unur
Guest
Posts: n/a
 
      06-03-2005
"Jürgen Exner" <(E-Mail Removed)> wrote in
news:V3Yne.1300$mb2.1255@trnddc07:

> A. Sinan Unur wrote:
>> "Jürgen Exner" <(E-Mail Removed)> wrote in
>>> s/^\s*//; #remove leading white space
>>> s/\s*$//; #remove trailing white space

>>
>> Or:
>> next unless /^\s*(\w+)\s*$/;

>
> See
> perldoc -q "strip blank"


Hasty post on my part. However, note a couple of differences between the
comparison in the FAQ and my suggestion:

Although the simplest approach would seem to be

$string =~ s/^\s*(.*?)\s*$/$1/;

not only is this unnecessarily slow and destructive, it also fails
with embedded newlines.

Well, my suggestion does not involve s///, so the bit about
'destructive' is not applicable. Embedded newlines also are not an issue
because we are reading line-by-line from a file. As for speed:

#! /usr/bin/perl

use strict;
use warnings;

use Benchmark ':all';

my $INPUT = [
'pear ',
' apple ',
'apple',
' orange ',
' mango ',
'mango',
' pear',
' cherry ',
'apple',
'',
];

sub capture {
my @input = @{ $INPUT };
my %counts;

for (@input) {
if( /^\s*(\w+)\s*$/ ) {
$counts{$1}++;
}
}
}

sub strip {
my @input = @{ $INPUT };
my %counts;

for (@input) {
s/^\s*//;
s/\s*$//;
$counts{$_}++;
}
delete $counts{''};
}

cmpthese 0, {
capture => \&capture,
strip => \&strip,
};

__END__

D:\Home> perl -v
This is perl, v5.8.6 built for MSWin32-x86-multi-thread

D:\Home> st
Rate capture strip
capture 29936/s -- -2%
strip 30640/s 2% --

OK, you have a point there (and I knew it even before I ran the
benchmark.

> Another difference between our solutions would be the handling of
> lines that contain more than one single word, e.g. "green grapes" or
> "mini-tomatos". Which behaviour the OP wants is everybody's guess.


On the other hand, *this* is the crux of the matter, isn't it? Being as
expressive as one can be (in Perl) about what part of the input string
one wants to use enables others to be able to figure out what the code
was meant to do. So, in that sense, me using (\w+) is not such a good
idea. After all, words really do not contain digits.

So, I might even use:

if( /^\s*([[:alpha:]]+)\s*$/ ) {
$counts{$1}++;
}

or even

my %accept = map { $_ => 1 } qw{pear apple mango cherry};

....

if( /^\s*(.+?)\s*$/ and $accept{$1}) {
$counts{$1}++;
}

This is even slower, but it allows me to count only the input I want to
count.

There is some value in that.

Sinan

--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
Reply With Quote
 
Gunnar Hjalmarsson
Guest
Posts: n/a
 
      06-03-2005
Jürgen Exner wrote:
> A. Sinan Unur wrote:
>> "Jürgen Exner" <(E-Mail Removed)> wrote in
>>>
>>> s/^\s*//; #remove leading white space
>>> s/\s*$//; #remove trailing white space

>>
>> Or:
>> next unless /^\s*(\w+)\s*$/;

>
> See
> perldoc -q "strip blank"


That FAQ entry comments on the s/// operator. Is that applicable to
capturing a value via the m// operator too?

> Another difference between our solutions would be the handling of lines that
> contain more than one single word, e.g. "green grapes" or "mini-tomatos".


while (<LOG>) { /(\S(?:.*\S))/ and $cnt{$1}++ or next }

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
counting number of (overlapping) occurances John Python 8 03-11-2006 09:13 AM
counting up instead of counting down edwardfredriks Javascript 6 09-07-2005 03:30 PM
counting word occurances Rodrick Brown Perl 2 06-02-2005 02:49 AM
Counting occurances of string A in string B, and adding it to string B Sandman Perl Misc 7 08-03-2004 08:46 PM
Multiple Occurances Of Value In String =?Utf-8?B?SmltIEhlYXZleQ==?= ASP .Net 4 06-29-2004 02:55 PM



Advertisments