Velocity Reviews - Computer Hardware Reviews


finding common words

 
 
viv2k
 
      02-22-2004
I'm new to Perl, but a friend told me that the task I want to do can
be done very efficiently in Perl.

Right now, I have two very long lists of words with a comment attached
to each word, and I want to find the words that appear in both lists.

Example:

ListA          ListB

apple   1.1    apple    100
banana  2.2    boy      500
cat     3.3    cat     1000

And I want the result to look something like:

ListA          ListB
apple   1.1    apple    100
cat     3.3    cat     1000

Any idea how to tackle this? I'm off to read some introductory Perl
books now, because I really need that list ASAP. Any help would be
greatly appreciated.

thanks
viv
 
 
 
 
 
Gunnar Hjalmarsson
 
      02-22-2004
viv2k wrote:
> Right now, I have two very long lists of words with comments
> attached to each of the words and I want to find the words that are
> common in both of them.
>
> example:
>
> ListA ListB
>
> apple 1.1 apple 100
> banana 2.2 boy 500
> cat 3.3 cat 1000
>
> And I want the result to look something like:
>
> ListA ListB
> apple 1.1 apple 100
> cat 3.3 cat 1000


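use strict;
use warnings;

my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
my %ListB = ( apple => 100, boy => 500, cat => 1000 );

# Drop every word that does not occur in the other hash. keys() builds
# its full list before the loop starts, so deleting inside it is safe.
for (keys %ListA) {
    delete $ListA{$_} unless exists $ListB{$_};
}
for (keys %ListB) {
    delete $ListB{$_} unless exists $ListA{$_};
}

# Print the surviving (common) entries of each hash.
print "ListA\n";
print "$_\t$ListA{$_}\n" for sort keys %ListA;
print "\n";
print "ListB\n";
print "$_\t$ListB{$_}\n" for sort keys %ListB;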

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
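For comparison, the same in-place pruning can also be sketched with grep and a hash-slice delete, which removes a whole list of keys in one statement. This is an alternative formulation, not code posted in the thread; it collects the unmatched keys of both hashes before deleting anything, so neither hash changes while the other is still being checked.

```perl
use strict;
use warnings;

my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
my %ListB = ( apple => 100, boy => 500, cat => 1000 );

# Find the keys of each hash that have no partner in the other hash.
# Both lists are computed before any deletion happens.
my @only_in_A = grep { !exists $ListB{$_} } keys %ListA;
my @only_in_B = grep { !exists $ListA{$_} } keys %ListB;

# delete accepts a hash slice and removes all the listed keys at once.
delete @ListA{@only_in_A};
delete @ListB{@only_in_B};

print "ListA\n";
print "$_\t$ListA{$_}\n" for sort keys %ListA;
print "\nListB\n";
print "$_\t$ListB{$_}\n" for sort keys %ListB;
```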

 
 
 
 
 
David K. Wall
 
      02-23-2004
(E-Mail Removed) (viv2k) wrote:

> Right now, I have two very long lists of words with comments attached
> to each of the words and I want to find the words that are common in
> both of them.


perldoc -q intersection

Gunnar Hjalmarsson has already posted working code. I just wanted to point to
the FAQ entry, as the code there is readily adapted to the above problem.

--
David Wall
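The counting idea behind that FAQ entry can be sketched for two plain arrays of words as follows; the arrays @words_a and @words_b are made up for illustration. Every word gets a count of how many arrays it appears in, and a count above 1 means it is common to both. This sketch assumes no word is repeated within a single array, otherwise a duplicate in one list would be miscounted as common.

```perl
use strict;
use warnings;

# Hypothetical sample data: one word per element, no repeats per array.
my @words_a = qw(apple banana cat);
my @words_b = qw(apple boy cat);

# Count how many of the two arrays each word appears in.
my %count;
$count{$_}++ for @words_a, @words_b;

# A count greater than 1 means the word occurs in both arrays.
my @intersection = grep { $count{$_} > 1 } sort keys %count;

print "@intersection\n";    # prints "apple cat"
```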
 
 
Matt Garrish
 
      02-23-2004

"Gunnar Hjalmarsson" <(E-Mail Removed)> wrote in message
news:c1b7qa$1fpfks$(E-Mail Removed)-berlin.de...
> viv2k wrote:
> > Right now, I have two very long lists of words with comments
> > attached to each of the words and I want to find the words that are
> > common in both of them.
> >
> > example:
> >
> > ListA ListB
> >
> > apple 1.1 apple 100
> > banana 2.2 boy 500
> > cat 3.3 cat 1000
> >
> > And I want the result to look something like:
> >
> > ListA ListB
> > apple 1.1 apple 100
> > cat 3.3 cat 1000

>
> my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
> my %ListB = ( apple => 100, boy => 500, cat => 1000 );
>
> for (keys %ListA) {
> delete $ListA{$_} unless exists $ListB{$_};
> }
> for (keys %ListB) {
> delete $ListB{$_} unless exists $ListA{$_};
> }
>


I've never been a big fan of looping over both sets like that (especially if
you want to retain the original hashes). If you just want to extract the
common elements, my personal preference would be to do something like the
following instead:

my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
my %ListB = ( apple => 100, boy => 500, cat => 1000 );
my %common;

# Collect [ListA value, ListB value] for each word found in both hashes.
# exists() tests for the key, so a value of 0 would still count.
for (keys %ListA) {
    $common{$_} = [ $ListA{$_}, $ListB{$_} ] if exists $ListB{$_};
}

print "ListA\n";
print "$_\t$common{$_}[0]\n" for sort keys %common;

print "\nListB\n";
print "$_\t$common{$_}[1]\n" for sort keys %common;


But there's certainly nothing wrong with your code...

Matt
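A detail worth knowing when testing membership: a truth test such as if ($ListB{$_}) looks at the value, not the key, so a word whose number happens to be 0 (or an empty string) would be treated as missing. The sketch below uses a made-up entry dog => 0 to show the difference; exists is the test that matches the intent of "appears in both lists".

```perl
use strict;
use warnings;

# Hypothetical data: "dog" is in both hashes, but its ListB value is 0.
my %ListA = ( cat => 3.3,  dog => 4.4 );
my %ListB = ( cat => 1000, dog => 0 );

# Truth test: skips "dog", because the value 0 is false in Perl.
my @by_truth = grep { $ListB{$_} } sort keys %ListA;

# Existence test: keeps "dog", because the key is present.
my @by_exists = grep { exists $ListB{$_} } sort keys %ListA;

print "truth:  @by_truth\n";     # prints "truth:  cat"
print "exists: @by_exists\n";    # prints "exists: cat dog"
```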


 
 
Gunnar Hjalmarsson
 
      02-23-2004
David K. Wall wrote:
> (E-Mail Removed) (viv2k) wrote:
>> Right now, I have two very long lists of words with comments
>> attached to each of the words and I want to find the words that
>> are common in both of them.

>
> perldoc -q intersection
>
> Gunnar Hjalmarsson has already posted working code. I just wanted
> to point to the FAQ entry, as the code there is readily adapted to
> the above problem.


Well, applying that FAQ entry, you could do something like this:

my ($elem, %count, %intersection);
for $elem (keys %ListA, keys %ListB) { $count{$elem}++ }
for $elem (keys %count) {
    $intersection{$elem} = "$ListA{$elem} : $ListB{$elem}"
        if $count{$elem} > 1;
}
print "$_\t$intersection{$_}\n" for sort keys %intersection;

But provided it makes sense to start by populating two hashes with
the words and their comments, I don't think the FAQ entry is well
suited here, since applying it doesn't take advantage of the fast key
lookups the initial hashes already provide.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
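Taking advantage of the initial hashes can be as simple as filtering the keys of one hash with exists lookups in the other, one lookup per word. The sketch below does that without modifying either hash and prints the common words side by side, roughly in the layout the original question asked for; the column widths are arbitrary choices.

```perl
use strict;
use warnings;

my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
my %ListB = ( apple => 100, boy => 500, cat => 1000 );

# One hash lookup per key of %ListA; neither hash is changed.
my @common = grep { exists $ListB{$_} } sort keys %ListA;

# Side-by-side layout: word and value from ListA, then from ListB.
printf "%-12s %s\n", 'ListA', 'ListB';
printf "%-6s %-5s %-6s %s\n", $_, $ListA{$_}, $_, $ListB{$_} for @common;
```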

 
 
Gunnar Hjalmarsson
 
      02-23-2004
Matt Garrish wrote:
> I've never been a big fan of looping over both sets like that
> (especially if you want to retain the original hashes). If you just
> want to extract the common elements, my personal preference would
> be to do something like the following instead:
>
> my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
> my %ListB = ( apple => 100, boy => 500, cat => 1000 );
> my %common;
>
> for (keys %ListA) {
> if ($ListB{$_}) { $common{$_} = [$ListA{$_}, $ListB{$_}] };
> }
>
> print "ListA\n";
> print "$_\t$common{$_}[0]\n" for sort keys %common;
>
> print "\nListB\n";
> print "$_\t$common{$_}[1]\n" for sort keys %common;
>
>
> But there's certainly nothing wrong with your code...


Maybe not, but I like your solution. It only requires looping through
one of the lists.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
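Since only one list needs to be walked, a possible refinement (not something proposed in the thread) is to walk whichever hash has fewer keys and probe the other one. In scalar context keys returns the number of keys, so the choice is cheap. The result is the same either way; only the number of iterations changes.

```perl
use strict;
use warnings;

my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
my %ListB = ( apple => 100, boy => 500, cat => 1000 );

# Numeric comparison puts keys() in scalar context (number of keys).
my ($small, $large) = keys(%ListA) <= keys(%ListB)
    ? (\%ListA, \%ListB)
    : (\%ListB, \%ListA);

my %common;
for my $word (keys %$small) {
    next unless exists $large->{$word};
    # Store the pair in ListA, ListB order, whichever hash was walked.
    $common{$word} = [ $ListA{$word}, $ListB{$word} ];
}

print "$_\t$common{$_}[0]\t$common{$_}[1]\n" for sort keys %common;
```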

 
 
David K. Wall
 
      02-23-2004
Gunnar Hjalmarsson <(E-Mail Removed)> wrote:

> David K. Wall wrote:
>> (E-Mail Removed) (viv2k) wrote:
>>> Right now, I have two very long lists of words with comments
>>> attached to each of the words and I want to find the words that
>>> are common in both of them.

>>
>> perldoc -q intersection
>>
>> Gunnar Hjalmarsson has already posted working code. I just wanted
>> to point to the FAQ entry, as the code there is readily adapted to
>> the above problem.

>
> Well, applying that FAQ entry, you could do something like this:
>
> my ($elem, %count, %intersection);
> for $elem (keys %ListA, keys %ListB) { $count{$elem}++ }
> for $elem (keys %count) {
> $intersection{$elem} = "$ListA{$elem} : $ListB{$elem}"
> if $count{$elem} > 1;
> }
> print "$_\t$intersection{$_}\n" for sort keys %intersection;
>
> But provided that it makes sense to start with populating two hashes
> with the lists of words + comments, I don't really see that the FAQ
> entry is very well adapted, since applying it would not take advantage
> of the initial hashes.


How about this?

my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
my %ListB = ( apple => 100, boy => 500, cat => 1000 );
my %count;
my (%count, @intersection);
for my $element (keys %ListA, keys %ListB) {
    push @intersection, $element if ++$count{$element} > 1;
}
my (%new_ListA, %new_ListB);
@new_ListA{@intersection} = @ListA{@intersection};
@new_ListB{@intersection} = @ListB{@intersection};

It's a bit different from the FAQ, but was directly inspired by it...
<shrug>

If memory is a consideration I'd go with deleting the non-intersecting
elements. Neat idea, and one that didn't occur to me.

--
David Wall
 
 
David K. Wall
 
      02-23-2004
"David K. Wall" <(E-Mail Removed)> wrote:

> my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
> my %ListB = ( apple => 100, boy => 500, cat => 1000 );
> my %count;
> my (%count, @intersection);


Oops, extra declaration. I originally wrote it as a for() loop followed by a
grep(), but then saw that the grep() could be eliminated and just re-pasted
the part I changed.

> for my $element (keys %ListA, keys %ListB) {
> push @intersection, $element if ++$count{$element} > 1;
> }
> my (%new_ListA, %new_ListB);
> @new_ListA{@intersection} = @ListA{@intersection};
> @new_ListB{@intersection} = @ListB{@intersection};

 
 
Gunnar Hjalmarsson
 
      02-23-2004
David K. Wall wrote:
> How about this?
>
> my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );
> my %ListB = ( apple => 100, boy => 500, cat => 1000 );
> my (%count, @intersection);
> for my $element (keys %ListA, keys %ListB) {
> push @intersection, $element if ++$count{$element} > 1;
> }
> my (%new_ListA, %new_ListB);
> @new_ListA{@intersection} = @ListA{@intersection};
> @new_ListB{@intersection} = @ListB{@intersection};
>
> It's a bit different from the FAQ, but was directly inspired by
> it... <shrug>


I still don't think that the FAQ approach is good here. The FAQ deals
with arrays, and since we are starting with hashes here, you'd better
take advantage of the ability to look up elements in a hash.

> If memory is a consideration


The FAQ approach is indeed memory expensive. The OP mentioned "two very
long lists of words", and this solution creates six(!) variables holding
lists: %ListA, %ListB, %count, @intersection, %new_ListA and
%new_ListB.

> I'd go with deleting the non-intersecting elements. Neat idea, and
> one that didn't occur to me.


If you want to keep the original hashes intact, I personally find
Matt's solution to be the neatest.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
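If memory is the main concern, one option (an assumption about the use case, not something from the thread) is to hold only one list in a hash and stream the other, printing matches as they are found, so no second hash and no intermediate lists are built. The sketch assumes the second list is stored as comma-separated "word number" entries and reads one comma-terminated record at a time by setting the input record separator $/ to ','. An in-memory filehandle stands in for the real file so the sketch runs on its own; the variable $listB_text is hypothetical.

```perl
use strict;
use warnings;

# Only ListA is held in memory.
my %ListA = ( apple => 1.1, banana => 2.2, cat => 3.3 );

# Stand-in for a large ListB file; for a real file you would open it by
# path instead of opening a reference to this string.
my $listB_text = "apple 100, boy 500, cat 1000\n";
open my $fh, '<', \$listB_text or die "Cannot open ListB: $!";

my @matches;    # only the common words are kept
{
    local $/ = ',';                  # one record per comma-terminated entry
    while (my $record = <$fh>) {
        chomp $record;               # removes the trailing ',' ($/)
        my ($word, $value) = split ' ', $record;   # ' ' also trims whitespace
        next unless defined $value;  # skip blank records
        next unless exists $ListA{$word};
        print "$word\t$ListA{$word}\t$value\n";
        push @matches, $word;
    }
}
close $fh;
```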

 
 
viv2k
 
      02-23-2004
Thanks for the tips. But does that mean I will have to write out the
two lists by hand before comparing them?

My lists are actually text files: each word is followed by a space,
then its number, then a comma, then the next word, and so on.
Example:

apple 1.1, banana 2.2, cat 3.3, etc

Is there a way to take the whole text file as input and use the space
and the comma as delimiters to do the task?

Thanks
viv
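The lists do not have to be typed in by hand. A minimal sketch, assuming each file holds entries of the form "word number" separated by commas, as in the example above: split the text on commas, then split each entry on whitespace. The sub name parse_list, the sample strings and the file name listA.txt are hypothetical; to use real files, read each file's contents into a string first, as shown in the comment.

```perl
use strict;
use warnings;

# Turn "apple 1.1, banana 2.2, cat 3.3" into (apple => 1.1, ...).
sub parse_list {
    my ($text) = @_;
    my %list;
    for my $entry (split /,/, $text) {
        my ($word, $value) = split ' ', $entry;   # ' ' trims whitespace
        next unless defined $value;               # skip blank entries
        $list{$word} = $value;
    }
    return %list;
}

# Sample contents; for a real file, slurp it instead, e.g.
#   my $text_a = do { local $/; open my $fh, '<', 'listA.txt' or die $!; <$fh> };
my $text_a = "apple 1.1, banana 2.2, cat 3.3\n";
my $text_b = "apple 100, boy 500, cat 1000\n";

my %ListA = parse_list($text_a);
my %ListB = parse_list($text_b);

# Words present in both lists, printed side by side.
my @common = grep { exists $ListB{$_} } sort keys %ListA;
print "$_ $ListA{$_}\t$_ $ListB{$_}\n" for @common;
```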

 
