
# Identifying common elements from multiple hashes

Neil

 12-15-2005
Hello,

I'm new to this group and I greatly need and would appreciate your
help. I am trying to write a program that will compare multiple
hashes. Each hash is basically a table of 'x' and 'y' values. For
example, Table 1 could look something like this:

Table 1
x-values y-values
1 10
5 21
11 1000
17 43
21 10000

First, the program needs to identify the values of 'x' that appear in
all 'n' hashes. Once identified, it needs to compute the average of
the y-values that correspond to that x-value.
This is better illustrated by an example. Consider, Table 2:

Table 2
x-values y-values
1 8
7 21
12 1000
17 45
22 10000

Both Table 1 and Table 2 (hashes) contain the x-values: 1 and 17.
Therefore, Table 3 (generated by the program) would be:

Table 3
x-values y-values
1 9 # as (10 + 8)/2 = 9
17 44 # as (43 + 45)/2 = 44

This program needs to be able to be extended to any number of hashes
though.
If someone could just outline the general approach, I would be greatly
indebted. Thank you so much.

Sincerely, Neal

usenet@DavidFilmer.com

 12-15-2005
Neil wrote:
> Hello,
>
> I'm new to this group and I greatly need and would appreciate your help.

Welcome to comp.lang.perl.misc. Being new to the group, you may not be
aware that many of the regular posters here encourage you to read and
abide by the group posting guidelines, which you may read on the web
here:
These guidelines are for YOUR benefit, because they show you how to ask
a good question which is highly likely to get a good answer.

> I am trying to write a program that will compare multiple hashes.
> Each hash is basically a table of 'x' and 'y' values. For example...

You are speaking English. It is hard to understand exactly what your
data structure looks like. That's why the posting guidelines encourage
you to speak Perl (i.e., show us the code that creates your hashes, or
show us a Data::Dumper representation of the hash).

> First, the program needs to identify the values of 'x' that appear in
> all 'n' hashes. Once identified, it needs to compute the average of
> the y-values that correspond to that x-value.

I hope you're familiar with CPAN; I would refer you to the
List::Compare module to easily compute the intersection of your keys:

http://search.cpan.org/~jkeenan/List...ist/Compare.pm

This works (but it may not mimic your data structure, since you didn't
show it):

#!/usr/bin/perl
use warnings; use strict;
use List::Compare;

my %hash1 = qw/1 10 5 21 11 1000 17 43 21 10000/;
my %hash2 = qw/1 8 7 21 12 1000 17 45 22 10000/;

my $lc = List::Compare->new([keys %hash1], [keys %hash2]);

my %hash3;
for ($lc->get_intersection) {
    $hash3{$_} = ($hash1{$_} + $hash2{$_}) / 2;
    print "$_\t$hash3{$_}\n";
}

__END__

John W. Krahn

 12-16-2005
Neil wrote:
>
> I'm new to this group and I greatly need and would appreciate your
> help. I am trying to write a program that will compare multiple
> hashes. Each hash is basically a table of 'x' and 'y' values. For
> example, Table 1 could look something like this:
>
> Table 1
> x-values y-values
> 1 10
> 5 21
> 11 1000
> 17 43
> 21 10000
>
> First, the program needs to identify the values of 'x' that appear in
> all 'n' hashes. Once identified, it needs to compute the average of
> the y-values that correspond to that x-value.
> This is better illustrated by an example. Consider, Table 2:
>
> Table 2
> x-values y-values
> 1 8
> 7 21
> 12 1000
> 17 45
> 22 10000
>
> Both Table 1 and Table 2 (hashes) contain the x-values: 1 and 17.
> Therefore, Table 3 (generated by the program) would be:
>
> Table 3
> x-values y-values
> 1 9 # as (10 + 8)/2 = 9
> 17 44 # as (43 + 45)/2 = 44
>
> This program needs to be able to be extended to any number of hashes
> though.
> If someone could just outline the general approach, I would be greatly
> indebted. Thank you so much.

$ perl -e'
my %hash1 = qw/
1 10
5 21
11 1000
17 43
21 10000
/;
my %hash2 = qw/
1 8
7 21
12 1000
17 45
22 10000
/;

use Data::Dumper;
my @hashes = \( %hash1, %hash2 );

my %common_keys;
for my $hash_ref ( @hashes ) {
    $common_keys{ $_ }++ for keys %$hash_ref;
}

my %averages;
for my $hash_ref ( @hashes ) {
    $averages{ $_ } += $hash_ref->{ $_ }
        for grep $common_keys{ $_ } == @hashes, keys %common_keys;
}
$_ /= @hashes for values %averages;

print Dumper \%averages;
'
$VAR1 = {
          '1' => '9',
          '17' => '44'
        };

John
--
use Perl;
program
fulfillment
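
The counting approach in the one-liner above can be wrapped in a subroutine that accepts any number of hash references. What follows is only a minimal sketch using core Perl: the name `average_common` and the variable names are hypothetical and do not come from any poster in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes any number of hash
# references and returns a hash reference that maps each key present in
# every input hash to the mean of its values.
sub average_common {
    my @hashes = @_;
    my $n      = @hashes;
    return {} unless $n;

    # Count how many of the hashes contain each key.
    my %seen;
    for my $hash_ref (@hashes) {
        $seen{$_}++ for keys %$hash_ref;
    }

    # Keep only the keys seen in every hash and average their values.
    my %averages;
    for my $key ( grep { $seen{$_} == $n } keys %seen ) {
        my $sum = 0;
        $sum += $_->{$key} for @hashes;
        $averages{$key} = $sum / $n;
    }
    return \%averages;
}

# Table 1 and Table 2 from the original question.
my %table1 = qw/1 10  5 21  11 1000  17 43  21 10000/;
my %table2 = qw/1 8   7 21  12 1000  17 45  22 10000/;

my $result = average_common( \%table1, \%table2 );
print "$_\t$result->{$_}\n" for sort { $a <=> $b } keys %$result;
```

Because the subroutine takes a plain list of references, it can be called with an array of hashes built at run time, for example `average_common(@ms_data)`.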

Neil

 12-16-2005
Hi everyone,

Thank you to those who gave me your excellent suggestions. My problem
is that I want to analyze 'n' tables, each composed of 'x' and 'y'
values, and make a new hash that contains only the 'x' values that are
common to all 'n' hashes, where the value that corresponds to each key
'x' is the average of all 'n' y-values. The code that I am
including below illustrates how I generate the data structure that I
want to analyze. I tried to use the module List::Compare, but this
module requires that you use the format as given:

@Al = qw(abel abel baker camera delta edward fargo golfer);
@Bob = qw(baker camera delta delta edward fargo golfer hilton);
@Carmen = qw(fargo golfer hilton icon icon jerky kappa);
@Don = qw(fargo icon jerky);
@Ed = qw(fargo icon icon jerky);

$lcm = List::Compare->new(\@Al, \@Bob, \@Carmen, \@Don, \@Ed);

However, if you will look at the way that I am generating my data
structure and how the number of lists can be totally different each
time, it is apparent that this method will not work. I would really
appreciate it if someone could please offer me some advice. Thank you
so much.

#!/usr/bin/perl
use List::Compare;
print "Please enter the number of related tables that you want to analyze.";
print "\n";

my $num_of_spectra = <STDIN>;
chomp $num_of_spectra;

my @dtafilename_array;
$c1=0;
$c2=0;
$c3=0;

while ($c1 < $num_of_spectra) {
    print "Please enter the name of the .dta file.\n";
    $dtafilename_array[$c1] = <STDIN>;
    chomp $dtafilename_array[$c1];
    $c1++;
}

foreach $dta (@dtafilename_array) {

    $dtafile = $dtafilename_array[$c2];

    unless ( -e $dtafile) {
        print "File \"$dtafile\" doesn\'t seem to exist!!\n";
        exit;
    }
    unless ( open(DTAFILE, $dtafile) ) {
        print "Cannot open file \"$dtafile\"\n\n";
        exit;
    }

    $c2++;

    while ($dtafileline = <DTAFILE>) {
        chomp $dtafileline;
        @columns = split("\t", $dtafileline);
        $ms_data[$c3]{$columns[0]} = $columns[1];
    }

    $c3++;

    close DTAFILE;

}

# Here is some sample data.

File 1:

6 100
7 95
8 96

File 2:

6 109
11 87
12 45

File 3:

6 103
7 87
15 43
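
For an array of hashes like `@ms_data`, whose length is only known at run time, one possible approach is to start from the keys of the first table and keep those that exist in every other table. The sketch below rests on some assumptions: the three sample tables are inlined as tab-separated strings and read through in-memory filehandles so that it runs without any .dta files, and all names other than `@ms_data` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data from the post, inlined as tab-separated text (an assumption
# made so the sketch runs without .dta files on disk).
my @sources = (
    "6\t100\n7\t95\n8\t96\n",
    "6\t109\n11\t87\n12\t45\n",
    "6\t103\n7\t87\n15\t43\n",
);

# Build one hash per table; the number of tables is not fixed in the code.
my @ms_data;
for my $text (@sources) {
    open my $fh, '<', \$text or die "Cannot open in-memory table: $!";
    my %table;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $x, $y ) = split /\t/, $line;
        $table{$x} = $y;
    }
    close $fh;
    push @ms_data, \%table;
}

# Start from the keys of the first table and keep those found in all others.
my ( $first, @rest ) = @ms_data;
my @common = grep {
    my $key = $_;
    !grep { !exists $_->{$key} } @rest;
} keys %$first;

# Average the y-values for each common x-value.
my %averages;
for my $key (@common) {
    my $sum = 0;
    $sum += $_->{$key} for @ms_data;
    $averages{$key} = $sum / scalar @ms_data;
}

print "$_\t$averages{$_}\n" for sort { $a <=> $b } keys %averages;
```

With the sample data only x = 6 appears in all three tables, so the sketch prints a single line with the average (100 + 109 + 103)/3 = 104. Replacing the in-memory strings with file names passed to a three-argument `open` should leave the rest unchanged.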

Neil

 12-16-2005
Thank you very much, David, John and Jim!