Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > convenient module to take statistics for hashed structures?


convenient module to take statistics for hashed structures?

 
 
ela
03-13-2011
__DATA__
ID B C D E F G H
1 3 7 9 3 4 2 3
1 3 7 9 3 4 2 2
1 3 7 9 5 8 6 6
1 3 7 9 3 4 2 3
2 4 7 9 3 4 2 1
2 4 7 9 3 4 2 2
2 4 7 9 3 4 2 3
2 4 7 9 3 4 2 3

For each ID (the example above has two: 1 and 2), I want to identify the
"last common ancestor" (LCA), with H taking precedence over B, based on a
user-defined threshold. If the threshold is set to 100%, the LCA of ID 1 is
D=9; if it is set to 75%, it is F=2. For ID 2: 100%, G=2; 75%, G=2; 50%, H=3.

While hashes make it easy to count distinct values, I don't know whether
Perl offers a simple way to get the max/min. Consider the following code:

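my @array;   # one hash of value counts per column

# for each row, where $key is the value found in column $i:
for (my $i = 0; $i < $numcol; $i++) {
    $array[$i]{$key}++;
}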


If it were just for column B, I know it could be written like this:

foreach my $key (keys %B) { $Bpertpot{$key} = $B{$key} / $total; }

for (my $i = 0; $i < $numcol; $i++) {
    $maxcol[$i] = 0;
    foreach my $key (keys %Bpertpot) {
        if ($Bpertpot{$key} > $maxcol[$i]) {
            $maxcol[$i] = $Bpertpot{$key};
        }
    }
}

But then I don't know how to traverse an array of hashes the same way, i.e.
how to replace %B and %Bpertpot with something that works with the array
structure. In fact, I wonder whether there are already well-established
modules that handle this kind of max/min statistics problem, which seems to
come up frequently in the business sector.



 
smallpond
03-14-2011
On Mar 13, 11:32 am, "ela" <(E-Mail Removed)> wrote:
> [ snip ]
>
> but then I don't know how to do that for array of hash to traverse... i.e.
> replace the %B and %Bpertpot to something that is compatible with the array
> structure... In fact, I wonder if there is already well-established modules
> that may have handled this kind of max-min statistics problems that seem to
> encounter frequently in the business sector...


Have a look at List::Util, which is a core module; it has max and first
functions that you will find useful.

Any book on Perl will explain how to create and use a hash of arrays
or an array of arrays.
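A rough sketch of how `max` and `first` from List::Util could be applied to the original question (not part of the reply above). It assumes the rows of one ID are already split into array references with the ID in column 0, uses fractions (0.75) rather than percentages as thresholds, and builds `@array` as one hash of value counts per column, as in the question. `majority_share` and `lca` are hypothetical helper names.

```perl
use strict;
use warnings;
use List::Util qw(max first);

my @names = qw(B C D E F G H);    # column names after the ID column

# Hypothetical helper: share (0..1) of the most frequent value in one column.
sub majority_share {
    my ($counts, $total) = @_;
    return max(values %$counts) / $total;
}

# Hypothetical helper: scan the columns from H back to B and return
# "NAME=VALUE" for the first column whose most frequent value reaches $thr.
sub lca {
    my ($rows, $thr) = @_;
    my @array;                                   # one hash of counts per column
    for my $row (@$rows) {
        $array[$_]{ $row->[$_ + 1] }++ for 0 .. $#names;   # +1 skips the ID
    }
    my $total = @$rows;
    my $i = first { majority_share($array[$_], $total) >= $thr } reverse 0 .. $#names;
    return unless defined $i;                    # no column reaches the threshold
    my %counts = %{ $array[$i] };
    my $top    = max(values %counts);
    my $value  = first { $counts{$_} == $top } sort keys %counts;
    return "$names[$i]=$value";
}

my @id1 = ([1,3,7,9,3,4,2,3], [1,3,7,9,3,4,2,2], [1,3,7,9,5,8,6,6], [1,3,7,9,3,4,2,3]);
print lca(\@id1, 1.00), "\n";    # D=9
print lca(\@id1, 0.75), "\n";    # G=2
```

Because `first` stops at the first match, scanning `reverse 0 .. $#names` gives H the highest preference, as the question asks.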


 
George Mpouras
03-14-2011

"ela" <(E-Mail Removed)> wrote in message
news:ilgvng$7ec$(E-Mail Removed)...
> [ snip ]



> set it to 75%, then it is F=2

I do not see any 2 in the F column. I have trouble understanding what you
mean/what you want.


 
George Mpouras
03-14-2011

"ela" <(E-Mail Removed)> wrote in message
news:ilkui6$mhk$(E-Mail Removed)...
>
> "George Mpouras" <(E-Mail Removed)> wrote in message
> news:ilklas$qos$(E-Mail Removed)...
>>
>> "ela" <(E-Mail Removed)> wrote in message
>> news:ilgvng$7ec$(E-Mail Removed)...
>>> [ snip ]

>>
>>
>> set it to 75%, then it is F=2
>> I do not see any 2 in the F column. I have trouble understanding what you
>> mean/what you want.

> Thanks for catching the mistake. It is G=2 (2,2,6,2, so it fulfills the
> 75% requirement), not F=2. Always check from H (the last column) first.
> H's majority value is 3, but it is only 50% abundant, so then move up one
> column at a time (G, F, E, D, ...). The same process has to be repeated
> for each ID, without knowing beforehand how many lines each ID has.
>



#!/usr/bin/perl
#
# OK, here is your homework.
# Next time try not to cheat, because even if
# you pass the lesson, you will not learn!
use strict;
use warnings;

my %col;    # column number => column name (1 => 'B', ..., 7 => 'H')
my %data;   # per ID: line count, value counts per field, percentage ranks
ReadData();

# each query is [ ID, threshold in percent ]
foreach my $q ([1,100], [1,75], [2,100], [2,75], [2,50], [2,25]) {
    my ($id, $thr) = @$q;
    my $r = query($id, $thr);
    print "id=$id, thr=$thr% -> Field=$r->[0],Value=@{$r->[1]}\n";
}

sub ReadData {
    while (<DATA>) {
        chomp;
        next unless /\S/;    # skip blank lines
        my @a = split ' ';
        # the first line holds the column names
        unless (exists $col{1}) { @col{1..$#a} = @a[1..$#a]; next }
        ++$data{$a[0]}->{lines};
        for (my $i = 1; $i <= $#a; $i++) {
            ++$data{$a[0]}->{field}->{$col{$i}}->{data}->{$a[$i]};
        }
    }
    # group each field's values by their percentage of the ID's lines
    foreach my $id (keys %data) {
        foreach my $field (keys %{$data{$id}->{field}}) {
            foreach my $item (keys %{$data{$id}->{field}->{$field}->{data}}) {
                push @{ $data{$id}->{field}->{$field}->{rank}->{ 100*(
                    $data{$id}->{field}->{$field}->{data}->{$item} / $data{$id}->{lines} ) } },
                    $item;
            }
        }
    }
    #use Data::Dumper; print Dumper(\%data);exit;
}

sub query {
    my ($id, $rank) = @_;
    # start at the last column (H) and move towards the first (B)
    foreach my $field (sort { $b <=> $a } keys %col) {
        my $ranks = $data{$id}->{field}->{$col{$field}}->{rank} or next;
        # a field qualifies if some value reaches AT LEAST the threshold
        my @found = map { @{ $ranks->{$_} } } grep { $_ >= $rank } keys %$ranks;
        return [ $col{$field}, [ sort @found ] ] if @found;
    }
    ['',[]]    # no field reaches the threshold
}




__DATA__
ID B C D E F G H
1 3 7 9 3 4 2 3
1 3 7 9 3 4 2 2
1 3 7 9 5 8 6 6
1 3 7 9 3 4 2 3
2 4 7 9 3 4 2 1
2 4 7 9 3 4 2 2
2 4 7 9 3 4 2 3
2 4 7 9 3 4 2 3









 
George Mpouras
03-14-2011
Sorry for the silly joke in the comment.


> $col{1} #what does 1 refer to?

This is just a check to see whether we are reading the first line, the one
with the column names.


> $#a

is the index of the last element of an array, i.e. one less than the number
of elements. So these all give the last element:
$array[$#array]
$array[ -1 + scalar @array ]
$array[-1]
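A small illustration of the difference between the last index and the last element, using the header line from the thread as sample data:

```perl
use strict;
use warnings;

my @a = qw(ID B C D E F G H);

my $last_index = $#a;                   # 7: index of the last element
my $count      = scalar @a;             # 8: number of elements, so $#a == $count - 1
my $last       = $a[$#a];               # 'H'
my $same1      = $a[ -1 + scalar @a ];  # 'H'
my $same2      = $a[-1];                # 'H': negative indexes count from the end

print "$last_index $count $last $same1 $same2\n";    # 7 8 H H H
```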

> @col{1..$#a} #array of hash?

This is called a hash slice; used to create a hash from an array:
my @array = qw/a b c/;
my %hash = ();
@hash{ @array } = (1, 2, 3);
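The same slice as a runnable snippet (the values 1, 2, 3 are arbitrary) showing which key receives which value:

```perl
use strict;
use warnings;

my @array = qw(a b c);
my %hash;
@hash{@array} = (1, 2, 3);    # same as $hash{a} = 1; $hash{b} = 2; $hash{c} = 3;

print join(',', map { "$_=$hash{$_}" } sort keys %hash), "\n";    # a=1,b=2,c=3
```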


> @a[1..$#a] #array of what?

Oh, some array elements (an array slice):
@array[2..4] -> $array[2], $array[3], $array[4]
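A sketch that runs an array slice on its own and then the header-parsing line from ReadData on the sample header, where a hash slice on the left receives an array slice on the right (the numbers 10..50 are made up):

```perl
use strict;
use warnings;

my @array = (10, 20, 30, 40, 50);
my @part  = @array[2..4];           # array slice: ($array[2], $array[3], $array[4])

# The header-parsing line from ReadData, run on the header of the sample data:
my @a = split ' ', 'ID B C D E F G H';
my %col;
@col{1..$#a} = @a[1..$#a];          # hash slice = array slice: 1=>'B', ..., 7=>'H'

print "@part\n";                                        # 30 40 50
print join(' ', map { "$_=$col{$_}" } 1 .. $#a), "\n";  # 1=B 2=C 3=D 4=E 5=F 6=G 7=H
```

The ID column (index 0) is skipped because the slices start at 1.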


>
> ++$data{$a[0]}->{lines}; #hash of hash? and an arbitrary name "line" is
> given?

Let's keep the total number of lines of every ID in that ID's hash reference, under the key "lines".



> ++$data{$a[0]}->{field}->{$col{$i}}->{data}->{$a[$i]} } } #oh, this line
> is really... hard to know why arrow can be used again and again....

Arrows are not necessary, but I find them beautiful.
We want to keep our data isolated in a separate sub-hash with the key "data".
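A small sketch of what those `++` lines do, with made-up keys mirroring the `%data` layout above: each intermediate hash is created automatically on first use (autovivification), and the arrows between subscripts can be dropped without changing the meaning.

```perl
use strict;
use warnings;

my %data;
# Autovivification: each intermediate hash is created the first time it is used.
++$data{1}->{lines};
++$data{1}->{field}->{G}->{data}->{2};
++$data{1}{field}{G}{data}{2};    # same element: arrows between subscripts are optional

print $data{1}{lines}, ' ', $data{1}->{field}->{G}->{data}->{2}, "\n";    # 1 2
```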


> push @{ $data{$id}->{field}->{$field}->{rank}->{ 100*(
> $data{$id}->{field}->{$field}->{data}->{$item} / $data{$id}->{lines} ) } } ,
> $item}}} #what advantage of using push here?


Here we want to keep all the values that share the same percentage!
So if, for example, there are four different numbers, we can report back
all of them if the requested threshold is 25%.
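A cut-down sketch of the `push` step for column H of ID 2 in the sample data (values 1, 2, 3, 3 over four lines). Pushing onto an array stored under each percentage lets several values share one bucket instead of overwriting each other:

```perl
use strict;
use warnings;

# Value counts for column H of ID 2 (values 1, 2, 3, 3 over four lines).
my %count = (1 => 1, 2 => 1, 3 => 2);
my $lines = 4;

# Group values by their percentage; several values can share a bucket.
my %rank;
for my $item (sort keys %count) {
    push @{ $rank{ 100 * $count{$item} / $lines } }, $item;
}

print "25% -> @{ $rank{25} }\n";   # 25% -> 1 2
print "50% -> @{ $rank{50} }\n";   # 50% -> 3
```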


> ['',[]] #what is this...?!

This is the default answer if no threshold is found. It has two items: ''
(no field name) and an empty array representing the (absent) values found.
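A sketch of how a caller might tell a real answer from the default one. `lookup` is a hypothetical stand-in for `query()` that returns the same `[field, [values]]` shape:

```perl
use strict;
use warnings;

# Hypothetical stand-in for query(): returns [field, [values]],
# or ['', []] when no field reaches the threshold.
sub lookup {
    my ($found) = @_;
    return $found ? ['G', [2]] : ['', []];
}

for my $r (lookup(1), lookup(0)) {
    if (@{ $r->[1] }) { print "Field=$r->[0],Value=@{$r->[1]}\n" }
    else              { print "no field reaches the threshold\n" }
}
# Field=G,Value=2
# no field reaches the threshold
```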



If you check what I've done, you will find that it can be rewritten
to be almost 10 times faster, but it is good enough for a start.


Peace.
 
George Mpouras
03-14-2011
Uncomment the line
#use Data::Dumper; print Dumper(\%data);exit;
and you will understand the underlying logic on your own.
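A sketch of the same idea on a hand-built, cut-down `%data` (ID 1, column G only, values 2, 2, 6, 2). Setting `$Data::Dumper::Sortkeys` and `$Data::Dumper::Indent` is optional and just makes the dump easier to read:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order
$Data::Dumper::Indent   = 1;    # fixed, compact indentation

# A cut-down %data for ID 1, column G only (values 2, 2, 6, 2).
my %data = (1 => { lines => 4,
                   field => { G => { data => { 2 => 3, 6 => 1 },
                                     rank => { 75 => [2], 25 => [6] } } } });
print Dumper(\%data);
```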

 
John W. Krahn
03-14-2011
George Mpouras wrote:
> Sorry for the silly joke at the comment.
>
>
> [ snip ]
>
>
>> @col{1..$#a} #array of hash?

> This is called hash slice;


Correct.

> used to create a hash


Only my() can create a hash.

Used to add keys and values to a hash.

> from an array


from a LIST of keys and a LIST of values.
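A sketch of John's point, with made-up keys: the hash itself comes from `my`, and a slice assignment only adds or overwrites entries, pairing a list of keys with a list of values. Keys left without a value get `undef`.

```perl
use strict;
use warnings;

my %h = (x => 1);               # my() creates the hash
@h{'y', 'z'} = (2, 3);          # the slice only adds or overwrites entries
@h{qw(x w)} = (10);             # fewer values than keys: 'w' gets undef

print join(',', map { "$_=" . ($h{$_} // 'undef') } sort keys %h), "\n";
# w=undef,x=10,y=2,z=3
```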




John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
 
news.ntua.gr
03-14-2011


"John W. Krahn" wrote in message
news:gqwfp.59406$(E-Mail Removed)...

>
>> @col{1..$#a} #array of hash?

> This is called hash slice;


Correct.

> used to create a hash


Only my() can create a hash.


I thought that local, our, and state could also do the job.

 
Uri Guttman
03-14-2011
>>>>> "nng" == news ntua gr <(E-Mail Removed)> writes:

nng> "John W. Krahn" wrote in message
nng> news:gqwfp.59406$(E-Mail Removed)...

>>
>>> @col{1..$#a} #array of hash?

>> This is called hash slice;


nng> Correct.

>> used to create a hash


nng> Only my() can create a hash.

nng> I thought that local, our, and state could also do the job.

our doesn't create a variable. it only creates a lexical alias to the
variable of the same name in the current package.

local doesn't create a variable. it pushes the value of a variable and
allows for a new value to be put in its place.

state variables are just like my but they don't get reinitialized when
the enclosing block is entered.
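A short sketch of the three behaviours described above; the variable and sub names are made up for illustration:

```perl
use strict;
use warnings;
use feature 'state';

our $level = 'outer';          # 'our' refers to the package variable $main::level

sub show { return $level }     # reads the package variable

sub with_local {
    local $level = 'inner';    # saves the old value and installs a new one
    return show();             # code called from here sees 'inner'
}                              # the old value is restored when the scope exits

sub counter {
    state $n = 0;              # initialized only once, then kept between calls
    return ++$n;
}

print with_local(), ' ', show(), "\n";          # inner outer
print counter(), counter(), counter(), "\n";    # 123
```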

uri

--
Uri Guttman ------ http://www.velocityreviews.com/forums/(E-Mail Removed) -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
ela
03-15-2011

"George Mpouras" <(E-Mail Removed)> wrote in message
news:ilklas$qos$(E-Mail Removed)...
>
> "ela" <(E-Mail Removed)> wrote in message
> news:ilgvng$7ec$(E-Mail Removed)...
>> [ snip ]

>
>
> set it to 75%, then it is F=2
> I do not see any 2 in the F column. I have trouble understanding what you
> mean/what you want.

Thanks for catching the mistake. It is G=2 (2,2,6,2, so it fulfills the 75%
requirement), not F=2. Always check from H (the last column) first.
H's majority value is 3, but it is only 50% abundant, so then move up one
column at a time (G, F, E, D, ...). The same process has to be repeated for
each ID, without knowing beforehand how many lines each ID has.


 