Velocity Reviews


Marek 02-15-2009 07:51 AM

restrict a hash to 15 pairs and iterate over it
 


Hello all!


I am still a beginner, so please be patient with me.

I have a big file with numbers and dates, like this:


01.01.98
31
33
14
7
35
16
20
20
13
55
1
1
7


etc etc

I need a complicated hash to count the occurrences of numbers within a scope
of 15:

We skip the dates, and we count the lines. The structure of my %hash
looks like this:

($number{$line, $line, ...}) => $how_many_times

In my example the 20 occurs in line 7 and 8 -> two times:

20{7,8} => 2

And we iterate over it, keeping only 15 numbers in the hash, and count
the occurrences of each number each time.

Could somebody help me with this?


Thank you in advance


marek

Marek 02-15-2009 12:21 PM

Re: restrict a hash to 15 pairs and iterate over it
 
On Feb 15, 10:06 am, Ben Morrow <b...@morrow.me.uk> wrote:

>
> What do you mean by 'the occurrences of numbers'? Do you mean the number
> of times each number shows up, so for a list like
>
> 1 2 1 4 2
>
> you would get the results
>
> 1: 2 times
> 2: 2 times
> 4: 1 time
>


yes! :-)

>
> If you need a data structure like this (I'm not yet sure whether you do)
> you want something more like
>
> my %numbers = (
> 20 => [7, 8]
> );
>
> You don't need to record the count separately: Perl arrays know how long
> they are.
>


Thank you! Good idea!

> How do you choose which 15 numbers to keep?
>


>
> Post the code you've got so far, in as close to working condition as you
> can get it (that is, make sure there aren't any syntax errors, or bits
> left over from things you've tried before).
>
> Ben


Thank you Ben for your patience!

1. You are asking for code. But I am ashamed to post mine here,
because it is too childish. I have tried with an array of a reference
to a hash.

2. You have already guessed well what I need:

I have a large file with many numbers in it. Each line a number and
sometimes some dates. I read in from the beginning 15 numbers, here
separated with a tab:

1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^                              ^


So the first step would be to read in the numbers until 15th line and
see if there are double numbers or triple numbers. In my example here
there are two 1 and two 2 ... Then I do something with this result and
I read in the next 15 numbers starting with 2 ...


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
  ^                               ^

next step we read in starting with 3


1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    ^                                ^

But you gave already a valuable hint: we only need a hash of each

%number = (1 => [1,5]);

And see how long the anonymous array is. No need to add one more level
to each number to count its occurrences.
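
A minimal sketch of that hash-of-arrays idea, applied to the first 15 numbers of the example row. The helper name positions_by_number is hypothetical (it does not come from the thread), and positions are counted from 1 within the list rather than taken from a file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): map each number in a list
# to the 1-based positions at which it occurs.
sub positions_by_number {
    my @numbers = @_;
    my %positions;
    for my $i (0 .. $#numbers) {
        push @{ $positions{ $numbers[$i] } }, $i + 1;
    }
    return \%positions;
}

# The first 15 numbers of the example row.
my @first_window = (1, 2, 3, 4, 1, 5, 6, 7, 8, 9, 10, 2, 11, 12, 13);
my $positions = positions_by_number(@first_window);

for my $number (sort { $a <=> $b } keys %$positions) {
    my @where = @{ $positions->{$number} };
    # scalar(@where) is the count; no separate counter is needed
    printf "%2d: %d time(s), at position(s) %s\n",
        $number, scalar(@where), join(',', @where);
}
```

For this input the entry for 1 holds the positions 1 and 5, which is the %number = (1 => [1,5]) shape mentioned above.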

Hope this is clearer now? Thank you again!



marek

Tad J McClellan 02-15-2009 03:14 PM

Re: restrict a hash to 15 pairs and iterate over it
 
Marek <mstep@podiuminternational.org> wrote:
> On Feb 15, 10:06 am, Ben Morrow <b...@morrow.me.uk> wrote:
>
>>
>> so for a list like
>>
>> 1 2 1 4 2
>>
>> you would get the results
>>
>> 1: 2 times
>> 2: 2 times
>> 4: 1 time
>>

>
> yes! :-)



> So the first step would be to read in the numbers until 15th line and
> see if there are double numbers or triple numbers. In my example here
> there are two 1 and two 2 ... Then I do something with this result and
> I read in the next 15 numbers starting with 2 ...
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>   ^                               ^
>
> next step we read in starting with 3
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>     ^                                ^
>
> But you gave already a valuable hint: we only need a hash of each
>
> %number = (1 => [1,5]);



If you instead use a hash that buffers 15 elements at a time, then
you can generate a hash with the counts directly, as you don't seem
to want to know the actual line numbers...

I think this does what you are asking for:


-----------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $size = 5; # 5 instead of 15
my $line = 0; # "line" counter
my %lines; # buffer up $size lines

while ( <DATA> ) {
    next if /\./;   # skip dates
    chomp;
    $line++;

    $lines{$line} = $_;

    if ( keys %lines == $size ) {

        print Dumper \%lines;   # for debugging

        # count what is in the buffer
        my %nums;
        $nums{$_}++ for values %lines;

        # display what is in the (counted) buffer
        foreach my $num ( sort { $a <=> $b } keys %nums ) {
            printf "%3d: %3d times\n", $num, $nums{$num};
        }
        print "---------\n";

        # maintain buffer size
        delete $lines{ $line - $size + 1};
    }
}

__DATA__
01.01.98
31
33
01.02.98
01.03.98
14
7
35
16
20
20
13
55
1
1
7
-----------------------------


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Xho Jingleheimerschmidt 02-15-2009 07:06 PM

Re: restrict a hash to 15 pairs and iterate over it
 
Marek wrote:
> On Feb 15, 10:06 am, Ben Morrow <b...@morrow.me.uk> wrote:
>
>> What do you mean by 'the occurrences of numbers'? Do you mean the number
>> of times each number shows up, so for a list like
>>
>> 1 2 1 4 2
>>
>> you would get the results
>>
>> 1: 2 times
>> 2: 2 times
>> 4: 1 time
>>

>
> yes! :-)
>
>> If you need a data structure like this (I'm not yet sure whether you do)
>> you want something more like
>>
>> my %numbers = (
>> 20 => [7, 8]
>> );
>>
>> You don't need to record the count separately: Perl arrays know how long
>> they are.
>>

>
> Thank you! Good idea!


If you need to know both the counts of occurrences and the locations, then
this is a good idea. The list of locations automatically includes the
count of occurrences. But if you *only* need the count, then I wouldn't
store the list of locations as well.
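
A minimal sketch of the count-only variant, using Ben's short example list. The helper name count_occurrences is hypothetical; it stores one integer per number and no locations.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count how often each value
# occurs, without recording where.
sub count_occurrences {
    my %count;
    $count{$_}++ for @_;
    return \%count;
}

my $count = count_occurrences(1, 2, 1, 4, 2);   # Ben's example list
for my $number (sort { $a <=> $b } keys %$count) {
    my $n = $count->{$number};
    printf "%d: %d %s\n", $number, $n, $n == 1 ? 'time' : 'times';
}
```

This prints "1: 2 times", "2: 2 times" and "4: 1 time", matching the result Ben described.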


>
> I have a large file with many numbers in it. Each line a number and
> sometimes some dates. I read in from the beginning 15 numbers, here
> separated with a tab:
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
> ^                              ^
>
>
> So the first step would be to read in the numbers until 15th line and
> see if there are double numbers or triple numbers. In my example here
> there are two 1 and two 2 ... Then I do something with this result and
> I read in the next 15 numbers starting with 2 ...
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>   ^                               ^


So it is a sliding window.

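use strict;
use warnings;

my @window;
my %count;
while (<>) {
    chomp;
    next if looks_like_date_not_number($_);
    push @window, $_;
    $count{$_}++;
    if (@window > 15) {
        ## window is too big, get rid of the first one
        my $old = shift @window;
        # drop the key once its count reaches zero, so %count only
        # lists numbers that are still inside the window
        delete $count{$old} unless --$count{$old};
    }
    if (@window == 15) {
        # do whatever you want to do with %count
    }
}

# Placeholder (an assumption, the test was left open above):
# treat any line containing a dot as a date.
sub looks_like_date_not_number { return $_[0] =~ /\./ }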

Xho

Peter J. Holzer 02-15-2009 08:30 PM

Re: restrict a hash to 15 pairs and iterate over it
 
On 2009-02-15 12:21, Marek <mstep@podiuminternational.org> wrote:
> I have a large file with many numbers in it. Each line a number and
> sometimes some dates. I read in from the beginning 15 numbers, here
> separated with a tab:
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
> ^                              ^
>
>
> So the first step would be to read in the numbers until 15th line and
> see if there are double numbers or triple numbers. In my example here
> there are two 1 and two 2 ... Then I do something with this result and
> I read in the next 15 numbers starting with 2 ...
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>   ^                               ^
>
> next step we read in starting with 3
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>     ^                                ^


So you have a sliding window of 15 lines, and you always want to know
how often a number occurs within that window? That is you want an output
similar to this:

line 1 - 15:
1: 2
2: 2
3: 1
4: 1
5: 1
6: 1
7: 1
8: 1
9: 1
10: 1
11: 1
12: 1
13: 1
line 2 - 16:
1: 1
2: 2
3: 1
4: 1
5: 1
6: 1
7: 1
8: 1
9: 1
10: 1
11: 1
12: 1
13: 1
14: 1
line 3 - 17:
1: 1
2: 1
3: 1
4: 1
5: 1
6: 1
7: 1
8: 1
9: 1
10: 1
11: 1
12: 1
13: 1
14: 1
15: 1

One way to achieve this is to keep the window in an array. Add new lines
with push, and remove old lines with shift. For each line added,
increment the count of the corresponding number. For each line removed,
decrement it. The counts can be kept in an array or a hash.
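
The approach just described could be sketched roughly as follows. The helper name sliding_counts is hypothetical, the counts are kept in a hash, and the input is the example row passed as a list instead of being read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): slide a window of $size over
# @numbers and return one snapshot of the counts per full window.
sub sliding_counts {
    my ($size, @numbers) = @_;
    my (@window, %count, @snapshots);
    for my $number (@numbers) {
        push @window, $number;          # a new line enters the window
        $count{$number}++;
        if (@window > $size) {
            my $old = shift @window;    # the oldest line leaves the window
            delete $count{$old} unless --$count{$old};
        }
        push @snapshots, {%count} if @window == $size;   # copy of the counts
    }
    return @snapshots;
}

my @row = (1, 2, 3, 4, 1, 5, 6, 7, 8, 9, 10, 2, 11, 12, 13, 14, 15, 16, 17);
my @snapshots = sliding_counts(15, @row);

for my $i (0 .. $#snapshots) {
    printf "line %d - %d:\n", $i + 1, $i + 15;
    my $count = $snapshots[$i];
    printf "%d: %d\n", $_, $count->{$_} for sort { $a <=> $b } keys %$count;
}
```

The example row contains 19 numbers, so this yields five windows; the first three correspond to the listing above. Each push and shift touches only one counter, so the cost per line does not depend on the window size.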

hp

Marek 02-15-2009 08:51 PM

Re: restrict a hash to 15 pairs and iterate over it
 


Wow! I am impressed! I would never have found these solutions by
myself!

Special thanx to Tad; so short and elegant! I love these lines:

if ( keys %lines == $size )

and

$nums{$_}++ for values %lines;

and this one is my favourite :-)

delete $lines{ $line - $size + 1};
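
A small sketch of why that delete always removes the oldest entry: when the buffer is full it holds exactly the line numbers $line - $size + 1 through $line. The helper name buffered_keys_per_step is hypothetical, and the window size of 3 is chosen only to keep the output short.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): feed values into a hash keyed
# by line number and report which keys are buffered each time it is full.
sub buffered_keys_per_step {
    my ($size, @values) = @_;
    my (%lines, @report);
    my $line = 0;
    for my $value (@values) {
        $lines{ ++$line } = $value;
        next unless keys %lines == $size;
        push @report, join ' ', sort { $a <=> $b } keys %lines;
        # the buffer holds lines $line-$size+1 .. $line, so this is the oldest
        delete $lines{ $line - $size + 1 };
    }
    return @report;
}

print "$_\n" for buffered_keys_per_step(3, qw(31 33 14 7 35));
```

With five values and a size of 3 this prints "1 2 3", "2 3 4" and "3 4 5": the smallest key is dropped at every step.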

Also Xho's suggestion is really tricky! Thank you!

Good evening to all!


marek


sln@netherlands.com 02-16-2009 03:25 AM

Re: restrict a hash to 15 pairs and iterate over it
 
On Sat, 14 Feb 2009 23:51:34 -0800 (PST), Marek <mstep@podiuminternational.org> wrote:

>
>
>Hello all!
>
>
>I am still a beginner, so please be patient with me.
>
>I have a big file with numbers and dates, like this:
>
>
>01.01.98
>31
>33
>14
>7
>35
>16
>20
>20
>13
>55
>1
>1
>7
>
>
>etc etc
>
>I need a complicated hash to count the occurrences of numbers within a scope
>of 15:
>
>We skip the dates, and we count the lines. The structure of my %hash
>looks like this:
>
>($number{$line, $line, ...}) => $how_many_times
>
>In my example the 20 occurs in line 7 and 8 -> two times:
>
>20{7,8} => 2
>
>And we iterate over it, keeping only 15 numbers in the hash, and count
>the occurrences of each number each time.
>
>Could somebody help me with this?
>
>
>Thank you in advance
>
>
>marek


A rolling frame that tracks the lines of occurrences is not as easy as you think.
The concept is simple; the implementation is another thing altogether.
This would not be a problem to present in a beginner Perl class.
It's not actually Perl that would be the problem, it's the implementation of a rolling
frame and the tracking of line numbers for a given criterion.

The code below is just a rudimentary framework to demonstrate the constructs that
would be necessary. You might need a hardened programmer with large application
experience to deal with rolling frames and data tracking.

Could this rough code be thinned out? Sure. It just demonstrates the concept; it's
not production quality.

Btw, the frame size was set to 5 for the example; change it to 15 or whatever it is
you're doing.

Well, good luck and have fun!
-sln

------------------------

# Frames.pl
# -------------------------
# Template:
# We assume a valid frame of 5 (not based on line count) This could be 15 or any number
# @Frame_Cache = (number, number, number, ...); ## 5 elements
# %Items = (number => [line,line,line], number => [line,line,line],...);


use strict;
use warnings;


my @Frame_Cache = ();
my %Items = ();
my ($cache_size, $lncount, $framesize) = (0, 0, 5);

while (<DATA>)
{
    ++$lncount;

    # Digits only, anything else is invalid
    /^\s*(\d+)\s*$/;
    next if (!$1);

    # Add item to frame cache
    push @Frame_Cache, $1;

    # Add line number onto item array stack (in hash)
    push @{$Items{$1}}, $lncount;

    print "\nAdding $1 (line $lncount)\n";

    # Continue until full frame
    ++$cache_size;
    next if ($cache_size < $framesize);

    # First full frame, the roll starts on next one
    # Show Frame, do something with %Items
    if ($cache_size == $framesize)
    {
        PrintItems();
        next;
    }

    # Frame is moving, take head off cache
    my $item_number = shift @Frame_Cache;

    # Adjust lines going out of frame (all arrays in hash).
    # Delete the item number line array if it is empty.

    print "Taking $item_number off (line ".${$Items{$item_number}}[0].")\n";

    my $line_going_out_of_frame = ${$Items{$item_number}}[0];
    for my $nbr (keys %Items)
    {
        shift @{$Items{$nbr}} if (${$Items{$nbr}}[0] <= $line_going_out_of_frame);
        delete $Items{$nbr} if (!@{$Items{$nbr}});
    }

    # Show Frame, do something with %Items
    PrintItems();
}

# You could print items down here if there is no full frame
# ...

# end of program ...


# This prints the items hash (could use Data::Dumper), but more importantly
# gives a template to access the data.
# When you're through with debug printing, just comment the print part out.
# Process the data here, refactor this sub when done.
# No sub should access global data imho.
# -----------------
sub PrintItems
{
    print "Frame ".($cache_size-$framesize+1)." - $cache_size\n";
    for my $nbr (sort {$a<=>$b} keys %Items) {
        print "number = $nbr, on lines = [ @{$Items{$nbr}} ]\n";
    }
}

__DATA__

01.01.98
99
31
33
14
7
35
16
20
20
13
55
1
1
7
0
2
3
0
2
3
0
2
3
0
2
3
0
2
3
0


---------------
Output:


c:\temp>perl frames.pl

Adding 99 (line 3)

Adding 31 (line 4)

Adding 33 (line 5)

Adding 14 (line 6)

Adding 7 (line 7)
Frame 1 - 5
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 31, on lines = [ 4 ]
number = 33, on lines = [ 5 ]
number = 99, on lines = [ 3 ]

Adding 35 (line 8)
Taking 99 off (line 3)
Frame 2 - 6
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 31, on lines = [ 4 ]
number = 33, on lines = [ 5 ]
number = 35, on lines = [ 8 ]

Adding 16 (line 9)
Taking 31 off (line 4)
Frame 3 - 7
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 16, on lines = [ 9 ]
number = 33, on lines = [ 5 ]
number = 35, on lines = [ 8 ]

Adding 20 (line 10)
Taking 33 off (line 5)
Frame 4 - 8
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 ]
number = 35, on lines = [ 8 ]

Adding 20 (line 11)
Taking 14 off (line 6)
Frame 5 - 9
number = 7, on lines = [ 7 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 35, on lines = [ 8 ]

Adding 13 (line 12)
Taking 7 off (line 7)
Frame 6 - 10
number = 13, on lines = [ 12 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 35, on lines = [ 8 ]

Adding 55 (line 13)
Taking 35 off (line 8)
Frame 7 - 11
number = 13, on lines = [ 12 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 55, on lines = [ 13 ]

Adding 1 (line 14)
Taking 16 off (line 9)
Frame 8 - 12
number = 1, on lines = [ 14 ]
number = 13, on lines = [ 12 ]
number = 20, on lines = [ 10 11 ]
number = 55, on lines = [ 13 ]

Adding 1 (line 15)
Taking 20 off (line 10)
Frame 9 - 13
number = 1, on lines = [ 14 15 ]
number = 13, on lines = [ 12 ]
number = 20, on lines = [ 11 ]
number = 55, on lines = [ 13 ]

Adding 7 (line 16)
Taking 20 off (line 11)
Frame 10 - 14
number = 1, on lines = [ 14 15 ]
number = 7, on lines = [ 16 ]
number = 13, on lines = [ 12 ]
number = 55, on lines = [ 13 ]

Adding 2 (line 18)
Taking 13 off (line 12)
Frame 11 - 15
number = 1, on lines = [ 14 15 ]
number = 2, on lines = [ 18 ]
number = 7, on lines = [ 16 ]
number = 55, on lines = [ 13 ]

Adding 3 (line 19)
Taking 55 off (line 13)
Frame 12 - 16
number = 1, on lines = [ 14 15 ]
number = 2, on lines = [ 18 ]
number = 3, on lines = [ 19 ]
number = 7, on lines = [ 16 ]

Adding 2 (line 21)
Taking 1 off (line 14)
Frame 13 - 17
number = 1, on lines = [ 15 ]
number = 2, on lines = [ 18 21 ]
number = 3, on lines = [ 19 ]
number = 7, on lines = [ 16 ]

Adding 3 (line 22)
Taking 1 off (line 15)
Frame 14 - 18
number = 2, on lines = [ 18 21 ]
number = 3, on lines = [ 19 22 ]
number = 7, on lines = [ 16 ]

Adding 2 (line 24)
Taking 7 off (line 16)
Frame 15 - 19
number = 2, on lines = [ 18 21 24 ]
number = 3, on lines = [ 19 22 ]

Adding 3 (line 25)
Taking 2 off (line 18)
Frame 16 - 20
number = 2, on lines = [ 21 24 ]
number = 3, on lines = [ 19 22 25 ]

Adding 2 (line 27)
Taking 3 off (line 19)
Frame 17 - 21
number = 2, on lines = [ 21 24 27 ]
number = 3, on lines = [ 22 25 ]

Adding 3 (line 28)
Taking 2 off (line 21)
Frame 18 - 22
number = 2, on lines = [ 24 27 ]
number = 3, on lines = [ 22 25 28 ]

Adding 2 (line 30)
Taking 3 off (line 22)
Frame 19 - 23
number = 2, on lines = [ 24 27 30 ]
number = 3, on lines = [ 25 28 ]

Adding 3 (line 31)
Taking 2 off (line 24)
Frame 20 - 24
number = 2, on lines = [ 27 30 ]
number = 3, on lines = [ 25 28 31 ]

c:\temp>
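
For comparison, a more compact sketch of a rolling frame that also tracks line numbers. This is not the code above but an alternative under stated assumptions: the helper name frames_with_lines is hypothetical, the input is passed as [line_number, text] pairs instead of being read from DATA, and a literal 0 is treated as a valid number.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): keep the last $size numeric
# lines and return one snapshot per full frame, mapping each number to a
# string of the line numbers on which it currently appears.
sub frames_with_lines {
    my ($size, @input) = @_;
    my (@frame, %lines_of, @snapshots);
    for my $pair (@input) {
        my ($lineno, $text) = @$pair;
        next unless $text =~ /^\s*(\d+)\s*$/;   # also accepts a literal 0
        my $number = $1;
        push @frame, $number;
        push @{ $lines_of{$number} }, $lineno;
        if (@frame > $size) {
            my $old = shift @frame;
            # the departing line is always the oldest entry for its number
            shift @{ $lines_of{$old} };
            delete $lines_of{$old} unless @{ $lines_of{$old} };
        }
        next unless @frame == $size;
        my %snapshot;
        $snapshot{$_} = "@{ $lines_of{$_} }" for keys %lines_of;
        push @snapshots, \%snapshot;
    }
    return @snapshots;
}

my @text  = qw(01.01.98 99 31 33 14 7 35 16 20 20 13);
my @input = map { [ $_ + 1, $text[$_] ] } 0 .. $#text;
for my $frame (frames_with_lines(5, @input)) {
    my @parts;
    for my $number (sort { $a <=> $b } keys %$frame) {
        push @parts, sprintf '%d => [%s]', $number, $frame->{$number};
    }
    print join(', ', @parts), "\n";
}
```

Because the frame is first-in, first-out, only the array belonging to the departing number needs to be shifted, so there is no need to loop over every key when the frame moves.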

sln@netherlands.com 02-16-2009 03:37 AM

Re: restrict a hash to 15 pairs and iterate over it
 
On Mon, 16 Feb 2009 03:25:00 GMT, sln@netherlands.com wrote:

>
>A rolling frame that tracks the lines of occurrences is not as easy as you think.
>The concept is simple; the implementation is another thing altogether.
>This would not be a problem to present in a beginner Perl class.
>It's not actually Perl that would be the problem, it's the implementation of a rolling
>frame and the tracking of line numbers for a given criterion.
>
>The code below is just a rudimentary framework to demonstrate the constructs that
>would be necessary. You might need a hardened programmer with large application
>experience to deal with rolling frames and data tracking.
>
>Could this rough code be thinned out? Sure. It just demonstrates the concept; it's
>not production quality.
>
>Btw, the frame size was set to 5 for the example; change it to 15 or whatever it is
>you're doing.
>
>Well, good luck and have fun!
>-sln
>
>------------------------
>
># Frames.pl
># -------------------------
># Template:
># We assume a valid frame of 5 (not based on line count) This could be 15 or any number
># @Frame_Cache = (number, number, number, ...); ## 5 elements
># %Items = (number => [line,line,line], number => [line,line,line],...);
>
>

[snip]
> # Digits only, anything else is invalid
> /^\s*(\d+)\s*$/;
> next if (!$1);

^^^^^
next if (!defined $1)

Oops. I always say 'check your work'. Gotcha on me!
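
A short sketch of the difference this correction makes. In Perl the string "0" is false, so a test on the truth of $1 skips a line that contains only 0, while a test on definedness does not. A common alternative, shown here with the hypothetical helper number_on_line, is to test the result of the match itself, since the capture variables are only set by a successful match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the number on a line,
# or undef if the line is not a plain non-negative integer.
sub number_on_line {
    my ($text) = @_;
    return $text =~ /^\s*(\d+)\s*$/ ? $1 : undef;
}

for my $text ('20', '0', '01.01.98', '') {
    my $number = number_on_line($text);
    printf "%-10s %s\n", "'$text'",
        defined $number ? "number $number" : 'skipped';
}
```

Here '20' and '0' are reported as numbers, while the date and the empty line are skipped.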

-sln


All times are GMT.