restrict a hash to 15 pairs and iterate over it
Hello all!

I am still a beginner, so please be patient with me.

I have a big file with numbers and dates, like this:

01.01.98
31
33
14
7
35
16
20
20
13
55
1
1
7

etc etc

I need a complicated hash to know the occurrences of numbers in a scope of 15:

We skip the dates, and we count the lines. The structure of my %hash looks like this:

($number{$line, $line, ...}) => $how_many_times

In my example the 20 occurs in lines 7 and 8 -> two times:

20{7,8} => 2

And we iterate over it, and keep only 15 numbers in the hash and count each time the occurrences of each number.

Could somebody help me with this?

Thank you in advance

marek
Re: restrict a hash to 15 pairs and iterate over it
On Feb 15, 10:06 am, Ben Morrow <b...@morrow.me.uk> wrote:
> What do you mean by 'the occurrences of numbers'? Do you mean the number
> of times each number shows up, so for a list like
>
>     1 2 1 4 2
>
> you would get the results
>
>     1: 2 times
>     2: 2 times
>     4: 1 time
>

yes! :-)

> If you need a data structure like this (I'm not yet sure whether you do)
> you want something more like
>
>     my %numbers = (
>         20 => [7, 8]
>     );
>
> You don't need to record the count separately: Perl arrays know how long
> they are.
>

Thank you! Good idea!

> How do you choose which 15 numbers to keep?
>
> Post the code you've got so far, in as close to working condition as you
> can get it (that is, make sure there aren't any syntax errors, or bits
> left over from things you've tried before).
>
> Ben

Thank you Ben for your patience!

1. You are asking for a code. But I am ashamed to post it here, because it is too childish. I have tried with an array of a reference to a hash.

2. What I need, you guessed already well:

I have a large file with many numbers in it. Each line a number and sometimes some dates. I read in from the beginning 15 numbers, here separated with a tab:

1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
^                              ^

So the first step would be to read in the numbers until 15th line and see if there are double numbers or triple numbers. In my example here there are two 1 and two 2 ... Then I do something with this result and I read in the next 15 numbers starting with 2 ...

1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
  ^                               ^

next step we read in starting with 3

1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
    ^                                ^

But you gave already a valuable hint: we only need a hash of each

%number = (1 => [1,5]);

And see how long is the anonymous array. No need to add one level more to each number and counting their occurrences.

Hope this is clearer now?

Thank you again!

marek
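Ben's hint (store the line numbers in an array per number, and let the array length be the count) can be sketched as follows. This is only an illustration: the @values list is a hypothetical stand-in for the non-date lines of the sample in the first post, and it ignores the 15-number window entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: the non-date lines from the first post, in file order.
my @values = (31, 33, 14, 7, 35, 16, 20, 20, 13, 55, 1, 1, 7);

my %numbers;     # number => [ line numbers on which it occurred ]
my $line = 0;    # counts only the number lines, as Marek describes
for my $value (@values) {
    $line++;
    push @{ $numbers{$value} }, $line;    # autovivifies the array on first use
}

for my $n (sort { $a <=> $b } keys %numbers) {
    my @where = @{ $numbers{$n} };
    # scalar @where is the count; no separate counter is stored
    printf "%d occurs %d time(s), on line(s) %s\n",
        $n, scalar @where, join(',', @where);
}
```

With this sample, the entry for 20 holds the lines 7 and 8, which is the 20 => [7, 8] shape from Ben's message.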
Re: restrict a hash to 15 pairs and iterate over it
Marek <mstep@podiuminternational.org> wrote:
> On Feb 15, 10:06 am, Ben Morrow <b...@morrow.me.uk> wrote:
>
>>
>> so for a list like
>>
>>     1 2 1 4 2
>>
>> you would get the results
>>
>>     1: 2 times
>>     2: 2 times
>>     4: 1 time
>>
>
> yes! :-)

> So the first step would be to read in the numbers until 15th line and
> see if there are double numbers or triple numbers. In my example here
> there are two 1 and two 2 ... Then I do something with this result and
> I read in the next 15 numbers starting with 2 ...
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>   ^                               ^
>
> next step we read in starting with 3
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>     ^                                ^
>
> But you gave already a valuable hint: we only need a hash of each
>
> %number = (1 => [1,5]);

If you instead use a hash that buffers 15 elements at a time, then you can generate a hash with the counts directly, as you don't seem to want to know the actual line numbers...

I think this does what you are asking for:

-----------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $size = 5;  # 5 instead of 15
my $line = 0;  # "line" counter
my %lines;     # buffer up $size lines
while ( <DATA> ) {
    next if /\./;  # skip dates
    chomp;
    $line++;

    $lines{$line} = $_;
    if ( keys %lines == $size ) {
        print Dumper \%lines;  # for debugging

        # count what is in the buffer
        my %nums;
        $nums{$_}++ for values %lines;

        # display what is in the (counted) buffer
        foreach my $num ( sort { $a <=> $b } keys %nums ) {
            printf "%3d: %3d times\n", $num, $nums{$num};
        }
        print "---------\n";

        # maintain buffer size
        delete $lines{ $line - $size + 1};
    }
}

__DATA__
01.01.98
31
33
01.02.98
01.03.98
14
7
35
16
20
20
13
55
1
1
7
-----------------------------

-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
Re: restrict a hash to 15 pairs and iterate over it
Marek wrote:
> On Feb 15, 10:06 am, Ben Morrow <b...@morrow.me.uk> wrote:
>
>> What do you mean by 'the occurrences of numbers'? Do you mean the number
>> of times each number shows up, so for a list like
>>
>>     1 2 1 4 2
>>
>> you would get the results
>>
>>     1: 2 times
>>     2: 2 times
>>     4: 1 time
>>
>
> yes! :-)
>
>> If you need a data structure like this (I'm not yet sure whether you do)
>> you want something more like
>>
>>     my %numbers = (
>>         20 => [7, 8]
>>     );
>>
>> You don't need to record the count separately: Perl arrays know how long
>> they are.
>>
>
> Thank you! Good idea!

If you need to know both the counts of occurrence and the locations, then this is a good idea. The list of locations automatically includes the count of occurrences. But if you *only* need the count, then I wouldn't store the list of locations as well.

>
> I have a large file with many numbers in it. Each line a number and
> sometimes some dates. I read in from the beginning 15 numbers, here
> separated with a tab:
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
> ^                              ^
>
>
> So the first step would be to read in the numbers until 15th line and
> see if there are double numbers or triple numbers. In my example here
> there are two 1 and two 2 ... Then I do something with this result and
> I read in the next 15 numbers starting with 2 ...
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>   ^                               ^

So it is a sliding window.

my @window;
my %count;
while (<>) {
  chomp;
  next if looks_like_date_not_number($_);
  push @window, $_;
  $count{$_}++;
  if (@window>15) {
    ## window is too big, get rid of the first one;
    $count{$window[0]}--;
    shift @window;
  };
  if (@window==15) {
    # do whatever you want to do with %count
  }
};

Xho
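Xho's loop calls looks_like_date_not_number() without defining it. One possible definition is sketched here, under the assumption that every date in the file has the dd.mm.yy shape of the 01.01.98 sample; the regex and the test values are illustrative guesses, not something given in the thread.

```perl
use strict;
use warnings;

# Assumed date format: two digits, dot, two digits, dot, two digits (e.g. 01.01.98).
# Returns 1 for such a line and 0 for everything else.
sub looks_like_date_not_number {
    my ($text) = @_;
    return $text =~ /^\d\d\.\d\d\.\d\d$/ ? 1 : 0;
}

# Small demonstration on already-chomped lines
for my $sample ('01.01.98', '31', '7') {
    printf "%-8s => %s\n", $sample,
        looks_like_date_not_number($sample) ? 'date' : 'number';
}
```

If the file contains other non-numeric lines (blank lines, headers), a stricter test such as "skip unless the line is all digits" may be the safer choice.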
Re: restrict a hash to 15 pairs and iterate over it
On 2009-02-15 12:21, Marek <mstep@podiuminternational.org> wrote:
> I have a large file with many numbers in it. Each line a number and
> sometimes some dates. I read in from the beginning 15 numbers, here
> separated with a tab:
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
> ^                              ^
>
>
> So the first step would be to read in the numbers until 15th line and
> see if there are double numbers or triple numbers. In my example here
> there are two 1 and two 2 ... Then I do something with this result and
> I read in the next 15 numbers starting with 2 ...
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>   ^                               ^
>
> next step we read in starting with 3
>
>
> 1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17
>     ^                                ^

So you have a sliding window of 15 lines, and you always want to know how often a number occurs within that window?

That is you want an output similar to this:

line 1 - 15:
    1: 2  2: 2  3: 1  4: 1  5: 1  6: 1  7: 1  8: 1  9: 1  10: 1  11: 1  12: 1  13: 1

line 2 - 16:
    1: 1  2: 2  3: 1  4: 1  5: 1  6: 1  7: 1  8: 1  9: 1  10: 1  11: 1  12: 1  13: 1  14: 1

line 3 - 17:
    1: 1  2: 1  3: 1  4: 1  5: 1  6: 1  7: 1  8: 1  9: 1  10: 1  11: 1  12: 1  13: 1  14: 1  15: 1

One way to achieve this is to keep the window in an array. Add new lines with push, and remove old lines with shift. For each line added, increment the count of the corresponding number. For each line removed, decrement it. The counts can be kept in an array or a hash.

hp
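The push/shift approach described above might look roughly like this. It is a sketch, not hp's own code: the @input list stands in for the file (one value per line, taken from Marek's 1 2 3 4 1 5 ... example), @report is a hypothetical collector for the summaries, and keys whose count drops to zero are deleted so they do not show up in later windows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $size = 15;    # window length used in the thread
my @window;       # numbers currently inside the window, oldest first
my %count;        # number => occurrences inside the window
my $seen = 0;     # how many number lines have been read so far
my @report;       # one summary string per full window

# Stand-in for the file contents: Marek's example, one value per "line"
my @input = qw(1 2 3 4 1 5 6 7 8 9 10 2 11 12 13 14 15 16 17);

for my $value (@input) {
    next unless $value =~ /^\d+$/;    # skip dates and other non-numeric lines
    $seen++;

    push @window, $value;             # new line enters the window
    $count{$value}++;

    if (@window > $size) {
        my $old = shift @window;      # oldest line leaves the window
        delete $count{$old} unless --$count{$old};    # drop keys that reach zero
    }

    if (@window == $size) {
        my @pairs;
        push @pairs, "$_: $count{$_}" for sort { $a <=> $b } keys %count;
        my $first = $seen - $size + 1;
        push @report, "line $first - $seen: @pairs";
    }
}

print "$_\n" for @report;
```

Each step costs one increment and at most one decrement, so the window is never recounted from scratch.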
Re: restrict a hash to 15 pairs and iterate over it
Wow! I am impressed! I would never have found these solutions by myself!

Special thanx to Tad; so short and elegant! I love these lines:

    if ( keys %lines == $size )

and

    $nums{$_}++ for values %lines;

and this one is my favourite :-)

    delete $lines{ $line - $size + 1};

Also Xho's suggestion is really tricky! Thank you!

Good evening to all!

marek
Re: restrict a hash to 15 pairs and iterate over it
On Sat, 14 Feb 2009 23:51:34 -0800 (PST), Marek <mstep@podiuminternational.org> wrote:
>Hello all!
>
>I am still a beginner, so please be patient with me.
>
>I have a big file with numbers and dates like follows here:
>
>01.01.98
>31
>33
>14
>7
>35
>16
>20
>20
>13
>55
>1
>1
>7
>
>etc etc
>
>I need a complicate hash to know the occurrences of numbers in a scope
>of 15:
>
>We skip the dates, and we count the lines. The structure of my %hash
>looks like follows:
>
>($number{$line, $line, ...}) => $how_many_times
>
>In my example the 20 occurs in line 7 and 8 -> two times:
>
>20{7,8} => 2
>
>And we iterate over it, and keep only 15 numbers in the hash and count
>each time the occurrences of each number.
>
>Could somebody help me with this?
>
>Thank you in advance
>
>marek

A rolling frame that tracks lines of occurrences is not as easy as you think.
The concept is simple, the implementation is another thing altogether.
This would not be a problem to present in a beginner Perl class.
It's not actually Perl that would be a problem, it's the implementation of a rolling
frame and tracking of line numbers from a given criteria.

The below code is just a rudimentary framework to demonstrate the constructs that
would be necessary. You might need a hardened programmer with large application
experience to deal with rolling frames and data tracking.

Could this rough code be thinned out? Sure. It just demonstrates the concept, it's
not production quality.

Btw, the frame size was set to 5 for the example, change it to 15 or whatever it is
you're doing.

Well, good luck and have fun!
-sln

------------------------

# Frames.pl
# -------------------------
# Template:
# We assume a valid frame of 5 (not based on line count) This could be 15 or any number
# @Frame_Cache = (number, number, number, ...);  ## 5 elements
# %Items = (number => [line,line,line], number => [line,line,line],...);

use strict;
use warnings;

my @Frame_Cache = ();
my %Items = ();
my ($cache_size, $lncount, $framesize) = (0, 0, 5);

while (<DATA>)
{
    ++$lncount;

    # Digits only, anything else is invalid
    /^\s*(\d+)\s*$/;
    next if (!$1);

    # Add item to frame cache
    push @Frame_Cache, $1;

    # Add line number onto item array stack (in hash)
    push @{$Items{$1}}, $lncount;

    print "\nAdding $1 (line $lncount)\n";

    # Continue until full frame
    ++$cache_size;
    next if ($cache_size < $framesize);

    # First full frame, the roll starts on next one
    # Show Frame, do something with %Items
    if ($cache_size == $framesize) {
        PrintItems();
        next;
    }

    # Frame is moving, take head off cache
    my $item_number = shift @Frame_Cache;

    # Adjust lines going out of frame (all array's in hash).
    # Delete the item number line array if it is empty.
    print "Taking $item_number off (line ".${$Items{$item_number}}[0].")\n";

    my $line_going_out_of_frame = ${$Items{$item_number}}[0];
    for my $nbr (keys %Items) {
        shift @{$Items{$nbr}} if (${$Items{$nbr}}[0] <= $line_going_out_of_frame);
        delete $Items{$nbr} if (!@{$Items{$nbr}});
    }

    # Show Frame, do something with %Items
    PrintItems();
}

# You could print items down here if there is no full frame
# ...

# end of program ...

# This prints the items hash (could use Data::Dumper), but more importantly
# gives a template to access the data.
# When your through with debug printing, just comment the print part out.
# Process the data here, refactor this sub when done.
# No sub should access global data imho.
# -----------------
sub PrintItems
{
    print "Frame ".($cache_size-$framesize+1)." - $cache_size\n";
    for my $nbr (sort {$a<=>$b} keys %Items) {
        print "number = $nbr, on lines = [ @{$Items{$nbr}} ]\n";
    }
}

__DATA__

01.01.98
99
31
33
14
7
35
16
20
20
13
55
1
1
7
0
2
3
0
2
3
0
2
3
0
2
3
0
2
3
0

---------------
Output:

c:\temp>perl frames.pl

Adding 99 (line 3)

Adding 31 (line 4)

Adding 33 (line 5)

Adding 14 (line 6)

Adding 7 (line 7)
Frame 1 - 5
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 31, on lines = [ 4 ]
number = 33, on lines = [ 5 ]
number = 99, on lines = [ 3 ]

Adding 35 (line 8)
Taking 99 off (line 3)
Frame 2 - 6
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 31, on lines = [ 4 ]
number = 33, on lines = [ 5 ]
number = 35, on lines = [ 8 ]

Adding 16 (line 9)
Taking 31 off (line 4)
Frame 3 - 7
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 16, on lines = [ 9 ]
number = 33, on lines = [ 5 ]
number = 35, on lines = [ 8 ]

Adding 20 (line 10)
Taking 33 off (line 5)
Frame 4 - 8
number = 7, on lines = [ 7 ]
number = 14, on lines = [ 6 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 ]
number = 35, on lines = [ 8 ]

Adding 20 (line 11)
Taking 14 off (line 6)
Frame 5 - 9
number = 7, on lines = [ 7 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 35, on lines = [ 8 ]

Adding 13 (line 12)
Taking 7 off (line 7)
Frame 6 - 10
number = 13, on lines = [ 12 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 35, on lines = [ 8 ]

Adding 55 (line 13)
Taking 35 off (line 8)
Frame 7 - 11
number = 13, on lines = [ 12 ]
number = 16, on lines = [ 9 ]
number = 20, on lines = [ 10 11 ]
number = 55, on lines = [ 13 ]

Adding 1 (line 14)
Taking 16 off (line 9)
Frame 8 - 12
number = 1, on lines = [ 14 ]
number = 13, on lines = [ 12 ]
number = 20, on lines = [ 10 11 ]
number = 55, on lines = [ 13 ]

Adding 1 (line 15)
Taking 20 off (line 10)
Frame 9 - 13
number = 1, on lines = [ 14 15 ]
number = 13, on lines = [ 12 ]
number = 20, on lines = [ 11 ]
number = 55, on lines = [ 13 ]

Adding 7 (line 16)
Taking 20 off (line 11)
Frame 10 - 14
number = 1, on lines = [ 14 15 ]
number = 7, on lines = [ 16 ]
number = 13, on lines = [ 12 ]
number = 55, on lines = [ 13 ]

Adding 2 (line 18)
Taking 13 off (line 12)
Frame 11 - 15
number = 1, on lines = [ 14 15 ]
number = 2, on lines = [ 18 ]
number = 7, on lines = [ 16 ]
number = 55, on lines = [ 13 ]

Adding 3 (line 19)
Taking 55 off (line 13)
Frame 12 - 16
number = 1, on lines = [ 14 15 ]
number = 2, on lines = [ 18 ]
number = 3, on lines = [ 19 ]
number = 7, on lines = [ 16 ]

Adding 2 (line 21)
Taking 1 off (line 14)
Frame 13 - 17
number = 1, on lines = [ 15 ]
number = 2, on lines = [ 18 21 ]
number = 3, on lines = [ 19 ]
number = 7, on lines = [ 16 ]

Adding 3 (line 22)
Taking 1 off (line 15)
Frame 14 - 18
number = 2, on lines = [ 18 21 ]
number = 3, on lines = [ 19 22 ]
number = 7, on lines = [ 16 ]

Adding 2 (line 24)
Taking 7 off (line 16)
Frame 15 - 19
number = 2, on lines = [ 18 21 24 ]
number = 3, on lines = [ 19 22 ]

Adding 3 (line 25)
Taking 2 off (line 18)
Frame 16 - 20
number = 2, on lines = [ 21 24 ]
number = 3, on lines = [ 19 22 25 ]

Adding 2 (line 27)
Taking 3 off (line 19)
Frame 17 - 21
number = 2, on lines = [ 21 24 27 ]
number = 3, on lines = [ 22 25 ]

Adding 3 (line 28)
Taking 2 off (line 21)
Frame 18 - 22
number = 2, on lines = [ 24 27 ]
number = 3, on lines = [ 22 25 28 ]

Adding 2 (line 30)
Taking 3 off (line 22)
Frame 19 - 23
number = 2, on lines = [ 24 27 30 ]
number = 3, on lines = [ 25 28 ]

Adding 3 (line 31)
Taking 2 off (line 24)
Frame 20 - 24
number = 2, on lines = [ 27 30 ]
number = 3, on lines = [ 25 28 31 ]

c:\temp>
Re: restrict a hash to 15 pairs and iterate over it
On Mon, 16 Feb 2009 03:25:00 GMT, sln@netherlands.com wrote:
>-sln
>
>------------------------
>
># Frames.pl
># -------------------------
># Template:
># We assume a valid frame of 5 (not based on line count) This could be 15 or any number
># @Frame_Cache = (number, number, number, ...);  ## 5 elements
># %Items = (number => [line,line,line], number => [line,line,line],...);
>
> [snip]
>    # Digits only, anything else is invalid
>    /^\s*(\d+)\s*$/;
>    next if (!$1);
             ^^^^^
     next if (!defined $1)

Oops. I always say 'check your work'.
Gotcha on me!

-sln
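The correction above keeps the value 0 from being dropped, since !$1 is true for "0". A common alternative, shown here only as a sketch and not as sln's code, is to test the result of the match itself, because $1 is only meaningful after a successful match. The sample lines and the @kept array are hypothetical.

```perl
use strict;
use warnings;

my @kept;    # hypothetical collector for the accepted numbers
for my $line ("01.01.98\n", "0\n", " 20 \n", "\n", "7\n") {
    # Skip the line unless the whole line is digits (with optional whitespace);
    # the capture in $1 is used only when the match succeeded.
    next unless $line =~ /^\s*(\d+)\s*$/;
    push @kept, $1;
}

print "kept: @kept\n";
```

Here the date and the blank line are skipped, while 0 is kept like any other number.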