Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Iterating hashes

Reply
Thread Tools

Iterating hashes

 
 
Dave Saville
Guest
Posts: n/a
 
      05-14-2013
If I have

foreach (sort keys %hash)

Then I know that there is a posible performance hit as the keys are
all extracted and then sorted. But is this because of the sort or are
they always all extracted first? If the latter is true then how do you
iterate over a hash without taking the hit?

I am having some problems with a script that has two hashes upon which
I am trying to do inner and outer joins amongst other things. The
hashes are roughly the same size and have over 60,000 keys the
majority of the keys have a length of approx 70 characters. The hash
values are a three element array: A mixed case copy of the key and two
integers.

--
Regards
Dave Saville
 
Reply With Quote
 
 
 
 
Peter Makholm
Guest
Posts: n/a
 
      05-14-2013
"Dave Saville" <(E-Mail Removed)> writes:

> Then I know that there is a posible performance hit as the keys are
> all extracted and then sorted. But is this because of the sort or are
> they always all extracted first? If the latter is true then how do you
> iterate over a hash without taking the hit?


You kan use the each() function in a while loop:

while (($key, $value) = each %hash) {
print $key, "\n";
}

See the relevant part of the perlfunc manual page or
http://perldoc.perl.org/functions/each.html

//Makholm
 
Reply With Quote
 
 
 
 
Rainer Weikusat
Guest
Posts: n/a
 
      05-14-2013
Peter Makholm <(E-Mail Removed)> writes:
> "Dave Saville" <(E-Mail Removed)> writes:
>
>> Then I know that there is a posible performance hit as the keys are
>> all extracted and then sorted. But is this because of the sort or are
>> they always all extracted first? If the latter is true then how do you
>> iterate over a hash without taking the hit?

>
> You kan use the each() function in a while loop:
>
> while (($key, $value) = each %hash) {
> print $key, "\n";
> }


There's actually a variety of options and depending on the number of
pairs in the hash, they perform differently ('values' came out first
everwhere for the tests I made).

----------------------------
use Benchmark qw(cmpthese);

sub traversal_bench
{
my %h = map { $_, 1; } 0 .. $_[0];

print("\n===\n$_[0] keys\n===\n");

cmpthese(-4,
{
keys => sub {
my $v;

$v = $h{$_} for keys(%h);
},

sorted_keys => sub {
my $v;

$v = $h{$_} for sort(keys(%h));
},

values => sub {
1 for values(%h);
},

scalar_each => sub {
my ($v, $k);

$v = $h{$k} while $k = each(%h);
},

list_each => sub {
my ($v, $k);

1 while ($k, $v) = each(%h);
}});
}

traversal_bench($_) for 10, 100, 1000, 10000, 100000, 1000000;
 
Reply With Quote
 
Dave Saville
Guest
Posts: n/a
 
      05-14-2013
On Tue, 14 May 2013 14:49:46 UTC, Rainer Weikusat
<(E-Mail Removed)> wrote:

> Peter Makholm <(E-Mail Removed)> writes:
> > "Dave Saville" <(E-Mail Removed)> writes:
> >
> >> Then I know that there is a posible performance hit as the keys are
> >> all extracted and then sorted. But is this because of the sort or are
> >> they always all extracted first? If the latter is true then how do you
> >> iterate over a hash without taking the hit?

> >
> > You kan use the each() function in a while loop:
> >
> > while (($key, $value) = each %hash) {
> > print $key, "\n";
> > }

>
> There's actually a variety of options and depending on the number of
> pairs in the hash, they perform differently ('values' came out first
> everwhere for the tests I made).
>
> ----------------------------
> use Benchmark qw(cmpthese);
>
> sub traversal_bench
> {
> my %h = map { $_, 1; } 0 .. $_[0];
>
> print("\n===\n$_[0] keys\n===\n");
>
> cmpthese(-4,
> {
> keys => sub {
> my $v;
>
> $v = $h{$_} for keys(%h);
> },
>
> sorted_keys => sub {
> my $v;
>
> $v = $h{$_} for sort(keys(%h));
> },
>
> values => sub {
> 1 for values(%h);
> },
>
> scalar_each => sub {
> my ($v, $k);
>
> $v = $h{$k} while $k = each(%h);
> },
>
> list_each => sub {
> my ($v, $k);
>
> 1 while ($k, $v) = each(%h);
> }});
> }
>
> traversal_bench($_) for 10, 100, 1000, 10000, 100000, 1000000;


===
100 keys
===
Rate sorted_keys scalar_each list_each keys
values
sorted_keys 5715/s -- -28% -36% -46%
-85%
scalar_each 7956/s 39% -- -10% -25%
-80%
list_each 8863/s 55% 11% -- -16%
-77%
keys 10602/s 85% 33% 20% --
-73%
values 39161/s 585% 392% 342% 269%
--

Care to explain the numbers please?
--
Regards
Dave Saville
 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      05-14-2013
Ben Morrow <(E-Mail Removed)> writes:
> Quoth Rainer Weikusat <(E-Mail Removed)>:
>> Peter Makholm <(E-Mail Removed)> writes:
>> > "Dave Saville" <(E-Mail Removed)> writes:
>> >
>> >> Then I know that there is a posible performance hit as the keys are
>> >> all extracted and then sorted. But is this because of the sort or are
>> >> they always all extracted first? If the latter is true then how do you
>> >> iterate over a hash without taking the hit?
>> >
>> > You kan use the each() function in a while loop:
>> >
>> > while (($key, $value) = each %hash) {
>> > print $key, "\n";
>> > }

>>
>> There's actually a variety of options and depending on the number of
>> pairs in the hash, they perform differently ('values' came out first
>> everwhere for the tests I made).
>>
>> ----------------------------
>> use Benchmark qw(cmpthese);
>>
>> sub traversal_bench
>> {
>> my %h = map { $_, 1; } 0 .. $_[0];
>>
>> print("\n===\n$_[0] keys\n===\n");
>>
>>
>> sorted_keys => sub {
>> my $v;
>>
>> $v = $h{$_} for sort(keys(%h));
>> },
>>
>> values => sub {
>> 1 for values(%h);
>> },

>
> That's hardly a fair comparison. In fact, 'values' coming out faster is
> a red herring as well: it's only happening because the values are all 1
> which is much faster to copy than a string.
>
> With a fairer test like
>
> use Benchmark qw/cmpthese/;
>
> my %h = map +("$_", "$_"), 1..60_000;
> my $x;
>
> cmpthese -5, {
> keys => sub { $x = $_ for keys %h },
> sort => sub { $x = $_ for sort keys %h },
> values => sub { $x = $_ for values %h },
> keach => sub { 1 while $x = each %h },
> veach => sub { 1 while $x = (each %h)[1] },
> };


Why would this be 'a fair test' when the keys are copied for no
particular reason while no attempt is made to determine the values
except for the 'values' and 'each in list context' cases? Iterating
over the keys of a hash while not looking at the values associated
with those keys at all seems to be a rather bizarre idea of a use
case.
 
Reply With Quote
 
Willem
Guest
Posts: n/a
 
      05-14-2013
Rainer Weikusat wrote:
) Why would this be 'a fair test' when the keys are copied for no
) particular reason while no attempt is made to determine the values
) except for the 'values' and 'each in list context' cases? Iterating
) over the keys of a hash while not looking at the values associated
) with those keys at all seems to be a rather bizarre idea of a use
) case.

Perl doesn't have a 'set' type, and typically a hash is used for that, and
that is a perfectly legitimate use case for using only the keys of a hash.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Reply With Quote
 
Dr.Ruud
Guest
Posts: n/a
 
      05-14-2013
On 14/05/2013 18:40, Ben Morrow wrote:

> I'm not sure why
> 'values' is cheaper than 'keys', but I suspect it has something to do
> with the fact that hash keys are shared.


perldoc -f keys:

The returned values are copies of the original keys in the hash, so
modifying them will not affect the original hash.


perldoc -f values:

Note that the values are not copied, which means modifying them will
modify the contents of the hash

--
Ruud

 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      05-14-2013
Willem <(E-Mail Removed)> writes:
> Rainer Weikusat wrote:
> ) Why would this be 'a fair test' when the keys are copied for no
> ) particular reason while no attempt is made to determine the values
> ) except for the 'values' and 'each in list context' cases? Iterating
> ) over the keys of a hash while not looking at the values associated
> ) with those keys at all seems to be a rather bizarre idea of a use
> ) case.
>
> Perl doesn't have a 'set' type, and typically a hash is used for that, and
> that is a perfectly legitimate use case for using only the keys of a
> hash.


While you're of course right on that, I regard this is 'rather
bizarre' use case for a hash.
 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      05-14-2013
"Dave Saville" <(E-Mail Removed)> writes:
> On Tue, 14 May 2013 14:49:46 UTC, Rainer Weikusat
> <(E-Mail Removed)> wrote:


[...]

>> use Benchmark qw(cmpthese);


[...]

>> cmpthese(-4,


[...]

> ===
> 100 keys
> ===
> Rate sorted_keys scalar_each list_each keys
> values
> sorted_keys 5715/s -- -28% -36% -46%
> -85%
> scalar_each 7956/s 39% -- -10% -25%
> -80%
> list_each 8863/s 55% 11% -- -16%
> -77%
> keys 10602/s 85% 33% 20% --
> -73%
> values 39161/s 585% 392% 342% 269%
> --
>
> Care to explain the numbers please?


The first column is the 'speed' in terms of 'executions per second',
the other columns show the 'speed differences' of the code of the
current row relative to the others, expressed as percentage of the
speed of the 'column algorithm'. Eg, for 'keys', the sorted_keys
column says '85%'. This is calculated as

int((10602 - 5715) / 5715 * 100)

 
Reply With Quote
 
Rainer Weikusat
Guest
Posts: n/a
 
      05-14-2013
Ben Morrow <(E-Mail Removed)> writes:

[...]

> In any case, I thought the purpose here was to benchmark 'for (keys)'
> and 'while (each)' as methods of iterating over a hash. The difference
> between them is so small that a single extra assignment, or an
> assignment that copies a string rather than a number, will completely
> swamp the difference you are trying to measure, at least until you get
> to the point where the extra memory allocated by keys causes the process
> to start swapping.


Testing just this, the result is (as I already knew from a past
thread) that keys is (on the computer where I tested this)
significantly faster than each for hashes with up to 10,000 keys.

----------------
use Benchmark qw(cmpthese);

sub traversal_bench
{
my %h = map { $_, 1; } 0 .. $_[0];


print("\n===\n$_[0] keys\n===\n");

cmpthese(-5,
{
keys => sub {
1 for keys(%h);
},

each => sub {
my $k;

1 while $k = each(%h);
}});
}

traversal_bench($_) for 10, 100, 1000, 10000, 100000, 1000000;
-----------------

I'm somewhat uncertain what "it is possible to keep adding unrelated
code to each example until that dominates the execution time so
overwhelmingly that this difference can't be measured anymore" is
supposed to communicate in this context ...
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Iterating over an Array of Hashes Peter Hicks Ruby 14 06-11-2011 09:42 PM
Iterating a std::vector vs iterating a std::map? carl C++ 5 11-25-2009 09:55 AM
Iterating over a hash of hash of hashes Steven Hirsch Ruby 0 08-19-2008 12:34 AM
Hash of hashes, of hashes, of arrays of hashes Tim O'Donovan Perl Misc 5 10-28-2005 05:59 AM
iterating through hash of hashes Jeff Thies Perl Misc 7 09-01-2003 11:08 PM



Advertisments