
a trivial array/hash benchmark

 
 
Rainer Weikusat
 
      03-20-2013
When running the trivial microbenchmark

-----------
use Benchmark;

my $h = { Worschtsupp => 4 };
my $a = [4];

timethese(-5, {
    h => sub { return $h->{Worschtsupp}; },
    a => sub { return $a->[0]; },
});
------------

on 'some computer', the result of three runs was that the hash lookup
ran, on average, at about 28.31% of the speed of the array access:
roughly 10,000,000 hash lookups are needed in order to spend 1s of
processing time solely on that, while about 33,333,333 array accesses
could have been done in the same time.
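
For the record, Benchmark's cmpthese produces the relative-speed
comparison directly; the same measurement could equally be written as:

-----------
use Benchmark qw(cmpthese);

my $h = { Worschtsupp => 4 };
my $a = [4];

# prints a table showing how much slower each case is than the other
cmpthese(-5, {
    h => sub { return $h->{Worschtsupp}; },
    a => sub { return $a->[0]; },
});
------------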
 
Rainer Weikusat
 
      03-21-2013
Eli the Bearded <*@eli.users.panix.com> writes:
> In comp.lang.perl.misc, Rainer Weikusat <(E-Mail Removed)> wrote:
>> When running the trivial microbenchmark

>
> I love these sorts of things, so I tried to duplicate your results.
>
>> -----------
>> use Benchmark;
>>
>> my $h = { Worschtsupp => 4 };
>> my $a = [4];
>>
>> timethese(-5, {
>>     h => sub { return $h->{Worschtsupp}; },
>>     a => sub { return $a->[0]; },
>> });
>> ------------
>>
>> on 'some computer', the result of three runs was that the hash lookup
>> ran at about 28.31% of the speed of the array access on average.

>
> Why not post actual results?


They are not really meaningful anywhere except on the computer where I
tested this _and_ with the perl version where I tested this.

> When I run it three times:
>
> a: 5 wallclock secs ( 5.30 usr + 0.00 sys = 5.30 CPU) @ 18706996.60/s (n=99147082)
> h: 4 wallclock secs ( 5.24 usr + 0.00 sys = 5.24 CPU) @ 15713758.59/s (n=82340095)
>
> a: 6 wallclock secs ( 5.57 usr + 0.00 sys = 5.57 CPU) @ 28009483.30/s (n=156012822)
> h: 6 wallclock secs ( 5.27 usr + 0.00 sys = 5.27 CPU) @ 24815075.90/s (n=130775450)
>
> a: 5 wallclock secs ( 5.21 usr + 0.00 sys = 5.21 CPU) @ 19115772.55/s (n=99593175)
> h: 4 wallclock secs ( 5.03 usr + 0.00 sys = 5.03 CPU) @ 22237998.61/s (n=111857133)


E.g., this suggests that 5.14.2 does something more intelligent (for
a certain definition of 'intelligent') for a static hash lookup than
'do it from scratch every time', possibly even because hordes of
"stupid hash users" (like herds of mooses) made it more worthwhile to
put an effort into this. This could lead to an 'interesting' arms
race where some people devise ever more elaborate benchmarks in order
to 'catch them red-handed', while the people supposed to be caught
devise ever more elaborate ways to fool benchmarks.
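
If the speedup really comes from the hash of the literal key being
computed once up front (an assumption, not something verified here),
one way for a benchmark to dodge it is to force the key to be a
runtime value; a purely illustrative variant:

-----------
use Benchmark;

my $h = { Worschtsupp => 4 };
my $k = 'Worschtsupp';   # runtime key: its hash can't be folded in advance
my $a = [4];
my $i = 0;               # runtime index, for symmetry

timethese(-5, {
    h_static  => sub { return $h->{Worschtsupp}; },
    h_dynamic => sub { return $h->{$k}; },
    a_static  => sub { return $a->[0]; },
    a_dynamic => sub { return $a->[$i]; },
});
------------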

The point of this posting was mainly to provide others with a simple
test case for determining what the difference is for them, and to
demonstrate that there is a difference at all. (In court, at least in
a civil case, a 'good' lawyer in a losing position will simply deny
everything the other party claims[*], including that the sun rises in
the morning, because this means everything has to be proven, which
ensures that the case grinds to a halt in technicalities 'for a long
time'. This tactic is even more useful for covering unwelcome
statements on USENET with surreptitious noise so that nobody hears
them.)

[*] Real-world example: some ten years ago, I had the displeasure of
being in a court case because somebody ran his car into mine after
ignoring a certain traffic sign. Among the things denied by the other
party's lawyer were that a road running uphill to the next village,
right at the edge of the town, actually runs uphill, and that the
ignored traffic sign stood there at all.
 
Rainer Weikusat
 
      03-21-2013
Ben Morrow <(E-Mail Removed)> writes:
> Quoth Rainer Weikusat <(E-Mail Removed)>:
>> When running the trivial microbenchmark
>>
>> -----------
>> use Benchmark;
>>
>> my $h = { Worschtsupp => 4 };
>> my $a = [4];
>>
>> timethese(-5, {
>>     h => sub { return $h->{Worschtsupp}; },
>>     a => sub { return $a->[0]; },
>> });
>> ------------
>>
>> on 'some computer', the result of three runs was that the hash lookup
>> ran at about 28.31% of the speed of the array access on average.
>> 10,000,000 hash lookups are needed in order to spend 1s of processing
>> time solely on that and about 33,333,333 could have been done in the
>> same time.

>
> Like most microbenchmarks, this tells you very little about real code.
> Try something a bit more realistic, like
>
> #!/opt/perl/bin/perl
>
> use Benchmark qw/cmpthese/;
>
> sub one_a { my ($self) = @_; $self->[0]; }
> sub one_h { my ($self) = @_; $self->{a}; }
> sub two_a { my ($self) = @_; $self->one_a; }
> sub two_h { my ($self) = @_; $self->one_h; }
> sub if_a { my ($self) = @_; if (rand > 0.2) { $self->two_a } }
> sub if_h { my ($self) = @_; if (rand > 0.2) { $self->two_h } }
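
For reference, a runnable completion of this sketch might look like
the following; the constructors and the cmpthese invocation are
guesses, since the original post elided them:

-----------
#!/opt/perl/bin/perl

use Benchmark qw/cmpthese/;

package Obj;                           # illustrative class name
sub new_a { bless [1], shift }         # arrayref-based object
sub new_h { bless { a => 1 }, shift }  # hashref-based object
sub one_a { my ($self) = @_; $self->[0]; }
sub one_h { my ($self) = @_; $self->{a}; }
sub two_a { my ($self) = @_; $self->one_a; }
sub two_h { my ($self) = @_; $self->one_h; }
sub if_a  { my ($self) = @_; if (rand > 0.2) { $self->two_a } }
sub if_h  { my ($self) = @_; if (rand > 0.2) { $self->two_h } }

package main;
my $ao = Obj->new_a;
my $ho = Obj->new_h;

cmpthese(-5, {
    one_a => sub { $ao->one_a }, one_h => sub { $ho->one_h },
    two_a => sub { $ao->two_a }, two_h => sub { $ho->two_h },
    if_a  => sub { $ao->if_a  }, if_h  => sub { $ho->if_h  },
});
------------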


This includes all kinds of unrelated effects in the test case and, as
you yourself admitted ('as I suspected ...'), even unrelated effects
you specifically included because you expected them to render the
result useless for comparing hash and array lookups. Unsurprisingly,
it became (almost) useless for this particular purpose.

[numbers deleted]

> the method-call overhead completely dominates the
> overhead of the hash lookup. If you can save one method call, you will
> save more time than you would have saved by using an arrayref;


Putting this another way: the overhead of decoupling the so-called
'class methods' from 'the object representation' with a layer of
'more privileged core class methods', aka 'accessors', is so high
that the actual representation doesn't matter anymore. This means the
theoretical benefit of that (the representation could be changed
without affecting the set of general-purpose subroutines needlessly
tied to the so-called class) isn't worthwhile: the cure is worse than
the disease.
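
A minimal sketch of the effect in question (all names invented for
illustration): the accessor pays for a method dispatch on top of the
lookup, the direct access only for the lookup.

-----------
use Benchmark qw(cmpthese);

package Obj;
sub new { bless { a => 1 }, shift }
sub a   { $_[0]{a} }   # the accessor layer: one method call per access

package main;
my $o = Obj->new;

cmpthese(-5, {
    accessor => sub { $o->a },    # method dispatch + hash lookup
    direct   => sub { $o->{a} },  # hash lookup only
});
------------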
 
Rainer Weikusat
 
      03-21-2013
Rainer Weikusat <(E-Mail Removed)> writes:
> Ben Morrow <(E-Mail Removed)> writes:
>> Quoth Rainer Weikusat <(E-Mail Removed)>:


[...]

>>> -----------
>>> use Benchmark;
>>>
>>> my $h = { Worschtsupp => 4 };
>>> my $a = [4];
>>>
>>> timethese(-5, {
>>>     h => sub { return $h->{Worschtsupp}; },
>>>     a => sub { return $a->[0]; },
>>> });
>>> ------------


[...]


>> Like most microbenchmarks, this tells you very little about real code.
>> Try something a bit more realistic, like


[...]


>> the method-call overhead completely dominates the
>> overhead of the hash lookup. If you can save one method call, you will
>> save more time than you would have saved by using an arrayref;

>
> Putting this another way: the overhead of decoupling the so-called
> 'class methods' from 'the object representation' with a layer of
> 'more privileged core class methods', aka 'accessors', is so high
> that the actual representation doesn't matter anymore. This means
> the theoretical benefit of that (the representation could be changed
> without affecting the set of general-purpose subroutines needlessly
> tied to the so-called class) isn't worthwhile: the cure is worse
> than the disease.


Two things kept bothering me about this:

1. My statement above is a little disingenuous in this context: the
method-call overhead is the higher cost, and the 'The cure ...'
conclusion is, at best, an attempt to change the topic of the
conversation and, at worst, a piece of not easily discardable
nonsense (both intentionally).

2. The 'Like most ...' statement is wrong: the example code I gave
should be identical to the code in a typical 'getter' method.
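
That is, a typical hashref-based getter reduces to the benchmarked
lookup plus argument unpacking (illustrative):

-----------
sub name {
    my ($self) = @_;
    return $self->{name};   # the same kind of lookup the benchmark times
}
------------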
 
Rainer Weikusat
 
      03-21-2013
Ben Morrow <(E-Mail Removed)> writes:
> Quoth Rainer Weikusat <(E-Mail Removed)>:


[...]

>> 2. The 'Like most ...' statement is wrong: The example code I gave
>> should be identical to the code in a typical 'getter' method.

>
> That's not the point, though. The interesting question is not 'Are array
> derefs faster than hash derefs?': everyone already knows they are. The
> interesting questions are 'How much difference would it make to the
> overall performance of my application if I switched from hashref to
> arrayref objects?' and 'Does that make it worth putting up with the
> limitations of arrayref objects?'. My benchmark suggests the answer to
> the first is 'maybe 2 or 3%, for an app which is CPU-bound rather than
> IO-bound'; and unless the answer to the first was something huge, like a
> 50x speed increase, my answer to the second would be an unqualified
> 'No'.


This 'overall performance' idea is too simplistic. E.g., assume the
application is a server whose purpose is to make 'policy decisions'
for HTTP requests, that is, deny or allow them based on 'some
criterion' like 'URL black- or whitelisted' or 'classified as
belonging to a page of content-category ...'. Ideally, making the
policy decision shouldn't add any noticeable latency to requests, AND
the measurable latency should be so small that it is possible to
point people at the fact that it can't be causing their 'internet
performance problems' even though they suspect otherwise (which they
are going to do). The decision also really shouldn't add any latency
because only one such question can be processed at any given time, so
this becomes a question of 'how many requests can be processed per
second', considering that adding a latency of 1s is WAY too much. In
addition, all of this may need to run on relatively low-spec (that
is, cheap to mass-produce) hardware, and people are going to expect
that it will support at least a couple of thousand concurrent users
(this is not a contrived example).

Taking this into account, what is more worthwhile? Using data
structures with minimally faster access times (and lower memory
requirements), or 'genuflecting before the god of code bumming', that
is, 'de-structuring the code' by 'inlining everything which can
conceivably be inlined'? Would I rather put up with the perceived
limitations of the former in order to avoid the latter for as long as
humanly possible, i.e., am I willing to make a conscious effort to
avoid wasting time on stuff where 'wasting time' doesn't buy me
anything, so that I can afford to spend time on stuff where it does
buy me something?

In line with this reflection, using anonymous arrays also enables
creating objects which retain direct access to their properties
without ad hoc hackery like

use constant NAME => __PACKAGE__.'name';

$h->{+NAME} = ...

because, just like structures in C, arrays provide an ordered set of
slots. This means that something like 'use the next five free slots'
is possible, which can't be done with hashes in a straightforward
way.
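
A minimal sketch of such a slot-allocation scheme (all names invented
for illustration): a subclass continues numbering where its
superclass stopped, just as C struct members occupy consecutive
offsets.

-----------
package Base;
use constant { X => 0, Y => 1, NEXT_SLOT => 2 };

sub x { $_[0][X] }

package Derived;
our @ISA = ('Base');

# 'use the next free slot': continue after the superclass's last slot
use constant { Z => Base::NEXT_SLOT, NEXT_SLOT => Base::NEXT_SLOT + 1 };

sub new { bless [1, 2, 3], shift }   # slots X, Y, Z
sub z   { $_[0][Z] }

package main;
my $d = Derived->new;
print $d->x, ' ', $d->z, "\n";   # 1 3
------------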

> Of course, *if* someone were to produce a system for building arrayref
> objects which *didn't* place any limits on their flexibility,


The idea of using anonymous arrays in order to enable 'nested
superclass attributes' is inherently limited to single-inheritance
data inheritance. This implies that everything which can be done in
Java can be done this way in Perl, too, and even more easily, because
contortions like 'interfaces' aren't needed. Considering that that
language isn't exactly unused, discarding single inheritance out of
hand as 'not good enough for me' seems a little preposterous. And,
since Perl isn't "one size fits all, with users fitted to it as
necessary", the option of using another object representation where
it actually matters is always available.

[...]

> To change the subject a little back to something more interesting,
> here's a sketch of an idea I was playing with a while ago for a
> completely different way of representing attributes.


[...]

> package Foo;
> use Object::Closures;
>
> BUILD {
>     my $att = $_[0];
>
>     method get_att => sub { $att };
>     method set_att => sub { $att = $_[0] };
> };
>
> sub new { construct @_ }
>
> The most important disadvantages are that it makes method calls even
> more expensive,


According to some text I read on the web some years ago[*], the main
problem with this is that every object uses two additional lexical
pads for each of its attributes.

[*] Unfortunately, I can't quote it here because doing so would
seriously upset 'certain people' who prefer to keep their heads
firmly dug into the sand regarding what is or isn't freely available
on the internet (that is, they claim this 'free availability' is a
serious problem but, for some reason, don't consider it worthwhile to
do anything against it ...).
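
Stripped of the module machinery, the same closure-based idea can be
sketched in plain Perl (everything below is invented for
illustration): the attribute lives in a closed-over lexical rather
than in the blessed structure.

-----------
package Counter;

sub new {
    my ($class, $count) = @_;
    # each closure pins the pad holding $count
    return bless {
        get => sub { $count },
        set => sub { $count = $_[0] },
    }, $class;
}

sub get { $_[0]{get}->() }
sub set { $_[0]{set}->($_[1]) }

package main;
my $c = Counter->new(5);
$c->set(8);
print $c->get, "\n";   # prints 8
------------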
 
Rainer Weikusat
 
      03-22-2013
Ben Morrow <(E-Mail Removed)> writes:

[...]

> a. Who, exactly, do you think cares about this little rant?


Since you apparently can't get over this 'efficient' habit, any
attempt at discussing anything with you is evidently a waste of time.
 
Ted Zlatanov
 
      03-22-2013
On Fri, 22 Mar 2013 03:01:40 +0000 Ben Morrow <(E-Mail Removed)> wrote:

BM> [Rainer's] habit of continually digressing into unpleasant remarks
BM> about someone or something irrelevant is starting to get on my
BM> nerves.

Phew, I thought I was the only one annoyed by that. AWKWARD!

Ted
 