Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Two questions about efficiency

Reply
Thread Tools

Two questions about efficiency

 
 
Steve M
Guest
Posts: n/a
 
      05-26-2004
1. Near the beginning of the document "Unifying types and classes in
Python 2.2" by GvR, which can be found at
http://www.python.org/2.2.2/descrintro.html, the author writes the
following about subclassing a built in type:
---
Here's an example of a simple dict subclass, which provides a "default
value" that is returned when a missing key is requested:

class defaultdict(dict):

def __init__(self, default=None):
dict.__init__(self)
self.default = default

def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.default

<snip>

The __getitem__() method could also be written as follows, using the
new "key in dict" test introduced in Python 2.2:

def __getitem__(self, key):
if key in self:
return dict.__getitem__(self, key)
else:
return self.default

I believe that this version is less efficient, because it does the key
lookup twice. The exception would be when we expect that the requested
key is almost never in the dictionary: then setting up the try/except
statement is more expensive than the failing "key in self" test.
---

I am interested in his claim that the second version is less efficient
unless the requested key is "almost never" in the dictionary. I would
have thought that the overhead to do a key lookup is quite a bit less
than the overhead required to create an exception object, so that the
first version would be on average more efficient only if the requested
key is in the dictionary, say, more than half the time.

Does the "key in dict" test internally raise an exception on failure,
thus incurring at least the same overhead as the first version? Or is
the overhead to raise an exception much less than I have naively
supposed?

I have a similar idiom buried in a nested loop, and I'd like to select
the approach that is likely to be most efficient. My best guess is
that the key lookup will fail half the time.

2. Is there any reason to prefer one of the following over the other?
Im interested in both a consensus on style and readability, and
considerations of efficiency at runtime.

for x in L:
if f(x):
pass

for x in L:
if f(x):
continue
 
Reply With Quote
 
 
 
 
Fredrik Lundh
Guest
Posts: n/a
 
      05-26-2004
"Steve M" wrote:

> 2. Is there any reason to prefer one of the following over the other?
> Im interested in both a consensus on style and readability, and
> considerations of efficiency at runtime.
>
> for x in L:
> if f(x):
> pass
>
> for x in L:
> if f(x):
> continue


did you really mean to write that?

as written, both constructs are pointless, and the "if f(x)"-statement can
be replaced with just "f(x)" in both cases. and if you add more code inside
the if statement, or after it, the constructs do different things.

did you perhaps mean:

for x in L:
if f(x):
do stuff

for x in L:
if not f(x):
continue
do stuff

if so, my rule of thumb is to use alternative 1 if "stuff" is short, and alter-
native 2 if "stuff" is long, or if you have multiple tests before you get to
the real stuff.

ymmv, as usual.

and yes, if "stuff" is trivial, and builds a new list, consider using a list
comprehension:

L2 = [stuff for x in L if f(x)]

</F>




 
Reply With Quote
 
 
 
 
Slawomir Nowaczyk
Guest
Posts: n/a
 
      05-26-2004
On 26 May 2004 13:39:30 -0700
http://www.velocityreviews.com/forums/(E-Mail Removed) (Steve M) wrote:

#> I am interested in his claim that the second version is less
#> efficient unless the requested key is "almost never" in the
#> dictionary. I would have thought that the overhead to do a key
#> lookup is quite a bit less than the overhead required to create an
#> exception object, so that the first version would be on average
#> more efficient only if the requested key is in the dictionary, say,
#> more than half the time.

#> Does the "key in dict" test internally raise an exception on
#> failure, thus incurring at least the same overhead as the first
#> version? Or is the overhead to raise an exception much less than I
#> have naively supposed?

#> I have a similar idiom buried in a nested loop, and I'd like to
#> select the approach that is likely to be most efficient. My best
#> guess is that the key lookup will fail half the time.

I suggest:

>>> import profile
>>> p = profile.profiler()
>>> def ver1():

... # put version 1 here
...
>>> def ver2():

... # put version 2 here
...
>>> p.run("ver1()").print_stats()
>>> p.run("ver2()").print_stats()


Do not forget to share the results

--
Best wishes,
Slawomir Nowaczyk
( (E-Mail Removed) )

The Main Library at Indiana University sinks over an inch every
year because when it was built, engineers failed to take into
account the weight of all the books that would occupy the building.


 
Reply With Quote
 
Paul Miller
Guest
Posts: n/a
 
      05-27-2004
(E-Mail Removed) (Steve M) wrote in message news:<(E-Mail Removed). com>...
> 1. Near the beginning of the document "Unifying types and classes in
> Python 2.2" by GvR, which can be found at
> http://www.python.org/2.2.2/descrintro.html, the author writes the
> following about subclassing a built in type:
> ---
> Here's an example of a simple dict subclass, which provides a "default
> value" that is returned when a missing key is requested:
>
> class defaultdict(dict):
>
> def __init__(self, default=None):
> dict.__init__(self)
> self.default = default
>
> def __getitem__(self, key):
> try:
> return dict.__getitem__(self, key)
> except KeyError:
> return self.default
>
> <snip>
>
> The __getitem__() method could also be written as follows, using the
> new "key in dict" test introduced in Python 2.2:
>
> def __getitem__(self, key):
> if key in self:
> return dict.__getitem__(self, key)
> else:
> return self.default
>
> I believe that this version is less efficient, because it does the key
> lookup twice. The exception would be when we expect that the requested
> key is almost never in the dictionary: then setting up the try/except
> statement is more expensive than the failing "key in self" test.
> ---
>[...]
> I have a similar idiom buried in a nested loop, and I'd like to select
> the approach that is likely to be most efficient. My best guess is
> that the key lookup will fail half the time.


Since Guido wrote Python, I would tend to believe him when he says the
first method is more efficient than the second. However, even if
you nested a similar construct several levels deep inside a loop, I'd
be willing to bet that if that particular loop were a bottleneck in
your program, that optimizing that particular section of code would
probably not make the difference between "too slow" and "fast enough."

Since both options are equally readable, if I thought about it, I
would select option 2 while coding, by default. I wouldn't make a big
deal out of changing code that uses option 1, though.
 
Reply With Quote
 
Peter Otten
Guest
Posts: n/a
 
      05-27-2004
Steve M wrote:

[two implementations of dictionary with default value]

> I am interested in his claim that the second version is less efficient
> unless the requested key is "almost never" in the dictionary. I would
> have thought that the overhead to do a key lookup is quite a bit less
> than the overhead required to create an exception object, so that the
> first version would be on average more efficient only if the requested
> key is in the dictionary, say, more than half the time.
>
> Does the "key in dict" test internally raise an exception on failure,
> thus incurring at least the same overhead as the first version? Or is
> the overhead to raise an exception much less than I have naively
> supposed?


Thinking/believing is definitely the wrong approach here. Python ships with
the nice little script timeit.py that let's you verify your assumptions.

<dicttime.py>
class DictBase(dict):
def __init__(self, default, *args, **kw):
dict.__init__(self, *args, **kw)
self.default = default

class DictTry(DictBase):
def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.default

class DictIn(DictBase):
def __getitem__(self, key):
if key in self:
return dict.__getitem__(self, key)
else:
return self.default

class DictGet(DictBase):
def __getitem__(self, key):
return self.get(key, self.default)

items = "hit alpha beta gamma delta epsilon zeta eta theta iota
kappa".split()
items = zip(items, map(str.upper, items))
dictTry = DictTry(None, items)
dictIn = DictIn(None, items)
dictGet = DictGet(None, items)
</dicttime.py>


$ timeit.py -s"from dicttime import dictTry as d" "d['hit']"
100000 loops, best of 3: 2.22 usec per loop
$ timeit.py -s"from dicttime import dictTry as d" "d['miss']"
100000 loops, best of 3: 11.2 usec per loop
$ timeit.py -s"from dicttime import dictIn as d" "d['hit']"
100000 loops, best of 3: 2.39 usec per loop
$ timeit.py -s"from dicttime import dictIn as d" "d['miss']"
1000000 loops, best of 3: 1.49 usec per loop
$ timeit.py -s"from dicttime import dictGet as d" "d['hit']"
100000 loops, best of 3: 2.11 usec per loop
$ timeit.py -s"from dicttime import dictGet as d" "d['miss']"
100000 loops, best of 3: 2.03 usec per loop

Two results stand out: misses are cheap for DictIn and expensive for
DictTry.

Peter

 
Reply With Quote
 
Steve M
Guest
Posts: n/a
 
      05-28-2004
Peter Otten <(E-Mail Removed)> wrote
<snip>
> Two results stand out: misses are cheap for DictIn and expensive for
> DictTry.


That's very interesting, thanks so much Peter and the others.

So, contrary to your (and, incidentally, Wittgenstein's) admonition to
look not think, why, would you speculate, is this the case? The
overhead from creating an exception object?
 
Reply With Quote
 
Peter Otten
Guest
Posts: n/a
 
      05-28-2004
Steve M wrote:

> Peter Otten <(E-Mail Removed)> wrote
> <snip>
>> Two results stand out: misses are cheap for DictIn and expensive for
>> DictTry.

>
> That's very interesting, thanks so much Peter and the others.
>
> So, contrary to your (and, incidentally, Wittgenstein's) admonition to
> look not think, why, would you speculate, is this the case? The
> overhead from creating an exception object?


Again, why would I speculate if I can measure?

try ... except:

$ timeit.py -s"key='a'" "try: raise KeyError('KeyError: %r' % key)" "except
KeyError: pass"
100000 loops, best of 3: 5.22 usec per loop

share of exception object creation:

$ timeit.py -s"key='a'" "KeyError('KeyError: %r' % key)"
100000 loops, best of 3: 2.96 usec per loop

So the exception mechanism accounts for more than 5 of 9 extra usec. You can
perhaps find out where the rest stems from if you dig deeper, but I'm too
lazy right now.
Which reminds me, timing with this simple tool is not only fun - I've found
my intuition proved wrong often enough, so it saves me from optimizing
portions of the code that are already fast.

Peter

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to compare two SOAP Envelope or two Document or two XML files GenxLogic Java 3 12-06-2006 08:41 PM
perl efficiency -- fastest grepping? Bryan Krone Perl 1 11-08-2004 11:15 PM
Re: Questions....questions....questions Patrick Michael A+ Certification 0 06-16-2004 04:53 PM
CISCO 1721 efficiency problem DarejDok \(Dariusz Dħbrowski\) Cisco 0 07-16-2003 07:15 AM
dataset efficiency question Trevor Hartman ASP .Net 0 07-03-2003 08:20 PM



Advertisments