Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Sorting: too different times. Why?

Reply
Thread Tools

Sorting: too different times. Why?

 
 
n00m
Guest
Posts: n/a
 
      11-22-2009
Any comment:

class Vector:
def __init__(self, x, y):
self.x = x
self.y = y
def __cmp__(self, v):
if self.x < v.x and self.y > v.y:
return -1
return 0

def v_cmp(v1, v2):
if v1.x < v2.x and v1.y > v2.y:
return -1
return 0

from random import randint
from time import time

a = []
for i in range(200000):
a += [Vector(randint(0, 500000), randint(0, 500000))]
b = a[:]
c = a[:]

print 'Sorting...'

t = time()
b.sort(cmp=v_cmp)
print time() - t

t = time()
c.sort()
print time() - t

print b == c



>>> ===================================== RESTART ======
>>>

Sorting...
0.906000137329
6.57799983025
True

 
Reply With Quote
 
 
 
 
n00m
Guest
Posts: n/a
 
      11-22-2009

I was expecting the 1st method would be *slower* than the 2nd one
Or at least equal... Just random ("intuitive") expectations
 
Reply With Quote
 
 
 
 
Steven D'Aprano
Guest
Posts: n/a
 
      11-22-2009
In the subject line, you write "too different times". You actually want
"two", the number, not "too" as in "too many", "too much". Lots of native
English speakers get this wrong too

On Sun, 22 Nov 2009 01:21:42 -0800, n00m wrote:

> Any comment:
>
> class Vector:
> def __init__(self, x, y):
> self.x = x
> self.y = y
> def __cmp__(self, v):
> if self.x < v.x and self.y > v.y:
> return -1
> return 0



Modern versions of Python (since 2.2 I think?) use __lt__ or __gt__ for
sorting. If the class doesn't have a __lt__ method, Python falls back on
__cmp__.

> b.sort(cmp=v_cmp)


This is relatively fast, because you pass a comparison function directly,
so Python doesn't need to look for a __lt__ method then fall back to
__cmp__. It just uses v_cmp, every time.


> c.sort()


This is slower, because every comparison looks up the __lt__ and fails,
then tries the __cmp__.

If you change the definition of Vector to include rich comparison methods
as detailed here:

http://docs.python.org/reference/dat...#object.__lt__

sorting will probably be significantly faster still.



--
Steven
 
Reply With Quote
 
Diez B. Roggisch
Guest
Posts: n/a
 
      11-22-2009
n00m schrieb:
> Any comment:
>
> class Vector:
> def __init__(self, x, y):
> self.x = x
> self.y = y
> def __cmp__(self, v):
> if self.x < v.x and self.y > v.y:
> return -1
> return 0
>
> def v_cmp(v1, v2):
> if v1.x < v2.x and v1.y > v2.y:
> return -1
> return 0
>
> from random import randint
> from time import time
>
> a = []
> for i in range(200000):


Use xrange instead (unless you are under python3), because for loops you
don't need the full list range creates - xrange is just a generator.

> a += [Vector(randint(0, 500000), randint(0, 500000))]


Better use .append here, looks nicer and should also be a bit faster.

> b = a[:]
> c = a[:]
>
> print 'Sorting...'
>
> t = time()
> b.sort(cmp=v_cmp)
> print time() - t
>
> t = time()
> c.sort()
> print time() - t
>
> print b == c
>
>
>
>>>> ===================================== RESTART ======
>>>>

> Sorting...
> 0.906000137329
> 6.57799983025


I think the main reason is that method-dispatch is more expensive than
function-dispatch. The former must create a bound method before calling,
the latter just works out of the box.

Things get better if you do this:

t = time()
c.sort(cmp=Vector.__cmp__)
print time() - t


Although not the exact same performance - I get

Sorting...
0.677843093872
1.4283311367
True

Diez

 
Reply With Quote
 
Chris Rebert
Guest
Posts: n/a
 
      11-22-2009
On Sun, Nov 22, 2009 at 2:56 AM, n00m <> wrote:
> I was expecting the 1st method would be *slower* than the 2nd one
> Or at least equal... Just random ("intuitive") expectations


The second method repeatedly looks up left_item.__class__.__cmp__
(i.e. Vector.__cmp__) when doing the necessary comparisons between the
list items; while these lookups are optimized and are fast, they are
not free and cannot be skipped because Python doesn't know the list
contains only Vectors.
The first method uses the single provided comparison function and thus
does no such lookups; hence, it's faster.

That's my guess anyway.

Cheers,
Chris
--
http://blog.rebertia.com
 
Reply With Quote
 
Mark Dickinson
Guest
Posts: n/a
 
      11-22-2009
On Nov 22, 9:21*am, n00m <n...@narod.ru> wrote:
> Any comment:
>
> class Vector:
> * * def __init__(self, x, y):
> * * * * self.x = x
> * * * * self.y = y
> * * def __cmp__(self, v):
> * * * * if self.x < v.x and self.y > v.y:
> * * * * * * return -1
> * * * * return 0
>
> def v_cmp(v1, v2):
> * * if v1.x < v2.x and v1.y > v2.y:
> * * * * return -1
> * * return 0
>
> from random import randint
> from time import time
>
> a = []
> for i in range(200000):
> * * a += [Vector(randint(0, 500000), randint(0, 500000))]
> b = a[:]
> c = a[:]
>
> print 'Sorting...'
>
> t = time()
> b.sort(cmp=v_cmp)
> print time() - t
>
> t = time()
> c.sort()
> print time() - t
>
> print b == c
>
> >>> ===================================== RESTART ======

>
> Sorting...
> 0.906000137329
> 6.57799983025
> True


Do you get the same magnitude difference if you make Vector a new-
style
class? (I.e., use "class Vector(object)" instead of "class Vector
()".)

Mark
 
Reply With Quote
 
Dave Angel
Guest
Posts: n/a
 
      11-22-2009
n00m wrote:
> Any comment:
>
> <snip>
> def v_cmp(v1, v2):
> if v1.x < v2.x and v1.y > v2.y:
> return -1
> return 0
>
>
>

The second part of the compound if is backwards. So if this is headed
for production code, it better get fixed.

DaveA

 
Reply With Quote
 
n00m
Guest
Posts: n/a
 
      11-22-2009
> Do you get the same magnitude difference
> if you make Vector a new-style class?


Yes (I mean "No"): new-style's much faster

And now it's elephants instead of vectors.
Def: an elephant is smarter than another one IIF
its size is strictly less but its IQ is strictly
greater

I.e. you can't compare (2, to (20, 50)
or let count them as equally smart elephants.
================================================

class Elephant(object):
def __init__(self, size, iq):
self.size = size
self.iq = iq
def __cmp__(self, e):
if self.size < e.size and self.iq > e.iq:
return -1
if self.size > e.size and self.iq < e.iq:
return 1
return 0

def e_cmp(e1, e2):
if e1.size < e2.size and e1.iq > e2.iq:
return -1
if e1.size > e2.size and e1.iq < e2.iq:
return 1
return 0

from random import randint
from time import time

a = []
for i in xrange(200000):
a.append(Elephant(randint(1, 50000), randint(1, 50000)))
b = a[:]
c = a[:]

print 'Sorting...'

t = time()
b.sort(cmp=e_cmp)
print time() - t

t = time()
c.sort()
print time() - t

print b == c



>>> ===================================== RESTART =====
>>>

Sorting...
1.56299996376
1.95300006866
True
 
Reply With Quote
 
n00m
Guest
Posts: n/a
 
      11-22-2009

> The second part of the compound if is backwards. *So if this is headed
> for production code, it better get fixed.
>
> DaveA


Not sure I'm understanding your remark.
 
Reply With Quote
 
MRAB
Guest
Posts: n/a
 
      11-22-2009
Steven D'Aprano wrote:
> In the subject line, you write "too different times". You actually want
> "two", the number, not "too" as in "too many", "too much". Lots of native
> English speakers get this wrong too
>

[snip]
It could mean that the times are not just different, they're _too_
different, ie a lot more than they are expected to be.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Smart Client Assembly.. what is too big or too small ??? Martin ASP .Net 0 08-04-2004 08:47 AM
Are these pictures too dark or/and too large? Luigi Donatello Asero HTML 13 05-21-2004 04:54 AM
Problems with imaging (too slow or too much RAM) SB Java 0 08-05-2003 11:06 AM
Re: Is this just too hard for folks here? Or too stupid? Mark Parnell HTML 0 06-23-2003 11:02 PM
Re: Is this just too hard for folks here? Or too stupid? Mike Foster HTML 0 06-23-2003 07:14 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57