Velocity Reviews > pairs from a list

# pairs from a list

Arnaud Delobelle
Guest
Posts: n/a

 01-22-2008
On Jan 22, 1:19*pm, Alan Isaac <(E-Mail Removed)> wrote:
> I suppose my question should have been,
> is there an obviously faster way?
> Anyway, of the four ways below, the
> first is substantially fastest. *Is
> there an obvious reason why?

Can you post your results?

I get different ones (pairs1 and pairs2 rewritten slightly to avoid
unnecessary indirection).

====== pairs.py ===========
from itertools import *

def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))

def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()

def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],

def pairs4(x):
xiter = iter(x)
return izip(xiter,xiter)

def compare():
import timeit
for i in '1234':
t = timeit.Timer('list(pairs.pairs%s(l))' % i,
'import pairs; l=range(1000)')
print 'pairs%s: %s' % (i, t.timeit(10000))

if __name__ == '__main__':
compare()
=====================

marigoldython arno\$ python pairs.py
pairs1: 0.789824962616
pairs2: 4.08462786674
pairs3: 2.90438890457
pairs4: 0.536775827408

pairs4 wins.

--
Arnaud

Alan Isaac
Guest
Posts: n/a

 01-22-2008
Arnaud Delobelle wrote:
> According to the docs [1], izip is defined to be equivalent to:
>
> def izip(*iterables):
> iterables = map(iter, iterables)
> while iterables:
> result = [it.next() for it in iterables]
> yield tuple(result)
>
> This guarantees that it.next() will be performed from left to right,
> so there is no risk that e.g. pairs4([1, 2, 3, 4]) returns [(2, 1),
> (4, 3)].
>
> Is there anything else that I am overlooking?
>
> [1] http://docs.python.org/lib/itertools-functions.html

<URL:http://bugs.python.org/issue1121416>

fwiw,
Alan Isaac

Arnaud Delobelle
Guest
Posts: n/a

 01-22-2008
On Jan 22, 4:10*pm, Alan Isaac <(E-Mail Removed)> wrote:

> <URL:http://bugs.python.org/issue1121416>
>
> fwiw,
> Alan Isaac

Thanks. So I guess I shouldn't take the code snippet I quoted as a
specification of izip but rather as an illustration.

--
Arnaud

Alan Isaac
Guest
Posts: n/a

 01-22-2008
Arnaud Delobelle wrote:
> pairs4 wins.

Oops. I see a smaller difference,
but yes, pairs4 wins.

Alan Isaac

import time
from itertools import islice, izip

x = range(500001)

def pairs1(x):
return izip(islice(x,0,None,2),islice(x,1,None,2))

def pairs2(x):
xiter = iter(x)
while True:
yield xiter.next(), xiter.next()

def pairs3(x):
for i in range( len(x)//2 ):
yield x[2*i], x[2*i+1],

def pairs4(x):
xiter = iter(x)
return izip(xiter,xiter)

t = time.clock()
for x1, x2 in pairs1(x):
pass
t1 = time.clock() - t

t = time.clock()
for x1, x2 in pairs2(x):
pass
t2 = time.clock() - t

t = time.clock()
for x1, x2 in pairs3(x):
pass
t3 = time.clock() - t

t = time.clock()
for x1, x2 in pairs4(x):
pass
t4 = time.clock() - t

print t1, t2, t3, t4

Output:
0.317524154606 1.13436847421 1.07100930426 0.262926712753

Guest
Posts: n/a

 01-22-2008
On Jan 22, 5:34 am, George Sakkis <(E-Mail Removed)> wrote:
> On Jan 22, 12:15 am, Paddy <(E-Mail Removed)> wrote:
>
> > On Jan 22, 3:20 am, Alan Isaac <(E-Mail Removed)> wrote:> I want to generate sequential pairs from a list.
> > <<snip>>
> > > What is the fastest way? (Ignore the import time.)

>
> > 1) How fast is the method you have?
> > 2) How much faster does it need to be for your application?
> > 3) Are their any other bottlenecks in your application?
> > 4) Is this the routine whose smallest % speed-up would give the
> > largest overall speed up of your application?

>
> I believe the "what is the fastest way" question for such small well-
> defined tasks is worth asking on its own, regardless of whether it
> makes a difference in the application (or even if there is no
> application to begin with).

Hi George,
You need to 'get it right' first. Micro optimizations for speed
without thought of the wider context is a bad habit to form and a time
waster.
If the routine is all that needs to be delivered and it does not
perform at an acceptable speed then find out what is acceptable and
optimise towards that goal. My questions were set to get posters to
think more about the need for speed optimizations and where they
should be applied, (if at all).

A bit of forethought might justify leaving the routine alone, or

Arnaud Delobelle
Guest
Posts: n/a

 01-22-2008
On Jan 22, 6:34*pm, Paddy <(E-Mail Removed)> wrote:
[...]
> Hi George,
> You need to 'get it right' first. Micro optimizations for speed
> without thought of the wider context is a bad habit to form and a time
> waster.
> If the routine is all that needs to be delivered and it does not
> perform at an acceptable speed then find out what is acceptable and
> optimise towards that goal. My questions were set to get posters to
> think more about the need for speed optimizations and where they
> should be applied, (if at all).
>
> A bit of forethought might justify leaving the routine alone, or

But it's fun!

Some-of-us-can't-help-it'ly yours
--
Arnaud

Peter Otten
Guest
Posts: n/a

 01-22-2008
Arnaud Delobelle wrote:

> On Jan 22, 4:10Â*pm, Alan Isaac <(E-Mail Removed)> wrote:
>
>> <URL:http://bugs.python.org/issue1121416>
>>
>> fwiw,
>> Alan Isaac

>
> Thanks. So I guess I shouldn't take the code snippet I quoted as a
> specification of izip but rather as an illustration.

You can be bolder here as the izip() docs explicitly state

"""
Note, the left-to-right evaluation order of the iterables is
guaranteed. This makes possible an idiom for clustering a data series into
n-length groups using "izip(*[iter(s)]*n)".
"""

and the bug report with Raymond Hettinger saying

"""
Left the evaluation order as an unspecified, implementation
specific detail.
"""

is about zip(), not izip().

Peter

Raymond Hettinger
Guest
Posts: n/a

 01-22-2008
[Peter Otten]
> You can be bolder here as the izip() docs explicitly state
>
> """
> Note, the left-to-right evaluation order of the iterables is
> guaranteed. This makes possible an idiom for clustering a data series into
> n-length groups using "izip(*[iter(s)]*n)".
> """

. . .
> is about zip(), not izip().

FWIW, I just added a similar guarantee for zip().

Raymond

George Sakkis
Guest
Posts: n/a

 01-23-2008
On Jan 22, 1:34 pm, Paddy <(E-Mail Removed)> wrote:
> On Jan 22, 5:34 am, George Sakkis <(E-Mail Removed)> wrote:
>
>
>
> > On Jan 22, 12:15 am, Paddy <(E-Mail Removed)> wrote:

>
> > > On Jan 22, 3:20 am, Alan Isaac <(E-Mail Removed)> wrote:> I want to generate sequential pairs from a list.
> > > <<snip>>
> > > > What is the fastest way? (Ignore the import time.)

>
> > > 1) How fast is the method you have?
> > > 2) How much faster does it need to be for your application?
> > > 3) Are their any other bottlenecks in your application?
> > > 4) Is this the routine whose smallest % speed-up would give the
> > > largest overall speed up of your application?

>
> > I believe the "what is the fastest way" question for such small well-
> > defined tasks is worth asking on its own, regardless of whether it
> > makes a difference in the application (or even if there is no
> > application to begin with).

>
> Hi George,
> You need to 'get it right' first.

For such trivial problems, getting it right alone isn't a particularly
high expectation.

> Micro optimizations for speed
> without thought of the wider context is a bad habit to form and a time
> waster.

The OP didn't mention anything about the context; for all we know,
this might be a homework problem or the body of a tight inner loop.
There is this tendency on c.l.py to assume that every optimization
question is about a tiny subproblem of a 100 KLOC application. Without
further context, we just don't know.

> If the routine is all that needs to be delivered and it does not
> perform at an acceptable speed then find out what is acceptable
> and optimise towards that goal. My questions were set to get
> posters to think more about the need for speed optimizations and
> where they should be applied, (if at all).

I don't agree with this logic in general. Just because one can solve a
problem by throwing a quick and dirty hack with quadratic complexity
that happens to do well enough on current typical input, it doesn't
mean he shouldn't spend ten or thirty minutes more to write a proper
linear time solution, all else being equal or at least comparable
(elegance, conciseness, readability, etc.). Of course it's a tradeoff;
spending a week to save a few milliseconds on average is usually a
waste for most applications, but being a lazy keyboard banger writing
the first thing that pops into mind is not that good either.

George

Steven D'Aprano
Guest
Posts: n/a

 01-23-2008
On Tue, 22 Jan 2008 18:32:22 -0800, George Sakkis wrote:

> The OP didn't mention anything about the context; for all we know, this
> might be a homework problem or the body of a tight inner loop. There is
> this tendency on c.l.py to assume that every optimization question is
> about a tiny subproblem of a 100 KLOC application. Without further
> context, we just don't know.

Funny. As far as I can tell, the usual assumption on c.l.py is that every
tiny two-line piece of code is the absolute most critically important
heart of an application which gets called billions of times on petabytes
of data daily.

Given the human psychology displayed involved, in the absence of
definitive evidence one way or another it is a far safer bet to assume
that people are unnecessarily asking for "the fastest" out of a misguided
and often ignorant belief that they need it, rather than the opposite.
People who actually need a faster solution usually know enough to preface
their comments with an explanation of why their existing solution is too
slow rather than just a context-free demand for "the fastest" solution.

Fast code is like fast cars. There *are* people who really genuinely need
to have the fastest car available, but that number is dwarfed by the vast
legions of tossers trying to make up for their lack of self-esteem by
buying a car with a spoiler. Yeah, you're going to be traveling SO FAST
on the way to the mall that the car is at risk of getting airborne, sure,
we believe you.

(The above sarcasm naturally doesn't apply to those who actually do need
to travel at 200mph in a school zone, like police, taxi drivers and stock
brokers.)

--
Steven