# Fun with fancy slicing

David Eppstein
 10-02-2003
In article <3f7bf423$0$20654$
Damien Wyart wrote:

* David Eppstein in comp.lang.python:
> > Of course we know that nonrandom quicksort pivoting can be quadratic
> > anyway (e.g. on sorted input) but this to my mind is worse because
> > randomization doesn't make it any better. The fact that some textbooks
> > (e.g. CLRS!) make this mistake doesn't excuse it either.

>
> Could you explain in more detail the error made in CLRS on this topic
> (with a reference if possible) ? I did not precisely catch your
> explanation here.
>
> Thanks in advance,

CLRS' partition routine ends up partitioning an n-item array into the
items <= the pivot, the pivot itself, and the items > the pivot.
If the items are all equal, that means that the first part gets n-1
items and the last part gets nothing, regardless of which item you
select as pivot. If you analyze the algorithm on this input, you get
the recurrence T(n)=O(n)+T(n-1)+T(0) which solves to O(n^2).
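
To make the recurrence concrete, here is a minimal Python sketch (not code from CLRS) of a partition that splits a range into the items <= the pivot, the pivot itself, and the items > the pivot, as described above. The function names and the iterative driver are illustrative assumptions. The point is only that on all-equal input every call peels off a single item, whichever pivot is drawn.

```python
import random

def partition(a, lo, hi):
    """Partition a[lo..hi] around a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo - 1
    for j in range(lo, hi):
        if a[j] <= pivot:          # items equal to the pivot go to the left part
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1

def randomized_partition(a, lo, hi):
    """Swap a randomly chosen item into the pivot slot, then partition."""
    k = random.randint(lo, hi)
    a[k], a[hi] = a[hi], a[k]
    return partition(a, lo, hi)

def count_comparisons(a):
    """Quicksort a in place and return the number of item comparisons.

    Uses an explicit stack so the all-equal case does not hit
    Python's recursion limit.
    """
    comparisons = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo < hi:
            q = randomized_partition(a, lo, hi)
            comparisons += hi - lo   # one comparison per non-pivot item
            stack.append((lo, q - 1))
            stack.append((q + 1, hi))
    return comparisons

n = 200
print(count_comparisons([7] * n), n * (n - 1) // 2)
```

On n equal items the pivot always lands in the last slot, so the count is exactly n(n-1)/2 regardless of the random choices, which is the T(n)=O(n)+T(n-1)+T(0) behaviour.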

Throughout most of the quicksort chapter, CLRS at least include
exercises mentioning the case of all equal inputs, asking what happens
for those inputs, and in one exercise suggesting a partition routine
that might partition those inputs more equally (but who knows what it
should do when merely most of the inputs are equal...) However in the
randomized quicksort section they ignored this complication and seemed
to be claiming that their randomized quicksort has O(n log n) expected
time, unconditionally.

This problem was finally corrected in the fourth printing of the second
edition; see the errata at
<http://www.cs.dartmouth.edu/~thc/clrs-2e-bugs/bugs.php>
for more details.

David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science

 10-03-2003
Greg Ewing (using news.cis.dfn.de) wrote:
> Alex Martelli wrote:
>
>> How sweet it would be to be able to unpack by coding:
>> head, *tail = alist

>
>
> Indeed! I came across another use case for this
> recently, as well. I was trying to parse some
> command strings of the form
>
> command arg arg ...
>
> where some commands had args and some didn't.
> I wanted to split the command off the front
> and keep the args for later processing once
> I'd decoded the command. My first attempt
> went something like
>
> command, args = cmdstring.split(" ", maxsplit = 1)
>
> but this fails when there are no arguments,
> because the returned list has only one element
> in that case.
>
> It would have been very nice to be able to
> simply say
>
> command, *args = cmdstring.split()
>
> and get the command as a string and a
> list of 0 or more argument strings.
>
> I really ought to write a PEP about this...
>

In the meantime, it isn't too hard to write a function which does this:

def first_rest(x):
    return x[0], x[1:]

command, args = first_rest(cmdstring.split())

or, in one step

def chop_word(s):
    return first_rest(s.split())

command, args = chop_word(cmdstring)
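
A quick check of how these helpers behave on the two cases from the thread, a command with arguments and one without. One edge case the snippet leaves open is an empty or blank command string, where x[0] raises IndexError. The function chop_word_or_none below is a hypothetical variant for that case and is not part of the original post.

```python
def first_rest(x):
    """Split a sequence into its first item and a sequence of the rest."""
    return x[0], x[1:]

def chop_word(s):
    """Split a command string into the command word and a list of arguments."""
    return first_rest(s.split())

def chop_word_or_none(s):
    """Hypothetical variant: return (None, []) for a blank string instead of raising."""
    words = s.split()
    if not words:
        return None, []
    return first_rest(words)

print(chop_word("move 10 20"))
print(chop_word("quit"))
print(chop_word_or_none("   "))
```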

David

 10-03-2003
|Alex Martelli wrote:
|> How sweet it would be to be able to unpack by coding:
|> head, *tail = alist

Greg Ewing (using news.cis.dfn.de) wrote previously:
| command arg arg ...
| command, *args = cmdstring.split()
|and get the command as a string and a list of 0 or more argument strings.

I think I've written on this list before that I would like this
too--Ruby does this, IIRC. If anyone writes a PEP, you have my +1 in
advance.

For Greg's particular case, one approach that doesn't look too bad is:

command = args.pop(0)
# ... do stuff ...
try:
    more = args.pop(0)
except IndexError:
    pass  # no more

For a really long list, it would be faster to do an 'args.reverse()' up
front, and '.pop()' off the end (Greg and Alex know all this, of course).
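
A minimal sketch of the reverse-then-pop idea, assuming the goal is to consume every argument in its original order. The function name and the handle callback are illustrative. Each pop() from the end takes constant time, whereas each pop(0) has to shift the remaining items down.

```python
def consume_in_order(args, handle):
    """Process items front to back by reversing once and popping from the end."""
    pending = list(args)   # work on a copy so the caller's list is untouched
    pending.reverse()      # one O(n) pass up front
    while pending:
        handle(pending.pop())   # O(1) per item, yields original order

seen = []
consume_in_order(["command", "arg1", "arg2"], seen.append)
print(seen)
```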

Yours, Lulu...

 10-03-2003
Alex Martelli wrote:
> How sweet it would be to be able to unpack by coding:
> head, *tail = alist

Indeed! I came across another use case for this
recently, as well. I was trying to parse some
command strings of the form

command arg arg ...

where some commands had args and some didn't.
I wanted to split the command off the front
and keep the args for later processing once
I'd decoded the command. My first attempt
went something like

command, args = cmdstring.split(" ", maxsplit = 1)

but this fails when there are no arguments,
because the returned list has only one element
in that case.

It would have been very nice to be able to
simply say

command, *args = cmdstring.split()

and get the command as a string and a
list of 0 or more argument strings.

I really ought to write a PEP about this...
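
For readers on Python 3: the wished-for syntax was later added to the language by PEP 3132 (extended iterable unpacking), so the starred line above runs as written there. A minimal sketch contrasting it with the maxsplit attempt follows. Both function names are illustrative.

```python
def split_command(cmdstring):
    """Return (command, args) using star-unpacking (Python 3 only)."""
    command, *args = cmdstring.split()
    return command, args

def split_command_maxsplit(cmdstring):
    """The first attempt from the post: fails when there are no arguments."""
    command, args = cmdstring.split(" ", maxsplit=1)
    return command, args

print(split_command("move 10 20"))
print(split_command("quit"))
try:
    split_command_maxsplit("quit")
except ValueError as exc:
    print("maxsplit version failed:", exc)
```

Note that the two versions also differ when arguments are present: the starred form gives a list of argument strings, while the maxsplit form gives the remainder as one unsplit string.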

 10-03-2003
In article <Jd7fb.484435$cF.170532@rwcrnsc53>,
David C. Fox wrote:

> In the meantime, it isn't too hard to write a function which does this:
>
> def first_rest(x):
>     return x[0], x[1:]

Shouldn't that be called cons?

 10-03-2003
David Eppstein wrote:

>> In the meantime, it isn't too hard to write a function which does this:
>>
>> def first_rest(x):
>>     return x[0], x[1:]

>
> Shouldn't that be called cons?

Hmmm, feels more like the 'reverse' of a cons to me -- takes a list
and gives me the car and cdr...

Alex

 10-03-2003
Lulu of the Lotus-Eaters wrote:
...
> |> head, *tail = alist
>
> Greg Ewing (using news.cis.dfn.de) wrote previously:
> | command arg arg ...
> | command, *args = cmdstring.split()
> |and get the command as a string and a list of 0 or more argument strings.
>
> I think I've written on this list before that I would like this
> too--Ruby does this, IIRC. If anyone writes a PEP, you have my +1 in
> advance.
>
> For Greg's particular case, one approach that doesn't look too bad is:
>
> command = args.pop(0)
> # ... do stuff ...
> try:
>     more = args.pop(0)
> except IndexError:
>     pass  # no more

I'm not sure what this 'more' is about, actually. Greg's case is
currently best solved [IMHO] with
args = cmdstring.split()
command = args.pop(0)

> For a really long list, it would be faster to do an 'args.reverse()' up
> front, and '.pop()' off the end (Greg and Alex know all this, of course).

Actually, I don't -- both args.reverse() and args.pop(0) are O(len(args)),
so I don't know their relative speeds offhand. Interestingly enough,
timeit.py, for once, doesn't want to let me know about them either...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags.pop(0)'
Traceback (most recent call last):
  File "timeit.py", line 249, in main
    x = t.timeit(number)
  File "timeit.py", line 158, in timeit
    return self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
    x=ags.pop(0)
IndexError: pop from empty list
[alex@lancelot python2.3]$

....since each .pop is modifying the ags list, the once-only setup
statement doesn't suffice... OK, so we need to repeat (and thus,
alas, intrinsically measure) the list copying too (sigh):

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23.2 usec per loop

the results of the two snippets aren't identical, but that should
not affect the timing measured (as the use of ags[:] and the
repetition are time-measurement artefacts anyway). So, it does
look like reversing first IS faster -- by a hair. Trying to
remove the ags[:] part / overhead gave me a shock, though...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.2 usec per loop

So -- it's clear to me that I do NOT understand what's going on
here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
be 24.5 ... ???
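
The setup-runs-once behaviour behind the IndexError above can be reproduced from Python with the timeit module's Timer class. This is a minimal sketch written for Python 3, so the setup uses list(range(1000)) where the transcripts use range(1000). The loop count of 2000 is an arbitrary choice that just needs to exceed the list length.

```python
import timeit

SETUP = "ags = list(range(1000))"   # executed once per timing run, not per loop

# Popping from the shared list empties it after 1000 iterations.
try:
    timeit.Timer("x = ags.pop(0)", SETUP).timeit(number=2000)
    exhausted = False
except IndexError:
    exhausted = True

# Copying inside the timed statement keeps ags intact, at the price of
# timing the copy as well.
with_copy = timeit.Timer("x = ags[:].pop(0)", SETUP).timeit(number=2000)

print("ran out of items:", exhausted)
print("total seconds with copy: %.6f" % with_copy)
```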

Tim...? Please HELP a fellow bot in need...!!!

Alex

 10-03-2003
Alex Martelli writes:

> So -- it's clear to me that I do NOT understand what's going on
> here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
> be 24.5 ... ???

How quiet was your machine when you ran the tests? I see behaviour
more in line with what you'd expect:

[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 31.6 usec per loop
[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 39.8 usec per loop

Cheers,
mwh

 10-03-2003
In article <22cfb.159311$
Alex Martelli wrote:

> David Eppstein wrote:
>
> >> In the meantime, it isn't too hard to write a function which does this:
> >>
> >> def first_rest(x):
> >>     return x[0], x[1:]

> >
> > Shouldn't that be called cons?

>
> Hmmm, feels more like the 'reverse' of a cons to me -- takes a list
> and gives me the car and cdr...

Well, but it returns an object composed of a car and a cdr, which is
exactly what a cons is...
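
Both readings can be made concrete in a few lines. In this sketch cons is a hypothetical list-building helper (Python has no built-in cons). first_rest undoes it, which is Alex's point, while the value first_rest returns is itself a (car, cdr) pair, which is the point made just above.

```python
def cons(car, cdr):
    """Build a new list whose first item is car and whose rest is cdr."""
    return [car] + list(cdr)

def first_rest(x):
    """Return the pair (first item, rest of the sequence)."""
    return x[0], x[1:]

pair = first_rest([1, 2, 3])
print(pair)            # a two-item tuple: the car and the cdr
print(cons(*pair))     # rebuilding the original list from that pair
```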

 10-03-2003
Michael Hudson wrote:

> Alex Martelli writes:
>
>> So -- it's clear to me that I do NOT understand what's going on
>> here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
>> be 24.5 ... ???

>
> How quiet was your machine when you ran the tests? I see behaviour

Very, and the numbers were highly repeatable:

-- running just now:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 36.9 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.8 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.6 usec per loop

-- and from a copy & paste of the screen as of a while ago:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

-- _however_ -- retrying the latter now:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 48.8 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 46.1 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 46.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24.4 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

-- _SO_ -- immediately going back to the just-copy tests...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop

SO -- so much for the "quiet"... clearly there _IS_ something periodically
running and distorting the elapsed-time measurements (which are the default
here on Linux) by a hefty factor of 2. So much for this kind of casual
benchmarking...!-) I guess I've been doing a little too much of it...

> more in line with what you'd expect:
>
> [mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)' 'x=ags[:]'
> 10000 loops, best of 3: 31.6 usec per loop
> [mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
> 10000 loops, best of 3: 39.8 usec per loop

Yep, makes more sense. So, moving to the more-stable -c (CPU time,
as given by time.clock):

[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 22 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)' 'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop

it WOULD seem to be confirmed that reverse-then-pop() is VERY
slightly faster than pop(0) -- 22/23 instead of 23/24 usec for
a 1000-long list... of which 19.2 are the apparently-repeatable
overhead of copying that list. I still wouldn't say that I
"KNOW" this, though -- the margin is SO tiny and uncertain...!!!

It seems to scale up linearly going from 1000 to 5000: just
the copy, 105-108; ags[:].pop(0), 120-123; reverse then pop,
116-117. This is always with the -c (once burned...).
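
The same comparison can be sketched with the timeit module directly, measuring CPU time and subtracting the copy overhead. This is written for Python 3 and rests on a few assumptions: time.process_time stands in for the time.clock timer that -c selected, list(range(1000)) replaces range(1000), and best_usec is an illustrative helper name. Absolute numbers will differ from the ones in this thread.

```python
import time
import timeit

SETUP = "ags = list(range(1000))"

def best_usec(stmt, timer=time.process_time, number=2000, repeat=3):
    """Best-of-`repeat` cost of one execution of stmt, in microseconds."""
    t = timeit.Timer(stmt, SETUP, timer=timer)
    return min(t.repeat(repeat=repeat, number=number)) / number * 1e6

copy_only = best_usec("x = ags[:]")
pop_front = best_usec("x = ags[:].pop(0)")
reverse_pop = best_usec("ags.reverse(); x = ags[:].pop()")

print("copy only:              %.2f usec" % copy_only)
print("pop(0) minus copy:      %.2f usec" % (pop_front - copy_only))
print("reverse+pop minus copy: %.2f usec" % (reverse_pop - copy_only))
```

Taking the minimum over several repeats is what "best of 3" reports, and it is the figure least affected by whatever else the machine is doing.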

Alex

