Go Back   Velocity Reviews > Newsgroups > Python
User Name
Password
Register FAQ Members List Calendar Search Today's Posts Mark Forums Read

Reply

Python - Delete all items in the list

 
Thread Tools Search this Thread
Old 02-28-2009, 03:59 PM   #21
Default Re: Delete all items in the list


Chris Rebert wrote:

> Obviously that equivalence is true, but in this case I'm emphasizing
> that it's even worse than that when constant factors are taken into
> account. Big-O is nice in the abstract, but in the real-world those
> constant factors can matter.
>
> In pure big-O, it is indeed O(M*N) vs. O(N)
> Including constant factors, the performance is roughly 2*M*N*X [X =
> overhead of remove()] vs. N, which makes the badness of the algorithm
> all the more apparent.



So, what you're saying is that if you include a constant factor on one side
of the comparison, and neglect the constant factors on the other, the side
with the constant factor is worse. Well, duh

I forget which specific O(N) algorithm you're referring to, but I think it
was probably a list comp:

L[:] = [x for x in L if x != 'a']

As a wise man once said *wink*, "Big-O is nice in the abstract, but in the
real-world those constant factors can matter". This is very true... in this
case, the list comp requires creating a new list, potentially resizing it
an arbitrary number of times, then possibly garbage collecting the previous
contents of L. These operations are not free, and for truly huge lists
requiring paging, it could get very expensive.

Big Oh notation is good for estimating asymptotic behaviour, which means it
is good for predicting how an algorithm will scale as the size of the input
increases. It is useless for predicting how fast that algorithm will run,
since the actual speed depends on those constant factors that Big Oh
neglects. That's not a criticism of Big Oh -- predicting execution speed is
not what it is for.

For what it's worth, for very small lists, the while...remove algorithm is
actually faster that using a list comprehension, despite being O(M*N)
versus O(N), at least according to my tests. It's *trivially* faster, but
if you're interested in (premature) micro-optimisation, you might save one
or two microseconds by using a while loop for short lists (say, around
N<=12 or so according to my tests), and swapping to a list comp only for
larger input.

Now, this sounds silly, and in fact it is silly for the specific problem
we're discussing, but as a general technique it is very powerful. For
instance, until Python 2.3, list.sort() used a low-overhead but O(N**2)
insertion sort for small lists, only kicking off a high-overhead but O(N)
sample sort above a certain size. The current timsort does something
similar, only faster and more complicated. If I understand correctly, it
uses insertion sort to make up short runs of increasing or decreasing
values, and then efficiently merges them.


--
Steven



Steven D'Aprano
  Reply With Quote
Old 02-28-2009, 05:46 PM   #22
Terry Reedy
 
Posts: n/a
Default Re: Delete all items in the list
Steven D'Aprano wrote:

> Big Oh notation is good for estimating asymptotic behaviour, which means it
> is good for predicting how an algorithm will scale as the size of the input
> increases. It is useless for predicting how fast that algorithm will run,
> since the actual speed depends on those constant factors that Big Oh
> neglects. That's not a criticism of Big Oh -- predicting execution speed is
> not what it is for.
>
> For what it's worth, for very small lists, the while...remove algorithm is
> actually faster that using a list comprehension, despite being O(M*N)
> versus O(N), at least according to my tests. It's *trivially* faster, but
> if you're interested in (premature) micro-optimisation, you might save one
> or two microseconds by using a while loop for short lists (say, around
> N<=12 or so according to my tests), and swapping to a list comp only for
> larger input.
>
> Now, this sounds silly, and in fact it is silly for the specific problem
> we're discussing, but as a general technique it is very powerful. For
> instance, until Python 2.3, list.sort() used a low-overhead but O(N**2)
> insertion sort for small lists, only kicking off a high-overhead but O(N)


O(NlogN)

> sample sort above a certain size. The current timsort does something
> similar, only faster and more complicated. If I understand correctly, it
> uses insertion sort to make up short runs of increasing or decreasing
> values, and then efficiently merges them.


It uses binary insertion sort to make runs of 64 (determined by
empirical testing). The 'binary' part means that it uses O(logN) binary
search rather than O(n) linear search to find the insertion point for
each of N items, so that finding insertion points uses only O(NlogN)
comparisions (which are relatively expensive). Each of the N insertions
is done with a single memmove() call, which typically is relatively
fast. So although binary insertion sort is still O(N*N), the
coefficient of the N*N term in the time formula is relatively small.

The other advantages of timsort are that it exploits existing order in a
list while preserving the order of items that compare equal.

Terry Jan Reedy




Terry Reedy
  Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
Can't delete file or folder? nork Software 1 11-02-2009 11:52 AM
asp.net : custom code to delete data from Grid view (Database) sara_23apr Software 1 06-19-2007 01:08 PM
Update and Delete From Gridview Usman Software 0 11-01-2006 10:05 AM
How do I: Correctly deny the delete permission? sundevilkid85 Software 0 05-08-2006 09:49 PM
Latest Tech Fiasco... Ghost A+ Certification 30 01-09-2004 12:15 PM




SEO by vBSEO 3.3.2 ©2009, Crawlability, Inc.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46