Chris Rebert wrote:

> Obviously that equivalence is true, but in this case I'm emphasizing
> that it's even worse than that when constant factors are taken into
> account. Big-O is nice in the abstract, but in the real-world those
> constant factors can matter.
>
> In pure big-O, it is indeed O(M*N) vs. O(N).
> Including constant factors, the performance is roughly 2*M*N*X [X =
> overhead of remove()] vs. N, which makes the badness of the algorithm
> all the more apparent.
So, what you're saying is that if you include a constant factor on one side
of the comparison, and neglect the constant factors on the other, the side
with the constant factor is worse. Well, duh.

I forget which specific O(N) algorithm you're referring to, but I think it
was probably a list comp:

L[:] = [x for x in L if x != 'a']
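For reference, the O(M*N) alternative under discussion was (I assume -- I'm
reconstructing it, not quoting anyone's exact code) the repeated-remove
idiom, which rescans and shifts the list on every deletion:

```python
L = list('abcabca')

# O(M*N): each 'in' test is a linear scan, and each remove() is
# another scan plus a shift of every element after the hit.
while 'a' in L:
    L.remove('a')

# O(N): one pass building a new list, then rebinding L's contents
# in place via slice assignment.
M = list('abcabca')
M[:] = [x for x in M if x != 'a']

# Both leave the same result behind.
```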

As a wise man once said *wink*, "Big-O is nice in the abstract, but in the
real-world those constant factors can matter". This is very true... in this
case, the list comp requires creating a new list, potentially resizing it
an arbitrary number of times, then possibly garbage collecting the previous
contents of L. These operations are not free, and for truly huge lists
requiring paging, it could get very expensive.

Big Oh notation is good for estimating asymptotic behaviour, which means it
is good for predicting how an algorithm will scale as the size of the input
increases. It is useless for predicting how fast that algorithm will run,
since the actual speed depends on those constant factors that Big Oh
neglects. That's not a criticism of Big Oh -- predicting execution speed is
not what it is for.

For what it's worth, for very small lists, the while...remove algorithm is
actually faster than using a list comprehension, despite being O(M*N)
versus O(N), at least according to my tests. It's *trivially* faster, but
if you're interested in (premature) micro-optimisation, you might save one
or two microseconds by using a while loop for short lists (say, around
N<=12 or so according to my tests), and swapping to a list comp only for
larger input.
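If you really wanted to, that hybrid could be wrapped up in a helper. A
sketch only -- the name remove_all and the cutoff of 12 are mine, and the
right cutoff will vary by machine and Python version, so you'd want to
measure it yourself:

```python
def remove_all(L, value, cutoff=12):
    """Remove every occurrence of value from L, in place.

    Below the cutoff, the low-overhead while...remove loop tends to
    win despite being O(M*N); above it, the O(N) list comp wins.
    """
    if len(L) <= cutoff:
        while value in L:
            L.remove(value)
    else:
        L[:] = [x for x in L if x != value]
    return L
```

The same trick -- a cheap algorithm for small inputs, an asymptotically
better one for large inputs -- is exactly what the sort example below uses.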

Now, this sounds silly, and in fact it is silly for the specific problem
we're discussing, but as a general technique it is very powerful. For
instance, until Python 2.3, list.sort() used a low-overhead but O(N**2)
insertion sort for small lists, only kicking off a high-overhead but
O(N log N) samplesort above a certain size. The current timsort does
something similar, only faster and more complicated. If I understand
correctly, it uses insertion sort to build up short runs of increasing or
decreasing values, and then efficiently merges them.
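To illustrate the idea -- and only the idea, since this toy ignores
natural run detection, galloping, and the merge-stack invariants of the
real timsort -- here's a sketch that sorts fixed-size chunks with a
low-overhead insertion sort, then merges adjacent sorted runs pairwise:

```python
def insertion_sort(A, lo, hi):
    # Sort A[lo:hi] in place. O(k**2) for run length k, but with
    # tiny constant factors, so it wins for short runs.
    for i in range(lo + 1, hi):
        x = A[i]
        j = i
        while j > lo and A[j - 1] > x:
            A[j] = A[j - 1]
            j -= 1
        A[j] = x

def toy_timsort(A, minrun=4):
    # Greatly simplified sketch (not CPython's implementation):
    # insertion-sort each minrun-sized chunk, then repeatedly merge
    # adjacent sorted runs until one run remains.
    n = len(A)
    runs = []
    for i in range(0, n, minrun):
        hi = min(i + minrun, n)
        insertion_sort(A, i, hi)
        runs.append(A[i:hi])
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs) - 1, 2):
            a, b = runs[k], runs[k + 1]
            out, ai, bi = [], 0, 0
            # Standard stable two-way merge of sorted runs.
            while ai < len(a) and bi < len(b):
                if a[ai] <= b[bi]:
                    out.append(a[ai]); ai += 1
                else:
                    out.append(b[bi]); bi += 1
            out.extend(a[ai:])
            out.extend(b[bi:])
            merged.append(out)
        if len(runs) % 2:
            merged.append(runs[-1])
        runs = merged
    A[:] = runs[0] if runs else []
```

Tim Peters' notes in CPython's Objects/listsort.txt describe what the real
thing actually does; it's well worth a read.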

--

Steven