Velocity Reviews > sum for sequences?

# sum for sequences?

Alf P. Steinbach
Guest
Posts: n/a

 03-28-2010
* Steve Howell:
> On Mar 28, 8:17 am, Duncan Booth <(E-Mail Removed)> wrote:
>> Steve Howell <(E-Mail Removed)> wrote:
>>> The mildly surprising part of sum() is that is does add vs. add-in-
>>> place, which leads to O(N) vs. O(1) for the inner loop calls, for
>>> certain data structures, notably lists, even though none of the
>>> intermediate results get used by the caller. For lists, you could
>>> make a more efficient variant of sum() that clones the start value and

>> Doing add-in-place isn't the only way to make sum more efficient: if you
>> assume that addition is associative (which of course the builtin sum can't)
>> then you can form partial sums. e.g. instead of calculating:
>>
>> (((((((a + b) + c) + d) + e) + f) + g) + h)
>>
>> you calculate:
>>
>> (((a + b) + (c + d)) + ((e + f) + (g + h)))
>>
>> Obviously this requires more space than the naive sum, but not as much as
>> you might at first expect: you only need to hold log(n) intermediates
>> values at any time.
>>

>
> Yep, I did mention in my post that the outer loop does not *have* to
> be O(N), if you can parallelize it. Apart from reducing
> intermediates, the divide-and-conquer method does not reduce overall
> computation time unless you have multiple processors, correct?
>

With a single thread of execution it can reduce the time for large integers and
custom numerical types, with n the number of final bits, from O(n^2) to O(n log n).

I don't think the parallelism here is very relevant for Python.

From a more practical point of view, the sum efficiency could be improved by
doing the first addition using '+' and the rest using '+=', without changing the
behavior.

Cheers & hth.,

- Alf

Steve Howell
Guest
Posts: n/a

 03-28-2010
On Mar 28, 9:56*am, "Alf P. Steinbach" <(E-Mail Removed)> wrote:
> * Steve Howell:
>> (talking about summing a list of lists)

>
> *From a more practical point of view, the sum efficiency could be improved by
> doing the first addition using '+' and the rest using '+=', without changing the
> behavior.

I like that approach, because you don't have to assume that the start
object is clonable, although perhaps you need to make that assumption
anyway for the case where the list is empty.

Here is code that might shed light on Alf's suggested approach:

import timeit
M = 10
N = 8000

def in_place(
start = [],
sublists = ([[None] * M]) * N
):
# only macro-optimized
i = 0
for sublist in sublists:
if i == 0:
accum = start + sublist
i += 1
else:
accum.extend(sublist)
if i == 0:
return 'whatever' # semantics here?
return accum

def with_intermediates(
start = [],
sublists = ([[None] * M]) * N
):
accum = start
for sublist in sublists:
accum = accum + sublist
return accum

t1 = timeit.timeit('in_place()', 'from __main__ import in_place',
number=10)
t2 = timeit.timeit('with_intermediates()', 'from __main__ import
with_intermediates', number=10)
print M, N, t1, t2, int(t2 / t1)

For N=100, you get a significant speedup (x84):

10 1000 0.00102496147156 0.0867249965668 84

More data:

10 1000 0.00102496147156 0.0867249965668 84
10 2000 0.00211381912231 0.172616004944 81
10 4000 0.00491786003113 0.561800956726 114
10 8000 0.0706541538239 19.9019489288 281
10 16000 0.0161759853363 9.64171791077 596
10 32000 0.0351569652557 43.98346591 1251

Patrick Maupin
Guest
Posts: n/a

 03-28-2010
On Mar 28, 11:56*am, "Alf P. Steinbach" <(E-Mail Removed)> wrote:
> *From a more practical point of view, the sum efficiency could be improved by
> doing the first addition using '+' and the rest using '+=', without changing the
> behavior.

Mod parent up!!!!

Steve Howell
Guest
Posts: n/a

 03-28-2010
On Mar 28, 10:21*am, Steve Howell <(E-Mail Removed)> wrote:
> * * import timeit
> * * M = 10
> * * N = 8000
>
> * * def in_place(
> * * * * start = [],
> * * * * sublists = ([[None] * M]) * N
> * * * * ):
> * * * * # only macro-optimized
> * * * * i = 0
> * * * * for sublist in sublists:
> * * * * * * if i == 0:
> * * * * * * * *accum = start + sublist
> * * * * * * * *i += 1
> * * * * * * else:
> * * * * * * * * accum.extend(sublist)

FYI I later obtained similar results with the more general:
accum += sublist

> * * * * if i == 0:
> * * * * * * return 'whatever' # semantics here?
> * * * * return accum
>
> * * def with_intermediates(
> * * * * start = [],
> * * * * sublists = ([[None] * M]) * N
> * * * * ):
> * * * * accum = start
> * * * * for sublist in sublists:
> * * * * * * accum = accum + sublist
> * * * * return accum
>

Patrick Maupin
Guest
Posts: n/a

 03-28-2010
On Mar 28, 12:34*pm, Steve Howell <(E-Mail Removed)> wrote:
> FYI I later obtained similar results with the more general:
> * * * * * * * * * accum += sublist

Yeah, but you still have to create an object of the correct type for
accum. And for summing small lists, that will actually increase the
runtime. (Worst case, a list of a single item: you have to create the
accum object based on the type of the start value, then do two += into
it, for the start value and the first item in the list, vs just doing
a single + which automatically creates an object.)

Personally, I think the most general approach, if sum were open to
enhancements, would be to automatically apply Alf's suggestion of "+"
on summing the first item to the start value, and "+=" on subsequent
items. Also, you *could* add a parameter to sum(), for example,
default "partialsums=False" that would allow the use of partial sums
in those cases where that might give better behavior (such as floats
and tuples).

Steve Howell
Guest
Posts: n/a

 03-28-2010
On Mar 28, 11:16*am, Patrick Maupin <(E-Mail Removed)> wrote:
> On Mar 28, 12:34*pm, Steve Howell <(E-Mail Removed)> wrote:
>
> > FYI I later obtained similar results with the more general:
> > * * * * * * * * * accum += sublist

>
> Yeah, but you still have to create an object of the correct type for
> accum. *

I think you overlooked the surrounding code.

Here is the code again:

def in_place(
start = [],
sublists = ([[None] * M]) * N
):
# only macro-optimized
i = 0
for sublist in sublists:
if i == 0:
accum = start + sublist
i += 1
else:
accum += sublist
if i == 0:
return 'whatever' # semantics here?
return accum

No need to infer the type.

> And for summing small lists, that will actually increase the
> runtime. *(Worst case, a list of a single item: you have to create the
> accum object based on the type of the start value, then do two += into
> it, for the start value and the first item in the list, vs just doing
> a single + which automatically creates an object.)
>

For small lists, the performance of any sane implementation would
probably be so fast as to be negligible, unless you are in a really
tight loop. If you are in a tight loop, then your use case probably
eventually sums large lists. Even if it doesn't, the slowdown is just
a constant.

For M=5, I get these results on my machine:

M N t1 t2 (t2/t1)
5 1 3.50475311279e-05 3.2901763916e-05 0.938775510204
5 2 2.00271606445e-05 1.59740447998e-05 0.797619047619
5 4 6.79492950439e-05 6.31809234619e-05 0.929824561404
5 8 0.000124931335449 0.000126123428345 1.00954198473
5 64 0.000530958175659 0.00226187705994 4.25999101931
5 1024 0.00262904167175 0.27246594429 103.636981953

t1 = time with add only first element
t2 = time with add all elements

Very small penalty for small lists, very large gain for large lists.

> Personally, I think the most general approach, if sum were open to
> enhancements, would be to automatically apply Alf's suggestion of "+"
> on summing the first item to the start value, and "+=" on subsequent
> items. *

See above. That's the approach I would use as well.

Paul Rubin
Guest
Posts: n/a

 03-28-2010
Steve Howell <(E-Mail Removed)> writes:
> The documentation is pretty clear on the intention that sum() is
> intended for numbers: ...

I've never been big on the idea of duck-typing addition. Who would have
thought that (1,2,3)+(4.5.6) was something other than the vector sum?

I think itertools.chain more directly expresses what the OP was trying
to do, and is preferable for readability, independently of these
performance questions.

Steven D'Aprano
Guest
Posts: n/a

 03-29-2010
On Sun, 28 Mar 2010 13:49:32 -0700, Paul Rubin wrote:

> Steve Howell <(E-Mail Removed)> writes:
>> The documentation is pretty clear on the intention that sum() is
>> intended for numbers: ...

>
> I've never been big on the idea of duck-typing addition. Who would have
> thought that (1,2,3)+(4.5.6) was something other than the vector sum?

But your arguments are tuples, not vectors.

There are languages which do treat arithmetic operations on vectors as
vector operations, as do (e.g.) H-P scientific calculators. That's a fine
design choice, and it works for languages where the emphasis is on
scientific calculations. But Python is a more generalist language, so in
my mind it is more appropriate to treat lists as generic lists, and not
numeric vectors:

[1, 2, 3] + [4, "a"] => [1, 2, 3, 4, "a"]

just as

"123" + "4a" => "1234a"

--
Steven

Steven D'Aprano
Guest
Posts: n/a

 03-29-2010
On Sun, 28 Mar 2010 18:56:26 +0200, Alf P. Steinbach wrote:

> From a more practical point of view, the sum efficiency could be
> improved by doing the first addition using '+' and the rest using '+=',
> without changing the behavior.

But that would change the behaviour. The __iadd__ method is not the same
as the __add__ method, and you can't guarantee that they will behave the
same -- or even similarly.

And what about tuples? And subclasses of list/tuples? How many different
types need to be optimized?

In practical terms, does anyone actually ever use sum on more than a
handful of lists? I don't believe this is more than a hypothetical
problem.

The primary use case for sum is adding numbers when floating point
accuracy is not critical. If you need float accuracy, use math.fsum. If
you need to add more than a handful of small lists, don't use sum: just
calling extend in a loop is probably fast enough, or use itertools.chain.
Trying to turn sum into an all-singing all-dancing function optimised for
an ever-increasing number of objects is a fool's errand. The
implementation of sum in C is already complex enough: 95 lines, excluding
comments, blanks and braces, for something which is a lightweight
function with very simple semantics.

But if anyone wants to submit a patch to the bug tracker, go right ahead.
Without a patch though, I'd say that Python-Dev will consider this a non-
issue.

--
Steven

Alf P. Steinbach
Guest
Posts: n/a

 03-29-2010
* Steven D'Aprano:
> On Sun, 28 Mar 2010 18:56:26 +0200, Alf P. Steinbach wrote:
>
>> From a more practical point of view, the sum efficiency could be
>> improved by doing the first addition using '+' and the rest using '+=',
>> without changing the behavior.

>
> But that would change the behaviour. The __iadd__ method is not the same
> as the __add__ method, and you can't guarantee that they will behave the
> same -- or even similarly.

Hm, I don't think it's documented (except if the reference implementation serves
as documentation) which one is currently used.

> And what about tuples? And subclasses of list/tuples? How many different
> types need to be optimized?

Point. One would need to check for availability of '+='.

> In practical terms, does anyone actually ever use sum on more than a
> handful of lists? I don't believe this is more than a hypothetical
> problem.

Agreed.

Cheers,

- Alf