Velocity Reviews (http://www.velocityreviews.com/forums/index.php)

 Franck Ditter 09-19-2012 02:41 PM

sum works in sequences (Python 3)

Hello,
I wonder why sum does not work on the string sequence in Python 3 :

>>> sum((8,5,9,3))
25
>>> sum([5,8,3,9,2])
27
>>> sum('rtarze')
TypeError: unsupported operand type(s) for +: 'int' and 'str'

I naively thought that sum('abc') would expand to 'a'+'b'+'c'
And the error message is somewhat cryptic...

franck

 Neil Cerutti 09-19-2012 02:57 PM

Re: sum works in sequences (Python 3)

On 2012-09-19, Franck Ditter <franck@ditter.org> wrote:
> Hello,
> I wonder why sum does not work on the string sequence in Python 3 :
>
> >>> sum((8,5,9,3))
> 25
> >>> sum([5,8,3,9,2])
> 27
> >>> sum('rtarze')
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>
> I naively thought that sum('abc') would expand to 'a'+'b'+'c'
> And the error message is somewhat cryptic...

You got that error message because the default value for the
second 'start' argument is 0. The function tried to add 'r' to 0.
That said:

>>> sum('rtarze', '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
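
To make the two error messages above concrete, here is a minimal pure-Python sketch of what sum() does. It is only a model under stated assumptions: `sum_sketch` is a hypothetical name, and the real built-in is written in C and rejects more than just str start values.

```python
def sum_sketch(iterable, start=0):
    """Rough model of the built-in sum(); a hypothetical helper, not CPython's code."""
    if isinstance(start, str):
        # Models the explicit refusal of a string start value.
        raise TypeError("sum() can't sum strings [use ''.join(seq) instead]")
    total = start
    for item in iterable:
        # With the default start, the first step for 'rtarze' is 0 + 'r',
        # which is where the "'int' and 'str'" message comes from.
        total = total + item
    return total


print(sum_sketch((8, 5, 9, 3)))  # 25

try:
    sum_sketch('rtarze')
except TypeError as exc:
    print(exc)

try:
    sum_sketch('rtarze', '')
except TypeError as exc:
    print(exc)
```

The first failure is an ordinary + failure between an int and a str; the second is a deliberate check on the start argument.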

--
Neil Cerutti

 Ian Kelly 09-19-2012 03:03 PM

Re: sum works in sequences (Python 3)

On Wed, Sep 19, 2012 at 8:41 AM, Franck Ditter <franck@ditter.org> wrote:
> Hello,
> I wonder why sum does not work on the string sequence in Python 3 :
>
> >>> sum((8,5,9,3))
> 25
> >>> sum([5,8,3,9,2])
> 27
> >>> sum('rtarze')
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>
> I naively thought that sum('abc') would expand to 'a'+'b'+'c'
> And the error message is somewhat cryptic...

It notes in the doc string that it does not work on strings:

sum(...)
    sum(sequence[, start]) -> value

    Returns the sum of a sequence of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0). When the sequence is
    empty, returns start.

I think this restriction is mainly for efficiency. sum(['a', 'b',
'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c' + 'd' +
'e', which is an inefficient way to add together strings. You should
use ''.join instead:

>>> ''.join('abc')
'abc'

 Neil Cerutti 09-19-2012 03:06 PM

Re: sum works in sequences (Python 3)

On 2012-09-19, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> It notes in the doc string that it does not work on strings:
>
> sum(...)
>     sum(sequence[, start]) -> value
>
>     Returns the sum of a sequence of numbers (NOT strings) plus
>     the value of parameter 'start' (which defaults to 0). When
>     the sequence is empty, returns start.
>
> I think this restriction is mainly for efficiency. sum(['a',
> 'b', 'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c'
> + 'd' + 'e', which is an inefficient way to add together
> strings. You should use ''.join instead:

While the docstring is still useful, it has diverged from the
documentation a little bit.

  sum(iterable[, start])

  Sums start and the items of an iterable from left to right and
  returns the total. start defaults to 0. The iterable's items
  are normally numbers, and the start value is not allowed to be
  a string.

  For some use cases, there are good alternatives to sum(). The
  preferred, fast way to concatenate a sequence of strings is by
  calling ''.join(sequence). To add floating point values with
  extended precision, see math.fsum(). To concatenate a series of
  iterables, consider using itertools.chain().

Are iterables and sequences different enough to warrant posting a
bug report?
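
For reference, the three alternatives named in the quoted documentation can be tried out as below. This is a small sketch with made-up sample data (`words`, `floats`, and `nested` are illustrative names only).

```python
import itertools
import math

# Concatenating strings: ''.join instead of sum.
words = ['a', 'b', 'c', 'd', 'e']
joined = ''.join(words)

# Adding floats: math.fsum tracks the lost low-order bits.
# The result of plain sum() here may differ between Python versions.
floats = [0.1] * 10
naive_total = sum(floats)
precise_total = math.fsum(floats)

# Concatenating iterables: itertools.chain instead of sum(nested, []).
nested = [[1, 2], [3], [4, 5]]
flat = list(itertools.chain.from_iterable(nested))

print(joined)
print(naive_total, precise_total)
print(flat)
```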

--
Neil Cerutti

 Steve Howell 09-19-2012 03:37 PM

Re: sum works in sequences (Python 3)

On Sep 19, 8:06 am, Neil Cerutti <ne...@norwich.edu> wrote:
> On 2012-09-19, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>
> > It notes in the doc string that it does not work on strings:
> >
> > sum(...)
> >     sum(sequence[, start]) -> value
> >
> >     Returns the sum of a sequence of numbers (NOT strings) plus
> >     the value of parameter 'start' (which defaults to 0). When
> >     the sequence is empty, returns start.
> >
> > I think this restriction is mainly for efficiency. sum(['a',
> > 'b', 'c', 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c'
> > + 'd' + 'e', which is an inefficient way to add together
> > strings. You should use ''.join instead:
>
> While the docstring is still useful, it has diverged from the
> documentation a little bit.
>
>   sum(iterable[, start])
>
>   Sums start and the items of an iterable from left to right and
>   returns the total. start defaults to 0. The iterable's items
>   are normally numbers, and the start value is not allowed to be
>   a string.
>
>   For some use cases, there are good alternatives to sum(). The
>   preferred, fast way to concatenate a sequence of strings is by
>   calling ''.join(sequence). To add floating point values with
>   extended precision, see math.fsum(). To concatenate a series of
>   iterables, consider using itertools.chain().
>
> Are iterables and sequences different enough to warrant posting a
> bug report?

Sequences are iterables, so I'd say the docs are technically correct,
but maybe I'm misunderstanding what you would be trying to clarify.

 Steven D'Aprano 09-19-2012 04:14 PM

Re: sum works in sequences (Python 3)

On Wed, 19 Sep 2012 09:03:03 -0600, Ian Kelly wrote:

> I think this restriction is mainly for efficiency. sum(['a', 'b', 'c',
> 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c' + 'd' + 'e', which
> is an inefficient way to add together strings.

It might not be obvious to some people why repeated addition is so
inefficient, and in fact if people try it with modern Python (version 2.3
or better), they may not notice any inefficiency.

But the example given, 'a' + 'b' + 'c' + 'd' + 'e', potentially ends up
creating four strings, only to immediately throw away three of them:

* first it concats 'a' to 'b', giving the new string 'ab'
* then 'ab' + 'c', creating a new string 'abc'
* then 'abc' + 'd', creating a new string 'abcd'
* then 'abcd' + 'e', creating a new string 'abcde'

Each new string requires a block of memory to be allocated, potentially
requiring other blocks of memory to be moved out of the way (at least for
large blocks).

With only five characters in total, you won't really notice any slowdown,
but with large enough numbers of strings, Python could potentially spend
a lot of time building, and throwing away, intermediate strings. Pure
wasted effort.
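
The cost described above can be estimated with a short sketch. It assumes the worst case, where every + builds a brand-new string and no in-place optimization applies, and it counts characters written rather than measuring time. Both function names are hypothetical.

```python
def chars_copied_by_repeated_concat(pieces):
    """Characters written if each + allocates and fills a new string."""
    pieces = list(pieces)
    if not pieces:
        return 0
    result = pieces[0]
    copied = 0
    for piece in pieces[1:]:
        result = result + piece   # new string: old contents plus the new piece
        copied += len(result)     # every character of it had to be written
    return copied


def chars_copied_by_join(pieces):
    """Characters written if the final string is built in a single pass."""
    return len(''.join(pieces))


letters = ['a', 'b', 'c', 'd', 'e']
print(chars_copied_by_repeated_concat(letters))  # 2 + 3 + 4 + 5 = 14
print(chars_copied_by_join(letters))             # 5

many = ['x'] * 1000
print(chars_copied_by_repeated_concat(many))     # 500499
print(chars_copied_by_join(many))                # 1000
```

For n one-character pieces the repeated-concatenation count grows roughly as n*n/2, while the single-pass count grows as n.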

For another look at this, see:
http://www.joelonsoftware.com/articl...000000319.html

I say "could" because starting in about Python 2.3, there is a nifty
optimization in Python (CPython only, not Jython or IronPython) that can
*sometimes* recognise repeated string concatenation and make it less
inefficient. It depends on the details of the specific strings used, and
the operating system's memory management. When it works, it can make
string concatenation almost as efficient as ''.join(). When it doesn't
work, repeated concatenation is PAINFULLY slow, hundreds or thousands of
times slower than join.

--
Steven

 Steven D'Aprano 09-19-2012 04:18 PM

Re: sum works in sequences (Python 3)

On Wed, 19 Sep 2012 15:07:04 +0000, Alister wrote:

> Summation is a mathematical function that works on numbers.
> Concatenation is the process of appending one string to another.
>
> although they are not related to each other they do share the same
> operator(+) which is the cause of confusion. attempting to duck type
> this function would cause ambiguity for example what would you expect
> from
>
> sum ('a','b',3,4)
>
> 'ab34' or 'ab7' ?

Neither. I would expect sum to do exactly what the + operator does if
given two incompatible arguments: raise an exception.

And in fact, that's exactly what it does.

py> sum ([1, 2, 'a'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
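
As a further illustration that sum() otherwise just applies +, the sketch below uses it on lists and tuples with a matching start value. Strings are the one case it refuses outright. `error_of` is a hypothetical helper used only to capture the messages.

```python
def error_of(call):
    """Return the TypeError message raised by call(), or None if it succeeds."""
    try:
        call()
    except TypeError as exc:
        return str(exc)
    return None


# Works, because list + list and tuple + tuple are defined.
list_total = sum([[1, 2], [3]], [])
tuple_total = sum([(1,), (2, 3)], ())

# Fails in the same way that 3 + 'a' fails.
mixed_error = error_of(lambda: sum([1, 2, 'a']))

# Fails by an explicit check, even though '' + 'a' + 'b' would work.
string_error = error_of(lambda: sum(['a', 'b'], ''))

print(list_total)
print(tuple_total)
print(mixed_error)
print(string_error)
```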

--
Steven

 Ian Kelly 09-19-2012 06:33 PM

Re: sum works in sequences (Python 3)

On Wed, Sep 19, 2012 at 9:37 AM, Steve Howell <showell30@yahoo.com> wrote:
> Sequences are iterables, so I'd say the docs are technically correct,
> but maybe I'm misunderstanding what you would be trying to clarify.

The doc string suggests that the argument to sum() must be a sequence,
when in fact any iterable will do. The restriction in the docs should
be relaxed to match the reality.
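
A quick check of that claim, as a sketch: none of the arguments below is a sequence, yet sum() accepts each of them.

```python
from collections.abc import Iterable, Sequence

squares = (n * n for n in range(5))  # a generator: iterable, but not a sequence
is_iterable = isinstance(squares, Iterable)
is_sequence = isinstance(squares, Sequence)

total_from_generator = sum(squares)        # 0 + 1 + 4 + 9 + 16
total_from_set = sum({1, 2, 3})            # sets are unordered, not sequences
total_from_dict = sum({10: 'x', 20: 'y'})  # iterating a dict yields its keys

print(is_iterable, is_sequence)
print(total_from_generator, total_from_set, total_from_dict)
```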

 Steve Howell 09-19-2012 06:43 PM

Re: sum works in sequences (Python 3)

On Sep 19, 11:34 am, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Wed, Sep 19, 2012 at 9:37 AM, Steve Howell <showel...@yahoo.com> wrote:
> > Sequences are iterables, so I'd say the docs are technically correct,
> > but maybe I'm misunderstanding what you would be trying to clarify.
>
> The doc string suggests that the argument to sum() must be a sequence,
> when in fact any iterable will do. The restriction in the docs should
> be relaxed to match the reality.

Ah. The docstring looks to be fixed in 3.1.3, but not in Python 2.

Python 3.1.3 (r313:86834, Mar 13 2011, 00:40:38)
[GCC 4.4.5] on linux2
>>> sum.__doc__
"sum(iterable[, start]) -> value\n\nReturns the sum of an iterable of
numbers (NOT strings) plus the value\nof parameter 'start' (which
defaults to 0). When the iterable is\nempty, returns start."

Python 2.6.6 (r266:84292, Mar 13 2011, 00:35:19)
[GCC 4.4.5] on linux2
>>> sum.__doc__
"sum(sequence[, start]) -> value\n\nReturns the sum of a sequence of
numbers (NOT strings) plus the value\nof parameter 'start' (which
defaults to 0). When the sequence is\nempty, returns start."
>>>

 Terry Reedy 09-19-2012 06:49 PM

Re: sum works in sequences (Python 3)

On 9/19/2012 11:07 AM, Alister wrote:

> Summation is a mathematical function that works on numbers.
> Concatenation is the process of appending one string to another.
>
> although they are not related to each other they do share the same
> operator(+) which is the cause of confusion.

If one represents counts in unary, as a sequence or tally of 1s (or
other markers indicating 'successor' or 'increment'), then count
addition is sequence concatenation. I think Guido got it right.

It happens that when the members of all sequences are identical, there
is a much more compact exponential place value notation that enables
more efficient addition and other operations. When not, other tricks are
needed to avoid so much copying that an inherently O(N) operation
balloons into an O(N*N) operation.
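
The unary view can be sketched in a few lines, assuming a count is represented as a list of 1s. `to_unary` and `from_unary` are hypothetical helper names.

```python
def to_unary(n):
    """Represent a non-negative count as a tally of n ones."""
    return [1] * n


def from_unary(tally):
    """Read a tally back as an ordinary integer."""
    return len(tally)


three = to_unary(3)
four = to_unary(4)

# Adding two counts in unary is just concatenating their tallies.
seven = three + four
print(from_unary(seven))  # 7

# sum() with a list start value concatenates a whole series of tallies.
# Each step copies the running tally, which is the repeated copying
# that the compact place-value notation avoids.
total = sum([to_unary(2), to_unary(3), to_unary(4)], [])
print(from_unary(total))  # 9
```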

--
Terry Jan Reedy
