Re: restriction on sum: intentional bug?

 
 
Ethan Furman
      10-18-2009
Dave Angel wrote:
> Dieter Maurer wrote:
>
>> Christian Heimes <> writes on Fri, 16 Oct 2009
>> 17:58:29 +0200:
>>
>>
>>> Alan G Isaac schrieb:
>>>
>>>
>>>> I expected this to be fixed in Python 3:
>>>>
>>>>
>>>>
>>>> >>> sum(['ab','cd'],'')
>>>>
>>>> Traceback (most recent call last):
>>>> File "<stdin>", line 1, in <module>
>>>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>>>
>>>> Of course it is not a good way to join strings,
>>>> but it should work, should it not? Naturally,
>>>>
>>>
>>> It's not a bug. sum() doesn't work on strings deliberately. ''.join()
>>> *is* the right and good way to concatenate strings.
>>>

>>
>> Apparently, "sum" special cases 'str' in order to teach people to use
>> "join".
>> It would have been as much work and much more friendly, to just use
>> "join"
>> internally to implement "sum" when this is possible.
>>
>> Dieter
>>

>
> Earlier, I would have agreed with you. I assumed that this could be
> done invisibly, with the only difference being performance. But you
> can't know whether join will do the trick without error till you know
> that all the items are strings or Unicode strings. And you can't check
> that without going through the entire iterator. At that point it's too
> late to change your mind, as you can't back up an iterator. So the user
> who supplies a list with mixed strings and other stuff will get an
> unexpected error, one that join generates.
>
> To put it simply, I'd say that sum() should not dispatch to join()
> unless it could be sure that no errors might result.
>
> DaveA


How is this different than passing a list to sum with other incompatible
types?

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class Dummy(object):
...     pass
...
>>> test1 = [1, 2, 3.4, Dummy()]
>>> sum(test1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
>>> test2 = ['a', 'string', 'and', 'a', Dummy()]
>>> ''.join(test2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 4: expected string, Dummy found

Looks like a TypeError either way; only the verbiage changes.

~Ethan~
 
Carl Banks
      10-19-2009
On Oct 18, 4:07 pm, Ethan Furman <et...@stoneleaf.us> wrote:
> Dave Angel wrote:
> > Earlier, I would have agreed with you. I assumed that this could be
> > done invisibly, with the only difference being performance. But you
> > can't know whether join will do the trick without error till you know
> > that all the items are strings or Unicode strings. And you can't check
> > that without going through the entire iterator. At that point it's too
> > late to change your mind, as you can't back up an iterator. So the user
> > who supplies a list with mixed strings and other stuff will get an
> > unexpected error, one that join generates.
> >
> > To put it simply, I'd say that sum() should not dispatch to join()
> > unless it could be sure that no errors might result.
>
> How is this different than passing a list to sum with other incompatible
> types?
>
> Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> class Dummy(object):
> ...     pass
> ...
> >>> test1 = [1, 2, 3.4, Dummy()]
> >>> sum(test1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
> >>> test2 = ['a', 'string', 'and', 'a', Dummy()]
> >>> ''.join(test2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: sequence item 4: expected string, Dummy found
>
> Looks like a TypeError either way; only the verbiage changes.



This test doesn't mean very much since you didn't pass the same
list to both calls. The claim is that "".join() might do something
different than a non-special-cased sum() would have when called on the
same list, and indeed that is true.

Consider this thought experiment:


class Something(object):
    def __radd__(self,other):
        return other + "q"

x = ["a","b","c",Something()]


If x were passed to "".join(), it would throw an exception; but if
passed to a sum() without any special casing, it would successfully
return "abcq".

Thus there is divergence in the two behaviors, thus transparently
calling "".join() to perform the summation is a Bad Thing Indeed, a
much worse special-case behavior than throwing an exception.


Carl Banks
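
For anyone who wants to try the thought experiment above, here is a minimal sketch. It assumes Python 3, and `plain_sum` is a hypothetical stand-in for a sum() with no special casing (the built-in refuses a string start value, so it cannot be used directly). The class is the one from the post; everything else is illustrative only.

```python
class Something(object):
    """Not a string, but knows how to be added to the right of one."""
    def __radd__(self, other):
        return other + "q"


def plain_sum(items, start):
    """Hypothetical sum() without special casing: nothing but repeated +."""
    total = start
    for item in items:
        total = total + item
    return total


x = ["a", "b", "c", Something()]

# str + Something falls back to Something.__radd__, so this succeeds.
added = plain_sum(x, "")

# join() insists that every item is a string, so the same list fails.
try:
    joined = "".join(x)
except TypeError:
    joined = None

print(added)    # abcq
print(joined)   # None
```

The same list gives a result one way and an exception the other, which is the divergence being argued about.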
 
Ethan Furman
      10-19-2009
Carl Banks wrote:
> This test doesn't mean very much since you didn't pass the same
> list to both calls. The claim is that "".join() might do something
> different than a non-special-cased sum() would have when called on the
> same list, and indeed that is true.
>
> Consider this thought experiment:
>
> class Something(object):
>     def __radd__(self,other):
>         return other + "q"
>
> x = ["a","b","c",Something()]
>
> If x were passed to "".join(), it would throw an exception; but if
> passed to a sum() without any special casing, it would successfully
> return "abcq".
>
> Thus there is divergence in the two behaviors, thus transparently
> calling "".join() to perform the summation is a Bad Thing Indeed, a
> much worse special-case behavior than throwing an exception.
>
> Carl Banks


Unfortunately, I don't know enough about how join works to know that,
but I'll take your word for it. Perhaps the better solution then is to
not worry about optimization, and just call __add__ on the objects.
Then it either works, or throws the appropriate error.

This is obviously slow on strings, but mention of that is already in the
docs, and profiling will also turn up such bottlenecks. Get the code
working first, then optimize, yes? We've all seen questions on this
list with folk using the accumulator method for joining strings, and
then wondering why it's so slow -- the answer given is the same as we
would give for sum()ing a list of strings -- use join instead. Then we
have Python following the same advice we give out -- don't break
duck-typing, any ensuing errors are the responsibility of the caller.

~Ethan~
 
 
Steven D'Aprano
 
      10-19-2009
On Sun, 18 Oct 2009 19:52:41 -0700, Ethan Furman wrote:

> This is obviously slow on strings, but mention of that is already in the
> docs, and profiling will also turn up such bottlenecks. Get the code
> working first, then optimize, yes?


Well, maybe. Premature optimization and all, but sometimes you just
*know* something is going to be slow, so you avoid it.

And it's amazing how O(N**2) algorithms can hide for years. Within the
last month or two, there was a bug reported for httplib involving
repeated string concatenation:

http://bugs.python.org/issue6838

I can only imagine that the hidden O(N**2) behaviour was there in the
code for years before somebody noticed it, reported it, spent hours
debugging it, and finally discovered the cause and produced a patch.

The amazing thing is, if you look in the httplib.py module, you see this
comment:

# XXX This accumulates chunks by repeated string concatenation,
# which is not efficient as the number or size of chunks gets big.
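
The pattern that comment warns about, and the usual alternative, can be sketched roughly as follows. This is not the httplib code or the patch from the bug report; `read_all_concat` and `read_all_join` are hypothetical names, and the chunks are made-up data.

```python
def read_all_concat(chunks):
    """Accumulate by repeated concatenation, the pattern the comment describes."""
    data = b""
    for chunk in chunks:
        # Each += may copy everything accumulated so far, which is
        # where the quadratic cost comes from as the data grows.
        data += chunk
    return data


def read_all_join(chunks):
    """Collect the chunks in a list and join them once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)


chunks = [b"ab", b"cd", b"ef"]
print(read_all_concat(chunks) == read_all_join(chunks))   # True
```

Both functions return the same bytes; only the amount of copying differs, which is why the slow version can sit unnoticed until the input gets large.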




> We've all seen questions on this
> list with folk using the accumulator method for joining strings, and
> then wondering why it's so slow -- the answer given is the same as we
> would give for sum()ing a list of strings -- use join instead. Then we
> have Python following the same advice we give out -- don't break
> duck-typing, any ensuing errors are the responsibility of the caller.


I'd be happy for sum() to raise a warning rather than an exception, and
to do so for both strings and lists. Python, after all, is happy to let
people shoot themselves in the foot, but it's only fair to give them
warning that the gun is loaded.



--
Steven
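
A rough sketch of what the warn-instead-of-raise idea might look like, assuming Python 3. `warning_sum` is a hypothetical name, and the choice of RuntimeWarning and the message text are assumptions for illustration, not anything proposed in the thread.

```python
import warnings


def warning_sum(items, start=0):
    """Hypothetical sum() that warns, rather than raising, for str and list."""
    if isinstance(start, (str, list)):
        warnings.warn(
            "summing %s values is quadratic; consider join() or extend()"
            % type(start).__name__,
            RuntimeWarning,
            stacklevel=2,
        )
    total = start
    for item in items:
        total = total + item   # plain +, so duck typing still applies
    return total


# Capture the warning so it can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = warning_sum(["ab", "cd"], "")

print(result)        # abcd
print(len(caught))   # 1
```

The caller still gets an answer, and the warning can be silenced or turned into an error with the usual warnings filters.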
 
 
Tim Chase
 
      10-19-2009
Carl Banks wrote:
> Consider this thought experiment:
>
> class Something(object):
>     def __radd__(self,other):
>         return other + "q"
>
> x = ["a","b","c",Something()]
>
> If x were passed to "".join(), it would throw an exception; but if
> passed to a sum() without any special casing, it would successfully
> return "abcq".


Okay...this is the best argument I've heard for not using
"".join(). {Awards Carl one (1) internet} It's a peculiar thing
to do as a programmer, but "".join() certainly produces an
unexpected behavior which I'd say is worse. And a lot of this
discussion has revolved around letting programmers do peculiar
things if they want.

So as of Carl's example, I'm now pretty solidly in the "Stop
throwing an exception, just sum the parts even if it's
inefficient" camp and no longer straddling between that and the
"".join() camp. But I'm definitely still not in the "throwing
exceptions is a good thing" camp.

-tkc


 
 
Carl Banks
 
      10-19-2009
On Oct 19, 3:24 am, Tim Chase <python.l...@tim.thechases.com> wrote:
> Carl Banks wrote:
> > Consider this thought experiment:
>
> > class Something(object):
> >     def __radd__(self,other):
> >         return other + "q"
>
> > x = ["a","b","c",Something()]
>
> > If x were passed to "".join(), it would throw an exception; but if
> > passed to a sum() without any special casing, it would successfully
> > return "abcq".
>
> Okay...this is the best argument I've heard for not using
> "".join() {Awards Carl one (1) internet}


Well that was my argument in the last post you followed up to, I just
used a bad example. Actually this example was described by Dave
Angel, so you should give the internet to him.


Carl Banks
 
 
Gabriel Genellina
 
      10-20-2009
On Sun, 18 Oct 2009 21:50:55 -0300, Carl Banks <> wrote:

> Consider this thought experiment:
>
>
> class Something(object):
>     def __radd__(self,other):
>         return other + "q"
>
> x = ["a","b","c",Something()]
>
>
> If x were passed to "".join(), it would throw an exception; but if
> passed to a sum() without any special casing, it would successfully
> return "abcq".
>
> Thus there is divergence in the two behaviors, thus transparently
> calling "".join() to perform the summation is a Bad Thing Indeed, a
> much worse special-case behavior than throwing an exception.


Just for completeness, and in case anyone would like to try this O(n²)
process, sum(x) may be rewritten as:

import operator

x = ["a","b","c",Something()]
print reduce(operator.add, x)

which does exactly the same thing, with the same quadratic behavior as
sum(), but prints "abcq" as expected.

--
Gabriel Genellina
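
The snippet above is written for Python 2. A rough Python 3 equivalent, offered as a sketch only, needs `reduce` imported from `functools` and `print` called as a function. It also shows the built-in sum() refusing the same list when given a string start value; the variable names are illustrative.

```python
from functools import reduce
import operator


class Something(object):
    def __radd__(self, other):
        return other + "q"


x = ["a", "b", "c", Something()]

# reduce() applies + pairwise with no special casing for strings.
reduced = reduce(operator.add, x)
print(reduced)   # abcq

# The built-in sum() rejects a string start value before adding anything.
builtin_error = None
try:
    sum(x, "")
except TypeError as exc:
    builtin_error = exc
print(builtin_error is not None)   # True
```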

 