
Ethan Furman 10-18-2009 11:07 PM

Re: restriction on sum: intentional bug?
 
Dave Angel wrote:
> Dieter Maurer wrote:
>
>> Christian Heimes <lists@cheimes.de> writes on Fri, 16 Oct 2009
>> 17:58:29 +0200:
>>
>>
>>> Alan G Isaac schrieb:
>>>
>>>
>>>> I expected this to be fixed in Python 3:
>>>>
>>>>
>>>>
>>>> >>> sum(['ab','cd'],'')
>>>>
>>>> Traceback (most recent call last):
>>>> File "<stdin>", line 1, in <module>
>>>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>>>
>>>> Of course it is not a good way to join strings,
>>>> but it should work, should it not? Naturally,
>>>>
>>>
>>> It's not a bug. sum() doesn't work on strings deliberately. ''.join()
>>> *is* the right and good way to concatenate strings.
>>>

>>
>> Apparently, "sum" special cases 'str' in order to teach people to use
>> "join".
>> It would have been as much work and much more friendly, to just use
>> "join"
>> internally to implement "sum" when this is possible.
>>
>> Dieter
>>

>
> Earlier, I would have agreed with you. I assumed that this could be
> done invisibly, with the only difference being performance. But you
> can't know whether join will do the trick without error till you know
> that all the items are strings or Unicode strings. And you can't check
> that without going through the entire iterator. At that point it's too
> late to change your mind, as you can't back up an iterator. So the user
> who supplies a list with mixed strings and other stuff will get an
> unexpected error, one that join generates.
>
> To put it simply, I'd say that sum() should not dispatch to join()
> unless it could be sure that no errors might result.
>
> DaveA
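
Dave's point about not being able to back up an iterator can be seen in a small sketch. This is Python 3 syntax, and the names all_strings and words are made up for illustration. Checking the types up front walks the whole iterator, so nothing is left for join() or sum() afterwards:

```python
def all_strings(items):
    """Return True if every item is a str; this walks the whole iterable."""
    return all(isinstance(item, str) for item in items)


def words():
    # A one-shot source: once consumed, it cannot be rewound.
    yield "ab"
    yield "cd"


stream = words()
checked = all_strings(stream)  # consumes the generator while checking
leftover = list(stream)        # whatever is still available afterwards
```

Here checked is True but leftover is empty, so a sum() that wanted to verify its input before dispatching to join() would have to buffer every item first.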


How is this different than passing a list to sum with other incompatible
types?

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class Dummy(object):
...     pass
...
>>> test1 = [1, 2, 3.4, Dummy()]
>>> sum(test1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
>>> test2 = ['a', 'string', 'and', 'a', Dummy()]
>>> ''.join(test2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 4: expected string, Dummy found

Looks like a TypeError either way, only the verbiage changes.

~Ethan~

Carl Banks 10-19-2009 12:50 AM

Re: restriction on sum: intentional bug?
 
On Oct 18, 4:07 pm, Ethan Furman <et...@stoneleaf.us> wrote:
> Dave Angel wrote:
> > Earlier, I would have agreed with you. I assumed that this could be
> > done invisibly, with the only difference being performance. But you
> > can't know whether join will do the trick without error till you know
> > that all the items are strings or Unicode strings. And you can't check
> > that without going through the entire iterator. At that point it's too
> > late to change your mind, as you can't back up an iterator. So the user
> > who supplies a list with mixed strings and other stuff will get an
> > unexpected error, one that join generates.
> >
> > To put it simply, I'd say that sum() should not dispatch to join()
> > unless it could be sure that no errors might result.
>
> How is this different than passing a list to sum with other incompatible
> types?
>
> Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> class Dummy(object):
> ...     pass
> ...
> >>> test1 = [1, 2, 3.4, Dummy()]
> >>> sum(test1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
> >>> test2 = ['a', 'string', 'and', 'a', Dummy()]
> >>> ''.join(test2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: sequence item 4: expected string, Dummy found
>
> Looks like a TypeError either way, only the verbiage changes.



This test doesn't mean very much since you didn't pass the same
list to both calls. The claim is that "".join() might do something
different than a non-special-cased sum() would have when called on the
same list, and indeed that is true.

Consider this thought experiment:


class Something(object):
    def __radd__(self,other):
        return other + "q"

x = ["a","b","c",Something()]


If x were passed to "".join(), it would throw an exception; but if
passed to a sum() without any special casing, it would successfully
return "abcq".

Thus the two behaviors diverge, so transparently calling "".join() to
perform the summation is a Bad Thing Indeed, a much worse special-case
behavior than throwing an exception.
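
The divergence can be checked with a short sketch in Python 3 syntax. Since the real sum() refuses a string start value, this assumes a left-to-right fold with + (via functools.reduce) as a stand-in for a sum() without any special casing; the variable names are illustrative only:

```python
import operator
from functools import reduce


class Something(object):
    def __radd__(self, other):
        # Called for other + Something() when other's own __add__ gives up.
        return other + "q"


x = ["a", "b", "c", Something()]

# Stand-in for a sum() with no string special case: fold the list with +.
folded = reduce(operator.add, x, "")

# str.join() insists that every item is a string, so it rejects the same list.
try:
    joined = "".join(x)
    join_error = None
except TypeError as exc:
    joined = None
    join_error = exc
```

With the same list, folded is "abcq" while the join() call ends in a TypeError.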


Carl Banks

Ethan Furman 10-19-2009 02:52 AM

Re: restriction on sum: intentional bug?
 
Carl Banks wrote:
> On Oct 18, 4:07 pm, Ethan Furman <et...@stoneleaf.us> wrote:
>
>>Dave Angel wrote:
>>
>>>Earlier, I would have agreed with you. I assumed that this could be
>>>done invisibly, with the only difference being performance. But you
>>>can't know whether join will do the trick without error till you know
>>>that all the items are strings or Unicode strings. And you can't check
>>>that without going through the entire iterator. At that point it's too
>>>late to change your mind, as you can't back up an iterator. So the user
>>>who supplies a list with mixed strings and other stuff will get an
>>>unexpected error, one that join generates.

>>
>>>To put it simply, I'd say that sum() should not dispatch to join()
>>>unless it could be sure that no errors might result.

>>
>>How is this different than passing a list to sum with other incompatible
>>types?
>>
>>Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
>>(Intel)] on win32
>>Type "help", "copyright", "credits" or "license" for more information.
>> >>> class Dummy(object):
>>...     pass
>>...
>> >>> test1 = [1, 2, 3.4, Dummy()]
>> >>> sum(test1)
>>Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
>> >>> test2 = ['a', 'string', 'and', 'a', Dummy()]
>> >>> ''.join(test2)
>>Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>TypeError: sequence item 4: expected string, Dummy found
>>
>>Looks like a TypeError either way, only the verbiage changes.

>
>
>
> This test doesn't mean very much since you didn't pass the same
> list to both calls. The claim is that "".join() might do something
> different than a non-special-cased sum() would have when called on the
> same list, and indeed that is true.
>
> Consider this thought experiment:
>
>
> class Something(object):
>     def __radd__(self,other):
>         return other + "q"
>
> x = ["a","b","c",Something()]
>
>
> If x were passed to "".join(), it would throw an exception; but if
> passed to a sum() without any special casing, it would successfully
> return "abcq".
>
> Thus there is divergence in the two behaviors, thus transparently
> calling "".join() to perform the summation is a Bad Thing Indeed, a
> much worse special-case behavior than throwing an exception.
>
>
> Carl Banks


Unfortunately, I don't know enough about how join works to know that,
but I'll take your word for it. Perhaps the better solution then is to
not worry about optimization, and just call __add__ on the objects.
Then it either works, or throws the appropriate error.
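
A minimal sketch of that idea, in Python 3 syntax. The name plain_sum is hypothetical and this is not how the built-in sum() is implemented; it only shows what "just add the objects" would look like:

```python
def plain_sum(iterable, start=0):
    """Add the items left to right with +, with no special case for strings."""
    total = start
    for item in iterable:
        total = total + item  # any TypeError comes straight from +
    return total
```

Under this sketch, plain_sum(['ab', 'cd'], '') gives 'abcd', and a list of incompatible types fails with whatever TypeError the + operator itself raises.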

This is obviously slow on strings, but mention of that is already in the
docs, and profiling will also turn up such bottlenecks. Get the code
working first, then optimize, yes? We've all seen questions on this
list with folk using the accumulator method for joining strings, and
then wondering why it's so slow -- the answer given is the same as we
would give for sum()ing a list of strings -- use join instead. Then we
have Python following the same advice we give out -- don't break
duck-typing, any ensuing errors are the responsibility of the caller.

~Ethan~

Steven D'Aprano 10-19-2009 03:15 AM

Re: restriction on sum: intentional bug?
 
On Sun, 18 Oct 2009 19:52:41 -0700, Ethan Furman wrote:

> This is obviously slow on strings, but mention of that is already in the
> docs, and profiling will also turn up such bottlenecks. Get the code
> working first, then optimize, yes?


Well, maybe. Premature optimization and all, but sometimes you just
*know* something is going to be slow, so you avoid it.

And it's amazing how O(N**2) algorithms can hide for years. Within the
last month or two, there was a bug reported for httplib involving
repeated string concatenation:

http://bugs.python.org/issue6838

I can only imagine that the hidden O(N**2) behaviour was there in the
code for years before somebody noticed it, reported it, spent hours
debugging it, and finally discovered the cause and produced a patch.

The amazing thing is, if you look in the httplib.py module, you see this
comment:

# XXX This accumulates chunks by repeated string concatenation,
# which is not efficient as the number or size of chunks gets big.
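
To make the pattern concrete, here is a sketch of the two ways of accumulating chunks, in Python 3 syntax. It is not the httplib code or the patch from the bug report, and the function names are invented for illustration:

```python
def read_all_concat(chunks):
    """Accumulate by repeated concatenation; each + may copy everything so far."""
    data = b""
    for chunk in chunks:
        data = data + chunk
    return data


def read_all_join(chunks):
    """Collect the chunks in a list and join once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)


chunks = [b"x" * 10 for _ in range(1000)]
same = read_all_concat(chunks) == read_all_join(chunks)
```

Both functions return the same bytes; the difference is only in how much copying happens along the way, which is why the slow version can go unnoticed until the number of chunks gets large.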




> We've all seen questions on this
> list with folk using the accumulator method for joining strings, and
> then wondering why it's so slow -- the answer given is the same as we
> would give for sum()ing a list of strings -- use join instead. Then we
> have Python following the same advice we give out -- don't break
> duck-typing, any ensuing errors are the responsibility of the caller.


I'd be happy for sum() to raise a warning rather than an exception, and
to do so for both strings and lists. Python, after all, is happy to let
people shoot themselves in the foot, but it's only fair to give them
warning the gun is loaded :)
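
Something along these lines, as a rough sketch in Python 3 syntax. The name warning_sum and the choice of RuntimeWarning are assumptions for illustration; the built-in sum() does not behave this way:

```python
import warnings


def warning_sum(iterable, start=0):
    """Sum with +, warning (not raising) when the start value is a str or list."""
    if isinstance(start, (str, list)):
        warnings.warn(
            "summing %s objects with + is quadratic; consider join() or "
            "itertools.chain()" % type(start).__name__,
            RuntimeWarning,
            stacklevel=2,
        )
    total = start
    for item in iterable:
        total = total + item
    return total
```

The caller still gets a result, and the warning can be silenced or turned into an error through the usual warnings filters.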



--
Steven

Tim Chase 10-19-2009 10:24 AM

Re: restriction on sum: intentional bug?
 
Carl Banks wrote:
> Consider this thought experiment:
>
> class Something(object):
>     def __radd__(self,other):
>         return other + "q"
>
> x = ["a","b","c",Something()]
>
> If x were passed to "".join(), it would throw an exception; but if
> passed to a sum() without any special casing, it would successfully
> return "abcq".


Okay...this is the best argument I've heard for not using
"".join(). {Awards Carl one (1) internet} It's a peculiar thing
to do as a programmer, but "".join() certainly produces an
unexpected behavior which I'd say is worse. And a lot of this
discussion has revolved around letting programmers do peculiar
things if they want.

So as of Carl's example, I'm now pretty solidly in the "Stop
throwing an exception, just sum the parts even if it's
inefficient" camp and no longer straddling between that and the
"".join() camp. But I'm definitely still not in the "throwing
exceptions is a good thing" camp.

-tkc



Carl Banks 10-19-2009 11:18 AM

Re: restriction on sum: intentional bug?
 
On Oct 19, 3:24 am, Tim Chase <python.l...@tim.thechases.com> wrote:
> Carl Banks wrote:
> > Consider this thought experiment:
> >
> > class Something(object):
> >     def __radd__(self,other):
> >         return other + "q"
> >
> > x = ["a","b","c",Something()]
> >
> > If x were passed to "".join(), it would throw an exception; but if
> > passed to a sum() without any special casing, it would successfully
> > return "abcq".
>
> Okay...this is the best argument I've heard for not using
> "".join() {Awards Carl one (1) internet}


Well, that was my argument in the last post you followed up to; I just
used a bad example. Actually this example was described by Dave
Angel, so you should give the internet to him.


Carl Banks

Gabriel Genellina 10-20-2009 02:42 AM

Re: restriction on sum: intentional bug?
 
On Sun, 18 Oct 2009 21:50:55 -0300, Carl Banks <pavlovevidence@gmail.com>
wrote:

> Consider this thought experiment:
>
>
> class Something(object):
>     def __radd__(self,other):
>         return other + "q"
>
> x = ["a","b","c",Something()]
>
>
> If x were passed to "".join(), it would throw an exception; but if
> passed to a sum() without any special casing, it would successfully
> return "abcq".
>
> Thus there is divergence in the two behaviors, thus transparently
> calling "".join() to perform the summation is a Bad Thing Indeed, a
> much worse special-case behavior than throwing an exception.


Just for completeness, and in case anyone would like to try this O(n²)
process, sum(x) may be rewritten as:

import operator
# Python 2 syntax; in Python 3, reduce lives in functools and print is a function
x = ["a","b","c",Something()]
print reduce(operator.add, x)

which does exactly the same thing, with the same quadratic behavior as
sum(), but prints "abcq" as expected.

--
Gabriel Genellina


