# Re: Rich Comparisons Gotcha

James Stroud
Guest
Posts: n/a

 12-07-2008
Rasmus Fogh wrote:
> Current behaviour is both inconsistent and counterintuitive, as these
> examples show.
>
> >>> x = float('NaN')
> >>> x == x
> False

Perhaps this should raise an exception? I think the problem is not with
comparisons in general but with the fact that nan is type float:

py> type(float('NaN'))
<type 'float'>

No float can be equal to nan, but nan is a float. How can something be
not a number and a float at the same time? The illogicality of nan's
type creates the possibility for the illogical results of comparisons to
nan including comparing nan to itself.

> >>> ll = [x]
> >>> x in ll
> True
> >>> x == ll[0]
> False

But there is consistency on the basis of identity which is the test for
containment (in):

py> x is x
True
py> x in [x]
True

Identity and equality are two different concepts. Comparing identity to
equality is like comparing apples to oranges ;o)
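
A minimal sketch of the distinction, assuming Python 3 and CPython's behaviour of creating a fresh float object for each float('nan') call. Containment tries identity before equality, so the very same NaN object is "in" a list that holds it even though it never compares equal to itself:

```python
# Containment checks identity before equality.
x = float('nan')

same_object_equal = (x == x)                # NaN never compares equal, even to itself
same_object_in_list = (x in [x])            # the identity check succeeds first
other_nan_in_list = (float('nan') in [x])   # different object, and == is False

print(same_object_equal, same_object_in_list, other_nan_in_list)
# prints: False True False
```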

>
> >>> import numpy
> >>> y = numpy.zeros((3,))
> >>> y
> array([ 0., 0., 0.])
> >>> bool(y==y)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()

But the equality test is not what fails here. It's the cast to bool that
fails, which for numpy works like a unary ufunc. The designers of numpy
thought that this would be a more desirable behavior. The test for
equality likewise is a binary ufunc and the behavior was chosen in numpy
for practical reasons. I don't know if you can overload the == operator
in C, but if you can, you would be able to achieve the same behavior.

> >>> ll1 = [y,1]
> >>> y in ll1
> True
> >>> ll2 = [1,y]
> >>> y in ll2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()

I think you could be safe calling this a bug with numpy. But the fact
that someone can create a bug with a language is not a condemnation of
the language. For example, C makes it really easy to crash a program by
overrunning the limits of an array, but no one would suggest removing
arrays from C.

> Can anybody see a way this could be fixed (please)? I may well have to
> live with it, but I would really prefer not to.

Your only hope is to somehow convince the language designers to remove
the ability to overload == then get them to agree on what you think the
proper behavior should be for comparisons. I think the probability of
that happening is about zero, though, because such a change would run
counter to the dynamic nature of the language.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com

James Stroud
Guest
Posts: n/a

 12-07-2008
James Stroud wrote:
>[cast to bool] for numpy works like a unary ufunc.

Scratch that. Not thinking and typing at same time.

--
James Stroud

Steven D'Aprano
Guest
Posts: n/a

 12-07-2008
On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

> Rasmus Fogh wrote:
>> Current behaviour is both inconsistent and counterintuitive, as these
>> examples show.
>>
>> >>> x = float('NaN')
>> >>> x == x
>> False

>
> Perhaps this should raise an exception?

Why on earth would you want checking equality on NaN to raise an
exception??? What benefit does it give?

> I think the problem is not with
> comparisons in general but with the fact that nan is type float:
>
> py> type(float('NaN'))
> <type 'float'>
>
> No float can be equal to nan, but nan is a float. How can something be
> not a number and a float at the same time?

Because floats are not real numbers. They are *almost* numbers, they
often (but not always) behave like numbers, but they're actually not
numbers.

The difference is subtle enough that it is easy to forget that floats are
not numbers, but it's easy enough to find examples proving it:

Some perfectly good numbers don't exist as floats:

>>> 2**-10000 == 0.0
True

Try as you might, you can't get the number 0.1 *exactly* as a float:

>>> 0.1
0.10000000000000001

For any numbers x and y not equal to zero, x+y != x. But that fails for
floats:

>>> 1001.0 + 1e99 == 1e99
True

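The above is because of limited precision rather than overflow: 1001.0 is
far smaller than the gap between adjacent floats near 1e99. But even avoiding
such extreme differences in magnitude doesn't solve the problem. With a
little effort, you can also find examples of "ordinary sized" floats where
(x+y)-y != x.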

>>> 0.9+0.1-0.9 == 0.1
False
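
A small sketch (Python 3, standard library only) of what happens in the last two examples. The gap estimate is an approximation from machine epsilon, not the exact spacing, and the variable names are only illustrative:

```python
import sys

big = 1e99
# Approximate gap between adjacent floats near `big`:
# machine epsilon scaled by the magnitude (an estimate, not the exact ulp).
approx_gap = big * sys.float_info.epsilon

absorbed = (1001.0 + big == big)   # 1001.0 is far below the gap, so it vanishes
print(approx_gap > 1001.0, absorbed)
# prints: True True

# The "ordinary sized" case: each operation rounds, and the errors need not cancel.
residue = 0.9 + 0.1 - 0.9
print(residue == 0.1, abs(residue - 0.1) < 1e-15)
# prints: False True
```

The second result is why floats are usually compared with a tolerance rather than with ==.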

>> >>> import numpy
>> >>> y = numpy.zeros((3,))
>> >>> y
>> array([ 0., 0., 0.])
>> >>> bool(y==y)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()

>
> But the equality test is not what fails here. It's the cast to bool that
> fails

And it is right to do so, because it is ambiguous and the library
designers rightly avoided the temptation of guessing what result is
needed.

>> >>> ll1 = [y,1]
>> >>> y in ll1
>> True
>> >>> ll2 = [1,y]
>> >>> y in ll2
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()

>
> I think you could be safe calling this a bug with numpy.

Only in the sense that there are special cases where the array elements
are all true, or all false, and numpy *could* safely return a bool. But
special cases are not special enough to break the rules. Better for the
numpy caller to write this:

a.all() # or any()

instead of this:

try:
    bool(a)
except ValueError:
    a.all()

as they would need to do if numpy sometimes returned a bool and sometimes
raised an exception.

--
Steven

James Stroud
Guest
Posts: n/a

 12-08-2008
Steven D'Aprano wrote:
> On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
>
>> Rasmus Fogh wrote:

>>>>>> ll1 = [y,1]
>>>>>> y in ll1
>>> True
>>>>>> ll2 = [1,y]
>>>>>> y in ll2
>>> Traceback (most recent call last):
>>> File "<stdin>", line 1, in <module>
>>> ValueError: The truth value of an array with more than one element is
>>> ambiguous. Use a.any() or a.all()

>> I think you could be safe calling this a bug with numpy.

>
> Only in the sense that there are special cases where the array elements
> are all true, or all false, and numpy *could* safely return a bool. But
> special cases are not special enough to break the rules. Better for the
> numpy caller to write this:
>
> a.all() # or any()
>
>
> try:
>     bool(a)
> except ValueError:
>     a.all()
>
> as they would need to do if numpy sometimes returned a bool and sometimes
> raised an exception.

I'm missing how a.all() solves the problem Rasmus describes, namely that
the order of a python *list* affects the results of containment tests by
numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
results in his example. It still seems like a bug in numpy to me, even
if too much other stuff is broken if you fix it (in which case it
apparently becomes an "issue").

James

Robert Kern
Guest
Posts: n/a

 12-08-2008
James Stroud wrote:
> Steven D'Aprano wrote:
>> On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:
>>
>>> Rasmus Fogh wrote:

>
>>>>>>> ll1 = [y,1]
>>>>>>> y in ll1
>>>> True
>>>>>>> ll2 = [1,y]
>>>>>>> y in ll2
>>>> Traceback (most recent call last):
>>>> File "<stdin>", line 1, in <module>
>>>> ValueError: The truth value of an array with more than one element is
>>>> ambiguous. Use a.any() or a.all()
>>> I think you could be safe calling this a bug with numpy.

>>
>> Only in the sense that there are special cases where the array
>> elements are all true, or all false, and numpy *could* safely return a
>> bool. But special cases are not special enough to break the rules.
>> Better for the numpy caller to write this:
>>
>> a.all() # or any()
>>
>>
>> try:
>>     bool(a)
>> except ValueError:
>>     a.all()
>>
>> as they would need to do if numpy sometimes returned a bool and
>> sometimes raised an exception.

>
> I'm missing how a.all() solves the problem Rasmus describes, namely that
> the order of a python *list* affects the results of containment tests by
> numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
> results in his example. It still seems like a bug in numpy to me, even
> if too much other stuff is broken if you fix it (in which case it
> apparently becomes an "issue").

It's an issue, if anything, not a bug. There is no consistent implementation of
bool(some_array) that works in all cases. numpy's predecessor Numeric used to
implement this as returning True if at least one element was non-zero. This
works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not
work well for bool(x==y) (which should be (x==y).all()), but many people got
confused and thought that bool(x==y) worked. When we made numpy, we decided to
explicitly not allow bool(some_array) so that people will not write buggy code
like this again.
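
To make the any/all asymmetry concrete, here is a pure-Python stand-in using plain lists instead of arrays (so it runs without Numeric or numpy). The list comprehensions only imitate elementwise comparison; they are not how either library implements it:

```python
# Pure-Python stand-in for elementwise array comparison.
x = [1, 2, 3]
y = [1, 2, 4]   # differs from x in the last position only

eq_mask = [a == b for a, b in zip(x, y)]   # [True, True, False]
ne_mask = [a != b for a, b in zip(x, y)]   # [False, False, True]

# "True if at least one element is non-zero", the rule described above:
print(any(ne_mask))   # True:  a correct answer to "do x and y differ?"
print(any(eq_mask))   # True:  a misleading answer to "are x and y equal?"
print(all(eq_mask))   # False: what an equality test actually needs
```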

The deficiency is in the feature of rich comparisons, not numpy's implementation
of it. __eq__() is allowed to return non-booleans; however, there are some parts
of Python's implementation like list.__contains__() that still expect the return
value of __eq__() to be meaningfully cast to a boolean.
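
A minimal sketch of that interaction in Python 3, without numpy. Vec and Mask are hypothetical stand-ins invented for this illustration: Vec.__eq__ returns a non-boolean Mask, and Mask refuses to be truth-tested, which is roughly the behaviour the thread attributes to arrays:

```python
class Mask:
    """Hypothetical stand-in for an array of bools; refuses truth-testing."""
    def __init__(self, flags):
        self.flags = flags

    def __bool__(self):
        raise ValueError("truth value of a multi-element mask is ambiguous")


class Vec:
    """Hypothetical stand-in for an array with elementwise ==."""
    def __init__(self, values):
        self.values = values

    def __eq__(self, other):
        if isinstance(other, Vec):
            return Mask([a == b for a, b in zip(self.values, other.values)])
        return Mask([a == other for a in self.values])


y = Vec([0.0, 0.0, 0.0])

first = y in [y, 1]        # identity matches before == is ever consulted

try:
    second = y in [1, y]   # comparing 1 with y yields a Mask, which is truth-tested
except ValueError as exc:
    second = "ValueError: %s" % exc

print(first)    # True
print(second)   # ValueError: truth value of a multi-element mask is ambiguous
```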

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

James Stroud
Guest
Posts: n/a

 12-08-2008
Robert Kern wrote:
> James Stroud wrote:
>> I'm missing how a.all() solves the problem Rasmus describes, namely
>> that the order of a python *list* affects the results of containment
>> tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to
>> different results in his example. It still seems like a bug in numpy
>> to me, even if too much other stuff is broken if you fix it (in which
>> case it apparently becomes an "issue").

>
> It's an issue, if anything, not a bug. There is no consistent
> implementation of bool(some_array) that works in all cases. numpy's
> predecessor Numeric used to implement this as returning True if at least
> one element was non-zero. This works well for bool(x!=y) (which is
> equivalent to (x!=y).any()) but does not work well for bool(x==y) (which
> should be (x==y).all()), but many people got confused and thought that
> bool(x==y) worked. When we made numpy, we decided to explicitly not
> allow bool(some_array) so that people will not write buggy code like
> this again.
>
> The deficiency is in the feature of rich comparisons, not numpy's
> implementation of it. __eq__() is allowed to return non-booleans;
> however, there are some parts of Python's implementation like
> list.__contains__() that still expect the return value of __eq__() to be
> meaningfully cast to a boolean.
>

You have explained

py> ll2 = [1, y]
py> y in ll2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is...

but not

py> ll1 = [y,1]
py> y in ll1
True

It's this discrepancy that seems like a bug, not that a ValueError is
raised in the former case, which is perfectly reasonable to me.

All I can imagine is that something like the following lives in the
bowels of the python code for list:

def __contains__(self, other):
    foundit = False
    for i, v in enumerate(self):
        if i == 0:
            # evaluates to bool numpy array
            foundit = one_kind_of_test(v, other)
        else:
            # raises exception for numpy array
            foundit = another_kind_of_test(v, other)
        if foundit:
            break
    return foundit

I'm trying to imagine some other way to get the results mentioned but I
honestly can't. It's beyond me why someone would do such a thing, but
perhaps it's an optimization of some sort.

James

Robert Kern
Guest
Posts: n/a

 12-08-2008
James Stroud wrote:
> Robert Kern wrote:
>> James Stroud wrote:
>>> I'm missing how a.all() solves the problem Rasmus describes, namely
>>> that the order of a python *list* affects the results of containment
>>> tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to
>>> different results in his example. It still seems like a bug in numpy
>>> to me, even if too much other stuff is broken if you fix it (in which
>>> case it apparently becomes an "issue").

>>
>> It's an issue, if anything, not a bug. There is no consistent
>> implementation of bool(some_array) that works in all cases. numpy's
>> predecessor Numeric used to implement this as returning True if at
>> least one element was non-zero. This works well for bool(x!=y) (which
>> is equivalent to (x!=y).any()) but does not work well for bool(x==y)
>> (which should be (x==y).all()), but many people got confused and
>> thought that bool(x==y) worked. When we made numpy, we decided to
>> explicitly not allow bool(some_array) so that people will not write
>> buggy code like this again.
>>
>> The deficiency is in the feature of rich comparisons, not numpy's
>> implementation of it. __eq__() is allowed to return non-booleans;
>> however, there are some parts of Python's implementation like
>> list.__contains__() that still expect the return value of __eq__() to
>> be meaningfully cast to a boolean.
>>

>
> You have explained
>
> py> ll2 = [1, y]
> py> y in ll2
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: The truth value of an array with more than one element is...
>
> but not
>
> py> ll1 = [y,1]
> py> y in ll1
> True
>
> It's this discrepancy that seems like a bug, not that a ValueError is
> raised in the former case, which is perfectly reasonable to me.

Nothing to do with numpy. list.__contains__() checks for identity with "is"
before it goes to __eq__().

--
Robert Kern


James Stroud
Guest
Posts: n/a

 12-08-2008
Robert Kern wrote:
> James Stroud wrote:
>> py> ll2 = [1, y]
>> py> y in ll2
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is...
>>
>> but not
>>
>> py> ll1 = [y,1]
>> py> y in ll1
>> True
>>
>> It's this discrepancy that seems like a bug, not that a ValueError is
>> raised in the former case, which is perfectly reasonable to me.

>
> Nothing to do with numpy. list.__contains__() checks for identity with
> "is" before it goes to __eq__().

...but only for the first element of the list:

py> import numpy
py> y = numpy.array([1,2,3])
py> y
array([1, 2, 3])
py> y in [1, y]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.ValueError'>: The truth value of an array with more
than one element is ambiguous. Use a.any() or a.all()
py> y is [1, y][1]
True

I think it skips straight to __eq__ if the element is not the first in
the list. That no one acknowledges this makes me feel like a conspiracy
is afoot.

Robert Kern
Guest
Posts: n/a

 12-08-2008
James Stroud wrote:
> Robert Kern wrote:
>> James Stroud wrote:
>>> py> ll2 = [1, y]
>>> py> y in ll2
>>> Traceback (most recent call last):
>>> File "<stdin>", line 1, in <module>
>>> ValueError: The truth value of an array with more than one element is...
>>>
>>> but not
>>>
>>> py> ll1 = [y,1]
>>> py> y in ll1
>>> True
>>>
>>> It's this discrepancy that seems like a bug, not that a ValueError is
>>> raised in the former case, which is perfectly reasonable to me.

>>
>> Nothing to do with numpy. list.__contains__() checks for identity with
>> "is" before it goes to __eq__().

>
> ...but only for the first element of the list:
>
> py> import numpy
> py> y = numpy.array([1,2,3])
> py> y
> array([1, 2, 3])
> py> y in [1, y]
> ------------------------------------------------------------
> Traceback (most recent call last):
> File "<ipython console>", line 1, in <module>
> <type 'exceptions.ValueError'>: The truth value of an array with more
> than one element is ambiguous. Use a.any() or a.all()
> py> y is [1, y][1]
> True
>
> I think it skips straight to __eq__ if the element is not the first in
> the list.

No, it doesn't skip straight to __eq__(). "y is 1" returns False, so (y==1) is
checked. When y is a numpy array, this returns an array of bools.
list.__contains__() tries to convert this array to a bool and
ndarray.__nonzero__() raises the exception.

list.__contains__() checks "is" then __eq__() for each element before moving on
to the next element. It does not try "is" for all elements, then try __eq__()
for all elements.
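
The per-element order can be sketched as a rough pure-Python equivalent. This is an illustration, not the actual C implementation; list_contains is a name made up for the sketch, and it follows the documented equivalence of `x in y` to `any(x is e or x == e for e in y)` for lists:

```python
def list_contains(items, target):
    """Rough pure-Python equivalent of `target in items` for a list.

    Mirrors the per-element order described above: identity first,
    then ==, and only then the next element.
    """
    for item in items:
        if item is target:
            return True
        if item == target:   # the result is truth-tested here and may raise
            return True
    return False


x = float('nan')
print(list_contains([x], x))              # True, via the identity check
print(list_contains([float('nan')], x))   # False: not the same object, and NaN != NaN
print(list_contains([1, 2, 3], 2))        # True, via ==
```

With an array-like target in second position, the == result for the first element is what gets truth-tested, which is where the exception comes from.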

> That no one acknowledges this makes me feel like a conspiracy
> is afoot.

I don't know what you think I'm not acknowledging.

--
Robert Kern


James Stroud
Guest
Posts: n/a

 12-08-2008
Robert Kern wrote:
> James Stroud wrote:
>> I think it skips straight to __eq__ if the element is not the first in
>> the list.

>
> No, it doesn't skip straight to __eq__(). "y is 1" returns False, so
> (y==1) is checked. When y is a numpy array, this returns an array of
> bools. list.__contains__() tries to convert this array to a bool and
> ndarray.__nonzero__() raises the exception.
>
> list.__contains__() checks "is" then __eq__() for each element before
> moving on to the next element. It does not try "is" for all elements,
> then try __eq__() for all elements.

Ok. Thanks for the explanation.

> > That no one acknowledges this makes me feel like a conspiracy
> > is afoot.

>
> I don't know what you think I'm not acknowledging.

Sorry. That was a failed attempt at humor.

James
